Difference between revisions of "Measuring Massive Multitask Language Understanding (MMLU)"
Revision as of 09:57, 3 October 2024
- HellaSwag (10-shot)
- WinoGrande (5-shot)
- ARC-Challenge (25-shot)
- TriviaQA (5-shot)
- TruthfulQA
- GSM8K
- MATH
- HumanEval
See also