Difference between revisions of "Measuring Massive Multitask Language Understanding (MMLU)"
(One intermediate revision by the same user not shown)
Line 8:
  * [[TriviaQA]] (5-shot)
  * [[TruthfulQA]]
+ * [[GSM8K]]
+ * [[MATH]]
+ * [[HumanEval]]
  == See also ==
+ * {{MATH}}
  * {{LLM}}
  [[Category:LLM]]
Latest revision as of 09:58, 3 October 2024
- HellaSwag (10-shot)
- WinoGrande (5-shot)
- ARC Challenge (25-shot)
- TriviaQA (5-shot)
- TruthfulQA
- GSM8K
- MATH
- HumanEval
See also