Difference between revisions of "Measuring Massive Multitask Language Understanding (MMLU)"
Line 13:
== See also ==
+ * {{MATH}}
* {{LLM}}
[[Category:LLM]]
Latest revision as of 09:58, 3 October 2024
- HellaSwag (10-shot)
- WinoGrande (5-shot)
- ARC Challenge (25-shot)
- TriviaQA (5-shot)
- TruthfulQA
- GSM8K
- MATH
- HumanEval
See also