Measuring Massive Multitask Language Understanding (MMLU)
- HellaSwag (10-shot)
- WinoGrande (5-shot)
- ARC Challenge (25-shot)
- TriviaQA (5-shot)
- TruthfulQA
- GSM8K
- MATH
- HumanEval
See also