Difference between revisions of "Measuring Massive Multitask Language Understanding (MMLU)"

Latest revision as of 09:58, 3 October 2024

@@ Line 2: / Line 2: @@
+* [[hellaswag|HellaSwag]] (10-shot)
+* [[winograde|WinoGrande]] (5-shot)
+* [[arc challenge|ARC Challenge]] (25-shot)
+* [[TriviaQA]] (5-shot)
+* [[TruthfulQA]]
+* [[GSM8K]]
+* [[MATH]]
+* [[HumanEval]]
 == See also ==
+* {{MATH}}
 * {{LLM}}
 [[Category:LLM]]