Difference between revisions of "Measuring Massive Multitask Language Understanding (MMLU)"

Latest revision as of 09:58, 3 October 2024

@@ Line 1: / Line 1: @@
 [[wikipedia:MMLU]]
+* [[hellaswag|HellaSwag]] (10-shot)
+* [[winograde|WinoGrande]] (5-shot)
+* [[arc challenge|ARC Challenge]] (25-shot)
+* [[TriviaQA]] (5-shot)
+* [[TruthfulQA]]
+* [[GSM8K]]
+* [[MATH]]
+* [[HumanEval]]
+== See also ==
+* {{MATH}}
+* {{LLM}}
+[[Category:LLM]]