LLM Benchmark Categories


1. Text

a. Language Understanding

b. Language Generation

c. Factuality and Truthfulness

d. Summarization

e. Translation

f. Question Answering

g. Common Sense Reasoning

h. Conversational AI

i. Dialogue Systems

j. Sentiment Analysis

k. Named Entity Recognition (NER)

l. Document Understanding

m. Machine Reading Comprehension (MRC)

n. Natural Language Inference (NLI)

o. Text Classification

p. Text Generation

q. Paraphrase Identification

2. Audio

a. Audio Generation

3. Video

a. Video Computer Vision and Video Generation

4. Image

a. Image Computer Vision

b. Image Computer Vision, Instruction Following

c. Image Computer Vision, Editing

d. Image Computer Vision, Segmentation

e. Image Computer Vision, 3D Reconstruction from Images

5. Reasoning

a. Mathematical Reasoning

b. General Reasoning

c. Visual Reasoning

d. Moral Reasoning

e. Causal Reasoning

6. Other

a. General Agents

b. Robotics

c. Reinforcement Learning from Human Feedback (RLHF)

d. Code Generation

