GSM8k (0-Shot CoT)

GSM8k (0-Shot CoT) in LLM comparisons refers to the performance of an LLM on the GSM8k benchmark using zero-shot Chain of Thought (CoT) prompting.

GSM8k (Grade School Math 8K) is a benchmark of roughly 8,500 grade school math word problems released by OpenAI. Each problem requires a few steps of reasoning using basic arithmetic operations: addition, subtraction, multiplication, and division.
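To make the benchmark format concrete, here is a sketch of what a GSM8k-style item looks like and how its final answer can be read out. The dataset's convention of placing the numeric answer after a `####` marker is real; the simplified worked solution and the helper function below are illustrative.

```python
# A GSM8k item pairs a word problem with a worked solution whose last
# line gives the numeric answer after "####" (the dataset's convention).
sample = {
    "question": (
        "Natalia sold clips to 48 of her friends in April, and then she "
        "sold half as many clips in May. How many clips did Natalia sell "
        "altogether in April and May?"
    ),
    "answer": (
        "In April she sold 48 clips. In May she sold 48 / 2 = 24 clips. "
        "Altogether she sold 48 + 24 = 72 clips.\n#### 72"
    ),
}

def extract_final_answer(answer_text: str) -> str:
    """Return the number after the '####' marker."""
    return answer_text.split("####")[-1].strip()

print(extract_final_answer(sample["answer"]))  # -> 72
```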

Zero-shot CoT prompting is a technique for eliciting multi-step reasoning from LLMs without any worked examples: the trigger phrase "Let's think step by step" is appended to the question before the model generates its answer.
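In the original paper this is done in two stages: one prompt elicits the reasoning, and a second prompt extracts the final answer from it. A minimal sketch, assuming `generate` is a hypothetical stand-in for any text-completion API call:

```python
def zero_shot_cot(question: str, generate) -> str:
    """Two-stage zero-shot CoT prompting.
    `generate` is a hypothetical stand-in for an LLM completion call.
    """
    # Stage 1: elicit step-by-step reasoning with the trigger phrase.
    reasoning_prompt = f"Q: {question}\nA: Let's think step by step."
    reasoning = generate(reasoning_prompt)
    # Stage 2: ask the model to read off the final answer.
    answer_prompt = (
        f"{reasoning_prompt} {reasoning}\n"
        "Therefore, the answer (arabic numerals) is"
    )
    return generate(answer_prompt).strip()

# Toy stand-in model so the sketch runs end to end (assumption: real
# use would call an actual LLM here).
def fake_generate(prompt: str) -> str:
    if "Therefore" in prompt:
        return " 72"
    return "In May she sold 48 / 2 = 24 clips, so 48 + 24 = 72 in total."

print(zero_shot_cot("How many clips did Natalia sell?", fake_generate))  # -> 72
```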

GSM8k (0-Shot CoT) is therefore a useful metric both for comparing the zero-shot reasoning capabilities of different LLMs and for evaluating the effectiveness of different CoT prompting techniques.
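Scoring on GSM8k is typically exact match on the final number: a prediction counts as correct if the number it produces equals the gold answer. A minimal sketch of such a scorer, assuming the common heuristic of taking the last number in the model's response:

```python
import re

def last_number(text: str) -> str:
    """Pull the last number out of a response string (a common, simple
    extraction heuristic; an assumption, not the only convention)."""
    nums = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return nums[-1] if nums else ""

def gsm8k_accuracy(predictions, gold_answers):
    """Fraction of predictions whose final number matches the gold answer."""
    correct = sum(
        last_number(p) == last_number(g)
        for p, g in zip(predictions, gold_answers)
    )
    return correct / len(gold_answers)

print(gsm8k_accuracy(["So the total is 72."], ["#### 72"]))  # -> 1.0
```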

In the paper "Large Language Models are Zero-Shot Reasoners," the authors evaluated the performance of several LLMs on GSM8k (0-Shot CoT). They found that zero-shot CoT prompting significantly outperformed standard zero-shot prompting. For example, the InstructGPT model (text-davinci-002) achieved an accuracy of 40.7% on GSM8k (0-Shot CoT), compared to 10.4% without CoT prompting.

The authors also found that performance on GSM8k (0-Shot CoT) improved with model size, suggesting that larger LLMs have stronger zero-shot reasoning capabilities.

In short, GSM8k (0-Shot CoT) measures how well an LLM can reason through multi-step arithmetic word problems on its own, which is why it appears so often in LLM comparisons.
