Codex P@1 (0-Shot)

Codex P@1 (0-Shot) is a metric used to evaluate the performance of large language models (LLMs) on code generation. It measures the percentage of prompts for which the model's first (top-1) generated sample is correct. "0-shot" means the model is given only the task description, with no worked examples (shots) in the prompt and no fine-tuning on the specific task. The name references OpenAI's Codex, the code model whose evaluation popularized the pass@k family of metrics, of which P@1 is the k=1 case.

The metric is calculated by evaluating the LLM on a set of code generation prompts. For each prompt, the LLM is given a natural language description of the desired code and generates a single candidate solution. The candidate is then checked for correctness. In practice this check is usually functional: the generated code is run against a set of unit tests, and the prompt counts as solved if every test passes. An exact textual match with a reference solution is not required, since many different programs can implement the same behavior. (A minimal harness sketch appears after the example below.)

The P@1 score is then the number of correctly solved prompts divided by the total number of prompts.
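
As a minimal sketch of that calculation in Python (the function name pass_at_1 and the result format are illustrative assumptions, not part of any standard benchmark harness):

def pass_at_1(results):
  # results: one boolean per prompt; True means the model's
  # first sample passed all of that prompt's unit tests.
  if not results:
    raise ValueError("no results to score")
  return sum(results) / len(results)

# Example: 4 prompts, 3 solved on the first attempt
print(pass_at_1([True, False, True, True]))  # 0.75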

Codex P@1 (0-Shot) is a useful metric for evaluating the ability of LLMs to generate code without in-context examples or task-specific fine-tuning. It is a demanding setting, as the model must understand the natural language description, produce syntactically valid code, and get the logic right on its first attempt.

Here is an example of how a single prompt might be evaluated under Codex P@1 (0-Shot):

Prompt: Write a function in Python that reverses a string.

Ground truth code (one correct reference implementation):

def reverse_string(string):
  reversed_string = ""
  for i in range(len(string)-1, -1, -1):
    reversed_string += string[i]
  return reversed_string

LLM output:

def reverse_string(string):
  reversed_string = ""
  for i in range(len(string)-1, -1, -1):
    reversed_string += string[i]
  return reversed_string

Result: pass (the generated function reverses strings correctly, so it would pass the prompt's unit tests)

In this example, the LLM's first sample solved the prompt with no examples in its prompt, so the prompt counts as a pass. Across a benchmark, the Codex P@1 (0-Shot) score is the fraction of prompts passed this way; if this were the only prompt, the score would be 100%.
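
To make the pass/fail check concrete, here is a minimal sketch of a functional test harness, assuming a hypothetical helper check_candidate and hand-written test cases (real evaluation harnesses sandbox the execution step, since running untrusted model output with exec() is unsafe):

def check_candidate(candidate_source, entry_point, tests):
  # Execute the generated source in an isolated namespace,
  # then call the target function on each test case.
  namespace = {}
  try:
    exec(candidate_source, namespace)
    func = namespace[entry_point]
    for args, expected in tests:
      if func(*args) != expected:
        return False
    return True
  except Exception:
    return False

tests = [(("hello",), "olleh"), (("",), ""), (("ab",), "ba")]
# candidate_source would hold the LLM output shown above;
# check_candidate(candidate_source, "reverse_string", tests) -> True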
