QuALITY (5-Shot)
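QuALITY (5-Shot) is a benchmark setting used to compare large language models (LLMs) on reading comprehension of long documents. QuALITY (Question Answering with Long Input Texts, Yes!) is a multiple-choice question-answering dataset. Each question refers to an English passage several thousand tokens long and offers four answer options. Many questions are designed so that they cannot be answered by skimming or by locating a single sentence, so the benchmark tests whether a model can understand the passage as a whole.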
The 5-Shot part refers to how the model is prompted, not how it is trained. Five solved examples are placed in the prompt before the test question, and the model's weights are not updated. This few-shot, in-context setting measures how well a model can pick up the task format from a handful of demonstrations, rather than after fine-tuning on QuALITY's training data.
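The sketch below shows one way to assemble a 5-shot prompt. It assumes that each demonstration includes its own passage and that answers are written as the letters A to D. The `Example` class, its field names, and the prompt layout are hypothetical and are not QuALITY's own schema. Real evaluation harnesses use their own templates.

```python
from dataclasses import dataclass
from typing import List

LETTERS = "ABCD"  # four answer options per question


@dataclass
class Example:
    """One multiple-choice item (hypothetical field names, not QuALITY's schema)."""
    passage: str
    question: str
    options: List[str]  # exactly four answer options
    answer: int = -1    # index (0-3) of the correct option; -1 when unknown


def format_item(item: Example, include_answer: bool) -> str:
    """Render passage, question, lettered options and an 'Answer:' line."""
    lines = [f"Passage:\n{item.passage}", f"Question: {item.question}"]
    for letter, option in zip(LETTERS, item.options):
        lines.append(f"{letter}. {option}")
    answer = LETTERS[item.answer] if include_answer else ""
    # For the test item this leaves a bare "Answer:" for the model to complete.
    lines.append(f"Answer: {answer}".rstrip())
    return "\n".join(lines)


def build_five_shot_prompt(shots: List[Example], test: Example) -> str:
    """Five solved demonstrations followed by the unsolved test item."""
    if len(shots) != 5:
        raise ValueError("a 5-shot prompt needs exactly five demonstrations")
    blocks = [format_item(s, include_answer=True) for s in shots]
    blocks.append(format_item(test, include_answer=False))
    return "\n\n".join(blocks)
```

The model's continuation after the final "Answer:" would then be parsed for a letter and mapped back to an option index. Because every demonstration carries a long passage, a 5-shot prompt in this layout can be several times longer than a single passage.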
The score is accuracy: the percentage of questions for which the model selects the correct option. Because each question has four options, random guessing would score about 25%.
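Because QuALITY questions are multiple choice, accuracy reduces to comparing the option index the model picked with the gold index for each question. In this minimal sketch, the function name is illustrative. It also assumes that an answer the harness could not parse is recorded as -1, so that it counts as wrong.

```python
from typing import Sequence


def accuracy(predicted: Sequence[int], gold: Sequence[int]) -> float:
    """Percentage of questions where the predicted option index equals the gold index.

    Unparseable answers are assumed to be encoded as -1, which never matches
    a valid gold index (0-3), so they count as incorrect.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("no questions to score")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return 100.0 * correct / len(gold)


print(accuracy([0, 2, 1, 3], [0, 2, 3, 3]))  # 75.0
```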
QuALITY (5-Shot) is a relatively new benchmark (the QuALITY dataset was introduced in 2022), and some model developers report it alongside other benchmarks. Its value is that it targets long-document comprehension, which benchmarks built on short passages do not test. It does not measure performance on other kinds of tasks, such as summarization or translation.
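To compare two LLMs with QuALITY (5-Shot), evaluate both on the same questions with the same 5-shot prompt format, then compare their accuracies. If Model 2 scores higher than Model 1, it answered a larger share of the long-document questions correctly under that setup. This supports the conclusion that Model 2 is better at this kind of long-document reading comprehension. It does not show that Model 2 is a better LLM across every task, and a small gap may fall within sampling noise.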
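One way to check whether a gap between two models is larger than sampling noise is a paired bootstrap over questions. The method resamples the question set with replacement and recomputes the accuracy difference each time. This is a generic statistical technique, not part of the QuALITY benchmark itself. The sketch assumes you have a per-question record for each model showing whether it answered correctly, with both models evaluated on the same questions in the same order. The function name and defaults are illustrative.

```python
import random
from typing import Sequence, Tuple


def paired_bootstrap_diff(
    correct_a: Sequence[bool],
    correct_b: Sequence[bool],
    n_resamples: int = 2_000,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (observed gap, 2.5th pct, 97.5th pct) of accuracy(B) - accuracy(A).

    Gaps are in percentage points. Each resample draws question indices with
    replacement and applies the same indices to both models (a paired design).
    """
    if len(correct_a) != len(correct_b) or not correct_a:
        raise ValueError("need the same non-empty question set for both models")
    n = len(correct_a)
    rng = random.Random(seed)  # fixed seed for reproducible intervals
    observed = 100.0 * (sum(correct_b) - sum(correct_a)) / n
    diffs = []
    for _ in range(n_resamples):
        idx = rng.choices(range(n), k=n)
        gap = sum(correct_b[i] - correct_a[i] for i in idx)  # bools act as 0/1
        diffs.append(100.0 * gap / n)
    diffs.sort()
    lower = diffs[int(0.025 * (n_resamples - 1))]
    upper = diffs[int(0.975 * (n_resamples - 1))]
    return observed, lower, upper
```

If the interval excludes zero, the gap is unlikely to come only from which questions happened to be sampled. It still says nothing about tasks outside QuALITY or about different prompt formats.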