Phind Technical Specs

Phind: The Best Free Code Generation Model (Better Than ChatGPT-4?)

By Farhan Hussain November 2, 2023

– Phind is a new code generation model that is on par with GPT-4.
– Phind has released an open-source version of its model.
– You can try out Phind for free.

The race to build state-of-the-art machine learning models is in full swing. Large Language Models (LLMs) are taking the world by storm, with GPT-4 from OpenAI leading the charge. However, in the realm of coding, a new contender has emerged. A model named Phind is not just matching but, in some contexts, even exceeding the capabilities of GPT-4. If these claims hold true, Phind could be a game-changer.

Phind, a fine-tuned model based on CodeLlama-34B, earlier achieved a breakthrough on HumanEval, a benchmark dataset of programming problems designed to test the capabilities of code generation models. The first version scored 67.6% for the base model and 69.5% for its Python-specific variant. That was on par with GPT-4, which scored 67%. With the recent release of its newest model, ‘Phind-CodeLlama-34B-v2’, Phind reports surpassing GPT-4 with a score of 74.7%.
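To make these percentages concrete: a HumanEval-style score is the share of benchmark problems for which the generated solution passes the problem's unit tests. The sketch below shows only that arithmetic. The function name and the sample counts are illustrative assumptions, not Phind's evaluation harness.

```cpp
#include <cstddef>

// Hypothetical helper: percentage of benchmark problems whose generated
// solution passed all of its unit tests.
constexpr double pass_rate_percent(std::size_t solved, std::size_t total) {
    // An empty benchmark is reported as 0% to avoid dividing by zero.
    return total == 0
               ? 0.0
               : 100.0 * static_cast<double>(solved) / static_cast<double>(total);
}

// Made-up counts for illustration: 3 of 4 problems solved is a 75% score.
static_assert(pass_rate_percent(3, 4) == 75.0, "3 of 4 problems is 75%");
```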

Phind vs ChatGPT

Let’s start with the most obvious question: is Phind better than GPT-4? If we are to believe the benchmarks published by the team behind Phind, then the answer is yes. But like most things in the world, the answer is not a straightforward one. HumanEval is a useful benchmark, but a high score on it does not always translate into better results on real-world tasks.

In the early stages of testing—bearing in mind that it’s only been a few days since its release—Phind has demonstrated the ability to keep pace with GPT-4. Its responses are notably helpful, providing links for further reading on specific topics and recommending libraries, which are significant aids for developers. Another impressive feature is the ‘Pair Programmer,’ which enables Phind to ask follow-up questions for debugging purposes, making the interaction more conversational. This aspect is quite similar to how ChatGPT operates.

One approach I take when comparing code generation models involves presenting them with code that contains incorrect syntax: errors that are easily noticeable to a human but might be interpreted differently by machine learning models. While this method may not be the best indicator of a model’s advancement, it offers insight into the model’s ‘thought process’.

/* incorrect code syntax to trick the model */

#include<iostream>
using namespace std;
int main()
{
enum { blue, green = 5, GREAT };
coutく<blueくく”"<<GREAT;
}

Both models correctly identified that the code contains a syntax error, which is pretty straightforward, but they then rewrote the code into what each thought the correct version should be.

#include<iostream>
using namespace std;
int main()
{
    enum { blue, green = 5, GREAT };
    cout << blue << " " << GREAT;
}

Output: 0 6

#include<iostream>
using namespace std;
int main()
{
   enum { blue, green = 5, GREAT };
   cout << blue << " " << green << " " << GREAT << endl;
   return 0;
}

Output: 0 5 6

As demonstrated by this small experiment, ChatGPT and Phind arrived at different conclusions. A human correcting this code would likely agree with the solution proposed by ChatGPT, as it appears to be the more obvious correction (at least to me). Now, does this mean ChatGPT is superior to Phind? Not necessarily, but it does suggest that ChatGPT’s approach may be more akin to human reasoning, or in this instance, more aligned with the thought process of a human developer.
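Both outputs follow from how C++ assigns values to unscoped enumerators: the first enumerator defaults to 0, an explicit initializer sets the value directly, and each later enumerator without an initializer is one more than the previous one. The snippet below is a small sketch that checks this at compile time with static_assert; the enumerator names are copied from the test program.

```cpp
// Same anonymous enum as in the test program.
enum { blue, green = 5, GREAT };

// The first enumerator has no initializer, so it starts at 0.
static_assert(blue == 0, "blue defaults to 0");
// An explicit initializer sets the value directly.
static_assert(green == 5, "green is set explicitly");
// No initializer: the value is the previous enumerator plus one.
static_assert(GREAT == 6, "GREAT follows green");
```

This is why printing blue and GREAT gives 0 6, while printing all three enumerators gives 0 5 6.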

From the early testing done by the tech community, many have concluded that although Phind is a great free tool that makes it easier for developers to debug their code, the answers provided by GPT-4 are generally better.

While ChatGPT may still hold an edge in certain aspects, it’s important to consider a few key distinctions:

  1. Phind is freely accessible to everyone, in contrast to ChatGPT-4, which requires a subscription of $20 per month.

  2. Phind boasts a significantly larger context window of 16,000 tokens—12,000 for input and 4,000 for web results—compared to ChatGPT-4’s 8,000 tokens. A larger context window means the model can retain more information about the conversation. However, it’s worth noting that the next version of ChatGPT is rumored to support up to 32,000 tokens. Meanwhile, Phind has also announced plans to introduce a model with a 100,000-token capacity.

  3. In terms of speed, Phind operates five times faster than GPT-4, matching the speeds of ChatGPT 3.5.
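The context-window split in point 2 is simple token arithmetic. As a rough sketch, assuming the 12,000/4,000 split quoted above and a token count supplied by the caller (the struct and function names are hypothetical, and real token counts depend on the model's tokenizer):

```cpp
// Figures quoted in the article; all names here are illustrative only.
struct ContextBudget {
    int input_tokens;       // tokens reserved for the user's input
    int web_result_tokens;  // tokens reserved for web results
};

constexpr ContextBudget kPhindBudget{12000, 4000};

// The total window is the sum of both reservations.
constexpr int total_window(ContextBudget b) {
    return b.input_tokens + b.web_result_tokens;
}

// True if a prompt of `prompt_tokens` fits in the input reservation.
constexpr bool fits_input(ContextBudget b, int prompt_tokens) {
    return prompt_tokens >= 0 && prompt_tokens <= b.input_tokens;
}

static_assert(total_window(kPhindBudget) == 16000, "12,000 + 4,000 = 16,000");
```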

Phind Technical Details

The Phind models, including the latest Phind-CodeLlama-34B-v2, were fine-tuned on a proprietary dataset of approximately 80,000 high-quality programming problems and solutions. These were presented in an instruction-answer format, which is distinct from conventional code completion datasets. The fine-tuning leveraged state-of-the-art technologies such as DeepSpeed ZeRO 3 and Flash Attention 2, enabling efficient training. While initial models were trained rapidly, the Phind-CodeLlama-34B-v2 required a more extensive training period, utilizing 32 A100-80GB GPUs over 15 hours, totaling 480 GPU-hours, to accommodate its larger sequence length of 4096 tokens and the additional 1.5 billion tokens it was trained on.
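The GPU-hour figure is the number of GPUs multiplied by the wall-clock training time. A minimal sketch of that calculation, using the numbers quoted above (the function name is an illustrative assumption):

```cpp
// GPU-hours = number of GPUs x wall-clock hours they all ran.
constexpr int gpu_hours(int gpu_count, int wall_clock_hours) {
    return gpu_count * wall_clock_hours;
}

// 32 A100-80GB GPUs for 15 hours, as stated in the article.
static_assert(gpu_hours(32, 15) == 480, "32 GPUs x 15 h = 480 GPU-hours");
```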

This extensive training on over 70 billion tokens has resulted in Phind-CodeLlama-34B-v2 achieving a HumanEval score of 74.7%, reflecting its advanced code generation capabilities. Furthermore, Phind has optimized its performance, achieving a fivefold speed increase over GPT-4. This is achieved by running the model on NVIDIA’s H100 GPUs and utilizing the TensorRT-LLM library, which allows for processing speeds of up to 100 tokens per second in a single stream.
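To put the throughput claim in perspective, generation time for a response is roughly the number of output tokens divided by tokens per second. The sketch below assumes the quoted peak of 100 tokens per second and, purely for comparison, a model running at one fifth of that rate. The 500-token response length is a made-up example, and real latency also depends on prompt processing and server load.

```cpp
// Seconds needed to emit `tokens` at a steady `tokens_per_second`.
// A non-positive rate is treated as invalid and reported as 0.0.
constexpr double generation_seconds(int tokens, double tokens_per_second) {
    return tokens_per_second > 0.0 ? tokens / tokens_per_second : 0.0;
}

constexpr double kQuotedPeakRate = 100.0;                   // tokens/s, from the article
constexpr double kFiveTimesSlower = kQuotedPeakRate / 5.0;  // assumption derived from the 5x claim

// Hypothetical 500-token answer.
static_assert(generation_seconds(500, kQuotedPeakRate) == 5.0, "500 / 100 = 5 s");
static_assert(generation_seconds(500, kFiveTimesSlower) == 25.0, "500 / 20 = 25 s");
```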

Phind-CodeLlama-34B-v2 can be downloaded via HuggingFace, but keep in mind that this version is not the same as the one on the official site: the more advanced version, called Phind version 7, is not open-sourced and is available only via the official site. The team did mention that they will eventually release version 7 once they have an even more capable model.

What’s next?

Phind currently tops the leaderboard of code generation models on HuggingFace, a category that is already evolving fast. Given the pace of innovation we’re witnessing, the horizon looks promising for even more remarkable breakthroughs in machine learning models.

The advent of coding capabilities in Large Language Models (LLMs) has been a game-changer, offering significant assistance to both novice programmers and seasoned developers alike. With the emergence of models like Phind that are not only free but also highly effective, there’s never been a better time to explore the potential of AI-assisted coding.
