OpenAI and Microsoft have come together to launch the technical preview of GitHub Copilot, an AI-based tool that helps programmers write better code. Copilot draws context from the code being worked on and suggests whole lines and entire functions.
Copilot is based on OpenAI Codex, an AI system trained on a dataset made up of a large volume of public source code. The technical preview works especially well with languages like JavaScript, Python, TypeScript, Ruby, and Go.
GitHub Copilot, the AI pair programmer, works with a broad set of frameworks and libraries. The programmer describes a function in a plain-English comment, and Copilot translates it into concrete code, as in the sketch below. Copilot helps the programmer quickly explore alternative ways of solving a problem, write tests, and discover new APIs. Compared with existing code-assistance tools, Copilot is considerably more advanced.
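A small illustration of that comment-to-code flow. The programmer writes only the comment and the function signature; the body is the kind of completion Copilot might offer (the function name and regex here are illustrative assumptions, not captured Copilot output):

```python
import re

# find all email addresses in a string and return them as a list
def find_emails(text):
    # a simple regex-based extraction, typical of a suggested completion
    return re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", text)

print(find_emails("contact alice@example.com or bob@test.org"))
# ['alice@example.com', 'bob@test.org']
```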
Copilot works best when the programmer uses meaningful names for functions and parameters, writes good comments, and divides the code into small functions.
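To make that concrete, here is a sketch contrasting weak and strong context; both snippets are invented for illustration rather than taken from GitHub's documentation:

```python
# Weak context: a vague name and no comment give the model little to go on.
def process(d):
    ...

# Strong context: a descriptive name, typed parameters, and a clear docstring
# narrow the space of plausible completions considerably.
def average_order_value(orders: list[float]) -> float:
    """Return the mean value of the given order totals."""
    return sum(orders) / len(orders) if orders else 0.0
```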
Behind Copilot is Codex, a deep learning model: a special version of GPT-3 fine-tuned for programming tasks. Codex works much like GPT-3: it takes a context as input and produces a sequence of tokens as output. Here, the context is the source code file the programmer is working on, and the output is the code suggestion the programmer receives.
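That context-in, completion-out loop can be sketched with OpenAI's legacy completions API. The Codex models and this endpoint have since been retired, and the engine name and parameters below are assumptions for illustration, not Copilot's actual internals:

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

# The "context": the code preceding the cursor in the file being edited.
context = '''
def fibonacci(n):
    """Return the n-th Fibonacci number."""
'''

response = openai.Completion.create(
    engine="davinci-codex",  # assumed Codex-family engine name from the beta
    prompt=context,          # context in ...
    max_tokens=64,           # length budget for the suggested continuation
    temperature=0,           # deterministic, most-likely completion
)

print(response.choices[0].text)  # ... suggested code out
```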
When OpenAI first trained GPT-3, the company had no intention of teaching it to help write code; it was built as a general-purpose language model for tasks such as producing articles, translating from one language to another, and fixing incorrect grammar. Accordingly, the first applications of GPT-3 were broad, general-purpose language tasks, and it was only later that this general-purpose model paved the way for code generation.
Language models perform well when they are given accurate context and their application is narrowed down to one or a few tasks. Codex was trained on a mix of English text and source code from publicly available sources, including code in public repositories on GitHub. Given the right context, or prompt, it produces a chunk of code similar to what other programmers have written to solve the same kind of problem. Providing more detailed comments and definitions yields more reasonable output from Copilot.
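For instance, a detailed comment narrows what counts as a plausible completion. The prompt and completion below are hypothetical examples of this effect, not GitHub's actual prompt format:

```python
# What the model sees: a comment spelling out input format, return type,
# and error behavior, followed by the bare signature.
prompt = '''
# Parse an ISO 8601 date string (e.g. "2021-06-29") and return a
# (year, month, day) tuple of ints. Raise ValueError on malformed input.
def parse_iso_date(date_string):
'''

# Given this much detail, a completion close to the following is likely:
def parse_iso_date(date_string):
    year, month, day = date_string.split("-")
    if len(year) != 4 or len(month) != 2 or len(day) != 2:
        raise ValueError(f"malformed ISO date: {date_string!r}")
    return int(year), int(month), int(day)

print(parse_iso_date("2021-06-29"))  # (2021, 6, 29)
```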
GitHub Copilot tries to understand the programmer's intent and to produce the best code it can, but the code it suggests may not always work. Copilot does not test the code it suggests, and a suggestion may fail to compile or run. Because Copilot holds only a limited context, a single source file longer than 100 lines is trimmed.
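Since nothing guarantees a suggestion is correct, it makes sense to treat suggested code like any other untested code. A minimal sketch, reusing the hypothetical find_emails function from the earlier example:

```python
import re
import unittest

def find_emails(text):
    # suggested implementation under test
    return re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", text)

class TestSuggestedCode(unittest.TestCase):
    def test_finds_addresses(self):
        self.assertEqual(find_emails("mail bob@test.org"), ["bob@test.org"])

    def test_empty_input(self):
        self.assertEqual(find_emails(""), [])

if __name__ == "__main__":
    unittest.main()
```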
GitHub Copilot is a code generator, not a search engine: the great majority of the code it suggests is uniquely generated and has not appeared before. Occasionally, however, a suggestion may contain verbatim snippets from the training set, which tends to happen when the developer has not provided adequate context.