GitHub Previews Copilot, an OpenAI-Powered Coding Assistant
JUL 07, 2021
By Renato Losio, cloud architect, remote work enthusiast and speaker.
GitHub recently announced Copilot, an AI-powered pair programmer designed to help developers write code faster and with less effort. The service draws on the context of comments and existing code to suggest new lines or the implementation of whole functions.
A GitHub Copilot implementation of a sortByKey function in Python.
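As a rough illustration of this workflow, the sketch below shows the kind of input GitHub describes. A descriptive comment and a signature with meaningful parameter names act as the prompt, and the one-line body is the sort of implementation a completion tool would be expected to fill in. This is not Copilot's actual output. The snake_case name `sort_by_key`, its parameters, and the assumption that sortByKey sorts a list of dictionaries by a named field are all hypothetical.

```python
from typing import Any


# Prompt written by the developer: a descriptive comment plus a signature.
# Hypothetical example: sort a list of dictionaries by the value under `key`.
def sort_by_key(records: list[dict[str, Any]], key: str, reverse: bool = False) -> list[dict[str, Any]]:
    # Body of the kind an assistant might suggest: sorted() returns a new list,
    # leaving `records` unchanged; a record missing `key` raises KeyError.
    return sorted(records, key=lambda record: record[key], reverse=reverse)


people = [
    {"name": "Ada", "age": 36},
    {"name": "Alan", "age": 41},
    {"name": "Grace", "age": 29},
]
print([p["name"] for p in sort_by_key(people, "age")])  # ['Grace', 'Ada', 'Alan']
```

Even for a function this small, the developer still has to judge details the prompt leaves open, such as whether records lacking the field should raise an error or be skipped.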
The GitHub Copilot extension for Visual Studio Code sends the comments and code typed by the developer to the GitHub Copilot service, which synthesizes and suggests an implementation. According to GitHub, the service works best on small functions with meaningful parameter names, as in the sortByKey example above:
We recently benchmarked against a set of Python functions that have good test coverage in open source repos. We blanked out the function bodies and asked GitHub Copilot to fill them in. The model got this right 43% of the time on the first try, and 57% of the time when allowed 10 attempts. And it is getting smarter all the time.
A GitHub Copilot implementation of a returnRandomElement function in Python.
In the article “Bugs Faster than the Speed of Thought”, Maxim Khailo comments:
Copilot is not magic and will perform worse than a human coder on average. If it is trained on the gigantic, 100 million project corpus of Github projects, it will most certainly have more than 50 bugs per 1000 lines of code. (…) With Copilot, bugs will be transmitted faster than the speed of thought.
While James Governor, RedMonk analyst and co-founder, thinks that “GitHub Copilot is the new StackOverflow”, Graham Lea, co-founder at Archium, believes there is a difference between the two services:
StackOverflow is like on-demand ensemble programming with 2-30 other people who have had a similar problem to you but in different contexts. GitHub Copilot looks more like pairing with someone who takes over your keyboard and smashes out code without explaining what it does or how.
GitHub states that “Copilot is a tool, like a compiler or a pen”, but many experts are concerned about its licensing implications and think that open source projects and companies might ban AI-powered coding tools. Simon de la Rouviere tweets:
Another interesting question about prior rights & derivative works vs generative creations. Should the machine learning output take into account the original work’s rights going into the model? Where’s the line? (…) Will big corporations ban GitHub Copilot for fear of it churning out code that already exists?
This has not been thought through. Not all repositories are public. Not all public repositories are open source licensed. Not all open source repositories are permissively licensed. And most permissively licensed repositories aren’t public domain but have notice requirements.
Copilot has been trained on English-language text and source code from publicly available sources, and it does not reference private code. GitHub acknowledges the challenge:
Training machine learning models on publicly available data is considered fair use across the machine learning community (…) But this is a new space, and we are keen to engage in a discussion with developers on these topics and lead the industry in setting appropriate standards for training AI models.
The service is in technical preview with a commercial version expected in the future. It is currently possible to join the GitHub Copilot waitlist.