Since the release of GitHub Copilot, there has been a lot of discussion about Machine Learning-based Coding Assistants and whether they could replace software engineers. In this article, we explain the main issues with such assistants and how we envision the future of Coding Assistants for enterprises.
The rise of Machine Learning for code
Recently, we have seen significant progress in technologies that help developers write code. Since the release of Codex, many products and research initiatives have appeared (for example, PaLM at Google and CodeT5 at Salesforce). We have all seen videos of developers writing code in seconds using GitHub Copilot, and the PR campaign was so effective that people started to ask whether developers would soon be replaced (spoiler alert: no).
There is no question that Machine Learning can help write code. In fact, it has been used for years, without fanfare or heavy PR campaigns, in popular products such as JetBrains IDEs. But a Machine Learning system will not be able to replace developers anytime soon.
Developing software is a very hard task. It requires understanding requirements, translating them into code, and deploying that code to production. Many requirements are undefined or implicit, and machines cannot deal with undefined specifications. Writing code is much harder than many tasks that Machine Learning has not yet mastered. For example, writing code is much harder than driving, and to this day no Machine Learning system can fully drive a car, despite massive investment in that domain.
Do ML-based Coding Assistants work for the Enterprise?
There are multiple reasons why such ML-based approaches do not work in the enterprise. Some are transient and can be solved with further improvements, while others are larger issues that need to be addressed.
First, these approaches are error-prone and generate bugs or vulnerabilities. An initial review of GitHub Copilot reports that it generates vulnerable code 40% of the time. Even recent technologies such as Google's PaLM still generate vulnerabilities when fixing code.
Second, such technologies are trained on public source code and do not adapt to the specific code patterns used in enterprises. Companies have existing code patterns that they reuse across their codebase, and patterns scraped from open-source repositories are not what their developers use.
Last, there is the question of copyright for the code generated by such systems. Because ML-based Coding Assistants learn from public code that is often released under open-source licenses, the code they suggest may not be safe to include in your codebase and may raise issues during an audit. This is an important legal blocker for the adoption of Coding Assistants in the enterprise world, and one that may not be solved solely by improving product performance.
Our bet at Codiga
At Codiga, we have been working on a Coding Assistant that solves a core problem: finding and importing reusable code patterns in your environment. Our product helps you search for and import reusable code blocks (Smart Code Snippets) based on your environment, exactly like an AI-based Coding Assistant, except that:
- All suggestions from our Coding Assistant are safe to use. Suggestions from public code can be used in commercial projects, and suggestions coming from private code are only shown to users who have access to that code.
- Our system adapts to the specific code patterns used in enterprises. If developers within a company use a specific code pattern, they can add it to Codiga, restrict it to their team, and it will be recommended only to developers working at that company.
- Codiga does not complete all your code. It imports a template (called a Smart Code Snippet) that the user needs to complete, avoiding the potential vulnerabilities introduced by a Machine Learning-based approach.
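To make the access rules and the "template, not finished code" idea more concrete, here is a minimal Python sketch of team-restricted snippet lookup and template completion. It is not Codiga's implementation or API: the `Snippet` class, the `suggest`, `placeholders` and `complete` helpers, the `${...}` placeholder syntax and the example snippets are all hypothetical, chosen only for illustration.

```python
import re
from dataclasses import dataclass
from string import Template
from typing import Iterable, List, Optional, Set

# Hypothetical data model for illustration only: not Codiga's actual format or API.
@dataclass(frozen=True)
class Snippet:
    name: str
    language: str
    template: str               # code with ${placeholder} slots the developer fills in
    team: Optional[str] = None  # None = public snippet; otherwise visible to one team only

# Matches ${name} slots inside a template.
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")

def suggest(snippets: Iterable[Snippet], language: str, user_teams: Set[str]) -> List[Snippet]:
    """Return snippets for the user's language that the user is allowed to see."""
    return [
        s for s in snippets
        if s.language == language and (s.team is None or s.team in user_teams)
    ]

def placeholders(snippet: Snippet) -> List[str]:
    """List (sorted, de-duplicated) the slots the developer still has to complete."""
    return sorted(set(PLACEHOLDER.findall(snippet.template)))

def complete(snippet: Snippet, **values: str) -> str:
    """Fill every slot; Template.substitute raises KeyError if a value is missing."""
    return Template(snippet.template).substitute(**values)

# Hypothetical catalog: one public Python snippet, one team-private one, one TypeScript one.
SNIPPETS = [
    Snippet("http-get", "python", "resp = requests.get(${url}, timeout=${timeout})"),
    Snippet("internal-logger", "python", "log = acme_logging.get(${service})", team="payments"),
    Snippet("fetch", "typescript", "const res = await fetch(${url});"),
]

if __name__ == "__main__":
    for s in suggest(SNIPPETS, "python", {"payments"}):
        print(s.name, placeholders(s))
```

In this sketch, a developer on the hypothetical `payments` team sees both Python snippets, while everyone else sees only the public one. In both cases the tool hands back a template rather than finished code, so the developer supplies the values that matter.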
We would love to hear your thoughts about our Coding Assistant. If you manage or lead a team of engineers and want to try it, we would love to hear your feedback.