Rethinking Static Code Analysis

Over the past few months, we have talked to customers about static code analysis and how we could improve our product. Our conversations led us to identify the current issues in the static code analysis space and propose a solution for them.

In this post, I describe the problems reported by our current user base and, at the end, the solution we are working on to address them.

Problem #1: Static Code Analysis Tools are not extensible

Current static analysis tools are not extensible: they expose a set of predefined rules that the end-user cannot modify. Developers can filter rules or ignore them, but they cannot extend the list of rules raised by their static analyzer.

There are a few exceptions, such as:

ESLint lets you write custom rules and distribute them as npm packages. Still, you need to publish the package on npm and set up a CI/CD pipeline to update and republish your rules. Writing and publishing a custom rule takes between one hour and one day.
SonarQube can be extended with custom rules, but it takes weeks to learn the SonarQube API, and publishing new rules is not easy. Writing a custom rule takes anywhere between one day and three weeks.

Lesson learned: Developers want a static code analysis tool that is easy and quick to extend. Writing, testing, and publishing a new rule should take no more than 5 to 10 minutes.
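To make the "minutes, not days" goal concrete, here is a minimal sketch of what a lightweight rule could look like. It is not Rosie's API: the `rule` decorator, the `Violation` class, and the `analyze` function are hypothetical names invented for this illustration, built only on Python's standard `ast` module.

```python
import ast
from dataclasses import dataclass
from typing import Callable, Iterator, List


@dataclass
class Violation:
    rule: str
    line: int
    message: str


# A rule is a plain function: it receives the parsed tree and yields violations.
Rule = Callable[[ast.AST], Iterator[Violation]]
RULES: List[Rule] = []


def rule(func: Rule) -> Rule:
    """Register a user-defined rule (hypothetical decorator, for illustration)."""
    RULES.append(func)
    return func


@rule
def no_bare_except(tree: ast.AST) -> Iterator[Violation]:
    """Flag `except:` clauses that catch every exception."""
    for node in ast.walk(tree):
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            yield Violation("no-bare-except", node.lineno, "avoid bare 'except:'")


def analyze(source: str) -> List[Violation]:
    """Parse the source once and run every registered rule on it."""
    tree = ast.parse(source)
    return [v for r in RULES for v in r(tree)]


SAMPLE = """\
try:
    risky()
except:
    pass
"""

if __name__ == "__main__":
    for v in analyze(SAMPLE):
        print(f"line {v.line}: [{v.rule}] {v.message}")
```

In this sketch, adding a rule means writing one short function and decorating it. There is no package to publish and no pipeline to maintain, which is the kind of workflow the feedback above points to.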

Problem #2: Static Code Analysis Tools show the problem, not the solution

Many static code analysis tools flag errors in code but do not provide a solution. This is especially critical for junior developers who may not understand why an issue is raised or how to fix it. When a solution (e.g., a code fix) is associated with each error, developers not only save time fixing their code but also learn good coding practices.

Still, providing solutions is not straightforward. First, there is not always a “one size fits all” solution: a single problem may have several valid fixes. Second, the right fix may depend on the context of the code. To overcome this issue, the static code analyzer must provide multiple fixes and let the developer choose.

Lesson learned: Developers want static code analysis tools that not only flag errors but also suggest fixes for them.
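As an illustration of attaching several fixes to one problem, the sketch below flags comparisons written as `x == None` and proposes two alternative replacements. The `Fix` and `Finding` classes and the `check_none_comparison` function are hypothetical names made up for this example, not part of any existing tool. It relies on `ast.unparse`, which requires Python 3.9 or later.

```python
import ast
from dataclasses import dataclass, field
from typing import List


@dataclass
class Fix:
    description: str
    replacement: str  # expression that would replace the flagged comparison


@dataclass
class Finding:
    line: int
    message: str
    fixes: List[Fix] = field(default_factory=list)


def check_none_comparison(source: str) -> List[Finding]:
    """Flag `x == None` and offer more than one way to fix it."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Compare):
            continue
        # Only handle the simple form: one `==` operator, one right-hand side.
        if len(node.ops) != 1 or not isinstance(node.ops[0], ast.Eq):
            continue
        right = node.comparators[0]
        if isinstance(right, ast.Constant) and right.value is None:
            left = ast.unparse(node.left)
            findings.append(
                Finding(
                    line=node.lineno,
                    message="comparison to None with '=='",
                    fixes=[
                        Fix("use an identity check", f"{left} is None"),
                        Fix(
                            "rely on truthiness (only if no other falsy value is expected)",
                            f"not {left}",
                        ),
                    ],
                )
            )
    return findings
```

Each fix carries its own description, so an editor could list both options and explain when the second one is appropriate instead of silently picking one.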

Problem #3: Static Code Analysis Tools report too many false positives

Existing static code analysis tools report too many false positives (i.e., reported errors that are not actually errors). Too many false positives erode trust in such tools, and developers stop using them. One example is a static analyzer for C that reports an error for every use of strcpy: depending on the context, the call might be perfectly correct. Flagging every use of a function, even when the call is correct, creates too much noise for the end user.

Lesson learned: Static code analysis tools must have a very low (<1%) false-positive rate.
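The strcpy example is specific to C, but the same contrast can be sketched in Python with a loosely analogous case: calls to `eval`. The two functions below are hypothetical and written only for this illustration. The first flags every call, and the second looks at the context and skips calls whose only argument is a string literal. Whether a literal argument should really count as safe is an assumption made for the example, not a general security recommendation.

```python
import ast
from typing import List


def _is_eval_call(node: ast.AST) -> bool:
    """True for a direct call written as eval(...)."""
    return (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "eval"
    )


def flag_every_eval(source: str) -> List[int]:
    """Naive rule: report the line of every call to eval()."""
    return sorted(
        node.lineno for node in ast.walk(ast.parse(source)) if _is_eval_call(node)
    )


def flag_risky_eval(source: str) -> List[int]:
    """Context-aware rule: skip calls whose only argument is a string literal."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if not _is_eval_call(node):
            continue
        is_literal = (
            len(node.args) == 1
            and not node.keywords
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        )
        if not is_literal:
            lines.append(node.lineno)
    return sorted(lines)


SAMPLE = """\
limit = eval("2 ** 10")
value = eval(user_input)
"""

if __name__ == "__main__":
    print("naive rule flags lines:", flag_every_eval(SAMPLE))
    print("context-aware rule flags lines:", flag_risky_eval(SAMPLE))
```

On this sample, the naive rule reports both lines while the context-aware rule reports only the second one. Halving the noise on a two-line file is trivial, but across a large codebase this difference decides whether developers keep the rule enabled.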

Problem #4: Static Code Analysis Tools are used too late in the development process

Static code analysis tools are often run after developers write code, either during the pull request or, even worse, after developers push the code into production. Existing code analysis platforms integrate with Git platforms (such as GitHub, GitLab, or Bitbucket) but do not report issues where the developer actually writes code. Some tools, such as ESLint, are integrated into the developer's editor, but this is the exception rather than the norm.

Lesson learned: Static code analysis tools must integrate into the developer environment (e.g., the IDE). Errors and their associated fixes must be shown as developers write code.

Rosie, a new Static Code Analysis Tool

We are now actively working on Rosie, a new static code analysis tool that aims to address all the issues mentioned above. We started writing Rosie from scratch and plan to publish it under an open-source license.

Rosie is designed with the following principles in mind:

extensible: any developer can extend the analyzer with user-defined rules.
provides fixes: any problem reported by Rosie is associated with one or several fixes.
accurate: reports no false positives.
real-time: static code analysis runs in less than 500 ms, allowing users to integrate Rosie into their IDE.

We started working on Rosie over the last few months and will be releasing the first beta version in the next two months.

Conclusion

The static code analysis space has plenty of players but few differentiators. Developers want real-time feedback and the ability to customize code analysis rules. By empowering developers, we believe we will make static code analysis more popular and raise overall code quality.

Last, we thank all our users for sharing their insights with us. With more than 20,000 users on our platform, we gathered diverse perspectives that helped us refine our product and deliver a service closer to developers' needs. Thank you, existing and future Codiga users!