WHAT IS STATIC ANALYSIS

Exakat is a static analysis engine for PHP. It audits PHP code without executing it: it reads the code, assesses the situation, and reports rule violations and indicators for further diagnosis and fixing.

Those three phases run one after the other and work directly on the source code, without any configuration.

Read the code

Turn PHP code into millions of semantic terms

 

<?php
$x = source();
if ($x < 10) {
  $y = $x + 1;
  $x = corrige($y);
} else {
  $y = $x;
}

Discovery is the phase where Exakat deconstructs the code and prepares it for analysis. Initially, the code is a vast collection of text files. What developers consider code is actually nothing more than text files, and a lot of them. Exakat first needs to extract meaning from those files.

Exakat uses PHP's tokenizer: this is the part of the Zend Engine responsible for turning the above text files into tokens. Tokens are like atoms for PHP: the engine combines them to execute the code as intended. They are also the smallest units of meaningful text: the tokenizer knows the difference between the string ‘die’ and the function name die.
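The tokenizer is available to PHP scripts through the bundled tokenizer extension, so the distinction between the string and the keyword can be observed directly. The following is a minimal sketch, not Exakat's own code; the helper name tokenNames() is an assumption made for illustration.

```php
<?php
// Illustrative helper (hypothetical name): list the tokens of a PHP snippet.
// Relies on ext/tokenizer, which ships with PHP and is enabled by default.
function tokenNames(string $code): array
{
    $names = [];
    foreach (token_get_all($code) as $token) {
        if (!is_array($token)) {
            // Single-character tokens such as ';' are returned as plain strings.
            $names[] = $token;
            continue;
        }
        [$id, $text] = $token;
        if ($id === T_WHITESPACE) {
            continue; // whitespace carries no meaning for this listing
        }
        $names[] = token_name($id) . ' ' . trim($text);
    }
    return $names;
}

$names = tokenNames("<?php echo 'die'; die;");
print_r($names);
```

In the listing, the quoted 'die' is reported as T_CONSTANT_ENCAPSED_STRING, while the bare die is reported as T_EXIT.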

Create a syntax tree

 

Then, the tokens are organized in relation to one another. One token represents addition (T_PLUS), and it requires at least two other tokens to execute correctly. Besides, T_PLUS is also used to specify the sign of a number (+ 1), so this token may also appear with a single operand. On top of that, some more remote connections must be established in the code, like the link between a class and its instantiations: this depends on the namespace and on use expressions.
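The two roles of the plus sign can be told apart by looking at the token that precedes it. The sketch below is a simplified heuristic written for illustration, not Exakat's actual algorithm; classifyPlus() is a hypothetical name. Note that PHP's token_get_all() returns the operator as the plain string '+' rather than as a named token.

```php
<?php
// Hypothetical helper: decide, for each '+' in an expression, whether it is
// a binary addition or a unary sign, based on the previous meaningful token.
function classifyPlus(string $expression): array
{
    // Tokens after which a '+' can only be a binary operator (simplified list).
    $operandEnds = [
        T_VARIABLE, T_LNUMBER, T_DNUMBER,
        T_CONSTANT_ENCAPSED_STRING, T_STRING, ')', ']',
    ];

    $roles    = [];
    $previous = null;
    foreach (token_get_all('<?php ' . $expression . ';') as $token) {
        $id = is_array($token) ? $token[0] : $token;
        if ($id === T_WHITESPACE || $id === T_OPEN_TAG) {
            continue;
        }
        if ($id === '+') {
            $roles[] = in_array($previous, $operandEnds, true) ? 'addition' : 'sign';
        }
        $previous = $id;
    }
    return $roles;
}

print_r(classifyPlus('$x + 1'));   // one addition
print_r(classifyPlus('$y = + 1')); // one sign
```

Pass the expression in single quotes so that PHP does not interpolate the variables before the helper sees them.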

After that, the tokens are loaded into a graph database. It provides a wide range of tools to search for specific tokens and navigate the various links they have with each other. Such a syntax network fits source code representation extremely well.
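To give an idea of what searching such a structure looks like, here is a small in-memory sketch for the statement $y = $x + 1. It uses nested PHP arrays instead of a graph database, and the node shape (type, code, children) and the findByType() helper are assumptions made for illustration, not Exakat's storage model.

```php
<?php
// Hypothetical node shape: ['type' => string, 'code' => string, 'children' => array].
$tree = [
    'type'     => 'Assignment',
    'code'     => '$y = $x + 1',
    'children' => [
        ['type' => 'Variable', 'code' => '$y', 'children' => []],
        [
            'type'     => 'Addition',
            'code'     => '$x + 1',
            'children' => [
                ['type' => 'Variable', 'code' => '$x', 'children' => []],
                ['type' => 'Integer',  'code' => '1',  'children' => []],
            ],
        ],
    ],
];

// Walk the tree depth-first and collect the code of every node of a given type.
function findByType(array $node, string $type): array
{
    $found = $node['type'] === $type ? [$node['code']] : [];
    foreach ($node['children'] as $child) {
        $found = array_merge($found, findByType($child, $type));
    }
    return $found;
}

print_r(findByType($tree, 'Variable'));
```

A graph database answers the same kind of question with a query instead of a hand-written traversal, and it can also follow links that are not parent-child relations.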

Assess the situation

Artificial intelligence at work

 

This is the secret sauce of Exakat. The analysis uses many different patterns to search for specific situations. It relies on databases to identify extensions, PHP classes or PHP directives: for example, ftp_connect() is a native PHP function that denotes the usage of the ext/ftp extension. Identifying extensions is useful for other analyses, and it builds an ‘inventory’: a type of analysis that creates automated documentation.
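The extension lookup can be sketched with a small table that maps function names to extensions. This is an illustration under stated assumptions: detectExtensions() and the three-entry map are hypothetical, and the sketch treats any name followed by an opening parenthesis as a function call, which a real analysis would refine (method calls and function definitions match too).

```php
<?php
// Hypothetical, deliberately tiny lookup table: function name => extension.
$functionToExtension = [
    'ftp_connect' => 'ext/ftp',
    'curl_init'   => 'ext/curl',
    'json_encode' => 'ext/json',
];

// Report the extensions whose functions appear to be called in $code.
function detectExtensions(string $code, array $functionToExtension): array
{
    // Drop whitespace so that "name (" and "name(" look the same.
    $tokens = array_values(array_filter(
        token_get_all($code),
        function ($token) {
            return !(is_array($token) && $token[0] === T_WHITESPACE);
        }
    ));

    $extensions = [];
    foreach ($tokens as $i => $token) {
        $isName = is_array($token) && $token[0] === T_STRING;
        $isCall = ($tokens[$i + 1] ?? null) === '(';
        if ($isName && $isCall) {
            $name = strtolower($token[1]); // function names are case-insensitive
            if (isset($functionToExtension[$name])) {
                $extensions[$functionToExtension[$name]] = true;
            }
        }
    }
    return array_keys($extensions);
}

$code = '<?php $c = ftp_connect("example.org"); echo strlen("x");';
print_r(detectExtensions($code, $functionToExtension));
```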

At the end of the analysis, all the results are extracted into a smaller database for the reporting phase. This database acts as a compact cache system. The graph database is not needed anymore, as the reports can all be built without it.

Report

Serve yourself : from Dashboard to API

The report phase is the moment where you and I actually see the results. Obviously, we are nervous about them. A report means something different to each of us, so there are many different reports.

Ambassador reporting

The easiest path for human processing is the Ambassador report. This is the largest report, in interactive HTML format. It provides all the available analyses and reports in one place.
In this format, you can find:
+ Issues for several recipes: clean code, security, performance.
+ Migration recommendations and compilation problems, from PHP 5.3 to PHP 7.3 (yes, the dev version).
+ Appinfo(), which lists PHP features in use in the code, like magic methods or recursive functions.
+ PHP directive list, which provides a recommended list of directives to review while setting up the production server, based on the features used in the code.
+ Inventories of extensions, global variables and exceptions. This gives a great overview of the structures defined in the code, and often prompts extra investigation.
+ Recommendations for property visibilities.

Open Data

The fastest path is the text version. These are the raw results, with the reason for the diagnostic, a file name and a line number, one per line. It is possible to get results analysis by analysis or, en masse, for a recipe: analyses are grouped by aspect, for easier access.
Those results are suitable for other machine-readable formats, like JSON, XML or CSV. You can then reuse those results for your own KPIs and indicators.
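As an example of building an indicator from such results, the sketch below counts issues per file. It assumes the results have already been decoded into PHP arrays; the keys analysis, file and line, the sample values and the issuesPerFile() helper are hypothetical and do not describe Exakat's actual export format.

```php
<?php
// Hypothetical decoded results: one entry per reported issue.
$results = [
    ['analysis' => 'Unused variable', 'file' => '/src/a.php', 'line' => 12],
    ['analysis' => 'Unused variable', 'file' => '/src/a.php', 'line' => 40],
    ['analysis' => 'Empty function',  'file' => '/src/b.php', 'line' => 7],
];

// Count the issues reported for each file, most affected file first.
function issuesPerFile(array $results): array
{
    $counts = [];
    foreach ($results as $result) {
        $file = $result['file'];
        $counts[$file] = ($counts[$file] ?? 0) + 1;
    }
    arsort($counts); // sort by count, descending, keeping the file names as keys
    return $counts;
}

echo json_encode(issuesPerFile($results), JSON_PRETTY_PRINT), PHP_EOL;
```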

And more

Exakat is able to auto-document your code. The UML report is the UML schema of all the classes, traits and interfaces, in their namespaces, with the links between them. This report is actually a visual graph.
Inventories extract every piece of information from the code and sort them by type. There is the whole list of classes, interfaces and properties. There are also all the dynamic calls of functions or variables, which are very useful when refactoring. There are also the emails, URLs, internet ports and IP addresses that were hardcoded in the files: they are important both for review and for update.
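An inventory of hardcoded values can be approximated by collecting string literals and testing each one with PHP's filter_var(). This is a minimal sketch, not the way Exakat builds its inventories; inventoryLiterals() is a hypothetical helper, and it only looks at plain string literals, for emails and IP addresses.

```php
<?php
// Hypothetical helper: collect hardcoded emails and IP addresses from
// the plain string literals of a PHP snippet.
function inventoryLiterals(string $code): array
{
    $inventory = ['email' => [], 'ip' => []];
    foreach (token_get_all($code) as $token) {
        if (!is_array($token) || $token[0] !== T_CONSTANT_ENCAPSED_STRING) {
            continue; // keep only quoted strings without interpolation
        }
        $value = substr($token[1], 1, -1); // strip the surrounding quotes
        if (filter_var($value, FILTER_VALIDATE_EMAIL) !== false) {
            $inventory['email'][] = $value;
        } elseif (filter_var($value, FILTER_VALIDATE_IP) !== false) {
            $inventory['ip'][] = $value;
        }
    }
    return $inventory;
}

$inventory = inventoryLiterals(
    '<?php $to = "admin@example.com"; $host = "192.0.2.10"; $name = "exakat";'
);
print_r($inventory);
```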