Exakat 1.1.1 review

With the new year 2018, we thought it was time to dive into a significant upgrade, and so we did. We changed the internal storage of tokens from strings to a dictionary: that means less memory consumption, fewer token manipulations and more speed. It also shifts complexity from one part of the application to another, so we're going to share the details with everyone in this blog.

Of course, that meant two very busy weeks, changing core features of the exakat engine, and making sure everything is running smoothly. I'm glad to report that all is stable. At the same time, we fixed some bugs, and added new analyses. Let's see that Exakat 1.1.1 review.

Using a dictionary for internal storage

Exakat relies on PHP and its tokenizer to extract the tokens from a source file. This is the most reliable way to read a file semantically. Then, the tokens are organized in the AST, the Abstract Syntax Tree. This tree ignores all whitespace and comments; it also removes delimiters, like ", ' or {, }, but preserves the structure they express in the tree branches.

Until now, the AST would get the code as a string. For example, a variable ‘$aVariable’ would be represented in the AST with a ‘$aVariable’ string. This is convenient and straightforward from the source.

Know your data

With the large amount of code auditing we do, we noticed that such strings are actually not so diverse. A small number of those codes are repeated many times. This is the case with variable names, which are usually used several times: at least one write, and one read. The same applies to any defined structures, like methods, functions, constants, etc. They are defined, then called, and thus, repeated.

Removing the repetition shrinks the corpus of tokens by 50%, and often by up to 90% for the largest code bases. Drupal, for example, has a little more than 2000k tokens, but a little less than 200k distinct codes. Some of them are repeated up to 3412 times. With such stats, we can reduce the size of the database by storing each repeated string only once.

Reduced dataset size

So, with that in mind, we upgraded the platform to build a dictionary. Now, the strings are replaced by an integer at build time. This shrinks the size of the graph database in two ways: strings, which may be up to 5k in size, are now replaced by an integer; and strings are not repeated anymore. The size of the load and the working database are reduced, making the processing faster.

In terms of processing, some updates were required. Replacing strings by integers is seamless as long as the strings are compared as a whole to each other. For example, searching for a class constant definition by its name is straightforward: both are the same, and they end up having the same dictionary index.
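To make the idea concrete, here is a minimal sketch of such a dictionary in PHP. The `TokenDictionary` class and its method names are hypothetical and only illustrate the principle; they are not the actual exakat implementation.

```php
<?php
// Hypothetical helper: maps each distinct code string to a small integer id.
final class TokenDictionary
{
    private $ids = [];      // code string => id
    private $strings = [];  // id => code string

    // Returns the id of a code string, registering it on first sight.
    public function idFor(string $code): int
    {
        if (!isset($this->ids[$code])) {
            $id = count($this->strings);
            $this->ids[$code] = $id;
            $this->strings[$id] = $code;
        }
        return $this->ids[$code];
    }

    // Gives back the original code string for an id.
    public function stringFor(int $id): string
    {
        return $this->strings[$id];
    }

    // Number of distinct code strings stored so far.
    public function size(): int
    {
        return count($this->strings);
    }
}

$dictionary = new TokenDictionary();
$tokens = ['$aVariable', '=', '1', ';', 'echo', '$aVariable', ';'];

// Each token is stored in the graph as an integer instead of a string.
$encoded = array_map([$dictionary, 'idFor'], $tokens);
```

In this sketch, seven tokens only need five dictionary entries, and both occurrences of `$aVariable` share the same integer, so comparing them is a plain integer comparison.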

More complex queries

On the other hand, any operation that requires altering the original string cannot be done in the database anymore. This is the case for method definition search: since method names are case-insensitive, the method name has to be extracted first, then turned into its lowercase version, and then queried again.

Other operations, like selecting strings with a given prefix, have to be preprocessed outside the database. Since there are only a handful of those operations, this is worth doing.
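The preprocessing described above might look like the following sketch. It assumes the dictionary is available as a plain PHP array of id => code string; the function names `idsForMethodName()` and `idsWithPrefix()` are hypothetical. The resulting lists of ids would then be sent to the database in place of a string operation.

```php
<?php
// Hypothetical dictionary content: id => original code string.
$dictionary = [
    0 => 'getName',
    1 => 'GETNAME',
    2 => 'setName',
    3 => 'getname',
    4 => '$name',
];

// Case-insensitive match: lowercase both sides outside the database.
function idsForMethodName(array $dictionary, string $name): array
{
    $needle = strtolower($name);
    $ids = [];
    foreach ($dictionary as $id => $code) {
        if (strtolower($code) === $needle) {
            $ids[] = $id;
        }
    }
    return $ids;
}

// Prefix match (case-sensitive): collect the ids of strings starting with $prefix.
function idsWithPrefix(array $dictionary, string $prefix): array
{
    $ids = [];
    foreach ($dictionary as $id => $code) {
        if (strncmp($code, $prefix, strlen($prefix)) === 0) {
            $ids[] = $id;
        }
    }
    return $ids;
}

$methodIds = idsForMethodName($dictionary, 'GetName');
$getterIds = idsWithPrefix($dictionary, 'get');
```

The cost is one pass over the dictionary in PHP, which is much smaller than the full list of tokens.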

Faster, leaner exakat

All in all, the new mechanism is faster, and leaner. You should see it in action with the new version.

Fixed call to exakat.phar

Thanks to Frederic Hardy's feedback, we have fixed a bug that prevented calling exakat.phar from outside its installation folder. When you follow the documentation's instructions, everything is installed in one folder, so all is concentrated in one place for easy maintenance. However, it was impossible to call Exakat in its own folder while working from another one. This is now fixed.

+ is accepted as a regex delimiter

That is the kind of bug that few of us ever thought to fix: we found some PHP code that used + as a regex delimiter. This is totally legit, and it works, as long as you don't need the + quantifier in the regex itself.

Actually, any non-alphanumeric, non-backslash and non-whitespace character is possible as a delimiter, so many variations are possible.
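As an illustration, here is a short sketch with `preg_match()`; the patterns and subjects are made up for the example.

```php
<?php
// '+' opens and closes the pattern; the regex itself is ^\d{3}-\d{4}$
$withPlus = preg_match('+^\d{3}-\d{4}$+', '555-1234');

// The same regex with the more common '/' and '#' delimiters.
$withSlash = preg_match('/^\d{3}-\d{4}$/', '555-1234');
$withHash  = preg_match('#^\d{3}-\d{4}$#', '555-1234');

// Inside a '+'-delimited pattern, an escaped '\+' is a literal plus sign,
// not a quantifier: this matches the string "1+1".
$literalPlus = preg_match('+^1\+1$+', '1+1');
```

All four calls find a match. The last one shows the limitation: with + as the delimiter, an escaped + in the pattern only matches a literal plus, so the quantifier is out of reach.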

Improve your class visibilities

By default, PHP uses the public visibility when creating a method or a property. Since PHP 7.1, it also accepts visibilities for constants, so your code is probably making constants public without knowing it.

A general recommendation is to keep classes as closed as possible: this means using the private option as often as possible, and as the default choice unless specified otherwise. This way, the internals of the class are kept secret, and are less prone to external and unexpected manipulation.

During development, it is tempting to keep those properties public, to avoid going back often to a class to relax the access rights. Then, as the code matures, such accesses are removed, and finally the property or the method is only used by its defining class, and by nobody else.

However, it is rare that anyone notices such a situation. The contrary is usually easy to spot: make a property private, and use it in a public way, and PHP will immediately remind you that this is not possible. On the other hand, keep a property public, but use it privately and no one will notice.
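The asymmetry can be shown with a small sketch; the `Customer` class is hypothetical and only serves the illustration.

```php
<?php
class Customer
{
    private $secret = 'hidden';

    // Declared public, but only ever read inside the class:
    // PHP never complains, so nobody notices it could be private.
    public $nickname = 'bob';

    public function describe(): string
    {
        return $this->nickname . '/' . $this->secret;
    }
}

$customer = new Customer();

// Using a private property from the outside fails immediately.
$message = null;
try {
    echo $customer->secret;
} catch (Error $e) {
    $message = $e->getMessage();
}

// The class works the same whether $nickname is public or private.
$description = $customer->describe();
```

Here PHP throws an Error for the private property, while the needlessly public `$nickname` goes unnoticed.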

Exakat reports actual usage of class structures

Well, this is not the case anymore. Exakat checks the code and any usage of class elements, and it provides you with a report about it. Properties are presented with their actual definition, and an upgrade suggestion is offered.

This makes it easy to sift through the class structures, and decide if any element can be closed. Any property that is still used publicly will be reported as both defined and used as public, with a green star: there is no improvement possible, unless you modify other parts of the code.

On the other hand, a public method that can be upgraded to protected gets a red star, and the protected column, as the best target, gets the green star.

Review your classes, and ask yourself if you really need that public access to so many properties or methods. Many are valid, but when possible, just close the access. This will help you in the long run.

Turn a property into a constant

The same report also spots properties that may be turned into a constant. The same reasoning applies to properties as to visibility: properties are easy to change, when needed. But sometimes, they are defined at definition time (sic), or at constructor time, and never modified anywhere else. Such properties should be turned into a constant.

Exakat also reports those properties for consideration. Using constants makes it clear that the value is not changing: constant values are often easier to spot, and to understand.

A word of caution for this report: scalars and arrays are good candidates for constant creation. On the other hand, objects and resources cannot be made read-only, so just ignore those. Also, a class constant cannot directly replace an object property that is interpolated inside a string: this requires some changes in the code. Be careful.
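A minimal sketch of such a conversion, assuming PHP 7.1 or later for constant visibility; the `Mailer` class and its members are hypothetical.

```php
<?php
class Mailer
{
    // Before: private $retries = 3; assigned once and never modified.
    private const RETRIES = 3;                         // scalar: good candidate
    private const HOSTS = ['a.example', 'b.example'];  // array: good candidate too

    // Objects cannot be stored in a class constant: this stays a property.
    private $logger;

    public function summary(): string
    {
        // "$this->retries" used to be interpolated in the string;
        // a class constant has to be concatenated instead.
        return 'Retrying ' . self::RETRIES . ' times on ' . count(self::HOSTS) . ' hosts';
    }

    public function notInterpolated(): string
    {
        // No interpolation happens here: the braces are kept as-is.
        return "Retrying {self::RETRIES} times";
    }
}

$mailer = new Mailer();
$summary = $mailer->summary();
$literal = $mailer->notInterpolated();
```

The second method shows why the string usage needs attention: the constant name ends up in the output verbatim, without any warning.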

Suggestions help quality

Upgrading visibility in a class, or changing properties to constants usually requires a good level of maturity for both the code and the development team. This is why such suggestions are important: take a little time to read the recommendations, assess the situation and if you feel confident, do the right thing.

Happy PHP code reviews

Exakat 1.1.1 is now taking better advantage of the improved engine. Almost all unit tests have been checked, and new ones are in progress. If you run into a false positive that can be removed from our reports, feel free to file a bug report on the github repository https://github.com/exakat/exakat, on slack or on twitter.

All the 320+ analyzers are presented in the docs, including the faithful 'Old Style __autoload(): Avoid __autoload(), only use spl_autoload_register().' Rare bug (4%)

Download Exakat on exakat.io, upgrade it with 'exakat.phar upgrade -u' and like us on github.