Exakat 1.1.4 review

A new week, and an incredible harvest of PHP tricks. Some made it to the Exakat engine, to improve its handling of the language; some made it to the analyses, to help us develop better. Tricks include array_keys()’s extra parameters, Unicode codepoints, constant scalar expressions with arrays (sic), and the case-insensitive __DIR__. Let’s review Exakat 1.1.4.

PHP Tips and tricks

One challenge when working on a static analyzer is to keep one’s knowledge of the underlying platform as deep as possible. Sometimes this means meeting incredibly horrible pieces of code; other times, it’s a real surprise.

Constant scalar expressions with arrays

Take constants and arrays. For a long time, we have been able to define constants as arrays. Then, with the advent of constant scalar expressions (sic), we can do this:


<?php

   const A = [1, 2]; 
   const B = [3, 4, 5]; 
   const C = A + B; 
?>

+ is one of the valid operators for constant scalar expressions: those expressions only use constant values (literals, constants, class constants), and are processed at compile time. This works for + and arrays so, eventually, we end up with: C === [1, 2, 5];

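At first, this is surprising: aren’t we supposed to get a 5-element array? Then, after remembering that + with arrays, in PHP, keeps the keys already present in the left operand and only adds the missing keys from the right one, it makes sense: keys 0 and 1 exist in A, so only B’s key 2 (the value 5) is added. This is the same behavior as with variables.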
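To make the difference concrete, here is a minimal sketch that compares the + operator with array_merge() on the same constants. The $merged variable is only there for illustration; function calls are not allowed in constant expressions, which is why it is not a constant.

```php
<?php

const A = [1, 2];
const B = [3, 4, 5];

// Union: keys 0 and 1 already exist in A, so only B's key 2 is added.
const C = A + B;

// array_merge() renumbers integer keys and appends the values instead.
// It is a function call, so it cannot be used in a constant expression.
$merged = array_merge(A, B);

var_dump(C === [1, 2, 5]);             // bool(true)
var_dump($merged === [1, 2, 3, 4, 5]); // bool(true)
```

When a 5-element array is what you want, array_merge() at run time (or the spread operator in recent PHP versions) is the tool, not +.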

Codepoints

PHP 7.0 introduced the Unicode codepoint escape syntax in strings, to make all Unicode characters available. A codepoint is a code position, a way to reference a character (letter, ideogram, hieroglyph, symbol, etc.). Every Unicode character has one, and PHP handles it this way:

<?php

echo "\u{4eba}"; //人
echo "\u{04eba}"; //人
echo "\u{004eba}"; //人
echo "\u{0004eba}"; //人
?> 

The leading 0s are accepted and do not change the resulting character. That trick may be used to hide letters in strings, and to evade detection of Unicode codepoint escape sequences. Exakat has a security analysis that reports roman letters that are encoded in strings: this is a red flag, as those are usually never encoded.
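The following sketch illustrates why this matters for detection. The $hidden and $source variables are hypothetical, and the strpos() call stands in for a naive search of the source code; it is not how Exakat performs its analysis.

```php
<?php

// Leading zeros do not change the codepoint: both literals are the same string.
var_dump("\u{4eba}" === "\u{0004eba}"); // bool(true)

// Plain roman letters may be written as escape sequences too.
$hidden = "\u{65}\u{76}\u{61}\u{6c}";
var_dump($hidden === 'eval'); // bool(true)

// Zero padding defeats a naive search for the literal text \u{65} in the source.
// Single quotes keep the escape sequence as raw text here.
$source = '"\u{0065}val"';
var_dump(strpos($source, '\u{65}')); // bool(false)
```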

__dir__ is case insensitive

One of those cases of not reading the documentation even though I translated it three times: all the magic constants are case-insensitive (http://php.net/manual/en/language.constants.predefined.php). I learnt that the hard way by reading code that used it, thinking it was a mistake, then realizing that the mistake was on me. Coding should keep you humble…

<?php
  // assuming this script is /tmp/test.php
  echo __DIR__; // /tmp
  echo __dir__; // /tmp
?>
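As a quick check, the comparisons below should hold in any file, whatever its location. The mixed-case spellings are used only to make the point; they are not a recommended style.

```php
<?php

// Magic constants are recognized whatever their case.
var_dump(__DIR__ === __dir__);           // bool(true)
var_dump(__FILE__ === __File__);         // bool(true)

// __DIR__ is the directory of the current file, not the file itself.
var_dump(__dir__ === dirname(__file__)); // bool(true)
```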

 

array_keys() to search for keys

array_keys() accepts three parameters. This is little known, as array_keys() is mostly used to extract the list of keys from an array. But it actually includes extra features worth knowing.

The two other arguments are search_value and strict. This makes array_keys() the big brother of array_search(): while array_search() stops as soon as it finds the searched value and returns its key, array_keys() returns the keys of all the matching values, in an array.

<?php

$array = [1,2,3,4,3];

print array_search(3, $array); // 2
print_r(array_keys($array, 3)); // [ 0 => 2, 1 => 4]

?>
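The third argument, strict, deserves a small sketch of its own. The sample array below is hypothetical and mixes types on purpose, to show the difference between loose (==) and strict (===) comparison.

```php
<?php

$array = [1, '1', 1.0, true, 'a'];

// Loose comparison (==): 1, '1', 1.0 and true all match the integer 1.
$loose = array_keys($array, 1);

// Strict comparison (===): only the integer 1 matches.
$strict = array_keys($array, 1, true);

var_dump($loose === [0, 1, 2, 3]); // bool(true)
var_dump($strict === [0]);         // bool(true)
```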

array_search() is a slow function, as it actually loops over the array to find the value. Whenever possible, in particular when the values are all unique, it is recommended to turn that array into its flipped version, and use isset() on the key: that way, PHP uses a hash-table lookup and the result is much faster.
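Here is a minimal sketch of that idea, with a hypothetical $colors list whose values are all unique. Note that the flip itself costs one pass over the array, so it pays off when several lookups follow.

```php
<?php

$colors = ['red', 'green', 'blue']; // hypothetical list of unique values

// Linear scan: array_search() returns the key, or false when nothing is found.
$found = array_search('green', $colors) !== false;

// Flip once, then every lookup is a hash-table access on the key.
$index = array_flip($colors); // ['red' => 0, 'green' => 1, 'blue' => 2]
$foundFast = isset($index['green']);
$missing = isset($index['purple']);

var_dump($found, $foundFast, $missing); // bool(true) bool(true) bool(false)
```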

On the other hand, when several identical values are stored in the array, array_flip() can’t be used, and a loop-search is necessary. Then, array_keys() is a better choice. Even on small arrays, array_keys() and array_search() outperform a hand-written loop over the array by a factor of 10.
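For reference, this is the kind of hand-written loop that array_keys() replaces. keysOfLoop() is a hypothetical helper written for this comparison; the sketch only checks that both approaches return the same keys and makes no timing measurement.

```php
<?php

// Hypothetical helper: hand-written equivalent of array_keys($array, $value, true).
function keysOfLoop(array $array, $value): array {
    $keys = [];
    foreach ($array as $key => $item) {
        if ($item === $value) {
            $keys[] = $key;
        }
    }
    return $keys;
}

$array = [1, 2, 3, 4, 3];

var_dump(keysOfLoop($array, 3) === array_keys($array, 3, true)); // bool(true)
```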

Reuse cached values

Caching values is definitely a good way to speed up processing. This is true at the architecture and design level, and it is also true at the method level. Take situations like this one:

<?php

function foo($a) {
    $b = strtolower($a);
    
    // more code
    
    if (strtolower($a) === 'c') {
       // doSomething()
    }
}

?>

The comment // more code is here to shrink the size of the example, but you may easily understand that forgetting that a value is already available in a variable is very easy when the method is large: this may also be a side effect of a work-in-progress refactoring. Indeed, Exakat reports expressions that are identical across a scope, so that they can be reduced to one call.

Although strtolower($a) is a fast native call, it still is a function call, with a scope change and all the extra overhead. Reducing the number of such function calls by caching the result in a variable, and reusing that variable, is a good way to avoid wasting cycles.

The problem with caching is that it requires some planning: store the value, and then, reuse it. So, this tends to happen after the initial code has been written, and it is best done in a code review.

Exakat has your back on this one: it spots any expression that has already been stored in a variable, and suggests its reuse. The impact on the code should be small, besides the performance improvement. Expect some false positives, though, when the original value is modified in the scope.
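Applied to the example above, the fix could look like the sketch below. The return values are an assumption added for illustration: they stand in for doSomething() so that the result can be observed.

```php
<?php

function foo($a) {
    $b = strtolower($a);

    // more code

    // Reuse the cached value instead of calling strtolower($a) again.
    // This only holds as long as $a and $b are not modified in between.
    if ($b === 'c') {
        return true; // stands in for doSomething()
    }

    return false;
}

var_dump(foo('C')); // bool(true)
var_dump(foo('d')); // bool(false)
```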

Double array_flip is slow

array_flip(), as we discussed earlier, swaps keys and values in an array. It’s a fairly slow call, as PHP has to build a whole new array and cope with duplicate values during the swap: when a value appears several times, the last key wins. As you know, keys are unique, while values are arbitrary.

array_flip() is useful in situations where a dictionary is collected in one way, and must be used in the reverse way. At that point, PHP never knows whether the values are unique or not, so it does the work anyway.

I ran into a piece of code like this one. Note that the code was adapted to fit this article, and to keep the context out of the way.

<?php

function foo($array, $value) {
    $tmpArray = array_flip($array);
    unset($tmpArray[$value]);
    return array_flip($tmpArray);
}

?>

array_flip() is used here to swap keys and values in the array, and easily remove the value. Here, array_flip() is used twice, and that’s a red flag: it is easy to find the key of $value with array_search() or array_keys() instead, avoiding the double-flipping.

This simple code snippet was made into an analysis. I don’t expect many results to be reported, but when some are, they will be excellent feedback.
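A possible rewrite is sketched below. fooSearch() and fooAll() are hypothetical names, and the sketch assumes that the values are integers or strings and are compared strictly; with duplicate values, the double flip also removes duplicates, which these versions do not.

```php
<?php

// Removes the first entry holding $value, without flipping the array.
function fooSearch(array $array, $value): array {
    $key = array_search($value, $array, true);
    if ($key !== false) {
        unset($array[$key]);
    }
    return $array;
}

// Removes every entry holding $value, when duplicates are possible.
function fooAll(array $array, $value): array {
    foreach (array_keys($array, $value, true) as $key) {
        unset($array[$key]);
    }
    return $array;
}

var_dump(fooSearch(['a' => 1, 'b' => 2, 'c' => 3], 2) === ['a' => 1, 'c' => 3]); // bool(true)
var_dump(fooAll(['a' => 1, 'b' => 2, 'c' => 2], 2) === ['a' => 1]);              // bool(true)
```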

All your Regex are belong to us

While almost every PHP application makes more or less use of regular expressions, the extent of that usage is less known. Usually, the choice of using a regex is left to the developer. This is a local decision: the regex is tailored for a method, and its specific situation.

Or is it? Being a simple (sic) string, and elaborated in a specific context, leads to the easy mistake that it is unique. It rarely is: it’s just unchecked. Nobody checks the rest of the code to see if another similar-looking regex has been created for the same purpose. Ever.

Exakat brings consistency to the application by building the regex inventory: it collects all the regex it can find in the code, and presents them in a simple list. All those regex, that have been scattered across the code, are now in the same file: think about it as an alumni meeting. Suddenly, they can be compared and tell different stories of the code.

Check it here with phpmailer.

Here are some questions that the regex inventory raises:

  • Are there identical regex?
  • Are there similar regex?
  • Are there huge regex?
  • Could some regex be replaced by filter_* or another PHP function?
  • Are there classic regex that should be used?
  • Are there classic regex that should not be used?
  • Are there dynamically built regex?

Regex inventories are a good starting point for refactoring the code: move some expressions into a common library, remove excessive code or simply simplify the regex.
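Two of those refactorings may be sketched as follows. The Patterns class, its HEX_COLOR constant and the isEmail() function are hypothetical names chosen for this example; only filter_var() and FILTER_VALIDATE_EMAIL come from PHP itself.

```php
<?php

// Hypothetical shared class: one named home for patterns scattered in the code.
final class Patterns {
    public const HEX_COLOR = '/^#[0-9a-f]{6}$/i';
}

// A hand-written email regex may be replaced by the native filter extension.
function isEmail(string $value): bool {
    return filter_var($value, FILTER_VALIDATE_EMAIL) !== false;
}

var_dump(preg_match(Patterns::HEX_COLOR, '#FFaa00') === 1); // bool(true)
var_dump(isEmail('someone@example.com'));                   // bool(true)
var_dump(isEmail('not an email'));                          // bool(false)
```

Once a pattern has a name and a single definition, identical or near-identical copies found in the inventory can be pointed at it.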

Happy PHP code reviews

Exakat 1.1.4 brings you the benefits of a field trip to PHP coding: precise mistakes that may stay in the code for a long time, just because nobody checks for them. An automated tool may be loaded with low-yield analyses: the critical value of a code audit is to be a sentry in the code and report interesting tidbits, while we focus on the rest of the application.

Do you want our feedback on your code? Drop us a note on twitter (@exakat) or on the slack channel: we’ll be happy to discuss it.

All the 320+ analyzers are presented in the docs, including the untrained ‘Use pathinfo() instead of string manipulations’: a rare issue (5%).

Download Exakat on exakat.io, install it with Docker, upgrade it with ‘exakat.phar upgrade -u’ and like us on github: https://github.com/exakat/exakat.

 
