Late, later and latest PHP checks

PHP applies various checks when processing source code. When those checks fail, they may generate a Fatal error and stop execution. Fatal errors are the bane of production servers, and every PHP developer tries to avoid them as much as possible (unlike authors of static code analysers, who try to report them ahead of time). The goal is ‘no error in production’.

Yet, this legendary goal still eludes us. PHP is far from being exception-free, unlike the Pony language or Agda. Those languages have their compiler prove properties of the code, going as far as a termination checker in Agda’s case. It is both scary and enlightening!

One reason errors are discovered late is the moment at which PHP actually applies a given check and finds the problem. So, in order to understand how some errors only show up during execution, let’s review a few of PHP’s phases of execution and code linting.

Grammar phase

The first phase is what happens when php -l is used on the command line. This phase is only available from the command line: no PHP function offers this exact feature. The closest we have are eval(), which actually runs the code immediately, and token_get_all(), which leaves us with the list of tokens.

PHP reads the source file, breaks it into tokens, and tries to make sense of them. For that, it uses a grammar file, hence the name of this phase, although that name is quite unofficial. Note that only one file is checked at that moment.

When PHP checks the grammar of the tokens, this is the moment where you discover an embarrassing typo that produces a syntax error. The good part is that it is detected immediately, and PHP won’t run anything with such an error. Unless it is committed, it is not a problem. Just don’t commit PHP syntax errors.
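From inside PHP, the grammar phase can be approximated, though not fully reproduced: since PHP 7.0, token_get_all() accepts a TOKEN_PARSE flag that runs the parser on the code and throws a ParseError on invalid syntax, without executing anything. A minimal sketch, where hasSyntaxError() is a hypothetical helper. It stops at the grammar, so the later checks that php -l also performs, such as the duplicate function of the linting phase, are out of its reach.

```php
<?php
// Sketch: approximate the grammar phase without running the code.
// hasSyntaxError() is a hypothetical helper name.
// token_get_all() with TOKEN_PARSE (PHP 7.0+) runs the parser and throws
// a ParseError on invalid syntax; the code is never compiled nor executed.
function hasSyntaxError(string $code): bool
{
    try {
        token_get_all($code, TOKEN_PARSE);
        return false;
    } catch (ParseError $e) {
        return true;
    }
}

var_dump(hasSyntaxError('<?php function foo($a) {}')); // bool(false)
var_dump(hasSyntaxError('<?php function foo($a {}'));  // bool(true)
```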

Linting phase

Right after this, the linting phase kicks in, and PHP applies its first checking rules. Indeed, even if the code has a correct syntax, it may break some higher-level structural rules that prevent it from running. It is akin to the following sentence: The stone eats the stone. It is syntactically correct, but it doesn’t make much sense, except in rare situations.

For example, PHP has to check that a function is declared only once: it is not possible to create a second function with the same name. This is not a syntax error, but a Fatal error.

<?php

function foo($a) {}
function foo($a) {}
// Cannot redeclare foo() (previously declared in Command line code:1)
?>
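The check above happens while linting because both declarations sit at the top level of the file. A minimal sketch of the contrary case: when the declaration is wrapped in a condition, PHP only binds the function when execution reaches it, so whether the function exists becomes a run-time fact. The function name bar and the $declare flag are hypothetical.

```php
<?php
// Sketch: a conditionally declared function is bound at execution time.
// bar() and $declare are hypothetical names for the demonstration.
$before = function_exists('bar'); // the declaration below has not run yet

$declare = true; // stands in for any run-time condition
if ($declare) {
    function bar(): string
    {
        return 'bar';
    }
}

$after = function_exists('bar'); // the declaration has now been executed

var_dump($before); // bool(false)
var_dump($after);  // bool(true)
```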

Linting consistency

This is nice, as it prevents careless copy/paste or major typos from creeping into the code. Yet, it also comes with a caveat. The following, very similar code does lint, yet doesn’t execute as one would expect:

<?php

class foo {}
class foo {}
// Fatal error: Cannot declare class foo, because the name is already in use
?>

This means that linting doesn’t check class names; nor does it check naming conflicts for global constants, interfaces, traits or enumerations. All those are resolved at execution time, which means that the code has to be executed for the conflict to show up.

Execution phase

At execution time, PHP is done parsing and linting each file locally. Yet, parsing and linting are not finished: during execution, the code may include extra files, or eval() extra code. Each of them goes through the parse and lint phases again, then PHP adds an extra layer of checks when merging it with the code that is already loaded.

In the following example, two files declare the same class. Both lint individually, and the conflict arises at execution time only.

<?php
class foo {}
include 'B.php';
?>

file B.php
<?php
class foo {}
?>

Execution path

Execution-time checks are all the more difficult to comprehend because they depend on the execution path: which expressions are called, and in which order. This may have a huge impact on those naming conflicts, since some paths may be very rare.

For example, the code above may be rewritten to make the bug much less frequent, in an obvious (and not recommended) way, like this:

<?php
class foo {}
// rand(1, 10000) returns 1 about once every 10,000 calls
if (rand(1, 10000) === 1) {
    include 'B.php';
}
?>

file B.php
<?php
class foo {}
?>

The error will only appear when the include is called, which is once every ten thousand times. So, this is rare, but it will eventually happen.
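Rare does not mean never. Assuming each request takes the branch independently, with probability 1/10,000, the chance that at least one of n requests hits the error is 1 − (1 − 1/10000)^n. A quick computation, with a hypothetical helper name:

```php
<?php
// Probability that a branch taken with probability $p on each request
// (independently) is taken at least once over $n requests.
// probabilityOfAtLeastOneHit() is a hypothetical helper name.
function probabilityOfAtLeastOneHit(float $p, int $n): float
{
    return 1 - (1 - $p) ** $n;
}

foreach ([1000, 10000, 50000] as $n) {
    printf("%d requests: %.1f%%\n", $n, 100 * probabilityOfAtLeastOneHit(1 / 10000, $n));
}
// 1000 requests: 9.5%
// 10000 requests: 63.2%
// 50000 requests: 99.3%
```

In other words, after 10,000 requests, the Fatal error has more likely than not already happened.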

Latest PHP checks

If the last example is too academic, let’s take a look at this one.

<?php

function foo(int $a = "abc") {}

?>

This time, the linting process immediately identifies the discrepancy between the argument typehint and the literal string "abc".

As we have seen, we can postpone this check by using an intermediate structure, such as a constant.

<?php

const D = 'abc';

function foo(int $a = D) {}

?>

By using a constant as the default value, we actually pushed the check much later, into the execution phase. This error will only be detected once two conditions are fulfilled:

  • foo() is actually called (aka, no call, no check)
  • foo() is called without an argument.

<?php

const D = 'abc';

function foo(int $a = D) {}

foo();
//Fatal error: Uncaught TypeError: foo(): Argument #1 ($a) must be of type int, string given
?>
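The TypeError above is an exception, so it can be caught, which makes both conditions easy to observe. A minimal sketch reusing foo() and D; the int return type, the try/catch and the variable names are additions for the demonstration. The exact message wording depends on the PHP version.

```php
<?php
// Sketch: the type of a constant-based default value is checked at call time.
const D = 'abc';

function foo(int $a = D): int
{
    return $a;
}

// With an argument, the default value is never evaluated: no error.
$withArgument = foo(1);

// Without an argument, D is resolved and checked against int, right now.
$error = null;
try {
    foo();
} catch (TypeError $e) {
    $error = $e->getMessage();
}

echo $withArgument, "\n"; // 1
echo $error, "\n";        // the TypeError message
```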

The curse of inclusion and optimisation

Two root causes appear in this example: first, the curse of inclusion, and second, the effect of optimisation.

The include keyword and its cousin, autoloading, mean that PHP code is split across multiple files, which are only merged together at execution time. Thus, some checks cannot be run until the code is executed. Even when all the PHP code is inlined into one file, some checks are still postponed to execution, as with the constant example.
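Autoloading makes this very concrete: a class file is only read, parsed and checked the first time execution needs that class. A minimal sketch, where the ‘files’ are hypothetical strings in an array, and eval() stands in for the include a real autoloader would perform. The syntax error in Rare stays invisible until Rare is used.

```php
<?php
// Sketch: a syntax error in an autoloaded class surfaces on first use only.
// $sources holds hypothetical file contents; eval() stands in for include.
$sources = [
    'Common' => 'class Common {}',
    'Rare'   => 'class Rare { public function run( {} }', // deliberate syntax error
];
$loaded = [];

spl_autoload_register(function (string $class) use ($sources, &$loaded): void {
    if (isset($sources[$class])) {
        $loaded[] = $class;
        eval($sources[$class]); // parsed now, not before
    }
});

new Common(); // loads Common: fine
echo implode(', ', $loaded), "\n"; // Common

try {
    new Rare(); // only now is the syntax error in Rare discovered
} catch (ParseError $e) {
    echo 'ParseError while loading Rare: ', $e->getMessage(), "\n";
}
```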

Optimisation is the second root cause of these late reports. PHP has a lean engine, heavily optimised for execution. In this example, postponing the validation of the default value’s type until the value is actually used is certainly the most sensible approach: in most cases, it saves a few checks. And we want that performance in production.
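Reflection gives a glimpse of that laziness: the default value of $a is stored as a reference to the constant D, not as the string ‘abc’, and evaluating it through Reflection does not check it against the int type either. A minimal sketch, reusing the example’s names:

```php
<?php
// Sketch: the default value is kept as a constant reference until needed.
const D = 'abc';

function foo(int $a = D) {}

$param = (new ReflectionFunction('foo'))->getParameters()[0];

var_dump($param->isDefaultValueConstant());      // bool(true)
var_dump($param->getDefaultValueConstantName()); // string(1) "D"
// Evaluating the default resolves D, with no check against int.
var_dump($param->getDefaultValue());             // string(3) "abc"
```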

Static analysis has time, PHP has data

Static analysis easily reviews the code above, and reports the error before execution. The secret? Static analysis has a lot more time to build a comprehensive view of the code. It also has time to check multiple possibilities, even when they are rare or borderline impossible.

On the other hand, PHP has the actual data to process, and it doesn’t need to review all the execution branches: just the one needed for the current data. Nine times out of ten, there is no problem. And the tenth time, there is one.

Happy auditing!