PDFF: PHP Document File Format

PDFF, known internally as PHP Document File Format, has been at the center of the development of Exakat since late 2021. It grew out of a specific problem: describing dependencies so that the static analysis engine can understand them and take advantage of them, without auditing their source code at the same time.

This means simplifying what is currently observed in PHP code, while keeping enough detail to allow advanced analysis. And so, PDFF was born.

PHP Document File Format is a format specification, readable by both machines and humans. It has its roots in stub files, provides support for static analysis and versioning, and opens the field to new applications.

In this series of articles, we’ll cover the origins of the PDFF format, its current description, and some of its future applications. Let’s start with the origins.

A name and a version

Dependencies are a staple of a PHP coder’s life, and thanks to Composer, from Nils Adermann, Jordi Boggiano and many contributors, it is quite an enjoyable life. After checking Packagist, a simple composer require a/b command is sufficient to get started.

In fact, a version is not even needed, as Composer will choose one by itself. Later, it is possible to stick to a specific version by mentioning it in the composer.json file. That way, new and breaking versions will not interfere with the current code.

This is the first phase of any development: the phase where one needs something done, preferably fast and with as few decisions as possible. Thankfully, with hundreds of thousands of components, there is a lot of choice.

Stub files

The next phase of development is maintenance. It may actually start pretty fast, and it begins with a simple question: why is something that worked yesterday breaking today? So now, the task is to keep up with the dependency and its own lifecycle. New versions are released every day.

To reduce the interference, one often sticks to a specific version. This works in the short term, but it creates a growing maintenance task for the future. Depending on the situation, that may be good or bad.

The second option is to update the dependency and the impacted code as soon as possible, and hopefully detect any discrepancy between the current version and the previous one. This means reading the docs and the changelog, and trusting that they are exhaustive.

This frequent and repetitive work is definitely a task for an automated piece of software: namely, static analysis. To better understand the dependencies, static analysis needs a description of each dependency at its interface. Interface, here, is an abstract term for everything provided by the dependency. Basically, any PHP structure may be included in this interface: functions, constants, classes, methods, syntax quirks, features, interfaces (sorry for the pun), namespaces, arguments, default values, properties, visibilities, etc.

And so, stub files were created. Stub files are a simplification of the PHP code, down to its shell: the signatures. A signature describes a function, and the body of the function itself is not important. All we need is the way to call it, and under which constraints.
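To make this notion of interface more concrete, here is a minimal sketch that lists what a class exposes, using PHP’s Reflection API. The Mailer class and the describeInterface() and visibility() helpers are hypothetical, made up for this illustration; a real engine would collect far more than names and visibilities.

```php
<?php

// Hypothetical dependency class, only here to have something to inspect.
class Mailer {
    public const DEFAULT_PORT = 25;

    public string $host = 'localhost';
    private array $queue = [];

    public function send(string $to, string $body = ''): bool {
        return true;
    }

    protected static function connect(): void {}
}

// Collects the names of the constants, properties and methods of a class,
// each one prefixed with its visibility.
function describeInterface(string $class): array {
    $reflection = new ReflectionClass($class);
    $interface  = ['constants' => [], 'properties' => [], 'methods' => []];

    foreach ($reflection->getReflectionConstants() as $constant) {
        $interface['constants'][] = visibility($constant) . ' ' . $constant->getName();
    }
    foreach ($reflection->getProperties() as $property) {
        $interface['properties'][] = visibility($property) . ' $' . $property->getName();
    }
    foreach ($reflection->getMethods() as $method) {
        $interface['methods'][] = visibility($method) . ' ' . $method->getName() . '()';
    }

    return $interface;
}

// Reflected constants, properties and methods all expose isPublic() and isProtected().
function visibility(object $member): string {
    if ($member->isPublic()) {
        return 'public';
    }
    return $member->isProtected() ? 'protected' : 'private';
}

print_r(describeInterface(Mailer::class));
```

Reflection requires the code to be loaded, which is exactly what a static analysis engine tries to avoid; it is used here only because it ships with PHP.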

<?php

// This is a real function
function addition(int $a, int $b) : int {
    return $a + $b;
}

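// This would be the stub function. It lives in a separate stub file,
// since PHP refuses two functions with the same name in one script.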
function addition(int $a, int $b) : int {}

?>

Since stub files provide actual, simplified PHP code, static analysis engines read those files and extract argument lists, parameter names, typehints, phpdocs, etc. By skipping the body of the function, they also skip long and hard AST analysis.
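As an illustration of what can be read from a stub, the sketch below rebuilds the signature of the addition() stub shown earlier. It relies on PHP’s Reflection API, and signatureOf() is a hypothetical helper written for this article. Real engines parse the stub file rather than load it, so this only shows which information is available, not how an engine collects it.

```php
<?php

// The stub: a signature with an empty body. It is never called, only inspected.
function addition(int $a, int $b) : int {}

// Rebuilds a one-line description of a function from its reflection.
function signatureOf(string $function): string {
    $reflection = new ReflectionFunction($function);

    $parameters = [];
    foreach ($reflection->getParameters() as $parameter) {
        $type = $parameter->hasType() ? (string) $parameter->getType() . ' ' : '';
        $parameters[] = $type . '$' . $parameter->getName();
    }

    $return = $reflection->hasReturnType()
        ? ' : ' . (string) $reflection->getReturnType()
        : '';

    return $reflection->getName() . '(' . implode(', ', $parameters) . ')' . $return;
}

echo signatureOf('addition'), PHP_EOL; // addition(int $a, int $b) : int
```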

Basically, stub files are a simplification of PHP code.

PHP stub files

Stub files are already available online, both as tools and repositories.

Stub repositories provide the stub files, to be configured later in the static code analysis (SCA) tool: PhpStorm stubs, Phan stubs, PHPStan stubs.

Those repositories are created by their respective authors. They are usually adapted to their specific use, with extra information and dedicated attributes. Here, PhpStorm offers a way to mention how the type of a property changes across PHP versions.

<?php

// Excerpt from the ZipArchive class, as described in the PhpStorm stubs
class ZipArchive {
    /**
     * @link https://www.php.net/manual/en/zip.constants.php#ziparchive.constants.opsys.default
     * @since 5.6
     */
    public const OPSYS_DEFAULT = 3;

    /**
     * Status of the Zip Archive
     * @var int
     */
    #[LanguageLevelTypeAware(['8.1' => 'int'], default: '')]
    public $status;
}

?>
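To show how such a version-aware attribute may be consumed, here is a minimal sketch. The VersionedType attribute, the ArchiveStub class and the resolveType() function are stand-ins invented for this example, not the PhpStorm implementation. The sketch assumes that each entry of the map reads as "from this PHP version onwards", with the default applying before the first entry.

```php
<?php

// Stand-in for a version-aware attribute: maps "first PHP version" => type.
#[Attribute(Attribute::TARGET_PROPERTY)]
final class VersionedType {
    public function __construct(
        public array $types,
        public string $default = '',
    ) {}
}

// Hypothetical stub class using the attribute.
class ArchiveStub {
    #[VersionedType(['8.1' => 'int'], default: '')]
    public $status;
}

// Returns the type of a property for a given PHP version: the entry with
// the highest version that is not newer than $phpVersion, else the default.
function resolveType(string $class, string $property, string $phpVersion): string {
    $reflection = new ReflectionProperty($class, $property);
    $attributes = $reflection->getAttributes(VersionedType::class);
    if ($attributes === []) {
        return '';
    }

    $versioned = $attributes[0]->newInstance();
    $resolved  = $versioned->default;
    $types     = $versioned->types;

    // Oldest version first. Keys are cast because '8' would be stored as an integer.
    uksort($types, fn ($a, $b) => version_compare((string) $a, (string) $b));

    foreach ($types as $since => $type) {
        if (version_compare($phpVersion, (string) $since, '>=')) {
            $resolved = $type;
        }
    }

    return $resolved;
}

var_dump(resolveType(ArchiveStub::class, 'status', '8.0')); // string(0) ""
var_dump(resolveType(ArchiveStub::class, 'status', '8.2')); // string(3) "int"
```

A single stub can thus answer for several PHP versions, at the price of an attribute that only one tool understands.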

When stubs are not available for your framework or your own piece of code, you can make them with one of the several stub generators, such as Stub generator and PHP extension stub generator. Of course, Exakat also has a stub report, which produces the stub files from any audited code.
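The principle behind a stub generator can be sketched in a few lines: read the signature, drop the body. The generateStub() and stubParameter() functions below are hypothetical and only handle plain functions with simple default values; they are not the code of any of the tools mentioned above.

```php
<?php

// A real function, with a body we want to drop.
function addition(int $a, int $b = 0) : int {
    return $a + $b;
}

// Renders one parameter: type, reference, variadic, name and default value.
function stubParameter(ReflectionParameter $parameter): string {
    $code  = $parameter->hasType() ? (string) $parameter->getType() . ' ' : '';
    $code .= $parameter->isPassedByReference() ? '&' : '';
    $code .= $parameter->isVariadic() ? '...' : '';
    $code .= '$' . $parameter->getName();

    if ($parameter->isDefaultValueAvailable()) {
        $code .= ' = ' . var_export($parameter->getDefaultValue(), true);
    }

    return $code;
}

// Produces the stub of a user-defined function: its signature and an empty body.
function generateStub(string $function): string {
    $reflection = new ReflectionFunction($function);
    $parameters = array_map('stubParameter', $reflection->getParameters());

    $code  = 'function ' . ($reflection->returnsReference() ? '&' : '');
    $code .= $reflection->getName() . '(' . implode(', ', $parameters) . ')';
    $code .= $reflection->hasReturnType() ? ' : ' . (string) $reflection->getReturnType() : '';

    return $code . ' {}';
}

echo generateStub('addition'), PHP_EOL;
// function addition(int $a, int $b = 0) : int {}
```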

Stub advantages and limitations

Stubs provide a good level of summary for dependencies. With signatures at the class, function and trait levels, they offer quite a lot of information. Just consider that a method may have a visibility, be static or not, be final or not, return a reference or not, be variadic or not, and carry a return typehint, attributes and phpdoc. A few years ago, it was not possible to have such a level of information in a simple PHP signature.

<?php

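    /**
     * @since 5.6
     */
    // SomeAttribute stands for any attribute allowed on a function:
    // the native #[Attribute] may only target classes
    #[SomeAttribute]
    function &foo(array|string $s, ...$b) : array {}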
    
?>
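All the details packed in such a signature can be read back one by one. The sketch below does it with the Reflection API, on a function close to the one above; the Example attribute and describeFunction() are hypothetical names. The attribute class does not need to exist, as long as the attribute is only read by name and never instantiated.

```php
<?php

/**
 * @since 5.6
 */
#[Example]
function &foo(array|string $s, ...$b) : array {}

// Gathers the details carried by a function signature.
function describeFunction(string $function): array {
    $reflection = new ReflectionFunction($function);

    return [
        'name'              => $reflection->getName(),
        'returns_reference' => $reflection->returnsReference(),
        'variadic'          => $reflection->isVariadic(),
        'return_type'       => (string) $reflection->getReturnType(),
        'attributes'        => array_map(
            fn (ReflectionAttribute $attribute) => $attribute->getName(),
            $reflection->getAttributes()
        ),
        'phpdoc'            => $reflection->getDocComment(),
    ];
}

print_r(describeFunction('foo'));
```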

This is a summary of the dependency code, with a good level of information. And, as a summary, it sets aside some details. Which details are important depends on the actual application of the stub.

On the other hand, the current stubs are closely tied to the analysis engine they serve. Innovation in the static analysis field is still going strong, and the need for collaboration is not yet there. Thus, there are multiple versions of stubs.

Another limitation is the history of the stubs. They are produced for the most recent versions, in an effort to keep track of evolution. Older versions may be tricky to find, let alone adapt.
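History matters because, once the signatures of two versions are at hand, spotting the differences becomes a mechanical comparison. Here is a minimal sketch, with made-up data: the two arrays and compareSignatures() are hypothetical, and signatures are reduced to parameter names for brevity.

```php
<?php

// Hypothetical signature maps for two versions of the same dependency:
// function name => list of parameter names.
$v1 = [
    'addition' => ['a', 'b'],
    'subtract' => ['a', 'b'],
];
$v2 = [
    'addition' => ['a', 'b', 'c'],
    'multiply' => ['a', 'b'],
];

// Reports the functions that were removed, added or changed between two versions.
function compareSignatures(array $old, array $new): array {
    $changed = [];
    foreach (array_intersect_key($old, $new) as $name => $parameters) {
        if ($parameters !== $new[$name]) {
            $changed[] = $name;
        }
    }

    return [
        'removed' => array_keys(array_diff_key($old, $new)),
        'added'   => array_keys(array_diff_key($new, $old)),
        'changed' => $changed,
    ];
}

print_r(compareSignatures($v1, $v2));
```

Here, subtract is reported as removed, multiply as added and addition as changed. Without the older description, none of this can be computed.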

One standard to

Keeping track of dependencies is a fact of modern developer life. That work should be automated and offloaded to static analysis engines, which have the power to review both the code and its dependencies. They only lack a description of the dependencies to be able to apply it.

In the next part, we’ll introduce PDFF, which modernizes and simplifies the manipulation of stub files, in a human- and machine-friendly way.

Until then, keep auditing your code!