PDFF : PHP Document File Format details

In the first episode, we presented the origin of the PDFF: how it emerged as a convenient format to describe PHP components, with a good level of detail, some versioning and dual human/machine readability.

In this second episode, we’ll introduce the format and the content of such a file. At the end, there is a link to a repository with a copious amount of PHP components and extensions, ready to be tested. You can get a few of them there to illustrate this document.

In the meantime, let’s take an in-depth look at the PDFF.

How : machine and human readable

To keep things simple and processable, the underlying format is JSON.

This format is flexible enough for the varied structures that will be stored. Parsers and encoders are widely available. And the PRETTY and COMPACT presentations suit different usages: the first one for human readers, the second one for machines.
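As a minimal sketch of the two presentations, here is the same fragment encoded both ways with Python's standard json module. The fragment itself is a hypothetical, trimmed-down document that only borrows the key names described in this article.

```python
import json

# Hypothetical fragment of a PDFF document; only the key names come from the article.
document = {"name": "zttp", "versions": {"0.3.0": {"\\": {"constants": {}}}}}

# Pretty presentation: indented, one key per line, easier on human readers.
pretty = json.dumps(document, indent=4)

# Compact presentation: no optional whitespace, smaller for storage and transfer.
compact = json.dumps(document, separators=(",", ":"))

# Both presentations decode to exactly the same data.
assert json.loads(pretty) == json.loads(compact)
print(len(pretty), len(compact))
```

Whichever presentation is stored, a consumer decodes it the same way, so the choice only affects file size and readability.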

Inside the PDFF : main branches

The dataset is represented as a tree. Indeed, this is going to be a big tree. Let’s climb it. Here are the first levels:

  • name
  • vcs
  • handle
  • versions
    • 0.3.0
      • \
        • constants
        • functions
        • traits
        • classes
        • interfaces
        • enums
      • \Zttp\
        • constants
        • functions
        • traits
        • classes
        • interfaces
        • enums

At the top, there is some administrative information: the name of the component, the way it was cloned (vcs) and the actual URI used to clone it (handle). handle may be a URL for git or a component identifier for composer, for example.

Then, the largest field is the versions field, which contains the code details. This field contains one object per version, with the version name as property name. Here, it is 0.3.0. This hash structure allows for several versions in the same file, although only one is displayed here.

Inside a specific version, the next level is the namespaces. The global namespace \ is always available, then all the namespaces from the component are detailed, one after the other. The namespaces are not nested, so \A\, \A\B\ and \A\B\C\ are all distinct entries, at the same level. This is close to PHP’s handling of them, and different from the storage of files in a file system.

For each namespace, all the declared structures are listed, by category. Namely, constants, functions, classes, enumerations, interfaces and traits.
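To make the tree more concrete, here is a small sketch that walks it: versions, then namespaces, then categories. The document is built by hand and heavily trimmed; the handle value and the zttpresponse class are placeholders invented for the illustration, not real PDFF content.

```python
CATEGORIES = ("constants", "functions", "traits", "classes", "interfaces", "enums")

def namespace(**filled):
    """Build a namespace entry holding all six categories, empty unless filled."""
    return {category: filled.get(category, {}) for category in CATEGORIES}

# Hypothetical, trimmed document following the tree shown above.
pdff = {
    "name": "zttp",
    "vcs": "git",
    "handle": "https://example.com/zttp.git",  # placeholder URI
    "versions": {
        "0.3.0": {
            "\\": namespace(functions={"tap": {}}),
            "\\Zttp\\": namespace(classes={"zttp": {}, "zttpresponse": {}}),
        }
    },
}

def count_structures(document, version):
    """Return {namespace: {category: count}} for one version of the component."""
    return {
        name: {category: len(content[category]) for category in CATEGORIES}
        for name, content in document["versions"][version].items()
    }

counts = count_structures(pdff, "0.3.0")
for name, per_category in counts.items():
    print(name, per_category)
```

Since namespaces are stored flat, reaching \Zttp\ is a single key lookup in the version object, with no need to walk through parent namespaces first.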

This is already quite a large tree. For a small component, there might be a few namespaces, but for large frameworks, there may be over a thousand: Akeneo, Symfony and Shopware all clock in at over 1300 namespaces.

Now, let’s review the different elements. We’ll go gradually, introducing the general and specific elements of each category. By the end, there will be some repetition, which will allow us to speed up.

Global constants

Constants are stored in a hash, indexed by the name of the constant. Each has a name and a value property, as expected. Each also has a phpdoc array, for all the phpdoc comments. There are no attributes, as they are not supported by PHP.

The expression property is a boolean : it is true when the definition of the constant is a static constant expression, or false when it is a literal value. For example, const A = B + 1; has a true expression, and a piece of PHP code for value.

  • constants
    • NAME :
      • name
      • phpdoc []
      • expression
      • value
        "constants": {
          "WP_DEFAULT_THEME": {
            "name": "WP_DEFAULT_THEME",
            "phpdoc": [],
            "expression": false,
            "value": "'twentytwentytwo'"
          },
          "WP_DEBUG": {
            "name": "WP_DEBUG",
            "phpdoc": [],
            "expression": false,
            "value": "false"
          },...
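Here is a short sketch of how such entries could be consumed: splitting literal values from static constant expressions, and rebuilding a declaration. The two WordPress entries are copied from the extract above; the A entry is a hypothetical one, built from the const A = B + 1; example. Rendering everything as a const statement is a simplification of this sketch, whatever the original source used.

```python
constants = {
    "WP_DEFAULT_THEME": {"name": "WP_DEFAULT_THEME", "phpdoc": [],
                         "expression": False, "value": "'twentytwentytwo'"},
    "WP_DEBUG": {"name": "WP_DEBUG", "phpdoc": [],
                 "expression": False, "value": "false"},
    # Hypothetical entry: a static constant expression, stored as PHP code.
    "A": {"name": "A", "phpdoc": [], "expression": True, "value": "B + 1"},
}

def to_php(constant):
    """Rebuild a PHP const declaration from a PDFF constant entry."""
    return f"const {constant['name']} = {constant['value']};"

# Literal values can be read directly; expressions still need evaluation.
literals = [c["name"] for c in constants.values() if not c["expression"]]
expressions = [c["name"] for c in constants.values() if c["expression"]]

print(literals)
print(expressions)
print(to_php(constants["A"]))
```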

Functions

The description of functions is a bit more complex than that of constants. In particular, there is a second layer, with parameters.

  • functions
    • name :
      • name
      • returntype
      • reference
      • returntypehints []
      • parameters []
      • totalParameters
      • optionalParameters
      • variadic
      • attributes []
      • phpdoc []

The name used as index is in lowercase: it makes it easier to look up functions that way. The actual name, with its casing, is stored in the name property.

returntypehints is the list of types returned by the function: they are provided as fully qualified names, all in lower case. This is an empty array when no return type is declared. The kind of return typehint is stored in the returntype property: it may be one (a single type, or no type), or (union type) or and (intersection type).

Parameters are stored in an array of objects, with another level of details. We’ll see them in the next section. That array is complemented with the number of totalParameters and the number of optionalParameters.

Functions are also augmented with a variadic property: this one is not explicitly expressed at the function level in PHP code. It means that one of the arguments (necessarily the last one) is a variadic argument, making the whole function callable with an arbitrary number of arguments.

Finally, phpdoc and attributes collect the corresponding structures from the source code. The attributes are actual PHP code.

        "functions": {
          "tap": {
            "name": "tap",
            "returntype": "one",
            "reference": false,
            "returntypehints": [],
            "parameters": [... ...],
            "totalParameters": 2,
            "optionalParameters": 0,
            "variadic": false,
            "attributes": [],
            "phpdoc": []
          }
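The lowercase index makes lookups straightforward, as this sketch tries to show. The tap entry is copied from the extract above, with its parameters left out. The arity helper rests on an assumption that this article does not confirm: that a variadic parameter is counted among the optionalParameters.

```python
functions = {
    "tap": {
        "name": "tap", "returntype": "one", "reference": False,
        "returntypehints": [],
        "parameters": [],  # elided here, as in the extract above
        "totalParameters": 2, "optionalParameters": 0,
        "variadic": False, "attributes": [], "phpdoc": [],
    }
}

def find_function(index, name):
    """Look up a function whatever the casing used by the caller."""
    return index.get(name.lower())

def arity(function):
    """Return (minimum, maximum) argument counts; maximum is None when variadic.

    Assumption: a variadic parameter is counted as optional.
    """
    total = function["totalParameters"]
    minimum = total - function["optionalParameters"]
    maximum = None if function["variadic"] else total
    return minimum, maximum

entry = find_function(functions, "TAP")
print(entry["name"], arity(entry))
```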

Parameters

Parameters are an extra level of description. They have their own options and descriptions.

Parameters are stored as an array. The positions are the actual ranks in the function signature, unlike constants and functions, which use their name as index.

The actual description of each parameter has obvious options: name, rank, reference, variadic, phpdoc, attributes, typehinttype and typehints. Typehints follow the same organisation as the function return typehints, except for the names of the entries.

Default values for parameters are built around three entries. hasDefault defines if there is actually a default value or not: that prevents confusion between null (no default value) and null (default value is null). As for constants, there is an expression entry to identify static constant expressions in default values. Lastly, default is the default value.

  • parameters
    • name
    • rank
    • variadic
    • reference
    • hasDefault
    • default
    • expression
    • typehinttype
    • typehints []
    • phpdoc []
    • attributes []
            "parameters": [
              {
                "name": "$value",
                "rank": 0,
                "variadic": false,
                "reference": false,
                "hasDefault": false,
                "default": "",
                "expression": false,
                "typehinttype": "one",
                "phpdoc": [],
                "typehints": [],
                "attributes": []
              },
              {
                "name": "$callback",
                "rank": 1,
                "variadic": false,
                "reference": false,
                "hasDefault": false,
                "default": "",
                "expression": false,
                "typehinttype": "one",
                "phpdoc": [],
                "typehints": [],
                "attributes": []
              }
            ],
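As an illustration of how these entries fit together, the sketch below rebuilds a signature from an array of parameters. The first list mirrors the two parameters of the extract above. The second one is hypothetical, and so are the spellings used in its typehints: the exact form of the stored names is an assumption here.

```python
# Separator implied by each typehinttype value.
SEPARATORS = {"one": "", "or": "|", "and": "&"}

def parameter(name, rank, **overrides):
    """Build a parameter entry with every option present, as PDFF does."""
    entry = {
        "name": name, "rank": rank, "variadic": False, "reference": False,
        "hasDefault": False, "default": "", "expression": False,
        "typehinttype": "one", "phpdoc": [], "typehints": [], "attributes": [],
    }
    entry.update(overrides)
    return entry

def render_parameter(entry):
    """Rebuild the PHP code of one parameter."""
    typehint = SEPARATORS[entry["typehinttype"]].join(entry["typehints"])
    prefix = ("&" if entry["reference"] else "") + ("..." if entry["variadic"] else "")
    text = f"{typehint} {prefix}{entry['name']}".strip()
    # hasDefault decides, not the content of default.
    if entry["hasDefault"]:
        text += " = " + entry["default"]
    return text

def render_signature(parameters):
    """Join the parameters, ordered by their rank in the signature."""
    ordered = sorted(parameters, key=lambda entry: entry["rank"])
    return ", ".join(render_parameter(entry) for entry in ordered)

tap = [parameter("$value", 0), parameter("$callback", 1)]
print(render_signature(tap))  # $value, $callback

hypothetical = [
    parameter("$limit", 1, hasDefault=True, default="null",
              typehinttype="or", typehints=["int", "null"]),
    parameter("$items", 0, typehints=["array"], reference=True),
]
print(render_signature(hypothetical))  # array &$items, int|null $limit = null
```

The hasDefault test is the important part: an empty default string with hasDefault set to false produces no default at all, while a default of null with hasDefault set to true produces = null.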

Classes

Classes have the largest amount of data: some of it is already described in the previous structures, which we will mention and skip.

  • classes
    • name :
      • name
      • final
      • abstract
      • readonly
      • extends
      • implements []
      • uses []
      • attributes []
      • phpdoc []
      • constants []
      • properties []
      • methods []

Classes are indexed by their name, in lowercase, for easy lookup. Their actual name and case are stored in the name property. Each class has abstract, readonly and final as boolean properties; phpdoc and attributes are similar to the ones in functions or parameters.

Then come extends, as a single fully qualified name, and implements and uses, as arrays of fully qualified names. All those are the dependencies of the class. uses includes the conflict resolution details (not described here).

Then, a class holds arrays of constants, properties and methods.

        "classes": {
          "zttp": {
            "name": "Zttp",
            "abstract": false,
            "final": false,
            "extends": "",
            "implements": [],
            "uses": [],
            "usesOptions": [],
            "attributes": [],
            "phpdoc": [],
            "constants": [...],
            "properties": [...],
            "methods": [...]

The constants array is very similar to the one for global constants, except for the final and visibility properties. The latter is a string, with one of four values: private, protected, public or none.

The methods array is similar to the functions one, except for the static and visibility properties.
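Since extends, implements and uses together hold the dependencies of a class, they can be collected in a few lines. In this sketch, the zttp entry is trimmed from the extract above, while the pendingrequest entry and all the names it refers to are hypothetical. The spelling of the fully qualified names is an assumption too.

```python
classes = {
    "zttp": {"name": "Zttp", "abstract": False, "final": False,
             "extends": "", "implements": [], "uses": []},
    # Hypothetical entry, only here to show some dependencies.
    "pendingrequest": {"name": "PendingRequest", "abstract": False, "final": True,
                       "extends": "\\zttp\\baserequest",
                       "implements": ["\\countable"],
                       "uses": ["\\zttp\\macroable"]},
}

def dependencies(entry):
    """List the fully qualified names a class depends on."""
    # extends is an empty string when there is no parent class.
    parents = [entry["extends"]] if entry["extends"] else []
    return parents + entry["implements"] + entry["uses"]

for key, entry in classes.items():
    print(entry["name"], dependencies(entry))
```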

Properties

The property array is indexed by the property name. There are booleans for static and readonly; the pair typehints and typehinttype for typehints; the visibility string; the triplet init, hasDefault and expression for the initialisation value; and finally the phpdoc and attributes entries.

          "$request_type": {
            "name": "$request_type",
            "visibility": "protected",
            "init": "",
            "static": false,
            "readonly": false,
            "hasDefault": true,
            "expression": false,
            "typehinttype": "one",
            "typehints": [],
            "phpdoc": [
              {
                "phpdoc": "\\/**\n\t * Action name for the requests this table will work with.\n\t *\n\t * @since 4.9.6\n\t *\n\t * @var string $request_type Name of action.\n\t *\\/"
              }
            ],
            "attributes": []

Traits

Traits are a simpler version of classes. The uses entry lists the other traits that are used by the current one, as an array of fully qualified names.

  • traits
    • name :
      • name
      • uses
      • properties
      • methods
      • phpdoc

Interfaces

Interfaces are a simpler version of classes. The extends entry lists the other interface that is extended by the current one, as a fully qualified name.

  • interfaces
    • name :
      • name
      • extends
      • constants
      • methods
      • phpdoc

Enums

Enumerations are similar to classes, except for the cases and the typehint. The typehint is either string, int or empty. Cases are built similarly to constants.

  • enums
    • name :
      • name
      • typehint
      • constants
      • methods
      • cases

Conclusion

This quick presentation of the PDFF format introduced the organisation and the different levels of information stored there. Most of the entries come naturally from the source code, with two exceptions. First, some extra entries are needed to keep the description accurate, like typehinttype, which would be written as |, & or nothing at all in PHP code. Second, all options are always presented, while PHP code would simply skip them and keep the source uncluttered.

To take a look at this format in more detail, go to the public repository exakat/pdff. In the vcs folder, there are frameworks and libraries; in the packagist folder, there are components; and in the ext folder, there are PHP extensions. Each is detailed per version. You can download them, and use them as you like.

In the next part, we’ll review where the PDFF format can help, both for machines and humans.

Until then, keep auditing your code!