Smooth migration from array to object in PHP

I still need a smooth migration from array to object. There are a good number of arrays that are acting like objects in my source code. I have read (here, here) and written about the advantages of replacing arrays with objects in PHP. They are significant: better performance, less memory usage, and improved readability.

Syntax change

One of the main obstacles to migration is the syntax change. PHP doesn’t like to access an array with the object syntax: it yields a warning and returns NULL. The opposite is also true, and harsher: using the array syntax on an object raises a Fatal Error.

<?php

$a = array('b' => 1, 'c' => 2);

echo $a['b'];
echo $a->c;
//Warning: Attempt to read property "c" on array

?>
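The opposite case, the array syntax applied to a plain object, can be illustrated with a short sketch. It assumes PHP 8 and uses a stdClass instance as a stand-in for any object that does not implement ArrayAccess; the try/catch is only there to display the message instead of stopping the script.

```php
<?php

// stdClass stands in for any object that does not implement ArrayAccess
$o = new stdClass();
$o->b = 1;

echo $o->b . PHP_EOL; // 1

try {
    echo $o['b'];
} catch (Error $e) {
    // When it is not caught, this Error is reported as a Fatal error
    $message = $e->getMessage();
    echo $message . PHP_EOL;
    // Cannot use object of type stdClass as array
}
```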

PHP is touted for its dynamic syntax, so there must be something in its tool belt. And there is: PHP is able to handle an object with the array syntax. You may have heard about the native ArrayObject class, which makes an object behave like an array.

<?php

$a = new ArrayObject(array('b' => 1, 'c' => 2));

echo $a['b'];
echo $a->c;
//Warning: Undefined property: ArrayObject::$c

?>

What’s missing is the interaction with properties. By default, the property syntax $object->property sets a property, and not an entry in the array, while the array syntax $object['property'] sets an entry in the array. Here, we need both syntaxes to be directed to the array, so we need a bit of extension.

<?php

$a = new dualArrayObject(array('b' => 1, 'c' => 2));

class dualArrayObject extends ArrayObject {
    
    function __get($name) {
        return $this[$name] ?? null;
    }
    
    function __set($name, $value) {
        return $this[$name] = $value;
    }
}

echo $a['b'];
echo $a->c;
echo $a->d = 3;

?>

Note that ArrayObject stores the array in the storage property, which is private (not shown in the code above). This makes interaction with this property forbidden. At the same time, the array syntax is already available with $this, so we can use it.

$this['index'] may be surprising to discover in the source. It is an old behavior from PHP 4, which was later forbidden by default. And here it is again, coming back through the window, with more explicit code. This is nice.

Use ArrayObject for migration

ArrayObject makes both array and object syntaxes available for the same data. By default, it implements the IteratorAggregate, ArrayAccess, Serializable and Countable interfaces. That makes this object usable with foreach(), the array syntax (already seen), serialize() and count().
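As an illustration of those interfaces, here is a minimal sketch with the same data as above. The variable names are only for the example.

```php
<?php

$a = new ArrayObject(array('b' => 1, 'c' => 2));

// IteratorAggregate: foreach() walks the stored array
$seen = array();
foreach ($a as $key => $value) {
    $seen[] = "$key=$value";
}
echo implode(', ', $seen) . PHP_EOL; // b=1, c=2

// Countable
echo count($a) . PHP_EOL; // 2

// ArrayAccess: isset() and unset() also work
var_dump(isset($a['b'])); // bool(true)
unset($a['b']);
echo count($a) . PHP_EOL; // 1

// Serializable: the object survives a round trip
$copy = unserialize(serialize($a));
echo $copy['c'] . PHP_EOL; // 2
```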

These are the most common usages of arrays, so that simple conversion covers a lot of use cases. The rest of the code can now move freely from one syntax to the other, which paves the way for a migration period.

Once the code has been migrated to the new syntax, this patch can be removed progressively. The ArrayObject becomes a simple object, and all previous array syntaxes are no longer valid. In case of any leftover issues, there will be an entry in the logs: at that point, they should be rare. They may even be fixed by reintroducing the migration code.

Sunsetting the array syntax

With this approach, and thanks to object-oriented programming, it is possible to add a warning for whoever is still using the old array syntax.

<?php

$a = new dualArrayObject(array('b' => 1, 'c' => 2));

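class dualArrayObject extends ArrayObject {
    // Array syntax: still works, but with a warning
    function offsetGet(mixed $offset): mixed {
        trigger_error("Avoid using array syntax, and use the object one.", E_USER_DEPRECATED);
        return parent::offsetGet($offset);
    }

    function __get($name) {
        // No trigger here, because it is the target syntax.
        // The parent:: calls bypass the offsetGet() above, and its warning
        return parent::offsetExists($name) ? parent::offsetGet($name) : null;
    }

    function __set($name, $value) {
        // Same as in the previous version: properties are stored in the array
        parent::offsetSet($name, $value);
    }
}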

echo $a['b'];
echo $a->c;
echo $a->d = 3;

?>

The E_USER_DEPRECATED error level is dedicated to these migrations. It pops up in development, and is later logged on production systems. With an explicit message, it gives anyone with editing rights the opportunity to modernize the code. Besides removing the error message, changing the code will also speed it up, so it is a great incentive.
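How these notices get collected is up to the application. One possible sketch, assuming a custom handler installed with set_error_handler() and a hypothetical $deprecations list in place of a real log, is:

```php
<?php

// Hypothetical in-memory log; a real application would rather use error_log() or its own logger
$deprecations = array();

set_error_handler(function (int $level, string $message, string $file, int $line) use (&$deprecations): bool {
    // $file and $line are those of the trigger_error() call
    $deprecations[] = "$message in $file:$line";

    return true; // handled: skip PHP's standard error handler
}, E_USER_DEPRECATED);

trigger_error('Avoid using array syntax, and use the object one.', E_USER_DEPRECATED);

restore_error_handler();

print_r($deprecations);
```

Since the reported location is the one of trigger_error(), inside the class, a call to debug_backtrace() in the handler may be needed to find the code that still uses the array syntax.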

Use ArrayAccess instead of ArrayObject

ArrayObject is convenient, though it also provides a lot of features, via the implemented interfaces. When the code is simple enough, it is recommended to implement only the needed features.

For example, Exakat makes use of the token_get_all() function, to collect all the PHP tokens from the tokenizer. The result of that function is an array of arrays or strings. The main array is the ordered list of tokens, while each entry is an array describing the token. Sometimes, it is a string.

<?php
$tokens = token_get_all('<?php echo; ?>');

foreach ($tokens as $token) {
    print_r($token);
}

/*
Array
(
    [0] => 394
    [1] => <?php 
    [2] => 1
)
Array
(
    [0] => 328
    [1] => echo
    [2] => 1
)
Array
(
    [0] => 397
    [1] =>  
    [2] => 1
)
more tokens ... 
*/

?>

These tokens are used as a Value Object. There is no other fancy operation on them than accessing the index 0, 1 or 2. So, ArrayAccess is sufficient here. Here is a simplified version of that object:

<?php

class Token implements ArrayAccess {
    public int $token;
    public string $code;
    public int $line;
    
    private const OFFSETS = array(
        0 => 'token',
        1 => 'code',
        2 => 'line',
    );
    
    function __construct($token, $code, $line) {
        $this->token     = $token;
        $this->code      = $code;
        $this->line      = $line;
    }

    public function offsetExists(mixed $offset): bool {
        return isset(self::OFFSETS[$offset]);
    }
    
    public function offsetGet(mixed $offset): mixed {
        if (!isset(self::OFFSETS[$offset])) {
            debug_print_backtrace();
            die('No such offset as '.$offset);
        }
        $property = self::OFFSETS[$offset];

        return $this->$property;
    }

    public function offsetSet(mixed $offset, mixed $value): void {
        die(__METHOD__);
    }
    
    public function offsetUnset(mixed $offset): void {
        die(__METHOD__);
    }
}
?>

It includes a constant to convert the offsets into properties. This will be removed later, when the array syntax is not used anymore.

Two of the methods of ArrayAccess are unused, so they are implemented with a die(). offsetUnset and offsetSet are never called, as Exakat only reads the information about the tokens, and neither assigns nor changes them. If die is too harsh for your coding style, you may also trigger or log such usage for later processing.

Sometimes, it is worth keeping these methods implemented: they might unearth special and rare usages that really needed a refactor. It is a good probing system.
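As a sketch of that softer alternative, the unused methods can throw an exception instead of stopping the script. The class name ReadOnlyToken is hypothetical and the class is trimmed down to a single property to keep the example short; LogicException and OutOfBoundsException are only one possible choice.

```php
<?php

// Hypothetical, trimmed-down variant of the Token class above
class ReadOnlyToken implements ArrayAccess {
    private const OFFSETS = array(0 => 'token');

    public function __construct(public int $token) {}

    public function offsetExists(mixed $offset): bool {
        return isset(self::OFFSETS[$offset]);
    }

    public function offsetGet(mixed $offset): mixed {
        if (!isset(self::OFFSETS[$offset])) {
            throw new OutOfBoundsException('No such offset as ' . $offset);
        }
        $property = self::OFFSETS[$offset];

        return $this->$property;
    }

    public function offsetSet(mixed $offset, mixed $value): void {
        // The exception carries a stack trace back to the offending code
        throw new LogicException(__METHOD__ . ' is not supported: tokens are read-only');
    }

    public function offsetUnset(mixed $offset): void {
        throw new LogicException(__METHOD__ . ' is not supported: tokens are read-only');
    }
}

$token = new ReadOnlyToken(328);
echo $token[0] . PHP_EOL; // 328

try {
    $token[0] = 1;
} catch (LogicException $e) {
    echo $e->getMessage() . PHP_EOL;
    // ReadOnlyToken::offsetSet is not supported: tokens are read-only
}
```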

Other common pitfalls

We have just shown that some of the array features don’t have to be ported to the object. This is an optimization for the privileged that have knowledge of the code.

Besides the simple change of $array['index'] to $object->property, there are some other side effects that are worth mentioning.

ArrayObject is not array-type compatible

Anything that was typed with array must now be updated. It should be array|MyNewObject, at least. This union type requires PHP 8.0.

<?php

function foo(array|MyNewObject $array) {
   return $array[0];
}
?>

Of course, it is always possible to drop the typing during the migration, but it’s a lot of work to bring it back again later.

On the other hand, it is possible to replace array with iterable, when the object is only traversed with foreach(). iterable is equivalent to array|Traversable, so when the new class implements that interface, it is safe to replace array with iterable.
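A short sketch of the iterable case follows. The total() function is a hypothetical example: it only loops over its argument, so it accepts the old array and the new object alike.

```php
<?php

// Only foreach() is applied to the argument, so iterable is enough
function total(iterable $values): int {
    $sum = 0;
    foreach ($values as $value) {
        $sum += $value;
    }

    return $sum;
}

echo total(array(1, 2, 3)) . PHP_EOL;                  // 6
echo total(new ArrayObject(array(1, 2, 3))) . PHP_EOL; // 6
```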

Type checks with is_array() are to be upgraded

Besides the types, consider also that checks with is_array() are show stoppers in the code, and that an (array) cast might break the migration to objects. The first may be replaced with is_iterable(), or with is_array($x) || $x instanceof myNewObject, and the second needs a rewrite.
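Here is how those checks behave, as a minimal sketch. The empty myNewObject class and the isListLike() helper are hypothetical, and only serve the illustration.

```php
<?php

// Hypothetical migration target
class myNewObject extends ArrayObject {}

// Hypothetical helper: accepts both the old array and the new object
function isListLike(mixed $x): bool {
    return is_array($x) || $x instanceof myNewObject;
}

$old = array('b' => 1);
$new = new myNewObject($old);

var_dump(is_array($new));    // bool(false): the check that breaks
var_dump(is_iterable($new)); // bool(true)
var_dump(isListLike($old));  // bool(true)
var_dump(isListLike($new));  // bool(true)
```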

Array functions need a detour

Lastly, array functions are not usable anymore, at least directly.

With ArrayObject, some functions are still usable, such as the *sort() family: they have been ported as methods of the ArrayObject class. Just don’t look for the sort() and rsort() methods themselves, they don’t exist (apparently, they have too much impact on the indexes). But the others do: asort(), uksort(), ksort(), natcasesort(), etc.

<?php

$ao = new ArrayObject(['a', 'z', 4=> 'f']);
$ao[] = 'd';
$ao->asort();
print_r($ao);

?>

Calling the sort() function directly on the object displays an error: Fatal error: Uncaught TypeError: sort(): Argument #1 ($array) must be of type array, ArrayObject given

Fetch the array for any array function calls

On the other hand, array_keys() or array_column() won’t work anymore, at least directly. There are workarounds.

It is possible to fetch the array version of the new object with functions like iterator_to_array(), the ArrayObject::getArrayCopy() method, or the (array) cast operator. Basically, they convert the object back to an array.

<?php

$ao = new ArrayObject(['a', 'b', 4=> 'c']);
$ao[] = 'd';
print_r(array_keys((array) $ao));

/*
Array
(
    [0] => 0
    [1] => 1
    [2] => 4
    [3] => 5
)
*/

print_r(array_values(iterator_to_array($ao)));
/*
Array
(
    [0] => a
    [1] => b
    [2] => c
    [3] => d
)
*/

print_r(implode('.', $ao->getArrayCopy()));
//a.b.c.d

?>

Internalize the array functions

While migrating to an OOP syntax, if any of the array functions is missing for your code, you should consider making it an extra method.

With an ArrayObject extension, you’ll have to fetch the array from the parent class with a call to ArrayObject::getArrayCopy, as the storage is a private property. With a custom object, that access might be easier as you control the visibility.

<?php

class myNewObject extends ArrayObject {
   function array_column(string $column) : array {
        return array_column($this->getArrayCopy(), $column);
   }
}


?>
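With a custom object, the same idea might look like the following sketch. The RowList class, its private $rows property and the column() method are hypothetical names chosen for the example.

```php
<?php

// Hypothetical custom collection: the array is a regular private property
class RowList implements IteratorAggregate, Countable {
    public function __construct(private array $rows) {}

    public function getIterator(): Iterator {
        return new ArrayIterator($this->rows);
    }

    public function count(): int {
        return count($this->rows);
    }

    // Direct access to the storage, without going through getArrayCopy()
    public function column(string $column): array {
        return array_column($this->rows, $column);
    }
}

$rows = new RowList(array(
    array('id' => 1, 'name' => 'a'),
    array('id' => 2, 'name' => 'b'),
));

echo implode(',', $rows->column('name')) . PHP_EOL; // a,b
echo count($rows) . PHP_EOL;                        // 2
```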

Migration at different scales

PHP’s flexible syntax allows for using an object with the array syntax. With the help of the ArrayObject class and several interfaces such as ArrayAccess or Traversable, it is possible to migrate smoothly from using an array to a new object.

This approach is overkill when the migration can be run as a one-time refactoring. For example, when you have control over the code from beginning to end, using PHP’s dynamic syntax as a migration tool will only lengthen the rewrite.

On the other hand, several situations take advantage of this approach, in particular when the refactoring is very large and is getting too risky to do in a single patch. Replacing the array with a compatible object keeps the rest running, and helps spot incompatibilities.

It is also a good approach for backward compatibility. The extra array layer is slower, which makes a good incentive to migrate to the new syntax, while providing support for untouched code. And the new object code is a good place to add E_USER_DEPRECATED warnings to signal the evolution to the unsuspecting.

Smooth migration from array to object

Object representation has gained in speed and efficiency in the recent PHP versions, and it is making the code more readable than arrays. There are opportunities to modernize one’s code.

Not all arrays are meant to be turned into objects. The ones that are the most interesting are arrays of arrays, just like with token_get_all() or preg_match(). Many PHP native functions still produce arrays, and it would be nice to have more functions like mysqli_fetch_object, which directly produces a custom object from a database call.

In the meantime, forcing the conversion of long arrays to objects is an operation that costs processing time. While the memory gain is real, the initial transformation has to be measured to ensure it provides a good return on investment. In the long run, it always does.
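To check that return on investment on your own data, a rough measurement might look like the sketch below. The MeasuredToken class and the 100,000 rows are assumptions made up for the example; the figures depend on the PHP version and on the shape of the real data, so none are shown here.

```php
<?php

// Hypothetical value object, used for the measurement only
final class MeasuredToken {
    public function __construct(
        public int $token,
        public string $code,
        public int $line,
    ) {}
}

// Build the array version first, and record its memory footprint
$memory = memory_get_usage();
$rows = array();
for ($i = 0; $i < 100000; ++$i) {
    $rows[] = array($i, 'echo', 1);
}
$arraysMemory = memory_get_usage() - $memory;

// Then convert each row to an object, timing the transformation
$memory = memory_get_usage();
$start = hrtime(true);
$objects = array();
foreach ($rows as $row) {
    $objects[] = new MeasuredToken(...$row);
}
$elapsedMs = (hrtime(true) - $start) / 1e6;
$objectsMemory = memory_get_usage() - $memory;

printf("arrays : %d bytes\n", $arraysMemory);
printf("objects: %d bytes, converted in %.2f ms\n", $objectsMemory, $elapsedMs);
```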