Speeding up array_merge()

While doing a crowd review of code for a Battleship game at @afup_rennes (in French), it became clear that the ‘no array_merge() in loops’ rule was well known but not well understood. Why should this function in particular be avoided in loops? Hence this article, a journey through memory management, coding and classic PHP features. Here we go: speeding up array_merge().

The array_merge() speed bump

Every one of us has written a piece of PHP code similar to this one: $array_of_arrays is a multi-dimensional array of arrays that needs to be merged all together. With an incoming array of arbitrary size, a loop seems like a good idea and, more importantly, it works as expected.

<?php

    $flat = array();
    foreach($array_of_arrays as $a) {
        // A new array is allocated and filled on every iteration
        $flat = array_merge($flat, $a);
    }
?>


Now, when you search knowledge bases, you’ll find that array_merge() is actually mentioned in many places in relation to slow performance and memory consumption. And indeed, the cost is visible with the code above, although not for small arrays.

Copying in memory

The loop above merges two arrays on every iteration. On the first iteration, say PHP has to merge arrays of 3 and 4 elements: it allocates a new array of 3 + 4 = 7 slots, copies the elements, and moves on.

On the second iteration, we have an array of 7 (the previous result) and a new array of 2 (from the loop variable $a). PHP allocates 7 + 2 = 9 slots and makes the copy. Note that the initial values have now been copied twice.

On the third iteration, we have an array of 9 (the previous result) and a new array of, say, 5 (from the loop variable). PHP allocates 9 + 5 = 14 slots and copies everything once more.

This process repeats on every iteration, since PHP merges piecemeal: each time, it merges the temporary array held in $flat with a new one, and each time, it needs to allocate memory for the result.

To be fair, this was exactly the process in PHP 7.2 and earlier, before memory management was overhauled. It is probably better optimized now, as we’ll see later. Still, this explains the situation.

In the end, the poor performance comes from PHP copying the same values multiple times, from one temporary structure to the next. The first arrays get copied as many times as there are sub-arrays in the source array (minus one), so the total amount of copying grows roughly with the square of the number of arrays.
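To put numbers on that reasoning, here is a small counting sketch. It uses a simplified model, assuming that every array_merge() call copies all elements of both inputs into a fresh array (it counts the first merge into the empty $flat as well); it does not measure the real PHP engine. The function names copies_in_loop() and copies_in_one_call() are hypothetical.

```php
<?php
// Simplified model: each array_merge($flat, $a) copies every element of
// the resulting array. This counts copies, it does not time the engine.
function copies_in_loop(array $sizes): int
{
    $copies = 0;
    $flatSize = 0;
    foreach ($sizes as $size) {
        $flatSize += $size;    // the new array holds the old $flat plus $a
        $copies += $flatSize;  // every element of the new array is copied
    }
    return $copies;
}

// Same model for a single variadic call: each element is copied once.
function copies_in_one_call(array $sizes): int
{
    return array_sum($sizes);
}

$sizes = [3, 4, 2, 5];
echo copies_in_loop($sizes), "\n";     // 3 + 7 + 9 + 14 = 33
echo copies_in_one_call($sizes), "\n"; // 14
```

With 100 arrays of 10 elements, the same model gives 50,500 copies for the loop against 1,000 for the single call, which is where the quadratic growth shows.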

Second, memory has to be allocated each time a new array is created. The intermediate arrays are freed later, but that is still a lot of work for PHP, all of it eventually thrown away, since we only care about the final result.

array_merge() and the arbitrary number of arguments

How can we reduce PHP’s workload? By taking advantage of the fact that array_merge() accepts an arbitrary number of arguments: one may pass as many arrays as needed in a single call to array_merge().

In that case, PHP does the final allocation once, by measuring the size of each array and reserving the right amount of memory. Then it copies the elements, and this time each element is copied only once.
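As a quick sanity check that the refactoring is safe, this sketch merges the same sub-arrays piecemeal and in one variadic call, then compares the results. The sample data is made up for illustration.

```php
<?php
$array_of_arrays = [[1, 2, 3], [4, 5], [6]];

// Piecemeal version: one array_merge() call per sub-array.
$loop = [];
foreach ($array_of_arrays as $a) {
    $loop = array_merge($loop, $a);
}

// Single call: array_merge() receives all three arrays at once.
$once = array_merge([1, 2, 3], [4, 5], [6]);

var_dump($loop === $once); // bool(true)
```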

The last obstacle is to pass an arbitrary number of arguments to array_merge(). There are several solutions:

  • Using the spread operator: et voila!
<?php
    $flat = array_merge(...$array_of_arrays);
?>

The spread operator turns the array of arrays into a long list of arguments and passes them to array_merge(), which processes them in a single call. I suspect PHP does not even ‘spread’ them but processes the array directly, though I am not familiar enough with the internals to be sure.

  • Using call_user_func_array(): this syntax works when you run an older PHP version in which the ... operator is not available (argument unpacking was added in PHP 5.6).
<?php
    $flat = call_user_func_array('array_merge', $array_of_arrays);
?>
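Two edge cases are worth guarding against with the spread version; here is a hedged sketch, where the helper name flatten() is hypothetical. Since PHP 7.4, array_merge() may be called without any argument and returns an empty array, while older versions emit a warning and return null, so an empty $array_of_arrays needs a fallback there. Also, as far as I know, spreading an array with string keys into array_merge() throws an Error: before PHP 8.0 string keys could not be unpacked at all, and since PHP 8.0 they become named arguments, which array_merge() rejects. Passing the sub-arrays through array_values() avoids both problems.

```php
<?php
// Hypothetical helper wrapping the spread version of array_merge().
function flatten(array $array_of_arrays): array
{
    if ($array_of_arrays === []) {
        // Before PHP 7.4, array_merge() with no argument returned null with a warning.
        return [];
    }
    // array_values() drops the outer string keys, which cannot be spread into array_merge().
    return array_merge(...array_values($array_of_arrays));
}

print_r(flatten(['first' => [1, 2], 'second' => [3]])); // [1, 2, 3]
print_r(flatten([]));                                     // []
```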

Real coding

Now, I have to admit that the code above is a bit of a cheat, since the incoming array is already in perfect shape for array_merge(). So, if you have to collect the arrays first, the tip is still to build that structure. The loop stays in your code:

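<?php

    // source_of_array() stands for any data source returning arrays, then a falsy value when exhausted
    $tmp = array();
    while($array = source_of_array()) {
        $tmp[] = $array;            // only append; nothing is merged yet
    }
    $flat = array_merge(...$tmp);   // one merge at the end
?>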

I turned the loop into an open-ended while loop, which runs until the source is exhausted. The extraction method is a function call here, though you may replace it with anything else. The important thing is to collect the intermediate arrays in $tmp and, finally, flatten them all in one call.
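For a runnable version of the collect-then-merge pattern, here is a sketch in which a generator stands in for source_of_array(). The pages() generator and its contents are hypothetical; in real code it could be database pages, API responses or files.

```php
<?php
// Hypothetical source standing in for source_of_array():
// a generator yielding one array at a time.
function pages(): Generator
{
    yield [1, 2, 3];
    yield [4, 5];
    yield [6];
}

$tmp = [];
foreach (pages() as $page) {
    $tmp[] = $page;   // append only; earlier pages are not copied again
}

// One merge at the end; the guard covers an empty source on PHP < 7.4.
$flat = $tmp === [] ? [] : array_merge(...$tmp);

print_r($flat); // [1, 2, 3, 4, 5, 6]
```

When the source is already an iterator, iterator_to_array(pages(), false) collects it into a list in one call, so array_merge(...iterator_to_array(pages(), false)) does the whole job.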

Speed measurements

This speed trick is simple enough to test with a short script. I suggest you give it a try.
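Here is one possible starting point for such a script. It is a minimal sketch: the number and size of the sub-arrays (500 arrays of 50 elements) and the run count are arbitrary assumptions, and hrtime() requires PHP 7.3 or later. It prints timings only; it does not reproduce the figures below.

```php
<?php
// Benchmark sketch: input sizes are arbitrary assumptions, tune them to your data.
function make_input(int $count, int $size): array
{
    $arrays = [];
    for ($i = 0; $i < $count; $i++) {
        $arrays[] = range(1, $size);
    }
    return $arrays;
}

// The pattern to avoid: one array_merge() call per sub-array.
function merge_in_loop(array $array_of_arrays): array
{
    $flat = [];
    foreach ($array_of_arrays as $a) {
        $flat = array_merge($flat, $a);
    }
    return $flat;
}

// The refactored pattern: a single variadic call.
function merge_once(array $array_of_arrays): array
{
    return $array_of_arrays === [] ? [] : array_merge(...array_values($array_of_arrays));
}

// Average wall-clock time per run, in milliseconds.
function time_it(callable $fn, array $input, int $runs = 10): float
{
    $start = hrtime(true); // nanoseconds
    for ($i = 0; $i < $runs; $i++) {
        $fn($input);
    }
    return (hrtime(true) - $start) / 1e6 / $runs;
}

$input = make_input(500, 50);
printf("loop: %.3f ms\n", time_it('merge_in_loop', $input));
printf("once: %.3f ms\n", time_it('merge_once', $input));
```

Varying the two make_input() arguments shows how the gap changes with the number and the size of the arrays.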

The gains vary with the size and the number of the arrays inside the initial multi-dimensional array.

On PHP 8.1, merging 2 arrays of 1 element each yields a 12% speed increase. Strangely enough, for that number of arrays, my local tests show the gain degrading as the arrays grow.

Then, the gain progressively increases until it reaches a plateau of 90% (with many long arrays). So, as expected, the trick pays off more as the source grows.

The same pattern appears across PHP versions. In particular, on PHP 7.3 and earlier, the gain in favor of a single array_merge() call grows even faster.

Finally, the collection step does shave off a bit of performance. If the data has to be collected into an array first, the trick may be slower when merging only one or two small arrays. However, it pays off as soon as there are 3 or more arrays.

Arbitrary number of arguments to functions

Some other PHP functions accept an arbitrary number of values, either as arguments or as a single array, and they may be open to the same speed improvement as array_merge().

implode()

This is the case for implode(), which you could think of as a hypothetical str_merge() function: it merges multiple strings into one. So, basically, turning a loop of concatenations into a single call to implode() should speed up your text building.

This was true in PHP 7.3 and earlier; nowadays, it takes huge numbers of strings for it to actually be profitable. Upgrading your PHP version is a good move there.
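The two approaches to text building compare like this; a short sketch with made-up sample strings, showing that the piecemeal concatenation and the single implode() call produce the same string.

```php
<?php
$parts = ['alpha', 'beta', 'gamma'];

// Piecemeal: one concatenation per iteration.
$text = '';
foreach ($parts as $i => $part) {
    $text .= ($i > 0 ? ', ' : '') . $part;
}

// Batched: a single call builds the whole string.
$batched = implode(', ', $parts);

var_dump($text === $batched); // bool(true)
echo $batched, "\n";          // alpha, beta, gamma
```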

array_sum()

array_sum() looks like a candidate too, but this time no memory is copied at addition (or subtraction) time: the numbers are combined, not copied. So batching will not bring any benefit in terms of speed.
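The contrast with array_merge() is visible in a sketch like this one: in the loop, each step only updates one number, so there is no growing structure to copy again and again. The sample numbers are arbitrary.

```php
<?php
$numbers = [3, 4, 2, 5];

// Loop: each += updates a single scalar; nothing accumulated is re-copied.
$total = 0;
foreach ($numbers as $n) {
    $total += $n;
}

// Single call: same result, without the repeated copying to save.
$sum = array_sum($numbers);

var_dump($total === $sum); // bool(true)
```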

Arbitrary number of elements in custom methods

This strategy is available to any code, not only to PHP’s native functions. You may spot some of your methods that are called repeatedly inside loops: why not change them to accept an array, or an arbitrary number of arguments, and move the loop inside?
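Here is what such a refactoring might look like, as a sketch with a hypothetical TagSet class: instead of being called once per tag inside a loop, addTags() takes a variadic parameter, so callers can hand over a whole batch and the method merges it in one array_merge() call. Typed properties require PHP 7.4 or later.

```php
<?php
// Hypothetical class: addTags() used to be called once per tag in a loop.
final class TagSet
{
    /** @var string[] */
    private array $tags = [];

    // Variadic parameter: one tag or a whole batch in a single call.
    public function addTags(string ...$tags): void
    {
        // One merge per batch instead of one per tag.
        $this->tags = array_merge($this->tags, $tags);
    }

    public function all(): array
    {
        return $this->tags;
    }
}

$set = new TagSet();
$set->addTags('php');                         // a single value still works
$set->addTags(...['array', 'merge', 'loop']); // a whole batch in one call
print_r($set->all());
```

Keeping a variadic signature means existing single-value calls keep working, while new code can pass batches with the spread operator.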

The main consideration here is that processing the load as one batch should provide some kind of gain. For example, batching insertions into a database is a good idea, as it makes a single call to the database with multiple rows. Or, turning repeated array appends into a single array_merge() may bring some speedup. On the other hand, batching calls to array_sum() would not make a critical difference.
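The database case can be sketched as a helper that builds one multi-row INSERT with placeholders, instead of one INSERT per row. The function build_batch_insert(), the scores table and its columns are hypothetical, and table and column names are assumed to be trusted since they are not escaped. Note that the row values are flattened with a single array_merge() call, the same trick as above.

```php
<?php
// Hypothetical helper: one multi-row INSERT instead of one INSERT per row.
// Table and column names are assumed trusted; they are not escaped here.
function build_batch_insert(string $table, array $columns, array $rows): array
{
    if ($rows === []) {
        throw new InvalidArgumentException('At least one row is required.');
    }
    $placeholders = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';
    $sql = sprintf(
        'INSERT INTO %s (%s) VALUES %s',
        $table,
        implode(', ', $columns),
        implode(', ', array_fill(0, count($rows), $placeholders))
    );
    // Flatten all row values in one array_merge() call.
    $params = array_merge(...array_values(array_map('array_values', $rows)));
    return [$sql, $params];
}

[$sql, $params] = build_batch_insert('scores', ['player', 'points'], [
    ['alice', 10],
    ['bob', 7],
]);
echo $sql, "\n"; // INSERT INTO scores (player, points) VALUES (?, ?), (?, ?)
print_r($params);
// With PDO, this would then be sent as: $pdo->prepare($sql)->execute($params);
```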

Conclusion

This should explain why array_merge() belongs outside loops. It is difficult for PHP to anticipate how to optimize this call, although the performance of array_merge() in loops has improved impressively since PHP 7.0, so there is a lot going on behind the scenes, for our own good.

In the meantime, I suggest that you pull those array_merge() calls out of your loops and refactor them to reduce resource consumption.