PHP likes sorting

PHP likes sorting a lot

PHP likes to sort. Of course, there is sort(), ksort() and all the cousins. But PHP actually sorts too much. My first encounter with the problem was the infamous array_unique(). It turns out that it also affects glob() and scandir(). I’m looking for others. Until then, check your code.

array_unique() is also sorting

array_unique() collects distinct values from an array. However, its performance degrades quite quickly with the size of the array. This is quite strange: with 100 elements, array_unique() is 20 times slower than the array_keys()/array_count_values() combination, and with 1000 elements, it is actually 130 times slower. From the manual, one may realize that array_unique() does some sorting. The 2nd argument is indeed an option to change the kind of sorting used by array_unique() (SORT_STRING by default).

<?php print_r(array_unique([2,3,1,2,3]));?>

Array
(
    [0] => 2
    [1] => 3
    [2] => 1
)

The irony is that the resulting array is never sorted in any way.
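
The comparison mentioned above can be reproduced with a small sketch like the one below. The sample data and variable names are made up for the illustration, and the timings it prints will vary with the PHP version and the data, so treat it as a starting point rather than a benchmark. Note that array_count_values() only accepts integers and strings, so this replacement is not valid for every array.

```php
<?php
// Hypothetical sample data: 1000 integers drawn from 50 distinct values.
$values = [];
for ($i = 0; $i < 1000; $i++) {
    $values[] = $i % 50;
}

// array_unique() keeps the first key of each distinct value,
// so array_values() is used here to reindex the result.
$start = microtime(true);
$viaUnique = array_values(array_unique($values));
$timeUnique = microtime(true) - $start;

// Alternative: count the occurrences, then keep only the keys.
// Only valid when the values are integers or strings.
$start = microtime(true);
$viaCount = array_keys(array_count_values($values));
$timeCount = microtime(true) - $start;

// Both approaches yield the same distinct values, in order of first appearance.
var_dump($viaUnique === $viaCount);

printf(
    "array_unique: %.6fs, array_keys/array_count_values: %.6fs\n",
    $timeUnique,
    $timeCount
);
```

For a single run like this one, the measured times are noisy; repeating each call in a loop gives more stable numbers.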

Glob() and scandir() are sorting

Other functions that sort too much are glob() and scandir(). glob() is a call to the system's glob() function (sick, isn’t it?). It’s a convenient function that allows wildcard listing of files. By default, the listed files are sorted, but it accepts a GLOB_NOSORT flag that prevents the sort. The impact on execution time is lower than the one from array_unique().

Listing 28k files :
glob() with default values : 16s
glob() with NOSORT         : 12s
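
As an illustration of the flag, here is a minimal sketch that lists the same files with and without GLOB_NOSORT. The scratch directory and file names are hypothetical and only created for the demo; the order returned with GLOB_NOSORT depends on the file system, so the code only checks that both calls return the same set of files.

```php
<?php
// Hypothetical scratch directory with a few files, created only for this demo.
$dir = sys_get_temp_dir() . '/glob_demo_' . uniqid();
mkdir($dir);
foreach (['c.txt', 'a.txt', 'b.txt'] as $name) {
    touch($dir . '/' . $name);
}

// Default behavior: the matches come back sorted alphabetically.
$sorted = glob($dir . '/*.txt');

// GLOB_NOSORT: the matches come back in directory order, whatever that is.
$unsorted = glob($dir . '/*.txt', GLOB_NOSORT);

// Same files either way; only the order may differ.
$check = $unsorted;
sort($check);
var_dump($check === $sorted);

// Clean up the demo files.
array_map('unlink', $sorted);
rmdir($dir);
```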

The alternative to glob() is scandir(). scandir() also lists files, though it doesn’t handle wildcards. Just like glob(), scandir() sorts by default. It is always possible to pass SCANDIR_SORT_NONE as the second argument, which skips the sort.
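
A minimal sketch of an unsorted scandir() call follows. Since scandir() has no wildcard support and also returns '.' and '..', the example filters the entries by hand with fnmatch(); that filtering step is an assumption added for the illustration, not something the measurements above rely on. The directory and file names are hypothetical.

```php
<?php
// Hypothetical scratch directory, created only for this demo.
$dir = sys_get_temp_dir() . '/scandir_demo_' . uniqid();
$names = ['c.txt', 'a.log', 'b.txt'];
mkdir($dir);
foreach ($names as $name) {
    touch($dir . '/' . $name);
}

// SCANDIR_SORT_NONE skips the default ascending sort.
$entries = scandir($dir, SCANDIR_SORT_NONE);

// Keep only the entries matching a shell-style pattern.
// This also drops '.' and '..', which scandir() always returns.
$txtFiles = array_values(array_filter($entries, function ($entry) {
    return fnmatch('*.txt', $entry);
}));

print_r($txtFiles);

// Clean up the demo files.
foreach ($names as $name) {
    unlink($dir . '/' . $name);
}
rmdir($dir);
```

The order of $txtFiles is whatever the file system returned, so sort it explicitly if the rest of the code depends on it.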

How to speed up your code ?

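As often stated, use functions that do only what they are supposed to do, and nothing more. Unless sorting is a feature your code needs, you may gain performance by simply using the following:

array_unique($array) : array_keys(array_count_values($array)), for integers and strings
glob($pattern)       : glob($pattern, GLOB_NOSORT)
scandir($dir)        : scandir($dir, SCANDIR_SORT_NONE)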

Check your code automatically

All three are reported by exakat in its ‘Performances’ section, among others. Simply run the default analysis and spot potential performance gains. array_unique() is quite common, with roughly 1 project out of 3 (36%) using it at some point in its source code. glob() and scandir() are scarcer, especially when used on large folders: 10%.

And remember, read the docs once in a while, just to stay up to date. Or use a static analyzer that reads the docs often.