Filtering Empty Arrays Before array_merge()?

Picture this: you’re looping through something, calling a function each time. It always returns an array, and sometimes that array is empty. At the end, all these arrays will be merged together with array_merge(...$results). The first point is to avoid array_merge() inside the loop, as this is one of the classic PHP optimisations. And, since we are on the subject of performance, a question pops up: should you bother filtering empty arrays first? Let’s review all of that.

No `array_merge()` In A Loop

array_merge() merges two or more arrays into one. When the list of arrays to merge is known before the merge, it is possible to do it in one call. But when that list is only built at run time, it is natural to put the call in the loop.

<?php
$results = [];
foreach ($source as $item) {
    $results = array_merge($results, getArray($item));
}
?>

With such a structure, array_merge() is called on each cycle of the loop and, every time, it allocates room for the current array plus the new one, and then copies everything. All these allocations are eventually recovered by PHP's memory management. But, in between, the loop has been requesting more and more memory.
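To put rough numbers on that growth, here is a back-of-the-envelope sketch. It assumes every call returns the same number of elements and that each array_merge() call copies every element it receives, as described above. The two helper functions are hypothetical, and the counts are a model, not a measurement.

```php
<?php
// Model only: counts element copies, assuming each array_merge() call
// copies every element of every argument into a fresh array.

// Hypothetical helper: copies made when merging inside the loop.
function copiesInLoop(int $arrays, int $elementsEach): int
{
    $copies = 0;
    $currentSize = 0;
    for ($i = 0; $i < $arrays; $i++) {
        // Each cycle copies the accumulated result plus the new array.
        $currentSize += $elementsEach;
        $copies += $currentSize;
    }
    return $copies;
}

// Hypothetical helper: copies made by one array_merge(...$results) call.
function copiesSingleMerge(int $arrays, int $elementsEach): int
{
    return $arrays * $elementsEach;
}

$inLoop = copiesInLoop(1000, 10);      // 10 + 20 + ... + 10 000
$single = copiesSingleMerge(1000, 10); // 1000 * 10

echo "in loop: $inLoop, single merge: $single\n";
```

Under this model, 1 000 arrays of 10 elements cost about 5 million element copies inside the loop, against 10 000 for the single call.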

The trick is to collect all these intermediate arrays, and then, merge them all at once. If you remember the beginning of this section, array_merge() merges two or more arrays into one: more is the important word here.

<?php
$results = [];
foreach ($source as $item) {
    $results[] = getArray($item);
}
$merged = array_merge(...$results);
?>

Here, storing the arrays is very cheap, as only a reference to the data is stored: the whole array is not copied. Then, array_merge() makes the allocation once, based on the total count of elements in all of these arrays, and then copies the elements once. It saves memory, allocations and processing time. The performance gains are huge.

And the above code may be reduced even further, as long as $source is an array, with array_map().

<?php
$merged = array_merge(...array_map(getArray(...), $source));
?>

This fits the illustration code well, where a function is called to process each element of $source. This solution is slightly better than array_merge() inside the loop, and still much slower than outside the loop.
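One caveat with this one-liner, sketched below under the assumption that $source may carry string keys: array_map() preserves the keys of a single input array, and spreading string keys into a function call turns them into named arguments, which array_merge() rejects as unknown named parameters in PHP 8. Wrapping the result in array_values() sidesteps that. The getArray() body and the sample data are made up for illustration, and the getArray(...) syntax needs PHP 8.1 or later.

```php
<?php
// Hypothetical stand-in for the article's getArray().
function getArray(int $item): array
{
    return $item % 2 === 0 ? [] : [$item, $item * 10];
}

// Assumed sample data with string keys.
$source = ['a' => 1, 'b' => 2, 'c' => 3];

// array_values() drops the string keys, so the spread passes
// positional arguments only.
$merged = array_merge(...array_values(array_map(getArray(...), $source)));

var_dump($merged === [1, 10, 3, 30]); // bool(true)
```

An empty $source is also safe here: since PHP 7.4, array_merge() called without arguments returns an empty array.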

Of course, leave it to your static analyser to report it and suggest a short and painless update.

Optimizing the empty arrays

After array_merge() performance, a second question arises: is there any gain in skipping the empty arrays returned by getArray()? In the end, it would be less work, and we already know the result of merging any array with zero, one or any number of empty arrays: it is the original one. There are a few ways to write this, and they all look reasonable:

<?php

// Just merge everything — empties and all
$results = [];
foreach ($source as $item) {
    $results[] = getArray($item);
}
$merged = array_merge(...$results);
?>

<?php

// Check each result before collecting it
$results = [];
foreach ($source as $item) {
    $arr = getArray($item);
    if ($arr !== []) {
        $results[] = $arr;
    }
}
$merged = array_merge(...$results);
?>

<?php

// Collect everything, then filter with array_filter()
$results = [];
foreach ($source as $item) {
    $results[] = getArray($item);
}
$filtered = array_filter($results);
$merged   = array_merge(...$filtered);
?>

The third one looks clean: one idiomatic PHP call, tidy code. The second one feels disciplined, like you’re not letting garbage in. The first one feels lazy. But lazy is sometimes the smart move.
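Before comparing costs, it helps to confirm that the three variants really are interchangeable. The sketch below uses a made-up getArray() and sample data (both assumptions, not the benchmark code) and checks that all three produce the same merged array.

```php
<?php
// Hypothetical getArray(): returns an empty array for multiples of 3.
function getArray(int $item): array
{
    return $item % 3 === 0 ? [] : [$item, -$item];
}

$source = range(1, 9);

// 1. Raw merge: empties and all.
$results = [];
foreach ($source as $item) {
    $results[] = getArray($item);
}
$raw = array_merge(...$results);

// 2. Filter at collection time.
$kept = [];
foreach ($source as $item) {
    $arr = getArray($item);
    if ($arr !== []) {
        $kept[] = $arr;
    }
}
$atCollection = array_merge(...$kept);

// 3. array_filter() before the merge. It keeps the original integer
// keys (with gaps), which the spread operator accepts.
$viaFilter = array_merge(...array_filter($results));

var_dump($raw === $atCollection && $raw === $viaFilter); // bool(true)
```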

Let’s review what actually happens.

Speed

To assess the situation, we’re going to run benchmarks on PHP 8.5.6, with 2,000 iterations per case, using lists of 100 to 10 000 arrays, and ratios of empty arrays from 1 % to 50 %. The function getArray() itself is fixed, and does not matter: the only thing changing is what we do with the returned value.
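The benchmark script itself is not shown, so the harness below is only a guess at its general shape. Pre-built arrays stand in for the values returned by getArray(), timings use hrtime(), and every name and parameter (buildReturns(), bench(), the content of the sub-arrays, the reduced sizes) is an assumption. Absolute numbers will differ from the ones reported here.

```php
<?php
// Hypothetical data builder: $n sub-arrays, roughly $emptyRatio of them empty.
function buildReturns(int $n, float $emptyRatio): array
{
    mt_srand(42); // fixed seed so runs are comparable
    $returns = [];
    for ($i = 0; $i < $n; $i++) {
        $isEmpty = mt_rand() / mt_getrandmax() < $emptyRatio;
        $returns[] = $isEmpty ? [] : [$i, $i + 1, $i + 2];
    }
    return $returns;
}

// Average time per call, in microseconds.
function bench(callable $strategy, array $returns, int $iterations): float
{
    $start = hrtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $strategy($returns);
    }
    return (hrtime(true) - $start) / $iterations / 1000;
}

$strategies = [
    'no filter (raw merge)' => function (array $returns): array {
        $results = [];
        foreach ($returns as $arr) {
            $results[] = $arr;
        }
        return array_merge(...$results);
    },
    'filter at collection' => function (array $returns): array {
        $results = [];
        foreach ($returns as $arr) {
            if ($arr !== []) {
                $results[] = $arr;
            }
        }
        return array_merge(...$results);
    },
    'array_filter() before merge' => function (array $returns): array {
        $results = [];
        foreach ($returns as $arr) {
            $results[] = $arr;
        }
        return array_merge(...array_filter($results));
    },
];

$returns = buildReturns(1000, 0.10);
$timings = [];
foreach ($strategies as $label => $strategy) {
    $timings[$label] = bench($strategy, $returns, 200);
    printf("%-30s %8.1f µs/call\n", $label, $timings[$label]);
}
```

Changing the arguments of buildReturns() and the iteration count is enough to explore other sizes and ratios.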

array_filter() is the slowest of the three configurations. Not occasionally: always. At n=10 000 with about 10 % empty arrays, it’s about 25 % slower than a raw merge.

n=10 000, ~10% empty
  no filter (raw merge)          637.9 µs/call
  filter at collection           695.3 µs/call
  array_filter() before merge    797.0 µs/call

Why? array_filter() walks the whole collected list a second time. That’s a full extra traversal: a C-level loop, a truthiness check on every entry (no callback is passed, yet even the default check costs something), and a new array being built. This raises the cost of processing, whether there are two empty arrays or two thousand.

Between the other two scenarios, it depends on how many empties you’re actually dealing with.

When 50 % or more of the arrays are empty, filtering at collection time wins, modestly: around 5 to 10 % overall, and under 2 % in the n=10 000 sample below. Fewer entries mean a smaller argument list for array_merge(), which means less work.

n=10 000, ~50% empty
  no filter (raw merge)          439.8 µs/call
  filter at collection           433.0 µs/call   ← winner
  array_filter() before merge    530.4 µs/call

When less than about 30–40 % are empty, the result flips. That if ($arr !== []) runs on every single iteration, even the ones that return useful data. A comparison and a branch are both cheap, yet together they add enough overhead to cost more than what is saved by passing a shorter list.

n=10 000, ~1% empty
  no filter (raw merge)          655.6 µs/call   ← winner
  filter at collection           735.9 µs/call
  array_filter() before merge    824.5 µs/call

At more realistic ratios, with a function that only occasionally returns empty arrays (say, under 20 % of the time), a simple raw merge is the fastest. array_merge() is a tight C loop; empty sub-arrays cost it basically nothing to skip.

Memory

Memory is another aspect of performance, and it is interestingly weird here.

After the call, all three strategies use the same memory. The GC cleans up, the heap goes back to baseline, the merged result is the same size either way. Empty arrays contribute zero elements. This is quite expected.

Peak memory during execution is where they separate.

Filtering at collection has zero peak overhead. The empty arrays are never stored into $results in the first place, so the collection list stays smaller from the start. This yields a lower memory peak.

array_filter() is the worst for peak memory: again, and always. This is the part that surprises people. When you call array_filter($results), PHP builds a brand-new array, $filtered, while $results is still alive. Both exist at the same time until the code hits array_merge(). At 50 % empty and n=10 000, that’s about 330 KB of extra allocation.

Step-by-step trace, n=10 000, 50% empty:

  After collection (all arrays, with empties)  : 1 346 296 B
  After array_filter() copy                    : 1 674 032 B   ← +328 KB
  After merge (unfiltered)                     : 1 940 328 B
  After merge (filtered)                       : 1 940 328 B   ← same result

The merged result is byte-for-byte identical. The array_filter() copy was pure overhead.
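A trace like the one above can be taken by calling memory_get_usage() between steps. The sketch below is an assumed reconstruction, not the original script: the data set is made up, and the byte counts it prints depend on the PHP version and platform.

```php
<?php
// Assumed data: 10 000 sub-arrays, every other one empty.
$results = [];
for ($i = 0; $i < 10000; $i++) {
    $results[] = $i % 2 === 0 ? [] : [$i, $i + 1, $i + 2];
}
$afterCollection = memory_get_usage();

// array_filter() builds a second list while $results is still alive.
$filtered = array_filter($results);
$afterFilter = memory_get_usage();

$merged = array_merge(...$filtered);
$afterMerge = memory_get_usage();

printf("After collection     : %d B\n", $afterCollection);
printf("After array_filter() : %d B (+%d B)\n", $afterFilter, $afterFilter - $afterCollection);
printf("After merge          : %d B\n", $afterMerge);
```

The difference between the first two readings is the cost of the temporary $filtered list.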

For memory, the verdict is that the raw merge sits in the middle. array_merge() needs working space proportional to the argument list. Empty slots add a small per-entry cost, maybe 22–67 bytes each, but it’s bounded and disappears as soon as the function returns.

So Who Is Filtering Empty Arrays Before array_merge()?

Approach	CPU	Peak memory
Raw `array_merge()`	Best when < 40% empty	Moderate
Filter at collection (`!== []`)	Best when ≥ 50% empty	Best always
`array_filter()` before merge	Worst always	Worst always

Drop array_filter() entirely. It loses on both fronts. Its only appeal is that it looks readable in a code review. It’s tidy but greedy: it’s a hidden double-traversal with a temporary allocation attached.

In fact, if the source can weed out the empty arrays itself by not returning them, it is even better. Think about adding a WHERE filter in a SQL query, or skipping empty entries at decoding time. If it is not possible, just ignore this option altogether.
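As an illustration of skipping empty entries at decoding time, here is a minimal sketch. It assumes the source is a list of JSON-encoded arrays; decodeNonEmpty() and the sample input are hypothetical.

```php
<?php
// Hypothetical decoder: each line is a JSON array; empty ones are
// skipped at decoding time, so the caller never sees them.
function decodeNonEmpty(iterable $lines): Generator
{
    foreach ($lines as $line) {
        $decoded = json_decode($line, true);
        if (is_array($decoded) && $decoded !== []) {
            yield $decoded;
        }
    }
}

// Assumed sample input.
$lines = ['[1,2]', '[]', '[3]', '[]'];

// false: discard the generator's keys so the spread stays positional.
$merged = array_merge(...iterator_to_array(decodeNonEmpty($lines), false));

var_dump($merged === [1, 2, 3]); // bool(true)
```

With the check living inside the source, the collecting code can go back to a plain merge.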

Between the other two, the honest answer is: you probably don’t know your empty-array ratio in production, and measuring it is more trouble than it’s worth. In that case, filter at collection. It’s never the worst option on CPU, and it’s always the best on memory. The code also says what you mean: “I’m building a list of non-empty results.”

<?php
$results = [];
foreach ($source as $item) {
    $arr = getArray($item);
    if ($arr !== []) {
        $results[] = $arr;
    }
}
$merged = array_merge(...$results);
?>

If you have profiler data showing fewer than 5 % empties and CPU is your bottleneck, switch to the raw merge and drop the check. Otherwise, ship this one. It’s readable, it never surprises you on memory, and it does exactly one pass.

The broader takeaway: PHP is smart enough to do the right thing, and an extra userland pass rarely beats letting the engine do a bit more work in a single C-level call. When in doubt, resist the urge to pre-clean data before handing it to a built-in. When there is an opportunity, measure real-world data and adapt the code to it. Knowing the dataset is always a major advantage.
