String to number comparison with PHP 8.x

@FredBouchery reported one tiny update from PHP 8.0 : some of the comparisons have been modified between PHP 7 and PHP 8. This is pretty old news, since PHP 8.0 is not even the latest and greatest version of PHP, and it is also quite benign :

<?php
var_dump(0 == 'a'); // bool(true) in PHP 7.x, bool(false) in PHP 8.x
?>

For fun, Fred made it into a meme about non-strict equality.

String to number comparison

The origin of this backward-incompatible change is in the changelog. The way PHP converts values when comparing booleans, integers and strings has changed.

Comparison       Before (PHP 7.x)   After (PHP 8.x)
0 == "0"         true               true
0 == "0.0"       true               true
0 == "foo"       true               false
0 == ""          true               false
42 == " 42"      true               true
42 == "42foo"    true               false

One of the related advantages is that 0 == "not-a-number" is now false, instead of true. This prevents some security issues, for example where an integer 0 could match a non-numeric hash or token string by accident.

The original post hints at using only the strict comparison operators, such as === and !==, which check the type of the compared data before actually doing the comparison. This prevents PHP from activating its type-juggling routines, and avoids confusion. This is definitely a best practice for PHP.
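
As a quick illustration, here is a minimal sketch of the opening comparison rewritten with the strict operator. The variable names are only there for the example; the results are the same in PHP 7.x and PHP 8.x, because no conversion takes place.

```php
<?php

// Strict comparison never converts its operands:
// the types must match before the values are compared.
$zeroVsString  = (0 === 'a'); // int vs string: false
$zeroVsNumeric = (0 === '0'); // still int vs string: false
$zeroVsZero    = (0 === 0);   // same type, same value: true

var_dump($zeroVsString, $zeroVsNumeric, $zeroVsZero);
```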

The juggling is in the details

Using the strict comparison is not always obvious in PHP code. Type juggling is endemic in PHP, and it ends up in situations that are easy to understand and even easier to forget.

This means that there are backward incompatibilities between PHP 7 and PHP 8 :

  • with the inequalities, such as >, < or <=
  • with in_array()
  • with switch() {}

And, as a bonus, we’ll also take a quizzical look at

  • switch() and its cases ordering

Here we go!

Inequalities

Let’s remind ourselves of the context : how do PHP 7 and PHP 8 differ when comparing strings and numbers?

<?php

                      // PHP 8.x  PHP 7.x
var_dump(0 < 'a');    //  true    false
var_dump(0 > 'a');    //  false   false

var_dump(1 < 'a');    //  true    false
var_dump(1 > 'a');    //  false   true

var_dump(-1 < 'a');   //  true    true
var_dump(-1 > 'a');   //  false   false

?>

As you can see, in PHP 8.x, every integer in this example is smaller than the string ‘a’. When the string is not numeric, PHP 8.x converts the integer to a string and compares the two as strings : digits and the minus sign sort before letters, so any integer comes out smaller than ‘a’.

In fact, even PHP_INT_MAX is smaller than ‘a’ with PHP 8.0 :

<?php

                               // PHP 8.x  PHP 7.x
var_dump(PHP_INT_MAX < 'a');   //  true    false

?>

On the other hand, PHP 7.x used to convert the string to an integer, of value 0, and then do the comparison.
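
To make the difference concrete, the sketch below spells out both behaviors with explicit conversions, so it gives the same result on either version. The two function names are made up for this illustration, and they only cover an integer compared with a non-numeric string.

```php
<?php

// Hypothetical helpers, for illustration only: they handle the case of an
// integer compared with a NON-numeric string, and nothing else.

// PHP 7.x: the string is converted to an integer ('a' becomes 0).
function php7StyleIsSmaller(int $number, string $text): bool
{
    return $number < (int) $text;
}

// PHP 8.x: the integer is converted to a string, then both values
// are compared as strings.
function php8StyleIsSmaller(int $number, string $text): bool
{
    return strcmp((string) $number, $text) < 0;
}

var_dump(php7StyleIsSmaller(1, 'a')); // bool(false), since 1 < 0 is false
var_dump(php8StyleIsSmaller(1, 'a')); // bool(true), since "1" sorts before "a"
```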

The safe way with integer inequality

There are two ways to keep your code intact while migrating from PHP 7.x to PHP 8.x. Either stick to the PHP 7.x behavior, or adopt the PHP 8.x one. Just pick your fight : against the future coders or against the legacy code.

The legacy option is to force the juggling before the comparison, by adding the (int) cast. This will turn the string into an integer, and emulate PHP 7.x behavior. This will also baffle any newcomer to your project : there is now a compelling need to cast everything when using inequalities. There will also be smart auditors, who will point out that those casts are useless, of course.

<?php

if ((int) $a > (int) $b) {
    // doSomething
}

?>

The future option is to check the types before the inequality, so as to branch toward the right processing. This is pretty inconvenient (though you can ping me with a better solution anytime).

<?php

if (is_string($a)) {
    if (is_string($b)) {
        // $a > $b is an ASCII comparison
    } elseif (is_int($b)) {
        // $a is bigger than $b
    } else {
        trigger_error('Wrong types : $b must be integer or string');
    }
} elseif (is_int($a)) {
    if (is_string($b)) {
        // $a is smaller than $b
    } elseif (is_int($b)) {
        // $a > $b is an integer comparison
    } else {
        trigger_error('Wrong types : $b must be integer or string');
    }
} else {
    trigger_error('Wrong types : $a must be integer or string');
}

?>
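
One possible way to shorten this branching is to wrap it in a single function. The sketch below is only a suggestion: isGreater() is a hypothetical name, and it assumes that mixed pairs (an integer on one side, a string on the other) can simply be rejected rather than ordered.

```php
<?php

// Hypothetical helper: compares two values only when they share the same
// type, and refuses anything else instead of letting PHP juggle.
function isGreater($a, $b): bool
{
    if (is_int($a) && is_int($b)) {
        return $a > $b;            // integer comparison
    }

    if (is_string($a) && is_string($b)) {
        return strcmp($a, $b) > 0; // string comparison, never numeric
    }

    throw new InvalidArgumentException(
        'Both values must be integers, or both must be strings'
    );
}

var_dump(isGreater(2, 1));     // bool(true)
var_dump(isGreater('b', 'a')); // bool(true)
```

Using strcmp() for the string branch also keeps two numeric strings, such as '10' and '9', from being compared as numbers.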

in_array()

Unbeknownst to most of us, in_array() checks for values inside an array by performing a weak-type comparison. So, this is affected by the PHP 8.x evolution :

<?php

$a = ['a'];
$b = [0];   
                             // PHP 8.x  PHP 7.x
var_dump(in_array(0,   $a)); // false     true
var_dump(in_array('a', $a)); // true      true

var_dump(in_array(0,   $b)); // true      true
var_dump(in_array('a', $b)); // false     true

?>

The safe way with in_array()

Here, the safe way is a lot easier than with inequalities : in_array() has a third boolean argument, which makes it use the strict comparison.

Just add ‘true’ as the third argument, and the behavior will be the same between PHP 7.x and 8.x.
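
Here is the previous example again, as a small sketch with the third argument set. The array names are changed for readability; the four results are identical in PHP 7.x and PHP 8.x.

```php
<?php

$letters = ['a'];
$numbers = [0];

// With the third argument set to true, in_array() compares with ===,
// so an integer never matches a string.
var_dump(in_array(0,   $letters, true)); // bool(false)
var_dump(in_array('a', $letters, true)); // bool(true)

var_dump(in_array(0,   $numbers, true)); // bool(true)
var_dump(in_array('a', $numbers, true)); // bool(false)
```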

Also, if some php-src contributor is reading this page, can we have a native constant for this? Something like that :

<?php

const STRICT_COMPARISON = true;
const LOOSE_COMPARISON  = false;

if (in_array($value, $array, STRICT_COMPARISON)) {
    // doSomething
}

?>

It is not the only named constant missing, but that is probably the most used.

switch()

switch() is another place where type-juggling comparison happens, even though the == operator is not explicit. In fact, it happens between the expression in the switch and the values in the cases.

<?php

switch(0) {
    case 0   : print "case 0\n"; break;
    case 'a' : print "case A\n"; break;
}

switch('a') {
    case 0   : print "case 0\n"; break;
    case 'a' : print "case A\n"; break;
}
?>

In PHP 7.x, both switches print case 0, because ‘a’ == 0 is true. In PHP 8.x, the first switch prints case 0 and the second prints case A.

The safe way with switch()

One safe way with switch() would be to move to PHP 8.x’s match expression, which only uses strict comparison. The two syntaxes are very close, and that would be an easy upgrade. The main problem is that this syntax is not available in PHP 7.x.
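
For projects that already run on PHP 8.0 or later, the switch() above could be rewritten along these lines. This is a minimal sketch : label() is a hypothetical wrapper, and the default arm is an addition, since match throws an UnhandledMatchError when no arm matches.

```php
<?php

// Requires PHP 8.0 or later: match compares with ===.
function label($value): string
{
    return match ($value) {
        0       => 'case 0',
        'a'     => 'case A',
        default => 'no match',
    };
}

print label(0) . "\n";   // case 0
print label('a') . "\n"; // case A
print label('0') . "\n"; // no match: the string '0' is not the integer 0
```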

When the code must still run on PHP 7.x, the solution is to skip PHP’s type juggling by … juggling it yourself. Note how the cast is only done at switch() time, so that the original value is still available inside the case. This is clever or not, depending on your sensitivity.

<?php

$a = 'a';
switch((int) $a) {
    case 0   : print "case $a\n"; break;
}

?>

Another option is to avoid mixing integers and strings in the list of case’s values.

switch() and the order of the cases

Take a look at this piece of code, where only the order of the cases differs between the two switch() at the top and the two at the bottom.

<?php

switch(0) {
    case 'a' : print "case A\n"; break;
    case 0   : print "case 0\n"; break;
}

switch('a') {
    case 'a' : print "case A\n"; break;
    case 0   : print "case 0\n"; break;
}

switch(0) {
    case 0   : print "case 0\n"; break;
    case 'a' : print "case A\n"; break;
}

switch('a') {
    case 0   : print "case 0\n"; break;
    case 'a' : print "case A\n"; break;
}
?>

In PHP 7.x, this script displays the following :

case A
case A
case 0
case 0

This is the first case in each of the switches, since ‘a’ == 0. The first matching case is the right one, and PHP 7.x happily reports the value and stops.

In PHP 8.x, the same script displays the following :

case 0
case A
case 0
case A

Now, the behavior is consistent in PHP 8.0 : the right value is channeled to the right case, whatever the order of the cases.

In PHP 7.x, on the other hand, any incoming non-numeric string is caught by a ‘case 0:’ in the middle of the cases list, which then acts as a default value : any case on a string that is coded afterwards is ghosted.

The safe way

The safe way here is, again, to split the switch depending on the type of the incoming value.

It is also possible to mitigate this problem by moving the case 0 to the end of the list of cases, as the last one. This ensures that the incoming value is checked against all the other cases before it reaches the hidden catch-all. Adding a cast to (string) on the switch expression may also prevent an incoming 0 from tripping over the actual strings.
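
Splitting the switch by type could look like the sketch below. describe() is a hypothetical function and the returned labels are placeholders; the point is only that each case is compared with a value of its own type, so the result does not depend on the PHP version or on the order of the cases.

```php
<?php

// Hypothetical function: one switch per type, so a case is only ever
// compared with a value of the same type.
function describe($value): string
{
    if (is_int($value)) {
        switch ($value) {
            case 0:  return 'case 0';
            default: return 'other integer';
        }
    }

    if (is_string($value)) {
        switch ($value) {
            case 'a': return 'case A';
            default:  return 'other string';
        }
    }

    return 'unsupported type';
}

print describe(0) . "\n";   // case 0
print describe('a') . "\n"; // case A
print describe('b') . "\n"; // other string
```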

Conclusion

The string to number comparison change is quick to learn. It looks very innocuous in the changelog. Yet, there are many implications, as this comparison has been used and reused everywhere in the code. And the silent nature of this change makes it very difficult to track : who will suspect in_array() or switch() of branching to the wrong value?

Exakat will include a dedicated rule in the upcoming 2.3.7 version to cover this problem, and help with PHP 8 code migration.

Follow us on Twitter for more #php coding reviews.