Confusing variable names

Have you ever been in that awful situation where you try to understand why an SQL query fails, until you realize that $res and $req are simply too easy to mix up? Some variable names are confusing, notably those that are only one letter apart. Problems range from $data and $date in the same function to $this and $tis.

Spotting confusing variable names

A variable name is merely a string prefixed with $. Since the $ is compulsory, it carries no meaning in the name and may be dropped. We are left with analyzing a simple string. At that point, the natural language used for naming becomes important.

To spot variable names that are too close, the exakat engine only compares variables that are in the same scope, that is, in the same function or method. This way, variables with close names that are used in distant parts of the code are ignored.
Two-letter variables are ignored, as they are too short. They may need longer names, but that’s another problem.
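
The scope and length filters described above can be sketched in a few lines of PHP. This is only an illustration, not exakat's implementation: the function name candidateNames(), the input shape (a map of scope name to the variables found in it) and the choice to drop one-letter names along with two-letter ones are all assumptions made for the example.

```php
<?php
/**
 * Hypothetical helper: keep, for each scope, the variable names worth comparing.
 * The leading $ is stripped and names of two letters or fewer are dropped
 * (dropping one-letter names too is an assumption of this sketch).
 *
 * @param array<string, string[]> $variablesByScope scope name => variable names
 * @return array<string, string[]>
 */
function candidateNames(array $variablesByScope): array
{
    $candidates = [];
    foreach ($variablesByScope as $scope => $names) {
        $kept = [];
        foreach ($names as $name) {
            $bare = ltrim($name, '$');   // the $ carries no meaning
            if (strlen($bare) > 2) {     // too short to compare usefully
                $kept[$bare] = true;     // array keys remove duplicates
            }
        }
        $candidates[$scope] = array_keys($kept);
    }
    return $candidates;
}

print_r(candidateNames([
    'Query::run'  => ['$res', '$req', '$i', '$db'],
    'Query::init' => ['$data', '$date', '$data'],
]));
```

Names from different scopes never end up in the same list, so a $data in one method is never compared with a $date in another.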

Then, exakat uses the Levenshtein distance: this algorithm provides the ‘minimal number of characters you have to replace, insert or delete to transform a string into another string’ (loosely quoted from the PHP manual). So, to convert $foo_bar into $foobar, one just needs to drop the _. That’s one alteration. From $data to $data2, adding a figure is sufficient. Replacing one letter with another also counts as a single alteration.
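
PHP ships this algorithm as the built-in levenshtein() function, so the comparison is easy to try out. The sketch below is only an illustration and not exakat's actual code: the helper confusingPairs() and its default threshold of one alteration are assumptions for the example.

```php
<?php
/**
 * Hypothetical helper: list the pairs of names, within one scope,
 * whose Levenshtein distance is at most $maxDistance.
 *
 * @param string[] $names variable names, without the leading $
 * @return array<int, array{0: string, 1: string, 2: int}>
 */
function confusingPairs(array $names, int $maxDistance = 1): array
{
    $names = array_values(array_unique($names));
    $count = count($names);
    $pairs = [];
    for ($i = 0; $i < $count; $i++) {
        for ($j = $i + 1; $j < $count; $j++) {
            // levenshtein() is case-sensitive: fileName and filename differ by 1.
            $distance = levenshtein($names[$i], $names[$j]);
            if ($distance <= $maxDistance) {
                $pairs[] = [$names[$i], $names[$j], $distance];
            }
        }
    }
    return $pairs;
}

$names = ['foo_bar', 'foobar', 'data', 'data2', 'counter'];
foreach (confusingPairs($names) as [$a, $b, $d]) {
    echo "\$$a / \$$b : $d\n";
}
// Prints:
// $foo_bar / $foobar : 1
// $data / $data2 : 1
```

Every name is compared with every other name in the scope, which is quadratic but harmless for the handful of variables a single function usually holds.
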
Let’s see what kinds of confusion can be found in PHP code.

Lack of imagination

Adding a figure at the end of a variable name, 0, 1, 2, 3 (sometimes up to 22), or one of the first letters of the alphabet, a, b, c, is the most common way to create a new variable name. It’s also the least creative way.

This often happens when a SQL query must be built on top of a previous one: there is usually a $req and a $res for the first query, then a $req2 and a $res2 for the second one. In such cases, splitting the code into two methods makes it clearer and safer.
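
As a rough sketch of that advice, each query can live in its own function, so each one gets a private $req and $res and the numbered variants disappear. Everything specific here is invented for the illustration: the users and orders tables, their columns, and the function names findUserId() and countOrders(). PDO is assumed as the database layer.

```php
<?php
// Hypothetical example: one function per query, so no $req2 / $res2 is needed.

/** Returns the id of the user with this login, or null when there is none. */
function findUserId(PDO $db, string $login): ?int
{
    $req = $db->prepare('SELECT id FROM users WHERE login = :login');
    $req->execute(['login' => $login]);
    $res = $req->fetchColumn();   // false when no row matches

    return $res === false ? null : (int) $res;
}

/** Returns the number of orders for this user, using the first query's result. */
function countOrders(PDO $db, int $userId): int
{
    $req = $db->prepare('SELECT COUNT(*) FROM orders WHERE user_id = :user_id');
    $req->execute(['user_id' => $userId]);
    $res = $req->fetchColumn();

    return (int) $res;
}
```

The calling code then only handles meaningful names such as $userId, and a typo between $res and $res2 is no longer possible.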

Plural and singular

Lots of variables get an extra ‘s’ when they hold an array of similar objects: $objects tends to be an array of « $object »s, as in foreach($letters as $letter) {… Those are usually easy enough to understand, as long as the code is short enough.

Sometimes, this is made into a coding standard rule. Two specific problems arise from it. The first is exceptions in the supporting language, like $index and $indexs ($indices), $axis and $axiss ($axes), or $datum and $data (and, yes, $datas is wrong on several levels). The second arises when the ‘s’ is added inside the variable name: $fieldsDataByKey, $fieldDataByKey.

Grammar variation

Similar to the previous case, there are a number of variations based on grammar, notably when both the infinitive and the past participle are used in the same method: $cache and $cached, $find and $found. Or $parser and $parsed, which share the same root. Here, grammar provides a significant distinction between the two meanings, but keeps the words very close.

In English, this mostly affects verbs, but other languages may have variations for other types of words: in French, gender and number variations lead to $fort, $forte, $forts and $fortes (‘strong’, masculine, feminine, and then both in plural form).

Close words

$data and $date, $uid and $gid, $max and $map, $excerpt and $except are confusing in plain English. They may not be confusing in other languages, and some pairs are only confusing outside a specific domain: $ipv4 and $ipv6, or $uri and $url, are common in a web context. Those pairs should be white-listed, context by context.
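
One possible way to apply such a white-list is to filter the reported pairs against a list of pairs accepted for a given context. The sketch below assumes a hypothetical dropWhitelisted() helper and a simple array layout for the pairs. It is not taken from exakat.

```php
<?php
/**
 * Hypothetical helper: remove pairs that a project has accepted as legitimate.
 *
 * @param array<int, array{0: string, 1: string}> $pairs     reported pairs
 * @param array<int, array{0: string, 1: string}> $whitelist accepted pairs
 * @return array<int, array{0: string, 1: string}>
 */
function dropWhitelisted(array $pairs, array $whitelist): array
{
    // Key each pair on its sorted members, so uri/url and url/uri match.
    $key = static function (array $pair): string {
        sort($pair);
        return implode('/', $pair);
    };
    $accepted = array_flip(array_map($key, $whitelist));

    return array_values(array_filter(
        $pairs,
        static fn (array $pair): bool => !isset($accepted[$key($pair)])
    ));
}

// A white-list for a web context, as suggested above.
$webWhitelist = [['ipv4', 'ipv6'], ['uri', 'url']];
$reported     = [['url', 'uri'], ['data', 'date'], ['ipv4', 'ipv6']];

print_r(dropWhitelisted($reported, $webWhitelist));
// Only the data / date pair remains.
```

Keeping one such list per context matches the idea that $uri and $url are fine in web code but might deserve a second look elsewhere.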

Real problems

Finally, a one-letter difference can lead to real problems. Here are some of the situations that were detected and led to fixing previously unnoticed bugs in the code:

  • $artlist / $attlist
  • $this / $tis
  • $data / $dat
  • $file_name / $filename / $fileName
  • $wheream / $whereami

Reporting confusing variable names

Confusing names is a new analyzer available in Exakat 0.5.2. Other analyses target variables with short names, variables used only once, and variables that are written but never read. All of them should be reviewed carefully for bugs.