Semantic typing

Semantic typing is an old practice in which the name of a parameter also tells what its type is. It is typing, because a $string is expected to hold a string, and it is semantic, because only the human reader actually uses that meaning: PHP doesn’t really care.

The interesting part is that the practice of semantic typing still exists nowadays. Obviously, it has taken a back seat to actual typing, for one good reason: naming a parameter after its type is akin to passive documentation, and those who read documentation are too rare.

Funny typing

Funny typing happens when a parameter is named after one type, but declared with a different one. For example:

<?php

function foo(string $array) { /**/ }

?>

While this is totally legit PHP code, it is also quite weird. Who would name their parameter after one type, yet declare it with another one?
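To make the point concrete, here is a minimal sketch of what PHP does with such a declaration. It reuses the foo() signature from above; the return type, the function body and the two calls are added for illustration only. PHP checks the declared type, and the parameter name plays no part in it.

```php
<?php

// The parameter is *named* $array, but *declared* as a string.
function foo(string $array): string
{
    return gettype($array);
}

// The declared type wins: a string is accepted.
echo foo('hello'), PHP_EOL;

try {
    // An actual array is rejected, whatever the name suggests.
    foo(['hello']);
} catch (TypeError $e) {
    echo get_class($e), PHP_EOL;
}
```

The first call goes through and reports a string; the second one raises a TypeError, even though the argument matches the parameter name.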

As usual, where there is a means, there is a will. So, I ran an audit over 2700+ PHP open source projects, collected stats about parameters with a scalar name ($string, $int…), and checked their declared type. Not the PHPDoc type, but the actual type declaration.

For example, $string is typed string 98.3% of the time, but sometimes, just sometimes, it is also typed array (1.6%) or bool (0.1%). Interestingly, a $string is never a float.

All other type-named parameters behave the same way: they usually bear their eponymous type ($array is most often an array), but they sometimes carry a different type, such as bool or string.

Interestingly, neither a $float nor a $bool is ever an array (the opposite is not true). Apparently, there are subjective limits to stretching types.

This has been made into an Exakat rule, extended to properties. If you are afraid your code might sport such a typo, you can run an audit.
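For a quick check on a single function, the idea can be sketched with PHP's Reflection API. This is not how the Exakat rule works: findFunnyTypedParameters() and funnyExample() are hypothetical names, the list of type names is an assumption, and union or intersection types are simply skipped.

```php
<?php

/**
 * Lists the parameters whose name is a native type name, but whose
 * declared type is a different one. Hypothetical helper, for illustration.
 *
 * @return array<string, string> parameter name => declared type
 */
function findFunnyTypedParameters(callable $function): array
{
    // Assumed list of type names worth checking.
    $typeNames = ['string', 'int', 'float', 'bool', 'array'];
    $funny = [];

    $reflection = new ReflectionFunction(Closure::fromCallable($function));
    foreach ($reflection->getParameters() as $parameter) {
        $type = $parameter->getType();
        // Skip untyped parameters, and union/intersection types for simplicity.
        if (!$type instanceof ReflectionNamedType) {
            continue;
        }

        $name = strtolower($parameter->getName());
        if (in_array($name, $typeNames, true) && $type->getName() !== $name) {
            $funny['$' . $parameter->getName()] = $type->getName();
        }
    }

    return $funny;
}

// Hypothetical function: two funny parameters, one honest.
function funnyExample(string $array, int $int, bool $string) { /**/ }

print_r(findFunnyTypedParameters('funnyExample'));
```

Here, $array and $string are reported with their declared types (string and bool), while $int is left alone, since its name and its type agree.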

Common parameter names

While reviewing those funny typed parameters, I also extracted the most common names for typed parameters. Here are the first 100:

  1. $postBody
  2. $event
  3. $request
  4. $command
  5. $node
  6. $config
  7. $query
  8. $subject
  9. $parent
  10. $item
  11. $requestBody
  12. $result
  13. $context
  14. $value
  15. $entity
  16. $object
  17. $a
  18. $type
  19. $user
  20. $factory
  21. $model
  22. $b
  23. $data
  24. $message
  25. $source
  26. $options
  27. $e
  28. $repository
  29. $response
  30. $client
  31. $field
  32. $other
  33. $manager
  34. $filter
  35. $warning
  36. $container
  37. $collection
  38. $provider
  39. $service
  40. $cache
  41. $extensionAttributes
  42. $file
  43. $action
  44. $element
  45. $dao
  46. $c
  47. $handler
  48. $image
  49. $parser
  50. $validator
  51. $configuration
  52. $resource
  53. $params
  54. $builder
  55. $exception
  56. $services
  57. $target
  58. $metadata
  59. $storage
  60. $connection
  61. $id
  62. $component
  63. $form
  64. $child
  65. $req
  66. $loader
  67. $status
  68. $logger
  69. $entry
  70. $token
  71. $page
  72. $group
  73. $inst
  74. $definition
  75. $document
  76. $input
  77. $template
  78. $table
  79. $error
  80. $rule
  81. $settings
  82. $generator
  83. $registry
  84. $class
  85. $stmt
  86. $repo
  87. $from
  88. $key
  89. $formatter
  90. $reader
  91. $resolver
  92. $category
  93. $controller
  94. $instance
  95. $property
  96. $expected
  97. $helper
  98. $n
  99. $session
  100. $name

$a, $b, $c, $n, and $e are the most common one-letter names. $postBody is the most common name in this list, though only typed parameters are counted here, which helps its ranking: it is not the most common parameter name overall. Note also that $requestBody ranks high too.

Further down, $repository and $repo are both used quite often, and represent the same reality: the latter is just shorter than the former. Also, quite a few vague names, such as $params, $value, $data, $message or $source, are used, and typed.

Common varied types

Once a type is added to a parameter, there is a new couple in town: the parameter name, with its meaning, and the type itself. As such, it is interesting to look at two populations of typed parameters: those that get a lot of different types, and those that always get the same type.
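Splitting parameters into these two populations boils down to grouping declared types by parameter name. Here is a minimal sketch of that grouping: the sample pairs are made up for illustration (they are not figures from the audit), typesByName() is a hypothetical helper, and no minimum number of definitions is applied.

```php
<?php

// Made-up sample of (parameter name, declared type) pairs, for illustration only.
$parameters = [
    ['name', 'string'],
    ['name', 'App\Name'],
    ['name', 'Stringable'],
    ['isRoot', 'bool'],
    ['isRoot', 'bool'],
    ['httpPort', 'int'],
];

/**
 * Groups the declared types by parameter name.
 *
 * @return array<string, string[]> parameter name => list of distinct types
 */
function typesByName(array $parameters): array
{
    $types = [];
    foreach ($parameters as [$name, $type]) {
        $types[$name][$type] = true; // array keys act as a set
    }

    return array_map('array_keys', $types);
}

$byName = typesByName($parameters);

// Always the same type, versus several different types.
$stable = array_filter($byName, fn (array $types): bool => count($types) === 1);
$varied = array_filter($byName, fn (array $types): bool => count($types) > 1);

print_r($stable);
print_r($varied);
```

With this sample, $isRoot and $httpPort end up in the stable group, while $name lands in the varied one, with three distinct types.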

Always typed the same

When a parameter gets the same name and type across 100 method definitions or more, you can expect semantic typing to be at the root of the behavior: everyone recognizes that value, and how it should be represented.

Take a look at the list below, which shows the name of a method parameter and its expected type: can you guess the type simply by reading the variable name?

Is it obvious that $weak should be a boolean?

  1. $allWords (\bool)
  2. $sqlWalker (\doctrine\orm\query\sqlwalker)
  3. $isRoot (\bool)
  4. $replaceextrasymbols (\bool)
  5. $weak (\bool)
  6. $isLower (\bool)
  7. $scriptProperties (\array)
  8. $pathinfo (\string)
  9. $use_transliterate (\bool)
  10. $isVariadic (\bool)
  11. $altNumbers (\bool)
  12. $fkConstraint (\doctrine\dbal\schema\foreignkeyconstraint)
  13. $asOrigReplaceArray (\bool)
  14. $prenormalizeds (\array)
  15. $savePath (\string)
  16. $sessionName (\string)
  17. $httpsPort (\int)
  18. $useAttachment (\bool)
  19. $useShortAttachment (\bool)
  20. $showArguments (\bool)
  21. $httpPort (\int)
  22. $rdata (\array)
  23. $internalErrors (\bool)
  24. $emailLexer (\egulias\emailvalidator\emaillexer)
  25. $arrayAdapter (\symfony\component\cache\adapter\arrayadapter)
  26. $other_keys (\array)
  27. $other_members (\array)
  28. $codePaths (\array)
  29. $enableIfStandalone (\callable)
  30. $extra_args (\array)
  31. $localVault (\symfony\bundle\frameworkbundle\secrets\abstractvault)
  32. $other_values (\array)
  33. $other_args (\array)
  34. $reverseContainer (\symfony\component\dependencyinjection\reversecontainer)
  35. $testMethod (\string)
  36. $wrappedDumper (\symfony\component\vardumper\dumper\datadumperinterface)
  37. $storageKey (\string)
  38. $transportName (\string)
  39. $preloaded (\array)
  40. $hasChild (\bool)
  41. $inlineServices (\array)
  42. $invalidBehavior (\int)
  43. $isNested (\bool)
  44. $byConstructor (\bool)
  45. $lille (\symfony\component\dependencyinjection\tests\compiler\lille)
  46. $maxlifetime (\int)
  47. $maxItemsPerDepth (\int)
  48. $metaBag (\symfony\component\httpfoundation\session\storage\metadatabag)
  49. $cloneArguments (\bool)
  50. $realInstantiator (\callable)
  51. $callAutoload (\bool)
  52. $dumpKeys (\bool)
  53. $callOriginalConstructor (\bool)
  54. $callOriginalClone (\bool)
  55. $hashedPassword (\string)
  56. $utf8 (\bool)
  57. $srcContext (\int)
  58. $sessionOptions (\array)
  59. $keepArgs (\bool)
  60. $autoEtag (\bool)
  61. $autoLastModified (\bool)
  62. $endOfValue (\bool)
  63. $returnResult (\bool)
  64. $isConstructorArgument (\bool)
  65. $includeContextAndExtra (\bool)
  66. $isMatch (\bool)
  67. $pathSeparator (\string)
  68. $noBuiltin (\bool)
  69. $remoteAddr (\string)
  70. $requestUid (\string)
  71. $rightTrimString (\bool)
  72. $refs (\array)
  73. $convertEmptyStringToNull (\bool)
  74. $vault (\symfony\bundle\frameworkbundle\secrets\abstractvault)
  75. $getEnv (\closure)
  76. $dbi (\phpmyadmin\databaseinterface)
  77. $dbForProject (\utopia\database\database)
  78. $willBeAvailable (\callable)
  79. $joinPoint (\neos\flow\aop\joinpointinterface)
  80. $watcherId (\string)
  81. $baseApiUri (\oauth\common\http\uri\uriinterface)
  82. $bookSlug (\string)
  83. $betterNodeFinder (\rector\core\phpparser\node\betternodefinder)
  84. $nodeNameResolver (\rector\nodenameresolver\nodenameresolver)
  85. $nodeTypeResolver (\rector\nodetyperesolver\nodetyperesolver)
  86. $phpDocInfo (\rector\betterphpdocparser\phpdocinfo\phpdocinfo)
  87. $phpDocInfoFactory (\rector\betterphpdocparser\phpdocinfo\phpdocinfofactory)
  88. $reflectionResolver (\rector\core\reflection\reflectionresolver)
  89. $typeKind (\string)
  90. $aliased_classes (\array)
  91. $authComponent (\authcomponent)
  92. $suppressed_issues (\array)
  93. $uniqueName (\string)
  94. $handler_id (\string)
  95. $a_adt (\iladt)
  96. $default_renderer (\ilias\ui\renderer)
  97. $coreRegistry (\magento\framework\registry)
  98. $fetchStrategy (\magento\framework\data\collection\db\fetchstrategyinterface)
  99. $moduleDataSetup (\magento\framework\setup\moduledatasetupinterface)
  100. $telemetryInfo (\phpunit\event\telemetry\info)

Parameter names ending with an ‘s’ usually lead to an array ($aliased_classes, $sessionOptions) when the name is a noun. When the parameter name includes a verb, it is usually a boolean ($cloneArguments, $dumpKeys).

Booleans are related to intent, and use small words ($isAbsolute, $forConstructor, $noBuiltin). In that respect, $willBeAvailable stands out as an exception, being a callable.

Strings cover a lot of nouns: $handler_id, $uniqueName, $bookSlug, $requestUid, $hashedPassword (for that last one, both password and hashed would also hint at string).
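These rules of thumb can be written down as a toy guesser. This is a rough sketch, not a reliable classifier: guessTypeFromName() is a hypothetical helper, the list of small words is an assumption, and nouns and verbs are not actually told apart.

```php
<?php

/**
 * Guesses a type from a parameter name, using only naming rules of thumb.
 * Hypothetical helper: returns null when no rule applies.
 */
function guessTypeFromName(string $name): ?string
{
    // A small leading word hints at a boolean: $isRoot, $hasChild, $noBuiltin...
    if (preg_match('/^(is|has|no|for|by)(?=[A-Z_])/', $name) === 1) {
        return 'bool';
    }

    // A trailing "s" hints at an array: $refs, $sessionOptions...
    if (substr($name, -1) === 's') {
        return 'array';
    }

    // Nouns such as $bookSlug often are strings, but nothing here can tell.
    return null;
}

$names = ['isRoot', 'noBuiltin', 'sessionOptions', 'bookSlug', 'willBeAvailable'];
foreach ($names as $name) {
    echo '$', $name, ' => ', guessTypeFromName($name) ?? 'unknown', PHP_EOL;
}
```

Such a guesser misfires quickly: $altNumbers is a bool in the list above, yet its trailing ‘s’ points at an array. That is a reminder that the name only hints at the type, and the declaration has the last word.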

A total of 258 parameter names were detected.

You never know what is in there

At the other end of the spectrum, there are parameters which may be, well, basically anything. Some of them have been detected with over a thousand different types across all their usages. Here is their ranking, by number of different types detected.

  1. $postBody (2352)
  2. $event (1654)
  3. $request (853)
  4. $command (717)
  5. $node (550)
  6. $config (462)
  7. $query (395)
  8. $subject (347)
  9. $parent (302)
  10. $item (300)
  11. $requestBody (300)
  12. $result (299)
  13. $context (294)
  14. $value (290)
  15. $entity (283)
  16. $object (270)
  17. $a (261)
  18. $type (256)
  19. $user (251)
  20. $factory (248)
  21. $model (247)
  22. $b (234)
  23. $data (232)
  24. $message (231)
  25. $source (231)
  26. $options (229)
  27. $e (212)
  28. $repository (211)
  29. $response (207)
  30. $client (198)
  31. $field (197)
  32. $other (191)
  33. $manager (186)
  34. $filter (172)
  35. $warning (171)
  36. $container (169)
  37. $provider (168)
  38. $collection (168)
  39. $service (163)
  40. $cache (162)
  41. $extensionAttributes (161)
  42. $file (154)
  43. $action (153)
  44. $element (152)
  45. $dao (152)
  46. $handler (150)
  47. $c (150)
  48. $image (149)
  49. $parser (146)
  50. $validator (145)
  51. $configuration (144)
  52. $resource (143)
  53. $params (141)
  54. $builder (140)
  55. $exception (137)
  56. $services (135)
  57. $target (134)
  58. $metadata (131)
  59. $storage (130)
  60. $connection (128)
  61. $id (124)
  62. $form (117)
  63. $component (117)
  64. $child (116)
  65. $req (113)
  66. $logger (111)
  67. $status (111)
  68. $loader (111)
  69. $token (110)
  70. $page (110)
  71. $entry (110)
  72. $group (108)
  73. $inst (108)
  74. $definition (107)
  75. $document (107)
  76. $input (106)
  77. $template (106)
  78. $table (103)
  79. $error (102)
  80. $rule (102)
  81. $settings (100)
  82. $generator (97)
  83. $registry (96)
  84. $class (95)
  85. $stmt (95)
  86. $repo (93)
  87. $from (93)
  88. $key (92)
  89. $reader (92)
  90. $resolver (92)
  91. $category (92)
  92. $formatter (92)
  93. $controller (88)
  94. $instance (88)
  95. $expected (88)
  96. $n (88)
  97. $helper (88)
  98. $property (88)
  99. $session (88)
  100. $name (85)

To reach 85 types, $name had to use more than just scalar types: string is expected (at least, by me), but many classes and interfaces are also used to encapsulate what a name is. Since names are quite a common concept, used to distinguish people, services, brands, models, Debian versions, and more, it is common that $name requires disambiguation (according to Wikipedia).

Such parameter names should be avoided: they are quite generic, and may raise questions like “how do I concatenate this name into a string?”, or similar questions about other common, obvious usages.
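The concatenation question is easy to run into. In the sketch below, PersonName is a hypothetical value object, standing in for one of the many classes found behind $name, and greet() is a hypothetical caller. The code assumes PHP 8.0 or later, for constructor promotion.

```php
<?php

// Hypothetical value object: one of the many possible types behind $name.
final class PersonName
{
    public function __construct(
        private string $first,
        private string $last
    ) {
    }

    public function full(): string
    {
        return $this->first . ' ' . $this->last;
    }
}

function greet(PersonName $name): string
{
    // 'Hello ' . $name would throw an Error: PersonName has no __toString().
    // The reader has to look up the class to find the right method.
    return 'Hello ' . $name->full();
}

echo greet(new PersonName('Ada', 'Lovelace')), PHP_EOL;
```

Nothing in the parameter name says that full() is the method to call: that knowledge only comes from reading the class behind the type.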

Indeed, some parameter names are very generic, and naturally lead to many types: $name, $param (sic), $instance, $collection, $factory, $entity, $other. $a, $b, $c, and $n are back in the list, just like $e: this last one is used when catching exceptions, which leads to many different types when it is passed on to other methods as a parameter.

Naming, types and semantics

Semantic typing establishes a direct relation between types and the words used to build a parameter name. This is an old practice, coming from the early ages of PHP: to keep the code readable, the type was ingrained in the variable name. This was an age with fewer features than today.

Nowadays, types have a life of their own, and yet this antiquated behavior is still alive. It is possible to guess the type of a variable simply by reading it aloud. It is also possible to recognize that another variable might have a very wide range of types, and should not be expected to be of one particular type. Surely, after discovering which type is actually used, it will be important to read the rest of the documentation to know how to display a simple $path when it is not a string.

The most common types are definitely the scalar ones, which are native to every recent PHP version. This analysis covers a vast array of PHP projects, with various backgrounds, including underlying frameworks. Each of them introduces specific classes to support specific concepts, such as URLs or names. It would be interesting to see how this semantic typing applies, depending on each community.