Refactoring Strings to Enums

This article describes the journey of refactoring a not-so-old piece of code from using strings to using enumerations.

Original Reasons

When Exakat was started in 2014, enumerations were not even on the radar. Strings, and sometimes constants, were the obvious solution. As the original concept evolved quickly, strings were used everywhere, and they naturally remained in place.

Modernizing PHP Code

Converting those strings to constants was considered several times, but never implemented. Writing VARIABLE or Atom::VARIABLE instead of 'Variable' felt mostly cosmetic. Readability might have improved slightly, but even that was debatable.

More importantly, there was no way to enforce the use of constants. PHP replaces constants with their values, so both foo('Variable') and foo(Atom::VARIABLE) remain valid. Only static analysis tools could reliably detect misuse.

Smaller Footprint

Enumerations change this dynamic. Instead of crowding classes with constants, enum cases live in their own dedicated structure. More importantly, enums introduce typing: Atomname::Variable is an object of type Atomname, while 'Variable' is just a string. A parameter typed as Atomname rejects plain strings, so correct values are now enforced at the engine level.
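Here is a minimal sketch of that difference. It assumes a two-case excerpt of Atomname and two hypothetical functions, describeConstant() and describeEnum(). A string-typed parameter accepts any string, including a typo. An enum-typed parameter rejects raw strings with a TypeError.

```php
<?php
enum Atomname: string {
    case Variable = 'Variable';
    case Addition = 'Addition';
}

final class Atom {
    // Constant-based approach: the constant is only an alias for a string.
    public const VARIABLE = 'Variable';
}

// Hypothetical function typed with string: any string is accepted.
function describeConstant(string $name): string {
    return "atom: $name";
}

// Hypothetical function typed with the enum: only Atomname cases are accepted.
function describeEnum(Atomname $name): string {
    return "atom: {$name->value}";
}

echo describeConstant(Atom::VARIABLE), PHP_EOL;  // atom: Variable
echo describeConstant('Varaible'), PHP_EOL;      // atom: Varaible (typo goes unnoticed)
echo describeEnum(Atomname::Variable), PHP_EOL;  // atom: Variable

try {
    describeEnum('Variable');                     // a raw string is rejected
} catch (TypeError $e) {
    echo 'TypeError: ', $e->getMessage(), PHP_EOL;
}
```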

There was also an expectation of performance improvements. Enumeration cases are single instances reused across the application, while string literals are repeated. In practice, however, I never managed to produce a meaningful benchmark showing a real speed gain. In fact, in very large loops, strings may even be slightly faster.

Easier Analysis

Writing a static analysis tool also gives a particular perspective. Strings represent about 11% of the source code in Exakat. Comparable figures appear in other projects: WordPress (~13%), Typecho (~14%), Laravel (~35%), Leaf (~8%), Contao (~10%), and Slim (~2%).

By comparison, class constants and enum cases exist in far smaller numbers, making them faster to process. Beyond quantity, constants and enum cases are easier and safer to locate and analyze than strings.

The strings targeted by this refactor represent only a fraction of all strings in the codebase. Previously, all strings were analyzed together: array indexes, values, interpolated fragments, default arguments, and so on. By isolating atom names into a dedicated enum, analysis becomes both faster and more precise.

Before moving to the refactor itself, it is worth emphasizing that static analysis tools should adapt to the code, not the other way around. Still, modernizing code can make tooling more efficient. It is not mandatory, but it is certainly beneficial.

Refactoring to Enums

The Obvious Parts

What Was Refactored?

Among the many strings in the source code, atom names stood out as ideal candidates.

PHP converts source code into tokens. Once whitespace and structural tokens such as () or {} are set aside, roughly one third of the remaining tokens are used to rebuild the AST. In Exakat, these tokens are called atoms, each representing a unit of information. Examples include Class, Interface, Addition, Power, Shell, File, Void, and Sequence.
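PHP's tokenizer extension makes this first step visible. The sketch below calls token_get_all() and drops whitespace, the opening tag and a few structural characters. The filter list is an assumption for illustration, not Exakat's actual rule set. The surviving tokens are still plain tokens, not atoms.

```php
<?php
// Tokenize a snippet with PHP's built-in tokenizer, then drop tokens that
// carry no information for an AST. The $structural list is an assumption.
function meaningfulTokens(string $code): array {
    $structural = ['(', ')', '{', '}', ';'];
    $kept = [];
    foreach (token_get_all($code) as $token) {
        if (is_array($token)) {
            // Multi-character tokens come as [id, text, line].
            if ($token[0] === T_WHITESPACE || $token[0] === T_OPEN_TAG) {
                continue;
            }
            $kept[] = $token[1];
        } elseif (!in_array($token, $structural, true)) {
            // Single-character tokens come as plain strings.
            $kept[] = $token;
        }
    }
    return $kept;
}

// Prints $total, =, $a, +, $b, *, 2
print_r(meaningfulTokens('<?php $total = ($a + $b) * 2;'));
```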

There are currently 132 atom names. The list is known at startup, and if one is missing, it usually means work has already started toward supporting a newer PHP version.

A backed string enum was therefore an obvious choice. To minimize disruption, the existing naming convention, Uppercase First, was preserved.

<?php

enum Atomname: string {
    case Addition = 'Addition';
    case Array = 'Array';
    case Arrayappend = 'Arrayappend';
    case Arrayliteral = 'Arrayliteral';
    case Arrowfunction = 'Arrowfunction';
    // ...
}
?>
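Because case names mirror their backed values, the built-in cases() method makes a cheap consistency check possible. This is a sketch only. mismatchedCases() is a hypothetical helper, not something the article describes.

```php
<?php
enum Atomname: string {
    case Addition = 'Addition';
    case Array = 'Array';
    case Arrayappend = 'Arrayappend';
    case Arrayliteral = 'Arrayliteral';
    case Arrowfunction = 'Arrowfunction';
    // ...

    // Hypothetical helper: names of the cases whose name differs from their value,
    // i.e. the cases that break the "name equals value" convention.
    public static function mismatchedCases(): array {
        $mismatched = [];
        foreach (self::cases() as $case) {
            if ($case->name !== $case->value) {
                $mismatched[] = $case->name;
            }
        }
        return $mismatched;
    }
}

var_dump(count(Atomname::cases()));    // int(5) for this excerpt
var_dump(Atomname::mismatchedCases()); // empty array: every name matches its value
```

On the complete enum, such a check would list any deliberate exception to the convention, like the renamed _Class case.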

Cleaning Previous Definitions

There was nothing to clean up: the values were literals, not constants, so no class had to shed 132 constants. That was a pleasant surprise.

Handling a previous migration from literal strings to constants is left to the reader. One option is to replace such constants with the matching enumeration cases; another is to keep the constants and change their values to the enumeration cases. Fun and joy, working with the code.
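The second option works because enum cases are allowed in constant expressions since PHP 8.1. Here is a minimal sketch, with a hypothetical Atom class holding the former string constant.

```php
<?php
enum Atomname: string {
    case Variable = 'Variable';
}

final class Atom {
    // Before: public const VARIABLE = 'Variable';
    // After: the constant is kept, but now points at the enum case.
    public const VARIABLE = Atomname::Variable;
}

var_dump(Atom::VARIABLE === Atomname::Variable); // bool(true)

$case = Atom::VARIABLE;
var_dump($case->value);                          // string(8) "Variable"
```

Existing call sites keep compiling. Their type changes from string to Atomname, so any code that still treats the constant as a string shows up quickly.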

Code Updates

Most updates consisted of straightforward replacements. Not exciting, but effective. In total, slightly over five hundred string replacements were performed.

<?php

// before
} elseif ($name->isA(array('Static', 'Self'))) {

// after
} elseif ($name->isA(array(Atomname::Static, Atomname::Self))) {

?>

A few type declarations were also updated. In practice, only one property and a small number of related methods ended up using the enumeration directly.
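As a sketch of what such a typed property and method can look like, here is a hypothetical, heavily simplified Atom class. It is not Exakat's real class. Only the isA() name comes from the snippet above; its body is an assumption.

```php
<?php
enum Atomname: string {
    case Static = 'Static';
    case Self = 'Self';
    case Variable = 'Variable';
}

// Hypothetical, simplified atom: only the name is modelled here.
final class Atom {
    // The typed property rejects anything that is not an Atomname case.
    public function __construct(public Atomname $name) {}

    /** @param list<Atomname> $names */
    public function isA(array $names): bool {
        // Enum cases are singletons, so a strict (identity) comparison is enough.
        return in_array($this->name, $names, true);
    }
}

$atom = new Atom(Atomname::Self);
var_dump($atom->isA([Atomname::Static, Atomname::Self])); // bool(true)
var_dump($atom->isA([Atomname::Variable]));               // bool(false)
```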

The Less Obvious Parts

This is where the unexpected appeared. Some issues were anticipated, but the real surprises were, by definition, the ones that were not.

Not All Strings Should Be Converted

This was expected from the beginning, and in fact one of the goals: separating atom names from all other strings.

Replacing patterns such as /^[A-Z][a-z]+$/ with enum cases would have been far too broad. It would also match unrelated strings such as Damien or Seguy, which are clearly not AST elements.

Even legitimate-looking words occasionally belonged to different contexts. Those cases were handled manually, or detected through static analysis and tests.
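Membership in the enum is a much tighter filter than the regex, as this sketch shows with a small excerpt of Atomname. It only narrows the candidates. 'Variable' can still be an unrelated string in some other context, which is why manual review remained necessary.

```php
<?php
enum Atomname: string {
    case Variable = 'Variable';
    case Addition = 'Addition';
}

$strings = ['Variable', 'Damien', 'Seguy', 'Addition', 'variable'];

// The regex accepts every capitalized word.
$byRegex = array_values(array_filter(
    $strings,
    fn (string $s): bool => preg_match('/^[A-Z][a-z]+$/', $s) === 1
));

// tryFrom() returns null for any value outside the enum (case-sensitive).
$byEnum = array_values(array_filter(
    $strings,
    fn (string $s): bool => Atomname::tryFrom($s) !== null
));

print_r($byRegex); // Variable, Damien, Seguy, Addition
print_r($byEnum);  // Variable, Addition
```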

Finding Non-Existent Cases

One major benefit of enums is that they form a closed set. Any value outside that set becomes an error.

This immediately revealed several outdated cases in large switch statements. Some atom names had been renamed long ago, while the old string values remained silently in the code, doing nothing except consuming a few CPU cycles.

<?php
switch ($atom->name) {
    case 'Bitshift':     // valid name
    case 'Bitoperation': // old, unused name
}
?>

Such silent dead code is a long-term problem. Over time, it increases complexity in handling, reviews, testing, and even performance discussions.
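With the enum, the same switch compares cases instead of strings, and the outdated name can no longer hide. Here is a sketch with a hypothetical isBitwise() function and a two-case excerpt of Atomname.

```php
<?php
enum Atomname: string {
    case Bitshift = 'Bitshift';
    case Addition = 'Addition';
}

// Hypothetical function: the switch now compares enum cases, not strings.
function isBitwise(Atomname $name): bool {
    switch ($name) {
        case Atomname::Bitshift:
            return true;
        // case Atomname::Bitoperation:
        // The old name does not exist in the enum: evaluating this case would
        // throw an Error (undefined constant), and static analyzers flag it earlier.
        default:
            return false;
    }
}

var_dump(isBitwise(Atomname::Bitshift)); // bool(true)
var_dump(isBitwise(Atomname::Addition)); // bool(false)
```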

Removing default in switch or match?

Another side effect is that static analyzers now complain about default branches. With enums, all possible cases are known, making default theoretically unnecessary.

<?php
enum E {
   case A;
   case B;
}
?>

In practice, I kept the default branch. The enum may grow in the future, and new cases would otherwise remain unhandled. The default is therefore a safeguard for future evolution, not for current logic. Static analysis will still help identify missing cases when the time comes.
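Here is one possible shape for such a safeguard, applied to the enum E above. Making the default arm throw is an assumption, not necessarily what Exakat does. With match, an unlisted case would otherwise raise an UnhandledMatchError. With switch, it would silently do nothing.

```php
<?php
enum E {
    case A;
    case B;
}

// Every current case is listed; the default arm only matters once E grows.
function label(E $e): string {
    return match ($e) {
        E::A => 'first',
        E::B => 'second',
        default => throw new LogicException('Unhandled case ' . $e->name),
    };
}

echo label(E::A), PHP_EOL; // first
echo label(E::B), PHP_EOL; // second
```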

The Edge Case of Class

PHP had one surprise left. Converting 'Class' to Atomname::Class collided with the special ::class keyword used to retrieve fully qualified class names.

Because class is a reserved keyword, and keywords are case-insensitive, an enum case named Class could not compile. The solution was to rename it to Atomname::_Class. Naming exceptions are never pleasant, but sometimes unavoidable.

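Static analysis helped detect incorrect comparisons during this transition. PHP does not reject a comparison between a string and an enum case; the comparison simply evaluates to false, so a leftover test such as $name === 'Class' fails silently.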
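Here is a small sketch of the renamed case. The backed value keeps the original spelling, so stored data is unaffected. Only the PHP identifier changes.

```php
<?php
enum Atomname: string {
    // 'Class' cannot be used as a case name, so the case is renamed;
    // the backed value keeps the original spelling.
    case _Class = 'Class';
    case Interface = 'Interface';
}

$name = Atomname::_Class;

var_dump($name === 'Class');                            // bool(false): silently wrong
var_dump($name->value === 'Class');                     // bool(true): compare the value
var_dump(Atomname::from('Class') === Atomname::_Class); // bool(true)
```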

Dynamic String Processing vs Enumerations

Some atom names were previously manipulated as strings. For example, building new atom names through concatenation:

<?php
$newAtom = $atom->name . 'definition';
?>

This approach does not translate to enums: an enum case cannot be converted to a string, so the concatenation throws an Error. Fortunately, such cases were rare and were replaced with match() expressions.
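Here is a sketch of that replacement. The article does not list the actual pairs of atoms, so the Variable/Variabledefinition and Constant/Constantdefinition cases and the definitionOf() function are hypothetical.

```php
<?php
enum Atomname: string {
    case Variable = 'Variable';
    case Variabledefinition = 'Variabledefinition';
    case Constant = 'Constant';
    case Constantdefinition = 'Constantdefinition';
}

// Before: $newAtom = $atom->name . 'definition';
// After: the mapping is explicit, and every target is a real enum case.
function definitionOf(Atomname $name): Atomname {
    return match ($name) {
        Atomname::Variable => Atomname::Variabledefinition,
        Atomname::Constant => Atomname::Constantdefinition,
        default => throw new InvalidArgumentException("No definition atom for {$name->value}"),
    };
}

var_dump(definitionOf(Atomname::Constant) === Atomname::Constantdefinition); // bool(true)
```

Atomname::from($name->value . 'definition') would keep the dynamic behavior. It would also push mistakes back to runtime as a ValueError, whereas the explicit match keeps every combination visible to static analysis.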

Exporting Enumeration Cases

Exporting enums means converting them back to strings when data leaves the application. In this case, that mainly meant storing data in the graph database used by Exakat.

Because the enum is backed, each case already provides its string representation through the value property:

<?php
echo $atom->name->value;
?>

Enums do not support __toString() or direct casting, so this explicit access is required.
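Here is a generic sketch of exporting a batch of cases. The graph database client is not shown because the article does not describe it. One built-in shortcut is that json_encode() already serializes backed enum cases to their value.

```php
<?php
enum Atomname: string {
    case Variable = 'Variable';
    case Addition = 'Addition';
}

$names = [Atomname::Variable, Atomname::Addition];

// Explicit conversion through the value property.
$exported = array_map(fn (Atomname $name): string => $name->value, $names);
echo implode(',', $exported), PHP_EOL; // Variable,Addition

// json_encode() converts backed enum cases to their value automatically.
echo json_encode(['name' => Atomname::Variable]), PHP_EOL; // {"name":"Variable"}
```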

Importing Enumeration Cases

Importing deserves equal attention. Reading strings from storage normally requires converting them back into enum cases.

In this specific case, it does not matter much. Once data is stored in the database, it is mostly processed there. Atom names remain an internal concept and are not re-imported into PHP frequently. The situation would be different for user data, where validation becomes essential. This is an important consideration when deciding whether to move from strings to enums.
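Backed enums offer two entry points for that conversion, sketched below. from() throws a ValueError on unknown values, which suits internal data where a bad value is a bug. tryFrom() returns null, which leaves room for validation of user input. parseUserAtomName() and its trim() call are hypothetical.

```php
<?php
enum Atomname: string {
    case Variable = 'Variable';
    case Addition = 'Addition';
}

// Internal data: a wrong value is a bug, so from() fails loudly.
$fromDatabase = 'Variable';
$atomName = Atomname::from($fromDatabase);

// Hypothetical user-facing parser: null signals invalid input to the caller.
function parseUserAtomName(string $input): ?Atomname {
    return Atomname::tryFrom(trim($input));
}

var_dump($atomName === Atomname::Variable);                       // bool(true)
var_dump(parseUserAtomName(' Addition ') === Atomname::Addition); // bool(true)
var_dump(parseUserAtomName('addition'));                          // NULL: values are case-sensitive

try {
    Atomname::from('Bitoperation');
} catch (ValueError $e) {
    echo 'Rejected: ', $e->getMessage(), PHP_EOL;
}
```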

Final Thoughts

The conversion from strings to enumerations turned out to be far more eventful than expected, which probably explains why it had been postponed for so long.

One non-obvious difficulty was the mismatch between treating strings as indivisible tokens conceptually and actually enforcing that constraint through enums. Some parts of the code relied on dynamic string manipulation, which required rethinking rather than simple replacement.

The refactored code ended up about 2% larger. Lines became longer, and some calls had to be split to preserve readability. However, the refactor also revealed unused code and hidden bugs, a common side effect of large structural changes.

Ultimately, moving code forces a closer review of assumptions, and that alone tends to improve code quality.