Prevent multiple PHP scripts from running at the same time

Like everything, it all started with a simple problem: how to prevent multiple PHP scripts from running at the same time. It turned into an odyssey of learning, full of evil traps and inglorious victories. In the end, it works, which is the most satisfying part, even if it possibly matters to no one except me. But "the way is the goal", as Confucius said, so I decided to share the various findings.

The origin

Exakat runs on the command line and uses a graph database. The database is central to the processing, and it is crucial to avoid running several scripts at the same time: they would overwrite each other’s data. So the problem is simple: prevent several instances from running at the same time on the same database. On the command line, there is no web server to serve as common ground between scripts, sharing some memory and implementing a locking system. Another common ground is required: the operating system.

Using a file lock

Files could play that role. PHP offers flock(), among the standard functions, which implements a locking mechanism at the file level. This is convenient, and it has been adapted to all platforms. The downside is that, by default, the flock() call waits until any previous lock has been released (unless the LOCK_NB flag is added). This blocking behavior is not practical to warn a user of concurrent usage.
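
For reference, flock() can be asked to return immediately by combining LOCK_EX with LOCK_NB. The following is only a minimal sketch: the tryFileLock() helper and the lock file name are made up for the illustration and are not part of Exakat. The operating system drops such a lock when the handle is closed, which includes the end of the process.

```php
<?php

// Hypothetical helper: returns the locked handle, or null if the lock is taken.
function tryFileLock(string $path)
{
    // 'c' opens the file for writing, creates it if needed, and never truncates it
    $fp = fopen($path, 'c');
    if ($fp === false) {
        return null;
    }

    // LOCK_NB makes flock() return false at once instead of waiting
    if (!flock($fp, LOCK_EX | LOCK_NB)) {
        fclose($fp);
        return null;
    }

    return $fp;
}

// The lock file name below is only an example
$lock = tryFileLock(sys_get_temp_dir() . '/exakat-demo.flock');
if ($lock === null) {
    print "Another script is running\n";
} else {
    print "Lock acquired\n";
    flock($lock, LOCK_UN);
    fclose($lock);
}
```

The lock lives on the open handle, so the script has to keep the handle around for as long as it wants to hold the lock.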

Using a file as a lock

Creating a file that serves as a lock is the first idea that comes to mind: the first script creates the file, then any subsequent script finds the file and aborts. It’s a cooperative system.

<?php

if (!file_exists('/tmp/exakat.lock')) {
  // No lock file yet: create it, do the work, then remove it
  $fp = fopen('/tmp/exakat.lock', 'w') ;
  doSomething() ;
  fclose($fp) ;
  unlink('/tmp/exakat.lock') ;
} else {
  print "Another script is running";
}

?>

The solution is simple. However, there is a ‘race condition’: between the call to file_exists() and the call to fopen(), however small the gap, there is time for another script to create the file too. This is also known as a TOCTOU problem: ‘time of check to time of use’. Granted, the time window is a few milliseconds at most, but still.

If that window of uncertainty is not acceptable, there is an alternative: using directories. A file may be opened by two different processes, which happily overwrite each other. Directories don’t have that problem: they can be created and removed, but they can’t be opened for writing. So the best way to check for the presence of a directory is to try to create it. If the creation fails, the directory is already there. If the creation succeeds, the directory was not there. mkdir() does the check and the creation in a single operation, a bit like merging file_exists() and fopen().

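<?php

// mkdir() tests and takes the lock in one call;
// it returns false (and emits a warning) when the directory already exists
if (mkdir('/tmp/exakat.lock', 0700)) {
  doSomething() ;
  rmdir('/tmp/exakat.lock') ;
} else {
  print "Another script is running" ;
}

?>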

This time, no TOCTOU, no race condition, and a light impact on the file system. So we may move to the next question: what happens in case of a crash?

The previous solution relies on both the creation AND the removal of the directory when the script reaches its end. In case of a crash or an interruption, this brilliant system leaves garbage behind, including the lock directory. PHP is clever at cleaning up after itself, but the directory is out of its reach: in fact, that is by design. So when we restart the process, we hit a wall: "another script is already running".

Crash-proof locking

So crash-proofing is the new challenge. We need a system that cleans up in case of a crash, just like PHP does for everything else. register_shutdown_function() could cover the cases of exit(), die() and uncaught exceptions in the code, but it wouldn’t be enough for a crash or an interruption.
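
To make the limits of that approach concrete, here is a minimal sketch of the directory lock combined with register_shutdown_function(). The acquireDirLock() helper and the lock path are hypothetical names chosen for the illustration. The cleanup runs on a normal end, on exit() or die() and on fatal errors, but not when the process is killed outright, which is exactly the gap described above.

```php
<?php

// Hypothetical helper: take a directory lock and schedule its removal.
function acquireDirLock(string $dir): bool
{
    // @ hides the warning mkdir() emits when the directory already exists
    if (!@mkdir($dir, 0700)) {
        return false;
    }

    // Runs on normal end, exit(), die() and fatal errors,
    // but not when the process is killed (for example with SIGKILL)
    register_shutdown_function(function () use ($dir) {
        if (is_dir($dir)) {
            rmdir($dir);
        }
    });

    return true;
}

// The lock path below is only an example
if (acquireDirLock(sys_get_temp_dir() . '/exakat-demo.lock')) {
    print "Lock acquired\n";
} else {
    print "Another script is running\n";
}
```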

For those special conditions, we need something that is cleaned up or released even if PHP itself crashes. PHP does a good job of cleaning up after itself, so there is actually a plethora of solutions. For example, there is tmpfile(), which opens a temporary file. It can be used for any file manipulation, and the file is removed as soon as the script terminates (though I suspect a crash would leave some files in the tmp folder, but I may be too suspicious). tmpfile() is not usable for locking, though: it returns a file pointer to a file whose name is chosen by the system, not by the script. Without a name known in advance, there is no way to coordinate several instances.

Semaphore locking

Another tool that is automatically released is the semaphore. It comes with one of the core PHP extensions, activated with a simple --enable-sysvsem at compile time. It works like this:

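<?php

// __FILE__ and 'e' are arbitrary: any readable path and any single character will do
$key = ftok(__FILE__, 'e');
$semaphore = sem_get($key, 1);
// true makes sem_acquire() return immediately instead of waiting
if (sem_acquire($semaphore, true)) {
  doSomething() ;
  sem_release($semaphore) ;
} else {
  print "Another process is running\n";
}

?>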

ftok() converts a path and one character into an integer. __FILE__ is a good candidate for the path, and any character will do: the character is handy when you need several locks on the same path. Then sem_get() prepares the semaphore. There are actually four arguments, and the fourth, omitted here because it has a default value, is aptly named $auto_release: the semaphore is automatically released at the end of the script, whatever its fate. Then sem_acquire() does the hard work of obtaining the semaphore. The second argument is $nowait (renamed $non_blocking in PHP 8): without it, the call waits indefinitely; with it, the call returns immediately. In the example, the final sem_release() is actually redundant.
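
As a small illustration of the "several locks on the same path" remark, the sketch below derives two keys from one file by changing only the character. The lockKey() helper and the two lock names are hypothetical; only ftok() itself is a real PHP function, and it does not need the semaphore extension.

```php
<?php

// Hypothetical helper: one key per (path, character) pair.
// ftok() returns -1 when the path cannot be read.
function lockKey(string $path, string $character): int
{
    return ftok($path, $character);
}

// Two made-up locks built on the same file, told apart by the character
$analysisKey = lockKey(__FILE__, 'a');
$cleanupKey  = lockKey(__FILE__, 'c');

if ($analysisKey === -1 || $cleanupKey === -1) {
    print "Could not build the keys\n";
} else {
    printf("analysis key: %d, cleanup key: %d\n", $analysisKey, $cleanupKey);
}
```

Each key could then be passed to its own sem_get() call, giving two independent semaphores.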

Semaphores were made for the very purpose of building locking systems, and they are the best-suited solution.

Too much compilation for Docker

The next snag came when creating the Dockerfile. On a default php:7.1-cli image, the semaphore extension is not compiled in. Even though it is a core extension enabled with an --enable flag, it is not available by default. So I added the semaphore extension to the PHP of the Docker image with the ad hoc scripts, and the size of the image doubled. Adding 400 MB just to make four calls to sem_*() and ftok() was a bit too much, so I resumed the search for a lighter solution: one that could fit in a slim PHP.

Socket to lock

Without semaphores and files, what is the next resource that PHP cleans up automatically when it crashes? Sockets. If PHP opens a socket and then crashes, the socket is closed. The idea is now to bind a socket to a port: if the port is already taken, then the lock is held. PHP has several ways to open sockets, such as socket_bind() or fsockopen(): both are in core, but not in php:7.1-cli. One extension that can’t be removed is streams, and there is a good function there: stream_socket_server().

Finally, streams and sockets

This function is used to build a local server and reserve a port. It takes an IP address (0.0.0.0 will do) and an arbitrary port. We should avoid anything below 1024 and pick a port that is not already reserved (3306 for MySQL, 7474 for Neo4j, etc.): there are plenty to choose from.

<?php

// @ hides the warning emitted when the port is already taken
$stream = @stream_socket_server('tcp://0.0.0.0:7600', $errno, $errstr, STREAM_SERVER_BIND) ;

if ($stream) {
  doSomething() ;
  fclose($stream) ;
} else {
  print "Another process is running\n";
}

?>

fclose() is the closing function for a stream socket, so whenever we can, we close cleanly. We can register that call for shutdown, and when PHP crashes, the socket is automatically closed and removed by the system. The STREAM_SERVER_BIND option is used alone, as it reserves the port but does not listen: we don’t actually need to do any networking, so let’s not do too much.
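
Put together, the port lock and its shutdown registration could look like the sketch below. The tryPortLock() helper and its $flags parameter are hypothetical, introduced only for this illustration. Whether a bind-only socket refuses a second bind on the same port may depend on the address-reuse rules of the operating system, so this is worth checking on the target platform; passing STREAM_SERVER_BIND | STREAM_SERVER_LISTEN is a possible alternative.

```php
<?php

// Hypothetical helper: returns the bound socket, or null if the port is taken.
// $flags defaults to the bind-only mode described above.
function tryPortLock(string $address, int $flags = STREAM_SERVER_BIND)
{
    // @ hides the warning raised when the port is already in use
    $stream = @stream_socket_server($address, $errno, $errstr, $flags);

    return $stream === false ? null : $stream;
}

$lock = tryPortLock('tcp://0.0.0.0:7600');
if ($lock === null) {
    print "Another process is running\n";
} else {
    // Close cleanly on exit(), die() and fatal errors;
    // the operating system closes the socket if the process is killed
    register_shutdown_function(function () use ($lock) {
        if (is_resource($lock)) {
            fclose($lock);
        }
    });
    print "Lock acquired\n";
}
```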

Since stream_socket_server() reports an error when the port is not available, we have to hide it with @ or error_reporting(0). I hate using @, as it is slow and ugly and it kills kittens. It is required here, since hitting an error is part of the design. So I have to swallow my pride and keep it.

Epilogue

This whole odyssey wouldn’t be complete without the final boss, the ogre that viciously hits you when you least expect it and requires great effort to vanquish. And indeed, it was waiting.

After preparing the above solution, I felt satisfied and moved the code into the Exakat engine. The refactoring went fast, as it was simple enough. I immediately started to test on several pieces of code: the first run went well, even when trying to start other analyses at various stages of the run. The second analysis stopped dead in its tracks: "Another process is already running".

That message seemed quite wrong, since there was only one analysis running. I waited a bit, expecting the port to be released after some delay, but it wasn’t. Ports are a bit more difficult to check than files: netstat helped. And lsof too:

$ lsof -i :7500
COMMAND   PID USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
java    40412 dams    3u  IPv4 0xee75f0ba12d95641      0t0  UDP *:7500

So the process with PID 40412, left over from the last analysis, actually kept the socket open. Killing the process solved it but, of course, the problem came back at the next analysis. Something was not releasing the port. And, for some reason, it was "java" that held it hostage. Now, that was strange!

The opened port was given to another process

In fact, "java" is part of Exakat, since Neo4j runs on Java. So the culprit was not a total stranger. The explanation is pretty simple: when Exakat starts executing, it often restarts Neo4j to clean the database for the new tokens. It is faster to stop Neo4j, remove the data directory and restart than to delete all the nodes with a query. exec() is used to do all of that.

Now, starting Neo4j with exec() makes Neo4j a child process of the current process, and a child process inherits the descriptors its parent has open, including the socket. When the parent dies or finishes, it releases all its resources, but the child still holds its own copy of the opened port. Neo4j is a database that often stays on, even after the end of the analysis. And it is run by Java. So once Exakat finishes, the socket is not released: it lives on in Neo4j, which stays on and keeps it open. This prevents the next Exakat from reserving it.

In the end, the problem resided in the link between the current process and the Neo4j child process. If Exakat opens the socket first and then restarts Neo4j, the two are linked and Neo4j inherits the open socket. But if Neo4j is restarted before the socket is opened, the link is not established, and the socket is released when the script finishes, while the database stays on. Once the order of operations was defined, it was easy to fix this in the code.