[concurrency-interest] A new Lock implementation: FileLock

Gregg Wonderly gregg at cytetech.com
Tue Aug 30 14:30:48 EDT 2005

Dawid Kurzyniec wrote:
>> Advisory file locking is straightforward to use.  When we used 
>> link(2) to create advisory locks in unix, we got into problems with 
>> stuck locks when processes crashed. (...) 
> Again, there is no way around that. At some level, stuff gets written to 
> the filesystem. If the accessor service, or a local resource manager 
> involved in a distributed transaction, crashes during a filesystem 
> operation, it leaves inconsistent data. If the data is left "locked", 
> you have a stale lock - similar in behavior to Thread.suspend in Java. 
> But if you release the lock, you leave unprotected inconsistent data - 
> like Thread.stop does. In either case, you need some error-recovery 
> procedure; just restarting the crashed process won't suffice. At some 
> level, you simply have to deal with that, possibly resorting to human 
> intervention, because file systems are not transactional. File locks and 
> renaming files as a "commit" are as good as it gets.

In the Jini transaction specification there is a section on recovery after a crash.  It says that a 
transaction participant is not supposed to respond to the commit request until it has persisted enough information to 
know how to fully recover the state it was in prior to the crash.  If it votes commit and then crashes during the 
activities of the commit, it has to be prepared to recover to a committed state.  If it refuses the commit, it has to be 
prepared to roll back on restart.
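
A minimal sketch of that "persist before you answer" rule, in plain Java rather than Jini.  VoteLog and its 
Vote enum are hypothetical names of mine, not part of any Jini API.  The vote is written to a temporary file, forced 
to disk, and then renamed into place, so a restart sees either the old record or the new one:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper (not a Jini class): durably records a participant's vote. */
final class VoteLog {
    enum Vote { NONE, COMMIT, ABORT }

    private final Path file;

    VoteLog(Path file) {
        this.file = file;
    }

    /** Persist the vote. The participant should reply only after this returns. */
    void record(Vote vote) throws IOException {
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        try (FileChannel ch = FileChannel.open(tmp,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(vote.name().getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(true); // flush data and metadata before the rename
        }
        // The rename acts as the "commit" of the record. Whether an existing
        // target is replaced under ATOMIC_MOVE is platform specific.
        Files.move(tmp, file, StandardCopyOption.ATOMIC_MOVE);
    }

    /** Called on restart: returns the last recorded vote, or NONE if none exists. */
    Vote recover() throws IOException {
        if (!Files.exists(file)) {
            return Vote.NONE;
        }
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
        return Vote.valueOf(text);
    }
}
```

On restart the participant would call recover() and then either finish the commit (COMMIT), roll back (ABORT), or 
treat the transaction as not yet voted on (NONE).  A real participant would also have to persist the data needed to 
redo or undo the work, which this sketch leaves out.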

The intermediate case, where the transaction has not been voted on yet, requires careful attention to detail.
You need to keep a crash count and increment it each time you restart.  The API has you provide the crash count to the 
transaction manager so that, when you rejoin the transaction, it can say "hey, you crashed, and we're already voting" or 
"hey, you crashed, and we're still waiting to vote".  The transaction manager can then answer your join request with an 
appropriate response, and the application can handle that response accordingly.
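
To make the crash count idea concrete, here is a toy in-memory model of the manager side.  It follows the description 
above, not the actual Jini interfaces: ToyTransactionManager, Phase and JoinResult are hypothetical names, and the 
real manager's responses and exceptions should be taken from the specification, not from this sketch.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a transaction manager that tracks crash counts (hypothetical, not Jini). */
final class ToyTransactionManager {
    enum Phase { ACTIVE, VOTING }

    enum JoinResult {
        JOINED,                    // first time this participant is seen
        REJOINED_SAME_INCARNATION, // same crash count as before: no crash happened
        CRASHED_BEFORE_VOTING,     // crash count changed, vote not started yet
        CRASHED_DURING_VOTING      // crash count changed, vote already under way
    }

    // Last crash count reported by each participant.
    private final Map<String, Long> crashCounts = new HashMap<>();
    private Phase phase = Phase.ACTIVE;

    void beginVoting() {
        phase = Phase.VOTING;
    }

    /** A participant joins, or rejoins after a restart, reporting its crash count. */
    JoinResult join(String participantId, long crashCount) {
        Long known = crashCounts.put(participantId, crashCount);
        if (known == null) {
            return JoinResult.JOINED;
        }
        if (known == crashCount) {
            return JoinResult.REJOINED_SAME_INCARNATION;
        }
        // A different count means the participant restarted since it last joined.
        return phase == Phase.VOTING
                ? JoinResult.CRASHED_DURING_VOTING
                : JoinResult.CRASHED_BEFORE_VOTING;
    }
}
```

The participant has to keep its crash count in stable storage and bump it on every restart before rejoining.  Otherwise 
the manager cannot tell a restarted participant from one that never went away.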

It is this specific behavior, encapsulated in the API and the functionality of the transaction manager, that provides 
the power to recover from a stuck lock by just restarting the failed process.  Rather than the actions being a 
peripheral activity of the participant, as you describe above, they are an integral part of the API.  That's the stuff 
that you don't have to reinvent, rediscover or otherwise stumble through.

Gregg Wonderly
