[concurrency-interest] A new Lock implementation: FileLock
gregg at cytetech.com
Tue Aug 30 14:30:48 EDT 2005
Dawid Kurzyniec wrote:
>> Advisory file locking is straight forward to use. When we used
>> link(2) to create advisory locks in unix, we got into problems with
>> stuck locks when processes crashed. (...)
> Again, there is no way around that. At some level, stuff gets written to
> the filesystem. If the accessor service, or a local resource manager
> involved in a distributed transaction, crashes during a filesystem
> operation, it leaves inconsistent data. If the data is left "locked",
> you have a stale lock - similarly in behavior to Thread.suspend in Java.
> But if you release the lock, you leave unprotected inconsistent data -
> like Thread.stop does. In either case, you need some error-recovery
> procedure; just restarting the crashed process won't suffice. At some
> level, you simply have to deal with that, possibly resorting to human
> intervention, because file systems are not transactional. File locks and
> renaming files as a "commit" are as good as it gets.
The Jini transaction specification has a section on recovery after a crash. It says that a transaction participant
must not respond to the commit request until it has persisted enough information to recover fully to the state it was
in before the crash. If it votes to commit and then crashes while carrying out the commit, it has to be prepared to
recover to a committed state. If it refuses the commit, it has to be prepared to roll back on restart.
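To make the "persist before you answer" rule concrete, here is a minimal sketch. It is not the Jini API: the class
DurableVoteLog, its Vote enum, and the one-line-per-vote log format are all hypothetical names made up for illustration,
and it assumes a modern JDK with java.nio.file. The only point is the ordering: the vote is forced to disk first, and
only then handed back to be sent to the manager.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper (not part of Jini): logs a participant's vote durably. */
class DurableVoteLog {
    enum Vote { PREPARED, ABORTED }

    private final Path logFile;

    DurableVoteLog(Path logFile) {
        this.logFile = logFile;
    }

    /**
     * Append the vote to the log with synchronous writes. Only after this
     * returns should the caller send the vote to the transaction manager.
     */
    Vote recordVote(long txnId, Vote vote) throws IOException {
        String line = txnId + " " + vote.name() + System.lineSeparator();
        Files.write(logFile, line.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND,
                StandardOpenOption.SYNC);
        return vote;
    }

    /** On restart: return the last vote logged for txnId, or null if none. */
    Vote recover(long txnId) throws IOException {
        if (!Files.exists(logFile)) {
            return null;
        }
        Vote last = null;
        for (String line : Files.readAllLines(logFile, StandardCharsets.UTF_8)) {
            String[] parts = line.trim().split(" ");
            if (parts.length == 2 && Long.parseLong(parts[0]) == txnId) {
                last = Vote.valueOf(parts[1]);
            }
        }
        return last;
    }
}
```

On restart, a PREPARED result from recover() means the participant must be able to finish the commit, an ABORTED result
means it must roll back, and null is the not-yet-voted case.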
The intermediate case, where the transaction has not been voted on yet, requires careful attention to detail.
You need to keep a crash count and increment it each time you restart. The API has you pass the crash count to the
transaction manager, so that when you rejoin the transaction it can say "hey, you crashed, and we're already voting" or
"hey, you crashed, and we're still waiting to vote". The transaction manager can then answer your join request with an
appropriate response, and the application can handle that response accordingly.
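A toy model of that exchange, for illustration only. ToyTransactionManager, JoinResponse and startVoting() are
hypothetical names, it tracks a single transaction in memory, and it returns a value where a real manager may well
signal the mismatch differently (as far as I recall, the actual Jini TransactionManager.join takes the crash count as a
parameter and reports a mismatch with a CrashCountException). The sketch only shows how comparing crash counts lets the
manager tell a restarted participant from a repeated join.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical single-transaction manager; not the Jini interface. */
class ToyTransactionManager {
    enum JoinResponse { JOINED, CRASHED_BEFORE_VOTE, CRASHED_DURING_VOTE }

    // Last crash count each participant presented when it joined.
    private final Map<String, Long> lastCrashCount = new HashMap<>();
    private boolean voting = false;

    /** Called when the manager begins collecting votes. */
    void startVoting() {
        voting = true;
    }

    /**
     * A first join, or a rejoin with an unchanged crash count, is accepted
     * as-is. A changed crash count means the participant restarted since it
     * last joined, so the answer depends on whether voting has begun.
     */
    JoinResponse join(String participantId, long crashCount) {
        Long previous = lastCrashCount.put(participantId, crashCount);
        if (previous == null || previous == crashCount) {
            return JoinResponse.JOINED;
        }
        return voting ? JoinResponse.CRASHED_DURING_VOTE
                      : JoinResponse.CRASHED_BEFORE_VOTE;
    }
}
```

The participant's side of the bargain is to store its crash count durably and bump it on every restart, so that the
number it presents on rejoin really does differ after a crash.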
It is this specific behavior, encapsulated in the API and in the functionality of the transaction manager, that provides
the power to recover from a stuck lock by just restarting the failed process. Rather than the recovery actions being a
peripheral activity of the participant, as you describe above, they are an integral part of the API. That's the stuff
that you don't have to reinvent, rediscover or otherwise stumble through.