[concurrency-interest] A new Lock implementation: FileLock

Dawid Kurzyniec dawidk at mathcs.emory.edu
Tue Aug 30 16:20:59 EDT 2005


Gregg Wonderly wrote:

>
>
> Dawid Kurzyniec wrote:
>
>>> Advisory file locking is straightforward to use.  When we used 
>>> link(2) to create advisory locks in unix, we got into problems with 
>>> stuck locks when processes crashed. (...) 
>>
>>
>> Again, there is no way around that. At some level, stuff gets written 
>> to the filesystem. If the accessor service, or a local resource 
>> manager involved in a distributed transaction, crashes during a 
>> filesystem operation, it leaves inconsistent data. If the data is 
>> left "locked", you have a stale lock - similarly in behavior to 
>> Thread.suspend in Java. But if you release the lock, you leave 
>> unprotected inconsistent data - like Thread.stop does. In either 
>> case, you need some error-recovery procedure; just restarting the 
>> crashed process won't suffice. At some level, you simply have to deal 
>> with that, possibly resorting to human intervention, because file 
>> systems are not transactional. File locks and renaming files as a 
>> "commit" are as good as it gets.
>
>
> In the Jini transaction specification there is a section on recovery 
> after crash.  What is described is that a transaction participant is 
> not supposed to respond to the commit request, until it has persisted 
> enough information to know how to fully recover to the state it was in 
> prior to the crash.  If it votes commit, and crashes during the 
> activities of the commit, it has to be prepared to recover to a 
> committed state.  If it refuses the commit, it has to be prepared to 
> roll back on restart.
>
> The intermediate case where the transaction has not been voted on yet, 
> requires some careful attention to details.
> You need to keep a crash count, and increment it when you restart.  
> The API includes providing the crashcount to the transaction manager 
> so that when you rejoin the transaction, it can say, hey, you crashed, 
> and we're already voting, or hey you crashed, and we're still waiting 
> to vote.  The transaction manager can then respond to your join 
> request with an appropriate response, and the application can handle 
> that response accordingly.
>
This is very interesting, but it doesn't change the picture much. All it 
says is that a process can restore its persistent storage to a 
consistent state when restarted after a crash, by exploiting the fact 
that filesystem operations are idempotent. However, the Jini transaction 
API does not help at all in achieving that. It makes state repair after 
a crash the sole responsibility of the transaction participant. What's 
more, the whole scheme assumes that there is a reliable mechanism for 
detecting crashes and restarting processes. The "mechanism" can be a 
sysadmin, or yet another process, but that process can crash too, so 
eventually human supervision is needed.
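
To make the idempotency point concrete, here is a minimal sketch of 
rename-as-commit with a recovery step that is safe to repeat. It assumes 
Java 7+ NIO, which is newer than this thread; the class name 
RenameCommit, the ".tmp" suffix, and both method names are made up for 
illustration. It does not force data to disk, so it illustrates the 
idea and is not a durable implementation.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration; none of these names come from the thread.
class RenameCommit {

    // Assumed convention: the uncommitted copy lives next to the target
    // under the same name plus ".tmp".
    static Path tmpFor(Path target) {
        return target.resolveSibling(target.getFileName() + ".tmp");
    }

    // Write the new contents to the temp file, then rename it over the
    // target. The rename is the "commit". Whether ATOMIC_MOVE replaces an
    // existing target is implementation specific according to the Javadoc.
    static void commit(Path target, String contents) throws IOException {
        Path tmp = tmpFor(target);
        Files.write(tmp, contents.getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }

    // Run on every restart. A leftover temp file means a crash happened
    // before the rename, so the update was never committed and is dropped.
    // Running this any number of times gives the same result.
    static void recover(Path target) throws IOException {
        Files.deleteIfExists(tmpFor(target));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("rename-commit");
        Path target = dir.resolve("state.txt");
        commit(target, "v1");

        // Simulate a crash in the middle of a second update: only the
        // partially written temp file exists.
        Files.write(tmpFor(target), "partial v2".getBytes(StandardCharsets.UTF_8));

        recover(target);
        recover(target); // repeating recovery is harmless
        System.out.println(new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
    }
}
```

Note that nothing here detects the crash or triggers the restart; 
something outside the process still has to call recover().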

> It is this specific behavior, encapsulated into the API and the 
> functionality of the transaction manager, that provides the power to 
> recover from a stuck lock by just restarting the failed process.  
> Rather than the actions being a peripheral activity of the 
> participant, as you describe above, it is an integral part of the 
> API.  That's the stuff that you don't have to reinvent, rediscover or 
> otherwise stumble through.
>
As I said above, the application programmer has to implement the 
recovery herself in both cases, and the code would look exactly the 
same. (And what "provides the power" is not the transaction manager but 
filesystem idempotency.) The only reason I might want to adhere to a 
distributed transaction API when coding that is if I were making it 
part of a distributed transaction. I still can't see how it would do me 
any good, or even how to go about it, if I were implementing a singleton 
system service or a mail client. The TX API is about coordinating 
changes of private (persistent) states between distributed collaborators. 
File locking is about synchronizing access to a shared, common state. 
Where's the connection?...
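
For contrast, this is roughly what "synchronizing access to a shared, 
common state" looks like with a plain java.nio.channels.FileLock. It is 
only a sketch, assuming Java 7+ (for try-with-resources and 
java.nio.file); the class SharedCounter and its file format are invented 
for the example and are not the Lock implementation discussed in this 
thread.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration: a counter kept as decimal text in a file
// that several processes may update.
class SharedCounter {

    static long increment(Path file) throws IOException {
        // Resources close in reverse order: the lock is released first,
        // then the channel is closed.
        try (FileChannel ch = FileChannel.open(file,
                 StandardOpenOption.CREATE,
                 StandardOpenOption.READ,
                 StandardOpenOption.WRITE);
             // Blocks until this process holds an exclusive lock. The lock
             // is held on behalf of the whole JVM, so it does not
             // coordinate threads inside one JVM.
             FileLock lock = ch.lock()) {

            // Read the current value; an empty file counts as 0.
            ByteBuffer buf = ByteBuffer.allocate(32);
            while (buf.hasRemaining() && ch.read(buf, buf.position()) > 0) {
                // keep reading until end of file or the buffer is full
            }
            String text = new String(buf.array(), 0, buf.position(),
                                     StandardCharsets.UTF_8).trim();
            long value = text.isEmpty() ? 0 : Long.parseLong(text);

            // Write the new value back. A crash between truncate and write
            // leaves an empty file, which is exactly the kind of
            // inconsistency that needs a separate recovery step.
            value++;
            ch.truncate(0);
            ch.write(ByteBuffer.wrap(
                Long.toString(value).getBytes(StandardCharsets.UTF_8)), 0);
            return value;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempDirectory("shared-counter").resolve("counter.txt");
        System.out.println(increment(file));
        System.out.println(increment(file));
    }
}
```

There is no participant, vote, or crash count anywhere in this: the lock 
only keeps two processes from touching the file at the same time, and 
whether other programs are forced to honor it is platform dependent.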

Regards,
Dawid


