[concurrency-interest] A new Lock implementation: FileLock

Gregg Wonderly gregg at cytetech.com
Tue Aug 30 17:41:35 EDT 2005



Dawid Kurzyniec wrote:
> This is very interesting, but doesn't change the picture much. All it 
> says is that a process can be able to restore its persistent storage to 
> a consistent state when restarted after crash, by utilizing the fact 
> that filesystem operations are idempotent.

I said persistent, not stored on a filesystem.  It might be a filesystem, but it might be something else.

 > However, the Jini transaction
> API does not help a squad in achieving that. It makes state repair after 
> crash a sole responsibility of a transaction participant.

Yes, but because it's already part of the system design, users don't have to invent it.  So, it's a value added by the 
system.  That's the point I am trying to make.  Not that it waters the plants and lets the dog out.  It already 
does all the software things that you'd need the software to do.

>> It is this specific behavior, encapsulated into the API and the 
>> functionality of the transaction manager, that provides the power to 
>> recover from a stuck lock by just restarting the failed process.  
>> Rather than the actions being a peripheral activity of the 
>> participant, as you describe above, it is an integral part of the 
>> API.  That's the stuff that you don't have to reinvent, rediscover or 
>> otherwise stumble through.
>>
> As I said above, the application programmer has to implement the 
> recovery herself in both cases, and the code would look exactly the 
> same. 

For any transactional system, yes, you'd have to do all of this stuff.  The point is that it's already described, 
documented and designed so that you just have to implement the pieces.  That's the value added.  The fact that there is 
a public API on top is an additional value added.  Once your code knows how to use the Jini transaction manager service 
interface, you can use any such compliant manager that might come into existence, right?

 > (And what "provides the power" is not a transaction manager but
> the filesystem idempotency). The only reason I might want to adhere to a 
> distributed transaction API when coding that is if I was making it a 
> part of a distributed transaction. I still can't see how it would do me 
> any good, and even how to go about it, if I was implementing a singleton 
> system service or a mail client. The TX API is about coordinating 
> changes of private (persistent) states between distributed collaborants. 
> File locking is about synchronizing access to a shared, common state. 
> Where's the connection?...

A lock's state is always transactionally managed.  You decide that you are ready for the lock to be locked or unlocked 
at specific places.  try { } finally {} is one such fine-grained transactional approach, where a particular outcome is 
guaranteed to be reached without other intervening issues.  So, when you share a transaction, 
you can use the commit vote as a stepping point.  I would implement a distributed lock with transactions using this 
algorithm:

TransactionParticipant p = new MyLocalParticipant();
Transaction.Created mytrans = TransactionFactory.create(trmgr, Lease.FOREVER);
LeaseRenewalManager lrm = new LeaseRenewalManager();

MyLeaseListener leaselis = new MyLeaseListener( mytrans );
// We pass a lease listener here that might help mitigate the
// failure of the transaction manager.  When it detects the lease
// state changing, it can do interesting things.
lrm.renewFor( mytrans.lease, Lease.FOREVER, leaselis );

DistributedLock lock = srvr.getLockAccess("TheWellKnownLockName");
Transaction otherTrans;

while( true ) {
	if( leaselis.transactionValid() == false ) {
		doSomethingInteresting();
	}

	// Attempt the lock with our transaction
	otherTrans = lock.testAndSet( mytrans );

	// Check if we got the lock, or someone else did
	if( otherTrans.equals(mytrans) == false ) {
		// Someone else's lock, join their transaction
		// and wait for it to complete.
		try {
			otherTrans.join( p );
			// if we care about the outcome, then there
			// is some logic that goes here to manage what
			// happens next.  If we don't care about
			// the outcome, then we can just wait for
			// some type of completion and continue.
			p.waitForCompleted();
		} catch( IOException ex ) {
			logException(ex);
		}
	} else {
		// we got the lock, do our work.
		doLockWork();

		// Now, commit the transaction and release the lock.
		try {
			mytrans.transaction.commit();
		} catch( RemoteException ex ) {
			logException(ex);
		} finally {
			lrm.cancel(mytrans.lease);
			// Pass in the owning transaction to make sure we release only
			// the correct lock, not another that occurred because of network
			// segmentation or other bugs.
			lock.release( mytrans );
		}
		break;
	}
}

I think this is a familiar algorithm.  The issue is that there are some needs for handling RemoteExceptions and some 
other related issues that make the API feel heavier.  But, it's the same logic with the same ordering and potential 
outcomes.
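
To make the "same logic, same ordering" claim concrete, here is a rough in-process analogue of the loop above.  It is 
only a sketch and not Jini code: LocalLockSketch, testAndSet, release and runExclusive are names made up for this 
example, and a plain Object token stands in for the transaction that owns the lock.

```java
import java.util.concurrent.atomic.AtomicReference;

public class LocalLockSketch {
    // Current owner token, or null when the lock is free (hypothetical design).
    private final AtomicReference<Object> owner = new AtomicReference<>();

    // Try to take the lock with our token; return whichever token owns it afterwards.
    public Object testAndSet(Object token) {
        while (true) {
            if (owner.compareAndSet(null, token)) {
                return token;
            }
            Object current = owner.get();
            if (current != null) {
                return current;
            }
            // The owner released between the two calls; try again.
        }
    }

    // Release only if the given token still owns the lock.
    public boolean release(Object token) {
        return owner.compareAndSet(token, null);
    }

    // Same ordering as the distributed loop: attempt, wait on the owner, work, release.
    public void runExclusive(Runnable work) throws InterruptedException {
        Object token = new Object();
        while (testAndSet(token) != token) {
            Thread.sleep(1); // stand-in for joining the owner's transaction and waiting
        }
        try {
            work.run();
        } finally {
            release(token);
        }
    }

    public static void main(String[] args) throws Exception {
        final LocalLockSketch lock = new LocalLockSketch();
        final int[] counter = {0};
        Runnable task = () -> {
            try {
                for (int i = 0; i < 1000; i++) {
                    lock.runExclusive(() -> counter[0]++);
                }
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        };
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println("counter = " + counter[0]);
    }
}
```

What the local version leaves out is exactly what makes the distributed one feel heavier: leases, remote failures, and 
a third party that can tell the waiters how the owner's work ended.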

My view is that we need to research how to encapsulate all this knowledge into less work for the user.  I think that 
would be a much better choice than having 5 variations for 5 different weights of application needs.  A common, singular 
API with service provider pluggability and other powerful mechanisms allows a single focus on getting work done instead 
of reinventing all the pieces that are needed for each type of application.

I've always found it easier to dummy out the activities of an all-encompassing API than to try to figure out how to expand 
the capabilities of a limited API to do more than it was designed to do.
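
As a small illustration of what I mean by dummying out, consider the sketch below.  Everything in it is hypothetical 
(LeasedLock, InProcessLock and the handle scheme are invented for the example, not taken from any real library): the 
interface admits leases and remote failures, and a purely local implementation simply ignores what it doesn't need.

```java
import java.io.IOException;
import java.util.concurrent.locks.ReentrantLock;

public class DummyOutSketch {
    /** Hypothetical broad API: leases and remote failures are part of the contract. */
    public interface LeasedLock {
        /** Acquire for at most leaseMillis; returns a handle used to release. */
        long acquire(long leaseMillis) throws IOException;

        void release(long handle) throws IOException;
    }

    /** In-process implementation that dummies out the lease and never throws. */
    public static final class InProcessLock implements LeasedLock {
        private final ReentrantLock lock = new ReentrantLock();
        private long nextHandle = 1;

        @Override
        public long acquire(long leaseMillis) {
            lock.lock();          // the lease duration is ignored locally
            return nextHandle++;  // incremented while holding the lock
        }

        @Override
        public void release(long handle) {
            lock.unlock();        // the handle is not needed in-process
        }
    }

    public static void main(String[] args) {
        LeasedLock lock = new InProcessLock();
        try {
            long handle = lock.acquire(Long.MAX_VALUE);
            try {
                System.out.println("working under handle " + handle);
            } finally {
                lock.release(handle);
            }
        } catch (IOException ex) {
            // Never happens with InProcessLock, but a remote implementation can be
            // plugged in later without touching this calling code.
            System.err.println("lock service unreachable: " + ex);
        }
    }
}
```

The caller pays for one catch block today, and in exchange nothing has to change when a networked implementation 
shows up.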

A simple example might help you understand my point.  Look at what had to happen with JMX in order to implement 
remoting.  It was originally designed without remoting in mind: the set of exceptions thrown by its methods did not 
take RMI use into account.

To solve the problem, the original MBeanServer interface was redefined to extend the MBeanServerConnection superinterface 
that provided remoting.  This was a fairly simple mechanism.  But, it required all JMX users that wanted remoting to 
change source code to use MBeanServerConnection instead of MBeanServer everywhere.  And then they had to deal with 
IOException everywhere.  That's a big impact on a lot of applications.
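
Here is a minimal sketch of that impact, using only the standard javax.management and java.lang.management classes. 
The class and method names (JmxConnectionSketch, countMBeans) are made up for the example.  Code written against 
MBeanServerConnection works with both a local server and a remote connection, but it has to carry IOException 
even when the server lives in the same VM.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.MBeanServerConnection;

public class JmxConnectionSketch {
    // Written against the superinterface, so it accepts a local MBeanServer or a
    // remote connection; the price is the declared IOException.
    public static int countMBeans(MBeanServerConnection conn) throws IOException {
        Integer count = conn.getMBeanCount();
        return count.intValue();
    }

    public static void main(String[] args) {
        MBeanServer local = ManagementFactory.getPlatformMBeanServer();
        try {
            System.out.println("MBeans visible: " + countMBeans(local));
        } catch (IOException ex) {
            // Not expected for the in-process server, but the signature requires it.
            System.err.println("connection failed: " + ex);
        }
        // The same call made directly on MBeanServer declares no IOException.
        System.out.println("MBeans (local call): " + local.getMBeanCount());
    }
}
```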

This is the type of thing that I'm trying to convey.  You can always start simple, but when you're done, is it still 
simple?  Hopefully it is, but there's a very small class of interprocess issues that are simple to solve.

Gregg Wonderly

