[concurrency-interest] [Fwd: Re: A Fast Wait-Free HashTable]

Chris Purcell chris.purcell.39 at gmail.com
Sat Feb 24 20:22:31 EST 2007


>>  (1) How is the algorithm wait-free? It seems to me that if thread A starts
>> trying to insert a key that hashes to, say, 0; and if another, much faster,
>> thread B inserts an infinite sequence of distinct keys that all hash to 0;
>> then it's perfectly possible for thread A to never succeed, being always
>> preempted in its next possible move by thread B.
>
>  When A fails because of some other write (that's the
> 'no-endless-spurious-failure-CAS' requirement), then A
> treats it as if A succeeded but was immediately overwritten by B, and A can
> claim (invisible) success.

But in each case, the key B inserts merely hashes to the same
constant; it is not the same key. (Indeed, since this is an
open-addressed hash table, one can easily construct an example where
none of the keys even hash to the same constant.) Hence none of B's
operations can be considered to have overwritten A's, and A must
continue indefinitely.
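
To make the scenario concrete, here is a sketch (the key class and the
driver are mine, purely for illustration; ConcurrentHashMap stands in
for the open-addressed table under discussion): every instance of the
key type hashes identically, yet no two are equal, so each of B's
inserts claims a fresh slot in the probe chain rather than overwriting
anything of A's.

    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical key type: all instances hash to the same constant,
    // but equals() is inherited from Object, so no two are equal.
    final class CollidingKey {
        @Override public int hashCode() { return 0; }
    }

    class Starver {
        // Thread B's loop: an endless stream of *distinct* colliding
        // keys, each landing in the probe chain ahead of A's pending
        // CAS, which can therefore fail forever.
        static void threadB(ConcurrentHashMap<Object, String> map) {
            while (true) map.put(new CollidingKey(), "x");
        }
    }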

>  Doug Adds:
>
>> It is obstruction-free, not necessarily wait-free
> I'm not the pro here, so What's the difference?

Wait-free means there exists an a priori bound on the number of steps
a thread must take to complete its *own* operation. Lock-free means
there exists an a priori bound on the number of steps a thread must
take before being assured that there has been *global* progress.
Obstruction-free means there exists an a priori bound on the number of
steps a thread executing in *isolation* must take before completing
its own operation. Wait-free is strictly stronger than lock-free, and
lock-free is strictly stronger than obstruction-free.
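
A standard illustration of the gap (my own example, not from this
thread): the CAS retry loop below is lock-free, since my CAS can only
fail because some other thread's CAS succeeded, but it is not
wait-free, since nothing bounds how often I personally retry. Run in
isolation it completes in one attempt, so it is obstruction-free too.

    import java.util.concurrent.atomic.AtomicInteger;

    class CasLoop {
        // Lock-free: a failed CAS implies another thread succeeded,
        // so every step sees global progress. Not wait-free: there is
        // no a priori bound on this thread's own retries.
        static int increment(AtomicInteger c) {
            for (;;) {
                int v = c.get();
                if (c.compareAndSet(v, v + 1)) return v + 1;
            }
        }
    }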

I believe your algorithm is lock-free, but wait-free seems like an
impossible goal for a fast hash table. Hence my interest in assessing
the claim!

>> (2) Is this algorithm significantly different...
>
>  Let's see:  no CAS-size > ptr-size requirement (hence no >64b CAS
> requirement on 64b pointers), no version...

Sorry, I can see that my "significantly" there may have come across
as offensive; none was intended. So the contributions (setting aside
the wait-free issue) are:

 * Given a garbage collector, the version counter field is
unnecessary. This also appears to be true in the unmodified Martin and
Davis algorithm; I can't see how the counter was used. An important
optimization, as it means 32-bit CAS is sufficient (or 64-bit CAS on a
64-bit machine). See the sketch after this list.

 * Key and value can safely be referenced by separate slots, doubling
the size of the hash table but potentially reducing memory churn. I'm
not sure how this works out overall.
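
To spell out the first point (a minimal sketch; the names are my own
invention, not Cliff's): if each slot holds a bare reference, a
pointer-sized CAS suffices, and the garbage collector rules out ABA
because a key object cannot be reclaimed and reused while any thread
still holds a reference to it.

    import java.util.concurrent.atomic.AtomicReferenceArray;

    class Slots {
        // One bare reference per slot: pointer-sized CAS only. Under
        // GC no version counter is needed, since an old key object
        // cannot be recycled into a new identity while a reference to
        // it survives, so a slot never silently reverts to a value a
        // stalled thread is still comparing against.
        final AtomicReferenceArray<Object> keys;

        Slots(int capacity) {
            keys = new AtomicReferenceArray<Object>(capacity);
        }

        boolean claim(int i, Object key) {
            return keys.compareAndSet(i, null, key); // CAS empty -> key
        }
    }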

> readers do not need to realize a table-copy is in progress unless...

I think this is true of the Martin and Davis algorithm, too.

>  Rehash can be triggered in a number of ways, but if you use "table
> fullness" you need to maintain a table-full value - which is hard to do if
> 768 CPUs are hammering away at the table size!  I am triggering off some
> thread-local state at the moment (too many reprobes on this attempt) but the
> heuristics are still in flux.

Yes, that's a tricky one. I'm looking forward to seeing how you solve it.
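
As I read the quoted heuristic, it amounts to something like the
following (the class name and threshold are my guesses): each
operation counts its own reprobes, and a thread that reprobes too
often concludes, from purely local state, that a resize is due.

    import java.util.concurrent.atomic.AtomicReferenceArray;

    class ReprobeTable {
        static final int REPROBE_LIMIT = 10;      // assumed threshold
        final AtomicReferenceArray<Object> keys;  // power-of-two size

        ReprobeTable(int capacity) {
            keys = new AtomicReferenceArray<Object>(capacity);
        }

        // Returns the slot claimed for key, or -1 meaning "too many
        // reprobes on this attempt: trigger a resize". The decision
        // uses only this thread's counter, so 768 CPUs never contend
        // on a shared table-full value.
        int claimSlot(Object key) {
            int len = keys.length();
            int idx = key.hashCode() & (len - 1);
            for (int reprobes = 0; reprobes < REPROBE_LIMIT; reprobes++) {
                Object k = keys.get(idx);
                if (k == null && keys.compareAndSet(idx, null, key))
                    return idx;                   // claimed empty slot
                if (key.equals(keys.get(idx)))
                    return idx;                   // key already present
                idx = (idx + 1) & (len - 1);      // linear reprobe
            }
            return -1;                            // vote to resize
        }
    }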

> 2. Given that all hash tables share a lot of commonalities, how "different"
> is "significantly different", and who cares? Do you have a PDF version of
> this 10-yr-old paper?  I could only find the above PS version.

Again, apologies for any offense; I was unsure whether my initial
appraisal of the similarities was correct. Probably digging myself
into a hole here :)

>>  What did the test harness look like?
>  I copied Doug's - so "synthetic workload, choosing keys uniformly at random
> from a limited set" is exactly it.  In my case I purposely choose a small
> set of keys (1000) to force high levels of write conflicts during updates.

I've used Doug's workload in my experiments, too.
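
For concreteness, here is the shape of that loop as I run it (my own
reconstruction, not Doug's exact harness; the 1000-key parameter is
from your description above):

    import java.util.Random;
    import java.util.concurrent.ConcurrentMap;

    class Workload {
        // Keys drawn uniformly at random from a small fixed set, so
        // that updates collide heavily; each worker thread runs this.
        static void worker(ConcurrentMap<Integer, String> map, int ops) {
            final int KEYSPACE = 1000;  // small on purpose: conflicts
            Random rng = new Random();  // one RNG per worker thread
            for (int i = 0; i < ops; i++) {
                map.put(Integer.valueOf(rng.nextInt(KEYSPACE)), "v");
            }
        }
    }

With a keyspace that small, a weak generator can skew the conflict
rate, hence: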

What random number generator do you use?

> There are indeed some tradeoffs in resizing and tombstone scans that could
> stand better empirical investigation, but it's a work in progress, so give
> poor Cliff a break!

No attack intended! Just trying to figure out what situations the
numbers cover, and where to be cautious.

As a suggestion for future investigation, perhaps a good worst-case
test would be to have each thread maintain a set of "owned" keys, K_o.
At each mutate step, the thread picks a key kd from K_o and deletes
that key from the table; it then picks a new key ki uniformly at
random from the (very large) keyspace and inserts it into the table.
(At the end of the step, K_o := K_o - {kd} + {ki}.) That ensures a
bounded population together with a high creation rate of tombstones,
allowing the footprint/time tradeoff to be explored. I wonder if a
high rate of table replications will cause disproportionately high
synchronization overheads, especially on a 768-way?
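
In code, the step I have in mind looks roughly like this (a sketch;
class and parameter names are mine):

    import java.util.ArrayList;
    import java.util.Random;
    import java.util.concurrent.ConcurrentMap;

    class TombstoneChurn {
        final ArrayList<Long> owned = new ArrayList<Long>(); // K_o
        final Random rng = new Random();

        // Pre-populate K_o so the table starts at steady population.
        TombstoneChurn(ConcurrentMap<Long, String> map, int n) {
            for (int j = 0; j < n; j++) {
                Long k = rng.nextLong();
                owned.add(k);
                map.put(k, "v");
            }
        }

        // One mutate step: delete an owned key (leaving a tombstone),
        // then insert a fresh key from the huge keyspace. Population
        // stays bounded while tombstones are created at full speed.
        void mutateStep(ConcurrentMap<Long, String> map) {
            int i = rng.nextInt(owned.size());
            Long kd = owned.get(i);
            map.remove(kd);              // delete kd
            Long ki = rng.nextLong();    // fresh ki
            map.put(ki, "v");
            owned.set(i, ki);            // K_o := K_o - {kd} + {ki}
        }
    }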

Cheers,
Chris Purcell

