[concurrency-interest] Consistency guarantees for ConcurrentHashMap

Jorge Ortiz jorge.ortiz at gmail.com
Thu Jun 11 13:55:13 EDT 2009


Jim, Erkin, James,

I haven't written any code to test these behaviors. This is all purely a
thought experiment at this point. Perhaps I should provide more details.

For the first case, given the implementation of clear():

    public void clear() {
        for (int i = 0; i < segments.length; ++i)
            segments[i].clear();
    }

It's evident that this behavior is at best atomic per segment. There seem to
be no atomicity guarantees between segments. (I understand this is by
design.) Assume Thread B runs first, gets halfway through clear() (clears
half the segments), then suspends. Now Thread A runs, puts "first" in an
already-cleared segment and "second" in a yet-to-be-cleared segment.
Finally, Thread B resumes and clears the remaining segments, including the
one with "second" (but not the one with "first", as that segment had already
been cleared).
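
To make the interleaving concrete, here is the kind of stress test I have in
mind but haven't actually run (the class name is just a placeholder, and
whether the anomaly ever shows up depends on scheduling and on the two keys
landing in different segments, so a quiet run proves nothing):

    import java.util.concurrent.ConcurrentHashMap;

    public class ClearRaceTest {
        public static void main(String[] args) throws InterruptedException {
            for (int i = 0; i < 1000000; i++) {
                final ConcurrentHashMap<String, Boolean> map =
                    new ConcurrentHashMap<String, Boolean>();

                Thread a = new Thread() {
                    public void run() {
                        map.put("first", true);
                        map.put("second", true);
                    }
                };
                Thread b = new Thread() {
                    public void run() {
                        map.clear();
                    }
                };

                a.start();
                b.start();
                a.join();
                b.join();

                // The "impossible" outcome: "first" survived the clear() even
                // though the later put of "second" did not.
                if (map.containsKey("first") && !map.containsKey("second")) {
                    System.out.println("Observed Map(\"first\" -> true) on run " + i);
                    return;
                }
            }
            System.out.println("Anomaly not observed (which proves nothing)");
        }
    }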

The second case, with iterators, follows similarly. Thread A puts "first".
Thread B creates an iterator, and upon creation the iterator's nextEntry is
set to "first"'s entry. Thread A removes "first" and puts "second". Thread B
calls next() on the iterator and receives "first" (because entries are
immutable, and the iterator already holds a reference to "first"'s entry);
the iterator's nextEntry advances to "second"'s entry. Thread B calls next()
again and receives "second".
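
The same kind of untested probe could be aimed at the iterator scenario;
seeing both keys in myKeys would mean the iterator reported a state the map
was never actually in:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ConcurrentHashMap;

    public class IteratorRaceTest {
        public static void main(String[] args) throws InterruptedException {
            for (int i = 0; i < 1000000; i++) {
                final ConcurrentHashMap<String, Boolean> map =
                    new ConcurrentHashMap<String, Boolean>();
                final List<String> myKeys = new ArrayList<String>();

                Thread a = new Thread() {
                    public void run() {
                        map.put("first", true);
                        map.remove("first");
                        map.put("second", true);
                    }
                };
                Thread b = new Thread() {
                    public void run() {
                        // Weakly consistent iteration over the live map.
                        for (String key : map.keySet()) {
                            myKeys.add(key);
                        }
                    }
                };

                a.start();
                b.start();
                a.join();
                b.join();

                // "first" and "second" were never in the map at the same
                // time, yet the iterator may have reported both.
                if (myKeys.contains("first") && myKeys.contains("second")) {
                    System.out.println("Iterator saw both keys on run " + i);
                    return;
                }
            }
            System.out.println("Anomaly not observed (which proves nothing)");
        }
    }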

Again, if my understanding is incorrect, please let me know.

Jim,

Thanks for the article on ReadWriteLocks. Given what it describes, they look
pretty unsuitable for the purpose I had in mind (roughly the sketch below),
especially since Scala still needs to support Java 1.5.
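
What I had been imagining was roughly the following (untested) sketch, with
the class name SnapshotMap as a placeholder: single-element operations share
the read lock so they can run concurrently with each other, while
whole-collection operations take the write lock for exclusive access:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.locks.ReadWriteLock;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    public class SnapshotMap<K, V> {
        private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<K, V>();
        private final ReadWriteLock lock = new ReentrantReadWriteLock();

        // Single-element operations take the read lock: they may run
        // concurrently with each other, but not with a whole-collection
        // operation.
        public V get(K key) {
            lock.readLock().lock();
            try {
                return map.get(key);
            } finally {
                lock.readLock().unlock();
            }
        }

        public V put(K key, V value) {
            lock.readLock().lock();
            try {
                return map.put(key, value);
            } finally {
                lock.readLock().unlock();
            }
        }

        // Whole-collection operations take the write lock, excluding all
        // single-element operations while they run.
        public void clear() {
            lock.writeLock().lock();
            try {
                map.clear();
            } finally {
                lock.writeLock().unlock();
            }
        }

        public Map<K, V> snapshot() {
            lock.writeLock().lock();
            try {
                return new HashMap<K, V>(map);
            } finally {
                lock.writeLock().unlock();
            }
        }
    }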

I could write map, filter, etc. on top of weakly consistent iterators, but
weak consistency doesn't seem to be much of a guarantee at all. In
particular, if my understanding of the second scenario above is correct, a
weakly consistent iterator can report the map to be in a state that it was
never actually in.
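
To make that concern concrete: built the obvious way on top of the entry
iterator, filter would look roughly like the untested sketch below (the
Predicate interface is just a stand-in, since Java 1.5 has no standard one),
and its result can mix entries that the map never held at the same time:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class MapOps {
        // Stand-in predicate interface; Java 1.5 has nothing equivalent.
        public interface Predicate<V> {
            boolean apply(V value);
        }

        // A filter built directly on the weakly consistent entry iterator.
        // The result is NOT a consistent snapshot: it can contain a
        // combination of entries that never coexisted in the source map.
        public static <K, V> Map<K, V> filter(ConcurrentHashMap<K, V> source,
                                              Predicate<V> p) {
            Map<K, V> result = new HashMap<K, V>();
            for (Map.Entry<K, V> e : source.entrySet()) {
                if (p.apply(e.getValue())) {
                    result.put(e.getKey(), e.getValue());
                }
            }
            return result;
        }
    }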

It's worth noting that my understanding of the iterator's behavior is not
inconsistent with the explanation given in the Javadocs for keySet(),
values(), and entrySet():

"The view's iterator is a "weakly consistent" iterator that will never throw
ConcurrentModificationException, and guarantees to traverse elements as they
existed upon construction of the iterator, and may (but is not guaranteed
to) reflect any modifications subsequent to construction."

Notably, the iterator "may" reflect modification put("second"), but "won't
guarantee" that modification remove("first") (which happens-before
put("second"), but after the creation of the iterator) will also be
reflected.

In contrast, the Javadoc for the ConcurrentHashMap class as a whole is, in my
opinion, inaccurate:

"Similarly, Iterators and Enumerations return elements reflecting the state
of the hash table at some point at or since the creation of the
iterator/enumeration."

Thanks,

--j

On Thu, Jun 11, 2009 at 2:21 AM, Erkin <erkin.kanlioglu at googlemail.com> wrote:

> Hi Jorge
>
> Case 1) The map doesn't guarantee ordering, since it hashes each key and
> puts the data into the appropriate segment. The clear method, on the other
> hand, works sequentially over the segments, and locking is per segment.
> Please correct me if I'm wrong.
>
> Case 2) Is it possible to post test code (with the JDK version) to
> demonstrate the case?
>
> Thanks
>
> On Thu, Jun 11, 2009 at 7:15 AM, Jorge Ortiz <jorge.ortiz at gmail.com> wrote:
>
>> Consider the following code:
>>
>>     ConcurrentHashMap<String, Boolean> map = new ...;
>>
>>     Thread a = new Thread {
>>         void run() {
>>             map.put("first", true);
>>             map.put("second", true);
>>         }
>>     };
>>
>>     Thread b = new Thread {
>>         void run() {
>>             map.clear();
>>         }
>>     };
>>
>>     a.start();
>>     b.start();
>>     a.join();
>>     b.join();
>>
>> I would expect one of the following scenarios to be true (for the
>> contents of the map) after this code runs:
>>
>>     Map("first" -> true, "second" -> true)
>>     Map("second" -> true)
>>     Map()
>>
>> However, upon inspection of ConcurrentHashMap, it seems to me that the
>> following scenario might also be true:
>>
>>     Map("first" -> true) ???
>>
>> This seems surprising because "first" gets put before "second", so if
>> "second" is cleared, then surely "first" should be cleared too.
>>
>> Likewise, consider the following code:
>>
>>     ConcurrentHashMap<String, Boolean> map = new ...;
>>     List<String> myKeys = new ...;
>>
>>     Thread a = new Thread {
>>         void run() {
>>             map.put("first", true);
>>
>>             // more stuff
>>
>>             map.remove("first");
>>             map.put("second", true);
>>         }
>>     };
>>
>>     Thread b = new Thread {
>>         void run() {
>>             Set<String> keys = map.keySet();
>>             for (String key : keys) {
>>                 myKeys.add(key);
>>             }
>>         }
>>     };
>>
>>     a.start();
>>     b.start();
>>     a.join();
>>     b.join();
>>
>> I would expect one of the following scenarios to be true for the contents
>> of myKeys after this code has run:
>>
>>     List()
>>     List("first")
>>     List("second")
>>
>> However, upon closer inspection, it seems that the following scenario
>> might also be true:
>>
>>     List("first", "second") ???
>>
>> This is surprising because "second" is only ever put into the map after
>> "first" is removed. They should never be in the map simultaneously, but an
>> iterator might perceive them to be so.
>>
>> Are these interpretations of ConcurrentHashMap's behavior correct?
>>
>> I ask because I want to add concurrent collections to Scala's standard
>> library. Ideally, these could just be wrappers around concurrent Java
>> collections because 1) it saves me a lot of work and 2) it benefits from the
>> significant effort (and thought and real-world testing) that has gone into
>> j.u.concurrent.
>>
>> However, central to Scala's collections are higher-order functions like
>> "map" and "filter" that operate on an entire collection and return a new
>> collection with elements derived from the elements in the old collection.
>> These methods necessarily operate on an entire collection and are frequently
>> implemented in terms of iterators. Without strongly consistent iterators or
>> some other way to provide a consistent "snapshot" of ConcurrentHashMap,
>> "map" and "filter" are... well, pretty useless.
>>
>> If my interpretation of ConcurrentHashMap is correct, are there other
>> concurrent collections that provide such a "snapshot" view?
>>
>> Could ConcurrentHashMap be wrapped with a ReadWriteLock, where
>> single-element operations (get, put, etc.) acquire the read lock and
>> whole-collection operations (map, filter, clear, etc.) acquire the write
>> lock? How badly would this hurt performance in the no-writers case?
>>
>> Thanks,
>>
>> --j
>>
>>
>