[concurrency-interest] ConcurrentSkipListMap performance

Nathan Reynolds nathan.reynolds at oracle.com
Mon Oct 24 15:58:16 EDT 2011

There are a lot of important details going on here but I am not an 
expert.  I just know of a few of them.  These details make 
microbenchmarking very difficult.  I would like to hear of other ideas.

JVM warm-up can be tricky.  For example, HotSpot with the -server option 
has 2 optimization passes.  The first pass happens at 1,000 invocations 
or iterations of a loop.  The second pass happens at 10,000 (default) 
invocations or iterations of a loop.  The first pass causes the 
method(s) to be executed as native processor instructions instead of 
being interpreted Java bytecode.  This first pass adds profiling code.  
The second pass takes the profiling results and generates a fully 
optimized method or loop.  This time there isn't any profiling code and 
full speed is achieved.  The optimized code isn't executed until the 
thread exits the loop or method and then re-enters.  Note:  HotSpot does 
have an optimizer logging feature and can be used to determine if the 
optimizer is finished (i.e. by lack of further output).

The test is based on executing a fixed number of operations.  This makes 
it difficult to spot OS scheduler influences.  It also makes it 
difficult to account for ramp up and ramp down.  Both ends of the test 
will not have 300 threads trying to execute at the same time.  Instead, 
you might see some threads finishing within the first few milliseconds 
and the rest of the threads taking longer.  Then, at the end of the 
test, threads will finish and reduce the amount of concurrency.  
Instead, it is better to wait for all of the threads to start running 
and then set a flag to cause all of the threads to reset their 
counters.  Then, run the test for X minutes and set a "quit" flag.  All 
of the threads stop executing and report their counters as results.

In order to reduce noise, the threads and JVM process should be run at 
the highest CPU, disk and network priorities allowed by the OS.  
Background processes (including your own interaction) with the 
environment can skew the results.  Each test should be run multiple 
times (10 is a minimum).  Statistics on the results should be calculated 
to ensure that the results are significantly different.

Multiple passes should be done with a various number of threads.  The 
throughput graph can reveal other significant information with the test 
harness and the code.  It could be that TreeMap performs better at some 
concurrency levels than others.

A full GC should be forced before the test happens so that each test 
pass starts at the same memory usage.  This can be done by calling 
"System.gc()" in a loop and monitoring the GarbageCollectorMXBean.  The 
collection count should increase before the loop exits.

Nathan Reynolds 
<http://psr.us.oracle.com/wiki/index.php/User:Nathan_Reynolds> | 
Consulting Member of Technical Staff | 602.333.9091
Oracle PSR Engineering <http://psr.us.oracle.com/> | Server Technology

On 10/23/2011 10:52 PM, Ben Manes wrote:
> A quick glance at the benchmark [1] indicates that there are a few 
> flaws, such as:
> - the JVM warm-up penalty is being accrued by the CSLM run
> - Use thread.yield() to end a request, as otherwise the active thread 
> may be able to run through its time slice without incurring much lock 
> contention.
> [1] http://osdir.com/ml/attachments/txtBAixyCerpa.txt
> ------------------------------------------------------------------------
> *From:* Ben Manes <ben_manes at yahoo.com>
> *To:* Ted Yu <ted_yu at yahoo.com>
> *Cc:* "concurrency-interest at cs.oswego.edu" 
> <concurrency-interest at cs.oswego.edu>
> *Sent:* Sunday, October 23, 2011 6:33 PM
> *Subject:* Re: [concurrency-interest] ConcurrentSkipListMap performance
> See the file headers. The Skiptree implementation is public domain. 
> The Snaptree is copyrighted by Stanford University 
> (http://ppl.stanford.edu/) and should fall under their policy for 
> academic research. You can contact the author or the lab for licensing 
> details.
> ------------------------------------------------------------------------
> *From:* Ted Yu <ted_yu at yahoo.com>
> *To:* Ben Manes <ben_manes at yahoo.com>
> *Cc:* "concurrency-interest at cs.oswego.edu" 
> <concurrency-interest at cs.oswego.edu>
> *Sent:* Sunday, October 23, 2011 5:24 PM
> *Subject:* Re: [concurrency-interest] ConcurrentSkipListMap performance
> Is either or both of the following compatible with Apache license 2.0 ?
> Thanks
> On Oct 23, 2011, at 12:58 PM, Ben Manes <ben_manes at yahoo.com 
> <mailto:ben_manes at yahoo.com>> wrote:
>> There's not enough information to diagnose the cause, but you may 
>> want to try one of the CSLM replacements.
>> https://github.com/mspiegel/lockfreeskiptree
>> https://github.com/nbronson/snaptree
>> ------------------------------------------------------------------------
>> *From:* Ted Yu <ted_yu at yahoo.com <mailto:ted_yu at yahoo.com>>
>> *To:* "concurrency-interest at cs.oswego.edu 
>> <mailto:concurrency-interest at cs.oswego.edu>" 
>> <concurrency-interest at cs.oswego.edu 
>> <mailto:concurrency-interest at cs.oswego.edu>>
>> *Sent:* Sunday, October 23, 2011 12:01 PM
>> *Subject:* [concurrency-interest] ConcurrentSkipListMap performance
>> HBase expects ConcurrentSkipListMap to have good performance.
>> See the following discussion.
>> Comments are welcome.
>> ---------- Forwarded message ----------
>> From: *Akash Ashok* <thehellmaker at gmail.com 
>> <mailto:thehellmaker at gmail.com>>
>> Date: Sun, Oct 23, 2011 at 2:57 AM
>> Subject: Re: SILT - nice keyvalue store paper
>> To: dev at hbase.apache.org <mailto:dev at hbase.apache.org>
>> I was running some similar tests and came across a surprising 
>> finding. I compared reads and write on ConcurrentSkipListMap ( which 
>> the memstore uses) and synchronized TreeMap ( Which was literally 
>> treemap synchronized). Executed concurrent reads, writes and deletes 
>> on both of them.
>> Surprisingly synchronized treeMap performed better, though just 
>> slightly better, than ConcurrentSkipListMap which KeyValueSkipListSet 
>> uses.
>> Here are the output of a few runs
>> Sometimes the difference was considerable
>> Using HBaseMap it took 20438ms
>> Using TreeMap it took 11613ms
>> Time Difference:8825ms
>> And sometimes the difference was negligible
>> Using HBaseMap it took 13370ms
>> Using TreeMap it took 9482ms
>> Time Difference:3888ms
>> I've attaching the test  java file which I wrote to test it.
>> This might be a very minor differece but still surprising considering 
>> the fact that ConcurrentSkipListMap uses fancy 2 level indexes which 
>> they say improves the deletion performance.
>> And here are the details about the test run.
>> 100 Threads each fetching 10,000 records
>> 100 threads each adding 10,000 records.
>> 100 threads each deletin 10,000 records
>> ( Reads, Writes and deletes simultaneously )
>> Cheers,
>> Akash A
>> _______________________________________________
>> Concurrency-interest mailing list
>> Concurrency-interest at cs.oswego.edu 
>> <mailto:Concurrency-interest at cs.oswego.edu>
>> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest at cs.oswego.edu 
> <mailto:Concurrency-interest at cs.oswego.edu>
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest at cs.oswego.edu
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://cs.oswego.edu/pipermail/concurrency-interest/attachments/20111024/223124d8/attachment-0001.html>

More information about the Concurrency-interest mailing list