[concurrency-interest] The cost of notifyAll and

Aleksandar Prokopec aleksandar.prokopec at gmail.com
Fri May 17 12:20:32 EDT 2013


Hello,

I have a question regarding the performance of `notifyAll` calls 
compared to entering and exiting monitors on the JVM.
Some background: recently there has been a lot of discussion on the Scala 
mailing lists about changing the Scala lazy val implementation, 
which is currently deadlock-prone in cases where there is no reason for 
it to be.
We have an alternative design that we are trying to evaluate in terms of 
correctness and performance.

The full discussion is available here:
https://groups.google.com/group/scala-internals/browse_thread/thread/702801329e64f11f

I illustrate these changes with hand-written versions of our lazy 
val implementation.
They are written in Scala, but the features I use should be intuitive and 
translate directly to Java.
This is a lazy val declaration:

   class LazyCell(x: Int) {
     lazy val value = 0
   }

This is what our compiler currently does for a `lazy val` declaration:

   final class LazySimCell(x: Int) {
     @volatile var bitmap_0: Boolean = false
     var value_0: Int = _
     private def value_lzycompute(): Int = {
       this.synchronized {
         if (!bitmap_0) {
           value_0 = 0
           bitmap_0 = true
         }
       }
       value_0
     }
     def value = if (bitmap_0) value_0 else value_lzycompute()
   }

The problem with this design is that if the right-hand side of a `lazy val` 
accesses `lazy val`s in other objects, the object monitors can be acquired 
in a cyclic order, and this can lead to deadlocks even though the 
dependencies between the lazy vals themselves do not form a cycle.
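
To illustrate, here is a small hypothetical sketch (not taken from the Scala 
discussion or the benchmark code): the lazy val dependencies are acyclic, 
but under the current scheme each right-hand side is evaluated while holding 
its owner's monitor, so the two monitors can be acquired in opposite orders:

   // Hypothetical sketch: the dependencies between the lazy vals
   // (a1 -> b0, b1 -> a0) do not form a cycle, but each right-hand side
   // runs inside its owner's synchronized block, so thread t1 holds a's
   // monitor and wants b's while t2 holds b's and wants a's.
   class A { var other: B = _; lazy val a0 = 0; lazy val a1 = other.b0 }
   class B { var other: A = _; lazy val b0 = 1; lazy val b1 = other.a0 }

   object DeadlockSketch {
     def main(args: Array[String]): Unit = {
       val a = new A
       val b = new B
       a.other = b
       b.other = a
       val t1 = new Thread { override def run(): Unit = println(a.a1) }
       val t2 = new Thread { override def run(): Unit = println(b.b1) }
       t1.start(); t2.start()
       t1.join(); t2.join()   // may hang forever if the deadlock is hit
     }
   }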

We want to replace the single synchronized block in the 
double-checked locking pattern with two blocks.
In the new design, a thread T1 arriving at the first synchronized block 
synchronizes on `this` and announces that it will initialize the 
lazy val.
Subsequent readers of the lazy val then wait until they are notified.
T1 then computes the value, outside of any monitor.
T1 then enters the second synchronized block, assigns the result to the 
object field and notifies any waiting threads (the other readers).

This is the newly proposed design, hand-written:

    final class LazySimCellVersion2(x: Int) {
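     // bitmap_0 states: 0 = uninitialized, 1 = initialization in progress,
     // 2 = initialized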
     @volatile var bitmap_0: Byte = 0.toByte
     var value_0: Int = _
     private def value_lzycompute(): Int = {
       this.synchronized {
         if (bitmap_0 == 0.toByte) {
           bitmap_0 = 1.toByte
         } else {
           while (bitmap_0 == 1.toByte) {
             this.wait()
           }
           return value_0
         }
       }
       val result = 0
       this.synchronized {
         value_0 = result
         bitmap_0 = 2.toByte
         this.notifyAll()
       }
       value_0
     }
     def value = if (bitmap_0 == 2.toByte) value_0 else value_lzycompute()
   }
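
As a small, hypothetical usage sketch (not part of the benchmark code), the 
protocol can be exercised like this - the first reader to enter the first 
synchronized block computes the value, the others wait on `this` until the 
`notifyAll` in the second block wakes them:

   // Hypothetical sketch: several readers race on one cell; exactly one
   // computes the value, the rest wait and are woken by notifyAll.
   object Version2Sketch {
     def main(args: Array[String]): Unit = {
       val cell = new LazySimCellVersion2(42)
       val readers = (1 to 4).map { _ =>
         new Thread { override def run(): Unit = println(cell.value) }
       }
       readers.foreach(_.start())
       readers.foreach(_.join())   // every reader prints 0
     }
   }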

I have run microbenchmarks to compare the two designs in a non-contended 
setting.
I measured the following:
- instantiating an object with a value
- instantiating an object and initializing its lazy value
- same thing, but with manually implemented lazy vals with boolean bitmaps
- same thing, but with byte bitmaps
- the proposed lazy val design with two synchronized blocks, but 
without notifying potential waiters
- same thing, but with the `notifyAll` call
Each benchmark is repeated 1000000 through 5000000 times, in steps of 1000000.
The instantiated object is assigned to a field each time, to defeat 
potential VM optimizations.
The initialization code consists of producing the integer literal `0` - 
I chose the minimum amount
of computation to avoid having the computation costs hide the 
lazy-val-related initialization costs.
This should thus be the limiting case, where the program does nothing 
else but instantiate objects and initialize their lazy vals.
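
The actual benchmark code is in the repository linked below; purely as an 
illustration, the measured inner loop has roughly the following shape (a 
hypothetical sketch, not the real harness):

   // Hypothetical sketch of the measured loop shape (the real code is in
   // SyncBench.scala, linked below): create the object, force its lazy val,
   // and publish it to a field so the VM cannot optimize the work away.
   class LoopSketch {
     var sink: LazySimCellVersion2 = _
     def run(reps: Int): Unit = {
       var i = 0
       while (i < reps) {
         val cell = new LazySimCellVersion2(i)
         cell.value          // forces the lazy initialization path
         sink = cell         // field assignment defeats dead-code elimination
         i += 1
       }
     }
   }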

My platform:
i7-2600, quad-core, hyperthreading, 3.4GHz
MacOS X 10.7.5
Java 7, update 4, 64-bit
Scala 2.10.1

The reports themselves are here:

http://lampwww.epfl.ch/~prokopec/lazyvals/report/

The repository with the executable microbenchmarks that produced these 
reports:

https://github.com/axel22/lazy-val-bench
https://github.com/axel22/lazy-val-bench/blob/master/src/test/scala/example/SyncBench.scala

In this benchmark, the design with two synchronized blocks and a 
`notifyAll` is 5 times slower than the current lazy val implementation - 
and just adding or removing the line with the `notifyAll` call changes 
the results considerably.

My question:
Is this expected? Is the cost of a `notifyAll` call expected to be this 
high?
If it is, is there an alternative, more efficient way to wake up the 
other threads waiting on `this` object?

Thank you,
Aleksandar Prokopec

--
Aleksandar Prokopec
Doctoral Assistant
LAMP, IC, EPFL
http://people.epfl.ch/aleksandar.prokopec
