[concurrency-interest] The cost of notifyAll and

David Holmes davidcholmes at aapt.net.au
Fri May 17 22:05:08 EDT 2013


Very quick skim ..

You may be seeing the cost of monitor inflation being attributed to the notifyAll.
In normal use the inflation would happen with the wait(), if the monitor is not
already inflated due to contention.
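
One way to check this on a given JVM is to print the object header around the
notifyAll with the JOL tool (this assumes the org.openjdk.jol:jol-core library
is on the classpath; the demo names below are mine, and whether the inflation
is visible depends on the JVM version and flags):

    import org.openjdk.jol.info.ClassLayout

    object HeaderDemo {
      def main(args: Array[String]): Unit = {
        val lock = new Object
        // Header of a freshly allocated, never-locked object.
        println(ClassLayout.parseInstance(lock).toPrintable())
        lock.synchronized {
          // While only synchronized has been used, the mark word typically
          // shows a biased or thin (stack) lock.
          println(ClassLayout.parseInstance(lock).toPrintable())
          lock.notifyAll()
          // If the notifyAll (or a wait) inflated the lock, the mark word now
          // refers to a full ObjectMonitor ("fat lock").
          println(ClassLayout.parseInstance(lock).toPrintable())
        }
      }
    }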

David Holmes
  -----Original Message-----
From: concurrency-interest-bounces at cs.oswego.edu
[mailto:concurrency-interest-bounces at cs.oswego.edu] On Behalf Of Aleksandar Prokopec
  Sent: Saturday, 18 May 2013 2:21 AM
  To: concurrency-interest at cs.oswego.edu
  Subject: [concurrency-interest] The cost of notifyAll and


  Hello,

  I have a question regarding the performance of `notifyAll` calls compared
to entering and exiting monitors on the JVM.
  Some background: recently there has been a lot of talk on the Scala mailing
  lists about changing the Scala lazy vals implementation, which is currently
  deadlock-prone in cases where there is no reason for it to be.
  We have an alternative design that we are trying to evaluate in terms of
correctness and performance.

  The full discussion is available here:
  https://groups.google.com/group/scala-internals/browse_thread/thread/702801329e64f11f

  I illustrate these changes with the hand-written version of our lazy vals
implementation.
  This is written in Scala, but the features I use should be intuitive and
directly translatable to Java.
  This is a lazy val declaration:

    class LazyCell(x: Int) {
      lazy val value = 0
    }

  This is what our compiler currently does for a `lazy val` declaration:

    final class LazySimCell(x: Int) {
      @volatile var bitmap_0: Boolean = false
      var value_0: Int = _
      private def value_lzycompute(): Int = {
        this.synchronized {
          if (!bitmap_0) {      // initialize only if not yet initialized
            value_0 = 0
            bitmap_0 = true
          }
        }
        value_0
      }
      def value = if (bitmap_0) value_0 else value_lzycompute()
    }

  The problem with this design is that if the right-hand sides of `lazy val`s
  in different objects refer to each other's objects in a cycle, the per-object
  locking can deadlock even though the lazy val dependencies themselves do not
  form a cycle.
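
  A small example of the kind of dependency that deadlocks under the current
  scheme (the objects A and B here are hypothetical, not taken from the
  mailing list thread): the lazy val dependencies a1 -> b -> a0 form no cycle,
  but the per-object monitors do.

    object A {
      lazy val a0 = 0
      lazy val a1 = B.b   // a thread forcing a1 holds A's monitor and needs B's
    }
    object B {
      lazy val b = A.a0   // a thread forcing b holds B's monitor and needs A's
    }
    // If thread T1 forces A.a1 while thread T2 forces B.b, each thread holds
    // its own object's monitor and blocks trying to enter the other's.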

  We want to replace the single synchronized block in the double-checked
  locking pattern with two blocks.
  In the new design, a thread T1 arriving at the first synchronized block
  synchronizes on `this` and announces that it will initialize the lazy val.
  Subsequent readers of the lazy val then wait until they are notified.
  T1 then computes the value outside the lock.
  T1 then enters the second synchronized block, assigns the result to the
  object field and notifies any waiting threads (the other readers).

  This is the newly proposed design, hand-written:

    final class LazySimCellVersion2(x: Int) {
      // bitmap_0: 0 = uninitialized, 1 = initialization in progress, 2 = initialized
      @volatile var bitmap_0: Byte = 0.toByte
      var value_0: Int = _
      private def value_lzycompute(): Int = {
        this.synchronized {
          if (bitmap_0 == 0.toByte) {
            bitmap_0 = 1.toByte            // announce that this thread will initialize
          } else {
            while (bitmap_0 == 1.toByte) { // another thread is initializing
              this.wait()
            }
            return value_0
          }
        }
        val result = 0                     // compute the value outside the lock
        this.synchronized {
          value_0 = result
          bitmap_0 = 2.toByte
          this.notifyAll()                 // wake up the waiting readers
        }
        value_0
      }
      def value = if (bitmap_0 == 2.toByte) value_0 else value_lzycompute()
    }
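
  As a quick sanity check of the hand-written class above, one can force the
  same lazy val from several threads and verify that every reader observes the
  initialized value (the object name and thread count below are mine):

    object LazySimCellVersion2Check {
      def main(args: Array[String]): Unit = {
        val cell = new LazySimCellVersion2(42)
        val readers = (1 to 8).map { _ =>
          new Thread(new Runnable {
            // Every reader should see the initialized value 0.
            def run(): Unit = assert(cell.value == 0)
          })
        }
        readers.foreach(_.start())
        readers.foreach(_.join())
        // After all readers finish, bitmap_0 should be in state 2 (initialized).
        println(s"value = ${cell.value}, bitmap_0 = ${cell.bitmap_0}")
      }
    }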

  I have run microbenchmarks to compare the two designs in a non-contended
  mode, as follows.
  I measured the following:
  - instantiating an object with a value
  - instantiating an object and initializing its lazy value
  - same thing, but with manually implemented lazy vals with boolean bitmaps
  - same thing, but with byte bitmaps
  - the proposed lazy val change with 2 synchronized blocks, but without
  notifying potential waiters
  - same thing, but with a notify call
  each repeated 1000000 through 5000000 times, in steps of 1000000.
  The instantiated object is assigned to a field each time, to prevent the VM
  from optimizing the work away.
  The initialization code consists of creating the integer literal `0` - I
  chose the minimum amount of computation so that the computation cost does
  not amortize the lazy-val-related initialization costs.
  This should thus be the borderline case where the program does nothing else
  but instantiate objects and initialize their lazy vals.
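
  A simplified sketch of the non-contended measurement loop (this is not the
  actual harness from the repository linked below; the names are mine):
  instantiate the object, force its lazy val, and publish the instance to a
  field so the VM cannot optimize the work away.

    object LazyBenchSketch {
      @volatile var sink: LazySimCellVersion2 = _  // keeps each instance observable

      def run(size: Int): Long = {
        val start = System.nanoTime()
        var i = 0
        while (i < size) {
          val cell = new LazySimCellVersion2(i)
          cell.value    // first access triggers value_lzycompute()
          sink = cell   // assign to a field to defeat dead-code elimination
          i += 1
        }
        System.nanoTime() - start
      }

      def main(args: Array[String]): Unit = {
        var size = 1000000
        while (size <= 5000000) {
          println("size = " + size + ", elapsed ms = " + run(size) / 1000000)
          size += 1000000
        }
      }
    }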

  My platform:
  i7-2600, quad-core, hyperthreading, 3.4GHz
  MacOS X 10.7.5
  Java 7, update 4, 64-bit
  Scala 2.10.1

  The reports themselves are here:

  http://lampwww.epfl.ch/~prokopec/lazyvals/report/

  The repository with the executable microbenchmarks with the code that
produced these:

  https://github.com/axel22/lazy-val-bench
  https://github.com/axel22/lazy-val-bench/blob/master/src/test/scala/example/SyncBench.scala

  In this benchmark, the design with 2 synchronized blocks and a notify is 5
  times slower than the current lazy val implementation - and just adding or
  removing the line with the `notifyAll` call changes the results considerably.

  My question:
  Is this expected? Is the cost of a `notifyAll` call expected to be this
high?
  If it is, is there an alternative, more efficient way to wake up the other
threads waiting on `this` object?

  Thank you,
  Aleksandar Prokopec






--
Aleksandar Prokopec
Doctoral Assistant
LAMP, IC, EPFL
http://people.epfl.ch/aleksandar.prokopec

