[concurrency-interest] Some interesting (confusing?) benchmark results

Ariel Weisberg ariel at weisberg.ws
Tue Feb 12 16:58:16 EST 2013


My experience benchmarking an in-memory database that shards down to
the core level on a 2x2x4 Nehalem was that the majority of the
performance comes from the first four threads; the next two give you
peak throughput on workloads where transaction size is small. For TPC-C,
where transactions are large, the best performance is with 12 threads,
although 8 threads is very close.

We recommend to users a starting point of two-thirds of the number of
physical cores (not counting hyper-threads).
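As a rough illustration of that two-thirds starting point, here is a minimal Java sketch. `Runtime.availableProcessors()` reports logical processors, which include hyper-threads, so the sketch assumes a caller-supplied `threadsPerCore` value (2 on a hyper-threaded machine). That parameter and the class name `ShardPoolSizing` are hypothetical, not part of the database described above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ShardPoolSizing {
    /**
     * Starting-point pool size: two-thirds of the physical cores, ignoring hyper-threads.
     * threadsPerCore is an assumption about the host (e.g. 2 with hyper-threading on),
     * because availableProcessors() counts logical processors, not physical cores.
     */
    static int startingPoolSize(int logicalProcessors, int threadsPerCore) {
        int physicalCores = Math.max(1, logicalProcessors / threadsPerCore);
        // Integer arithmetic rounds down; never go below one worker thread.
        return Math.max(1, (physicalCores * 2) / 3);
    }

    public static void main(String[] args) {
        int logical = Runtime.getRuntime().availableProcessors();
        int size = startingPoolSize(logical, 2); // assumes 2 hardware threads per core
        ExecutorService pool = Executors.newFixedThreadPool(size);
        System.out.println("logical=" + logical + " poolSize=" + size);
        pool.shutdown();
    }
}
```

On a machine reporting 16 logical processors with 2 threads per core, this yields 8 physical cores and a starting size of 5; it is only a starting point to tune from, as the benchmark numbers above show.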

We aren't doing anything NUMA-aware, although no memory is shared
between shards. Asynchronous networking has a dedicated thread for
sending/receiving messages.

I wonder if there is a difference between compute-bound and
memory-bound CPU tasks. I have noticed that the execution time of
transactions increases as you add threads, even though no state is
shared between threads/transactions and there are fewer threads than


On Tue, Feb 12, 2013, at 04:24 PM, Vitaly Davidovich wrote:

  For pure CPU-bound work, I typically add 1 or maybe 2 more threads
  than the number of hardware threads; this is to account for hardware
  threads possibly hitting a hard page fault and getting suspended. I
  don't see how having any more threads than that benefits performance.
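A minimal Java sketch of the "hardware threads plus one or two" heuristic, for illustration only: the class name `CpuBoundPool`, the `spare` parameter, and the arithmetic task are hypothetical stand-ins for real CPU-bound work. It uses `Runtime.availableProcessors()` as the hardware-thread count, which includes hyper-threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CpuBoundPool {
    /** Hardware threads plus a few spare threads to cover threads stalled on page faults. */
    static int cpuBoundPoolSize(int hardwareThreads, int spare) {
        if (hardwareThreads < 1 || spare < 0) {
            throw new IllegalArgumentException("need hardwareThreads >= 1 and spare >= 0");
        }
        return hardwareThreads + spare;
    }

    public static void main(String[] args) throws InterruptedException, ExecutionException {
        // availableProcessors() counts logical (hardware) threads, including hyper-threads.
        int hw = Runtime.getRuntime().availableProcessors();
        int size = cpuBoundPoolSize(hw, 1);
        ExecutorService pool = Executors.newFixedThreadPool(size);
        List<Future<Long>> results = new ArrayList<>();
        for (int i = 0; i < hw * 4; i++) {
            final long seed = i;
            // Hypothetical CPU-bound task: pure arithmetic, no I/O and no shared state.
            results.add(pool.submit(() -> {
                long acc = seed;
                for (int k = 0; k < 1_000_000; k++) {
                    acc = acc * 31 + k;
                }
                return acc;
            }));
        }
        long checksum = 0;
        for (Future<Long> f : results) {
            checksum ^= f.get(); // wait for each task and fold its result
        }
        pool.shutdown();
        System.out.println("threads=" + size + " checksum=" + checksum);
    }
}
```

A fixed-size pool keeps the thread count pinned at the chosen value, so extra submitted tasks queue instead of adding runnable threads that would only compete for the same cores.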


On Feb 12, 2013 4:21 PM, "√iktor Ҡlang" <[1]viktor.klang at gmail.com> wrote:

On Tue, Feb 12, 2013 at 8:28 PM, Kirk Pepperdine <[2]kirk at kodewerk.com> wrote:

> Do you agree that thread pool sizing depends on the type of work? (I/O
> bound vs. CPU bound, bursty vs. steady, etc.)


> Do you agree that a JVM Thread is not a unit of parallelism?


> Do you agree that having more JVM Threads than hardware threads is
> bad for CPU-bound workloads?

  No; even with CPU-bound workloads I have found that the hardware/OS
  is much better at managing many workloads across many threads than I
  am. So a few extra threads are OK, but many more threads get bad fast.

That's an interesting observation. Have any more data on that? (really
As I said earlier, for CPU-bound workloads we've seen the best
performance when only loading 60-70% of the cores (other threads exist
on the machine of course).



Viktor Klang
Director of Engineering
[3]Typesafe - The software stack for applications that scale
Twitter: @viktorklang

  Concurrency-interest mailing list
  [4]Concurrency-interest at cs.oswego.edu





1. mailto:viktor.klang at gmail.com
2. mailto:kirk at kodewerk.com
3. http://www.typesafe.com/
4. mailto:Concurrency-interest at cs.oswego.edu
5. http://cs.oswego.edu/mailman/listinfo/concurrency-interest
6. mailto:Concurrency-interest at cs.oswego.edu
7. http://cs.oswego.edu/mailman/listinfo/concurrency-interest