[concurrency-interest] Overhead of ThreadLocal data

Andrew Haley aph at redhat.com
Wed Oct 17 07:50:39 EDT 2018


Some of you might be interested to know the overhead of the "fast
path" of ThreadLocals. Some of you might be terrified of the VM
innards and panic when you see assembly code: this post is not for
you. Everybody else, read on...


Here's what happens when you say int n = threadLocal.get(), where
threadLocal is a ThreadLocal<Integer> (AArch64 code from the JIT):

         ││ ;; B11: #	B24 B12 <- B3 B10 	Loop: B11-B10 inner  Freq: 997.566
 10.51%  ││  0x000003ff68b38170: ldr	x10, [xthread,#856]         ;*invokestatic currentThread {reexecute=0 rethrow=0 return_oop=0}
         ││                                                            ; - java.lang.ThreadLocal::get at 0 (line 162)
         ││                                                            ; - org.sample.ThreadLocalTest::floss at 14 (line 31)

## Read the pointer to java.lang.Thread from thread metadata.

         ││  0x000003ff68b38174: ldr	w11, [x10,#76]
         ││  0x000003ff68b38178: lsl	x24, x11, #3            ;*getfield threadLocals {reexecute=0 rethrow=0 return_oop=0}
         ││                                                            ; - java.lang.ThreadLocal::getMap at 1 (line 254)
         ││                                                            ; - java.lang.ThreadLocal::get at 6 (line 163)
         ││                                                            ; - org.sample.ThreadLocalTest::floss at 14 (line 31)
  2.12%  ││  0x000003ff68b3817c: cbz	x24, 0x000003ff68b3824c  ;*ifnull {reexecute=0 rethrow=0 return_oop=0}
         ││                                                            ; - java.lang.ThreadLocal::get at 11 (line 164)
         ││                                                            ; - org.sample.ThreadLocalTest::floss at 14 (line 31)

## Read the pointer to Thread.threadLocals from the Thread. Check it's
   not zero.

         ││ ;; B12: #	B32 B13 <- B11  Freq: 997.551
         ││  0x000003ff68b38180: ldr	w10, [x24,#20]
         ││  0x000003ff68b38184: lsl	x10, x10, #3                ;*getfield table {reexecute=0 rethrow=0 return_oop=0}
         ││                                                            ; - java.lang.ThreadLocal$ThreadLocalMap::getEntry at 5 (line 434)
         ││                                                            ; - java.lang.ThreadLocal::get at 16 (line 165)
         ││                                                            ; - org.sample.ThreadLocalTest::floss at 14 (line 31)

## Read the pointer to ThreadLocalMap.table from Thread.threadLocals


  2.12%  ││  0x000003ff68b38188: ldr	w11, [x10,#12]              ;*arraylength {reexecute=0 rethrow=0 return_oop=0}
         ││                                                            ; - java.lang.ThreadLocal$ThreadLocalMap::getEntry at 8 (line 434)
         ││                                                            ; - java.lang.ThreadLocal::get at 16 (line 165)
         ││                                                            ; - org.sample.ThreadLocalTest::floss at 14 (line 31)
         ││                                                            ; implicit exception: dispatches to 0x000003ff68b38324

## Read the length field from table

         ││ ;; B13: #	B26 B14 <- B12  Freq: 997.55
         ││  0x000003ff68b3818c: ldr	w13, [x23,#12]

## Read ThreadLocal.threadLocalHashCode

         ││  0x000003ff68b38190: sub	w12, w11, #0x1
         ││  0x000003ff68b38194: and	w20, w13, w12               ;*iand {reexecute=0 rethrow=0 return_oop=0}
         ││                                                            ; - java.lang.ThreadLocal$ThreadLocalMap::getEntry at 11 (line 434)
         ││                                                            ; - java.lang.ThreadLocal::get at 16 (line 165)
         ││                                                            ; - org.sample.ThreadLocalTest::floss at 14 (line 31)
  1.37%  ││  0x000003ff68b38198: add	x12, x10, w20, sxtw #2

## int i = key.threadLocalHashCode & (table.length - 1);
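The sub/and pair above only works as a modulo because the table length is a power of two. A minimal sketch of that step in isolation follows; the class name, method name and sample values are made up for illustration, only the masking expression comes from the listing.

```java
// Hypothetical demo class; only the masking expression is taken from the post.
final class IndexMaskDemo {
    // Same computation as the sub/and pair: hash & (length - 1).
    // Assumes length is a power of two, otherwise the mask is not a modulo.
    static int indexFor(int hash, int length) {
        return hash & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16; // assumed table size for the demo
        int[] hashes = {0, 1, 17, -1, Integer.MIN_VALUE, 123456789};
        for (int h : hashes) {
            // For power-of-two lengths the mask agrees with floorMod,
            // including for negative hash values.
            boolean same = indexFor(h, length) == Math.floorMod(h, length);
            System.out.println(h + " -> " + indexFor(h, length) + " (matches floorMod: " + same + ")");
        }
    }
}
```

The mask costs one subtract and one and, where a general remainder would need a divide.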


         ││  0x000003ff68b3819c: cmp	w11, #0x0
         ││  0x000003ff68b381a0: b.ls	0x000003ff68b38270

## Bounds check for table[i]: make sure table.length is not <= 0.
   (Can't happen, but the VM doesn't know that.)

         ││ ;; B14: #	B20 B15 <- B13  Freq: 997.549
 11.88%  ││  0x000003ff68b381a4: ldr	w10, [x12,#16]

## Entry e = table[i];

         ││  0x000003ff68b381a8: lsl	x25, x10, #3          ;*aaload {reexecute=0 rethrow=0 return_oop=0}
         ││                                                            ; - java.lang.ThreadLocal$ThreadLocalMap::getEntry at 18 (line 435)
         ││                                                            ; - java.lang.ThreadLocal::get at 16 (line 165)
         ││                                                            ; - org.sample.ThreadLocalTest::floss at 14 (line 31)
         ││  0x000003ff68b381ac: cbz	x25, 0x000003ff68b38208  ;*ifnull {reexecute=0 rethrow=0 return_oop=0}
         ││                                                            ; - java.lang.ThreadLocal$ThreadLocalMap::getEntry at 21 (line 436)
         ││                                                            ; - java.lang.ThreadLocal::get at 16 (line 165)
         ││                                                            ; - org.sample.ThreadLocalTest::floss at 14 (line 31)

## if (e != null

         ││ ;; B15: #	B5 B16 <- B14  Freq: 997.489
  6.37%  ││  0x000003ff68b381b0: ldr	w11, [x25,#12]
         ││  0x000003ff68b381b4: ldrsb	w10, [xthread,#48]
         ││  0x000003ff68b381b8: lsl	x26, x11, #3
  9.41%  ╰│  0x000003ff68b381bc: cbz	w10, 0x000003ff68b38130

## Load the Entry's referent (the key). G1 garbage collector special
   case for weak references: if concurrent marking is in progress,
   take a slow path.

            ;; B5: #	B27 B6 <- B18 B4 B16 B15  top-of-loop Freq: 997.489
 24.71%  ↗↗  0x000003ff68b38130: cmp	x26, x23
         ││  0x000003ff68b38134: b.ne	0x000003ff68b38294          ;*invokevirtual getEntry {reexecute=0 rethrow=0 return_oop=0}
         ││                                                            ; - java.lang.ThreadLocal::get at 16 (line 165)
         ││                                                            ; - org.sample.ThreadLocalTest::floss at 14 (line 31)

##  && e.get() == key)

         ││ ;; B6: #	B7 <- B5 B22  Freq: 997.549
         ││  0x000003ff68b38138: ldr	w11, [x25,#28]
         ││  0x000003ff68b3813c: lsl	x0, x11, #3                 ;*invokevirtual get {reexecute=0 rethrow=0 return_oop=0}
         ││                                                            ; - org.sample.ThreadLocalTest::floss at 14 (line 31)

## We now have our thread-local value (Entry.value).

         ││ ;; B7: #	B31 B8 <- B6 B25  Freq: 997.564
  1.71%  ││  0x000003ff68b38140: ldr	w11, [x0,#8]                ; implicit exception: dispatches to 0x000003ff68b3830c
         ││ ;; B8: #	B30 B9 <- B7  Freq: 997.563
         ││  0x000003ff68b38144: mov	x12, #0x10000               	// #65536
         ││                                                            ;   {metadata('java/lang/Integer')}
         ││  0x000003ff68b38148: movk	x12, #0x3de8
         ││  0x000003ff68b3814c: cmp	w11, w12
  3.15%  ││  0x000003ff68b38150: b.ne	0x000003ff68b382f4          ;*checkcast {reexecute=0 rethrow=0 return_oop=0}
         ││                                                            ; - org.sample.ThreadLocalTest::floss at 17 (line 31)

## checkcast to make sure it really is an Integer.


         ││ ;; B9: #	B28 B10 <- B8  Freq: 997.563
         ││  0x000003ff68b38154: ldr	w10, [x0,#12]               ;*getfield value {reexecute=0 rethrow=0 return_oop=0}
         ││                                                            ; - java.lang.Integer::intValue at 1 (line 1132)
         ││                                                            ; - org.sample.ThreadLocalTest::floss at 20 (line 31)

## Read the int field. We're done.
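The last two steps (checkcast, then the load of Integer.value) come from the source line itself, not from ThreadLocal. Because of erasure, get() returns Object at the bytecode level, so the compiler inserts a cast and an unboxing call. A small sketch of the two equivalent forms follows; the class and the COUNTER field are hypothetical names for illustration.

```java
// Hypothetical demo; COUNTER stands in for whatever ThreadLocal<Integer> you use.
final class UnboxDemo {
    static final ThreadLocal<Integer> COUNTER = ThreadLocal.withInitial(() -> 42);

    // What you write.
    static int implicit() {
        int n = COUNTER.get();
        return n;
    }

    // Roughly what it means after erasure: get() yields an Object, which is
    // cast to Integer (the checkcast) and then unboxed via intValue()
    // (the final field load).
    @SuppressWarnings("rawtypes")
    static int explicit() {
        ThreadLocal raw = COUNTER;
        Object o = raw.get();
        return ((Integer) o).intValue();
    }

    public static void main(String[] args) {
        System.out.println(implicit() == explicit());
    }
}
```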

12 field loads, 5 conditional branches. That's the overhead of a
single ThreadLocal.get(). Conditional branches depending on the result
of a load from memory are expensive, and we have a lot of them.
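For anyone who would rather read Java than assembly, the annotated steps correspond roughly to the model below. This is a simplified sketch, not the JDK source: the field names threadLocals, table, threadLocalHashCode and value are the ones named in the annotations, while Key, ModelMap, ModelThread and the put() helper are hypothetical, and the slow path (probing after a miss, initial values) is left out.

```java
import java.lang.ref.WeakReference;

// Simplified model of the fast path, NOT the JDK implementation.
final class FastPathModel {
    // Stands in for ThreadLocal: only the hash code matters here.
    static final class Key {
        final int threadLocalHashCode;
        Key(int hash) { this.threadLocalHashCode = hash; }
    }

    // The entry holds the key weakly and the value strongly.
    static final class Entry extends WeakReference<Key> {
        Object value;
        Entry(Key k, Object v) { super(k); this.value = v; }
    }

    static final class ModelMap {
        Entry[] table = new Entry[16]; // length assumed to be a power of two

        // Hypothetical helper: no collision handling in this model.
        void put(Key k, Object v) {
            table[k.threadLocalHashCode & (table.length - 1)] = new Entry(k, v);
        }
    }

    // Stands in for java.lang.Thread.
    static final class ModelThread {
        ModelMap threadLocals; // null until first use
    }

    static Object get(ModelThread t, Key key) {
        ModelMap map = t.threadLocals;            // load threadLocals
        if (map != null) {                        // branch: null check
            Entry[] table = map.table;            // load table
            int i = key.threadLocalHashCode       // load hash, load length
                    & (table.length - 1);
            Entry e = table[i];                   // bounds check, load entry
            if (e != null && e.get() == key) {    // branch, load referent, branch
                return e.value;                   // load value
            }
        }
        return null; // slow path omitted in this sketch
    }
}
```

Every step depends on the result of the load before it, which is why the chain cannot be overlapped much by the hardware.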

-- 
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

