[concurrency-interest] Overhead of ThreadLocal data

Francesco Nigro nigro.fra at gmail.com
Wed Oct 17 08:07:51 EDT 2018


That's super nice, and somewhat expected: I suppose that's why some people
(on this very list) have suggested extending Thread and providing an ad hoc
field directly.
Many thanks for this analysis!
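
For readers who have not seen that suggestion, here is a minimal sketch of
what "extend Thread and add a field" could look like. The names
FieldThreadDemo, ContextThread, counter and FALLBACK are hypothetical and do
not come from this thread; the fallback to a ThreadLocal for threads we do
not control is likewise an assumption about how one might wire it up, not a
description of any particular library.

```java
final class FieldThreadDemo {
    // Hypothetical Thread subclass that carries per-thread state in a plain field.
    static final class ContextThread extends Thread {
        int counter; // only ever touched by the thread it belongs to

        ContextThread(Runnable task) {
            super(task);
        }
    }

    // Assumed fallback for threads that are not ContextThread instances.
    private static final ThreadLocal<int[]> FALLBACK =
            ThreadLocal.withInitial(() -> new int[1]);

    // Increments the calling thread's counter and returns the new value.
    static int increment() {
        Thread t = Thread.currentThread();
        if (t instanceof ContextThread) {
            // One type check plus one field access, no map lookup.
            return ++((ContextThread) t).counter;
        }
        // Ordinary ThreadLocal lookup for every other kind of thread.
        return ++FALLBACK.get()[0];
    }

    // Runs increment() `times` times on a fresh thread and returns the last result.
    static int run(boolean useContextThread, int times) {
        int[] last = new int[1];
        Runnable task = () -> {
            for (int i = 0; i < times; i++) {
                last[0] = increment();
            }
        };
        Thread t = useContextThread ? new ContextThread(task) : new Thread(task);
        t.start();
        try {
            t.join(); // join() makes the write to last[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return last[0];
    }

    public static void main(String[] args) {
        System.out.println(run(true, 5));  // counter kept in the Thread subclass
        System.out.println(run(false, 5)); // counter kept in the ThreadLocal
    }
}
```

The obvious limitation is that this only helps when you create the threads
yourself (for example through a custom ThreadFactory); code running on
threads it does not own still has to go through the ThreadLocal path.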

On Wed, 17 Oct 2018 at 13:58, Andrew Haley via
Concurrency-interest <concurrency-interest at cs.oswego.edu> wrote:

> Some of you might be interested to know the overhead of the "fast
> path" of ThreadLocals. Some of you might be terrified of the VM
> innards and panic when you see assembly code: this post is not for
> you. Everybody else, read on...
>
>
> Here's what happens when you say int n = ThreadLocal<Integer>::get :
>
>          ││ ;; B11: #   B24 B12 <- B3 B10       Loop: B11-B10 inner  Freq: 997.566
>  10.51%  ││  0x000003ff68b38170: ldr    x10, [xthread,#856]  ;*invokestatic currentThread {reexecute=0 rethrow=0 return_oop=0}
>          ││                                                            ; - java.lang.ThreadLocal::get@0 (line 162)
>          ││                                                            ; - org.sample.ThreadLocalTest::floss@14 (line 31)
>
> ## Read the pointer to java.lang.Thread from thread metadata.
>
>          ││  0x000003ff68b38174: ldr    w11, [x10,#76]
>          ││  0x000003ff68b38178: lsl    x24, x11, #3            ;*getfield threadLocals {reexecute=0 rethrow=0 return_oop=0}
>          ││                                                            ; - java.lang.ThreadLocal::getMap@1 (line 254)
>          ││                                                            ; - java.lang.ThreadLocal::get@6 (line 163)
>          ││                                                            ; - org.sample.ThreadLocalTest::floss@14 (line 31)
>   2.12%  ││  0x000003ff68b3817c: cbz    x24, 0x000003ff68b3824c  ;*ifnull {reexecute=0 rethrow=0 return_oop=0}
>          ││                                                            ; - java.lang.ThreadLocal::get@11 (line 164)
>          ││                                                            ; - org.sample.ThreadLocalTest::floss@14 (line 31)
>
> ## Read the pointer to Thread.threadLocals from the Thread. Check it's
>    not zero.
>
>          ││ ;; B12: #   B32 B13 <- B11  Freq: 997.551
>          ││  0x000003ff68b38180: ldr    w10, [x24,#20]
>          ││  0x000003ff68b38184: lsl    x10, x10, #3  ;*getfield table {reexecute=0 rethrow=0 return_oop=0}
>          ││                                                            ; - java.lang.ThreadLocal$ThreadLocalMap::getEntry@5 (line 434)
>          ││                                                            ; - java.lang.ThreadLocal::get@16 (line 165)
>          ││                                                            ; - org.sample.ThreadLocalTest::floss@14 (line 31)
>
> ## Read the pointer to ThreadLocalMap.table from Thread.threadLocals
>
>
>   2.12%  ││  0x000003ff68b38188: ldr    w11, [x10,#12]  ;*arraylength {reexecute=0 rethrow=0 return_oop=0}
>          ││                                                            ; - java.lang.ThreadLocal$ThreadLocalMap::getEntry@8 (line 434)
>          ││                                                            ; - java.lang.ThreadLocal::get@16 (line 165)
>          ││                                                            ; - org.sample.ThreadLocalTest::floss@14 (line 31)
>          ││                                                            ; implicit exception: dispatches to 0x000003ff68b38324
>
> ## Read the length field from table
>
>          ││ ;; B13: #   B26 B14 <- B12  Freq: 997.55
>          ││  0x000003ff68b3818c: ldr    w13, [x23,#12]
>
> ## Read ThreadLocal.threadLocalHashCode
>
>          ││  0x000003ff68b38190: sub    w12, w11, #0x1
>          ││  0x000003ff68b38194: and    w20, w13, w12               ;*iand {reexecute=0 rethrow=0 return_oop=0}
>          ││                                                            ; - java.lang.ThreadLocal$ThreadLocalMap::getEntry@11 (line 434)
>          ││                                                            ; - java.lang.ThreadLocal::get@16 (line 165)
>          ││                                                            ; - org.sample.ThreadLocalTest::floss@14 (line 31)
>   1.37%  ││  0x000003ff68b38198: add    x12, x10, w20, sxtw #2
>
> ## int i = key.threadLocalHashCode & (table.length - 1);
>
>
>          ││  0x000003ff68b3819c: cmp    w11, #0x0
>          ││  0x000003ff68b381a0: b.ls   0x000003ff68b38270
>
> ## make sure table.length is not <= 0. (Can't happen, but VM doesn't know
> that.)
>
>          ││ ;; B14: #   B20 B15 <- B13  Freq: 997.549
>  11.88%  ││  0x000003ff68b381a4: ldr    w10, [x12,#16]
>
> ## Entry e = table[i];
>
>          ││  0x000003ff68b381a8: lsl    x25, x10, #3          ;*aaload {reexecute=0 rethrow=0 return_oop=0}
>          ││                                                            ; - java.lang.ThreadLocal$ThreadLocalMap::getEntry@18 (line 435)
>          ││                                                            ; - java.lang.ThreadLocal::get@16 (line 165)
>          ││                                                            ; - org.sample.ThreadLocalTest::floss@14 (line 31)
>          ││  0x000003ff68b381ac: cbz    x25, 0x000003ff68b38208  ;*ifnull {reexecute=0 rethrow=0 return_oop=0}
>          ││                                                            ; - java.lang.ThreadLocal$ThreadLocalMap::getEntry@21 (line 436)
>          ││                                                            ; - java.lang.ThreadLocal::get@16 (line 165)
>          ││                                                            ; - org.sample.ThreadLocalTest::floss@14 (line 31)
>
> ## if (e != null
>
>          ││ ;; B15: #   B5 B16 <- B14  Freq: 997.489
>   6.37%  ││  0x000003ff68b381b0: ldr    w11, [x25,#12]
>          ││  0x000003ff68b381b4: ldrsb  w10, [xthread,#48]
>          ││  0x000003ff68b381b8: lsl    x26, x11, #3
>   9.41%  ╰│  0x000003ff68b381bc: cbz    w10, 0x000003ff68b38130
>
> ## G1 garbage collector special case for weak references: if we're
>    doing concurrent marking, take a slow path.
>
>             ;; B5: #    B27 B6 <- B18 B4 B16 B15  top-of-loop Freq: 997.489
>  24.71%  ↗↗  0x000003ff68b38130: cmp    x26, x23
>          ││  0x000003ff68b38134: b.ne   0x000003ff68b38294  ;*invokevirtual getEntry {reexecute=0 rethrow=0 return_oop=0}
>          ││                                                            ; - java.lang.ThreadLocal::get@16 (line 165)
>          ││                                                            ; - org.sample.ThreadLocalTest::floss@14 (line 31)
>
> ##  && e.get() == key)
>
>          ││ ;; B6: #    B7 <- B5 B22  Freq: 997.549
>          ││  0x000003ff68b38138: ldr    w11, [x25,#28]
>          ││  0x000003ff68b3813c: lsl    x0, x11, #3  ;*invokevirtual get {reexecute=0 rethrow=0 return_oop=0}
>          ││                                                            ; - org.sample.ThreadLocalTest::floss@14 (line 31)
>
> ## We now have our thread-local value (Entry.value).
>
>          ││ ;; B7: #    B31 B8 <- B6 B25  Freq: 997.564
>   1.71%  ││  0x000003ff68b38140: ldr    w11, [x0,#8]                ; implicit exception: dispatches to 0x000003ff68b3830c
>          ││ ;; B8: #    B30 B9 <- B7  Freq: 997.563
>          ││  0x000003ff68b38144: mov    x12, #0x10000                   // #65536
>          ││                                                            ;   {metadata('java/lang/Integer')}
>          ││  0x000003ff68b38148: movk   x12, #0x3de8
>          ││  0x000003ff68b3814c: cmp    w11, w12
>   3.15%  ││  0x000003ff68b38150: b.ne   0x000003ff68b382f4  ;*checkcast {reexecute=0 rethrow=0 return_oop=0}
>          ││                                                            ; - org.sample.ThreadLocalTest::floss@17 (line 31)
>
> ## checkcast to make sure it really is an Integer.
>
>
>          ││ ;; B9: #    B28 B10 <- B8  Freq: 997.563
>          ││  0x000003ff68b38154: ldr    w10, [x0,#12]  ;*getfield value {reexecute=0 rethrow=0 return_oop=0}
>          ││                                                            ; - java.lang.Integer::intValue@1 (line 1132)
>          ││                                                            ; - org.sample.ThreadLocalTest::floss@20 (line 31)
>
> ## Read the int field. We're done.
>
> 12 field loads, 5 conditional branches. That's the overhead of a
> single ThreadLocal.get(). Conditional branches depending on the result
> of a load from memory are expensive, and we have a lot of them.
>
> --
> Andrew Haley
> Java Platform Lead Engineer
> Red Hat UK Ltd. <https://www.redhat.com>
> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
>
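
To connect the annotated disassembly in the quoted message back to Java, here
is a small standalone model of the fast path it walks through: load the
table, mask the hash with (length - 1), load the entry, check it for null,
compare the weakly referenced key, then load the value and unbox it. This is
a simplified sketch, not the JDK source: FastPathModel, Key, Entry, put and
demo are hypothetical names, the table has a fixed size, and collision
handling and the slow path are left out.

```java
import java.lang.ref.WeakReference;

final class FastPathModel {
    // Stand-in for a ThreadLocal: only its hash code matters for the lookup.
    static final class Key {
        final int hash;

        Key(int hash) {
            this.hash = hash;
        }
    }

    // Stand-in for a map entry: weak reference to the key, strong reference to the value.
    static final class Entry extends WeakReference<Key> {
        final Object value;

        Entry(Key key, Object value) {
            super(key);
            this.value = value;
        }
    }

    // Power-of-two length so that (length - 1) works as a bit mask.
    private final Entry[] table = new Entry[16];

    // Simplified insert: overwrites whatever is in the slot (no probing).
    void put(Key key, Object value) {
        table[key.hash & (table.length - 1)] = new Entry(key, value);
    }

    // Fast path only: returns null where a real map would fall back to a slow path.
    Object get(Key key) {
        Entry[] tab = table;                 // load the table reference
        int i = key.hash & (tab.length - 1); // load hash, load length, mask
        Entry e = tab[i];                    // bounds check, then load the entry
        if (e != null && e.get() == key) {   // null check, load referent, compare
            return e.value;                  // load the stored value
        }
        return null;                         // miss: slow path omitted in this sketch
    }

    // Mirrors `int n = threadLocal.get()`: lookup, checkcast to Integer, unbox.
    static int demo() {
        FastPathModel map = new FastPathModel();
        Key key = new Key(0x61c88647);       // arbitrary hash value for the demo
        map.put(key, 42);
        return (Integer) map.get(key);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Each line of get() depends on the result of the load before it, which is why
the chain of dependent loads and branches in the listing adds up.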