[concurrency-interest] Overhead of ThreadLocal data

Nathan and Ila Reynolds nathanila at gmail.com
Mon Nov 12 10:31:18 EST 2018

What about replacing ThreadLocal.ThreadLocalMap with WeakHashMap?  
WeakHashMap uses a ReferenceQueue to track stale entries and gets rid of 
stale entries with every access.  WeakHashMap also checks the entry's 
hash before checking the key for equality.  Using WeakHashMap would 
eliminate the cost of maintaining ThreadLocalMap, which is essentially a 
parallel implementation of the same weak-keyed map.
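A minimal sketch of the expunging behavior being described (class name `WeakMapExpunge` is illustrative; the GC loop is only a best-effort way to observe it, since weak-reference clearing is not guaranteed on any particular `System.gc()` call):

```java
import java.util.Map;
import java.util.WeakHashMap;

public class WeakMapExpunge {
    public static void main(String[] args) throws InterruptedException {
        Map<Object, String> map = new WeakHashMap<>();
        Object liveKey = new Object();
        map.put(liveKey, "live");
        map.put(new Object(), "stale-to-be");  // this key is unreachable immediately

        // Best effort: prompt the GC until the dead key's entry is expunged.
        // Every WeakHashMap access (even size()) drains its ReferenceQueue.
        for (int i = 0; i < 200 && map.size() > 1; i++) {
            System.gc();
            Thread.sleep(5);
        }
        System.out.println(map.size());
        System.out.println(map.get(liveKey));
    }
}
```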


On 10/19/2018 11:45 AM, Shevek wrote:
> This sounds VERY like the behaviour we observed - the WeakReferences 
> not getting cleared would make total sense for causing the linear scans.
> On 10/19/18 8:09 AM, Gil Tene via Concurrency-interest wrote:
>> The problem with “short lived” thread locals that do not explicitly 
>> use remove() when they are no longer needed, is that in those 
>> situations “short lived” means “until the next GC detects that the 
>> ThreadLocal is no longer reachable, and the weak reference to that 
>> ThreadLocal starts returning nulls”. Scanning for stale entries more 
>> frequently, or on an API call, won’t help these situations because 
>> the map entries are not “stale” until the collector determines the 
>> unreachability of their related ThreadLocal instances. When short 
>> lived ThreadLocals are created and die at a rate linear to 
>> application throughput, this leads to ThreadLocal maps in active 
>> threads holding thousands or millions of entries, to extremely long 
>> collision chains, and to situations where half the CPU is spent 
>> walking those chains in ThreadLocal.get().
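A minimal sketch of the pattern being described (illustrative workload, not a benchmark): a long-lived thread that allocates a fresh ThreadLocal per unit of work and never calls remove(), so each dead ThreadLocal leaves a weakly-keyed entry in that thread's map until the GC clears it and the map expunges it:

```java
public class ShortLivedThreadLocals {
    // One long-lived ThreadLocal whose get() must walk any collision chain.
    static final ThreadLocal<Integer> LONG_LIVED = ThreadLocal.withInitial(() -> 42);

    static int handleOnePacket(int packetId) {
        // Anti-pattern: a fresh ThreadLocal per work packet, never remove()d.
        ThreadLocal<Integer> perPacket = ThreadLocal.withInitial(() -> packetId);
        int v = perPacket.get();   // installs an entry in this thread's map
        // perPacket.remove() is intentionally omitted, as in the problem case.
        return v;
    }

    public static void main(String[] args) {
        long sum = 0;
        for (int i = 0; i < 100_000; i++) {
            sum += handleOnePacket(i);
            LONG_LIVED.get();      // this lookup pays for the accumulated entries
        }
        System.out.println(sum);
        System.out.println(LONG_LIVED.get());
    }
}
```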
>> Where would such a ThreadLocal instance creation rate come from? 
>> We’ve seen it happen in the wild in quite a few places. I initially 
>> thought of it as an application or library-level misuse of a 
>> ThreadLocal, but with each instance of juc ReentrantReadWriteLock 
>> potentially using a ThreadLocal instance for internal coordination, 
>> application and library writers that use ReentrantReadWriteLock in a 
>> seemingly idiomatic way are often unaware of the ThreadLocal 
>> implications. A simple, seemingly valid pattern for using 
>> ReentrantReadWriteLock would be a system where arriving work 
>> “packets” are handled by parallel threads (doing parallel work within 
>> the work packet), where each work packet uses its own 
>> ReentrantReadWriteLock instance for coordination of the parallelism 
>> within the packet.
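That seemingly idiomatic pattern might look like the following (names are illustrative); each packet's ReentrantReadWriteLock can allocate a ThreadLocal internally to track per-thread read hold counts, so locks die at the packet arrival rate and their ThreadLocals with them:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class WorkPacket {
    // One lock per arriving packet: the lock (and its internal ThreadLocal)
    // becomes garbage when the packet is done.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private int result;

    void readPhase(Runnable work) {
        lock.readLock().lock();   // may touch the lock's internal ThreadLocal
        try { work.run(); } finally { lock.readLock().unlock(); }
    }

    synchronized void addResult(int v) { result += v; }
    int result() { return result; }

    public static void main(String[] args) throws InterruptedException {
        WorkPacket packet = new WorkPacket();
        Thread[] workers = new Thread[4];   // parallel work within one packet
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> packet.readPhase(() -> packet.addResult(1)));
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        System.out.println(packet.result());
    }
}
```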
>> With G1 (and C4, and likely Shenandoah and ZGC), with the current 
>> ThreadLocal implementation, the “until the next GC detects that the 
>> ThreadLocal is no longer reachable” meaning gets compounded 
>> by the fact that collisions in the map will actually prevent 
>> otherwise-unreachable ThreadLocal instances from becoming 
>> unreachable. As long as get() calls for other (chain colliding) 
>> ThreadLocals are actively performed on any thread, where “actively” 
>> means “more than once during a mark cycle”, the weakrefs from the 
>> colliding entries get strengthened during the mark, preventing the 
>> colliding ThreadLocal instances from dying in the given GC cycle. 
>> Stop-the-world newgen provides a bit of a filter for 
>> short-lived-enough ThreadLocals for G1, but for ThreadLocals with 
>> mid-term lifecycles (which will naturally occur as queues in a work 
>> system as described above grow under load), or without such a STW 
>> newgen (which none of the newer collectors have or want to 
>> have), this situation leads to explosive ThreadLocal map growth under 
>> high-enough throughput situations.
>> We’ve created a tweaked implementation of ThreadLocal that avoids the 
>> weakref get()-strengthening problem with no semantic change, by 
>> having ThreadLocal get() determine identity by comparing a 
>> ThreadLocal ID (a unique long value per ThreadLocal instance, 
>> assigned at ThreadLocal instantiation) rather than comparing the 
>> outcome of a weakref get() for each entry in the chain walked when 
>> looking for a match. This makes weakref get()s only occur on the 
>> actual entry you are looking up, and not on any colliding entries, 
>> thus preventing the gets from keeping dead ThreadLocals alive. We’ve 
>> been using this implementation successfully with C4 in Zing, and 
>> would be happy to share and upstream if there is interest.
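A rough sketch of the idea (simplified, with resizing and expunging omitted; this is not Zing's actual code): each entry carries the ThreadLocal's unique long id alongside the weak reference, and the probe loop compares ids, so weakref.get() is only called on the one entry that actually matches:

```java
import java.lang.ref.WeakReference;
import java.util.concurrent.atomic.AtomicLong;

public class IdThreadLocalMap {
    static final AtomicLong NEXT_ID = new AtomicLong();

    // Stand-in for a ThreadLocal with an instantiation-time unique id.
    static final class IdThreadLocal {
        final long id = NEXT_ID.getAndIncrement();
    }

    static final class Entry {
        final long id;                           // compared during the probe
        final WeakReference<IdThreadLocal> key;  // only dereferenced on an id match
        Object value;
        Entry(IdThreadLocal k, Object v) {
            this.id = k.id;
            this.key = new WeakReference<>(k);
            this.value = v;
        }
    }

    private final Entry[] table = new Entry[16];

    void put(IdThreadLocal key, Object value) {
        int i = (int) (key.id & (table.length - 1));
        while (table[i] != null && table[i].id != key.id)
            i = (i + 1) & (table.length - 1);    // linear probe on ids only
        table[i] = new Entry(key, value);
    }

    Object get(IdThreadLocal key) {
        int i = (int) (key.id & (table.length - 1));
        while (table[i] != null) {
            if (table[i].id == key.id) {
                // Only here do we touch the weakref, so colliding entries'
                // referents are never strengthened by this lookup.
                return table[i].key.get() == key ? table[i].value : null;
            }
            i = (i + 1) & (table.length - 1);
        }
        return null;
    }

    public static void main(String[] args) {
        IdThreadLocalMap map = new IdThreadLocalMap();
        IdThreadLocal a = new IdThreadLocal(), b = new IdThreadLocal();
        map.put(a, "A");
        map.put(b, "B");
        System.out.println(map.get(a) + "" + map.get(b));
    }
}
```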
>> Sent from my iPad
>>> On Oct 19, 2018, at 6:45 AM, Nathan and Ila Reynolds via 
>>> Concurrency-interest <concurrency-interest at cs.oswego.edu> wrote:
>>> A while ago, I raised the request for a processor local.  Striping a 
>>> highly contended counter across cache lines reduces contention, but 
>>> those cache lines still bounce around cores.  With a processor local, 
>>> each cache line can be assigned to a core and remain in the exclusive 
>>> state for that core, making atomic operations on that cache line the 
>>> fastest possible.
>>> My original example was a reader-writer lock which is almost 
>>> exclusively used for read operations.  However, with processor 
>>> local, we can implement uncontended counters for heavily used queues.
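For comparison, the closest thing in today's JDK is cache-line striping via LongAdder: contention is spread across padded cells, but, as noted, the cells still migrate between cores, which a true processor local would avoid:

```java
import java.util.concurrent.atomic.LongAdder;

public class StripedQueueCounter {
    public static void main(String[] args) throws InterruptedException {
        // Striped counter: increments usually hit a thread's own cell.
        LongAdder enqueued = new LongAdder();
        Thread[] producers = new Thread[8];
        for (int i = 0; i < producers.length; i++) {
            producers[i] = new Thread(() -> {
                for (int n = 0; n < 10_000; n++) enqueued.increment();
            });
            producers[i].start();
        }
        for (Thread t : producers) t.join();
        System.out.println(enqueued.sum());
    }
}
```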
>>> -Nathan
>>>> On 10/19/2018 7:06 AM, Brian Goetz wrote:
>>>> The problem is deeper than that; people using TL because it's what 
>>>> we've got, when really what people want (in various situations) is 
>>>> { processor, frame, task } locals, and TL is the best approximation 
>>>> we have.  Over in Project Loom, there's a deeper exploration going 
>>>> on of the use cases, and what additional mechanisms might help.
>>>>> On 10/19/2018 6:12 AM, Peter Levart via Concurrency-interest wrote:
>>>>> It is interesting that people have problems with short-lived 
>>>>> ThreadLocals because ThreadLocal tries hard to expunge stale 
>>>>> entries gradually during use. Even get() tries to do it if the 
>>>>> access happens to miss the home slot. Repeated access to the same 
>>>>> entry should not have to skip stale entries over and over again. 
>>>>> There seems to be special use pattern(s) that provoke problems. 
>>>>> Studying them more deeply could show us the weakness of current 
>>>>> ThreadLocal auto-expunge strategy so it could be improved and new 
>>>>> method would not be needed...
>>>>> Maybe the problem is not caused by unexpunged stale entries. It 
>>>>> may be related to degeneration of hashtable that leads to 
>>>>> sub-optimal placement of live entries. In that case, perhaps 
>>>>> triggering a re-hash automatically when get() encounters many live 
>>>>> entries it has to skip would help.
>>>>> So those with problems, please be more specific about them. Can 
>>>>> you show us a reproducer?
>>>>> Regards, Peter
>>>>>> On 10/17/2018 08:26 PM, Doug Lea via Concurrency-interest wrote:
>>>>>> [+list]
>>>>>>> On 10/17/18 11:44 AM, Nathan and Ila Reynolds wrote:
>>>>>>> Can we add the following method to ThreadLocal?
>>>>>>> public static void expungeStaleEntries()
>>>>>> This seems like a reasonable request (although perhaps with an 
>>>>>> improved
>>>>>> name).  The functionality exists internally, and it seems overly
>>>>>> parental not to export it for use as a band-aid by those people 
>>>>>> who have
>>>>>> tried and otherwise failed to solve the zillions of short-lived
>>>>>> ThreadLocals in long-lived threads problem.
>>>>>> Can anyone think of a reason not to do this?
>>>>>> -Doug
>>>>>>> This method will call 
>>>>>>> ThreadLocal.ThreadLocalMap.expungeStaleEntries()
>>>>>>> for the ThreadLocalMap of the current thread.  Thread pools can 
>>>>>>> then
>>>>>>> call this method when the thread finishes processing a job after 
>>>>>>> GC.
>>>>>>> This solves the problem of zillions of short-lived ThreadLocals in
>>>>>>> long-lived threads.
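Until something like the proposed method exists, the closest band-aid is for pool code to remove() the ThreadLocals it knows about after each job; an illustrative sketch (names hypothetical) using ThreadPoolExecutor's afterExecute hook:

```java
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CleaningPool extends ThreadPoolExecutor {
    // The pool can only clean ThreadLocals it knows about; the proposed
    // expungeStaleEntries() would also cover entries it cannot name.
    static final ThreadLocal<StringBuilder> SCRATCH =
        ThreadLocal.withInitial(StringBuilder::new);

    CleaningPool() {
        super(1, 1, 0, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    }

    @Override
    protected void afterExecute(Runnable r, Throwable t) {
        SCRATCH.remove();   // drop the entry so its value is collectable now
    }

    public static void main(String[] args) throws Exception {
        CleaningPool pool = new CleaningPool();
        pool.submit(() -> SCRATCH.get().append("job data")).get();
        // On the worker thread, SCRATCH was removed and re-initializes empty.
        Future<Integer> len = pool.submit(() -> SCRATCH.get().length());
        System.out.println(len.get());
        pool.shutdown();
    }
}
```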
>>>>>> _______________________________________________
>>>>>> Concurrency-interest mailing list
>>>>>> Concurrency-interest at cs.oswego.edu
>>>>>> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
