[concurrency-interest] HashSet performance

Markus Kohler markus.kohler at gmail.com
Wed Jun 11 03:20:46 EDT 2008


Hi Osvaldo,


On Tue, Jun 10, 2008 at 9:31 PM, Osvaldo Pinali Doederlein <
osvaldo at visionnaire.com.br> wrote:

>
> [snip]
>
> Now, while we are playing with HashSet optimization, another idea is: we
> could have a method like get(), that looks for an object and if it's found,
> returns it. The Set interface only offers contains(), which returns a
> boolean - definitely a cleaner interface for the most common usage of Set,
> but then you can't use a Set for the VERY important use-case of
> canonicalization. Canonicalization would only require a Set, but when the
> canonic object is found, if want to return it, so the caller can use it in
> lieu of the original object. We are forced to use an explicit Map collection
> just because it offers the get() lookup method. A Set.get() method would be
> easily defined: public T get (T obj), checks if the set contains an object
> that's equal to obj, and returns that object (or null if not found). Now, we
> obviously cannot add new methods to the Set interface, but we could do that
> in concrete implementations like HashSet. Implementation of such method
> would be trivial (the existing contains() would me morphed into that new
> method - requiring only to change a return statement and the signature - and
> the new contains() would return "get(obj) != null").
>

This sound like very good idea to me.

>
> P.S.: Sun doesn't need a HashSet that allows efficient canonicalization
> because they cheat - String.intern() is a native method, probably
> implemented with a more efficient collection in the native layer. :)
>

It is implemented in the native layer, but it is not necessarily faster than
HashMap or ConcurrentHashMap.
Actually it used to be much slower in JDK 1.4.2 because the JVM used a
relatively small hashtable and if you interned enough Strings it would
degenerate because it uses linked lists to resolve collisions.
"We" fixed this in the SAP JVM and I think newer Java version got better as
well.

Still, ConcurrentHashMap can be faster in case you have parallel access by
many threads.

The main advantage of String.intern() is that it acts like a WeakHashmap,
but without the high memory overhead.

In the SAP JVM we therefore implemented an optimized String.intern(), which
is fast enough for us for now, but it would be really nice to have an
optimized ConcurrentWeakSet with support for the API you described above.

Regards,
Markus


> A+
> Osvaldo
>
>
> -----------------------------------------------------------------------
> Osvaldo Pinali Doederlein                   Visionnaire Informática S/Aosvaldo at visionnaire.com.br                http://www.visionnaire.com.br
> Arquiteto de Tecnologia                          +55 (41) 337-1000 #223
>
>
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest at altair.cs.oswego.edu
> http://altair.cs.oswego.edu/mailman/listinfo/concurrency-interest
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: /pipermail/attachments/20080611/fd28ffa3/attachment.html 


More information about the Concurrency-interest mailing list