[concurrency-interest] Garbage First vs a counting collector

Boehm, Hans hans.boehm at hp.com
Thu Mar 26 18:16:05 EDT 2009


[Little to do with concurrency, but ...]

> Yes, in my implementation, I was using the malloc libraries 
> and just allowing them to internally manage grouping.  My 
> collector had nothing to deal with compaction or otherwise 
> manage fragmentation.
> 
> In particular there are very real issues where small amounts 
> of memory are used and freed frequently with just as many 
> larger chunks of memory being allocated more permanently.  
> String concatenation from loops like
> 
> 	InputStream rd = ...
> 	int i;
> 	byte buf[] = ...
> 	String str = "";
> 	while( ( i = rd.read(buf) ) > 0 ) {
> 		str += new String( buf, 0, i );
> 	}
> 
> are great at creating small free structures intermingled with 
> larger permanent structures which can quickly fragment memory 
> into unusable segments.
> 
> Compaction is something that is vital to long-lived, highly
> dynamic applications.
> 
This really depends on the underlying allocator (malloc) implementation.

A well-known, simple, and asymptotically optimal strategy is to round up all allocation requests to the nearest power of two, and to allocate each size class in separate memory regions, which never shrink.  The total space used is the sum, over each n, of

<max total space required for size 2**n objects at any point in time>

This can't be more than N log N, where N is the maximum number of bytes in use, since there are at most log N distinct size classes, and the total size of each class clearly can't exceed N.  If you know that the distribution of objects among the size classes doesn't change too much, you get closer to a linear bound.  If the ratios between the numbers of objects in any two different size classes never change by more than a constant factor c, you get a linear bound.  Thus a noncompacting scheme can clearly guarantee space bounds for long-running programs.  Whether or not you'll like those bounds probably depends on the application.
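
To make the strategy concrete, here is a minimal Java sketch of a segregated size-class allocator in that style (the class and method names are mine, and a real malloc would of course carve up raw pages rather than hand out Java arrays): each request is rounded up to a power of two, each size class keeps its own free list, and a class's region only ever grows.

import java.util.ArrayDeque;

final class SizeClassAllocator {
    // One free list per power-of-two size class; a class's region only grows.
    private final ArrayDeque<byte[]>[] freeLists;

    @SuppressWarnings("unchecked")
    SizeClassAllocator(int maxClassBits) {
        freeLists = new ArrayDeque[maxClassBits + 1];
        for (int i = 0; i < freeLists.length; i++) {
            freeLists[i] = new ArrayDeque<>();
        }
    }

    // Round the request up to the nearest power of two and pick its class.
    private static int sizeClass(int bytes) {
        return 32 - Integer.numberOfLeadingZeros(Math.max(bytes, 1) - 1);
    }

    byte[] allocate(int bytes) {
        int c = sizeClass(bytes);
        byte[] block = freeLists[c].poll();
        // Reuse a block freed in this class if there is one, else grow the class.
        return (block != null) ? block : new byte[1 << c];
    }

    void free(byte[] block) {
        // A block goes back to its own class only, so a freed small block
        // never leaves a hole inside space used for larger objects.
        freeLists[sizeClass(block.length)].push(block);
    }
}

Because a freed block can only be reused by requests in its own size class, small transient objects never punch unusable holes into the space holding large long-lived ones, which is exactly the fragmentation pattern in the string-concatenation example above.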

Real implementations tend to optimize more for the expected case, and sometimes do worse than this in the worst case.

Certain forms of reference counting may introduce worse problems.  If you try to do a constant amount of work per allocation and pointer assignment, you end up with a quadratic heap size bound instead of N log N (see http://portal.acm.org/citation.cfm?doid=964001.964019).
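
For intuition only (this is my toy sketch, not the scheme analyzed in the cited paper): a reference counter that bounds its work per operation has to defer the cascade of decrements when a count reaches zero, for example by queueing dead objects and processing only a fixed number of them per allocation, so reclamation can lag arbitrarily far behind the garbage being created.

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class RcObject {
    int refCount;
    final List<RcObject> children = new ArrayList<>();  // outgoing references
}

final class BoundedRcCollector {
    private static final int WORK_PER_ALLOC = 2;        // constant work budget
    private final Deque<RcObject> pendingFrees = new ArrayDeque<>();

    void decrement(RcObject o) {
        if (--o.refCount == 0) {
            pendingFrees.add(o);   // defer the (possibly huge) cascade of frees
        }
    }

    // Called once per allocation: process only a bounded number of dead
    // objects, so a long chain of garbage is reclaimed over many allocations
    // and can pile up in the meantime.
    void allocationHook() {
        for (int i = 0; i < WORK_PER_ALLOC && !pendingFrees.isEmpty(); i++) {
            RcObject dead = pendingFrees.poll();
            for (RcObject child : dead.children) {
                decrement(child);
            }
            // the memory for 'dead' would be handed back to the allocator here
        }
    }
}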

Hans

