[concurrency-interest] tree-to-tree differences / getting descendant-count and modification-count for each node

Johannes.Lichtenberger Johannes.Lichtenberger at uni-konstanz.de
Sun Jan 29 16:40:44 EST 2012


I'm not sure if that's the right place to ask, as it seems, the topics
discussed are much more "low level", about the forkJoin-architecture
itself for instance.

I'm currently computing tree-to-tree differences in a top-down
(preorder-traversal) approach and want to get the number of descendants
and the number of modifications for each node (the modifications denote
how many modifications have been done in the whole subtree).

The differences between each two nodes are fired to observers, that is
if a node has been INSERTED, DELETED, UPDATED or is the SAME (and
optionally MOVEDFROM and MOVEDTO in a postprocessing step).

Now I'm going to create an agglomeration, that is want to represent the
diffs in a radial SunburstView and visualize the differences.

I somehow thought about one year ago, that I'm saving the diffs in a
List and traverse the new revision of the tree and change the
transaction to the old revision whenever a DELETE- or MOVEDTO-event /
diff occured. I'm computing the number of descendants and modifications
per node in parallel with an ExecutorService for each node, but maybe I
should also split the task of computing the descendants and
modifications for each node (the trees might become very large) and
essentially it's an O(n^2) operation which can be parallelized. It's
working but it's very complex and slow for about 150_000 nodes, and I
think I would like to deal with much more nodes, even though they are
pruned... that is subtrees which have the same node-ID and the same hash
value are skipped by the diff-algorithm and emitted as SAME_HASH.

I just thought about saving the diffs in a tree-structure itself and
adjusting each ancestor descendant-count and modification-count whenever
a new child is added. That would probably be much easier and reduce the
runtime to O(n*k/2) whereas k is the height of the tree and n the number
of diffs.

But maybe depending on the number of cores it would scale well with my
current approach, as the descendant-count and modification-count is
calculated in parallel to the traversal of the new tree-revision and the
creation of the Sunburst items (but well, the traversal might be much
faster and therefore the results from the BlockingQueue (descendant- and
modification-count) might come too late nontheless). Haven't tested it
yet as I'm currently only able to test on my notebook with a Core 2 Duo,
but I'm just a little bit depressed, as I think saving the diffs in a
tree-structure itself would have been much better and afterwards
traverse the tree and create the Sunburst items for the visualization
thereof. Don't know why I haven't thought about it in the first place,
but I don't know of a ready-made tree-collection -- though I know the
guys from Google Guava are working on it.

Do you think it makes sense to improve the computation of the
modifications and descendants for each node itself (parallelize the tree
traversal)? Or should I refactor the whole code-base to use a
tree-structure in the first place instead of a List which can afterwards
be traversed, whereas the descendant-count and modification-count is
adjusted for each new diff I get from the diff-algorithm (that is for
each ancestor add + 1 to the descendant-count and if it's a modification
(INSERTED, UPDATED, DELETED) add + 1 to the modification-count.

kind regards,

More information about the Concurrency-interest mailing list