[concurrency-interest] Dataflow/Flow-Based programming Frameworks for Java?

nchen.dev at mac.com nchen.dev at mac.com
Wed Nov 9 10:51:23 EST 2011


Short version

What dataflow/flow-based programming frameworks has this group used for Java? I know of one commercial product (Pervasive Datarush [1]) and at least three open source ones (GPars [2], Akka [3], Java FBP [4]). I am interested in the open source ones because of what I am doing (see below), but it's hard to find real applications written in them. I've only found very simple examples on their websites and in other developer blogs/articles.

I'd like to get this group's expert opinions on the following two questions:

1) What are some of the frameworks that this group has used and what are the strengths/weaknesses that you can share?
2) What are some applications that you (or someone you know) have written in those frameworks? Preferably ones whose source is available, so I can take a look and learn from it.

Long version

I'm looking at flow-based applications (content-based image retrieval, data deduplication, signal processing, ETL [5], etc.) and looking for design patterns in how developers transform their code. After cataloging these patterns, I would then try to automate some of the transformations.

My group and I looked at some applications from the PARSEC benchmark suite [6] over the summer and have published a paper on our preliminary results [7]. Since PARSEC is written in C/C++, we used Intel TBB, which had excellent support for both pipelines and flow graphs. What we discovered is that a library/framework makes the transformation process more convenient and the resulting code more succinct and easier to understand, while performance stays on par with the Pthreads version.

Nonetheless, it's hard to write good source-to-source transformation tools for C++ because of all the intricacies of the language. Thus, we have decided to switch to Java because of our familiarity with the Eclipse Refactoring Toolkit and the existing static/dynamic analysis tools for the Java language. One thing that we have trouble finding is a good library/framework for expressing pipeline and flow-based applications. We could probably do this from scratch with the Fork/Join framework or ExecutorService, but we would rather build on an existing library that others have found useful.
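
To make the from-scratch option concrete, here is a minimal sketch of a three-stage pipeline built only on ExecutorService and BlockingQueue. Everything in it is an illustrative assumption rather than part of any existing library: the class name PipelineSketch, the stages (emit strings, normalize them, record their lengths), and the "poison pill" sentinel used to signal end-of-stream. It also assumes Java 9 or later for lambdas and List.of.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical example class; not taken from any framework mentioned above.
class PipelineSketch {
    // End-of-stream marker, compared by identity so no real input can match it.
    private static final String POISON = new String("<<end>>");

    static List<Integer> run(List<String> inputs) throws Exception {
        // Queues are the "edges" of the flow graph; each task is a "node".
        BlockingQueue<String> raw = new LinkedBlockingQueue<>();
        BlockingQueue<String> normalized = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            // Stage 1 (source): push every input downstream, then the sentinel.
            Future<?> source = pool.submit(() -> {
                for (String s : inputs) {
                    raw.put(s);
                }
                raw.put(POISON);
                return null;
            });
            // Stage 2 (transform): trim and lower-case each item.
            Future<?> transform = pool.submit(() -> {
                for (String s = raw.take(); s != POISON; s = raw.take()) {
                    normalized.put(s.trim().toLowerCase(Locale.ROOT));
                }
                normalized.put(POISON);
                return null;
            });
            // Stage 3 (sink): collect the length of each normalized item.
            Future<List<Integer>> sink = pool.submit(() -> {
                List<Integer> lengths = new ArrayList<>();
                for (String s = normalized.take(); s != POISON; s = normalized.take()) {
                    lengths.add(s.length());
                }
                return lengths;
            });
            source.get();    // rethrows any failure from stage 1
            transform.get(); // rethrows any failure from stage 2
            return sink.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(List.of("  Alpha", "Beta ", " Gamma ")));
    }
}
```

Even at this size, the wiring (queues, sentinels, shutdown, error propagation) takes more code than the stage logic itself, which is the part we would rather get from an existing library.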

So any thoughts on frameworks in Java that support expressing these kinds of applications would be appreciated.


Nicholas Chen
PhD Candidate
University of Illinois at Urbana-Champaign

[1] http://www.pervasivedatarush.com/
[2] http://gpars.codehaus.org/
[3] http://akka.io/
[4] http://sourceforge.net/projects/flow-based-pgmg/
[5] http://en.wikipedia.org/wiki/Extract,_transform,_load
[6] http://parsec.cs.princeton.edu/
[7] http://tmc.supertriceratops.com/papers/tmc2011-3-Reed.pdf
