Superstep
[Diagram: superstep 0: V1.compute(), V2.compute(), V3.compute(), V4.compute()]
Our neighbors send us their values, and we add them up
iteration == job chaining
job chaining == unintended consequences
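On plain MapReduce, every iteration of a graph algorithm becomes its own chained job, re-reading and re-writing the whole graph through HDFS. A minimal sketch of what such a driver loop looks like; the identity Mapper/Reducer, paths, and iteration count are placeholders standing in for the real per-iteration logic:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class IterativeGraphDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path input = new Path("my_graph");

    // One full MapReduce job per iteration: each step pays job-scheduling
    // overhead and spills the entire graph state to HDFS before the next one.
    for (int iteration = 0; iteration < 30; iteration++) {
      Path output = new Path("my_graph_iteration_" + iteration);

      Job job = Job.getInstance(conf, "graph-iteration-" + iteration);
      job.setJarByClass(IterativeGraphDriver.class);
      job.setMapperClass(Mapper.class);    // identity map: stand-in for the per-vertex update
      job.setReducerClass(Reducer.class);  // identity reduce: stand-in for combining messages
      job.setOutputKeyClass(LongWritable.class);
      job.setOutputValueClass(Text.class);
      FileInputFormat.addInputPath(job, input);
      FileOutputFormat.setOutputPath(job, output);

      if (!job.waitForCompletion(true)) {
        System.exit(1);
      }
      input = output;  // the next iteration's input is this iteration's output
    }
  }
}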
Version 0.1
> bin/giraph \
~/maxValueGiraph-1.0.jar maxValueGiraph.MaximumValueInGraph \
-w 5 \
-if org.apache.giraph.lib.TextDoubleDoubleAdjacencyListVertexInputFormat \
-ip my_graph \
-of org.apache.giraph.lib.IdWithValueTextOutputFormat \
-op my_output
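For reference: in the Giraph runner, -w sets the number of workers, -if and -ip name the vertex input format class and input path, and -of and -op name the output format class and output path. These are the 0.1-era command-line options; double-check them against your release.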
First, we think we're the max
Aggregators
Combiners
Checkpointing
Even more improved RPC
More {in|out}putformats
Improved robustness
YARN!
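To illustrate the Combiners item above: a combiner can cut message traffic in the max-value example, since when several messages target the same vertex only the largest needs to be delivered. A rough sketch against the later MessageCombiner-style API; older releases used an abstract Combiner class instead, so treat the base type and package as assumptions:

import org.apache.giraph.combiner.MessageCombiner;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;

// Illustrative combiner: keep only the largest message headed to each vertex.
public class MaxDoubleMessageCombiner
    implements MessageCombiner<LongWritable, DoubleWritable> {

  @Override
  public void combine(LongWritable vertexIndex, DoubleWritable originalMessage,
      DoubleWritable messageToCombine) {
    // Fold messageToCombine into originalMessage, retaining the maximum value.
    if (messageToCombine.get() > originalMessage.get()) {
      originalMessage.set(messageToCombine.get());
    }
  }

  @Override
  public DoubleWritable createInitialMessage() {
    // Neutral element for max: any real message value beats it.
    return new DoubleWritable(Double.NEGATIVE_INFINITY);
  }
}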
And compute our new value
Anybody more max?
[Diagram: superstep 1: V1.compute(), V2.compute(), V3.compute(), V4.compute()]
ZooKeeper
If so, tell everybody
<picture of giraffe wearing mortarboard not found... how is that possible?>
If not, vote to be done
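Put together, these steps are the whole vertex program. A minimal sketch of the compute() method against the newer BasicComputation API; the class name and Writable types are illustrative, not the deck's actual maxValueGiraph.MaximumValueInGraph source, which targeted the 0.1-era API:

import java.io.IOException;
import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;

public class MaxValueComputation
    extends BasicComputation<LongWritable, DoubleWritable, FloatWritable, DoubleWritable> {

  @Override
  public void compute(Vertex<LongWritable, DoubleWritable, FloatWritable> vertex,
      Iterable<DoubleWritable> messages) throws IOException {
    // Superstep 0: first, we think we're the max, so tell everybody.
    boolean changed = (getSuperstep() == 0);

    // Anybody more max? If a neighbor sent a bigger value, adopt it.
    for (DoubleWritable message : messages) {
      if (message.get() > vertex.getValue().get()) {
        vertex.setValue(new DoubleWritable(message.get()));
        changed = true;
      }
    }

    if (changed) {
      // If so, tell everybody our new value...
      sendMessageToAllEdges(vertex, vertex.getValue());
    } else {
      // ...if not, vote to be done. The job halts once every vertex has voted
      // and no messages are in flight.
      vertex.voteToHalt();
    }
  }
}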
Trunk (version 0.2 soon?)
Do this a pre-set number of times
We graduated!
[Diagram: superstep 2: V2.compute(), V3.compute(), V4.compute()]
HCat, Oozie, Azkaban, etc.
Higher-level languages
Monitoring and metrics
Disk IO and job scheduling quickly dominate the algorithm
http://incubator.apache.org/giraph/
[Timeline: 2010, 2011, 2012]
(top-level-project domain coming soon)
Map-only jobs? Pre-partitioning the data? HaLoop? HadApt?
A better way of doing large-scale graph processing on Hadoop
[Diagram: superstep 3: V2.compute(), V4.compute()]
bit.ly/newbie_apache_giraph_issues
And send our new value to everybody else...
or it's time to quit
"The performance, scalability, and fault-tolerance of Pregel are already satisfactory for graphs with billions of vertices."
* A very badly behaved one