Apache Giraph - Berlin Buzzwords

Jakob Homan

Updated June 5, 2012

Transcript

How's it built

Think like a vertex

Example: PageRank

Bells and whistles

How do I run a Giraph job?

And what does it look like?

State of the world

Future Work

Thank you!

BSP: Revenge of the messages

Example: Find max value

From Y! to

What's suboptimal with MR?

Giraph: loosely based on...

Key concepts

Superstep

V2.compute()

V3.compute()

V4.compute()

V1.compute()

Our neighbors send us their values, we add them up

iteration == job chaining

job chaining == unintended consequences

Version 0.1

Yahoo! Research developed original codebase
Entered Apache Incubator in July 2011
New Apache team quickly formed

> bin/giraph \
    ~/maxValueGiraph-1.0.jar maxValueGiraph.MaximumValueInGraph \
    -w 5 \
    -if org.apache.giraph.lib.TextDoubleDoubleAdjacencyListVertexInputFormat \
    -ip my_graph \
    -of org.apache.giraph.lib.IdWithValueTextOutputFormat \
    -op my_output

First, we think we're the max

Aggregators

Combiners

Checkpointing

Even more improved RPC

{in|out}putformats

Improved robustness

YARN!

And compute our new value

Anybody more max?

Improved out-of-box experience
Significant memory improvements
Improved definition of Vertex and Combiner
Lots of useful file formats

ZooKeeper

Writing to disk isn't always evil
Store work at user-defined intervals
Restart on failure
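
The checkpointing idea above (store work at intervals, restart on failure) can be sketched in a few lines. This is a toy illustration, not Giraph's checkpoint mechanism or file format: `run_with_checkpoints`, `step`, `interval`, `path`, and `fail_at` are all hypothetical names, and the "failure" is simulated once at a chosen superstep.

```python
import pickle


def run_with_checkpoints(state, step, total_supersteps, interval, path, fail_at=None):
    """Run `step` for `total_supersteps`, saving (superstep, state) to disk
    every `interval` supersteps and rolling back once if a failure is simulated."""
    superstep = 0
    failed_once = False
    while superstep < total_supersteps:
        # Store work at a user-defined interval.
        if superstep % interval == 0:
            with open(path, "wb") as f:
                pickle.dump((superstep, state), f)
        # Simulated worker failure (assumption: it happens at most once).
        if superstep == fail_at and not failed_once:
            failed_once = True
            with open(path, "rb") as f:
                superstep, state = pickle.load(f)  # restart from last checkpoint
            continue
        state = step(state)
        superstep += 1
    return state
```

With a deterministic `step`, a run that fails and restarts should end in the same state as a run that never failed; the price is redoing the supersteps since the last checkpoint.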

halt

If so, tell everybody

Global values calculated in superstep n, available in next superstep
Sum, Min, Max provided
User-definable
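
A minimal sketch of the aggregator behaviour described above: values contributed during superstep n only become visible after the barrier, in the next superstep. The `Aggregator` class and its method names are hypothetical and are not Giraph's API; the combining function stands in for the provided Sum, Min, Max or a user-defined one.

```python
import operator


class Aggregator:
    """Toy aggregator: contributions made in one superstep become readable
    only after end_superstep() is called (the barrier)."""

    def __init__(self, combine):
        self._combine = combine   # e.g. max, min, operator.add, or user-defined
        self._pending = None      # value being built during this superstep
        self.value = None         # value visible to vertices (from last superstep)

    def aggregate(self, x):
        if self._pending is None:
            self._pending = x
        else:
            self._pending = self._combine(self._pending, x)

    def end_superstep(self):
        # Publish this superstep's result and reset for the next one.
        self.value, self._pending = self._pending, None


total = Aggregator(operator.add)
for vertex_value in [3, 6, 2]:
    total.aggregate(vertex_value)   # superstep n: vertices contribute
total.end_superstep()               # barrier
# total.value is now readable in superstep n + 1
```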

User-defined function to combine messages before being sent or delivered
Similar to combiners in Hadoop
Saves on network or memory
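
As a rough illustration of the combiner idea above, the sketch below collapses all messages bound for the same vertex into one before they would be sent. `combine_messages` is a hypothetical helper, not a Giraph class; it assumes the combining function is associative and commutative (such as `max` or addition), so the receiver gets the same result either way.

```python
def combine_messages(messages, combine):
    """Collapse (target, value) pairs so each target receives a single value."""
    combined = {}
    for target, value in messages:
        if target in combined:
            combined[target] = combine(combined[target], value)
        else:
            combined[target] = value
    return combined


outgoing = [("v2", 3), ("v2", 6), ("v3", 1)]
print(combine_messages(outgoing, max))  # three messages become two
```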

<picture of giraffe wearing mortarboard not found... how is that possible?>

Shared state
Master-Worker coordination
Aggregators

If not, vote to be done
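
The find-max steps above can be simulated on a single machine. This is a sketch of the vertex-centric idea only, not Giraph code: `max_value_bsp` and its data layout are hypothetical, and it assumes every vertex votes to halt after each superstep and is woken up again only by an incoming message.

```python
def max_value_bsp(values, edges):
    """values: {vertex: number}; edges: {vertex: [neighbours]}.
    Returns (final values, number of supersteps run)."""
    values = dict(values)
    active = set(values)                  # every vertex starts active
    inbox = {v: [] for v in values}
    superstep = 0
    while active:
        outbox = {v: [] for v in values}
        for v in sorted(active):
            if superstep == 0:
                changed = True            # first, we think we're the max
            else:
                best = max(inbox[v])      # anybody more max?
                changed = best > values[v]
                if changed:
                    values[v] = best
            if changed:                   # if so, tell everybody
                for n in edges.get(v, []):
                    outbox[n].append(values[v])
            # if not, the vertex simply stays halted (vote to be done)
        inbox = outbox                    # barrier between supersteps
        active = {v for v in values if inbox[v]}
        superstep += 1
    return values, superstep


graph = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
final, steps = max_value_bsp({1: 3, 2: 6, 3: 2, 4: 1}, graph)
```

Fewer vertices are active in each later superstep, which is where the "fewer messages sent as algorithm progresses" behaviour comes from.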

Trunk (version 0.2 soon?)

Do this a pre-set number of times

We graduated!

HCat, Oozie, Azkaban, etc.

Higher-level languages

Monitoring and metrics

Dramatically improved RPC system <- A Big Deal!
HBase and Accumulo integration
Hadoop 1.0 support

Disk IO and job scheduling quickly dominate the algorithm

http://incubator.apache.org/giraph/

2010

2011

2012

-w = how many workers. Worker == Mapper
Standard bin script
Lots of {in|out}putformats with silly names
Read and write from directories

(top-level-project domain coming soon)

Fewer messages sent as algorithm progresses
Three supersteps versus one MR job. Still faster?

Map-only jobs? Pre-partitioning the data? HaLoop? HadApt?

A better way of doing large-scale graph processing on Hadoop

bit.ly/newbie_apache_giraph_issues

And send our new value to everybody else...

or it's time to quit
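
The PageRank steps from the slides (add up what neighbors sent, compute a new value, send it on, stop after a pre-set number of supersteps) can be sketched as a small simulation. This is not Giraph code: `pagerank_bsp` is a hypothetical function, the damping factor 0.85 and the update formula are the commonly used PageRank formulation rather than something stated in the slides, and every vertex is assumed to have at least one outgoing edge.

```python
def pagerank_bsp(out_edges, max_supersteps=30, damping=0.85):
    """out_edges: {vertex: [targets]}, each vertex with at least one target.
    Returns {vertex: rank} after a fixed number of supersteps."""
    n = len(out_edges)
    rank = {v: 1.0 / n for v in out_edges}
    inbox = {v: [] for v in out_edges}
    for superstep in range(max_supersteps):   # do this a pre-set number of times
        outbox = {v: [] for v in out_edges}
        for v, targets in out_edges.items():
            if superstep >= 1:
                # Our neighbors sent us their values: add them up
                # and compute our new value.
                rank[v] = (1 - damping) / n + damping * sum(inbox[v])
            # Send our new value, split evenly, to everybody we point at.
            for t in targets:
                outbox[t].append(rank[v] / len(targets))
        inbox = outbox                        # barrier between supersteps
    return rank                               # time to quit


ranks = pagerank_bsp({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
```

Here `c` receives contributions from both `a` and `b`, so it ends up ranked above `b`.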

"The performance, scalability, and fault-tolerance of Pregel are already satisfactory for graphs with billions of vertices."

* A very badly behaved one
