What is the motivation for Google Pregel?
- Some graph datasets are massive, and computations over them can be parallelized
A Pregel computation consists of a sequence of iterations called supersteps.
What are the 6 things that occur during a superstep?
The model of computation in Pregel is vertex-centric. What does this mean?
“think like a vertex”
During a superstep in Pregel, what can a vertex do?
What does Pregel maintain in the state for each vertex?
In Pregel, supersteps are computed _______, whereas workers compute _______ within each superstep, and communicate in _______ supersteps
synchronously
asynchronously
between
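A minimal sketch of this timing, not Pregel's real implementation: workers process their partitions concurrently inside a superstep, and messages are only exchanged once every worker has finished. The names `run_worker` and `superstep` and the toy "sum plus one" vertex logic are assumptions made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_worker(partition, inbox):
    """Process one partition; return outgoing (target, value) messages."""
    outgoing = []
    for vertex_id, neighbors in partition.items():
        # Toy vertex logic (hypothetical): sum incoming messages, add one.
        total = sum(inbox.get(vertex_id, []))
        for target in neighbors:
            outgoing.append((target, total + 1))
    return outgoing


def superstep(partitions, inbox):
    """Run one superstep and return the inbox for the next one."""
    # Within the superstep, workers run concurrently in no fixed order.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(run_worker, partitions, [inbox] * len(partitions)))
    # Barrier reached: every worker is done, so messages can be delivered.
    next_inbox = {}
    for outgoing in results:
        for target, value in outgoing:
            next_inbox.setdefault(target, []).append(value)
    return next_inbox


# Two workers, one vertex each: a -> b and b -> a.
partitions = [{"a": ["b"]}, {"b": ["a"]}]
inbox1 = superstep(partitions, {})
inbox2 = superstep(partitions, inbox1)
```

Messages produced in one superstep only become visible in the next one, which is what makes the supersteps themselves synchronous.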
What is a vertex partition in Pregel?
A subset of a directed graph’s vertices. Each worker is responsible for a vertex partition
In Pregel, vertices can be either “active” or “inactive”.
How does a vertex switch from active to inactive? vice versa?
When does the Pregel program stop executing?
A distributed execution stops when all vertices are inactive and there are no more messages to process
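The stopping rule can be sketched with the maximum-value propagation example from the Pregel paper. This is a single-process simulation under simplifying assumptions: every vertex votes to halt at the end of each superstep, and an incoming message reactivates it. The function name `run_max_value` and the dictionary-based graph are hypothetical.

```python
def run_max_value(graph, values):
    """graph: vertex -> list of out-neighbors; values: vertex -> initial int.

    Assumes every message target is also a key of `graph`.
    """
    values = dict(values)
    active = set(graph)   # every vertex starts out active
    inbox = {}
    superstep = 0
    # Stop only when no vertex is active AND no messages are pending.
    while active or inbox:
        active |= set(inbox)   # a pending message reactivates its target
        outbox = {}
        for v in active:
            new_value = max([values[v]] + inbox.get(v, []))
            changed = new_value > values[v]
            values[v] = new_value
            # Send only in the first superstep or after learning a larger value.
            if superstep == 0 or changed:
                for neighbor in graph[v]:
                    outbox.setdefault(neighbor, []).append(new_value)
        active = set()         # every vertex votes to halt
        inbox = outbox
        superstep += 1
    return values, superstep


graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
final_values, supersteps = run_max_value(graph, {"a": 3, "b": 6, "c": 2})
```

On this three-vertex graph every vertex ends up holding the largest initial value, and the loop exits once a superstep produces no messages.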
Do computing units (vertices) in Pregel use shared memory?
No. Each vertex communicates with other vertices by message passing
Upon initialization of a Pregel program, how is vertex ownership determined?
In Pregel, what does a worker do upon initialization?
In Pregel, what does the master do upon initialization?
Upon initialization of the Pregel program, can the user override the default partitioning scheme?
Yes, it can be overridden to exploit locality by co-locating graph components or dense subgraphs
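A small sketch of what overriding the partitioner might look like. The Pregel paper describes the default as hash(ID) mod N; the function names, the `component_of` lookup table, and the integer vertex IDs below are assumptions for illustration only.

```python
def assign(vertex_ids, num_workers, partitioner):
    """Group vertex IDs by the worker index the partitioner returns."""
    partitions = {worker: [] for worker in range(num_workers)}
    for vertex_id in vertex_ids:
        partitions[partitioner(vertex_id, num_workers)].append(vertex_id)
    return partitions


def hash_partitioner(vertex_id, num_workers):
    # Default-style scheme: hash(ID) mod N, ignoring graph structure.
    return hash(vertex_id) % num_workers


# Hypothetical precomputed labels: vertices 0-2 and 3-5 form two components.
component_of = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}


def component_partitioner(vertex_id, num_workers):
    # Custom scheme: keep each component on one worker to exploit locality.
    return component_of[vertex_id] % num_workers


default = assign(range(6), 2, hash_partitioner)
colocated = assign(range(6), 2, component_partitioner)
```

With the hash scheme the two components are split across both workers, while the custom scheme keeps each component on a single worker, so messages inside a component stay local.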
In Pregel, combiners are supported (similar to Hadoop).
What is the benefit of combiners in Pregel? What is the drawback?
In Pregel, for what cases are combiners applicable?
How are combiners enabled?
Combiners are applicable when the function applied at each vertex is commutative and associative (ex: min, max, sum)
Combiners are user-defined and explicitly enabled
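To make the idea concrete, here is a rough sketch of a combiner merging messages bound for the same vertex before they are sent. The `combine` helper and the message tuples are hypothetical and do not reflect Pregel's actual API.

```python
def combine(messages, combiner):
    """messages: list of (target, value); returns one combined value per target."""
    combined = {}
    for target, value in messages:
        if target in combined:
            # Safe only because the combiner is commutative and associative.
            combined[target] = combiner(combined[target], value)
        else:
            combined[target] = value
    return combined


outgoing = [("v1", 4), ("v2", 7), ("v1", 9), ("v1", 1)]
min_combined = combine(outgoing, min)
sum_combined = combine(outgoing, lambda a, b: a + b)
```

Four outgoing messages shrink to two, one per target vertex. Because the order in which messages get combined is not guaranteed, the result is only well defined for commutative and associative functions.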
In Pregel, aggregators are supported.
What do aggregators do? What can they be used for?
In Pregel, aggregators are supported.
Explain the mechanism for workers and masters using aggregators in between supersteps
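One way to picture the mechanism, as a sketch only: each worker reduces the values its own vertices contribute during a superstep into a partial result, the partials are reduced into one global value, and that value is made available to vertices in the next superstep. The variable names and the choice of `max` as the reduction are assumptions.

```python
from functools import reduce

# Values contributed by the vertices on each of three workers (made-up data).
worker_vertex_values = [[3, 8], [5], [1, 9]]

# Step 1: each worker reduces its own vertices' values to a partial result.
partials = [max(values) for values in worker_vertex_values]

# Step 2: the partial results are reduced into a single global value,
# which would be visible to every vertex in the following superstep.
global_max = reduce(max, partials)
```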
As part of fault tolerance in Pregel systems, what does the master do at the start of a superstep?
How is the frequency of these operations determined?
For fault tolerance, at the start of a superstep:
- The master tells the workers to save their state to persistent storage, as a checkpoint
For fault tolerance in Pregel systems, what happens when the master detects one or more worker failures?
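A rough sketch of checkpointing worker state, using a local temporary directory to stand in for persistent storage. The fixed `interval`, the file naming, and the shape of `state` are all assumptions made for illustration; a real system would write to a distributed file system and choose the checkpoint frequency differently.

```python
import os
import pickle
import tempfile


def should_checkpoint(superstep, interval):
    # Assumed policy: checkpoint every `interval` supersteps.
    return superstep % interval == 0


def save_checkpoint(directory, worker_id, superstep, state):
    """Write one worker's state to disk and return the file path."""
    path = os.path.join(directory, f"worker{worker_id}_step{superstep}.pkl")
    with open(path, "wb") as f:
        pickle.dump(state, f)
    return path


def load_checkpoint(path):
    """Read a worker's state back, e.g. when recovering from a failure."""
    with open(path, "rb") as f:
        return pickle.load(f)


checkpoint_dir = tempfile.mkdtemp()
state = {"values": {"a": 6, "b": 6}, "inbox": {"b": [6]}}
checkpointed_steps = [s for s in range(6) if should_checkpoint(s, 3)]
path = save_checkpoint(checkpoint_dir, 0, 3, state)
restored = load_checkpoint(path)
```

The restored state matches what was saved, so a replacement worker could resume from superstep 3 instead of starting over.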
When the master detects a failure:
In a Pregel system, how can fault tolerance be made more efficient?
A more efficient recovery system:
Pregel is a model for ______ distributed _____ computation
scalable
graph