new graph x Flashcards

Question 1

Q

What applications are best suited for GraphX or big graph processing?

Answer

A

Applications involving extremely large graphs that cannot fit on a single machine, such as:

Web graphs (pages + hyperlinks, billions of edges)

Knowledge graphs (entities + relationships)

Social networks (hundreds of millions of users and connections)

Question 2

Q

Why does big graph processing need distributed systems like GraphX?

Answer

A

Big graphs often contain hundreds of millions of vertices and billions of edges, making them too large to store or process on a single machine. Distributed graph frameworks such as GraphX partition the graph across a cluster, so both storage and computation are spread over many machines.

Question 3

Q

What are the two major types of graph partitioning?

Answer

A

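Edge Cut:
Each vertex is assigned to exactly one partition.
Edges whose endpoints fall in different partitions are cut and span (or are replicated across) those partitions.

Vertex Cut:
Each edge is assigned to exactly one partition.
Vertices may be replicated across the partitions that hold their edges.
GraphX uses Vertex Cut to reduce communication cost.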
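
To make the two schemes concrete, here is a minimal sketch in plain Python (not GraphX code). The tiny graph and the modulo placement rules are assumptions chosen only for illustration; GraphX's actual partition strategies differ.

```python
# Toy comparison of edge cut vs. vertex cut on a tiny graph.
# Plain Python, not GraphX; the placement rules below are made up.
edges = [(1, 2), (1, 3), (1, 4), (2, 3)]
NUM_PARTS = 2

# Edge cut: every vertex lives in exactly one partition (here: id modulo NUM_PARTS).
vertex_part = {v: v % NUM_PARTS for e in edges for v in e}
# An edge is "cut" when its two endpoints live in different partitions.
cut_edges = [(s, d) for s, d in edges if vertex_part[s] != vertex_part[d]]

# Vertex cut: every edge lives in exactly one partition (here: list position modulo NUM_PARTS).
edge_part = {e: i % NUM_PARTS for i, e in enumerate(edges)}
# A vertex is replicated to every partition that holds one of its edges.
replicas = {}
for (s, d), p in edge_part.items():
    replicas.setdefault(s, set()).add(p)
    replicas.setdefault(d, set()).add(p)
replicated_vertices = sorted(v for v, parts in replicas.items() if len(parts) > 1)

print("cut edges (edge cut):", cut_edges)
print("replicated vertices (vertex cut):", replicated_vertices)
```

In this toy graph, vertex 1 has the most edges, and under the edge cut two of its three edges cross partitions. Under the vertex cut the same vertex is simply mirrored in both partitions.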

Question 4

Q

Which graph partitioning method does GraphX use, and why?

Answer

A

GraphX uses Vertex Cut, because assigning edges to partitions and replicating vertices helps minimize inter-partition communication, which is crucial for scalable graph analytics.

Question 5

Q

What are basic vertex operations in GraphX?

Answer

A

filter: keep only vertices satisfying a predicate
mapValues: modify vertex attributes
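diff: remove vertices that exist in another vertex set (more precisely, the VertexRDD API documentation describes it as returning the vertices present in both sets whose values differ, keeping the values from the other set)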
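
The three operations can be mimicked on an ordinary dictionary. This is a sketch in plain Python, not the GraphX API; the sample data is invented, and diff follows the semantics given in the VertexRDD API documentation (vertices in both sets whose values differ, taking the other set's value), which is narrower than a plain set difference.

```python
# Toy vertex sets: vertex id -> attribute (plain Python, not the GraphX API).
vertices = {1: "alice", 2: "bob", 3: "carol"}
other = {2: "bob", 3: "caroline", 4: "dave"}

# filter: keep only vertices that satisfy a predicate on (id, attribute).
filtered = {vid: attr for vid, attr in vertices.items() if vid >= 2}

# mapValues: transform the attributes; the ids stay unchanged.
upper = {vid: attr.upper() for vid, attr in vertices.items()}

# diff (documented semantics): for ids present in both sets, keep those
# whose values differ, taking the value from `other`.
diffed = {
    vid: other[vid]
    for vid in vertices
    if vid in other and other[vid] != vertices[vid]
}

print(filtered, upper, diffed)
```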

Question 6

Q

What are basic edge operations in GraphX?

Answer

A

mapValues: modify edge values

reverse: flip direction of edges (u → v becomes v → u)
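
Both edge operations are simple per-edge transformations. The sketch below uses plain Python with a hypothetical Edge tuple to show the idea; it is not the GraphX EdgeRDD API.

```python
from typing import NamedTuple

class Edge(NamedTuple):
    """Hypothetical edge record: source id, destination id, attribute."""
    src: int
    dst: int
    attr: float

edges = [Edge(1, 2, 0.5), Edge(2, 3, 1.5)]

# mapValues: change only the edge attribute; endpoints are untouched.
doubled = [e._replace(attr=e.attr * 2) for e in edges]

# reverse: swap source and destination; the attribute is kept.
reversed_edges = [Edge(e.dst, e.src, e.attr) for e in edges]

print(doubled)
print(reversed_edges)
```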

Question 7

Q

How are graph vertices and edges represented in GraphX?

Answer

A

GraphX uses the Property Graph Model:
VertexRDD[VD] stores (VertexID, VertexData)
EdgeRDD[ED] stores (SourceID, DestinationID, EdgeData)
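
As a rough picture of this model, the sketch below keeps vertices and edges as two separate collections in plain Python and joins them on vertex ID. The names and sample data are invented for illustration; real GraphX stores these collections as distributed RDDs.

```python
# Property graph as two collections (plain Python stand-ins for the RDDs).
vertices = [(1, "alice"), (2, "bob"), (3, "carol")]  # (VertexID, VertexData)
edges = [(1, 2, "follows"), (2, 3, "follows"), (3, 1, "blocks")]  # (SourceID, DestinationID, EdgeData)

# Joining edges with vertex data on the ids gives a view of each edge
# together with the attributes of both endpoints.
attr_of = dict(vertices)
described = [f"{attr_of[s]} {rel} {attr_of[d]}" for s, d, rel in edges]

for line in described:
    print(line)
```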

Question 8

Q

What is Pregel-style graph processing?

Answer

A

Each vertex sends messages along outgoing edges.

Each vertex updates its data by aggregating incoming messages.

Steps repeat until a stopping condition is reached (e.g., no updates).

Question 9

Q

What is the Pregel API in GraphX used for?

Answer

A

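The Pregel API runs iterative, message-passing computations over a graph. The caller supplies three functions:

vprog: how a vertex updates its value

sendMsg: how vertices send messages

mergeMsg: how messages are combined

Because graphs are immutable, each iteration produces a new graph; the call returns the final graph once no messages remain or the iteration limit is reached.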
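
To show how the three functions fit together, here is a toy single-machine Pregel loop in plain Python, applied to single-source shortest paths. It is only a sketch: the pregel function, its parameter names, and the sample graph are assumptions, and it simplifies GraphX's behaviour (for example, only vertices that just received a message send along their outgoing edges).

```python
import math

def pregel(vertices, edges, initial_msg, vprog, send_msg, merge_msg, max_iter=20):
    """Toy Pregel loop. vertices: dict id -> value; edges: list of (src, dst, attr)."""
    # Superstep 0: every vertex processes the initial message.
    values = {vid: vprog(vid, val, initial_msg) for vid, val in vertices.items()}
    active = set(values)  # vertices allowed to send in the next round
    for _ in range(max_iter):
        inbox = {}
        for src, dst, attr in edges:
            if src not in active:
                continue
            # send_msg returns a list of (target id, message) pairs.
            for target, msg in send_msg(src, values[src], dst, values[dst], attr):
                # merge_msg combines messages addressed to the same vertex.
                inbox[target] = merge_msg(inbox[target], msg) if target in inbox else msg
        if not inbox:  # no messages left: the computation has converged
            break
        for vid, msg in inbox.items():
            values[vid] = vprog(vid, values[vid], msg)
        active = set(inbox)
    return values

# Single-source shortest paths from vertex 1 on a small weighted graph.
SOURCE = 1
edges = [(1, 2, 4.0), (1, 3, 1.0), (3, 2, 2.0), (2, 4, 5.0)]
start = {v: (0.0 if v == SOURCE else math.inf) for v in (1, 2, 3, 4)}

def vprog(vid, dist, msg):
    return min(dist, msg)  # keep the shorter distance

def send_msg(src, src_dist, dst, dst_dist, weight):
    # Only send when the path through src improves dst's distance.
    return [(dst, src_dist + weight)] if src_dist + weight < dst_dist else []

dist = pregel(start, edges, math.inf, vprog, send_msg, min)
print(dist)
```

All sends in a round read the values from the previous round, and updates are applied only afterwards, which mirrors the synchronous superstep structure.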

Question 10

Q

What are examples of Pregel-style graph algorithms shown in the PDF?

Answer

A

Connected Components: propagate smallest vertex ID

Single-Source Shortest Path (SSSP): propagate distance updates

PageRank: propagate rank contributions

All follow the message-passing, iterative Pregel model.
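
As an illustration of the first algorithm, the sketch below propagates the smallest vertex ID through each component in synchronous rounds. It is plain Python on an invented graph, with edges treated as undirected, and is not the GraphX implementation.

```python
# Connected components by propagating the smallest vertex id (toy example).
edges = [(1, 2), (2, 3), (4, 5)]
vertex_ids = {1, 2, 3, 4, 5, 6}

# Each vertex starts with its own id as its component label.
label = {v: v for v in vertex_ids}

changed = True
while changed:
    inbox = {}
    for a, b in edges:
        # Treat each edge as undirected: try both directions.
        for src, dst in ((a, b), (b, a)):
            if label[src] < label[dst]:
                # Merge step: keep the smallest label sent to dst.
                inbox[dst] = min(inbox.get(dst, label[src]), label[src])
    changed = bool(inbox)
    # Update step: adopt the smaller label, applied after all sends.
    for v, msg in inbox.items():
        label[v] = min(label[v], msg)

print(label)
```

Vertices that end up with the same label belong to the same component; the isolated vertex 6 keeps its own ID.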

Question 11

Q

What is the general flow of a Pregel superstep?

Answer

A

Vertex receives messages

Vertex updates its state (vprog)

Vertex sends new messages (sendMsg)

System merges messages addressed to the same vertex (mergeMsg)

Repeat until convergence or iteration limit
