GraphX Flashcards

(11 cards)

1
Q

What applications are best suited for GraphX or big graph processing?

A

Applications involving extremely large graphs that cannot fit on a single machine, such as:

Web graphs (pages + hyperlinks, billions of edges)

Knowledge graphs (entities + relationships)

Social networks (hundreds of millions of users and connections)

2
Q

Why does big graph processing need distributed systems like GraphX?

A

Big graphs often contain hundreds of millions of vertices and billions of edges, making them too large to store or process on a single machine. Distributed graph frameworks such as GraphX partition the graph across a cluster so that both storage and computation are spread over many machines.

3
Q

What are the two major types of graph partitioning?

A

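Edge Cut:
Each vertex is assigned to exactly one partition.
Edges whose endpoints lie in different partitions are cut and may be replicated across partitions.

Vertex Cut:
Each edge is assigned to exactly one partition.
Vertices whose edges land in different partitions are replicated.
GraphX uses Vertex Cut to reduce communication cost.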
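
The two strategies can be sketched in plain Python. This is only an illustration of the bookkeeping, not how GraphX places data: the toy edge list, the modulo vertex assignment, and the round-robin edge placement are all assumptions made for the example.

```python
from collections import defaultdict

# Toy directed graph as (src, dst) pairs; vertex 1 is a high-degree hub.
edges = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (4, 5)]
NUM_PARTS = 2


def edge_cut(edges, num_parts):
    """Assign each vertex to one partition and list the edges that cross partitions."""
    owner = {v: v % num_parts for e in edges for v in e}  # hypothetical placement rule
    cut = [(s, d) for s, d in edges if owner[s] != owner[d]]
    return owner, cut


def vertex_cut(edges, num_parts):
    """Assign each edge to one partition and record where each vertex is replicated."""
    placement = {}
    replicas = defaultdict(set)
    for i, (s, d) in enumerate(edges):
        part = i % num_parts  # hypothetical round-robin placement
        placement[(s, d)] = part
        replicas[s].add(part)
        replicas[d].add(part)
    return placement, dict(replicas)


owner, cut = edge_cut(edges, NUM_PARTS)
placement, replicas = vertex_cut(edges, NUM_PARTS)
print("edge cut, crossing edges:", cut)
print("vertex cut, vertex replicas:", replicas)
```

In the edge-cut run, the crossing edges are the ones that would need cross-partition communication; in the vertex-cut run, every edge lives in exactly one partition and only the replicated vertices need to be kept in sync.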

4
Q

Which graph partitioning method does GraphX use, and why?

A

GraphX uses Vertex Cut: each edge is assigned to one partition and vertices are replicated where needed. This can reduce both inter-partition communication and storage overhead, which is crucial for scalable graph analytics.

5
Q

What are basic vertex operations in GraphX?

A

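filter: keep only vertices satisfying a predicate
mapValues: transform vertex attributes without changing vertex IDs
diff: keep only vertices whose values differ from those in another vertex set (minus is the operation that removes vertices appearing in another set)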
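
As a rough illustration, filter and mapValues behave like the dictionary comprehensions below. This is plain Python rather than the GraphX API; the sample vertex set and the threshold are made up for the example, and diff is left out because it needs a second vertex set.

```python
# Vertex set modelled as {vertex_id: attribute}; the data is illustrative only.
vertices = {1: 10, 2: 25, 3: 40}

# filter: keep only vertices satisfying a predicate (here, attribute > 20).
filtered = {vid: attr for vid, attr in vertices.items() if attr > 20}

# mapValues: transform each attribute while leaving the vertex IDs untouched.
doubled = {vid: attr * 2 for vid, attr in vertices.items()}

print(filtered)
print(doubled)
```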

6
Q

What are basic edge operations in GraphX?

A

mapValues: modify edge values

reverse: flip direction of edges (u → v becomes v → u)
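
The same two operations, sketched on a small edge list in plain Python. The `Edge` tuple and the sample weights are assumptions for illustration and are not GraphX types.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    src: int
    dst: int
    attr: float


edges = [Edge(1, 2, 0.5), Edge(2, 3, 1.5)]

# mapValues: change only the edge attribute; endpoints stay the same.
scaled = [e._replace(attr=e.attr * 10) for e in edges]

# reverse: swap source and destination, keeping the attribute.
flipped = [Edge(e.dst, e.src, e.attr) for e in edges]

print(scaled)
print(flipped)
```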

7
Q

How are graph vertices and edges represented in GraphX?

A

GraphX uses the Property Graph Model, a directed multigraph with user-defined attributes on each vertex and edge:
VertexRDD[VD] stores (VertexID, VertexData)
EdgeRDD[ED] stores (SourceID, DestinationID, EdgeData)
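
A minimal plain-Python picture of this layout, assuming string attributes and made-up sample data. The joined "triplet" view at the end is added here for illustration; it combines each edge with the attributes of its two endpoints.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src_id: int
    dst_id: int
    attr: str


# Vertex collection: (vertex ID, vertex data) pairs.
vertices = [(1, "alice"), (2, "bob"), (3, "carol")]

# Edge collection: (source ID, destination ID, edge data) records.
edges = [Edge(1, 2, "follows"), Edge(2, 3, "follows")]

# Join each edge with the attributes of its endpoints.
vertex_attr = dict(vertices)
triplets = [(vertex_attr[e.src_id], e.attr, vertex_attr[e.dst_id]) for e in edges]
print(triplets)
```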

8
Q

What is Pregel-style graph processing?

A

Each vertex sends messages along outgoing edges.

Each vertex updates its data by aggregating incoming messages.

Steps repeat until a stopping condition is reached (e.g., no updates).

9
Q

What is the Pregel API in GraphX used for?

A

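It runs iterative, message-passing (Pregel-style) computations on a graph. The user supplies three functions:

vprog: how a vertex updates its value

sendMsg: how vertices send messages

mergeMsg: how messages are combined

Each iteration produces a new graph, and the final graph is returned when the computation stops.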
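
To make the roles of the three functions concrete, here is a plain-Python sketch of a Pregel-style loop. It is not the GraphX implementation: the function name `pregel`, its parameter list, the dictionary-based graph, and the rule that only vertices which just received a message may send are simplifying assumptions. The usage part runs single-source shortest paths from vertex 1 on a made-up weighted graph.

```python
import math


def pregel(vertices, edges, initial_msg, vprog, send_msg, merge_msg, max_iterations=20):
    """Sketch of a Pregel-style loop over {id: attr} vertices and (src, dst, attr) edges."""
    # Superstep 0: every vertex processes the initial message.
    state = {vid: vprog(vid, attr, initial_msg) for vid, attr in vertices.items()}
    active = set(state)  # vertices that received a message in the last superstep
    for _ in range(max_iterations):
        inbox = {}
        for src, dst, attr in edges:
            if src not in active:
                continue  # simplification: only recently updated sources send
            for target, msg in send_msg(src, dst, state[src], state[dst], attr):
                # Combine messages addressed to the same vertex.
                inbox[target] = merge_msg(inbox[target], msg) if target in inbox else msg
        if not inbox:
            break  # no messages were sent, so the computation has converged
        for vid, msg in inbox.items():
            state[vid] = vprog(vid, state[vid], msg)
        active = set(inbox)
    return state


def sssp_send(src, dst, src_dist, dst_dist, weight):
    """Send a distance update only when it improves the destination."""
    if src_dist + weight < dst_dist:
        return [(dst, src_dist + weight)]
    return []


vertices = {1: 0.0, 2: math.inf, 3: math.inf, 4: math.inf}
edges = [(1, 2, 1.0), (2, 3, 2.0), (1, 3, 5.0), (3, 4, 1.0)]

distances = pregel(
    vertices,
    edges,
    initial_msg=math.inf,
    vprog=lambda vid, dist, msg: min(dist, msg),
    send_msg=sssp_send,
    merge_msg=min,
)
print(distances)
```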

10
Q

What are examples of Pregel-style graph algorithms shown in the PDF?

A

Connected Components: propagate smallest vertex ID

Single-Source Shortest Path (SSSP): propagate distance updates

PageRank: propagate rank contributions

All follow the message-passing, iterative Pregel model.
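
As one worked case, connected components by smallest-ID propagation can be sketched in plain Python as follows. The function name, the treatment of edges as undirected, and the sample graph are assumptions for illustration, not the GraphX implementation.

```python
def connected_components(vertex_ids, edges, max_iterations=50):
    """Label each vertex with the smallest vertex ID reachable from it."""
    label = {v: v for v in vertex_ids}  # every vertex starts as its own component
    for _ in range(max_iterations):
        inbox = {}
        for src, dst in edges:
            # Treat edges as undirected: offer the smaller label in both directions.
            for sender, receiver in ((src, dst), (dst, src)):
                if label[sender] < label[receiver]:
                    best = inbox.get(receiver, label[sender])
                    inbox[receiver] = min(best, label[sender])
        if not inbox:
            break  # no label changed, so the labelling is stable
        for v, msg in inbox.items():
            label[v] = min(label[v], msg)
    return label


labels = connected_components([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)])
print(labels)
```

Vertices 1, 2, and 3 end up sharing label 1, while 4 and 5 share label 4, so the two components are identified by their smallest vertex IDs.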

11
Q

What is the general flow of a Pregel superstep?

A

Vertex receives messages

Vertex updates its state (vprog)

Vertex sends new messages (sendMsg)

System merges messages addressed to the same vertex (mergeMsg)

Repeat until convergence or iteration limit
