Web graphs
*Vertices represent pages
*edges represent hyder links
Knowledge graphs
*Vertices represent entities, eg; locations,people,events
*edges represent relationships, eg; capital of, occured in
Graph model
Graph[VD,ED]
VD: verteice
ED : edge
examples of
Knowledge Graph
graph[String,String]
DV: String= entity names
ED: String = relationship
examples of graph[String,String]
Road Network
DV: (longitude, latitude)
ED: (streedName, speedLimit)
slides show graph creation code
Graph partitioning; Edge cut
*Each vertex is assigned to one partition
*Some edges are replicated
Graph partitioning; Vertex cut
*Each edge is assigned one partition
*Some vertices are replicated
GraphX adopts which graph partitioning for what function
adopts Vertex cut to minimize communication cost
Simple graph processing
Vertex Operations
Vertex Operation; Filter
Keeps only a subset of vertices that satisfy the predicated
Vertex Operation; mapValues
Modify the value on each
vertex
Vertex Operation; diff(other: VertexRDD[VD]):
Removes all
vertices in the other set of vertices
Edge operations
*mapValues
*reverse
Edge operations; mapValues
Modify the value on each
edge
Edge operations;Reverse
Reverse the direction of each
edge (V1 -> V2) → (V2 -> V2)
Iterative Graph Processing operations
Pregel API
def pregelA(
vprog: (VertexId, VD, A) => VD,
sendMsg: EdgeTriplet[VD, ED] =>
Iterator[(VertexId, A)],
mergeMsg: (A, A) => A)
: Graph[VD, ED]
Graph Analytics: Connected
Components
Goal
Assign each vertex the smallest ID in
its connected component
Graph Analytics: Connected
Components
message
Each vertex sends its current
label (initially its own ID) to neighbors.
Graph Analytics: Connected
Components
Update Rule
On each superstep, a vertex
adopts the smallest label received from its
neighbors if it’s smaller than its current
label
Graph Analytics: Connected
Components
Terminiation
When no vertex changes its
label (i.e., messages stop propagating),
the algorithm converges
Graph Analytics: Single-source Shortest
Path (SSSP)
Goal
Find the minimum distance from a
source vertex to all others.