Hypercube bisection width
p/2
Square Toroidal Mesh Bisection Width
2*sqrt(p)
Mesh Bisection Width
min(n, m) for an n x m mesh
Fully Connected Bisection Width
p^2/4
Crossbar Bisection Width
p
Omega Bisection Width
p/2
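The bisection-width formulas on the cards above can be collected in one place and spot-checked. The following is a minimal sketch, not a reference implementation: the function names are hypothetical, p is assumed to be an even perfect square (and a power of two for the hypercube and omega network), and the brute-force check is only practical for very small graphs.

```python
import math
from itertools import combinations


def formula_widths(p):
    """Bisection widths from the cards above for p processors.

    Assumes p is even and a perfect square (illustrative restriction).
    """
    return {
        "hypercube": p // 2,
        "square_toroidal_mesh": 2 * math.isqrt(p),
        "fully_connected": p * p // 4,
        "crossbar": p,
        "omega": p // 2,
    }


def mesh_width(n, m):
    """Bisection width of an n x m mesh, as given on the card."""
    return min(n, m)


def hypercube_edges(d):
    """Edges of a d-dimensional hypercube; nodes are the integers 0 .. 2**d - 1."""
    return [
        (u, u ^ (1 << k))
        for u in range(1 << d)
        for k in range(d)
        if u < (u ^ (1 << k))  # list each undirected edge once
    ]


def brute_force_bisection_width(nodes, edges):
    """Try every split into two equal halves and return the smallest cut."""
    nodes = list(nodes)
    half = len(nodes) // 2
    best = None
    for group in combinations(nodes, half):
        side = set(group)
        cut = sum(1 for u, v in edges if (u in side) != (v in side))
        if best is None or cut < best:
            best = cut
    return best
```

For example, `formula_widths(16)["hypercube"]` gives 8, and the brute-force search on a 3-dimensional hypercube (p = 8) agrees with p/2 = 4.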
Message transmission time
l + n/b, where l is the latency in seconds, n is the message length in bytes, and b is the bandwidth in bytes per second
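As a worked instance of the transmission-time formula, here is a small sketch. The function name and the sample numbers (2 microseconds of latency, a 1 MB message, a 1 GB/s link) are made up for illustration and do not come from the cards.

```python
def transmission_time(latency_s, n_bytes, bandwidth_bytes_per_s):
    """Return latency + message length / bandwidth, in seconds."""
    return latency_s + n_bytes / bandwidth_bytes_per_s


# Hypothetical link: 2 microseconds latency, 1 GB/s bandwidth, 1 MB message.
example_time = transmission_time(2e-6, 1_000_000, 1e9)
```

With these assumed values the bandwidth term (1 ms) dominates the latency term (2 microseconds); for very short messages the balance reverses.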
Snooping Cache Coherence
The idea behind snooping comes from bus-based systems: When the cores share a bus, any signal transmitted on the bus can be “seen” by all the cores connected to the bus. Thus when core 0 updates the copy of x stored in its cache, if it also broadcasts this information across the bus, and if core 1 is “snooping” the bus, it will see that x has been updated, and it can mark its copy of x as invalid. This is more or less how snooping cache coherence works.
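The description above can be mimicked with a toy model. This is only a sketch under simplifying assumptions: the `Bus` and `Cache` classes are hypothetical, each cache line holds one named variable, and writes go straight through to memory (a write-through policy chosen here for brevity, not stated on the card).

```python
class Bus:
    """Shared bus: every attached cache sees every broadcast."""

    def __init__(self):
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_update(self, sender, name):
        # All caches except the sender snoop the update.
        for cache in self.caches:
            if cache is not sender:
                cache.snoop(name)


class Cache:
    """Per-core cache mapping a variable name to (value, valid flag)."""

    def __init__(self, bus):
        self.lines = {}
        self.bus = bus
        bus.attach(self)

    def load(self, name, memory):
        entry = self.lines.get(name)
        if entry is None or not entry[1]:
            # Miss or invalidated copy: fetch a fresh value from memory.
            self.lines[name] = (memory[name], True)
        return self.lines[name][0]

    def store(self, name, value, memory):
        self.lines[name] = (value, True)
        memory[name] = value  # write-through: an assumption of this sketch
        self.bus.broadcast_update(self, name)

    def snoop(self, name):
        # Another core updated `name`: mark our copy invalid if we hold one.
        if name in self.lines:
            value, _ = self.lines[name]
            self.lines[name] = (value, False)
```

If core 0 and core 1 both load x and core 0 then stores a new value, core 1's copy is marked invalid, so its next load fetches the updated value instead of the stale one.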
Directory Based Cache Coherence
Shared Memory vs Distributed Memory
Shared Memory:
Pros:
1. Implicit coordination of processors through shared data structures.
2. Appealing programming model for many programmers.
3. Generally suitable for systems with a small number of processors.
Cons:
1. Scaling interconnect can be costly.
2. Conflicts over access to the bus increase dramatically with more processors.
3. Large crossbars, while efficient, are expensive.
Distributed Memory:
Pros:
1. Relatively inexpensive interconnects like hypercube and toroidal mesh.
2. Well-suited for systems with thousands of processors.
3. Better for problems requiring vast amounts of data or computation.
Cons:
1. Requires explicit message passing for coordination.
2. More complex programming model for many programmers.
3. Not as suitable for small-scale systems with few processors.
MIMD
SIMD
Modified State
Shared State
Invalid State
Bus Read
Generated by a read of a memory block that is not in the local cache
Bus Read Exclusive
Write Back
Cache controller writes a block marked M back to main memory
Local Write from M, Cache Miss
Local Write from I, Cache Miss
Read from I, cache miss
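The states and bus transactions named on the cards above (Modified, Shared, Invalid, Bus Read, Bus Read Exclusive, Write Back) fit together as the textbook MSI protocol. The sketch below is one simplified reading of it, not a definition taken from the cards: the function names and the string labels "BusRd" and "BusRdX" are assumptions, and some variants use a separate upgrade transaction for a write to a Shared block.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    S = "Shared"
    I = "Invalid"


def on_processor_op(state, op):
    """Local 'read' or 'write': return (next_state, bus_transaction or None)."""
    if op == "read":
        if state is State.I:
            return State.S, "BusRd"  # read miss: fetch the block
        return state, None           # read hit in M or S
    if op == "write":
        if state is State.M:
            return State.M, None     # already the exclusive owner
        return State.M, "BusRdX"     # gain exclusive ownership
    raise ValueError(f"unknown operation: {op}")


def on_snoop(state, bus_op):
    """Another cache's transaction seen on the bus.

    Returns (next_state, write_back), where write_back is True when a
    block marked M must be written back to main memory.
    """
    if state is State.M:
        if bus_op == "BusRd":
            return State.S, True
        if bus_op == "BusRdX":
            return State.I, True
    if state is State.S and bus_op == "BusRdX":
        return State.I, False
    return state, False
```

For instance, `on_processor_op(State.I, "read")` returns `(State.S, "BusRd")`, and a cache holding the block in M that snoops a "BusRdX" writes it back and drops to I.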