The four parallel paradigms
Learning outcomes
Levels of parallelism
Speedup
Given two variants of a program solving the same problem (a baseline and an optimized implementation, faster algorithm, or parallel version) with running times t (baseline) and t’ (optimized), the speedup S is:
S = t/t’
Amdahl’s law

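S(f,s) = 1/((1-f) + f/s), where f is the fraction of the running time that benefits from the optimization and s is the speedup of that fraction.
As s goes to infinity, S approaches 1/(1-f).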
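The formula and its limit can be checked numerically. The following is a minimal sketch; the function names amdahl_speedup and amdahl_limit and the value f = 0.9 are illustrative choices, not part of the original notes.

```python
def amdahl_speedup(f: float, s: float) -> float:
    """Overall speedup when a fraction f of the runtime is sped up by a factor s."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be in [0, 1]")
    if s <= 0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - f) + f / s)


def amdahl_limit(f: float) -> float:
    """Upper bound on the overall speedup as s grows without bound."""
    return float("inf") if f == 1.0 else 1.0 / (1.0 - f)


# Illustrative value: 90% of the runtime benefits from the optimization.
for s in (2, 8, 64, 1024):
    print(f"s = {s:5d}  ->  S = {amdahl_speedup(0.9, s):.2f}")
print(f"limit for f = 0.9: {amdahl_limit(0.9):.2f}")
```

Even with a very large s, the overall speedup stays below 1/(1-f), because the remaining fraction (1-f) still runs at its original speed.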
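Are multicore CPUs a technical necessity?
Yes. Cooling is a bottleneck when increasing the clock frequency of a CPU, so further performance gains have to come from additional cores rather than higher clock rates.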
Flynn’s taxonomy

Memory hierarchy

Cache memory
Small high-speed memory attached to a processor core
Symmetric Multiprocessor (SMP)
High performance computing (HPC)

Is a classical HPC compute cluster an appropriate computer architecture for Monte Carlo simulations like the parallel Pi example? Assume that the parallelization across nodes is not a problem.
Yes, an HPC cluster is a good computer architecture for Monte Carlo simulations.
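A minimal sketch of a parallel Pi estimate illustrates why this workload parallelizes well: every worker draws its own samples and only a single hit count per worker has to be combined at the end. The function names, the one-seed-per-worker scheme, and the use of a thread pool as a stand-in for cluster nodes are assumptions made for illustration. In CPython, threads do not speed up this CPU-bound loop; on a real cluster each work unit would run as a separate process on its own core or node.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def count_hits(samples: int, seed: int) -> int:
    """Count random points in the unit square that fall inside the quarter circle."""
    rng = random.Random(seed)  # private generator: no state shared between workers
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(workers: int, samples_per_worker: int) -> float:
    """Run independent work units and combine their hit counts once at the end."""
    seeds = range(workers)  # hypothetical seeding scheme: one distinct seed per worker
    with ThreadPoolExecutor(max_workers=workers) as pool:
        hits = pool.map(count_hits, [samples_per_worker] * workers, seeds)
        total_hits = sum(hits)
    return 4.0 * total_hits / (workers * samples_per_worker)


print(estimate_pi(workers=4, samples_per_worker=50_000))
```

The only communication is one integer per worker, so the run time is dominated by computation rather than data movement.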
Difference between HPC and commodity

Distributed compute cluster (commodity hardware)

Workload comparison between HPC and data science

Data-intensive Compute Cluster

Latency vs computation
Computation is cheap; data movement is very expensive.
Multithreading
In multithreaded programming, the time needed to communicate between two threads is typically on the order of
200 ns
In multithreaded programming, all threads can simultaneously…
…read and write, but not the same data
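As a minimal sketch of this rule, the threads below each write only to their own slot of a result list, and the shared input is never modified, so no two threads write the same data. The names data, results, and partial_sum are illustrative.

```python
import threading

N_THREADS = 4
data = list(range(8))      # shared input, only read by the threads
results = [0] * N_THREADS  # one private output slot per thread


def partial_sum(tid: int) -> None:
    # Each thread sums every N_THREADS-th element, starting at its own index,
    # and writes the result only to results[tid].
    results[tid] = sum(data[tid::N_THREADS])


threads = [threading.Thread(target=partial_sum, args=(tid,)) for tid in range(N_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results, sum(results))  # [4, 6, 8, 10] 28
```

The partial results are combined only after every thread has been joined, so the main thread never reads a slot that is still being written.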
Threads writing to memory incorrectly

Threads writing to memory correctly

Locking
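When several threads must update the same variable, a lock can serialize the updates. The following is a minimal sketch using Python's threading.Lock; the counter, the thread count, and the iteration count are illustrative.

```python
import threading

counter = 0
counter_lock = threading.Lock()


def increment(n: int) -> None:
    global counter
    for _ in range(n):
        # Only one thread at a time can hold the lock, so the
        # read-modify-write of counter is never interleaved.
        with counter_lock:
            counter += 1


threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000
```

The lock makes the result correct, but the threads now wait for each other on every increment, so locked regions should be kept as short as possible.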
Deadlocks
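A deadlock can occur when two threads each hold one lock and wait for the lock held by the other. One common way to avoid this, sketched below, is to acquire locks in the same global order in every thread. The Account class, the transfer function, and the ordering by name are hypothetical and chosen for illustration; the sketch assumes that account names are unique and that source and destination differ.

```python
import threading


class Account:
    def __init__(self, name: str, balance: int) -> None:
        self.name = name
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src: Account, dst: Account, amount: int) -> None:
    # Always lock in a fixed global order (here: by account name), so two
    # opposite transfers can never each hold one lock and wait for the other.
    first, second = sorted((src, dst), key=lambda account: account.name)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount


def repeat_transfer(src: Account, dst: Account, times: int) -> None:
    for _ in range(times):
        transfer(src, dst, 1)


a = Account("a", 100)
b = Account("b", 100)
t1 = threading.Thread(target=repeat_transfer, args=(a, b, 1000))
t2 = threading.Thread(target=repeat_transfer, args=(b, a, 1000))
t1.start()
t2.start()
t1.join()
t2.join()

print(a.balance, b.balance)  # 100 100
```

If transfer instead locked src first and dst second, the two threads could acquire the locks in opposite orders and block each other indefinitely.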