Chapter 4 Flashcards

(34 cards)

1
Q

What are the four steps in the Parallel Design Process?

A
  • PARTITION
  • COMMUNICATE
  • AGGLOMERATE
  • MAP

These steps help in analyzing and designing parallel algorithms.

2
Q

What is a dependency in the context of parallel computing?

A

A situation where one operation must complete before another can begin

Dependencies are considered the enemy of parallelism.

3
Q

True or false: Only RAW dependencies limit parallelism.

A

TRUE

WAR and WAW are name dependencies that can be eliminated by renaming or privatization; only RAW is a true dependency.

4
Q

What are the three types of data dependencies?

A
  • RAW (Read After Write)
  • WAR (Write After Read)
  • WAW (Write After Write)

Each type has different implications for parallel execution.

5
Q

What is a loop-carried dependency?

A

A dependency where iteration i depends on results from iteration i-k

These dependencies are primary barriers to loop parallelization.

6
Q

What is the key insight regarding S2 and S3 in the dependency example?

A

S2 and S3 can run in parallel because they have no dependency on each other

This illustrates how dependencies affect execution order.

7
Q

What are Bernstein’s Conditions for Parallelism?

A
  • Condition 1: W1 ∩ R2 = ∅ (no RAW)
  • Condition 2: R1 ∩ W2 = ∅ (no WAR)
  • Condition 3: W1 ∩ W2 = ∅ (no WAW)

Here R and W denote each task's read and write sets.

These conditions help determine if two tasks can run in parallel.

8
Q

What is data decomposition in parallel computing?

A

Partitioning the data and applying the same operation to each partition

This strategy is useful for uniform workloads.

9
Q

What is the block distribution strategy?

A

Divide data into contiguous chunks

Best for uniform workloads where each element takes the same time to process.

10
Q

What is the cyclic distribution strategy?

A

Assign elements in round-robin fashion

Best for non-uniform workloads where processing time varies by element position.

11
Q

What is the block-cyclic distribution strategy?

A

Assign blocks of size B in round-robin fashion

Used in ScaLAPACK and many HPC applications.

12
Q

What is functional (task) decomposition?

A

Partitioning the computation/functions

Different processors execute different tasks; in real applications this is often combined with data decomposition.

13
Q

What is pipeline decomposition?

A

A special form of functional decomposition where stages operate on a stream of data

It improves throughput; the latency of a single item is not reduced.

14
Q

What is recursive decomposition?

A

Dividing a problem into subproblems of the same type and solving them in parallel

Commonly used in divide-and-conquer algorithms.

15
Q

What does a dependency graph show?

A

Which operations must precede others

It helps visualize dependencies and critical paths.

16
Q

What is the critical path in a dependency graph?

A

The longest path through the graph, determining the minimum time to complete all tasks

Its length sets a lower bound on execution time and therefore an upper bound on speedup (total work / critical-path length).

17
Q

What are the three techniques to eliminate or reduce dependencies?

A
  • Variable Renaming
  • Privatization
  • Reduction

These techniques help enable parallelism by addressing different types of dependencies.

18
Q

What is the first step in the PCAM model for parallel algorithm design?

A

Partitioning

Decompose problem into tasks (Data or functional).

19
Q

In OpenMP, what directive is used to indicate a parallel for loop?

A

#pragma omp parallel for

This directive allows multiple iterations of the loop to be executed in parallel.

20
Q

What does the reduction clause do in OpenMP?

A

Combines values from multiple threads into a single value

Example: #pragma omp parallel for reduction(+:sum) for summing elements.

21
Q

What is the purpose of loop fission?

A

Split loops to separate dependencies

This can allow for parallel execution of independent parts.

22
Q

What is the goal of load balancing in parallel computing?

A

Even distribution of computation

This helps to maximize resource utilization and minimize idle time.

23
Q

True or false: Fine-grained tasks have a small size and many tasks.

A

TRUE

Fine-grained tasks typically have higher overhead due to more synchronization.

24
Q

What are the communication patterns in parallel computing?

A
  • Point-to-Point
  • Broadcast
  • Scatter
  • Gather
  • All-to-All
  • Reduction

These patterns determine how tasks exchange data.

25
Q

What does loop interchange achieve?

A

Reorders loops to change dependency structure

This can allow for better parallelization by reducing dependencies.
26
Q

In the context of parallel programming, what does granularity refer to?

A

The size of tasks in a parallel program

It can be fine-grained (small tasks) or coarse-grained (large tasks).
27
Q

What is the OpenMP directive for static scheduling with chunks of 4?

A

#pragma omp for schedule(static, 4)

Chunks of 4 iterations are assigned to threads in round-robin order, fixed at loop entry.
28
Q

Fill in the blank: The PCAM model stands for P, C, A, M.

A

Partitioning, Communication, Agglomeration, Mapping

This model provides a systematic approach to parallel algorithm design.
29
Q

What is the dependency type in the following code snippet: for (i = 0; i < n; i++) { a[i] = b[i] + c[i]; }

A

Fully parallel

There are no dependencies between iterations.
30
Q

What is the dependency type in the following code snippet: for (i = 1; i < n; i++) { a[i] = a[i-1] * 2; }

A

Sequential only

Each iteration depends on the result of the previous one.
31
Q

What is the OpenMP directive for parallelizing matrix multiplication?

A

#pragma omp parallel for collapse(2)

This allows both outer loops (i and j) to be executed in parallel.
32
Q

What is the dependency type in the following code snippet: for (i = 0; i < n; i++) { sum = sum + a[i]; }

A

Parallel (use reduction)

The sum can be computed in parallel using a reduction clause.
33
Q

What is the dependency type in the following code snippet: for (i = 2; i < n; i++) { a[i] = a[i-2] + b[i]; }

A

2-way parallel

The even-index and odd-index chains are independent, so the two chains can run in parallel; each chain remains sequential internally.
34
Q

What is the dependency type in the following code snippet: for (i = 0; i < n; i++) { a[i] = a[i] + 1; }

A

Fully parallel

Each iteration is independent of the others.