Describe the program transformation loop interchange (for a sequential program).
How does it affect performance on modern processor architectures? Give a general
rule for when its application should be beneficial for performance, and a general
rule for when it is safely applicable.
a
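As a point of reference for the question above, here is a minimal sketch of loop interchange in C. The array sizes `ROWS` and `COLS` and the function names `scale_strided` and `scale_interchanged` are hypothetical, chosen only for illustration. C stores two-dimensional arrays row by row, so the interchanged version accesses memory with stride 1 in its inner loop. The interchange is safe here because every iteration writes a distinct element and reads no element written by another iteration.

```c
#include <stddef.h>

/* Hypothetical sizes, chosen only for illustration. */
enum { ROWS = 4, COLS = 6 };

/* Before: the inner loop runs over i, so consecutive iterations touch
 * elements that are COLS doubles apart in memory (strided access). */
void scale_strided(double dst[ROWS][COLS], double src[ROWS][COLS], double f)
{
    for (size_t j = 0; j < COLS; j++)
        for (size_t i = 0; i < ROWS; i++)
            dst[i][j] = f * src[i][j];
}

/* After interchange: the inner loop runs over j, so consecutive iterations
 * touch adjacent elements of a row (stride-1 access). */
void scale_interchanged(double dst[ROWS][COLS], double src[ROWS][COLS], double f)
{
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            dst[i][j] = f * src[i][j];
}
```

Both functions compute the same result; only the order in which the iterations are executed differs.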
void seqfolr( float *x, float *a, float *b, int N )
{
    int i;
    x[0] = b[0];
    for (i=1; i
a
Why is it, in general, so hard for C/C++ compilers to statically analyze a given
sequential legacy program and parallelize it automatically?
a
Consider the following sequential loop:
#define N 100000
double x[N]; // array of double precision floats
...
for (i=8; i
a
What is auto-tuning, where could it be applied, and what is the main motivation for
it? (2p)
a
Name and briefly describe two different loop transformations that can improve the
cache hit rate of a loop, and explain why.
a
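As an illustration of one transformation commonly named in this context, the sketch below applies loop tiling (blocking) to a matrix transpose in C. The sizes `N_DIM` and `TILE` and the function names are hypothetical, chosen only for illustration. In practice the tile size would be chosen so that a tile of `src` and a tile of `dst` fit in the cache together.

```c
#include <stddef.h>

/* Hypothetical sizes, chosen only for illustration. */
enum { N_DIM = 8, TILE = 4 };

/* Untiled transpose: src is read row by row, but dst is written column by
 * column, so for large matrices each write may touch a different cache line. */
void transpose_naive(double dst[N_DIM][N_DIM], double src[N_DIM][N_DIM])
{
    for (size_t i = 0; i < N_DIM; i++)
        for (size_t j = 0; j < N_DIM; j++)
            dst[j][i] = src[i][j];
}

/* Tiled transpose: the iteration space is split into TILE x TILE blocks.
 * Within one block only TILE rows of src and TILE rows of dst are touched,
 * so the cache lines loaded for the block are reused before being evicted. */
void transpose_tiled(double dst[N_DIM][N_DIM], double src[N_DIM][N_DIM])
{
    for (size_t ii = 0; ii < N_DIM; ii += TILE)
        for (size_t jj = 0; jj < N_DIM; jj += TILE)
            for (size_t i = ii; i < ii + TILE && i < N_DIM; i++)
                for (size_t j = jj; j < jj + TILE && j < N_DIM; j++)
                    dst[j][i] = src[i][j];
}
```

Both functions produce the same transposed matrix; tiling changes only the order in which the elements are visited.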