Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
964 views
in Technique[技术] by (71.8m points)

c - Cache misses when accessing an array in nested loop

So I have this question from my professor, and I can not figure out why vector2 is faster and has less cache misses than vector1.

Assume that the code below is a valid compilable C code.

Vector2:

void incrementVector2(INT4* v, int n) {
     for (int k = 0; k < 100; ++k) {
          for (int i = 0; i < n; ++i) {
               v[i] = v[i] + 1;
          }
     }
}

Vector1:

void incrementVector1(INT4* v, int n) {
     for (int i = 0; i < n; ++i) {
          for (int k = 0; k < 100; ++k) {
               v[i] = v[i] + 1;
          }
     }
}

NOTE: INT4 means the integer is 4 Bytes in size.

In terms of CPU specs: cache size = 12288KB, line size=64B and only considering this single cache interacting with main memory.

Question

Why does vector2 have a faster runtime than vector1? And why vector1 will have more cache misses than vector2?

Me and a few classmates worked on this for a while and couldn't figure it out. We thought it could be due to the fact that vector2 has better spatial locatlity, since the array is been accessed by the inner loop constantly, while in vector1, only one element is accessed at a time? we are not sure if this is correct, and also not sure how to bring cache lines in to this either.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

We thought it could be due to the fact that vector2 has better spatial locatlity, since the array is been accessed by the inner loop constantly, while in vector1, only one element is accessed at a time?

Well, both codes have the same accessing pattern, iterating over the array v with a stride of 1. Cache spacial-locality-wise both codes are the same. However, the second code:

void incrementVector1(INT4* v, int n) {
     for (int i = 0; i < n; ++i) {
          for (int k = 0; k < 100; ++k) {
               v[i] = v[i] + 1;
          }
     }
}

Has a better temporal-locality because you access the same element 100 times, whereas in:

void incrementVector2(INT4* v, int n) {
     for (int k = 0; k < 100; ++k) {
          for (int i = 0; i < n; ++i) {
               v[i] = v[i] + 1;
          }
     }
}

you only access it once on every 'n' iterations.

So either you did a mistake, your teacher is playing some kind of strange game or I am missing something obvious.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...