Exercise 5.
The Intel Xeon X5650 processor has the following characteristics,
taken from /proc/cpuinfo:
vendor_id : GenuineIntel
cpu family : 6
model : 44
model name : Intel(R) Xeon(R) CPU X5650 @ 2.67GHz
stepping : 2
microcode : 0x1f
cpu MHz : 1600.021
L3 cache size : 12288 KB
physical id : 0
siblings : 12
core id : 0
cpu cores : 6
{...}
clflush size : 64
cache line size : 64
Consider the following two functions, each of which increments every value in an array by 100.
void incrementVector1(INT4* v, int n) {
    for (int k = 0; k < 100; ++k) {
        for (int i = 0; i < n; ++i) {
            v[i] = v[i] + 1;
        }
    }
}

void incrementVector2(INT4* v, int n) {
    for (int i = 0; i < n; ++i) {
        for (int k = 0; k < 100; ++k) {
            v[i] = v[i] + 1;
        }
    }
}
The following data collected using the perf utility captures runtime information for executing
each of these functions on the Intel Xeon X5650 processor for various data sizes. In this data:
- the program vector1.bin executes the function incrementVector1;
- the program vector2.bin executes the function incrementVector2;
- the programs take a command line argument which sets the value of n;
- both programs begin by allocating an array of size n and initializing all elements to 0;
- LLC-loads means “last level cache loads”, the number of accesses to L3;
- LLC-load-misses means “last level cache misses”, the number of L3 cache misses.
Runtime performance of vector1.bin.
Performance counter stats for ’./vector1.bin 1000000’:
230,070 LLC-loads
3,280 LLC-load-misses # 1.43% of all LL-cache references
0.383542737 seconds time elapsed
Performance counter stats for ’./vector1.bin 3000000’:
669,884 LLC-loads
242,876 LLC-load-misses # 36.26% of all LL-cache references
1.156663301 seconds time elapsed
Performance counter stats for ’./vector1.bin 5000000’:
1,234,031 LLC-loads
898,577 LLC-load-misses # 72.82% of all LL-cache references
1.941832434 seconds time elapsed
Performance counter stats for ’./vector1.bin 7000000’:
1,620,026 LLC-loads
1,142,275 LLC-load-misses # 70.51% of all LL-cache references
2.621428714 seconds time elapsed
Performance counter stats for ’./vector1.bin 9000000’:
2,068,028 LLC-loads
1,422,269 LLC-load-misses # 68.77% of all LL-cache references
3.308037628 seconds time elapsed
Runtime performance of vector2.bin.
Performance counter stats for ’./vector2.bin 1000000’:
16,464 LLC-loads
1,168 LLC-load-misses # 7.09% of all LL-cache references
0.319311959 seconds time elapsed
Performance counter stats for ’./vector2.bin 3000000’:
42,052 LLC-loads
17,027 LLC-load-misses # 40.49% of all LL-cache references
0.954854798 seconds time elapsed
Performance counter stats for ’./vector2.bin 5000000’:
63,991 LLC-loads
38,459 LLC-load-misses # 60.10% of all LL-cache references
1.593526338 seconds time elapsed
Performance counter stats for ’./vector2.bin 7000000’:
99,773 LLC-loads
56,481 LLC-load-misses # 56.61% of all LL-cache references
2.198810471 seconds time elapsed
Performance counter stats for ’./vector2.bin 9000000’:
120,456 LLC-loads
76,951 LLC-load-misses # 63.88% of all LL-cache references
2.772653964 seconds time elapsed
Question 1: Consider the cache miss rates for vector1.bin. Between the vector sizes
1000000 and 5000000, the cache miss rate drastically increases. What is the cause of this
increase in cache miss rate?
Question 2: Consider the cache miss rates for both programs. Notice that, for any particular
array size, the two programs have roughly equal miss rates. Why is that?
question from:
https://stackoverflow.com/questions/65952556/what-is-the-answer-about-this-cache-miss-exercise