Wednesday, January 30, 2008

Associativity in Modern CPU Cache


Recently, I have been using OpenMP a lot for multi-processing. However, OpenMP code can easily suffer from 'false sharing' if we are not careful. False sharing is a situation in which two threads write to different memory locations but, unfortunately, the two locations fall in the same cache line. When this happens, the cores keep invalidating each other's copy of that line, and performance degrades significantly.
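
As a minimal sketch of one common way to avoid this in C with OpenMP: give each thread its own counter and pad every counter so that no two of them can sit in the same cache line. The names count_even, padded_counter, CACHE_LINE and MAX_THREADS are mine, and the 64-byte line size is an assumption that should be checked against the actual CPU.

```c
#include <assert.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Fallback so the sketch still builds (single-threaded) without OpenMP. */
static int omp_get_thread_num(void) { return 0; }
#endif

#define CACHE_LINE  64  /* assumed cache line size in bytes */
#define MAX_THREADS 8   /* hypothetical upper bound on threads */

/* One counter per thread. The padding places consecutive counters
 * CACHE_LINE bytes apart, so two threads never write to the same line. */
typedef struct {
    long value;
    char pad[CACHE_LINE - sizeof(long)];
} padded_counter;

/* Count the even numbers in data[0..n) with per-thread padded counters. */
long count_even(const int *data, long n)
{
    padded_counter counts[MAX_THREADS] = {{0}};
    long total = 0;
    long i;
    int t;

    #pragma omp parallel for num_threads(MAX_THREADS)
    for (i = 0; i < n; i++) {
        int id = omp_get_thread_num();
        assert(id >= 0 && id < MAX_THREADS);
        if (data[i] % 2 == 0)
            counts[id].value++;   /* each thread touches only its own line */
    }

    /* Combine the per-thread results after the parallel loop. */
    for (t = 0; t < MAX_THREADS; t++)
        total += counts[t].value;
    return total;
}
```

Calling count_even on the array {1, 2, ..., 10} with n = 10 returns 5. Without the pad member, the eight counters would be packed into one or two lines and every increment could bounce that line between cores. For a simple sum like this, an OpenMP reduction clause does the same job with less code; the padded version is only here to show the idea.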

Cache associativity will also play an important role in this issue, especially if we have 8 cores or more.
So, let's look at cache associativity for some modern CPUs.

AMD Athlon 64 X2 has 2-way associative L1 cache and 16-way for L2 cache (ref).
AMD Phenom has 2-way associative L1 cache, 16-way for L2 and 32-way for L3 cache (ref-page 4).

Intel Core 2 E4000 and E6000 series: 8-way associative L1 cache (ref-page 9) and, according to CPU-Z, 16-way associative L2 cache.

Intel Core 2 E8000 series: 8-way associative L1 cache and 24-way associative L2 cache (from CPU-Z).
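
To make the way counts above more concrete, here is a rough sketch of how an address maps to a set in a set-associative cache: the cache is divided into size / (line size × ways) sets, and each set can hold "ways" lines at once. The struct and function names are hypothetical, and the cache sizes and the 64-byte line used in the usage note are illustrative assumptions, not figures taken from the references above. It also ignores the difference between virtual and physical addresses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of a cache; all fields are caller-supplied. */
typedef struct {
    uint64_t size_bytes;  /* total capacity */
    uint32_t line_bytes;  /* cache line size */
    uint32_t ways;        /* associativity */
} cache_geometry;

/* Number of sets = capacity / (line size * ways). */
uint64_t cache_num_sets(cache_geometry g)
{
    assert(g.line_bytes > 0 && g.ways > 0);
    return g.size_bytes / ((uint64_t)g.line_bytes * g.ways);
}

/* Set that an address falls into: line number modulo number of sets. */
uint64_t cache_set_index(cache_geometry g, uint64_t addr)
{
    return (addr / g.line_bytes) % cache_num_sets(g);
}
```

For an assumed 32 KB, 8-way cache with 64-byte lines this gives 64 sets, so addresses 4096 bytes apart land in the same set, and up to 8 such lines can coexist. For an assumed 64 KB, 2-way cache with the same line size there are 512 sets, addresses 32768 bytes apart collide, and only 2 of them fit before one gets evicted.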

So, I think CPUs from both manufacturers should scale well, but from what I have read on Tom's Hardware, Phenom scales very well, and better than Core 2 Quad. I cannot confirm this, however, until both platforms are more mature and more serious evaluations are available.
