Caching

Locality

Programs tend to reuse data and instructions near those they have used recently, or that were recently referenced themselves. The former refers to spatial locality, the latter refers to temporal locality.
For example, in the following case, sum is a variable referenced again and again in each iteration, so this is temporal locality; whereas each arr ‘s element is referenced one by one in succession, which is spatial locality.

1
2
3
for (int i=0; i<10; i++) {
sum += arr[i];
}

Good programmer should be able to utilize principles of locality (loop interchange, loop fusion)

Memory Hierarchies

For memory and storage system, (faster, smaller, more expensive) vs. (slower, larger, less expensive). Since we cannot achieve both in same level of memory,
the idea of memory hierarchy comes. In brief, memory hierarchy creates a pool of storage that costs as much as cheap storage near the bottom but serves data to
programs at the rate of fast storage near the top.

Principles of locality underlies the effect of memory hierarchy. Programs tend to access data at k level more often than k+1 level, which owes to spatial locality,
whereas temporal locality plays another role: when one data element is brought to cache and is referenced, its neighboring data elements will also be brought into cache.

Caching

Directly mapped

suppose $$ m $$-bit memory, $$ 2^k $$ blocks, each block have $$ 2^n $$ bytes,

1
2
3
4
5
   tag    index  offset
[ m-k-n ][ k ][ n ]
BLOCK_ADDR = MEM_ADDR >> n
INDEX = (MEM_ADDR >> n) % 2^k
OFFSET = MEM_ADDR % 2^n

Set associative

suppose $$ m $$-bit memory, $$ 2^s $$ sets, each block have $$ 2^n $$ bytes,

1
2
3
4
5
    tag     index  offset
[ m-s-n ][ s ][ n ]
BLOCK_ADDR = MEM_ADDR >> n
SET_INDEX = (MEM_ADDR >> n) % 2^s
OFFSET = MEM_ADDR % 2^n

Therefore, 1-set associative cache is equal to directly mapped cache, and if a cache has $$ 2^k $$ blocks, then $$ 2^k $$ set associative cache would be same as a fully associative cache.

Although directly-mapped caches is easy to impelement, it has disadvantanges. There are several memory addresses which can be mapped into same block, which results in many misses. Another end of spectrum is fully-associative caches. The idea behind is that memory address can be mapped to any block. But in this way, we need to check all bits of memory address, and the tag of every cache block. Set-associative caches combine the advantages of both.

https://courses.cs.washington.edu/courses/cse378/09wi/lectures/lec16.pdf