Caching
Caching is a very general technique for improving computer system
performance.
Based on the principle of locality of reference, it is used in a
computer's primary storage hierarchy, its operating system, networks,
and databases.
Caching improves access time and reduces data traffic to data sources
that have limited throughput.
For most of this web page, only reads are considered.
A section at the end deals with the additional issues that arise when
writes are considered.
Examples
- primary storage caches - several levels of cache, including virtual
  memory, for storage of data and instructions used by executing programs
- translation look-aside buffer - a cache for virtual memory page table
  entries
- track cache - cache used in many hard disks
- database cache - cache used by a database management system
- file system cache - cache used by an operating system for disk data
- dynamic link library cache - cache used by an operating system to
  hold dynamic link library subprograms
- browser caches - cache used in a browser for storage of web pages,
  style sheets, and scripts
- web cache - cache used by some internet service providers for web
  pages, style sheets, and scripts
- DNS cache - cache used in the Domain Name System (DNS) for
  translating host names to Internet Protocol (IP) addresses
In addition, caching can be used in dynamic programming algorithms for
computed data, and in just-in-time compilation for machine code
translations of functions or methods.
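To make the dynamic programming use concrete, here is a minimal memoization
sketch in C.
The Fibonacci function, the memo array, and its size are illustrative
assumptions rather than part of any system described above.

/* A minimal sketch of caching computed data (memoization).
   The function, the memo table, and MAX_N are illustrative assumptions. */
#include <stdio.h>

#define MAX_N 64                    /* assumes 0 <= n < MAX_N */

static long long memo[MAX_N];       /* cache of computed values; 0 = not yet computed */

long long fib(int n) {
    if (n < 2)
        return n;
    if (memo[n] != 0)               /* cache hit: reuse the stored result */
        return memo[n];
    memo[n] = fib(n - 1) + fib(n - 2);   /* cache miss: compute and store */
    return memo[n];
}

int main(void) {
    printf("%lld\n", fib(40));      /* fast, because subproblems are cached */
    return 0;
}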
Primary Storage Hierarchies
Modern computer primary storage hierarchies use multiple levels of
caching to bridge the gulf between processor speeds and the much slower
speeds of main memory and disk.
Older desktop and laptop systems generally had two cache levels.
Servers and more recent desktops and laptops generally have three cache
levels.
Older Primary Storage Hierarchies
Level          | Controller       | Storage     | Technology
---------------|------------------|-------------|-----------
L1 Cache       | on-chip          | on-chip     | SRAM
L2 Cache       | motherboard      | motherboard | SRAM
Main Memory    | motherboard      | motherboard | DRAM
Virtual Memory | operating system | disk        | magnetic
Recent Primary Storage Hierarchies
Level          | Controller       | Storage     | Technology
---------------|------------------|-------------|-----------
L1 Cache       | on-chip          | on-chip     | SRAM
L2 Cache       | on-chip          | on-chip     | SRAM
L3 Cache       | on-chip          | motherboard | SRAM
Main Memory    | motherboard      | motherboard | DRAM
Virtual Memory | operating system | disk        | magnetic
Locality of Reference
The principle of locality describes patterns in data and instruction
accesses that arise from the ways we commonly store and access
information.
It has two aspects: temporal locality and spatial locality.
The Principle of Temporal Locality
Recently accessed data and instructions are likely to be accessed
in the near future.
We expect temporal locality with regard to instructions because a
large part of execution time is spent in repetitive code such as
loops and recursions.
We expect temporal locality with regard to data for a variety of
reasons.
One example is accessing control and summary variables inside loops,
such as i and sum in the following loop.
sum = 0;
for (int i = 0; i < 10; i++) {
    sum = sum + A[i];   // i and sum are accessed on every iteration
}
The Principle of Spatial Locality
Data and instructions close to recently accessed data and
instructions are likely to be accessed in the near future.
The phrase "close to" has different meanings in different contexts.
In the case of primary memory caches, "close to" refers to the
proximity of memory addresses.
For disk caching, "close to" refers to proximity of data location on
a disk.
We expect spatial locality with regard to instructions because the
normal execution of instructions is sequential.
We expect spatial locality with regard to data because in programs,
especially scientific programs where execution time is critical, much
of the execution time is spent in sequential processing of arrays.
For example, consider the access to the array A in the following loop.
sum = 0;
for (int i = 0; i < 10; i++) {
    sum = sum + A[i];   // consecutive elements of A are accessed in order
}
The Combined Principle of Locality
Recently accessed data and instructions and nearby data and
instructions are likely to be accessed in the near future.
Cache Organization
A cache level sits between a data requester and a large, slow
data source (the lower level).
It uses a small but fast local store.
Operation
The following algorithm is used by a cache controller in order to take
advantage of both temporal and spatial locality.
if the requested data is in the local store // cache hit
return the requested data // fast
else // cache miss
get the requested and nearby data from the lower level // slow
save it in the local store
return the requested data
Cache operation is transparent: the semantics of a request through the
cache are the same as for a direct access to the lower level.
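As a concrete illustration of the algorithm above, here is a small C sketch
of a read-only, direct-mapped cache.
The block size, the number of lines, and the simulated lower level are
assumptions made for illustration, not a description of any particular
hardware.

/* A sketch of the controller algorithm for a direct-mapped, read-only cache.
   BLOCK_SIZE, NUM_LINES, and the simulated lower level are illustrative
   assumptions. */
#include <stdbool.h>
#include <string.h>

#define BLOCK_SIZE 16                 /* bytes fetched together (spatial locality) */
#define NUM_LINES  64

struct line {
    bool valid;                       /* does this line hold a block? */
    unsigned tag;                     /* which block it holds */
    unsigned char data[BLOCK_SIZE];   /* local copy of that block */
};

static struct line cache[NUM_LINES];

static unsigned char lower_level[1 << 16];    /* stand-in for the large, slow store */

static void lower_level_read(unsigned block, unsigned char *buf) {
    memcpy(buf, &lower_level[block * BLOCK_SIZE], BLOCK_SIZE);   /* slow */
}

unsigned char cache_read(unsigned addr) {
    unsigned offset = addr % BLOCK_SIZE;
    unsigned block  = addr / BLOCK_SIZE;
    unsigned index  = block % NUM_LINES;
    unsigned tag    = block / NUM_LINES;
    struct line *l  = &cache[index];

    if (l->valid && l->tag == tag)            /* cache hit */
        return l->data[offset];               /* fast */

    lower_level_read(block, l->data);         /* cache miss: fetch the whole block */
    l->valid = true;
    l->tag   = tag;
    return l->data[offset];                   /* same result as a direct access */
}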
Cache Benefits
- A cache reduces the effective access time to data in the lower-level
  store.
  If most of the data accesses are cache hits, then the effective access
  time is close to the access time of the cache's local store (a worked
  example follows this list).
- A cache reduces the traffic to the lower-level store.
  Cache hits are not passed on to the lower-level store.
  This is important for dealing with throughput limitations of the
  lower-level store.
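As a rough worked example of the first benefit, the following calculation
uses assumed, not measured, values: a 95% hit rate, a 1 ns local store, and
a 100 ns lower level give an effective access time of about 6 ns.

/* Effective access time under an assumed hit rate and assumed timings;
   the values below are illustrative, not measurements of any real system. */
#include <stdio.h>

int main(void) {
    double hit_rate = 0.95;        /* fraction of accesses that hit           */
    double t_cache  = 1.0;         /* local store access time in ns (assumed) */
    double t_lower  = 100.0;       /* lower-level access time in ns (assumed) */

    /* hits pay t_cache; misses pay t_cache (to detect the miss) plus t_lower */
    double t_eff = hit_rate * t_cache
                 + (1.0 - hit_rate) * (t_cache + t_lower);

    printf("effective access time = %.1f ns\n", t_eff);   /* 6.0 ns */
    return 0;
}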
Cache Writes
When writes are considered, two additional questions arise in cache
design.
- On a write miss, should the cache allocate a cache line?
  Most modern caches do allocate a cache line.
- On a write hit, should the cache immediately update the lower level
  (write-through) or just mark the line in the local store as "dirty"
  and update the lower level when the line is replaced (write-back)?
  Write-through is simpler to implement, but write-back reduces
  lower-level traffic (a sketch contrasting the two policies follows
  below).
Over the years, the trend has been towards more use of write-back.
This is especially true with modern multi-core processors.
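The following sketch contrasts the two write-hit policies by extending the
earlier direct-mapped read sketch with a dirty flag added to struct line.
WRITE_THROUGH, lower_level_write, and the eviction handling are illustrative
assumptions.

/* Write handling for the earlier direct-mapped sketch, assuming a dirty
   flag has been added to struct line.  WRITE_THROUGH and
   lower_level_write() are illustrative assumptions. */
#define WRITE_THROUGH 0                    /* 1: write-through, 0: write-back */

static void lower_level_write(unsigned block, const unsigned char *buf) {
    memcpy(&lower_level[block * BLOCK_SIZE], buf, BLOCK_SIZE);   /* slow */
}

void cache_write(unsigned addr, unsigned char value) {
    unsigned offset = addr % BLOCK_SIZE;
    unsigned block  = addr / BLOCK_SIZE;
    unsigned index  = block % NUM_LINES;
    unsigned tag    = block / NUM_LINES;
    struct line *l  = &cache[index];

    if (!(l->valid && l->tag == tag)) {    /* write miss: allocate the line */
        if (l->valid && l->dirty)          /* evicted line is dirty: write it back */
            lower_level_write(l->tag * NUM_LINES + index, l->data);
        lower_level_read(block, l->data);
        l->valid = true;
        l->tag   = tag;
        l->dirty = false;
    }

    l->data[offset] = value;               /* update the local store */

    if (WRITE_THROUGH)
        lower_level_write(block, l->data); /* update the lower level immediately */
    else
        l->dirty = true;                   /* defer until the line is replaced */
}

In a full write-back design, the read-miss path would need the same
dirty-line write-back before refilling a line.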
Also, the simplest cache designs make it impossible to satisfy a new
request until a cache write completes.
More modern caches use a small write queue in front of the lower-level
store, allowing the cache to proceed with processing hits without
waiting for writes to complete.
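A minimal sketch of such a write queue, continuing the same illustrative C
code, is given below; the queue depth and the drain policy are assumptions.

/* A small FIFO write queue in front of the lower level; WQ_DEPTH and the
   drain policy are illustrative assumptions. */
#define WQ_DEPTH 8

struct pending_write {
    unsigned block;
    unsigned char data[BLOCK_SIZE];
};

static struct pending_write write_queue[WQ_DEPTH];
static int wq_head, wq_tail, wq_count;

/* The cache calls this instead of lower_level_write(), so it can keep
   serving hits while the write waits in the queue. */
int enqueue_write(unsigned block, const unsigned char *buf) {
    if (wq_count == WQ_DEPTH)
        return 0;                          /* queue full: the cache must stall */
    write_queue[wq_tail].block = block;
    memcpy(write_queue[wq_tail].data, buf, BLOCK_SIZE);
    wq_tail = (wq_tail + 1) % WQ_DEPTH;
    wq_count++;
    return 1;
}

/* Called whenever the lower level is free to accept a write. */
void drain_one_write(void) {
    if (wq_count == 0)
        return;
    lower_level_write(write_queue[wq_head].block, write_queue[wq_head].data);
    wq_head = (wq_head + 1) % WQ_DEPTH;
    wq_count--;
}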