

| Computer Architecture                                                                                                                                                                                                                                   |                                                                                                                                                                                                              |  |  |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|--|
| Speed<br>Cost                                                                                                                                                                                                                                           | 6.1 The memory hierarchy (cont'd):                                                                                                                                                                           |  |  |
|                                                                                                                                                                                                                                                         | There are memories of different speeds, costs and sizes.                                                                                                                                                     |  |  |
| / Cache<br>Memory                                                                                                                                                                                                                                       | Faster memory is more expensive.                                                                                                                                                                             |  |  |
| Main<br>Memory<br>DISK<br>→ Size                                                                                                                                                                                                                        | To <u>increase the average speed</u> and (at the same time) to<br><u>decrease the cost</u> of the memory system, different types<br>of memories are organized in the form of multilevel<br>memory hierarchy. |  |  |
|                                                                                                                                                                                                                                                         | A faster memory (SRAM), namely the cache memory, is inserted between the main memory and the CPU.                                                                                                            |  |  |
| The CPU can directly                                                                                                                                                                                                                                    | access the main memory and cache memory (internal).                                                                                                                                                          |  |  |
| However, data on the                                                                                                                                                                                                                                    | disk must first be fetched into main (or cache) memory).                                                                                                                                                     |  |  |
| <b>The goal:</b> To provide a memory system with a <u>high speed</u> that is close to the fastest level of memory and <u>cost per byte low</u> as the cheapest level.<br>This solution takes the advantage of the <b>locality</b> of computer programs. |                                                                                                                                                                                                              |  |  |
| http://akademi.itu.edu.tr/en/buzlu<br>http://www.buzluca.info                                                                                                                                                                                           | ca 0000 2013 - 2020 Feza BUZLUCA 6.2                                                                                                                                                                         |  |  |

| Computer Architecture                                                                                                                                                                              |  |  |  |
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|--|--|
| 6.2 The principle of locality (Locality of reference):                                                                                                                                             |  |  |  |
| Programs tend to reuse instructions and data they have used recently.                                                                                                                              |  |  |  |
| Observation: A typical program spends 90% of its execution time in only 10% of the code.                                                                                                           |  |  |  |
| We can predict what instructions and data a program will use in the near future based on its access in the recent past.                                                                            |  |  |  |
| There are two types of locality:                                                                                                                                                                   |  |  |  |
| <b>Temporal Locality</b> (Locality in time): Recently accessed addresses are likely to be accessed in the near future.                                                                             |  |  |  |
| <b>Spatial Locality</b> (Locality in space) : After an access to a memory address the next access will be likely to a nearby address.                                                              |  |  |  |
| Reasons for locality:                                                                                                                                                                              |  |  |  |
| Structure of the program: The execution of instructions follows a sequential order. Program segments, such as routines and macros, tend to be stored in the same neighborhood of the memory space. |  |  |  |
| In addition, related data is stored in nearby locations.                                                                                                                                           |  |  |  |
| Loops                                                                                                                                                                                              |  |  |  |
| Arrays                                                                                                                                                                                             |  |  |  |
| http://akademi.itu.edu.tr/en/buzluca                                                                                                                                                               |  |  |  |

http://www.buzluca.info



| Computer Architecture                                                                                                                                                                                                                |  |  |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|--|
| 6.4 Memory Technologies                                                                                                                                                                                                              |  |  |
| RAM (Random-Access Memory):                                                                                                                                                                                                          |  |  |
| Actually, referring to only this type of memory as "random access" is a misuse of<br>the term because all of the semiconductor (electronic) memories used in<br>computer systems are random access (not sequential).                 |  |  |
| Distinguishing characteristic of RAM:                                                                                                                                                                                                |  |  |
| • CPU can access the memory to read data and to write new data directly (easily and rapidly without an additional device).                                                                                                           |  |  |
| These operations are accomplished through the use of electrical signals.                                                                                                                                                             |  |  |
| <ul> <li>It would be more precise to call this type of memory as "Read-Write Memory".</li> <li>RAM is volatile. If the power is interrupted, then the data are lost.<br/>Thus, RAM can be used only as temporary storage.</li> </ul> |  |  |
| Memory latency measures:                                                                                                                                                                                                             |  |  |
| <ul> <li>Access time: Time between write/read request and when desired word has<br/>been stored or made available for use</li> </ul>                                                                                                 |  |  |
| <ul> <li>Cycle time: Minimum time between two unrelated requests to memory</li> </ul>                                                                                                                                                |  |  |
| There are two types of RAM; DRAM (Dynamic RAM) and SRAM (Static RAM).                                                                                                                                                                |  |  |
| http://akademi.itu.edu.tr/en/buzluca<br>http://www.buzluca.info 2013 - 2020 Feza BUZLUCA 6.5                                                                                                                                         |  |  |

| mputer Architecture                                                                                                                                                                         |    |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----|
| DRAM (Dynamic RAM):                                                                                                                                                                         | I  |
| It is used to implement the <b>main memory</b> .                                                                                                                                            |    |
| A dynamic RAM (DRAM) is made with cells that <u>Address line</u><br>store data as charge on capacitors.                                                                                     |    |
| Because capacitors have a natural tendency to<br>discharge, dynamic RAMs require periodic charge Transistor Storage<br><b>refreshing</b> to maintain data storage (~ every 8ms). T capacito |    |
| The stored charge leaks away, even with power $\bot$ $\_$ continuously applied (The term <i>dynamic</i> refers to Bit line Ground this tendency ).                                          |    |
| During the refresh process, the memory is unavailable (latency). (-) $(-)$                                                                                                                  | .) |
| Must be re-written after being read. Reading destroys the information (latency)                                                                                                             | ). |
| Difference between access time and cycle time. Cycle time > access time (-)                                                                                                                 |    |
| Cheap and dense: one transistor/bit. More bits can be placed on one chip. (+)                                                                                                               |    |
| ne improvements:                                                                                                                                                                            |    |
| ,<br>SDRAM (Synchronous DRAM): Added clock to DRAM interface. Burst mode.                                                                                                                   |    |
| DDRAM (Double data rate DRAM): Data is transferred on both the rising edge and falling edge.                                                                                                |    |
| tp://akademi.itu.edu.tr/en/buzluca<br>tp://www.buzluca.info 2013 - 2020 Feza BUZLUCA 6.6                                                                                                    |    |



|                                                                                                                                                                                                                    | Associative Memory / Content Addressable Memory (CAM):                                 |  |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------|--|
| •                                                                                                                                                                                                                  | Random access memory (SRAM) + circuitry for search                                     |  |
| •                                                                                                                                                                                                                  | It consists of SRAM to store data and digital circuits for parallel search.            |  |
| <ul> <li>It is used in high speed searching applications.</li> </ul>                                                                                                                                               |                                                                                        |  |
|                                                                                                                                                                                                                    | Cache memory is one of the primary uses for associative memory.                        |  |
| <ul> <li>A Content Addressable Memory (CAM) is an SRAM-based memory, which can<br/>be accessed in parallel to search for a given search word, providing as result<br/>the address of the matching data.</li> </ul> |                                                                                        |  |
| •                                                                                                                                                                                                                  | A word is retrieved based on a portion of its contents rather than its address.        |  |
| <ul> <li>The user supplies a data word (Argument A) (not the address), and the CAM<br/>searches its entire memory simultaneously (not sequentially) to see if that<br/>word is stored anywhere in it.</li> </ul>   |                                                                                        |  |
| •                                                                                                                                                                                                                  | The user also supplies a key value (K) to determine the portion of data to search for. |  |
| <ul> <li>If the search data (or the required portion) is found in a row of the memory,<br/>then the match bit for that row is set, and the data stored in this row is<br/>output.</li> </ul>                       |                                                                                        |  |







| Computer Architecture 6.5.1 Cache Memory Principles (cont'd)                                                                                            |  |  |  |
|---------------------------------------------------------------------------------------------------------------------------------------------------------|--|--|--|
| Block transfer:                                                                                                                                         |  |  |  |
| Upon a cache miss (the requested element is not found in cache), <b>a <i>block</i> of</b>                                                               |  |  |  |
| elements that contains the requested element is brought (copied) from main                                                                              |  |  |  |
| memory to cache memory.                                                                                                                                 |  |  |  |
| Reason for block (instead of a single word) transfer: spatial locality.                                                                                 |  |  |  |
| The next element to be requested will most likely be located near the currently                                                                         |  |  |  |
| requested element.                                                                                                                                      |  |  |  |
| The disadvantage of transferring a block instead of a single element is that it                                                                         |  |  |  |
| takes longer.                                                                                                                                           |  |  |  |
| To reduce the block transfer time between main and cache memories, the memory                                                                           |  |  |  |
| interleaving technique is used.                                                                                                                         |  |  |  |
| Data are stored in different memory modules; that is, consecutive memory addresses are located in successive memory modules, so they can be accessed at |  |  |  |
| the same time (similar to RAID, section 7.2).                                                                                                           |  |  |  |
| <u>M7 M6 M5 M4 M3 M2 M1 M0</u>                                                                                                                          |  |  |  |
| Interleaved                                                                                                                                             |  |  |  |
| consecutive B7 B6 B5 B4 B3 B2 B1 B0 main memory                                                                                                         |  |  |  |
|                                                                                                                                                         |  |  |  |
| One block of                                                                                                                                            |  |  |  |
| cache memory                                                                                                                                            |  |  |  |
| http://akademi.itu.edu.tr/en/buzluca                                                                                                                    |  |  |  |

| Computer Architecture                                                                                                          | License: https://creativecommons.org/licenses/by-nc-nd/4.0/                                   |  |
|--------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------|--|
| 6.5.1 Cache Memo                                                                                                               | <b>pry Principles</b> (cont'd)                                                                |  |
| Replacement Techniques:                                                                                                        |                                                                                               |  |
|                                                                                                                                | ory is lower than that of main memory.<br>of the data in main memory can be kept in cache     |  |
|                                                                                                                                | a replacement algorithm is needed to determine<br>replaced by the new block from main memory. |  |
| There are different replacement techniques; the most common ones are:                                                          |                                                                                               |  |
| <ul> <li>FIFO (First In, First Our replaced.</li> </ul>                                                                        | t): The block that has been in the cache the longest is                                       |  |
| <ul> <li>LRU (Least Recently Used): The block that has been used the least while residing in the cache is replaced.</li> </ul> |                                                                                               |  |
| The access history of each block is taken into consideration.                                                                  |                                                                                               |  |
| An <i>aging counter</i> is assigi<br>block in cache.                                                                           | ned to each block to keep track of references to that                                         |  |
| A <b>hardware unit</b> , called the Cache Memory Controller or Cache Memory<br>Management Unit, performs cache operations.     |                                                                                               |  |
| http://akademi.itu.edu.tr/en/buzluca<br>http://www.buzluca.info                                                                | 2013 - 2020 Feza BUZLUCA 6.13                                                                 |  |

| Computer Architecture                                                                                                                                                                                                                                                                                                        |                                                                                                                                                     |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------|
| <ul> <li>6.5.2 Cache Memory A</li> <li>Is a content of main memory co</li> <li>If present, where in cache mer</li> </ul>                                                                                                                                                                                                     | urrently present in cache memory?                                                                                                                   |
| <ol> <li>Full Associative Mapping         <ul> <li>a) Without blocks:</li> <li>In practice, all mapping techniques</li> </ul> </li> </ol>                                                                                                                                                                                    |                                                                                                                                                     |
| <ul> <li>Method: The most frequently reference associative memory.</li> <li>The address generated by the CPU is searched for in the cache (content addressable memory).</li> <li>If there is a hit, data is read from cache.</li> <li>If a miss occurs, data is read from main memory and also copied into cache.</li> </ul> | erenced addresses and their data are kept in an<br>Valid<br>Bits<br>(V)<br>1<br>A000 02<br>Hit<br>A001 3A<br>Hit<br>00C0 54<br>1<br>0400 A1<br>Data |
| Without blocks, the technique ben locality.                                                                                                                                                                                                                                                                                  | efits from only temporal locality but not spatial<br>s are moved between main and cache memories.                                                   |



| Computer Architecture                                                                                                                                                                                                                                                                |                  |                                                  |  |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------|--------------------------------------------------|--|
| Example: Full Associative Mapping                                                                                                                                                                                                                                                    |                  |                                                  |  |
| Main Memory: 256K x words<br>Block size: 16 wordsAddress: a = 18 bits<br>w = 4 bits Main memory contains 214 blocks. b = 14<br>Cache Memory: 2K x words<br>data can be stored.Cache Memory: 2K x words<br>data can be stored.Cache memory contains 27 = 128 frames. f = 7<br>18 bits |                  |                                                  |  |
| Main memory address:                                                                                                                                                                                                                                                                 | Block Number     | Word number                                      |  |
| Associat<br>search<br>V Tag (14 E<br>128<br>tags                                                                                                                                                                                                                                     |                  | 4 bits<br>Block 0<br>Block 1<br>:<br>Block 16383 |  |
| (associative)                                                                                                                                                                                                                                                                        | (SRAM)<br>Memory | Main Memory (DRAM)                               |  |
| http://akademi.itu.edu.tr/en/buzluc<br>http://www.buzluca.info                                                                                                                                                                                                                       | a 💽 🛈            | 80 2013 - 2020 Feza BUZLUCA 6.16                 |  |



| Computer Architecture                                                                                                                                                                                                                                                                                                                                                               |                                                                                |  |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------|--|
| <b>Example:</b> Array in a cache memory system<br>Main Memory: 256K x words Cache Memory: 2K x words Block size: 16 words<br><b>Case A)</b> A program in this system accesses an array with the starting address<br>\$00002 and a size of 10 words. The array is not in cache memory.<br>Assume: When the CPU starts accessing the array, the least recently used frame is<br>F #1. |                                                                                |  |
| Starting address of the array: 00 0000 0000 0000 000<br>End address of the array: 00 0000 0000 0000 10                                                                                                                                                                                                                                                                              |                                                                                |  |
| Frame 0<br>Tag:<br>\$00002<br>\$0000B<br>\$0000B<br>\$0000B                                                                                                                                                                                                                                                                                                                         | When the CPU accesses the first word of the array (\$00002), a miss occurs.    |  |
| Frame 1 10 words 4 Block 1                                                                                                                                                                                                                                                                                                                                                          | worus, the cuche munuyement                                                    |  |
| transferred.                                                                                                                                                                                                                                                                                                                                                                        | system transfers Block 0 in<br>its entirety (16 words) to<br>Frame 1.          |  |
| Cache Memory Main Memory                                                                                                                                                                                                                                                                                                                                                            | When the CPU accesses the<br>next 9 elements of the array,<br>hits will occur. |  |
|                                                                                                                                                                                                                                                                                                                                                                                     | In total: 1 miss, 9 hits.                                                      |  |
| http://akademi.itu.edu.tr/en/buzluca                                                                                                                                                                                                                                                                                                                                                | 2013 - 2020 Feza BUZLUCA 6.18                                                  |  |

| Computer Architecture                                                                                                                                | License: https://creativecommons.org/licenses/by-nc-nd/4.0/                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |  |  |  |
|------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|--|--|
| <b>Example:</b> Array in a cache memory system (cont'd)                                                                                              |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |  |  |  |
| <b>Case B)</b> A program in this system<br>\$0000A and a size of 10 words. Th<br>Assume that, the least recently u<br>Starting address of the array: | used frames are Frames #0 and #2.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |  |  |  |
| 000000000000000000000000000000000000                                                                                                                 | Block 0<br>Block 1<br>Block |  |  |  |
| Cache Memory                                                                                                                                         | Main Memory                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |  |  |  |
| Although the size of the array is smaller than the block (frame) size, it occupies two frames in cache memory.                                       |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |  |  |  |
| http://akademi.itu.edu.tr/en/buzluca<br>http://www.buzluca.info                                                                                      | COSO 2013 - 2020 Feza BUZLUCA 6.19                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |  |  |  |

| Computer Architecture                                                                                                                                                         |  |  |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|--|
| Example: Array in a cache memory system (cont'd)                                                                                                                              |  |  |
| • When the CPU accesses the first word of the array (\$0000A), a miss occurs.                                                                                                 |  |  |
| <ul> <li>The cache management system transfers Block 0 in its entirety (16 words) to<br/>Frame 0.</li> </ul>                                                                  |  |  |
| <ul> <li>Next 5 accesses to the array generate hits.</li> </ul>                                                                                                               |  |  |
| • When the CPU accesses the 7th word of the array (\$00010), a miss occurs because this word is in another block and its address has a new tag value.                         |  |  |
| • The cache management system transfers Block 1 with its entirety (16 words) to Frame 2 (because of LRU).                                                                     |  |  |
| <ul> <li>Next 3 accesses to the array generate hits.</li> </ul>                                                                                                               |  |  |
| • In total: 2 misses, 8 hits                                                                                                                                                  |  |  |
| If the starting address of the array was selected properly (for example \$00000),<br>the array would occupy only one block and one frame in the cache memory as in<br>case A. |  |  |
| Therefore, placement policies for arrays in main memory affect the performance of cache systems.                                                                              |  |  |
|                                                                                                                                                                               |  |  |
| http://akademi.itu.edu.tr/en/buzluca<br>http://www.buzluca.info 2013 - 2020 Feza BUZLUCA 6.20                                                                                 |  |  |

| Computer Architecture                                                                                         |                                                                             |                                                                                         |          |  |
|---------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------|-----------------------------------------------------------------------------------------|----------|--|
| 2. Direct Mapping                                                                                             |                                                                             |                                                                                         |          |  |
| An incoming main memory block is always placed into a specific, fixed cache frame location.                   |                                                                             |                                                                                         |          |  |
| It is not necessary to search for the location of a block in the cache because it is predetermined and fixed. |                                                                             |                                                                                         |          |  |
| Therefore, associative mem                                                                                    | ory is not necessary.                                                       |                                                                                         |          |  |
| As the size of main memory<br>main memory map to the san                                                      |                                                                             | f the cache, several b                                                                  | locks of |  |
| It is necessary to determine which main memory block is currently residing in a frame.                        |                                                                             |                                                                                         |          |  |
| The cache memory control unit divides the address from the CPU into three fields:<br>a bits                   |                                                                             |                                                                                         |          |  |
| ,∘ Tag                                                                                                        | Cache Frame number                                                          | Word number                                                                             |          |  |
| a-(f+w) bi                                                                                                    | its fbits •                                                                 | w bits                                                                                  |          |  |
| It indicates which of the<br>blocks that can be placed<br>in this frame is currently<br>in cache.             | will hold this data.<br>The blocks that have th<br>reside in the same frame | ne number of the cache f<br>is filed in common (same<br>e.<br>side in the cache any giv | ) try to |  |
| http://akademi.itu.edu.tr/en/buzluca<br>http://www.buzluca.info                                               |                                                                             | 2013 - 2020 Feza BUZLU                                                                  |          |  |

| Computer Arc                                                                                                                                | hitecture                           |                                                     |                                                          |  |
|---------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------|-----------------------------------------------------|----------------------------------------------------------|--|
|                                                                                                                                             | a bit                               |                                                     |                                                          |  |
| Tag                                                                                                                                         | Cache Frame n                       | um. Word num.                                       | Main memory                                              |  |
| a-(f+w) bi                                                                                                                                  | t fbit                              | w bit                                               | 2 <sup>b</sup> blocks                                    |  |
|                                                                                                                                             | V Tag                               | g: a-(f+w) bit<br>Frame 0                           | Block 0 One block 2 <sup>w</sup> words                   |  |
| The frame                                                                                                                                   | e is 🔔 🗆 🗖                          | <b>*</b> ``                                         | Block 1                                                  |  |
| determine                                                                                                                                   | d directly.                         | Frame 1                                             | : 28                                                     |  |
| i = B mod                                                                                                                                   | F                                   | :                                                   | $\frac{\text{Block } 2^{f}}{\text{Block } 2^{f}}$        |  |
| i : Frame nu                                                                                                                                |                                     | Frame 2 <sup>f</sup> -1                             |                                                          |  |
| B: Block nui<br>F: Number o                                                                                                                 |                                     | F=2 <sup>f</sup> frames                             | Block 2 <sup>b</sup> -1  <i>)</i>                        |  |
| F. Number $GF=2^{f}$                                                                                                                        | of frames                           |                                                     |                                                          |  |
| A main memory block can only reside in the cache frame with the number that is determined by the "Cache Frame number" filed of the address. |                                     |                                                     |                                                          |  |
|                                                                                                                                             |                                     | empty) frames in the c<br>Id cannot be in the ca    | cache, two blocks with the same<br>che at the same time. |  |
|                                                                                                                                             |                                     | sion algorithm is not n<br>coming block is fixed, t | ecessary.<br>his frame will be replaced.                 |  |
| <b>Associativ</b><br>cache.                                                                                                                 | ve memory is no                     | t necessary because t                               | here is no need to search in the                         |  |
| http://akademi<br>http://www.buz                                                                                                            | .itu.edu.tr/en/buzluca<br>luca.info |                                                     | 2013 - 2020 Feza BUZLUCA 6.22                            |  |

| Example (Direct Mapping):                                                                    |                                                                 |                                                                                                  |                                                                                                            |
|----------------------------------------------------------------------------------------------|-----------------------------------------------------------------|--------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------|
| Main Memory: 256K x word<br>Block size: 16 words<br>Cache Memory: 2K x word<br>data capacity | w = 4 bits                                                      |                                                                                                  | ontains 2 <sup>14</sup> blocks. b = 1<br>=128 frames. f = 7                                                |
|                                                                                              |                                                                 | 18 bits                                                                                          |                                                                                                            |
| Main memory address:                                                                         | Tag                                                             | Frame number                                                                                     | Word number                                                                                                |
| - L                                                                                          | 7 bits                                                          | 7 bits                                                                                           | 4 bits                                                                                                     |
|                                                                                              | ie ronowing                                                     | two addresses t                                                                                  | ry to reside in the same                                                                                   |
| cache frame.<br>Tag Frame num. Word num.<br>0000000 0000000 XXXX                             | The "Fran<br>the same:<br>At a spec                             | ne number" fields<br>0000000. They w                                                             | ry to reside in the same<br>s of both addresses are<br>vill be placed in Frame C<br>only one of these data |
| cache frame.<br>Tag Frame num. Word num.                                                     | The "Fran<br>the same:<br>At a spec<br>can be in<br>urrently in | ne number" fields<br>0000000. They w<br>ific point in time,<br>cache memory.<br>Frame 0 of cache | s of both addresses are<br>vill be placed in Frame (<br>only one of these data<br>e memory, the tag value  |



٦



| Computer Architecture                                                             |                                                                                            |                                                                                                  |  |  |
|-----------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------|--|--|
| 3. Set Associative Mapping                                                        |                                                                                            |                                                                                                  |  |  |
| This method is a compromise between direct mapping and full associative mapping.  |                                                                                            |                                                                                                  |  |  |
| The cache is divided into a nu<br>frames.                                         | mber of <u>set</u> s, and e                                                                | each set consists of a number of                                                                 |  |  |
| the equation Snum = B mod S                                                       | , where S is the nun<br>nd Snum is the spec<br>aps to any frame in<br>letermine the set (f | fixed).                                                                                          |  |  |
| The cache controller divides t                                                    | the address issued                                                                         | by the CPU into three fields:                                                                    |  |  |
| a bits                                                                            |                                                                                            |                                                                                                  |  |  |
| Tag Cache Set number Word number                                                  |                                                                                            |                                                                                                  |  |  |
| , a-(s+w) bits                                                                    | s s bits°、                                                                                 | w bits                                                                                           |  |  |
| Tag is used to search the<br>block in within the determined<br>set (associative). |                                                                                            | This field is used to identify<br>the specific cache set that<br>should hold the targeted block. |  |  |
| http://akademi.itu.edu.tr/en/buzluca<br>http://www.buzluca.info                   |                                                                                            | 2013 - 2020 Feza BUZLUCA 6.26                                                                    |  |  |

Г



| Computer Architecture                                                                                                                                                                                                                                                                                                                                                                                                                                            |            |            |         |                        |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------|------------|---------|------------------------|
| Computer Architecture         Example (Set Associative):         Main Memory: 256K × words       Address: a = 18 bits         Block size: 16 words       w = 4 bits         Main memory contains 2 <sup>14</sup> blocks.       b = 14         Cache Memory: 2K × words       Cache memory contains 2 <sup>7</sup> = 128 frames.         data capacity.       Given: Each set contains 2 frames.         Hence, the cache memory contains 64 sets.         18 bit |            |            |         |                        |
| Main memory address:                                                                                                                                                                                                                                                                                                                                                                                                                                             | Tag        | Set number | Word nu | mber                   |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                  | 8 bits     | 6 bits     | 4 bits  |                        |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                  | T 0  .:+   |            |         | Main memory            |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                  | Tag: 8 bit | Frame 0    |         | Block 0                |
| Set 0                                                                                                                                                                                                                                                                                                                                                                                                                                                            |            | ***        | <br>    | Block 1                |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |            | Frame 1    |         | :                      |
| Set 1 _                                                                                                                                                                                                                                                                                                                                                                                                                                                          | ][]<br>]   | Frame 2    |         | Block 64               |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |            | Frame 3    |         | :                      |
| ······                                                                                                                                                                                                                                                                                                                                                                                                                                                           | :          | : .        | ``      | Block 128              |
| Set 63                                                                                                                                                                                                                                                                                                                                                                                                                                                           |            | Frame 126  |         | :                      |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |            | Frame 127  |         | Block 16383            |
| http://akademi.itu.edu.tr/en/buzluc<br>http://www.buzluca.info                                                                                                                                                                                                                                                                                                                                                                                                   | a          |            | 2013 -  | 2020 Feza BUZLUCA 6.28 |

| Computer Architecture                                                                                                                                                                                                                           |     |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|
| Summary of mapping techniques:                                                                                                                                                                                                                  |     |
| <ul> <li>Associative mapping is the most flexible and efficient technique because ar<br/>incoming main memory block can reside in any frame of the cache.</li> </ul>                                                                            | ۱   |
| The disadvantage of this technique is the high <i>hardware cost</i> due to the associative memory.                                                                                                                                              |     |
| • In <b>direct mapping</b> , the cache frame for each memory block is fixed.                                                                                                                                                                    |     |
| The main disadvantage is the <i>inefficient</i> use of the cache; a number of main memory blocks may compete for a cache frame even if there exist other em frames.                                                                             |     |
| This disadvantage decreases the hit ratio.                                                                                                                                                                                                      |     |
| The main advantage of this technique is its simplicity; no search is needed.                                                                                                                                                                    |     |
| No replacement mechanism is needed because block locations are fixed.                                                                                                                                                                           |     |
| <ul> <li>The cache utilization efficiency of the set associative mapping technique is<br/>expected to be moderate (i.e., not as high as the fully associative mapping<br/>technique and not as low as the direct mapping technique).</li> </ul> | ;   |
| The technique inherits the simplicity of the direct mapping technique in terr of determining the target set.                                                                                                                                    | ns  |
| By changing the number of frames in a set, we can bring it closer to one of t<br>other two techniques.                                                                                                                                          | he  |
| http://akademi.itu.edu.tr/en/buzluca<br>http://www.buzluca.info 2013 - 2020 Feza BUZLUCA 6                                                                                                                                                      | .29 |

| Computer Architecture                                                                                         |  |  |  |  |
|---------------------------------------------------------------------------------------------------------------|--|--|--|--|
| 6.5.3 Cache Memory - Main Memory Interactions                                                                 |  |  |  |  |
| $\rightarrow$ <b>Read (Hit)</b> : Data is read from cache memory.                                             |  |  |  |  |
| → Read (Miss):                                                                                                |  |  |  |  |
| a) Read-Through (RT):                                                                                         |  |  |  |  |
| While the data (block) is being brought from main memory to cache, it is also read by the CPU simultaneously. |  |  |  |  |
| Cache memory and main memory are accessed in parallel.                                                        |  |  |  |  |
| b) No Read-Through (NRT):                                                                                     |  |  |  |  |
| Data are first brought from main memory to cache memory, and then the CPU reads data from the cache.          |  |  |  |  |
| $\rightarrow$ Write (Hit):                                                                                    |  |  |  |  |
| a) Write-Through (WT):                                                                                        |  |  |  |  |
| In each write operation, data is written to cache and also to main memory.                                    |  |  |  |  |
| Disadvantage: It increases the access time.                                                                   |  |  |  |  |
| Advantage: It provides coherence between the cache frames and their counterparts in main memory.              |  |  |  |  |
|                                                                                                               |  |  |  |  |
| http://akademi.itu.edu.tr/en/buzluca<br>http://www.buzluca.info 2013 - 2020 Feza BUZLUCA 6.30                 |  |  |  |  |

| Computer Architecture                                           | License: https://creativecommons.org/licenses/by-nc-nd/4.0/                                                |  |  |  |
|-----------------------------------------------------------------|------------------------------------------------------------------------------------------------------------|--|--|--|
| $\rightarrow$ Write (Hit) (cont'd):                             |                                                                                                            |  |  |  |
| b) Write-Back (WB):                                             |                                                                                                            |  |  |  |
| Writes are done only to the cad                                 | che.                                                                                                       |  |  |  |
| A block is written back to main                                 | memory only when a replacement is needed.                                                                  |  |  |  |
| There are two types of write-b<br>write-back.                   | There are two types of write-back policies: Simple write-back and flagged write-back.                      |  |  |  |
| <ul> <li>Simple Write-Back (SWB):</li> </ul>                    |                                                                                                            |  |  |  |
| The replaced frame is always written back to main memory.       |                                                                                                            |  |  |  |
| It is not checked whether the frame was changed or not.         |                                                                                                            |  |  |  |
| <ul> <li>Flagged Write-Back (FWB):</li> </ul>                   |                                                                                                            |  |  |  |
|                                                                 | a bit, called the <i>dirty bit</i> , to indicate that at<br>een made to the block while residing in cache. |  |  |  |
|                                                                 | bit is checked: if it is set, then the block is otherwise, it is simply overwritten by the                 |  |  |  |
| The dirty bit is stored in the to                               | ag memory of the cache.                                                                                    |  |  |  |
|                                                                 |                                                                                                            |  |  |  |
| http://akademi.itu.edu.tr/en/buzluca                            |                                                                                                            |  |  |  |
| http://akademi.itu.edu.tr/en/buziuca<br>http://www.buziuca.info | 2013 - 2020 Feza BUZLUCA 6.31                                                                              |  |  |  |

| Computer Architecture                                                                                                                           |  |  |
|-------------------------------------------------------------------------------------------------------------------------------------------------|--|--|
| $\rightarrow$ Write (Miss):                                                                                                                     |  |  |
| a) Write Allocate (WA):                                                                                                                         |  |  |
| The main memory block is updated and brought to cache.                                                                                          |  |  |
| a) No Write Allocate (NWA):                                                                                                                     |  |  |
| The missed main memory block is updated in main memory and not brought to cache.                                                                |  |  |
| If an attempt is made to read this block later, a miss will occur, and data will be brought to cache.                                           |  |  |
| The write-through (WT) policy can be used together with write-allocate (WA) or no-write-allocate (NWA) methods. WTWA, WTNWA                     |  |  |
| In write-back (WB) policy, to maintain coherence between cache and main memory at the beginning, the write-allocate (WA) method is used (WBWA). |  |  |
| Information held in Tag memory:                                                                                                                 |  |  |
| In addition to Valid (V) and tag bits, depending on the method used, the following data must be also kept in tag memory:                        |  |  |
| <ul> <li>If LRU is used aging counters,</li> </ul>                                                                                              |  |  |
| <ul> <li>If flagged Write-Back (FWB ) method is used "dirty" bit (D).</li> </ul>                                                                |  |  |
| A single line of a tag memory: V D Counter Tag                                                                                                  |  |  |
| http://akademi.itu.edu.tr/en/buzluca<br>http://www.buzluca.info 2013 - 2020 Feza BUZLUCA 6.32                                                   |  |  |



| Computer Architecture                                                    |                    |                      |                                         |                                                              |
|--------------------------------------------------------------------------|--------------------|----------------------|-----------------------------------------|--------------------------------------------------------------|
| 6.5.5 Access Time:                                                       |                    |                      |                                         |                                                              |
| ta: Average Memory Access Time                                           |                    |                      |                                         |                                                              |
| W: Write ratio (number of write accesses / total number of all accesses) |                    |                      |                                         |                                                              |
| h: Hitro                                                                 | atio               |                      |                                         |                                                              |
| t <sub>cache</sub> : Cache                                               | e memory o         | iccess time          |                                         |                                                              |
| t <sub>main</sub> : Main                                                 | memory ac          | cess time            |                                         |                                                              |
|                                                                          |                    |                      | en main memory and                      | d cache                                                      |
| Wd: The p                                                                | probability        | that a block in a    | cache is updated                        |                                                              |
| WT, RT/LT WB, WA, NRT/NLT                                                |                    |                      |                                         |                                                              |
| (Write-through , Parallel read/write) (Write-back, Serial read/write)    |                    |                      |                                         |                                                              |
| Probability NWA WA SWB FWB                                               |                    |                      |                                         |                                                              |
| Read Hit Access Time Access Time                                         |                    | Fime                 |                                         |                                                              |
| (1-w)h                                                                   | t <sub>cache</sub> | t <sub>cache</sub>   | t <sub>cache</sub>                      | t <sub>cache</sub>                                           |
| Read Miss                                                                |                    |                      | <b>.</b>                                | $w_d (2t_{trans} + t_{cache}) +$                             |
| (1-w)(1-h)                                                               | t <sub>trans</sub> | t <sub>trans</sub>   | 2t <sub>trans</sub> +t <sub>cache</sub> | $(1-w_d)(t_{trans}+t_{cache})$                               |
| Write Hit                                                                | L                  |                      |                                         |                                                              |
| wh                                                                       | t <sub>main</sub>  | t <sub>main</sub>    | t <sub>cache</sub>                      | t <sub>cache</sub>                                           |
| Write Miss                                                               |                    |                      |                                         | $W_d (2t_{trans} + t_{cache}) +$                             |
| w(1-h)                                                                   | t <sub>main</sub>  | $t_{main}+t_{trans}$ | 2t <sub>trans</sub> +t <sub>cache</sub> | (1-w <sub>d</sub> )(t <sub>trans</sub> +t <sub>cache</sub> ) |
| http://akademi.itu.edu.tr/en/buzluca                                     |                    |                      |                                         |                                                              |



| Computer Architecture                                                                                                                                                                                                                                    |
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Exemplary processors with cache memories:                                                                                                                                                                                                                |
| • Intel386™: Cache memory is outside of the CPU chip. SRAM memory.                                                                                                                                                                                       |
| • Intel486™ (1989)<br>8-KByte on-chip (L1)                                                                                                                                                                                                               |
| <ul> <li>Intel<sup>®</sup> Pentium<sup>®</sup> (1993)</li> <li>L1 on-chip: 8 KB instruction, 8 KB data cache (Harvard architecture)</li> </ul>                                                                                                           |
| • Intel P6 Family: (1995-1999)                                                                                                                                                                                                                           |
| <ul> <li>Intel Pentium Pro:</li> <li>L1 on-chip: 8 KB instruction, 8 KB data cache (Harvard architecture)</li> <li>First L2 cache memory in the CPU chip.</li> <li>L2 on-chip: 256 KB. Different interconnections between L1, L2 and the CPU.</li> </ul> |
| <ul> <li>Intel Pentium II:</li> <li>L1 on-chip: 16 KB instruction, 16 KB data cache (Harvard architecture)</li> <li>L2 on-chip: 256 KB, 512 KB, 1 MB</li> </ul>                                                                                          |
| <ul> <li>Intel<sup>®</sup> Pentium<sup>®</sup> M (2003)</li> <li>L1 on-chip: 32 KB instruction, 32 KB data cache</li> <li>L2 on-chip: up to 2 MByte</li> </ul>                                                                                           |
| <ul> <li>Intel<sup>®</sup> Core<sup>™</sup> i9-9900 (2019)</li> <li>Multicore: 8 cores. Private caches (L1: 512KiB) and shared caches (L2: 2 MiB)</li> <li>L3: 16 MiB smartcache: All cores share this cache.</li> </ul>                                 |
| http://akademi.itu.edu.tr/en/buzluca<br>http://www.buzluca.info 2013 - 2020 Feza BUZLUCA 6.36                                                                                                                                                            |