Tutorial 8: Memory, Part 2

Question 1

Consider two alternative caches, each with a capacity of 4 blocks and one 32-bit word per block. One cache is direct mapped and the other is fully associative with an LRU replacement policy. The machine is byte addressed and the caches use a write-back policy. What would the overall miss ratio be for the following address stream on the direct-mapped cache? Assume the cache starts out completely invalidated.

[Figure: address stream for Question 1 (../_images/read_write.png)]

Give an example address stream consisting of only reads that would result in a lower miss ratio if fed to the direct mapped cache than if it were fed to the fully associative cache.
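
A hand-computed miss ratio for either cache can be checked with a small simulation. The following is a minimal Python sketch, not part of the original tutorial: the function names and the sample stream are hypothetical, and it assumes a write-allocate policy so that reads and writes are treated alike when counting misses.

```python
from collections import OrderedDict

BLOCK_BYTES = 4   # one 32-bit word per block
NUM_BLOCKS = 4    # four blocks in total


def direct_mapped_misses(addresses):
    """Count misses in a direct-mapped cache with NUM_BLOCKS one-word lines."""
    lines = [None] * NUM_BLOCKS          # None marks an invalid line
    misses = 0
    for addr in addresses:
        block = addr // BLOCK_BYTES      # drop the byte offset
        index = block % NUM_BLOCKS       # each block maps to exactly one line
        if lines[index] != block:
            misses += 1
            lines[index] = block
    return misses


def fully_associative_lru_misses(addresses):
    """Count misses in a fully associative LRU cache with NUM_BLOCKS lines."""
    lines = OrderedDict()                # keys ordered from least to most recently used
    misses = 0
    for addr in addresses:
        block = addr // BLOCK_BYTES
        if block in lines:
            lines.move_to_end(block)     # refresh recency on a hit
        else:
            misses += 1
            if len(lines) == NUM_BLOCKS:
                lines.popitem(last=False)  # evict the least recently used block
            lines[block] = True
    return misses


# Hypothetical stream: byte addresses 0 and 16 map to the same direct-mapped line.
stream = [0, 16, 0, 16]
print(direct_mapped_misses(stream), fully_associative_lru_misses(stream))  # prints "4 2"
```

Dividing each miss count by the length of the stream gives the miss ratio. Replacing `stream` with the byte addresses from the figure, or with a candidate read-only stream, gives a quick check of both parts of the question.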

Question 2

Assume that you have a computer with 1 clock cycle per instruction (CPI = 1) when all memory accesses hit in the cache. Every instruction except loads and stores executes in exactly 1 cycle. A load or store (LD/ST) takes 1 cycle on a hit and 50 cycles to complete on a miss. Load and store instructions make up 25% of the total number of instructions. The miss rate is 5%. Determine the speedup obtained when there are no cache misses compared to the case when there are cache misses.
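
The quantities in this question combine into an average CPI, and the speedup is the ratio of the two CPIs. The following Python sketch is one way to set up the calculation. It assumes that the 50 cycles are the total latency of a load or store that misses (not a penalty added on top of the 1-cycle hit time), and the function and parameter names are hypothetical.

```python
def average_cpi(mem_fraction, miss_rate, hit_cycles=1, miss_cycles=50, other_cycles=1):
    """Average cycles per instruction for a mix of memory and non-memory instructions."""
    # Expected cycles of one load/store, weighted by hit and miss probability.
    mem_cpi = (1 - miss_rate) * hit_cycles + miss_rate * miss_cycles
    # Weight memory and non-memory instructions by their share of the program.
    return (1 - mem_fraction) * other_cycles + mem_fraction * mem_cpi


cpi_with_misses = average_cpi(mem_fraction=0.25, miss_rate=0.05)
cpi_all_hits = average_cpi(mem_fraction=0.25, miss_rate=0.0)
speedup = cpi_with_misses / cpi_all_hits
print(f"{cpi_with_misses:.4f} {cpi_all_hits:.4f} {speedup:.4f}")  # prints "1.6125 1.0000 1.6125"
```

If the 50 cycles were instead read as an extra penalty on top of the hit time, `miss_cycles` would be 51 and the result would change accordingly.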

Question 3

Consider the following ARM assembly program:

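       .global _start
_start:
       vec:  .word 3,4,5,7        @ first input vector
       vec2: .word 1,2,3,4        @ second input vector
       vec3: .word 6,7,8,9        @ third input vector
       out:  .space 8             @ two result words
       LDR R0, =vec               @ R0 = address of vec
       LDR R1, =vec2              @ R1 = address of vec2
       LDR R3, =out               @ R3 = address of out
       MOV R4, #0                 @ pass counter
       MOV R5, #4                 @ elements per vector
       MOV R6, #0                 @ accumulator
       MOV R9, #0                 @ element index
LOOP1:
       CMP R9, R5
       BGE LOOP2                  @ leave the inner loop after four elements
       LDR R7, [R0]
       LDR R8, [R1]
       ADD R0, R0, #4
       ADD R1, R1, #4
       MLA R6, R7, R8, R6         @ R6 = R7 * R8 + R6
       ADD R9, R9, #1
       B LOOP1
LOOP2:
       STR R9, [R3], #4           @ store R9 to out, then advance R3 by 4
       MOV R9, #0
       LDR R1, =vec3              @ second pass reads vec3
       ADD R4, R4, #1
       CMP R4, #2
       BGE END                    @ stop after two passes
       B LOOP1
END:
       .end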

A computer system uses 32-bit memory addresses and has a main memory of 1 GB. It has a 4 KB set-associative cache with 4 blocks per set and a block (cache line) size of 64 bytes. Assume that the cache is initially empty and that the LRU algorithm is used for block replacement. Calculate the number of hits and the number of misses for the program above with each of the following cache configurations.

  1. A unified L1 cache that holds both instructions and data.

  2. Separate L1 caches for data and instructions, where each cache is 2 KB.
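
Counting hits and misses starts from how an address splits into tag, set index, and byte offset. The following Python sketch derives those field widths from the cache parameters. It assumes that the 2 KB split caches keep the same 4 blocks per set and 64-byte lines as the unified cache, which the question does not state explicitly. The function names and the sample address 0x1234 are hypothetical.

```python
def cache_geometry(cache_bytes, ways, line_bytes, address_bits=32):
    """Derive set count and address field widths (all sizes must be powers of two)."""
    lines = cache_bytes // line_bytes
    sets = lines // ways
    offset_bits = line_bytes.bit_length() - 1   # log2 of the line size
    index_bits = sets.bit_length() - 1          # log2 of the number of sets
    tag_bits = address_bits - index_bits - offset_bits
    return {
        "lines": lines,
        "sets": sets,
        "offset_bits": offset_bits,
        "index_bits": index_bits,
        "tag_bits": tag_bits,
    }


def split_address(addr, geometry):
    """Return (tag, set index, byte offset) of an address for the given geometry."""
    offset = addr & ((1 << geometry["offset_bits"]) - 1)
    block = addr >> geometry["offset_bits"]
    index = block & ((1 << geometry["index_bits"]) - 1)
    tag = block >> geometry["index_bits"]
    return tag, index, offset


unified = cache_geometry(cache_bytes=4096, ways=4, line_bytes=64)
split = cache_geometry(cache_bytes=2048, ways=4, line_bytes=64)
print(unified)   # 64 lines, 16 sets, 6 offset bits, 4 index bits, 22 tag bits
print(split)     # 32 lines, 8 sets, 6 offset bits, 3 index bits, 23 tag bits
print(split_address(0x1234, unified))   # prints "(4, 8, 52)"
```

Two accesses can hit the same cache line only if they share both tag and set index, so mapping each instruction fetch and data access of the program through `split_address` shows which accesses fall into the same 64-byte line.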