Originally posted by: CTho9305
The thing is, even with a non-blocking cache, if you have, say, a 50-entry re-order buffer, any delay longer than about 50 cycles is guaranteed to stall you (assuming you issue roughly one instruction per cycle and none of those instructions are multiplies or anything else that slows you down), because you can't track more than the missing memory access plus the next 49 instructions/uops. You also have to consider that within a few instructions you'll very likely hit something that depends on the result of the memory access; that instruction has to be delayed, and so does every later instruction that depends on it.
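To put rough numbers on that, here's a toy model I'd sketch it with. The names `rob_stall_cycles` and `dependents_of` are made up for illustration, not taken from any real simulator. The assumptions: the missing load enters the ROB at cycle 0, every later uop is independent and finishes in one cycle, retirement is strictly in order (so nothing behind the load can leave the ROB until the load does), and there are no other limits such as load-queue size or retire width. The second function shows the dependency part: one forward pass finds every uop that waits on the load, directly or through another waiting uop.

```python
def rob_stall_cycles(rob_size: int, miss_latency: int, width: int = 1) -> int:
    """Count cycles in which dispatch is blocked because the ROB is full.

    Hypothetical toy model: a missing load is dispatched at cycle 0 and
    completes at cycle `miss_latency`. Later uops are assumed independent
    and single-cycle, but in-order retirement keeps them in the ROB
    behind the load until the load itself retires.
    """
    occupancy = 1  # the missing load sits at the head of the ROB
    stalls = 0
    for _cycle in range(1, miss_latency):  # cycles while the load is outstanding
        free = rob_size - occupancy
        if free == 0:
            stalls += 1  # ROB full: front end cannot dispatch anything
        else:
            occupancy += min(width, free)  # dispatch up to `width` uops
    return stalls


def dependents_of(load: int, sources: list[list[int]]) -> set[int]:
    """Return indices of uops that must wait, directly or transitively, on `load`.

    `sources[i]` lists the indices of earlier uops whose results uop i reads
    (a hypothetical encoding chosen for this sketch).
    """
    waiting = {load}
    for i, srcs in enumerate(sources):
        # Sources always point backwards, so one forward pass is enough.
        if i > load and any(s in waiting for s in srcs):
            waiting.add(i)
    waiting.discard(load)
    return waiting


if __name__ == "__main__":
    print(rob_stall_cycles(50, 40))   # 0: the ROB never fills
    print(rob_stall_cycles(50, 200))  # 150: full after ~50 cycles, then stuck
    print(dependents_of(0, [[], [0], [], [1], [2]]))  # {1, 3}
```

With `width=4` the same 50 entries fill after about 13 cycles, so `rob_stall_cycles(50, 200, width=4)` gives 186 stall cycles; a wider machine hits the wall even sooner.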