I recently posted an article on my blog about how memory performance is being limited by the FSB and how the full performance potential of DDR3 cannot possibly be realized until Nehalem with an integrated memory controller (IMC). I'd like your thoughts on this.
Here's the article...
Thanks,
-Chris.
I thought I would write a quick article on why it's time for the Front Side Bus (FSB) to retire and be replaced by an Integrated Memory Controller (IMC) in terms the average overclocker can understand.
The FSB
First, it's important to revisit how the FSB works. The Intel FSB uses AGTL+ (Assisted Gunning Transceiver Logic) signaling and performs 4 transfers per clock, which has given rise to the term "Quad-Pumped Front Side Bus". Thus a Core 2 Duo FSB running at 333MHz can perform 1333 million transfers per second. Given that the FSB is 64 bits (8 bytes) wide, the bandwidth is therefore 1333 x 8 = 10.6GB/s. This is the peak transfer rate of Intel's current FSB.
If you overclock the FSB to 400MHz, you effectively increase the transfer rate to 1600 million T/s for an effective peak bandwidth of 12.8GB/s. If you are skilled or lucky enough to overclock your FSB to an outstanding 533MHz, you would unlock a bandwidth potential of 17GB/s.
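For anyone who wants to plug in their own FSB clock, here is a minimal Python sketch of the arithmetic above. The function and constant names are my own, and it assumes 1GB = 10^9 bytes, so rounding may differ slightly from the figures quoted in the text.

```python
FSB_WIDTH_BYTES = 8        # 64-bit data bus
TRANSFERS_PER_CLOCK = 4    # "quad-pumped"


def fsb_bandwidth_gbs(fsb_clock_mhz: float) -> float:
    """Peak FSB bandwidth in GB/s (1 GB = 10**9 bytes)."""
    transfers_per_sec = fsb_clock_mhz * 1e6 * TRANSFERS_PER_CLOCK
    return transfers_per_sec * FSB_WIDTH_BYTES / 1e9


if __name__ == "__main__":
    # The three clocks discussed above: stock, a common overclock, an extreme one
    for clock in (333, 400, 533):
        print(f"{clock}MHz FSB: {fsb_bandwidth_gbs(clock):.1f}GB/s peak")
```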
Of course, Intel has chosen a top FSB speed of 333MHz for stability and thermal reasons.
Memory
Next, it's important to revisit the memory subsystem and how it works. DDR or double-data rate memory supports a data transfer on both the rising and falling edge of the clock. In addition, DDR2 allows the memory bus clock to work at double the memory module clock. Thus with DDR2 you effectively get 4 transfers per memory clock cycle. With DDR2-800 you are getting 800 million transfers per second from a 400MHz memory bus clock while the modules themselves are clocked at 200MHz. Finally, the bus width of DDR2 memory is 64 bits or 8 bytes.
The bandwidth of memory is calculated as: transfers per second (T/s) x 8 bytes per transfer (B/T) x interleaved channels... or in the case of a single channel DDR2-800 module: 800M x 8 x 1 = 6.4GB/s.
DDR3 works similarly to DDR2 but allows the memory bus to operate at 4x the memory frequency, so DDR3-1600 provides 1600 million T/s on an 800MHz bus at a 200MHz memory clock.
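The clock relationships and the bandwidth formula above can be sketched in a few lines of Python. This is only an illustration: the names are mine, the bus-to-memory clock ratios are the 2x and 4x figures given above, and 1GB is taken as 10^9 bytes.

```python
BUS_WIDTH_BYTES = 8  # 64-bit memory bus

# Ratio of memory bus clock to memory module clock, as described in the text
BUS_TO_MEMORY_RATIO = {"DDR2": 2, "DDR3": 4}


def describe_memory(generation: str, rating_mts: int, channels: int = 1) -> dict:
    """rating_mts is the number after the dash, e.g. 800 for DDR2-800."""
    bus_clock_mhz = rating_mts / 2  # two transfers per bus clock (double data rate)
    memory_clock_mhz = bus_clock_mhz / BUS_TO_MEMORY_RATIO[generation]
    bandwidth_gbs = rating_mts * BUS_WIDTH_BYTES * channels / 1000
    return {
        "bus_clock_mhz": bus_clock_mhz,
        "memory_clock_mhz": memory_clock_mhz,
        "bandwidth_gbs": bandwidth_gbs,
    }


if __name__ == "__main__":
    print("DDR2-800 :", describe_memory("DDR2", 800))
    print("DDR3-1600:", describe_memory("DDR3", 1600))
```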
Dual channel or interleaved memory has been supported for several generations of memory controllers from Intel. Interleaved memory works similarly to striping drives in a RAID0 array: data is striped across memory channels using an effective stripe size equal to a cache line. The effective bandwidth of interleaved multi-channel memory theoretically scales directly with the number of channels. Real-world tests and benchmarks have shown much smaller improvements in most applications, although I'm not sure these tests considered the bottleneck imposed by the FSB.
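To make the RAID0 comparison concrete, here is a toy Python sketch of cache-line striping. It assumes a 64-byte cache line and a simple round-robin mapping; both are my assumptions for illustration, and real memory controllers may map addresses to channels differently.

```python
CACHE_LINE_BYTES = 64  # assumed stripe size (one cache line)


def channel_for_address(address: int, channels: int = 2) -> int:
    """Return which channel a byte address lands on under round-robin striping."""
    cache_line_index = address // CACHE_LINE_BYTES
    return cache_line_index % channels


if __name__ == "__main__":
    # Consecutive cache lines alternate between channels, so a sequential
    # read keeps both channels busy at the same time.
    for address in range(0, 4 * CACHE_LINE_BYTES, CACHE_LINE_BYTES):
        print(f"address {address:#06x} -> channel {channel_for_address(address)}")
```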
Limitations of the FSB
The table below summarizes the peak memory bandwidth available from current memory technology. For each class of memory, it shows the clock, bus clock, transfers per second, and the bandwidth available in single, dual, and even triple channel interleaved modes. It also lists the FSB transfers per second and FSB clock frequency required to support the full memory bandwidth.
Can't embed images so click here to see chart
FSB frequencies are color coded as follows:
White: FSB is at or below what Intel officially supports
Orange: FSB can be achieved by overclocking
Red: FSB that is unattainable
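Since the chart can't be embedded, here is a rough Python sketch that recomputes the same kind of table from the formulas above. It is not a copy of the chart: the memory types listed are just the ones mentioned in this article, and the 533MHz cutoff between "overclockable" and "unattainable" is my own assumption, borrowed from the extreme overclock mentioned earlier.

```python
OFFICIAL_FSB_MHZ = 333     # top FSB clock discussed in the article
OVERCLOCK_LIMIT_MHZ = 533  # assumed ceiling for overclocking (hypothetical cutoff)
FSB_TRANSFERS_PER_CLOCK = 4


def required_fsb_mhz(rating_mts: float, channels: int) -> float:
    """FSB clock needed to match memory bandwidth.

    Both buses are 8 bytes wide, so only the transfer rates need to match:
    fsb_clock * 4 = rating * channels.
    """
    return rating_mts * channels / FSB_TRANSFERS_PER_CLOCK


def classify(fsb_mhz: float) -> str:
    """Color code in the spirit of the chart legend."""
    if round(fsb_mhz) <= OFFICIAL_FSB_MHZ:
        return "white"
    if round(fsb_mhz) <= OVERCLOCK_LIMIT_MHZ:
        return "orange"
    return "red"


if __name__ == "__main__":
    print(f"{'memory':<10} {'ch':>2} {'GB/s':>6} {'FSB MHz':>8}  color")
    for name, rating in (("DDR2-800", 800), ("DDR3-1333", 1333), ("DDR3-1600", 1600)):
        for channels in (1, 2, 3):
            bandwidth_gbs = rating * 8 * channels / 1000
            fsb = required_fsb_mhz(rating, channels)
            print(f"{name:<10} {channels:>2} {bandwidth_gbs:>6.1f} {fsb:>8.0f}  {classify(fsb)}")
```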
Conclusions
As you can see, an FSB frequency of 333MHz, currently available on Intel's top of the line products, cannot even support the full bandwidth of widely used interleaved dual-channel DDR2 memory. The FSB is already a bottleneck when running DDR2-800 memory in dual channel mode (12.8GB/s of memory bandwidth against a 10.6GB/s FSB).
The prospects for the FSB are even more dire when you look at DDR3 performance. The current 333MHz FSB is only suitable for single-channel DDR3 up to DDR3-1333 speeds. Running multi-channel or higher speed DDR3 memory causes the current FSB to become a serious bottleneck.
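To put a number on the bottleneck, this small Python sketch estimates what share of peak memory bandwidth a given FSB can actually carry. It uses theoretical peaks only and the same formulas as above; the function names are mine, and real-world throughput will be lower on both sides.

```python
def fsb_gbs(fsb_clock_mhz: float) -> float:
    """Peak FSB bandwidth: 4 transfers per clock, 8 bytes per transfer."""
    return fsb_clock_mhz * 4 * 8 / 1000


def memory_gbs(rating_mts: float, channels: int) -> float:
    """Peak memory bandwidth: 8 bytes per transfer per channel."""
    return rating_mts * 8 * channels / 1000


def usable_fraction(rating_mts: float, channels: int, fsb_clock_mhz: float = 333) -> float:
    """Fraction of peak memory bandwidth that fits through the FSB (capped at 1.0)."""
    return min(1.0, fsb_gbs(fsb_clock_mhz) / memory_gbs(rating_mts, channels))


if __name__ == "__main__":
    for name, rating, channels in (
        ("DDR2-800 single", 800, 1),
        ("DDR2-800 dual", 800, 2),
        ("DDR3-1333 dual", 1333, 2),
        ("DDR3-1333 triple", 1333, 3),
    ):
        share = usable_fraction(rating, channels)
        print(f"{name:<17} {share:.0%} of peak usable on a 333MHz FSB")
```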
The Future
The performance of DDR3 memory promises to be significant. Nehalem's Bloomfield performance product, due out later this year, will offer triple-channel DDR3 with a theoretical peak bandwidth of 32GB/s (for modest DDR3-1333), which is on par with the L2 cache performance in a Q6600! You can see why an IMC is needed to realize this kind of performance: that much bandwidth would require an FSB running at 1000MHz, which is simply impossible.
If you have a current investment in DDR3 and are frustrated that you can't extract the most from it, then wait for Nehalem. If you are considering a switch to DDR3, I would highly recommend you get by with cheap DDR2 for now and make the move to DDR3 on a Nehalem-based platform.
Thanks,
-Chris.