Well that AMD slide says 15+%. That is, 15% is a baseline.
Yep. The 'IPC 10-15%+' looks like another 'ST 15%+' to me. Nobody knows where it will land in the end.
> lol what? I am not defending anyone. I am aware that most of MLID leaks are usually BS but you are blaming him for a number in an AMD slide that he even said it will likely be higher in that same video.

I'm not blaming him for the number lol, I'm just saying he has trouble interpreting data he has been spoon-fed. That is why he put such a wide range of 15-25% on his "prediction": he has no idea what the changes listed on one of the slides even mean.
CCDs don't need SoIC-X; something akin to Foveros is fine.
Four tiles are possible, but it has to be more cores than Turin, so you'd just inflate the size of each tile.
Maybe the server standard CCD moves to 16 cores with Zen6, but I really doubt it.
Dense CCD 16 cores for sure, the 32 core thing I'll defer to the likes of Spec.
Also the dense part probably has some structural changes outside of the CCDs, given the target market and all.
The fanout introduced with RDNA3 is nice and cheap, and works well for this purpose. Si bridge would be overkill.
Cache isn't magic; 8 cores is basically the limit for a high-performance L3.
Venice is built for the guys who care about TCO and perf density and nothing else.
The barrier to entry will be set extremely high, MI300 pricing or so. It is more expensive to make, but volume should naturally be higher.
The slide clearly deals with CCX (Core Complex) not CCD (Core Complex Die).
> I'm not blaming him for the number lol, I'm just saying he has troubles interpreting data he has been spoon fed. That is why he put such a wide range of 15-25% on his "prediction", as he has no idea what the changes listed on one of the slides even mean.

You can't just look at a microarchitecture block diagram and come up with an accurate IPC figure. Just look at the A17: with all its changes, it ended up with just a 3% IPC uplift.
> you can't just look at a microarchitecture block diagram and come up with an accurate IPC figure. just look at A17 with all the changes and it ended up with just a 3% IPC uplift

Actually you can. With Zen 5 we have massive ALU, AGU, load/store, FP, frontend, and backend increases. This is basically a bigger core change than Zen 3 was, probably on the level of the Bulldozer-to-Zen 1 change. Anyone who knows a little bit about CPU microarchitecture design should be able to guess this will be a BIG jump in performance.
> Actually you can, as with Zen 5 we have some massive ALU, AGU, L/S, FP, frontend, backend increased. Basically this is a bigger core change than Zen 3 was, probably on the level of Bullldozer to Zen 1 change. Anyone who knows a little bit about CPU uarchitecture design should be able to guess this will be a BIG jump in performance.

You can guess it will be a large jump, but you can't guess an accurate IPC figure. Again, the A17 had a lot of changes, with an extra decoder and a wider backend, and it ended up with 3%.
> you can guess it will be a large jump but you can't guess an accurate IPC figures. again A17 had a a lot of changes with an extra decoder and a wider backend and it ended up with 3%.

You are comparing apples and oranges (ARM vs x86). Bottlenecks and overall pipeline flows are different between the two.
> I'm not blaming him for the number lol, I'm just saying he has troubles interpreting data he has been spoon fed. That is why he put such a wide range of 15-25% on his "prediction", as he has no idea what the changes listed on one of the slides even mean.

Yep. He had slides directly saying that the large Strix Point Halo SKU, albeit cut down to 24 CUs, will compete with 3050 mobile graphics, but he was happy to report that the standard 16 CU part, without MALL cache, will do just that, which will just not happen.
> BTW can anyone explain what the Zen5 Zero bubble Conditional branches might mean? Or the "Memory Profiler" for Zen6?

I assume zero-bubble conditional branches means they've found a way to make the penalty of a branch misprediction zero. Normally, a branch misprediction stalls the core: the pipeline is flushed and the execution units have no work to do until instructions from the correct path make it through the pipeline. Imagine the core is a pipe that flows water, and your goal as a CPU architect is for water to flow as fast as possible through the pipe. The gap between the flush and when the execution units have work again is the "bubble", essentially an air pocket that halts your water flow.
Edit: Formatting
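The bubble described above can be sketched with a toy model: a 2-bit saturating-counter predictor where every misprediction costs a fixed flush penalty. The penalty of 12 cycles and the predictor itself are illustrative assumptions, not AMD's actual design.

```python
def run(branch_outcomes, flush_penalty=12):
    """Simulate a 2-bit saturating-counter branch predictor.
    Returns (mispredictions, total_cycles), charging 1 cycle per
    branch plus a flush_penalty-cycle bubble per misprediction.
    The penalty value is illustrative, not a real pipeline depth."""
    state = 2  # 2-bit counter: 0-1 predict not-taken, 2-3 predict taken
    miss = 0
    for taken in branch_outcomes:
        if (state >= 2) != taken:  # prediction disagrees with outcome
            miss += 1
        # Saturating update toward the actual outcome
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return miss, len(branch_outcomes) + miss * flush_penalty

loop = [True] * 99 + [False]            # loop branch: only the exit mispredicts
alt = [i % 2 == 0 for i in range(100)]  # alternating pattern defeats the counter
print(run(loop))  # (1, 112): one bubble out of 100 branches
print(run(alt))   # (50, 700): half the branches stall the "pipe"
```

Even this crude model shows why filling or eliminating bubbles matters: the alternating pattern spends most of its cycles in flush bubbles rather than doing work.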
AMD claims no bubbles on most predictions due to the increased branch predictor bandwidth. Here I can see parallels to what Arm introduced with the Cortex-A77, where a similarly doubled-up branch predictor bandwidth could run ahead of subsequent pipeline stages and thus fill bubble gaps before they hit the execution stages and potentially stall the core.
> Also, we don't know (yet) the size of the very important uOP cache, which was big update for both Zen 3 and Zen 4. I expect this structure to get a big size increase in Zen 5.

Decode width is unknown too. Odd that it is not mentioned in the highlights. It would be a bit weird if it remained at 4.
> Also, we don't know (yet) the size of the very important uOP cache, which was big update for both Zen 3 and Zen 4. I expect this structure to get a big size increase in Zen 5.

Same. It likely holds more uops and has larger dispatch. Zen 4's uop cache already delivered 9 ops/cycle, and the core is fed from the uop cache most of the time, so it's only fair that a wider core requires more throughput from the uop cache as well. Hopefully it's something like 12 ops/cycle or more to complement the zero-bubble misprediction on conditional branches.
Meanwhile the branch predictor’s op cache has been more significantly improved. The op cache is not only 68% larger than before (now storing 6.75k ops), but it can now spit out up to 9 macro-ops per cycle, up from 6 on Zen 3. So in scenarios where the branch predictor is doing especially well at its job and the micro-op queue can consume additional instructions, it’s possible to get up to 50% more ops out of the op cache. Besides the performance improvement, this has a positive benefit to power efficiency since tapping cached ops requires a lot less power than decoding new ones.
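The percentages in that excerpt can be sanity-checked with quick arithmetic. The 4K-op capacity for Zen 3's op cache is from public reporting; "6.75k" is read as 6.75 * 1024 ops.

```python
# Zen 3's op cache held 4K (4096) macro-ops; Zen 4's "6.75k" is
# 6.75 * 1024 = 6912. Both sizes are from public reporting.
zen3_ops, zen4_ops = 4096, 6912
growth = zen4_ops / zen3_ops - 1
print(f"op cache capacity growth: {growth:.1%}")   # 68.8%, the "68% larger"
print(f"op cache bandwidth growth: {9 / 6 - 1:.0%}")  # 50%: 6 -> 9 ops/cycle
```

Both quoted figures check out: capacity up roughly 68%, peak delivery up 50%.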
> Yep. He had slides that were directly saying that its large Strix Point Halo SKU, albeit - cut down to 24 CUs will compete with 3050 mobile graphics, but he was happy to report that standard 16 CU, without MALL cache will do just that, which will just not happen.

Actually, he is not entirely wrong.
> Decode width is unknown too. Odd it is not mentioned in the highlights. Would be a bit weird if it remained at 4.

I think it's very obvious AMD wanted to leave some very important stuff out, as that presentation was bound to get leaked quickly after they shared it. If we got it now, the competition got it much earlier. I don't think anyone from Intel was taking the 10-15%+ IPC "projection" seriously.
Zen 3’s L1 BTB could track 1024 branch targets and handle them with 1 cycle latency, meaning that the frontend won’t need to stall after a taken branch if the target comes from the L1 BTB. Zen 4’s L1 BTB keeps the same 1 cycle latency, but improves capacity.
Zen 6 has three CCD types.
Zen 5 standard would be N4 with 8 cores per CCD
Zen 5c N3 with 16 cores per CCD
Zen 6 standard N3 with 16 cores per CCD
Zen 6c N2 with 32 cores per CCD