New Zen microarchitecture details

Mar 10, 2006
11,715
2,012
126
Sure, power/perf is better, but clocks is what we're talking about.

That's just what Con core design does: clock high.

Frequencies are not comparable, but the rest of the characteristics you mentioned mostly are, especially since I extended the comparison to GCN3.

This is all irrelevant, assuming the 1050 Ti is actually a full GP107. If the rumors turn out true [as in, the 1050 Ti will overclock worse by a significant margin than the 1060 and it is built on 14nm LPP], then we pretty much have a solid confirmation that 14nm LPP is simply bad for high-clocking parts, since we have all seen the clocking ability of Pascal (namely the ease of hitting 2 GHz and the difficulty of going further). If they don't, great, the wait for Zen goes on.

The thing that I believe bjt2 misses is that Polaris is largely a similar architecture to Tonga. When the designs are similar and you spend a good amount of time optimizing your circuit implementation, you can get higher frequencies on the new process.

Zen is completely different from the Construction cores. Per core, the execution resources have gone up significantly. The sizes of the key buffers (register files, re-order buffer, scheduler, etc.) are all way up. Increasing the sizes of those buffers while keeping frequency high is very tough, which is one of the reasons that Intel's perf/clock improvements have been fairly modest -- they are trying to increase IPC while keeping frequency capability roughly the same (this is something they failed at with Broadwell, but succeeded at with Skylake).
 

bjt2

Senior member
Sep 11, 2016
784
180
86
Sure, power/perf is better, but clocks is what we're talking about.
Fiji, max clock 1050 MHz, 1.2 V default vcore
Polaris, max clock 1266 MHz, 1.137 V default vcore
+20% clock with -5% voltage. And we all know that the voltage on Polaris is quite generous.

What is this? A miracle? No. It's the new 14nm FF process in its infancy, compared to the much more mature 28nm bulk.

That's just what Con core design does: clock high.

Frequencies are not comparable, but the rest of the characteristics you mentioned mostly are, especially since I extended the comparison to GCN3.

I posted above the frequency comparison between 28nm and 14nm. Fiji has 4096 SP and 290W TDP, Polaris 2304 with 150W TDP. So at similar TDP (actually less TDP per SP for Polaris), we have +20% clock and -5% vcore.
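
The ratios quoted above can be checked with a few lines of Python. This is only a sketch using the figures given in this post (clocks, default voltages, SP counts and TDPs as stated by the poster, not independent measurements); the names `fiji`, `polaris` and `pct_change` are made up for illustration.

```python
# Figures as quoted in the post; treat them as the poster's numbers.
fiji = {"clock_mhz": 1050, "vcore": 1.2, "sp": 4096, "tdp_w": 290}
polaris = {"clock_mhz": 1266, "vcore": 1.137, "sp": 2304, "tdp_w": 150}


def pct_change(new, old):
    """Relative change of `new` versus `old`, in percent."""
    return (new / old - 1.0) * 100.0


clock_gain = pct_change(polaris["clock_mhz"], fiji["clock_mhz"])
vcore_change = pct_change(polaris["vcore"], fiji["vcore"])

# TDP divided by the number of stream processors (SP) for each chip.
fiji_tdp_per_sp = fiji["tdp_w"] / fiji["sp"]
polaris_tdp_per_sp = polaris["tdp_w"] / polaris["sp"]

print(f"clock change:  {clock_gain:+.1f}%")
print(f"vcore change:  {vcore_change:+.1f}%")
print(f"Fiji TDP/SP:    {fiji_tdp_per_sp:.4f} W")
print(f"Polaris TDP/SP: {polaris_tdp_per_sp:.4f} W")
```

With these inputs the clock change comes out a little above +20%, the voltage change a little beyond -5%, and the TDP per SP is lower for Polaris, which matches the claim in the post. Whether TDP is a fair stand-in for actual power draw is a separate question.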

Zen should have a similar FO4 to Bulldozer. I would expect at least the same top clock. Not 20% more, because XV's top clocks come with a very high Vcore, while Fiji's is not so high...

This is all irrelevant, assuming the 1050 Ti is actually a full GP107. If the rumors turn out true [as in, the 1050 Ti will overclock worse by a significant margin than the 1060 and it is built on 14nm LPP], then we pretty much have a solid confirmation that 14nm LPP is simply bad for high-clocking parts, since we have all seen the clocking ability of Pascal (namely the ease of hitting 2 GHz and the difficulty of going further). If they don't, great, the wait for Zen goes on.
NVIDIA is not AMD. AMD has already shown how to do 4.3GHz chips on a worse process and HDL low power library. And the rumors that 1050 does not overclock as high as 1060 are just rumors... We will see...
 

leoneazzurro

Golden Member
Jul 26, 2016
1,106
1,838
136
Sure, power/perf is better, but clocks is what we're talking about.

And clocks are around 15%-20% higher than on a comparable 28nm GCN part, while Nvidia on the same 28nm process had 70%+ higher (boost) clocks. Comparatively speaking, Nvidia gained less in moving from TSMC 28nm to TSMC 16nm than AMD gained in moving from TSMC 28nm to GF 14nm (also because Nvidia had a better starting point, that is for sure).
Also, I am tired of this nonsense about the 14nm GF/Samsung process being "focused on low power". It is called "Low Power" because, being based on the smallest geometry available from those manufacturers, it is also the lowest-power process those manufacturers have. It has absolutely nothing to do with some mystical inability to ramp up clocks.
Also, I have yet to see a review of the 1050 Ti, so it's hard to tell whether there really are issues with the process itself or whether it's an Nvidia design choice. And no, "my cousin told me so" is not a valid argument. The same is true for AMD supporters here. I'll believe in a 4+ GHz 8-core Zen when AMD actually releases it.
 

bjt2

Senior member
Sep 11, 2016
784
180
86
The thing that I believe bjt2 misses is that Polaris is largely a similar architecture to Tonga. When the designs are similar and you spend a good amount of time optimizing your circuit implementation, you can get higher frequencies on the new process.

Zen is completely different from the Construction cores. Per core, the execution resources have gone up significantly. The sizes of the key buffers (register files, re-order buffer, scheduler, etc.) are all way up. Increasing the sizes of those buffers while keeping frequency high is very tough, which is one of the reasons that Intel's perf/clock improvements have been fairly modest -- they are trying to increase IPC while keeping frequency capability roughly the same (this is something they failed at with Broadwell, but succeeded at with Skylake).
AMD has gone from a unified int scheduler to individual split schedulers, to avoid losing something in going from a 4-port scheduler to a six-port scheduler. Anyway, many here said that the obstacle to clocking higher in Bulldozer was the L2. Moreover, the number of integer pipeline stages seems to be higher. Some instruction latencies are higher. The FP scheduler has 4 ports as in BD, the decoders should be similar, and so should the rest. There is only the uop cache, the new stack engine and 2 more ALUs, but now with split schedulers as in the old K7-K10. The problematic cache was completely redesigned. Why should this design be slower than BD?
 
Mar 10, 2006
11,715
2,012
126
AMD has gone from a unified int scheduler to individual split schedulers, to avoid losing something in going from a 4-port scheduler to a six-port scheduler. Anyway, many here said that the obstacle to clocking higher in Bulldozer was the L2. Moreover, the number of integer pipeline stages seems to be higher. Some instruction latencies are higher. The FP scheduler has 4 ports as in BD, the decoders should be similar, and so should the rest. There is only the uop cache, the new stack engine and 2 more ALUs, but now with split schedulers as in the old K7-K10. The problematic cache was completely redesigned. Why should this design be slower than BD?

Zen is going to deliver more performance than Bulldozer, and I have no doubt that it will actually be a viable choice for some enthusiasts (thanks to better performance, more modern platform, etc.). But I continue to believe that if you want the best performance in the vast majority of client workloads, Kaby Lake will be a superior solution.
 

lolfail9001

Golden Member
Sep 9, 2016
1,056
353
96
Fiji, max clock 1050 MHz, 1.2 V default vcore
Polaris, max clock 1266 MHz, 1.137 V default vcore
+20% clock with -5% voltage. And we all know that the voltage on Polaris is quite generous.

What is this? A miracle? No. It's the new 14nm FF process in its infancy, compared to the much more mature 28nm bulk.
14nm FF is hardly infant.

I posted above the frequency comparison between 28nm and 14nm. Fiji has 4096 SP and 290W TDP, Polaris 2304 with 150W TDP. So at similar TDP (actually less TDP per SP for Polaris), we have +20% clock and -5% vcore.

Zen should have a similar FO4 to Bulldozer. I would expect at least the same top clock. Not 20% more, because XV's top clocks come with a very high Vcore, while Fiji's is not so high...
Once again, you are comparing 20nm FF [let's be serious] with 28nm BULK.
Want to have a laugh? I am fairly positive Fiji has similar lower power per ALU as Polaris 10. Basically it all went into 20% higher clock. A freaking shrink+FinFETs, and it's not like clocks were high to begin with.
NVIDIA is not AMD. AMD has already shown how to do 4.3GHz chips on a worse process and HDL low power library. And the rumors that 1050 does not overclock as high as 1060 are just rumors... We will see...
NVIDIA is not AMD, NVIDIA has already shown how to do 2 GHz GPUs. We shall see if the 1050 Ti hits those.
 

bjt2

Senior member
Sep 11, 2016
784
180
86
14nm FF is hardly infant.

Polaris is the first chip on 14nm LPP. Maybe you are talking about 14nm LPE, which was used for other things. Not for high-performance architectures, as far as I can remember. And anyway 28nm bulk is way older. At least 5 years, if I remember well...

Once again, you are comparing 20nm FF [let's be serious] with 28nm BULK.
Want to have a laugh? I am fairly positive Fiji has similar lower power per ALU as Polaris 10. Basically it all went into 20% higher clock. A freaking shrink+FinFETs, and it's not like clocks were high to begin with.

Actual Polaris consumption is 110W, less than half of Fiji's. Fiji's rated consumption, instead, was quite accurate. And Fiji has HBM, which consumes way less than the GDDR5 on Polaris. Moreover Fiji has less than double the SPs of Polaris (4096 vs 2304), for a power consumption more than double and 20% less clock... On a process in its first incarnation versus a very mature one... If you redo the calculation, you can see that power/SP is lower on Polaris despite +20% clock...

NVIDIA is not AMD, NVIDIA has already shown how to do 2 GHz GPUs. We shall see if the 1050 Ti hits those.

Nvidia also had a faster GPU on 28nm... As others said here, NVIDIA gained less clock than AMD from the new process...
 

lolfail9001

Golden Member
Sep 9, 2016
1,056
353
96
Actual Polaris consumption is 110W
Unless you want to claim that the VRM and memory on the RX 480 consume almost 50 watts, it's not. More like 130W, which does make it more power hungry per ALU than Fiji LE.

On a process in its first incarnation versus a very mature one... If you redo the calculation, you can see that power/SP is lower on Polaris despite +20% clock...
https://www.techpowerup.com/reviews/MSI/GTX_1070_Gaming_Z/24.html

Go ahead, do them.

As others said here, NVIDIA gained less clock than AMD from the new process...
From 1600 to 2100, that's a 30% clock gain. AMD went from 1100-1200 to 1400-1500, more like 25% gain. If we compare factual non-overclocked clocks, it's even worse.
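
The percentages above depend on which ends of the quoted ranges are paired up. The sketch below just redoes the arithmetic with the clock figures from this post (typical overclocked clocks as stated by the poster, in MHz); `gain_pct` and the variable names are hypothetical.

```python
def gain_pct(old_mhz, new_mhz):
    """Clock gain from old_mhz to new_mhz, in percent."""
    return (new_mhz / old_mhz - 1.0) * 100.0


# Nvidia: single figures quoted for the old and new process.
nvidia_gain = gain_pct(1600, 2100)

# AMD: ranges were quoted (1100-1200 before, 1400-1500 after),
# so the result depends on which ends of the ranges are compared.
amd_low = gain_pct(1200, 1400)    # most pessimistic pairing
amd_high = gain_pct(1100, 1500)   # most optimistic pairing
amd_mid = gain_pct(1150, 1450)    # midpoints of both ranges

print(f"Nvidia: {nvidia_gain:.1f}%")
print(f"AMD:    {amd_low:.1f}% to {amd_high:.1f}% (midpoints: {amd_mid:.1f}%)")
```

The Nvidia figure is a little over 30%, while the AMD figure spans a wide interval around the quoted "more like 25%", so the comparison is sensitive to the exact clocks chosen.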
 

leoneazzurro

Golden Member
Jul 26, 2016
1,106
1,838
136
Unless you want to claim that the VRM and memory on the RX 480 consume almost 50 watts, it's not. More like 130W, which does make it more power hungry per ALU than Fiji LE.
You would be surprised how much these things consume. Yes, everything off-GPU is in the vicinity of 50W (8 GB version). Just look at the RAM datasheets and you are already over 30W. And you cannot compare directly with Fiji: Fiji has a vastly different memory subsystem, and HBM accounts for a huge power saving on those SKUs.
 

lolfail9001

Golden Member
Sep 9, 2016
1,056
353
96
You would be surprised how much these things consume. Yes, everything off-GPU is in the vicinity of 50W (8 GB version). Just look at the RAM datasheets and you are already over 30W. And you cannot compare directly with Fiji: Fiji has a vastly different memory subsystem, and HBM accounts for a huge power saving on those SKUs.
HBM is about 15W worth of power consumption on memory, just 15W savings. Yes, there are some other savings in memory controllers, but we compare dies themselves. Also, you forget that there are 2 Fijis. Fiji XT consumes 280W. Fiji LE consumes 220W (as much as Rx480 Nitro OC+). So, yes, entirety of process power consumption savings went into clocks.
 

bjt2

Senior member
Sep 11, 2016
784
180
86
Unless you want to claim that the VRM and memory on the RX 480 consume almost 50 watts, it's not. More like 130W, which does make it more power hungry per ALU than Fiji LE.


https://www.techpowerup.com/reviews/MSI/GTX_1070_Gaming_Z/24.html

Go ahead, do them.


From 1600 to 2100, that's a 30% clock gain. AMD went from 1100-1200 to 1400-1500, more like 25% gain. If we compare factual non-overclocked clocks, it's even worse.

I remember of some review citing the whole polaris consumption as 110W. Anyway 280/130>2 and 4096/2304<2. We can also calculate power per SP:

280/4096=0.068W/SP for Fiji
130/2304=0.056W/SP for Polaris

68/56, about a -20% power per SP and +20% clock...

I don't get why you posted a link to an Nvidia GPU test... We are comparing similar architectures here...
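
The power-per-SP calculation above can be reproduced as follows. This is a sketch that takes the wattages and SP counts from this post at face value; the 110 W case uses the lower Polaris figure the post also mentions. All names are made up for illustration.

```python
FIJI_SP = 4096
POLARIS_SP = 2304


def watts_per_sp(watts, sp_count):
    """Average power per stream processor, in watts."""
    return watts / sp_count


fiji_w_per_sp = watts_per_sp(280, FIJI_SP)
polaris_w_per_sp = watts_per_sp(130, POLARIS_SP)       # 130 W estimate
polaris_w_per_sp_110 = watts_per_sp(110, POLARIS_SP)   # 110 W estimate

# Reduction in power per SP relative to Fiji, in percent.
reduction_130 = (1.0 - polaris_w_per_sp / fiji_w_per_sp) * 100.0
reduction_110 = (1.0 - polaris_w_per_sp_110 / fiji_w_per_sp) * 100.0

print(f"Fiji:            {fiji_w_per_sp:.3f} W/SP")
print(f"Polaris (130 W): {polaris_w_per_sp:.3f} W/SP, {reduction_130:.1f}% lower")
print(f"Polaris (110 W): {polaris_w_per_sp_110:.3f} W/SP, {reduction_110:.1f}% lower")
```

With the 130 W figure the reduction is a bit under 20%; with 110 W it is noticeably larger, so the conclusion depends mostly on which Polaris power number is accepted.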
 

Glo.

Diamond Member
Apr 25, 2015
5,930
4,990
136
Unless you want to claim that the VRM and memory on the RX 480 consume almost 50 watts, it's not. More like 130W, which does make it more power hungry per ALU than Fiji LE.


https://www.techpowerup.com/reviews/MSI/GTX_1070_Gaming_Z/24.html

Go ahead, do them.


From 1600 to 2100, that's a 30% clock gain. AMD went from 1100-1200 to 1400-1500, more like 25% gain. If we compare factual non-overclocked clocks, it's even worse.
Actually it is consuming that much at stock clock. The GPU die alone under load is 110W. Unfortunately I cannot prove this to anyone; I have been testing this GPU at a friend's home, who has a reference RX 480 8 GB.
 

Glo.

Diamond Member
Apr 25, 2015
5,930
4,990
136
HBM is about 15W worth of power consumption on memory, just 15W savings. Yes, there are some other savings in memory controllers, but we compare dies themselves. Also, you forget that there are 2 Fijis. Fiji XT consumes 280W. Fiji LE consumes 220W (as much as Rx480 Nitro OC+). So, yes, entirety of process power consumption savings went into clocks.
What is Fiji LE? You mean Fiji Pro from Asus Strix Fury? Then actual average power consumption under load for that particular GPU is 200W. Peak power consumption is 226W.
 

lolfail9001

Golden Member
Sep 9, 2016
1,056
353
96
I remember of some review citing the whole polaris consumption as 110W. Anyway 280/130>2 and 4096/2304<2. We can also calculate power per SP:

280/4096=0.068W/SP for Fiji
130/2304=0.056W/SP for Polaris

68/56, about a -20% power per SP and +20% clock...

I don't get why you posted a link to an Nvidia GPU test... We are comparing similar architectures here...
Once again, i said Fiji.

It means we are dealing with
3584/226 = ~15.8 ALUs per watt
2304/167 = ~13.8 ALUs per watt. QED. And yes, you can use rx470 if you want, but Furmark test from same page clearly tells us that Fiji LE does not hit power limit outside of Furmark, while rx470 sits at it, so it would not mean anything.
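
ALUs per watt is simply the reciprocal of the watts-per-SP metric used earlier in the thread, so a higher value here corresponds to a lower value there. The sketch below recomputes both from the numbers in this post (board power figures as quoted by the poster); the dictionary labels `fiji_card` and `polaris_card` are assumptions about which cards are meant.

```python
# Inputs as quoted in the post above (the poster's figures, not measurements).
fiji_card = {"alus": 3584, "watts": 226}
polaris_card = {"alus": 2304, "watts": 167}


def alus_per_watt(card):
    """Higher is better: how many ALUs each watt feeds."""
    return card["alus"] / card["watts"]


def watts_per_alu(card):
    """Lower is better: the reciprocal of alus_per_watt."""
    return card["watts"] / card["alus"]


fiji_apw = alus_per_watt(fiji_card)
polaris_apw = alus_per_watt(polaris_card)

print(f"Fiji card:    {fiji_apw:.2f} ALUs/W, {watts_per_alu(fiji_card):.4f} W/ALU")
print(f"Polaris card: {polaris_apw:.2f} ALUs/W, {watts_per_alu(polaris_card):.4f} W/ALU")
```

Both forms give the same ordering for a given pair of inputs; any disagreement in the thread comes from the wattage and ALU counts fed in, not from the direction of the ratio.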
What is Fiji LE? You mean Fiji Pro from Asus Strix Fury? Then actual average power consumption under load for that particular GPU is 200W. Peak power consumption is 226W.
Yeah, i forget that LE is another tier down. Yeah, i mean Fiji Pro.

Anyways, i am severely off-topic as is, but do you have numbers for Fiji Pro from GPU-Z by a chance? PM if you so desire.
 

bjt2

Senior member
Sep 11, 2016
784
180
86
HBM is about 15W worth of power consumption on memory, just 15W savings. Yes, there are some other savings in memory controllers, but we compare dies themselves. Also, you forget that there are 2 Fijis. Fiji XT consumes 280W. Fiji LE consumes 220W (as much as Rx480 Nitro OC+). So, yes, entirety of process power consumption savings went into clocks.
What is the clock of this fiji LE? It's not 1050 i presume. So for same power/SP we have more than +20% in clock... This is normal...

220/4096=0.054W/SP
Ok... Slightly less power (but I am not sure that the Polaris die draws 130W) for much less clock...
 

Glo.

Diamond Member
Apr 25, 2015
5,930
4,990
136
I remember of some review citing the whole polaris consumption as 110W. Anyway 280/130>2 and 4096/2304<2. We can also calculate power per SP:

280/4096=0.068W/SP for Fiji
130/2304=0.056W/SP for Polaris

68/56, about a -20% power per SP and +20% clock...

I don't get why you posted a link to an Nvidia GPU test... We are comparing similar architectures here...
http://www.tomshardware.com/reviews/amd-radeon-rx-480-power-measurements,4622.html
Quote: AMD is right when it says that the Radeon RX 480 GPU is a true 110W GPU. The average GPU-Z measurement result comes in exactly at that point. But that's not the whole story. After all, there’s still the rest of the graphics card, including other components that consume power. And then there are power losses as well.
 

bjt2

Senior member
Sep 11, 2016
784
180
86
Once again, i said Fiji.

It means we are dealing with
3584/226 = ~15.8 ALUs per watt
2304/167 = ~13.8 ALUs per watt. QED. And yes, you can use rx470 if you want, but Furmark test from same page clearly tells us that Fiji LE does not hit power limit outside of Furmark, while rx470 sits at it, so it would not mean anything.

You have inverted the ratios... And anyway Polaris does not draw 167W...

EDIT: and if Fiji LE does not have full 4096 SPs but 3584, then the ratio is much worse.
 

bjt2

Senior member
Sep 11, 2016
784
180
86
http://www.tomshardware.com/reviews/amd-radeon-rx-480-power-measurements,4622.html
Quote: AMD is right when it says that the Radeon RX 480 GPU is a true 110W GPU. The average GPU-Z measurement result comes in exactly at that point. But that's not the whole story. After all, there’s still the rest of the graphics card, including other components that consume power. And then there are power losses as well.

So probably we are in the ballpark of less than 0.05W per SP... Good...
 

Glo.

Diamond Member
Apr 25, 2015
5,930
4,990
136
Yeah, i forget that LE is another tier down. Yeah, i mean Fiji Pro.

Anyways, i am severely off-topic as is, but do you have numbers for Fiji Pro from GPU-Z by a chance? PM if you so desire.
There is no Fiji LE. There is only Fiji XT or Pro. Fiji XT is used in the R9 Nano, Fury X, S9300X2 and Radeon Pro Duo GPUs.
Fiji Pro is used in every AMD Fury branded GPU.

But there is no Fiji LE GPU. It's way off-topic here.
 

cdimauro

Member
Sep 14, 2016
163
14
61
Are you saying that a 19 stage pipeline CPU on the SAME ISA has a higher FO4 than a 15 stage pipeline architecture?

FO4 delay in ns is set by the process.

I am talking of relative FO4 delay.

For instance, it's estimated at 17 FO4 for each BD stage, namely 17 times a FO4 delay.
And BD on 28nm BULK tops at 4.2-4.3GHz, with 3.8-4.1 base clock (depending on the TDPs).
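
As a rough illustration of how the two quantities relate: if one assumes the clock period equals the per-stage FO4 count times the process FO4 delay (that is, the 17 FO4 estimate already includes latch and clock overhead, which is an assumption and not something stated in the thread), the implied absolute FO4 delay can be backed out from the quoted clocks. The function name is hypothetical.

```python
def implied_fo4_delay_ps(clock_ghz, fo4_per_stage):
    """FO4 delay in picoseconds implied by a clock and a per-stage FO4 count.

    Assumes clock period = fo4_per_stage * FO4 delay.
    """
    period_ps = 1000.0 / clock_ghz   # 1 GHz corresponds to a 1000 ps period
    return period_ps / fo4_per_stage


# Top clocks quoted for Bulldozer-family parts on 28nm bulk, 17 FO4 per stage.
for clock_ghz in (4.2, 4.3):
    delay = implied_fo4_delay_ps(clock_ghz, 17)
    print(f"{clock_ghz} GHz -> about {delay:.1f} ps per FO4")
```

Under that assumption, the same relative FO4 count on a process with a shorter FO4 delay would allow a proportionally higher clock, which is the crux of the disagreement here.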

Relative FO4 is architecture related.

Are you saying that a 19 stage x86 architecture has a higher FO4 delay than a 15 stage x86 architecture from the same manufacturer?
The architecture (x86) is the same, but the micro-architecture is not, and that's what counts.

There's also no clear relationship between the pipeline stages and the FO4.

So, the assumptions about the FO4 of Zen, comparing it to Bulldozer, have no foundations.
 

bjt2

Senior member
Sep 11, 2016
784
180
86
The architecture (x86) is the same, but the micro-architecture is not, and that's what counts.

There's also no clear relationship between the pipeline stages and the FO4.

So, the assumptions about the FO4 of Zen, comparing it to Bulldozer, have no foundations.

Anyway, since some operation latencies are higher, and Zen has no fewer pipeline stages than BD, I think that its FO4 is not higher.
 

CentroX

Senior member
Apr 3, 2016
351
152
116
There must be some kind of unknown advantage with Zen though. It is designed by Keller after all.
 