Question Speculation: RDNA2 + CDNA Architectures thread

uzzi38 · Apr 28, 2020

All die sizes are within 5mm^2. The poster here has been right on some things in the past afaik, and to his credit was the first to say 505mm^2 for Navi21, which other people have backed up. Even so, take the following with a pinch of salt.

Navi21 - 505mm^2

Navi22 - 340mm^2

Navi23 - 240mm^2

Source is the following post: https://www.ptt.cc/bbs/PC_Shopping/M.1588075782.A.C1E.html

maddie · Jun 15, 2020

Glo. said:
And on what did you base your assumption that a smaller die is cheaper to make, then, in the first place, hmmm?

The real world? Do I pass?

Glo. · Jun 15, 2020

maddie said:
The real world? Do I pass?

Then what warrants your indirect suggestion that it is so significantly cheaper to make that it is much more financially viable to cut down the cheaper die to fit in entry-level products?

Hmmm? How big will the difference in manufacturing cost be, to warrant putting it into a lower-margin product lineup?

maddie · Jun 15, 2020

Glo. said:
Then what warrants your indirect suggestion that it is so significantly cheaper to make that it is much more financially viable to cut down the cheaper die to fit in entry-level products?

Hmmm? How big will the difference in manufacturing cost be, to warrant putting it into a lower-margin product lineup?

Once it's cheaper to make, it will always be cheaper to place in a lower priced product. No exceptions. You seem to think that because it's higher performance, AMD will lose more (lower margin). This is completely incorrect. So what if they can get more for a full die? The truth is that the substitute die (Navi 10) is bigger, and it's not like they won't be producing N23. I could see your point if they were choosing between the two, but this is not the case.

Glo. · Jun 15, 2020

maddie said:
Once it's cheaper to make, it will always be cheaper to place in a lower priced product. No exceptions. You seem to think that because it's higher performance, AMD will lose more (lower margin). This is completely incorrect. So what if they can get more for a full die? The truth is that the substitute die (Navi 10) is bigger, and it's not like they won't be producing N23. I could see your point if they were choosing between the two, but this is not the case.

Let me quote your post:

maddie said:
The real world? Do I pass?

And ask you this: in the real world, how much cheaper to make is a die that has 240 mm2 than a die that has 252 mm2, when the smaller one has 40% higher performance and can have, from a business perspective in the REAL WORLD, a higher price margin?

What in REAL WORLD, would you do, knowing you have competitive advantage over your direct competitor?

Would you sell the old design, which has already paid for itself, at lower margin and go for volume, and use the better-performing die to increase margins, or would you let those margins go down the toilet and cut it down to fit a particular form factor?

maddie · Jun 15, 2020

Glo. said:
Let me quote your post:

And ask you this: in the real world, how much cheaper to make is a die that has 240 mm2 than a die that has 252 mm2, when the smaller one has 40% higher performance and can have, from a business perspective in the REAL WORLD, a higher price margin?

What in REAL WORLD, would you do, knowing you have competitive advantage over your direct competitor?

Would you sell the old design, which has already paid for itself, at lower margin and go for volume, and use the better-performing die to increase margins, or would you let those margins go down the toilet and cut it down to fit a particular form factor?

Development for both dies has been financed.
Newer smaller die is cheaper to fab.
Always cheaper to use newer lower cost die EVEN for a lower end product.

I really don't see how you can say producing a more expensive product saves money.

Maybe you're thinking that a sale to a lower end product is a lost full die sale. This will only be true if AMD cannot produce enough volume to satisfy the full or close to full N23 market, in which case they should simply stop producing N10 and transfer all production to N23. In all cases, production of N10 should stop to get the highest total margin as both products are competing for the same production lines.

Veradun · Jun 15, 2020

Glo. said:
Even if that 240 mm2 die has higher performance than that 250 mm2 die?

Is it really that uncharacteristic? Or illogical?

They will probably refresh Navi14 and release a fully enabled SKU

maddie · Jun 15, 2020

Veradun said:
They will probably refresh Navi14 and release a fully enabled SKU

This makes a lot more sense for the low end.

Glo. · Jun 15, 2020

Veradun said:
They will probably refresh Navi14 and release a fully enabled SKU

If Full Navi 14 die, with faster memory can compete with Lowest end Nvidia next Gen GPUs, that is possible.

If not, then what else is there?

RetroZombie · Jun 15, 2020

DisEnchantment said:
I prepared a sample workflow using older design flow, but still generically OK.

Interesting chart.
With it we can see that many fabless chip makers (like nvidia, apple, amd, QC, ...) actually have a near-zero disadvantage from being fabless compared to ones like intel.

We were always told that intel, with control over everything, had a major advantage, but we can clearly see that it's not that important; they just need to all work together to achieve the 'result'.
Maybe in the best-case scenario, time to market for product releases would be shorter (for intel) vs others.

beginner99 · Jun 16, 2020

Glo. said:
or would you let those margins go to the toilet and cut it down, to fit in particular form factor?

Of course you wouldn't cut down N23 dies that you could sell as full dies. The point of not using N10 anymore is simple. If you still produce N10 it needs more wafers (and hence costs more) than if you moved said production to N23. These chips use the exact same factory and hence are in competition with each other. If you use 1000 wafers to produce N10 and N23 dies, you end up with fewer usable chips than if you use 1000 wafers to make only N23. More chips means more sales and more income. On top of that you are more flexible because you can assign a fully functional N23 die to either card/SKU, the fully enabled or the cut-down one, and simply vary that depending on demand without needing to change production at all.

EDIT: In fact, if you underestimated demand for full N23, you can easily adjust and actually make a lot more money than in your split scheme. Another option is to use all the additional dies you get from the saved die space for the full product, meaning you get more fully enabled cards than in your split scheme and hence more money.
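
To put rough numbers on the wafer argument above, here is a minimal sketch using a common first-order dies-per-wafer estimate and a Poisson yield model. The die areas (240 mm2 for N23, 252 mm2 for N10) are the figures quoted in this thread; the 300 mm wafer size is standard, but the defect density is a made-up placeholder, and the function names are purely illustrative.

```python
import math

def gross_dies_per_wafer(die_area_mm2, wafer_diameter_mm=300.0):
    """First-order estimate: wafer area / die area, minus an edge-loss term."""
    radius = wafer_diameter_mm / 2.0
    estimate = (math.pi * radius * radius / die_area_mm2
                - math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2))
    return int(estimate)

def good_dies_per_wafer(die_area_mm2, defects_per_cm2=0.1):
    """Poisson yield model. defects_per_cm2 is a hypothetical placeholder."""
    die_area_cm2 = die_area_mm2 / 100.0
    yield_fraction = math.exp(-die_area_cm2 * defects_per_cm2)
    return gross_dies_per_wafer(die_area_mm2) * yield_fraction

n23_gross = gross_dies_per_wafer(240.0)  # rumored N23 area from this thread
n10_gross = gross_dies_per_wafer(252.0)  # N10 area as quoted in this thread
n23_good = good_dies_per_wafer(240.0)
n10_good = good_dies_per_wafer(252.0)

print(f"gross dies per wafer: N23 {n23_gross}, N10 {n10_gross}")
print(f"good dies per wafer:  N23 {n23_good:.0f}, N10 {n10_good:.0f}")
print(f"N23 advantage: {100.0 * (n23_good / n10_good - 1.0):.1f}%")
```

Under these assumptions the smaller die comes out a few percent ahead per wafer, which is the size of the effect both sides are arguing over. It ignores wafer price differences, binning, and packaging.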

GodisanAtheist · Jun 16, 2020

Veradun said:
They will probably refresh Navi14 and release a fully enabled SKU

-The low end rarely gets much love, but it would be great to see AMD put some RDNA2 efficiency secret sauce into the 5500xt to give it some much needed oomph to compete with the 1650 successor.

Konan · Jun 16, 2020

MLiD’s latest received leak - speculates reasons for fakeness Or not and why in places...

my take = fake

Krteq · Jun 16, 2020

Oh Jeeeezzz

Why do people still tend to trust any nonsense? ...this time about NAVI and Ampere

DisEnchantment · Jun 16, 2020

Ajay said:
Where'd you get that chart? I've been looking for something like that for a while.

It is an old slide from the mid-2000s from Agilent.
Back then I had just graduated from uni and was doing an internship at Philips Semi when it was still around; they were developing 802.11a chips and I was playing with Xilinx FPGAs.
The backend and frontend design are quite different these days and each company has a different flow, but at a high level it is pretty much the same.

AtenRa · Jun 16, 2020

Konan said:
MLiD’s latest received leak - speculates reasons for fakeness Or not and why in places...

my take = fake

Yea no way they have PDFs ready today for 30th September NDA, more than three months away.

Glo. · Jun 16, 2020

beginner99 said:
Of course you wouldn't cut down N23 dies that you could sell as full dies. The point of not using N10 anymore is simple. If you still produce N10 it needs more wafers (and hence costs more) than if you moved said production to N23. These chips use the exact same factory and hence are in competition with each other. If you use 1000 wafers to produce N10 and N23 dies, you end up with fewer usable chips than if you use 1000 wafers to make only N23. More chips means more sales and more income. On top of that you are more flexible because you can assign a fully functional N23 die to either card/SKU, the fully enabled or the cut-down one, and simply vary that depending on demand without needing to change production at all.

EDIT: In fact, if you underestimated demand for full N23, you can easily adjust and actually make a lot more money than in your split scheme. Another option is to use all the additional dies you get from the saved die space for the full product, meaning you get more fully enabled cards than in your split scheme and hence more money.

But you still will be manufacturing N10 dies, no matter what. Apple will use them, OEMs will use them. You do not retire one die after one year of manufacturing.

Glo. · Jun 16, 2020

If this info from MLID is correct, it suggests that the full Navi 21 die has more than 80 CUs and a GDDR6 bus wider than 448 bits.

96 CUs with a 512-bit GDDR6 bus? That would be way beyond anyone's expectations.

moinmoin · Jun 16, 2020

AMD Announces Radeon Pro 5600M Navi GPU with HBM2 - Inside Apple's MacBook Pro 16"

www.anandtech.com

So Apple gets a currently exclusive Radeon Pro 5600M with HBM2. How does that fit with all the rumors and leaks?

beginner99 · Jun 16, 2020

Glo. said:
But you still will be manufacturing N10 dies, no matter what. Apple will use them, OEMs will use them. You do not retire one die after one year of manufacturing.

Fair enough. But my logic still holds true. If you don't offer that N10 desktop SKU rebrand, then you need far fewer N10 wafers, which you can use for N23 instead.

EDIT: And another advantage of making more N23 dies is that AMD could adjust the binning eg increase the clock a bit for the fully enabled product which will use the best dies only. In fact after all this a N10 rebrand makes very little sense IMHO.

Qwertilot · Jun 16, 2020

Strangely specific argument this. If they've genuinely upped perf/watt by 50% then that's a full generational improvement.

In any world where you've got the resources to do it you do a top to bottom refresh as soon as sensible (a few months.).

Krteq · Jun 16, 2020

moinmoin said:
AMD Announces Radeon Pro 5600M Navi GPU with HBM2 - Inside Apple's MacBook Pro 16"

www.anandtech.com

So Apple gets a currently exclusive Radeon Pro 5600M with HBM2. How does that fit with all the rumors and leaks?

Nowise. That's a custom GPU designed exclusively for Apple. This has nothing to do with standard customer dGPU segment

Veradun · Jun 16, 2020

Glo. said:
If this info from MLID is correct, it suggests that the full Navi 21 die has more than 80 CUs and a GDDR6 bus wider than 448 bits.

96 CUs with a 512-bit GDDR6 bus? That would be way beyond anyone's expectations.

My expectation based on the 500mm2 rumor was 100CU and 384b.

They may move mm2s around for more PHYs and less CUs or less CUs and more frequency tho.

Ajay · Jun 16, 2020

Veradun said:
My expectation based on the 500mm2 rumor was 100CU and 384b.

They may move mm2s around for more PHYs and less CUs or less CUs and more frequency tho.

I would think a larger % of the die will be used for cache - just a guess though.

Ajay · Jun 16, 2020

DisEnchantment said:
It is an old slide from the mid-2000s from Agilent.
Back then I had just graduated from uni and was doing an internship at Philips Semi when it was still around; they were developing 802.11a chips and I was playing with Xilinx FPGAs.
The backend and frontend design are quite different these days and each company has a different flow, but at a high level it is pretty much the same.

Darn, did look familiar given what I already know. Well, guess I'll keep hoping I find a more modern flow chart. Thx.

DisEnchantment · Jun 18, 2020

Some new patents specifically for RT from AMD

20200193681 MECHANISM FOR SUPPORTING DISCARD FUNCTIONALITY IN A RAY TRACING CONTEXT
20200193682 MERGED DATA PATH FOR TRIANGLE AND BOX INTERSECTION TEST IN RAY TRACING
20200193683 ROBUST RAY-TRIANGLE INTERSECTION
20200193684 EFFICIENT DATA PATH FOR RAY TRIANGLE INTERSECTION
20200193685 WATER TIGHT RAY TRIANGLE INTERSECTION WITHOUT RESORTING TO DOUBLE PRECISION

TL;DR:
AMD's Ray Intersection Unit, containing HW for accelerating ray-box and ray-triangle intersection tests, is present in the CU alongside the SIMDs and not within the Texture Processor.
The previously circulated assumption that the Ray Intersection Unit sits within the TMU is probably incorrect.
The Engine can perform 4 ray-box tests or 1 ray-triangle test per cycle.
Therefore the publicized numbers of 380 Billion ray intersection tests per second for XSX is actually for ray-box intersection.
Testing for ray hit involves performing ray-box tests and then finally ray-triangle tests.

In order to minimize the silicon footprint, the Ray Intersection Unit has a mux to switch certain blocks in the pipeline to handle either ray-box or ray-triangle intersection tests.
Result, smaller silicon footprint.

Awesome part,
Performing the intersection test of a ray with a triangle by transforming into a coordinate system where the z axis is the direction of the ray (the ray's viewspace) and working with barycentric coordinates there.
Testing whether the ray intersects the triangle is then simply checking whether the ray's x,y falls within the triangle.
And this whole operation is achieved without using division(!), just a shear transformation/matrix multiplication.
An additional patent then builds on this by aiming for reliable results without fp64 (which might otherwise have been needed due to precision loss in the coordinate transformation and in math on very small deltas/values).
Result, faster operation with smaller silicon footprint.
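
A minimal software sketch of the idea, based on my reading of the abstracts and on the well-known watertight ray-triangle test rather than on AMD's actual data path. The usual shear divides by the ray direction's dominant component; here every coordinate is scaled by that component instead, which leaves the signs of the edge functions unchanged. The function name and the (numerator, denominator) return convention are illustrative.

```python
def ray_triangle_hit(origin, direction, v0, v1, v2):
    """Return (t_num, t_den) with hit distance t = t_num / t_den, or None on a miss.

    No division is performed here; the caller can divide once for accepted hits.
    """
    # Use the dominant direction axis as "z" so its component is non-zero.
    kz = max(range(3), key=lambda i: abs(direction[i]))
    kx, ky = (kz + 1) % 3, (kz + 2) % 3
    dx, dy, dz = direction[kx], direction[ky], direction[kz]
    if dz == 0.0:
        return None  # zero-length direction

    def to_ray_space(vertex):
        # Translate so the ray origin is the coordinate origin.
        px = vertex[kx] - origin[kx]
        py = vertex[ky] - origin[ky]
        pz = vertex[kz] - origin[kz]
        # Shear x and y, scaled by dz instead of divided by dz.
        return px * dz - dx * pz, py * dz - dy * pz, pz

    ax, ay, az = to_ray_space(v0)
    bx, by, bz = to_ray_space(v1)
    cx, cy, cz = to_ray_space(v2)

    # Scaled barycentric coordinates (2D edge functions against the origin).
    u = cx * by - cy * bx
    v = ax * cy - ay * cx
    w = bx * ay - by * ax

    # Mixed signs mean the ray passes outside the triangle.
    if (u < 0 or v < 0 or w < 0) and (u > 0 or v > 0 or w > 0):
        return None
    det = u + v + w
    if det == 0:
        return None  # ray is parallel to the triangle or triangle is degenerate

    t_num = u * az + v * bz + w * cz
    t_den = det * dz
    if t_num * t_den <= 0:
        return None  # intersection is at or behind the ray origin
    return t_num, t_den

triangle = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
hit = ray_triangle_hit((0.25, 0.25, -1.0), (0.0, 0.0, 2.0), *triangle)
print(hit, hit[0] / hit[1])
```

This sketch uses ordinary floats and does nothing about rounding modes, so it illustrates the division-free transform only, not the watertightness guarantees.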

About traditional backwards ray tracing,
In backwards ray tracing, the ray generation shader 302 generates a ray having an origin at the point of the camera. The point at which the ray intersects a plane defined to correspond to the screen defines the pixel on the screen whose color the ray is being used to determine. If the ray hits an object, that pixel is colored based on the closest hit shader 310. If the ray does not hit an object, the pixel is colored based on the miss shader 312. Multiple rays may be cast per pixel, with the final color of the pixel being determined by some combination of the colors determined for each of the rays of the pixel.

20200193681 MECHANISM FOR SUPPORTING DISCARD FUNCTIONALITY IN A RAY TRACING CONTEXT
Abstract
Described herein is a technique for performing ray tracing. According to this technique, instead of executing intersection and/or any hit shaders during traversal of an acceleration structure to determine the closest hit for a ray, an acceleration structure is fully traversed in an invocation of a shader program, and the closest intersection with a triangle is recorded in a data structure associated with the material of the triangle. Later, a scheduler launches waves by grouping together multiple data items associated with the same material. The rays processed by that wave are processed with a continuation ray, rather than the full original ray. A continuation ray starts from the previous point of intersection and extends in the direction of the original ray. These steps help counter divergence that would occur if a single shader program that inlined the intersection and any hit shaders were executed.
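
The grouping step in this abstract can be pictured with a small sketch: record each ray's closest hit under its material, then form waves from items that share a material. The wave size, the tuple layout, and the function names are all hypothetical; this only illustrates the bucketing, not the actual scheduler.

```python
from collections import defaultdict

def bucket_by_material(closest_hits):
    """closest_hits: iterable of (ray_id, material_id, hit_point, direction)."""
    buckets = defaultdict(list)
    for ray_id, material_id, hit_point, direction in closest_hits:
        # Continuation ray: starts at the hit point, keeps the original direction.
        buckets[material_id].append((ray_id, hit_point, direction))
    return buckets

def schedule_waves(buckets, wave_size=4):
    """Form waves so that every item in a wave shares one material."""
    waves = []
    for material_id in sorted(buckets):
        items = buckets[material_id]
        for start in range(0, len(items), wave_size):
            waves.append((material_id, items[start:start + wave_size]))
    return waves

forward = (0.0, 0.0, 1.0)
hits = [
    (0, "glass", (0.0, 0.0, 1.0), forward),
    (1, "metal", (1.0, 0.0, 2.0), forward),
    (2, "glass", (2.0, 0.0, 1.5), forward),
    (3, "metal", (3.0, 0.0, 2.5), forward),
    (4, "glass", (4.0, 0.0, 1.2), forward),
]
waves = schedule_waves(bucket_by_material(hits), wave_size=2)
for material_id, items in waves:
    print(material_id, [ray_id for ray_id, _, _ in items])
```

Because each wave runs a single material's shader, lanes in the wave follow the same code path, which is the divergence benefit the abstract describes.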

20200193682 MERGED DATA PATH FOR TRIANGLE AND BOX INTERSECTION TEST IN RAY TRACING
Abstract
Described herein is a merged data path unit that has elements that are configurable to switch between different instruction types. The merged data path unit is a pipelined unit that has multiple stages. Between different stages lie multiplexor layers that are configurable to route data from functional blocks of a prior stage to a subsequent stage. The manner in which the multiplexor layers are configured for a particular stage is based on the instruction type executed at that stage. In some implementations, the functional blocks in different stages are also configurable by the control unit to change the operations performed. Further, in some implementations, the control unit has sideband storage that stores data that "skips stages." An example of a merged data path used for performing a ray-triangle intersection test and a ray-box intersection test is also described herein.

20200193683 ROBUST RAY-TRIANGLE INTERSECTION
Abstract
A technique for classifying a ray tracing intersection with a triangle edge or vertex avoids either rendering holes or multiple hits of the same ray for different triangles. The technique employs a tie-breaking scheme in which certain types of edges are classified as hits and certain types of edges are classified as misses. The test is performed in a coordinate space that comprises a projection into the viewspace of the ray, and thus where the ray direction has a non-zero magnitude in one axis (e.g., z) but a zero magnitude in the two other axes. In this coordinate space, edges are classified as one of top, bottom, left, and right, and an intersection on an edge counts as a hit if the intersection hits a top or left edge, but a miss if the intersection hits a bottom or right edge. Vertices are processed in a related manner.
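
The tie-breaking scheme reads like the top-left rule that rasterizers use, so here is a sketch of that conventional rule in 2D. It assumes counter-clockwise triangles with the y axis pointing up, and it is a guess at the flavor of the technique, not the patent's exact edge classification; vertices get no special handling here.

```python
def edge_function(a, b, p):
    """Positive when p lies to the left of the directed edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def is_top_or_left(a, b):
    """For counter-clockwise triangles with y up: left edges run downward,
    top edges are horizontal and run toward negative x."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    return dy < 0 or (dy == 0 and dx < 0)

def covers(triangle, p):
    """Inside test where a point exactly on an edge counts only for top/left edges."""
    v0, v1, v2 = triangle
    for a, b in ((v0, v1), (v1, v2), (v2, v0)):
        e = edge_function(a, b, p)
        if e < 0:
            return False
        if e == 0 and not is_top_or_left(a, b):
            return False
    return True

# Unit square split along its diagonal into two counter-clockwise triangles.
lower = ((0.0, 0.0), (1.0, 0.0), (1.0, 1.0))
upper = ((0.0, 0.0), (1.0, 1.0), (0.0, 1.0))
on_shared_edge = (0.5, 0.5)
print(covers(lower, on_shared_edge), covers(upper, on_shared_edge))
```

A point on the shared diagonal is claimed by exactly one of the two triangles, so there is neither a hole nor a double hit along the seam.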

20200193684 EFFICIENT DATA PATH FOR RAY TRIANGLE INTERSECTION
Abstract
Described herein is a technique for performing ray-triangle intersection without a floating point division unit. A division unit would be useful for a straightforward implementation of a certain type of ray-triangle intersection test that is useful in ray tracing operations. This certain type of ray-triangle intersection test includes a step that transforms the coordinate system into the viewspace of the ray, thereby reducing the problem of intersection to one of 2D triangle rasterization. However, a straightforward implementation of this transformation requires floating point division, as the transformation utilizes a shear operation to set the coordinate system such that the magnitudes of the ray direction on two of the axes are zero. Instead of using the most straightforward implementation of this transform, the technique described herein scales the entire coordinate system by the magnitude of the ray direction in the axis that is the denominator of the shear ratio, removing division.

20200193685 WATER TIGHT RAY TRIANGLE INTERSECTION WITHOUT RESORTING TO DOUBLE PRECISION
Abstract
Described herein is a technique for performing ray-triangle intersection test in a manner that produces watertight results. The technique involves translating the coordinates of the triangle such that the origin is at the origin of the ray. The technique involves projecting the coordinate system into the viewspace of the ray. The technique then involves calculating barycentric coordinates and interpolating the barycentric coordinates to get a time of intersect. The signs of the barycentric coordinates indicate whether a hit occurs. The above calculations are performed with a non-directed floating point rounding mode to provide watertightness. A non-directed rounding mode is one in which the mantissa of a rounded number is rounded in a manner that is not dependent on the sign of the number.
