Question Speculation: RDNA2 + CDNA Architectures thread

Page 11

uzzi38

Platinum Member
Oct 16, 2019
2,690
6,345
146
All die sizes are accurate to within 5mm^2. The poster here has been right on some things in the past afaik, and to his credit was the first to say 505mm^2 for Navi21, which other people have backed up. Even so, take the following with a pinch of salt.

Navi21 - 505mm^2

Navi22 - 340mm^2

Navi23 - 240mm^2

Source is the following post: https://www.ptt.cc/bbs/PC_Shopping/M.1588075782.A.C1E.html
 

Glo.

Diamond Member
Apr 25, 2015
5,753
4,659
136
The real world? Do I pass?
Then what warrants your indirect suggestion that it is so significantly cheaper to make that it is much more financially viable to cut down the cheaper die to fit in entry level products?

Hmmm? How big a difference in manufacturing cost would there have to be to warrant putting it into a lower price margin product lineup?
 

maddie

Diamond Member
Jul 18, 2010
4,786
4,771
136
Then what warrants your indirect suggestion that it is so significantly cheaper to make that it is much more financially viable to cut down the cheaper die to fit in entry level products?

Hmmm? How big a difference in manufacturing cost would there have to be to warrant putting it into a lower price margin product lineup?
Once it's cheaper to make, it will always be cheaper to place in a lower priced product. No exceptions. You seem to think that because it's higher performance, AMD will lose more (lower margin). This is completely incorrect. So what if they can get more for a full die? The truth is that the substitute die (Navi 10) is bigger, and it's not like they won't be producing N23. I could see your point if they were choosing between the two, but this is not the case.
 

Glo.

Diamond Member
Apr 25, 2015
5,753
4,659
136
Once it's cheaper to make, it will always be cheaper to place in a lower priced product. No exceptions. You seem to think that because it's higher performance, AMD will lose more (lower margin). This is completely incorrect. So what if they can get more for a full die? The truth is that the substitute die (Navi 10) is bigger, and it's not like they won't be producing N23. I could see your point if they were choosing between the two, but this is not the case.
Let me quote your post:
The real world? Do I pass?
And ask you this: in the real world, how much cheaper to make is a 240 mm2 die than a 252 mm2 die, when it has 40% higher performance and can, from a business perspective in the REAL WORLD, carry a higher price margin?

What in the REAL WORLD would you do, knowing you have a competitive advantage over your direct competitor?

Would you sell the old design, which has already paid for itself, at a lower margin and go for volume, and use the better performing die to increase margins, or would you let those margins go down the toilet and cut it down to fit a particular form factor?
 

maddie

Diamond Member
Jul 18, 2010
4,786
4,771
136
Let me quote your post:

And ask you this: in the real world, how much cheaper to make is a 240 mm2 die than a 252 mm2 die, when it has 40% higher performance and can, from a business perspective in the REAL WORLD, carry a higher price margin?

What in the REAL WORLD would you do, knowing you have a competitive advantage over your direct competitor?

Would you sell the old design, which has already paid for itself, at a lower margin and go for volume, and use the better performing die to increase margins, or would you let those margins go down the toilet and cut it down to fit a particular form factor?
Development for both dies has been financed.
Newer smaller die is cheaper to fab.
Always cheaper to use newer lower cost die EVEN for a lower end product.

I really don't see how you can say producing a more expensive product saves money.

Maybe you're thinking that a sale to a lower end product is a lost full die sale. This will only be true if AMD cannot produce enough volume to satisfy the full or close to full N23 market, in which case they should simply stop producing N10 and transfer all production to N23. In all cases, production of N10 should stop to get the highest total margin as both products are competing for the same production lines.
 

RetroZombie

Senior member
Nov 5, 2019
464
386
96
I prepared a sample workflow using older design flow, but still generically OK.
Interesting chart.
With it we can see that many fabless chip makers (like Nvidia, Apple, AMD, QC, ...) are actually at near zero disadvantage from being fabless compared to the likes of Intel.

We were always told that Intel, with control over everything, had a major advantage, but we can clearly see that it's not that important; the companies involved just need to work together to achieve the 'result'.
At best, time to market for product releases would be shorter for Intel than for the others.
 

beginner99

Diamond Member
Jun 2, 2009
5,219
1,591
136
or would you let those margins go down the toilet and cut it down to fit a particular form factor?

Of course you wouldn't cut down N23 dies that you could sell as full dies. The point of not using N10 anymore is simple. If you still produce N10, it needs more wafers (and hence costs more) than if you moved that production to N23. These chips use the exact same factory and hence are in competition with each other. If you use 1000 wafers to produce both N10 and N23 dies, you end up with fewer usable chips than if you use 1000 wafers to make only N23. More chips means more sales and more income. On top of that you are more flexible, because you can assign a fully functional N23 die to either card/SKU, fully enabled or cut down, and simply vary that depending on demand without needing to change production at all.

EDIT: In fact, if you underestimated demand for full N23, you can easily adjust and actually make a lot more money than in your split scheme. Another option is to use all the additional dies you get from the die-area savings for the full product, meaning you get more fully enabled cards than in your split scheme and hence more money.
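To put rough numbers on the wafer argument, here is a minimal sketch of a gross dies-per-wafer estimate. The 240 mm2 (N23, rumored) and 252 mm2 (N10) areas are the figures quoted in this thread; the 300 mm wafer size and the area-minus-edge-loss approximation are assumptions brought in for illustration, and yield, scribe lines and real reticle layout are ignored.

```python
import math

WAFER_DIAMETER_MM = 300.0  # assumption: standard 300 mm wafer, not stated in the thread


def dies_per_wafer(die_area_mm2, wafer_diameter_mm=WAFER_DIAMETER_MM):
    """Gross dies per wafer using the common area-minus-edge-loss approximation.

    Ignores yield, scribe lines and die aspect ratio, so treat the result
    as a ballpark figure only.
    """
    radius = wafer_diameter_mm / 2.0
    gross = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return int(gross - edge_loss)


n23_dies = dies_per_wafer(240.0)  # rumored N23 area from this thread
n10_dies = dies_per_wafer(252.0)  # N10 area as quoted in this thread
extra_per_1000_wafers = (n23_dies - n10_dies) * 1000

print(n23_dies, n10_dies, extra_per_1000_wafers)
```

Under these assumptions the smaller die gives roughly 5% more candidates per wafer, which is the whole of the manufacturing-side advantage being argued about here.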
 
Reactions: Tlh97

DisEnchantment

Golden Member
Mar 3, 2017
1,659
6,100
136
Where'd you get that chart? I've been looking for something like that for a while.
It is an old slide from the mid-2000s from Agilent.
Back then I had just graduated from uni and was doing an internship at Philips Semi when it was still around; they were developing 802.11a chips and I was playing with Xilinx FPGAs.
The backend and frontend design are quite different these days and each company has its own flow, but at a high level it is pretty much the same.
 
Reactions: Tlh97 and coercitiv

Glo.

Diamond Member
Apr 25, 2015
5,753
4,659
136
Of course you wouldn't cut down N23 dies that you could sell as full dies. The point of not using N10 anymore is simple. If you still produce N10, it needs more wafers (and hence costs more) than if you moved that production to N23. These chips use the exact same factory and hence are in competition with each other. If you use 1000 wafers to produce both N10 and N23 dies, you end up with fewer usable chips than if you use 1000 wafers to make only N23. More chips means more sales and more income. On top of that you are more flexible, because you can assign a fully functional N23 die to either card/SKU, fully enabled or cut down, and simply vary that depending on demand without needing to change production at all.

EDIT: In fact, if you underestimated demand for full N23, you can easily adjust and actually make a lot more money than in your split scheme. Another option is to use all the additional dies you get from the die-area savings for the full product, meaning you get more fully enabled cards than in your split scheme and hence more money.
But you still will be manufacturing N10 dies, no matter what. Apple will use them, OEMs will use them. You do not retire one die after one year of manufacturing.
 

Glo.

Diamond Member
Apr 25, 2015
5,753
4,659
136
If this info from MLID is correct, it suggests that the full Navi 21 die has more than 80 CUs and a GDDR6 bus wider than 448 bits.

96 CUs with a 512-bit GDDR6 bus? That would be way beyond anyone's expectations.
 

beginner99

Diamond Member
Jun 2, 2009
5,219
1,591
136
But you still will be manufacturing N10 dies, no matter what. Apple will use them, OEMs will use them. You do not retire one die after one year of manufacturing.
Fair enough. But my logic still holds true. If you don't offer that N10 desktop SKU rebrand, then you need a lot fewer N10 wafers, which you can use for N23 instead.

EDIT: Another advantage of making more N23 dies is that AMD could adjust the binning, e.g. increase the clock a bit for the fully enabled product, which would use only the best dies. In fact, after all this, an N10 rebrand makes very little sense IMHO.
 
Reactions: Tlh97

Qwertilot

Golden Member
Nov 28, 2013
1,604
257
126
Strangely specific argument this. If they've genuinely upped perf/watt by 50% then that's a full generational improvement.

In any world where you've got the resources to do it, you do a top-to-bottom refresh as soon as sensible (a few months).
 
Reactions: Tlh97 and Lodix

Veradun

Senior member
Jul 29, 2016
564
780
136
If this info from MLID is correct, it suggests that the full Navi 21 die has more than 80 CUs and a GDDR6 bus wider than 448 bits.

96 CUs with a 512-bit GDDR6 bus? That would be way beyond anyone's expectations.

My expectation based on the 500mm2 rumor was 100CU and 384b.

They may move mm2s around for more PHYs and fewer CUs, or fewer CUs and more frequency, tho.
 

Ajay

Lifer
Jan 8, 2001
15,959
8,065
136
It is an old slide from the mid-2000s from Agilent.
Back then I had just graduated from uni and was doing an internship at Philips Semi when it was still around; they were developing 802.11a chips and I was playing with Xilinx FPGAs.
The backend and frontend design are quite different these days and each company has its own flow, but at a high level it is pretty much the same.
Darn, it did look familiar given what I already know. Well, guess I'll keep hoping to find a more modern flow chart. Thx.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,659
6,100
136
Some new patents specifically for RT from AMD

20200193681 MECHANISM FOR SUPPORTING DISCARD FUNCTIONALITY IN A RAY TRACING CONTEXT
20200193682 MERGED DATA PATH FOR TRIANGLE AND BOX INTERSECTION TEST IN RAY TRACING
20200193683 ROBUST RAY-TRIANGLE INTERSECTION
20200193684 EFFICIENT DATA PATH FOR RAY TRIANGLE INTERSECTION
20200193685 WATER TIGHT RAY TRIANGLE INTERSECTION WITHOUT RESORTING TO DOUBLE PRECISION



TL;DR;
AMD's Ray Intersection Unit, containing HW for accelerating ray-box and ray-triangle intersection tests, is present in the CU alongside the SIMDs and not within the Texture Processor.
The previously circulated assumption that the Ray Intersection Unit sits within the TMU is probably incorrect.
The Engine can perform 4 ray-box tests or 1 ray-triangle test per cycle.
Therefore the publicized figure of 380 billion ray intersection tests per second for XSX is actually for ray-box intersection.
Testing for ray hit involves performing ray-box tests and then finally ray-triangle tests.

In order to minimize the silicon footprint, the Ray Intersection Unit has a mux to switch certain blocks in the pipeline to handle either ray-box or ray-triangle intersection tests.
Result, smaller silicon footprint.

Awesome part,
The intersection test of a ray with a triangle is performed by transforming into a coordinate system where the z axis is the direction of the ray (the viewspace of the ray), with barycentric coordinates then computed in that space.
Testing if the ray intersects the triangle is then simply checking if the x,y of the ray is within the triangle.
And this whole operation is achieved without using division(!), just using shear transformation/matrix multiplication.
An additional patent then adds to this by attempting to achieve reliable results without fp64 (which might otherwise have been needed due to the coordinate transformation and precision loss during mathematical operations with very small deltas/values).
Result, faster operation with smaller silicon footprint.
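For anyone who wants to see the idea in code, below is a minimal software sketch of a ray-space intersection test where the shear is scaled by the ray's z component so that no division is needed for the hit/miss decision. This is a generic textbook-style version written for illustration, not AMD's hardware data path; the function name, the choice of the dominant axis as z, and the single final division to report the distance are all assumptions of the sketch.

```python
def ray_triangle_hit(origin, direction, a, b, c):
    """Return distance t along the ray (in units of `direction`) or None on a miss.

    Illustrative sketch only: works in the ray's viewspace and replaces the
    usual shear (p.x - (d.x / d.z) * p.z) with a version scaled by d.z.
    """
    # Use the dominant direction axis as "z" (assumption: numerically safest choice).
    kz = max(range(3), key=lambda i: abs(direction[i]))
    kx, ky = (kz + 1) % 3, (kz + 2) % 3
    dx, dy, dz = direction[kx], direction[ky], direction[kz]
    if dz == 0:
        return None  # degenerate ray with zero direction

    def to_ray_space(p):
        # Translate so the ray origin is at the coordinate origin.
        px, py, pz = p[kx] - origin[kx], p[ky] - origin[ky], p[kz] - origin[kz]
        # Shear scaled by dz: multiplications only, no division.
        return px * dz - dx * pz, py * dz - dy * pz, pz

    ax, ay, az = to_ray_space(a)
    bx, by, bz = to_ray_space(b)
    cx, cy, cz = to_ray_space(c)

    # 2D edge functions; in ray space the ray sits at (0, 0).
    u = cx * by - cy * bx
    v = ax * cy - ay * cx
    w = bx * ay - by * ax

    # Mixed signs mean (0, 0) lies outside the projected triangle.
    if (u < 0 or v < 0 or w < 0) and (u > 0 or v > 0 or w > 0):
        return None
    det = u + v + w
    if det == 0:
        return None  # triangle is edge-on to the ray

    # t = t_num / t_den; the "in front of the origin" check needs only sign tests.
    t_num = u * az + v * bz + w * cz
    t_den = det * dz
    if t_num == 0 or (t_num < 0) != (t_den < 0):
        return None
    return t_num / t_den  # one division, only to report the distance


tri = ((-1.0, -1.0, 5.0), (1.0, -1.0, 5.0), (0.0, 1.0, 5.0))
print(ray_triangle_hit((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), *tri))
print(ray_triangle_hit((5.0, 5.0, 0.0), (0.0, 0.0, 1.0), *tri))
```

Note that the edge functions only get scaled by dz squared, so their signs (and therefore the hit decision) are unchanged by dropping the division. Exact ties on edges and vertices are treated as hits here; the tie-breaking and rounding rules from the other patents are not modeled.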

About traditional backwards ray tracing,
In backwards ray tracing, the ray generation shader 302 generates a ray having an origin at the point of the camera. The point at which the ray intersects a plane defined to correspond to the screen defines the pixel on the screen whose color the ray is being used to determine. If the ray hits an object, that pixel is colored based on the closest hit shader 310. If the ray does not hit an object, the pixel is colored based on the miss shader 312. Multiple rays may be cast per pixel, with the final color of the pixel being determined by some combination of the colors determined for each of the rays of the pixel.


20200193681 MECHANISM FOR SUPPORTING DISCARD FUNCTIONALITY IN A RAY TRACING CONTEXT
Abstract
Described herein is a technique for performing ray tracing. According to this technique, instead of executing intersection and/or any hit shaders during traversal of an acceleration structure to determine the closest hit for a ray, an acceleration structure is fully traversed in an invocation of a shader program, and the closest intersection with a triangle is recorded in a data structure associated with the material of the triangle. Later, a scheduler launches waves by grouping together multiple data items associated with the same material. The rays processed by that wave are processed with a continuation ray, rather than the full original ray. A continuation ray starts from the previous point of intersection and extends in the direction of the original ray. These steps help counter divergence that would occur if a single shader program that inlined the intersection and any hit shaders were executed.
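The regrouping step in this abstract can be pictured with a minimal sketch: record each ray's closest hit under its material, then launch waves per material. Everything named here (the function, the `(ray_id, material_id)` tuples, the wave size of 32) is hypothetical and only meant to show why same-material waves avoid divergence; it is not how the scheduler is actually implemented.

```python
from collections import defaultdict


def group_hits_by_material(closest_hits, wave_size=32):
    """Bucket (ray_id, material_id) records by material and split into waves.

    Hypothetical illustration: each returned wave contains rays that all
    need the same material shader, so lanes in a wave do not diverge.
    """
    buckets = defaultdict(list)
    for ray_id, material_id in closest_hits:
        buckets[material_id].append(ray_id)

    waves = []
    for material_id, ray_ids in buckets.items():
        for start in range(0, len(ray_ids), wave_size):
            waves.append((material_id, ray_ids[start:start + wave_size]))
    return waves


hits = [(0, "metal"), (1, "glass"), (2, "metal"), (3, "metal")]
for material, rays in group_hits_by_material(hits, wave_size=2):
    print(material, rays)
```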

20200193682 MERGED DATA PATH FOR TRIANGLE AND BOX INTERSECTION TEST IN RAY TRACING
Abstract
Described herein is a merged data path unit that has elements that are configurable to switch between different instruction types. The merged data path unit is a pipelined unit that has multiple stages. Between different stages lie multiplexor layers that are configurable to route data from functional blocks of a prior stage to a subsequent stage. The manner in which the multiplexor layers are configured for a particular stage is based on the instruction type executed at that stage. In some implementations, the functional blocks in different stages are also configurable by the control unit to change the operations performed. Further, in some implementations, the control unit has sideband storage that stores data that "skips stages." An example of a merged data path used for performing a ray-triangle intersection test and a ray-box intersection test is also described herein.

20200193683 ROBUST RAY-TRIANGLE INTERSECTION
Abstract
A technique for classifying a ray tracing intersection with a triangle edge or vertex avoids either rendering holes or multiple hits of the same ray for different triangles. The technique employs a tie-breaking scheme in which certain types of edges are classified as hits and certain types of edges are classified as misses. The test is performed in a coordinate space that comprises a projection into the viewspace of the ray, and thus where the ray direction has a non-zero magnitude in one axis (e.g., z) but a zero magnitude in the two other axes. In this coordinate space, edges are classified as one of top, bottom, left, and right, and an intersection on an edge counts as a hit if the intersection hits a top or left edge, but a miss if the intersection hits a bottom or right edge. Vertices are processed in a related manner.

20200193684 EFFICIENT DATA PATH FOR RAY TRIANGLE INTERSECTION
Abstract
Described herein is a technique for performing ray-triangle intersection without a floating point division unit. A division unit would be useful for a straightforward implementation of a certain type of ray-triangle intersection test that is useful in ray tracing operations. This certain type of ray-triangle intersection test includes a step that transforms the coordinate system into the viewspace of the ray, thereby reducing the problem of intersection to one of 2D triangle rasterization. However, a straightforward implementation of this transformation requires floating point division, as the transformation utilizes a shear operation to set the coordinate system such that the magnitudes of the ray direction on two of the axes are zero. Instead of using the most straightforward implementation of this transform, the technique described herein scales the entire coordinate system by the magnitude of the ray direction in the axis that is the denominator of the shear ratio, removing division.

20200193685 WATER TIGHT RAY TRIANGLE INTERSECTION WITHOUT RESORTING TO DOUBLE PRECISION
Abstract
Described herein is a technique for performing ray-triangle intersection test in a manner that produces watertight results. The technique involves translating the coordinates of the triangle such that the origin is at the origin of the ray. The technique involves projecting the coordinate system into the viewspace of the ray. The technique then involves calculating barycentric coordinates and interpolating the barycentric coordinates to get a time of intersect. The signs of the barycentric coordinates indicate whether a hit occurs. The above calculations are performed with a non-directed floating point rounding mode to provide watertightness. A non-directed rounding mode is one in which the mantissa of a rounded number is rounded in a manner that is not dependent on the sign of the number.
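The definition of a non-directed rounding mode in the last sentence can be checked with a small sketch. It uses Python's `decimal` module as a stand-in for floating point hardware (an assumption made purely for illustration) and tests whether rounding -x gives the negation of rounding x, i.e. whether the rounded magnitude is independent of the sign.

```python
from decimal import (Decimal, ROUND_CEILING, ROUND_DOWN, ROUND_FLOOR,
                     ROUND_HALF_EVEN)

STEP = Decimal("0.1")  # round to one decimal digit, standing in for a short mantissa


def is_sign_independent(mode, samples):
    """True if round(-x) == -round(x) for every sample under the given mode."""
    return all(
        (-x).quantize(STEP, rounding=mode) == -(x.quantize(STEP, rounding=mode))
        for x in samples
    )


samples = [Decimal("1.23"), Decimal("1.25"), Decimal("1.27"), Decimal("0.05")]

for name, mode in [("toward zero", ROUND_DOWN),
                   ("nearest even", ROUND_HALF_EVEN),
                   ("toward -inf", ROUND_FLOOR),
                   ("toward +inf", ROUND_CEILING)]:
    print(name, is_sign_independent(mode, samples))
```

Round-toward-zero and round-to-nearest-even come out sign independent, while the two directed modes (toward minus or plus infinity) do not, which matches the distinction the abstract draws.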
 