Question Speculation: RDNA3 + CDNA2 Architectures Thread

uzzi38 · Jan 23, 2021

Man I have been dying to make this one for a while now.

First rumours for RDNA3 are here so new thread time!

Just going to start off with this one for now: kopite7kimi on Twitter: "@VideoCardz Ah, I mean a simple mcm design with 10240 cores is not enough. Because the lift from RDNA2 to RDNA3 is much bigger than from RDNA1 to RDNA2. We should expect many big improvements in GFX11. 🤔" / Twitter

Glo. · Jan 21, 2023

MrTeal said:
What would be the point of a 24CU IGP on an APU? That's a pretty massive GPU, just slightly smaller than a RX 7600M/S, but that's a 90W/75W TDP solution. It also has 32MB of IC and 256GB/s of bandwidth out to GDDR6. With dual channel LPDDR5x the whole APU would have 120GB/s bandwidth, with DDR5-5600 it would be less and shared with the CPU. Unless they completely change the memory system, that poor GPU would be completely memory staved.

The point of a large iGPU is to scale the usage of such product to larger number of use cases. Car infotainment, wearables, wearable VR, handhelds, push for efficiency, push for reduction of design costs, and manufacturing costs, push for AI expansion and use cases, and plenty more.

People really live in the past thinking that everything will be as always was, when we are on the brink of software/hardware/experience paradigm shift.

If you do not get it, already: scaling iGPUs larger is to increase TAM, and use cases to increase both volume, and profit margins. The goal(of AMD and Intel) is that APUs/SOCs are going to be 90% of all of computing. That is the reason why Nvidia tried to buy ARM, to have a competitive edge in a world where all they have are essentially non-APU projects.

And no, we are not talking about small iGPUs, integrated in CPUs or CPU packages. We are talking about big and powerful GPUs.

Glo. · Jan 21, 2023

In the discussion about Strix Point, and 24 CUs.

IF the rumors are true, and if Kepler is correct about RDNA3 architecture being fixed for Strix Point - it would mean that Dual Issue is working properly, and we should expect the fabled 256 ALUs/WGP.

So if SP has 24 CUs/12 WGPs, then it also has 3072 vALUs/1536 ALUs.

insertcarehere · Jan 21, 2023

Glo. said:
The point of a large iGPU is to scale the usage of such product to larger number of use cases. Car infotainment, wearables, wearable VR, handhelds, push for efficiency, push for reduction of design costs, and manufacturing costs, push for AI expansion and use cases, and plenty more.

Except for Car infotainment, none of the other use cases listed here require, or frankly want a large iGPU that requires more power than what those use cases can provide for the large iGPU to scale, 24CU RDNA3 is a waste running at 15w.

Glo. said:
People really live in the past thinking that everything will be as always was, when we are on the brink of software/hardware/experience paradigm shift.

We live in a world where cost per transistor has basically stagnated and wafer costs are skyrocketing with each new process, and therefore every mm^2 of silicon is precious. An APU with a large GPU component (24CU RDNA3 IGP would definitely qualify) being sold to the public will inevitably face at least some consumers that don't assign a large premium to the GPU part, effectively making that chunk of die space a waste. It just makes more sense to cut that extra silicon out of the APU and assign it to actual GPUs, where there is far more certainty that potential buyers would assign value to the GPU in question.

Glo. said:
If you do not get it, already: scaling iGPUs larger is to increase TAM, and use cases to increase both volume, and profit margins. The goal(of AMD and Intel) is that APUs/SOCs are going to be 90% of all of computing. That is the reason why Nvidia tried to buy ARM, to have a competitive edge in a world where all they have are essentially non-APU projects.

And no, we are not talking about small iGPUs, integrated in CPUs or CPU packages. We are talking about big and powerful GPUs.

As per above, in an era where Wafers costs are skyrocketing, spending extra silicon on attributes where it's uncertain to be valued by end consumers makes little sense.

To put it in a more concrete example:
- AMD can probably make a hell of an APU if they were willing to go big with ~280mm^2 die on 5nm. That's the sort of space where ~N33 performance in an iGPU would be very possible. The problem is that 280mm^2 is a lot of silicon, equivalent to either:
- 4 Zen 4 CCDs, which is not very far away from 2 7950Xs, selling for ~$600+ each.
- Navi 31 GCD, which is the core chip to the 7900XTX, a product that sells for $1k.
Now, can 8c Zen4 with an N33-class iGPU sell for the sort of premiums which can compete with these options for AMD internally?

Glo. · Jan 21, 2023

insertcarehere said:
Except for Car infotainment, none of the other use cases listed here require, or frankly want a large iGPU that requires more power than what those use cases can provide for the large iGPU to scale, 24CU RDNA3 is a waste running at 15w.

First: they don't need NOW. The use case for large iGPUs is coming soon-ish. Actually - next year.

Secondly. Tell all of that to Intel, who is doing EXACTLY the same thing as AMD. Why is Intel doing the exatly same thing as AMD on this front, hmmm?

insertcarehere said:
As per above, in an era where Wafers costs are skyrocketing, spending extra silicon on attributes where it's uncertain to be valued by end consumers makes little sense.

To put it in a more concrete example:
- AMD can probably make a hell of an APU if they were willing to go big with ~280mm^2 die on 5nm. That's the sort of space where ~N33 performance in an iGPU would be very possible. The problem is that 280mm^2 is a lot of silicon, equivalent to either:
- 4 Zen 4 CCDs, which is not very far away from 2 7950Xs, selling for ~$600+ each.
- Navi 31 GCD, which is the core chip to the 7900XTX, a product that sells for $1k.
Now, can 8c Zen4 with an N33-class iGPU sell for the sort of premiums which can compete with these options for AMD internally?

And how much more feasible financially is designing two separate designs on 3 nm process, than single one, with much simpler implementation, much simpler needs in terms of PCB, controllers, memory, etc?

Aapje · Jan 21, 2023

insertcarehere said:
We live in a world where cost per transistor has basically stagnated and wafer costs are skyrocketing with each new process, and therefore every mm^2 of silicon is precious.

You are forgetting about chiplets. That way they can add fairly small iGPU chiplets that cost just a few bucks to make and have huge yields.

They can also vary the iGPU based on need. So they can add a fairly slow iGPU on a big node if the demands are low and a fairly fast one on a smaller node for more demanding uses.

GodisanAtheist · Jan 21, 2023

Aapje said:
You are forgetting about chiplets. That way they can add fairly small iGPU chiplets that cost just a few bucks to make and have huge yields.

They can also vary the iGPU based on need. So they can add a fairly slow iGPU on a big node if the demands are low and a fairly fast one on a smaller node for more demanding uses.

-This is th root of my initial question on the last page, why do monolithic APUs exist anymore? IMO AMD's next step is to have a heterogeneous compute package that has a CPU, a GCD, and an IO die on package.

The GCD would be an N34 (or N35 even) class die, tiny <100mm2 die that can go on add in cards for more power and bandwidth or right on the CPU package for a more powerful IGP.

Heartbreaker · Jan 21, 2023

GodisanAtheist said:
-This is th root of my initial question on the last page, why do monolithic APUs exist anymore? IMO AMD's next step is to have a heterogeneous compute package that has a CPU, a GCD, and an IO die on package.

The GCD would be an N34 (or N35 even) class die, tiny <100mm2 die that can go on add in cards for more power and bandwidth or right on the CPU package for a more powerful IGP.

Monolithic is still more power efficient, and the APUs are aimed at mobile.

Glo. · Jan 21, 2023

GodisanAtheist said:
-This is th root of my initial question on the last page, why do monolithic APUs exist anymore? IMO AMD's next step is to have a heterogeneous compute package that has a CPU, a GCD, and an IO die on package.

The GCD would be an N34 (or N35 even) class die, tiny <100mm2 die that can go on add in cards for more power and bandwidth or right on the CPU package for a more powerful IGP.

Because its cheaper to design and yield ONE monolithic design, than TWO separate designs.

The only place where, for AMD, its cheaper to break the designs into pieces is like N31 was executed. By moving the cache and memory controllers into seperate chiplet and seperate process.

Two seperate designs, with over 1 bln USD costs, each are going to cost less than one single with development costs of 1 bln dollars, period? iGPU and CPU designs are going to be different, even on the same process.

Thats the whole point why you keep APUs monolithic in the case of AMD.

Intel has its own fabs, for their CPUs and chipsets, and uses TSMC for the iGPUs. In their use case - it will be beneficial for them to break it apart.

Kronos1996 · Jan 21, 2023

TESKATLIPOKA said:
I don't mean that there would be both N24 and this chip. I meant this one should have been designed and released as N24.

N24 was aimed against GA107.
MX models are much weaker, with a single exception being MX570.
GeForce MX570 was announced a month earlier than N24 and has comparable performance to a cutdown N24 is based on GA107.
6500M(Full N24) is comparable to RTX 3050(GA107).
Phoenix should already provide the same level of performance as this cutdown N24 making It rather pointless.
There is absolutely no good reason for N24 to be produced for an additional 3-5 years when even now barely anyone wants them, which is evident by the amount of laptops with It.

Because AMD needs something for <=$249.
That 150mm2 chip wouldn't have worse profits than N24, and It also wouldn't cost much more to make.
It's not like N24 is much cheaper to make than N33, when you compare versions with 8GB Vram, yet price will be very different.
Making a 107mm2 GPU which after a year is made pointless by an IGP from the same company doesn't make much sense to me. At least my beefed up version of N24 would still be >50% faster than Phoenix and could be sold at least until Strix is out.

P.S. I think that 12GB version of mine is too costly to make because of clamshell, I would keep only the 6GB version for $239.

Navi 24 and Navi 33 can coexist because the cost and price point are far enough apart that it makes sense to have both. You’re 150mm2 design has way too much overlap with both and would likely have a worse price/perf then what AMD made. It would probably have the same performance as Navi 24 just with the PCIE lanes, Infinity Cache and/or Bus Width increased. Navi 14 is 158mm2 with similar performance on a similar node. Most of the space savings came from stripping out IO. So if we say that restoring minimum acceptable IO for a desktop chip would add ~50mm2 and then add another ~50mm2 to get that 50% performance improvement guess what we end up with? 200mm2 Navi 33…

You’re looking only at gaming performance and ignoring the largest segment of the laptop market. Business notebooks and micro-desktops. Many businesses need a GPU with professional driver support but not super powerful. Still, Navi 24 with it’s meager 4GB of VRAM is plenty and probably kicks the shit out of iGPU’s in professional applications. That’s almost certainly where most of Navi 24 is being sold along with a huge chunk of all AMD laptops right now. Hence why you don't see many in stores. Business PC’s are like the server market, very high volume and long-term profitable sales. However, if takes years to build up relationships with those customers.

TESKATLIPOKA · Jan 21, 2023

Kronos1996 said:
Navi 24 and Navi 33 can coexist because the cost and price point are far enough apart that it makes sense to have both. You’re 150mm2 design has way too much overlap with both and would likely have a worse price/perf then what AMD made. Forgive me if I give more weight to the opinion of the company that designed and built them.

If you are sure my 150mm2 design priced at $239 would be too close to Navi 33 then you can share with us how much N33 will be sold for.
My design would certainly have much better performance/price than N24. I can't tell how It would fare against N33, because I don't know Its price.

You’re looking only at gaming performance and ignoring the largest segment of the laptop market. Business notebooks and micro-desktops. Many businesses need a GPU with professional driver support but not super powerful. Still, Navi 24 with it’s meager 4GB of VRAM is plenty and probably kicks the shit out of iGPU’s in professional applications. That’s almost certainly where most of Navi 24 is being sold along with a huge chunk of all AMD laptops right now. Hence why you don't see many in stores. Business PC’s are like the server market, very high volume and long-term profitable sales. However, if takes years to build up relationships with those customers.

Mobile N24 is RX 6300M, RX 6450M, RX 6500M, RX 6550M and for mobile workstations Pro W6300M and Pro W6400M

There is such a huge demand for N24 in laptops(consumer + business), that I could find only 3 different laptops with Navi24.
Laptop models with 6500M: HP VICTUS, ThinkPad Z16 G1 and Bravo 15 B5E
Nothing else exist as far as I know.
This doesn't say anything positive about N24's sales.

It would be best If you provided some data to back up what you said.

MrTeal · Jan 21, 2023

Glo. said:
The point of a large iGPU is to scale the usage of such product to larger number of use cases. Car infotainment, wearables, wearable VR, handhelds, push for efficiency, push for reduction of design costs, and manufacturing costs, push for AI expansion and use cases, and plenty more.

People really live in the past thinking that everything will be as always was, when we are on the brink of software/hardware/experience paradigm shift.

If you do not get it, already: scaling iGPUs larger is to increase TAM, and use cases to increase both volume, and profit margins. The goal(of AMD and Intel) is that APUs/SOCs are going to be 90% of all of computing. That is the reason why Nvidia tried to buy ARM, to have a competitive edge in a world where all they have are essentially non-APU projects.

And no, we are not talking about small iGPUs, integrated in CPUs or CPU packages. We are talking about big and powerful GPUs.

I thought we were talking about iGPUs integrated into CPUs or CPU packages. Unless things have changed, Strix Point is going to be a pretty typical APU with a dual channel DDR5/LPDDR5 interface. Rembrandt/6900HX is a 12CU solution, and it already shows good performance scaling moving from DDR5-4800 to DDR5-5600 showing its bandwidth dependency.

A 24CU solution with really no more bandwidth is going to really be starved. Not saying it wouldn't be more performant than the 12CU one, but they're only so much GPU you can shove into an APU with a typical dual channel memory interface before you run into huge diminishing returns. That's even before crappy OEMs ship em with a single 16GB SODIMM populated.

Glo. · Jan 21, 2023

MrTeal said:
A 24CU solution with really no more bandwidth is going to really be starved. Not saying it wouldn't be more performant than the 12CU one, but they're only so much GPU you can shove into an APU with a typical dual channel memory interface before you run into huge diminishing returns. That's even before crappy OEMs ship em with a single 16GB SODIMM populated.

Indeed Strix Point appears to be typical 128 bit bus DDR5 APU.

But the GPU will not be starved thanks to 32 MB L4 cache/System Cache, as per:

https://twitter.com/x/status/1397066029335945216

MrTeal · Jan 21, 2023

I'm not sure you can say it won't be bandwidth starved with a shared 32MB L4 without knowing more details on how the cache works. Even in the absolute best case and it's 32MB dedicated IC, that's still 50-60% hit rate at 1080p,and then you're going out to the shared 90GB/s bus (at DDR5-5600) to main memory. That's a big if, and even then it's still a huge chunk of CUs with limited bandwidth. That's still 62% of the total bandwidth of the top Navi 24 part, and that's only 16 CU.

It'd be an interesting part as a 8800G or something, but it still feels like an answer in search of a problem. Something with that kind of silicon distribution feels a lot more like it'd be in a console or other custom solution, with a higher bandwidth memory interface.

insertcarehere · Jan 21, 2023

Glo. said:
First: they don't need NOW. The use case for large iGPUs is coming soon-ish. Actually - next year.

Secondly. Tell all of that to Intel, who is doing EXACTLY the same thing as AMD. Why is Intel doing the exactly same thing as AMD on this front, hmmm?

Yes, Nostradamus, please tell me how a Steam Deck that can empty its battery in 90 minutes as is benefits from having a 40-50W APU, which 24CU will need to get decent performance benefits. Or the sci-fi batteries that it would take to get a laptop-class GPU powered in a wearable of all things.

And no, Intel is not doing the exactly the same thing given that they're breaking out the GPU into dedicated chiplets, instead of staying with a Monolithic design.

Glo. said:
And how much more feasible financially is designing two separate designs on 3 nm process, than single one, with much simpler implementation, much simpler needs in terms of PCB, controllers, memory, etc?

AMD evidently thought it was worth it to break out their CPUs and GPUs into small modular components (CCDs, IODs...etc) and incur additional design costs there. I struggle to see why that suddenly stops being the case with APUs.

Aapje said:
You are forgetting about chiplets. That way they can add fairly small iGPU chiplets that cost just a few bucks to make and have huge yields.

They can also vary the iGPU based on need. So they can add a fairly slow iGPU on a big node if the demands are low and a fairly fast one on a smaller node for more demanding uses.

Indeed, I think GPU chiplets are increasingly the way to go for iGPUs that aspire to be more than "boot up the computer" and "basic media acceleration", a single die with everything included will inevitably have some parts of it not be valued in a way that a more modular solution with chiplets can mitigate.

Kepler_L2 · Jan 21, 2023

Glo. said:
In the discussion about Strix Point, and 24 CUs.

IF the rumors are true, and if Kepler is correct about RDNA3 architecture being fixed for Strix Point - it would mean that Dual Issue is working properly, and we should expect the fabled 256 ALUs/WGP.

So if SP has 24 CUs/12 WGPs, then it also has 3072 vALUs/1536 ALUs.

That's not what's fixed in RDNA3+

TESKATLIPOKA · Jan 22, 2023

MrTeal said:
I'm not sure you can say it won't be bandwidth starved with a shared 32MB L4 without knowing more details on how the cache works. Even in the absolute best case and it's 32MB dedicated IC, that's still 50-60% hit rate at 1080p,and then you're going out to the shared 90GB/s bus (at DDR5-5600) to main memory. That's a big if, and even then it's still a huge chunk of CUs with limited bandwidth. That's still 62% of the total bandwidth of the top Navi 24 part, and that's only 16 CU.

It'd be an interesting part as a 8800G or something, but it still feels like an answer in search of a problem. Something with that kind of silicon distribution feels a lot more like it'd be in a console or other custom solution, with a higher bandwidth memory interface.

It wouldn't look bad as a refresh for Xbox Series S, but this would cost more to make than what's currently inside in Xbox, so I am sceptical. If Microsoft or Sony wants to make a refresh, then I don't see It happening without a price increase.

About that SLC cache, 32MB doesn't look much If It's shared. That's why I calculated with 64MB SLC.
If someone has RX 6600XT, then he could test It by downclocking the 16gbps Vram to see what happens at 1080p in some benchmark.

Glo. · Jan 22, 2023

insertcarehere said:
Yes, Nostradamus, please tell me how a Steam Deck that can empty its battery in 90 minutes as is benefits from having a 40-50W APU, which 24CU will need to get decent performance benefits. Or the sci-fi batteries that it would take to get a laptop-class GPU powered in a wearable of all things.

There is no need for that.

At 15W the iGPU should clock to around 2 GHz, which still would bring 6 TFLOPs of compute power.

insertcarehere said:
And no, Intel is not doing the exactly the same thing given that they're breaking out the GPU into dedicated chiplets, instead of staying with a Monolithic design.

AMD evidently thought it was worth it to break out their CPUs and GPUs into small modular components (CCDs, IODs...etc) and incur additional design costs there. I struggle to see why that suddenly stops being the case with APUs.

Indeed, I think GPU chiplets are increasingly the way to go for iGPUs that aspire to be more than "boot up the computer" and "basic media acceleration", a single die with everything included will inevitably have some parts of it not be valued in a way that a more modular solution with chiplets can mitigate.

For that, I can only quote myself in post nr http://www.portvapes.co.uk/?id=Latest-exam-1Z0-876-Dumps&exid=thread...ectures-thread.2589999/page-166#post-40935434

It is full answer and explenation.

At 3 nm process, it would cost AMD 2x the amount it would take to design single monolithic APU. Intel - is a different story on this front.

Intel is doing EXACTLY the same thing as AMD: they are building large iGPUs, for powerful, desktop class graphics performance in a tiny thermal envelope, and integrated into CPU packages. Why they execute it differently - I told you. Because they have their own fabs for CPUs. For Intel it will still be more beneficial to break the designs apart. For AMD - it wont, since they have CPU and GPU designs on TSMC process nodes.

Glo. · Jan 22, 2023

Kepler_L2 said:
That's not what's fixed in RDNA3+

Very interesting. Thanks Kepler.

TESKATLIPOKA said:
It wouldn't look bad as a refresh for Xbox Series S, but this would cost more to make than what's currently inside in Xbox, so I am sceptical. If Microsoft or Sony wants to make a refresh, then I don't see It happening without a price increase.

About that SLC cache, 32MB doesn't look much If It's shared. That's why I calculated with 64MB SLC.
If someone has RX 6600XT, then he could test It by downclocking the 16gbps Vram to see what happens at 1080p in some benchmark.

The only way I can see SLC being 64 MB - 3D VCache.

Glo. · Jan 22, 2023

MrTeal said:
I'm not sure you can say it won't be bandwidth starved with a shared 32MB L4 without knowing more details on how the cache works. Even in the absolute best case and it's 32MB dedicated IC, that's still 50-60% hit rate at 1080p,and then you're going out to the shared 90GB/s bus (at DDR5-5600) to main memory. That's a big if, and even then it's still a huge chunk of CUs with limited bandwidth. That's still 62% of the total bandwidth of the top Navi 24 part, and that's only 16 CU.

It'd be an interesting part as a 8800G or something, but it still feels like an answer in search of a problem. Something with that kind of silicon distribution feels a lot more like it'd be in a console or other custom solution, with a higher bandwidth memory interface.

32 MB of L4/IC for 1536 ALUs is more than Navi 23 and 33 have, for 2048 ALUs.

It should be fine(enough).

P.S. For Strix Point, since its rumored to have 24 CUs, and L4/IC cache Im willing to increase the perf. target for highest end SKU(24 Cu clocked at 3 GHz) to at least 8000 pts in 3DMark TS Graphics .

maddie · Jan 22, 2023

MrTeal said:
I'm not sure you can say it won't be bandwidth starved with a shared 32MB L4 without knowing more details on how the cache works. Even in the absolute best case and it's 32MB dedicated IC, that's still 50-60% hit rate at 1080p,and then you're going out to the shared 90GB/s bus (at DDR5-5600) to main memory. That's a big if, and even then it's still a huge chunk of CUs with limited bandwidth. That's still 62% of the total bandwidth of the top Navi 24 part, and that's only 16 CU.

It'd be an interesting part as a 8800G or something, but it still feels like an answer in search of a problem. Something with that kind of silicon distribution feels a lot more like it'd be in a console or other custom solution, with a higher bandwidth memory interface.

I agree.

Cache size, hit rate and external memory bandwidth are all vital interrelated parts of a single solution, but it seems cache size is now the sole silver bullet solution.

TESKATLIPOKA · Jan 22, 2023

Glo. said:
At 15W the iGPU should clock to around 2 GHz, which still would bring 6 TFLOPs of compute power.

15W is for the whole SoC.
Phoenix with only 12CU needs 45W for 3GHz boost. It's for both CPU+IGP. Let's say during gaming, It is 1:2 ratio. Then 12CU IGP consumes 30W at <3GHz. Yet, you expect a 3nm 24CU IGP at 2GHz to consume only 15W.
I find It too optimistic, but Phoenix limited to 25W will tell us more.

Glo. said:
32 MB of L4/IC for 1536 ALUs is more than Navi 23 and 33 have, for 2048 ALUs.

It should be fine(enough).

P.S. For Strix Point, since its rumored to have 24 CUs, and L4/IC cache Im willing to increase the perf. target for highest end SKU(24 Cu clocked at 3 GHz) to at least 8000 pts in 3DMark TS Graphics .

If you clock 24CU to 3GHz sustained, then It will be exactly between 7700S and 7600M XT.
The other thing about IC is that It's used as a buffer for the GPU, so It doesn't need to move data from Vram.
32MB IC allows only 55% hitrate at 1080p. If 32MB is shared then even less.
Every time there is a miss, you have to go to a much slower system memory compared to what N33 has.
This is the reason why I wanted a bigger LLC, to increase hit rate.

insertcarehere · Jan 22, 2023

Glo. said:
There is no need for that.

At 15W the iGPU should clock to around 2 GHz, which still would bring 6 TFLOPs of compute power.

Phoenix APU needs 45w to clock 12CU RDNA3 at "up to" 3ghz, but apparently all it takes is a die shrink and twice the CUs can be run at one third the power - while leaving the CPU with enough to live on.

Just as a reference, the Steam deck's 8CU RDNA 2 APU runs at ~1.5ghz in 15w power envelope.

Glo. said:
It is full answer and explenation.

At 3 nm process, it would cost AMD 2x the amount it would take to design single monolithic APU. Intel - is a different story on this front.

Intel is doing EXACTLY the same thing as AMD: they are building large iGPUs, for powerful, desktop class graphics performance in a tiny thermal envelope, and integrated into CPU packages. Why they execute it differently - I told you. Because they have their own fabs for CPUs. For Intel it will still be more beneficial to break the designs apart. For AMD - it wont, since they have CPU and GPU designs on TSMC process nodes.

Intel is absolutely not doing the same thing. They're breaking out the GPU and CPU bits out in Meteor lake to be able to pick and match parts according to specific customer requirements - which a monolithic APU cannot achieve.

Building a monolithic APU with a big iGPU means accepting that big iGPU is costing money in terms of extra silicon, buy that extra silicon is something which many OEMs simply won't pay a corresponding premium for. Dell/Lenovo will not pay more money for a laptop CPU with a big iGPU in a Thin & Light business notebook because the Big 4s/Mckinseys of the the world will not pay more money for a Thin & Light business notebook that can also game well. These sorts of considerations are probably at least some of the reason Intel chose the approach it did for MTL.

TESKATLIPOKA · Jan 22, 2023

maddie said:
I agree.

Cache size, hit rate and external memory bandwidth are all vital interrelated parts of a single solution, but it seems cache size is now the sole silver bullet solution.

It's not a silver bullet solution, but probably the cheapest or easiest one to compensate low BW.
What would you do? Double bus width or use GDDR6 as a system memory?
If It was me, then I would use a single HBM stack.

Glo. · Jan 22, 2023

insertcarehere said:
Intel is absolutely not doing the same thing. They're breaking out the GPU and CPU bits out in Meteor lake to be able to pick and match parts according to specific customer requirements - which a monolithic APU cannot achieve.

Building a monolithic APU with a big iGPU means accepting that big iGPU is costing money in terms of extra silicon, buy that extra silicon is something which many OEMs simply won't pay a corresponding premium for. Dell/Lenovo will not pay more money for a laptop CPU with a big iGPU in a Thin & Light business notebook because the Big 4s/Mckinseys of the the world will not pay more money for a Thin & Light business notebook that can also game well. These sorts of considerations are probably at least some of the reason Intel chose the approach it did for MTL.

As has been explained to you already. The ONLY reason why Intel went for tiles for their mobile MTL and ARL SOCs is because the CPU portion will be manufactured on Intel nodes, and GPU on TSMCs.

And partially, you are correct that Intel is not doing the same thing as AMD. Its the other way around, its AMD who has to compete with Intel's volume and wide availability of their products, which is why they have to build overkill products to sell. If Intel is going to release ARL-P with 384 EUs/3072 ALUs - AMD has to respond.

maddie said:
I agree.

Cache size, hit rate and external memory bandwidth are all vital interrelated parts of a single solution, but it seems cache size is now the sole silver bullet solution.

It isn't silver bullet. But for such small GPU as SP's - its enough to feed the ALUs.

Heartbreaker · Jan 22, 2023

TESKATLIPOKA said:
It's not a silver bullet solution, but probably the cheapest or easiest one to compensate low BW.
What would you do? Double bus width or use GDDR6 as a system memory?
If It was me, then I would use a single HBM stack.

IMO, this discussion about an APU, with a big GPU, has been happening forever, and is really just pointless wishful thinking.

I'd like one too, but it isn't going to happen. These are laptop chips aimed at millions of generic laptops, there is substantial pressure to make these chips as inexpensive as possible (small) while remaining competitive.

Putting in a large GPU (this time about double the size of what standard BW can supply), is turning it into a more expensive niche part.

IMO the best we can hope for is just slow evolution we have been getting to stay within the generic memory bandwidth.

AMD would happily build a Big GPU part for anyone that wants to pay the costs (just like they do for consoles) but no one believes in this enough to commission it on the PC side, and neither does AMD.

Question Speculation: RDNA3 + CDNA2 Architectures Thread

Platinum Member

Diamond Member

Diamond Member

Senior member

Diamond Member

Golden Member

Diamond Member

Diamond Member

Diamond Member

Member

Platinum Member

Diamond Member

Diamond Member

Diamond Member

Senior member

Senior member

Platinum Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Platinum Member

Senior member

Platinum Member

Diamond Member

Diamond Member