Discussion Intel current and future Lakes & Rapids thread

Page 790 of the AnandTech forums thread.

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,281
136
So far this is the only shot of the Sierra Forest wafer. Am I seeing a 6 x 4 grid (quad-core cluster per square) or a 5 x 4 grid, with two blank squares for the memory controller?



 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
I also agree that the 30% IPC claims are BS; I think it's going to land in regular-IPC-uplift territory. Just curious how big Lion Cove could end up being, lol.
Redwood Cove to Lion Cove is probably close to the norm, but desktop is going to skip Redwood Cove, at least for the high end, so Intel will be able to claim greater-than-normal gains. Just like Core 2 was said to be a near-2x gain, while over its real predecessor, Core Duo, it was only about 20% faster.

Sierra Forest having 144 cores seems to indicate a similar strategy to AMD's. And neither company's E-cores sound especially efficient. 350W?
 

Hulk

Diamond Member
Oct 9, 1999
4,279
2,099
136
I'll have a shot at this, from personal experience. I supported a nationwide (US) application that served thousands of users. It was an Oracle database with (from 2002-2016) hundreds of terabytes of data, and it required servers that cost MILLIONS of dollars. Our first upgrade was in 2004, and it cost $4.6 million just for the server. These were "partitioned" into multiple logical servers to handle "regions" independently, for performance reasons.

Another example: the AnandTech forums. They were run on an x86 platform, and the first upgrade was to AMD Opterons somewhere around 2001-2003. No idea if the database goes back that far, but Anand was still running the show then. I don't remember the hardware, but it was a LOT of cores and a LOT of disk space.

NOW you have "the cloud". Cloud providers take large servers with lots of cores and carve them into virtual machines of whatever size people want. I'm not an expert in this area, so maybe others will comment.

So these processors are mainly processing internet requests for data? Is this the reason that, over the years, despite increased traffic, the internet seems to be getting "faster"? I mean, the amount of compute per user must be much higher than 20 years ago, seeing how much data center compute has increased over that time span.

I have a good general understanding of how computers work, but the scale of the internet is mind-boggling. Especially when I think about all of the video streaming going on: Netflix, Hulu, ESPN, Disney+, Paramount+, Sling, just to name a few. Not only do they all work, but they are pretty much flawless and deliver HD content to millions, no, tens of millions of people. It's nuts.

I guess it's these massive data centers that do the heavy lifting for all of this?
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,668
14,676
136
So these processors are mainly processing internet requests for data? Is this the reason that, over the years, despite increased traffic, the internet seems to be getting "faster"? I mean, the amount of compute per user must be much higher than 20 years ago, seeing how much data center compute has increased over that time span.

I have a good general understanding of how computers work, but the scale of the internet is mind-boggling. Especially when I think about all of the video streaming going on: Netflix, Hulu, ESPN, Disney+, Paramount+, Sling, just to name a few. Not only do they all work, but they are pretty much flawless and deliver HD content to millions, no, tens of millions of people. It's nuts.

I guess it's these massive data centers that do the heavy lifting for all of this?
The processors I worked with were processing database records, not Internet stuff. But the uses of multicore servers vary: databases, encryption, encoding, computing raw data (not a database), and of course internet stuff, not to mention streaming, etc...

As to internet use/traffic, think of this: around 1985 I was logging into PSU (college) using a 300 baud modem. Around 2000 I was up to 9600 baud. I can't remember exactly when I went from that to a cable modem or fiber optic, but think of the speed increases and what it takes to drive all that! Now I have gigabit internet, and you can get faster than that.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Yeah, and the weird split we have now between product lines should all converge in that year with Panther Lake.

Only the Sierra Forest-SP will have 144 cores at 350W (two compute tiles). The AP form can have twice as many per socket.
It's ONE compute tile. Where are people getting two compute tiles? Two compute tiles = 288 cores.
 

A///

Diamond Member
Feb 24, 2017
4,352
3,154
136
So these processors are mainly processing internet requests for data? Is this the reason that, over the years, despite increased traffic, the internet seems to be getting "faster"? I mean, the amount of compute per user must be much higher than 20 years ago, seeing how much data center compute has increased over that time span.

I have a good general understanding of how computers work, but the scale of the internet is mind-boggling. Especially when I think about all of the video streaming going on: Netflix, Hulu, ESPN, Disney+, Paramount+, Sling, just to name a few. Not only do they all work, but they are pretty much flawless and deliver HD content to millions, no, tens of millions of people. It's nuts.

I guess it's these massive data centers that do the heavy lifting for all of this?
Your question would be best answered by asking an AWS architect, as all of those streaming services rely on AWS.
 

A///

Diamond Member
Feb 24, 2017
4,352
3,154
136
The processors I worked with were processing database records, not Internet stuff. But the uses of multicore servers vary: databases, encryption, encoding, computing raw data (not a database), and of course internet stuff, not to mention streaming, etc...

As to internet use/traffic, think of this: around 1985 I was logging into PSU (college) using a 300 baud modem. Around 2000 I was up to 9600 baud. I can't remember exactly when I went from that to a cable modem or fiber optic, but think of the speed increases and what it takes to drive all that! Now I have gigabit internet, and you can get faster than that.
Fios sent me an ad around Christmas saying residential 2 Gb was coming to my area. I didn't keep it, and I don't recall the promo price or the normal price once the promo period ends. It's a lot better than what they were charging years ago at my last residence. I'm currently on 600 Mbps, and that's much more than I need.
 

Geddagod

Golden Member
Dec 28, 2021
1,165
1,049
106
It's a Sierra Forest wafer, no doubt about it.

View attachment 78818

As it stands it's a pretty big die, larger than a single Sapphire Rapids compute tile.
Oh no, I meant it's Sierra Forest, just the medium core-count config of the die. Like how SPR has several different dies for different market segments.
The thing that makes me hesitate to call that half of the two-compute-die SRF package is the fact that, if that's true, 25% of the cores are going to be disabled.
But if it's larger than 400mm^2 (larger than an SPR compute tile), then you might be right, because a single 144-core tile would then be a bit shy of 600mm^2, and idk if Intel wants to do that...
But then again, Intel did that (and way larger, lol) with EMR, so who knows.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Oh no, I meant it's Sierra Forest, just the medium core-count config of the die. Like how SPR has several different dies for different market segments.
Actually, if you download the images from Intel's site and zoom in, it's pretty clear to me that it's a 4x4 config with two tiles missing, meaning there are only 14 clusters.

The bottom rows do not exist. At certain angles it looks that way, but that pattern repeats across the wafer; the clusters are clearly defined, while the supposed "rows" that @nicalandia shows in his shot are much less defined, meaning they're not core clusters.

This is further supported by the fact that each cluster has a distinctive lighter-colored dot towards the top center, while no such dot exists below the 4x4. That dot is likely routing circuitry. It seems to me that if you zoom out a bit, it's even clearer that there are 4 rows. So if anything, I think that's the 112-core die, and they'll add another row at the top for 18 clusters, making it 144 cores. I do not think that number is a coincidence.

Now consider that each "cluster" is very large at well over 15mm2, possibly as large as 25mm2, when Meteor Lake's Crestmont on Intel 4 is only about 1mm2 per core sans L2. So there's a possibility that it's something like an 8-core cluster!
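The cluster arithmetic in this post can be sanity-checked in a few lines. The grid dimensions, the two missing cluster positions, and the 8-cores-per-cluster figure are all guesses from the discussion, not confirmed specs:

```python
# Back-of-the-envelope check of the cluster-counting argument above.
# All inputs are guesses from the discussion, not confirmed Intel specs.

def total_cores(rows: int, cols: int, missing_clusters: int, cores_per_cluster: int) -> int:
    """Cores on a die laid out as a rows x cols grid of core clusters,
    where some grid positions hold other IP (e.g. routing or memory controllers)."""
    clusters = rows * cols - missing_clusters
    return clusters * cores_per_cluster

# 4x4 grid with two positions missing -> 14 clusters, as counted on the wafer
print(total_cores(4, 4, 2, 8))  # 112
# Add one more row of clusters -> 18 clusters
print(total_cores(5, 4, 2, 8))  # 144, matching the announced SRF core count
```

With these grids, 12-core clusters would instead give 168 and 216 cores, which is why the 8-core guess lines up so neatly with the 144 figure.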
 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
Now consider that each "cluster" is very large at well over 15mm2, possibly as large as 25mm2, when Meteor Lake's Crestmont on Intel 4 is only about 1mm2 per core sans L2. So there's a possibility that it's something like an 8-core cluster!
Yeah, something about the sizes here just isn't adding up. I almost wonder if we could be looking at 4x4c blocks? But the numbers there seem like they would be too high. Hmm....
Actually if you download the images from Intel's site
Where do you see these wafer shots posted?
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Yeah, something about the sizes here just isn't adding up. I almost wonder if we could be looking at 4x4c blocks? But the numbers there seem like they would be too high. Hmm....

Where do you see these wafer shots posted?
Intel Newsroom. It has multiple high resolution photos, including the relevant one.

12 cores could work too, but 8 cores x 18 clusters just fits so well. We don't know the details of Sierra Forest, or whether they're using the same client core as Crestmont or even adding enhancements. Unless they're doing something really fancy like 8x Skymont clusters, but that seems too early, especially for server.*

*Though competitively, it would need something like that to beat Bergamo that's coming quite a bit earlier.
 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
12 cores could work too, but 8 cores x 18 clusters just fits so well. We don't know the details of Sierra Forest, or whether they're using the same client core as Crestmont or even adding enhancements.
I think it's Crestmont based, and probably using 4c clusters. So what I'm having a hard time reconciling is that if you were to pair two together, it would probably have a ~2:1 ratio. Something just isn't quite fitting in my head here. Wish they'd just release the die shots at this point.
*Though competitively, it would need something like that to beat Bergamo that's coming quite a bit earlier.
At least for comparable market segments, SRF should be quite competitive with Bergamo. Honestly, I still think people are underestimating the Forest line. I wonder if they're leading with CWF on 18A because it's enough for a generational upgrade over GNR, not just SRF. IPC should be pretty close, and then add in a full node advantage...
 

Saylick

Diamond Member
Sep 10, 2012
3,269
6,752
136
At least for comparable market segments, SRF should be quite competitive with Bergamo. Honestly, I still think people are underestimating the Forest line. I wonder if they're leading with CWF on 18A because it's enough for a generational upgrade over GNR, not just SRF. IPC should be pretty close, and then add in a full node advantage...
Hmm. Possibly. 144 post-Gracemont cores vs. 128 Zen 4c cores that have half the L3 cache and lower clocks, but have SMT. Both should be ~300-350W at the high end.
 

Geddagod

Golden Member
Dec 28, 2021
1,165
1,049
106
I think it's Crestmont based, and probably using 4c clusters. So what I'm having a hard time reconciling is that if you were to pair two together, it would probably have a ~2:1 ratio. Something just isn't quite fitting in my head here. Wish they'd just release the die shots at this point.

At least for comparable market segments, SRF should be quite competitive with Bergamo. Honestly, I still think people are underestimating the Forest line. I wonder if they're leading with CWF on 18A because it's enough for a generational upgrade over GNR, not just SRF. IPC should be pretty close, and then add in a full node advantage...
INT IPC for Gracemont is ~80% of GLC's, so it's not that bad. While idk the IPC gain from Tremont to Gracemont, ik Tremont vs Goldmont+ was a 30% IPC gain.
FP IPC for Gracemont is ~60% of GLC though. And the problem with adding more FP IPC is that the FP block on modern cores is huge, and IIRC that was one of the blocks that allowed such great shrinking of the small cores.
On a slight aside, if you look at the Intel slides all the way back from 2020, the "next mont" after Gracemont is about ST performance and frequency, as well as features.
Idk if this is referencing Crestmont, which it might be, but what new "features" are being implemented in Crestmont? Hmm
 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
Hmm. Possibly. 144 post-Gracemont cores vs. 128 Zen 4c cores that have half the L3 cache and lower clocks, but have SMT. Both should be ~300-350W at the high end.
I was thinking more SRF-AP, assuming they do release a 288c version. Core to core, Crestmont vs Zen 4c would not go well for Intel, but they should be able to throw in significantly more cores per area.

Though on the topic of TDP, I wonder what SRF-SP TDPs will end up being. We reasonably believe the socket supports up to 350W, but they won't necessarily run every chip at that level. Just thinking back to GRT as a reference, something like 5W/cluster should be a pretty good spot. So maybe 200W for cores? Though the uncore will presumably be power hungry as well.
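A minimal sketch of that power estimate, assuming 4-core clusters and the guessed 5 W per cluster (neither figure is official):

```python
# Rough core-power estimate for a 144-core SRF-SP part.
# 5 W per cluster is extrapolated from Gracemont (GRT) behavior, not an official number.
cores = 144
cores_per_cluster = 4
watts_per_cluster = 5.0

clusters = cores // cores_per_cluster          # 36 clusters
core_power = clusters * watts_per_cluster      # power budget for the cores only
print(f"{clusters} clusters, ~{core_power:.0f} W for cores")  # 36 clusters, ~180 W
```

That lands just under the ~200 W ballpark above, leaving the remainder of a ~350 W socket limit for the uncore and IO.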
 
Reactions: mderbarimdiger

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
On a slight aside, if you look at the Intel slides all the way back from 2020, the "next mont" after Gracemont is about ST performance and frequency, as well as features.
Idk if this is referencing Crestmont, which it might be, but what new "features" are being implemented in Crestmont? Hmm
I think that reference probably predated Crestmont's addition to the roadmap. But let's say Crestmont is a smaller bump, maybe around 10%, then a big 20% jump with Skymont in 2024, and then a tick, Darkmont(?), for 10% in 2025...

Even if those numbers wind up being off, which they probably will, it's quite plausible that Intel will have an RWC-class-IPC Atom core in 2025. Add in a full node advantage, and I could totally see it being an upgrade over even GNR at a server operating point. And if DMR isn't coming until 2026, it had better have more than just Lion Cove; that won't be competitive released midway between Zen 6 and Zen 7.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
INT IPC for Gracemont is ~80% of GLC's, so it's not that bad. While idk the IPC gain from Tremont to Gracemont, ik Tremont vs Goldmont+ was a 30% IPC gain.
No, the difference is greater than that. The overall difference is 40-50% between Raptormont and Raptor Cove. Remember, Gracemont is Skylake-level uarch, but it has a weaker uncore; Raptor Lake's advancements bring it to Skylake level.

Sunny Cove = 18%, Golden Cove = 19%, which compounds to roughly 40%. The tradeoffs in area/power further increase the difference.
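Compounding those per-generation figures works out as follows; multiplying them assumes the uplifts stack multiplicatively, which is the usual convention for IPC claims:

```python
# Compounding the quoted per-generation IPC uplifts over Skylake:
# Sunny Cove +18%, then Golden Cove +19% on top of that.
sunny_cove_uplift = 1.18
golden_cove_uplift = 1.19

cumulative = sunny_cove_uplift * golden_cove_uplift
print(f"{cumulative:.2f}x")  # 1.40x, i.e. roughly 40% over Skylake-class
```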

We told you before: Integer performance is the uarch. FP doesn't really matter as much. The floating-point unit was added with the Intel 486; previous to that it was a separate coprocessor chip, so it has accelerator roots. And up until the Pentium II era, before 3D cards existed, games used software rendering pipelines running on the CPU's FP units. That's why the Pentium got MMX. After Voodoo came out, FP went to the back burner again. Sure it mattered, but just a little.

If you double the amount of FP/vector units, you can potentially get 2x the performance in FP code. Indeed, the HPC market can get that. Such big gains are relatively straightforward since FPUs are essentially accelerators.

If you want to increase Integer performance by just 20%, you need to do all kinds of voodoo: branch prediction algorithm changes and buffer increases, a bigger load/store unit, careful management of pipeline stages, more decoders, and cache latencies really matter. If you double the ALUs, general Integer code would say "ALU who?"

Look at pipeline stages. They are talking about Integer pipeline stages; they don't need to call it "integer" because Integer = general-purpose code. FP pipelines are quite a bit longer, but it doesn't matter because FP code is a lot less "branchy", so it's easier to extract instruction-level parallelism. There's the accelerator part again.
 

Geddagod

Golden Member
Dec 28, 2021
1,165
1,049
106
No, the difference is greater than that. The overall difference is 40-50% between Raptormont and Raptor Cove. Remember, Gracemont is Skylake-level uarch, but it has a weaker uncore; Raptor Lake's advancements bring it to Skylake level.

Sunny Cove = 18%, Golden Cove = 19%, which compounds to roughly 40%. The tradeoffs in area/power further increase the difference.

The vector performance difference between the two is much bigger, but that's more situational. Microarchitecture in CPUs is by default about Integer performance, and FP benefits from those uarch changes as well. Integer performance is the foundation, and the rest is gravy. The FP unit was added with the 486; prior to that it was off-chip. So it's an accelerator (of sorts).

The FP difference between Gracemont and Golden Cove is 2x. Whether it's 2x in real-world code is another story. They generally don't care about per-unit differences: in the usage scenarios where FP really matters, the gains are substantial, while in client and most other code it's a small proportion, so even if you were to rewrite for AVX, the gains are 5-10%.
I'm going off Raichu's SPEC 2017 (2007?) testing for IPC.
In libx264 testing by Chips and Cheese, IPC for Gracemont was ~75% of GLC's. I believe he used a 12900K.
Gracemont's chief architect claims it has greater-than-Skylake INT IPC as well.
I can't find any more IPC testing of Gracemont. From what I have seen, the IPC deficit for Gracemont isn't as bad as 40-50%.
Vector performance is also a lot more commonplace in the server market, IIRC.
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,281
136
This is what I was able to calculate based on the high-resolution wafer shot of the Sierra Forest compute die.

About 480 mm^2 and 115 fully printed dies.
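Those figures are roughly consistent with the standard die-per-wafer approximation for a 300 mm wafer. The 480 mm^2 die area is the poster's estimate from the wafer shot, not an official number:

```python
import math

# First-order die-per-wafer approximation: gross dies from wafer area,
# minus an edge-loss correction term. Ignores scribe lines and edge exclusion.
def dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    radius = wafer_diameter_mm / 2
    gross = math.pi * radius**2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(gross - edge_loss)

print(dies_per_wafer(300, 480))  # 116, close to the ~115 fully printed dies counted
```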



 