Discussion Intel current and future Lakes & Rapids thread


JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
Yeah, we needed some silly car analogy. I guess you're not doing software dev or you'd not say something like that.

What we really did not need was your out-of-place AVX analogy and attempts to save face. Intel is doing the right thing here: REP MOV is going to be faster for everyone, everywhere it is already used in existing code, without any flags. Period.
What you don't seem to get is where and what "checks" will happen for those currently designed libraries once/if they get updated.

Hint: compare the following:

1) Current library implementations that do dozens of both hard-to-predict size checks and easier-to-predict feature checks, with a massive instruction cache footprint and a constant need for library updates whenever CPUs with different behavior and extended instruction sets arrive.
2) A single, always-predicted feature check to decide that REP MOV is to be used always, with an instruction cache footprint of probably one cache line. One that will continue to work optimally as long as Intel (and hopefully AMD) maintain REP MOV performance. (A rough sketch of both approaches follows below.)

We don't know about the future performance of REP MOV, it could still turn out to be unusable, but some people will find ways to roast Intel even when they do something that looks right, at least on paper.
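
To make the comparison concrete, here is a minimal C sketch of option 2), a single cached feature check followed by REP MOVSB, with option 1) reduced to a placeholder fallback. The function and helper names are made up for illustration, and ERMS (the "Enhanced REP MOVSB/STOSB" CPUID bit) is used only as a plausible stand-in for whatever flag a real library would test; this is not how any actual libc implements memcpy.

Code:
#include <cpuid.h>
#include <stddef.h>

/* One-time feature check: ERMS ("Enhanced REP MOVSB/STOSB") is
   reported in CPUID leaf 7, sub-leaf 0, EBX bit 9. */
static int has_erms(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ebx >> 9) & 1;
}

/* Option 2): one easily predicted check, then let the hardware
   handle every size and alignment case. */
void *my_memcpy(void *dst, const void *src, size_t n)
{
    static int erms = -1;          /* resolved once, then always predicted */
    if (erms < 0)
        erms = has_erms();

    if (erms) {
        void *d = dst;
        const void *s = src;
        __asm__ volatile("rep movsb"
                         : "+D"(d), "+S"(s), "+c"(n)
                         :
                         : "memory");
        return dst;
    }

    /* Option 1) would live here: the usual maze of size thresholds and
       SSE2/AVX/AVX-512 code paths.  A byte loop stands in for it. */
    unsigned char *d8 = dst;
    const unsigned char *s8 = src;
    while (n--)
        *d8++ = *s8++;
    return dst;
}

The point of option 2) is that the branch on the cached flag resolves the same way every time, while the hardware absorbs all the size and alignment decisions that option 1) has to encode as code.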
 

tamz_msc

Diamond Member
Jan 5, 2017
3,828
3,659
136
8% faster in ST, 5% faster in MT, with the Ice Lake clocked 100 MHz (3.8%) lower during the run.

It certainly is a bigger jump than Haswell to Skylake. Not quite a Sandy Bridge moment though.
The differing Linux kernel versions might be a source of inconsistency in this comparison, but it is the closest we have right now.
 

Nothingness

Platinum Member
Jul 3, 2013
2,494
874
136
What we really did not need was your out-of-place AVX analogy and attempts to save face. Intel is doing the right thing here: REP MOV is going to be faster for everyone, everywhere it is already used in existing code, without any flags. Period.
What you don't seem to get is where and what "checks" will happen for those currently designed libraries once/if they get updated.

Hint: compare the following:

1) Current library implementations that do dozens of both hard-to-predict size checks and easier-to-predict feature checks, with a massive instruction cache footprint and a constant need for library updates whenever CPUs with different behavior and extended instruction sets arrive.
2) A single, always-predicted feature check to decide that REP MOV is to be used always, with an instruction cache footprint of probably one cache line. One that will continue to work optimally as long as Intel (and hopefully AMD) maintain REP MOV performance.

We don't know about the future performance of REP MOV, it could still turn out to be unusable, but some people will find ways to roast Intel even when they do something that looks right, at least on paper.
I'm not disputing anything you write here (except the personal attack, which I will just ignore); I was just disputing what you previously wrote: that REP MOV will replace existing implementations of memcpy. That's not going to happen, and if you think it will, I have a few things to sell you.

And again, I'm an Intel fan, I'm just not blind. REP MOV is a very good thing if it is available on all upcoming CPUs. ISA fragmentation is a pain.
 

DrMrLordX

Lifer
Apr 27, 2000
21,709
10,983
136
Unlike AVX, which is and (to a certain extent) always will be a "niche" ISA extension, I would think that rep mov would find its way into the entire Intel lineup. There's practically no reason for them to do otherwise.
 

Nothingness

Platinum Member
Jul 3, 2013
2,494
874
136
Unlike AVX, which is and (to a certain extent) always will be a "niche" ISA extension, I would think that rep mov would find its way into the entire Intel lineup. There's practically no reason for them to do otherwise.
I agree, especially given that it's a very useful improvement.
 

name99

Senior member
Sep 11, 2010
410
310
136
8% faster in ST, 5% faster in MT, with the Ice Lake clocked 100 MHz (3.8%) lower during the run.

It certainly is a bigger jump than Haswell to Skylake. Not quite a Sandy Bridge moment though.
What I see is concentrated specific wins from AES and AVX-512 (Blur, SFFT, SGEMM, maybe Face detection?). And small overall improvements from faster REP MOV, the larger caches, and better memory (maybe better memory controller, but more likely just connected to higher speed DRAM?).

What I DON'T SEE is anything suggesting a thorough architectural restructuring. This is significant because, many years ago, when Skylake first appeared and 10nm still seemed in the near future, it was suggested by many that, sure, Skylake was kinda a dud and Apple was making disturbing progress compared to Intel, but, just wait, Ice Lake was going to be tres awesome.
I suggested that was extremely unlikely, based on pure name-erology --- the name Ice Lake suggests we're looking at a simple modification of the underlying Skylake architecture (a Sandy Bridge to Ivy Bridge, or Haswell to Broadwell). And lo and behold, after a dozen intervening Coffee Lakes and Kaby Lakes and Whiskey Lakes and Cannon Lakes, it turns out that, yes, Ice Lake IS just a minor modification of Skylake.

Don't get me wrong. Clearly there is occasional value in AVX and more AES capacity; and the memory improvements, while small, are nice. But that's not the point. The point is that the only way Intel makes a BIG improvement in performance (the sort of thing Apple achieves every year) is by a thorough redesign of their micro-architecture. And for whatever reason, they seem incapable of, or at least unwilling to, deliver that. Ice Lake has been pending for three years now; by the time it ships it will have been pending for five years. And in all that time, no-one at Intel thought: maybe let's just scrap it and move on to the design after that?

Or was there NO design after that...?

THIS really is the issue, guys. The Intel plan, insofar as we can determine it from naming, was something like
- Skylake (new architecture), on 14nm
- Ice Lake (tweaks and improvements), on 10nm
- Tiger Lake (more tweaks and improvements) on 10nm
- Sapphire Rapids (presumably FINALLY a new micro-architecture) on 10nm+

OK, so 10nm gets delayed. Sure, we can argue about who's to blame and why, but it happened. What's interesting is the response, namely a series of extremely minor tweaks to Skylake. The obvious questions that arise are
- why not backport Ice Lake? The claim apologists have offered me is that Ice Lake was so new and so tied to 10nm (used many more transistors?) that it wouldn't really work on 14nm. That appears to be nonsense, given that there's nothing obviously in there that desperately requires 10nm.

- by the time 10nm is finally ready, Intel will have had plenty of time to perfect not only the Ice Lake design but also the Tiger Lake design and even the Sapphire Rapids design. (These things have all been delayed by three years or more.)
So ask yourself: IF Intel has a slightly better design (Tiger Lake), why are they going to waste a year first shipping Ice Lake? And why not just skip both of them (god knows we've had enough Lake variants) and go straight to Sapphire Rapids on 10nm?

The issue is not that "Intel wants to maximize profits". Every company wants that. The issue is "is Intel, along the way, creating a better (richer, more powerful, more interesting, more varied) compute eco-system?" Because that's not what I'm seeing. By the time Intel finally gets round to shipping Sapphire Rapids in maybe 2022, Apple will have gone through, what, five more CPU improvements, the least of which will probably boost single-threaded performance by 15%. ARM and Samsung will have gone through at least three, maybe five improvements, probably at the same rate as Apple so that they're always lagging at about 2/3rds or so of Apple. AMD, who knows, but they seem a lot more innovative these days than Intel.

By the time Sapphire Rapids actually ships, the ARM Macs are going to be out and running at, god knows, 50% faster? 100% faster than Intel? ARM servers will probably have moved from cute to serious machines, at 2x the core density of Intel and each core 30% faster. AMD may well be at Intel parity in single-threaded performance --- but still with twice as many cores for the same price.

The next five years are going to be very interesting for every other company. But for Intel? God knows what they are thinking, but they seem to be locked in an assumption that, no matter how much they delay, and no matter how slowly they roll out true innovation, they'll always have the same customers willing to pay the same prices. Let's see if that's still true in 2022...
 

jpiniero

Lifer
Oct 1, 2010
14,687
5,317
136
- why not backport Ice Lake? The claim apologists have offered me is that Ice Lake was so new and so tied to 10nm (used many more transistors?) that it wouldn't really work on 14nm. That appears to be nonsense, given that there's nothing obviously in there that desperately requires 10nm.

The time and engineering effort required to backport is such that you would only do it if you were convinced (the Real) 10 nm is unfixable.

Sapphire Rapids btw is a server only core. Client is getting more Lakes; Alder and Meteor after Tiger.
 

name99

Senior member
Sep 11, 2010
410
310
136
The time and engineering effort required to backport is such that you would only do it if you were convinced (the Real) 10 nm is unfixable.

How do you know? Apple shipped the same core on two different processes without difficulty, and they turned that around on a dime as soon as it became clear that capacity at SS (Samsung) was constrained. ARM cores can be fabbed anywhere.

I just don’t buy this claim. What I see is a constant attempt to make excuses for Intel rather than facing up to the fact of just how badly they have executed, and what the likely consequences are.
 

mikk

Diamond Member
May 15, 2012
4,152
2,164
136
8% faster in ST, 5% faster in MT, with the Ice Lake clocked 100 MHz (3.8%) lower during the run.

It certainly is a bigger jump than Haswell to Skylake. Not quite a Sandy Bridge moment though.


An outlier is most likely not the best comparison: there is only one Icelake entry, so we can't pick a matching outlier on the Icelake side.
 
Reactions: Ajay

jpiniero

Lifer
Oct 1, 2010
14,687
5,317
136
How do you know? Apple shipped the same core on two different processes without difficulty

It's not that they couldn't have done it; it's that they would have to commit to doing it and there's a long lead time. And if you are being told they can fix 10 nm, would you want to commit the resources and money?
 

jpiniero

Lifer
Oct 1, 2010
14,687
5,317
136
Oh, and I do think the Icelake we are going to get is essentially backported to 14 nm, except that it uses a special low-power, high-density version that they are calling 10 nm, which they should call 12 nm but won't, because it's Intel. But there's a reason it's not coming out until the end of next year: that's how long it takes.
 

name99

Senior member
Sep 11, 2010
410
310
136
It's not that they couldn't have done it; it's that they would have to commit to doing it and there's a long lead time. And if you are being told they can fix 10 nm, would you want to commit the resources and money?

So it takes Apple two years to go from new instructions (ARMv8.3 announced late 2016) to two shipping cores (vortex and tempest, maybe even chinook) that support them. And maybe six months, maybe less, to cross-port from SS to TSMC (which they’d not used before).
But Intel can’t backport an already existing design in two years?
Does it matter EXACTLY where the problem is? Lack of will, chronic inability to make a decision, terrible design infrastructure... If you operate at one quarter the speed of your opponent, how do you think that plays out long term?
 
Reactions: dacostafilipe
Mar 10, 2006
11,715
2,012
126
Sapphire Rapids btw is a server only core. Client is getting more Lakes; Alder and Meteor after Tiger.

"Rapids" and "Lakes" are SoC names, not core names. At the SoC level, server and client have been diverged for a while.
 

Nothingness

Platinum Member
Jul 3, 2013
2,494
874
136
Here is the highest 7130U Linux score compared against this ICL-U sample:
https://browser.geekbench.com/v4/cpu/compare/9473563?baseline=10445533
IMHO you should not compare against the highest score, which might be an outlier, but against a middle-of-the-pack one. This one, for example: https://browser.geekbench.com/v4/cpu/compare/8410277?baseline=10445533

As already noted, the 7130U is handicapped by lower memory speed and the lack of AVX-512, which is used in FFT and GEMM. Looking at the score I wonder if Gaussian Blur also uses AVX-512.

Taking frequency into account, the integer score is ~12% higher and the FP score is ~15% higher. Not as great as I had hoped, but let's wait for more (reliable) results from non-ES chips.

Raw data below. Sorry, I don't know how to properly format it.

Code:
7130U IceLake 7130U/GHz IceLake/GHz Ratio
Single-Core Score 3501 4151 1296.67 1596.54 1.23
AES 3002 4501 1111.85 1731.15 1.56
LZMA 2998 3006 1110.37 1156.15 1.04
JPEG 3590 3754 1329.63 1443.85 1.09
Canny 3610 3906 1337.04 1502.31 1.12
Lua 3900 4453 1444.44 1712.69 1.19
Dijkstra 4252 4388 1574.81 1687.69 1.07
SQLite 3845 4081 1424.07 1569.62 1.10
HTML5 Parse 3781 4231 1400.37 1627.31 1.16
HTML5 DOM 3467 3248 1284.07 1249.23 0.97
Histogram Equalization 2948 3386 1091.85 1302.31 1.19
PDF Rendering 4030 4786 1492.59 1840.77 1.23
LLVM 6490 7021 2403.70 2700.38 1.12
Camera 3109 4025 1151.48 1548.08 1.34
SGEMM 3496 3757 1294.81 1445.00 1.12
SFFT 3875 5174 1435.19 1990.00 1.39
N-Body Physics 3417 3304 1265.56 1270.77 1.00
Ray Tracing 2958 2760 1095.56 1061.54 0.97
Rigid Body Physics 3469 3644 1284.81 1401.54 1.09
HDR 4366 4517 1617.04 1737.31 1.07
Gaussian Blur 3619 4901 1340.37 1885.00 1.41
Speech Recognition 3662 3479 1356.30 1338.08 0.99
Face Detection 3057 3661 1132.22 1408.08 1.24
Memory Copy 3110 5520 1151.85 2123.08 1.84
Memory Latency 4112 4546 1522.96 1748.46 1.15
Memory Bandwidth 2173 4028 804.81 1549.23 1.92
INT 3817 4099 1413.61 1576.72 1.12
FP 3481 3859 1289.25 1484.37 1.15
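
For anyone checking the arithmetic: the per-GHz columns appear to be the raw scores divided by each chip's clock (judging by the numbers, ~2.7 GHz for the 7130U and the 2.6 GHz this engineering sample runs at), with Ratio being the quotient of the two per-GHz values. A minimal C sketch reproducing a few rows, under those clock assumptions:

Code:
#include <stdio.h>

int main(void)
{
    /* Assumed clocks: ~2.7 GHz for the i3-7130U and the 2.6 GHz this
       Ice Lake engineering sample is reported to run at. */
    const double f_7130u = 2.7, f_icl = 2.6;

    /* A few rows from the table above: raw Geekbench 4 sub-scores. */
    struct { const char *name; double s_7130u, s_icl; } rows[] = {
        { "Single-Core Score", 3501, 4151 },
        { "AES",               3002, 4501 },
        { "Memory Copy",       3110, 5520 },
    };

    for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++) {
        double per_ghz_7130u = rows[i].s_7130u / f_7130u;
        double per_ghz_icl   = rows[i].s_icl   / f_icl;
        printf("%-18s %8.2f %8.2f  ratio %.2f\n", rows[i].name,
               per_ghz_7130u, per_ghz_icl, per_ghz_icl / per_ghz_7130u);
    }
    return 0;
}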
 
Reactions: Arachnotronic

Ajay

Lifer
Jan 8, 2001
15,624
7,951
136
So it takes Apple two years to go from new instructions (ARMv8.3 announced late 2016) to two shipping cores (vortex and tempest, maybe even chinook) that support them. And maybe six months, maybe less, to cross-port from SS to TSMC (which they’d not used before).
But Intel can’t backport an already existing design in two years?
Does it matter EXACTLY where the problem is? Lack of will, chronic inability to make a decision, terrible design infrastructure... If you operate at one quarter the speed of your opponent, how do you think that plays out long term?
Probably why Jim Keller was brought in. One of his major contributions at AMD was improving design and development process flows. Intel is so huge that I’m sure decisions get mired in a bureaucratic morass. At AMD he had the advantage of working with a lean design team; Intel will be a bigger challenge.
 
Reactions: Vattila

Ajay

Lifer
Jan 8, 2001
15,624
7,951
136
Oh, and I do think the Icelake we are going to get is essentially backported to 14 nm, except that it uses a special low-power, high-density version that they are calling 10 nm, which they should call 12 nm but won't, because it's Intel. But there's a reason it's not coming out until the end of next year: that's how long it takes.
We are a year out from Icelake's release. I’m inclined to think that clock speed will go up quite a bit from the 2.6 GHz in this engineering sample. Also, as this is a logic heavy CPU, I don’t think it will be high density. Anything else would be a massive failure for Intel and an utterly pointless product.
 

jpiniero

Lifer
Oct 1, 2010
14,687
5,317
136
We are a year out from Icelake's release. I’m inclined to think that clock speed will go up quite a bit from the 2.6 GHz in this engineering sample. Also, as this is a logic heavy CPU, I don’t think it will be high density. Anything else would be a massive failure for Intel and an utterly pointless product.

It's not going to clock anywhere near what 14++ does, but it doesn't have to for mobile, especially with this decent IPC gain.
 

beginner99

Diamond Member
Jun 2, 2009
5,211
1,582
136
- by the time 10nm is finally ready, Intel will have had plenty of time to perfect not only the Ice Lake design but also the Tiger Lake design and even the Sapphire Rapids design. (These things have all been delayed by three years or more.)
So ask yourself: IF Intel has a slightly better design (Tiger Lake), why are they going to waste a year first shipping Ice Lake? And why not just skip both of them (god knows we've had enough Lake variants) and go straight to Sapphire Rapids on 10nm?

Probably because the 10nm they will actually release is not the 10nm these designs were made for, so the designs will need to be changed/updated as well. And who knows when they finalized the 10nm that will come at the end of next year.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,828
3,659
136
IMHO you should not compare against the highest score, which might be an outlier, but against a middle-of-the-pack one. This one, for example: https://browser.geekbench.com/v4/cpu/compare/8410277?baseline=10445533

As already noted, the 7130U is handicapped by lower memory speed and the lack of AVX-512, which is used in FFT and GEMM. Looking at the score I wonder if Gaussian Blur also uses AVX-512.

Taking frequency into account, the integer score is ~12% higher and the FP score is ~15% higher. Not as great as I had hoped, but let's wait for more (reliable) results from non-ES chips.

Raw data below. Sorry, I don't know how to properly format it.

Code:
7130U IceLake 7130U/GHz IceLake/GHz Ratio
Single-Core Score 3501 4151 1296.67 1596.54 1.23
AES 3002 4501 1111.85 1731.15 1.56
LZMA 2998 3006 1110.37 1156.15 1.04
JPEG 3590 3754 1329.63 1443.85 1.09
Canny 3610 3906 1337.04 1502.31 1.12
Lua 3900 4453 1444.44 1712.69 1.19
Dijkstra 4252 4388 1574.81 1687.69 1.07
SQLite 3845 4081 1424.07 1569.62 1.10
HTML5 Parse 3781 4231 1400.37 1627.31 1.16
HTML5 DOM 3467 3248 1284.07 1249.23 0.97
Histogram Equalization 2948 3386 1091.85 1302.31 1.19
PDF Rendering 4030 4786 1492.59 1840.77 1.23
LLVM 6490 7021 2403.70 2700.38 1.12
Camera 3109 4025 1151.48 1548.08 1.34
SGEMM 3496 3757 1294.81 1445.00 1.12
SFFT 3875 5174 1435.19 1990.00 1.39
N-Body Physics 3417 3304 1265.56 1270.77 1.00
Ray Tracing 2958 2760 1095.56 1061.54 0.97
Rigid Body Physics 3469 3644 1284.81 1401.54 1.09
HDR 4366 4517 1617.04 1737.31 1.07
Gaussian Blur 3619 4901 1340.37 1885.00 1.41
Speech Recognition 3662 3479 1356.30 1338.08 0.99
Face Detection 3057 3661 1132.22 1408.08 1.24
Memory Copy 3110 5520 1151.85 2123.08 1.84
Memory Latency 4112 4546 1522.96 1748.46 1.15
Memory Bandwidth 2173 4028 804.81 1549.23 1.92
INT 3817 4099 1413.61 1576.72 1.12
FP 3481 3859 1289.25 1484.37 1.15
I don't think it's fair to compare scores which do not show the expected 1:2 ratio between ST and MT. These are laptop chips, so there is little scope for the clock speeds to be out of spec, and whenever I run Geekbench on a laptop I end up on the higher side of the results, which to me suggests that the higher scores are usually the cleaner runs and hence more representative.
 

Nothingness

Platinum Member
Jul 3, 2013
2,494
874
136
I don't think it's fair to compare scores which do not show the expected 1:2 ratio between ST and MT.
That result is odd, but it's only the MT score that looks off.

These are laptop chips, so there is little scope for the clock speeds to be out of spec, and whenever I run Geekbench on a laptop I end up on the higher side of the results, which to me suggests that the higher scores are usually the cleaner runs and hence more representative.
In fact, the scores are more or less clustered by machine for this i3-7130U CPU. I guess we are seeing the effect of either different memory sticks or BIOS tweaks made by the manufacturer.

But your point is interesting. I measured my CPU and indeed I get better results than others. I have changed the cpufreq policy, which basically forces turbo on all cores by default. This might have a non-negligible impact on Geekbench, which has short tests.
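
For reference, the exact policy knob isn't specified here, but one common way to get that effect on Linux, assuming the standard cpufreq sysfs interface, is to switch every core to the "performance" governor. A minimal sketch (run as root):

Code:
#include <stdio.h>
#include <glob.h>

/* Illustration only: write the "performance" cpufreq governor for every
   core via the standard Linux sysfs interface, which keeps cores at
   their highest available clocks. */
int main(void)
{
    glob_t g;
    if (glob("/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor",
             0, NULL, &g) != 0)
        return 1;

    for (size_t i = 0; i < g.gl_pathc; i++) {
        FILE *f = fopen(g.gl_pathv[i], "w");
        if (f) {
            fputs("performance\n", f);
            fclose(f);
        }
    }
    globfree(&g);
    return 0;
}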

So, as I wrote, let's wait for final chips and more results before drawing conclusions.
 
Reactions: Arachnotronic