Question ARM 2023 IP CPU/GPU news - Cortex X4/A720/A520 + Immortalis G720/Mali G720/G620

soresu

Platinum Member
Dec 19, 2014
2,888
2,098
136
ARM claims X4 has 15% better performance with 40% better power efficiency.

I can't find anything about performance improvements for A720 and A520, just claims of 20% better power efficiency for A720 and 22% better power efficiency for A520.

😐

Well I don't enjoy getting egg on my face, but it looks like that's where we are regardless.
 

soresu

Last point, all the new CPU cores are ARMv9.2-A ISA, stepping up from the v9.0-A of the previous 2 generations.
 

soresu

Apparently an area optimised variant of A720 will take up the same die area as A78 while offering 10% more performance at iso process and frequency - should go down well for the budget performance options.

Here are some slides from WikiChip for the X4:



 
Reactions: ikjadoon

Abwx

Lifer
Apr 2, 2011
11,143
3,840
136
ARM claims X4 has 15% better performance with 40% better power efficiency.

I can't find anything about performance improvements for A720 and A520, just claims of 20% better power efficiency for A720 and 22% better power efficiency for A520.

😐

Well I don't enjoy getting egg on my face, but it looks like that's where we are regardless.

If the comparison is made at different nodes then 15% better perf is not that impressive; that's what they would get for, say, a 5-6nm to 4nm shrink at the same uarch.

Edit: 10% better perf at iso-process is doubtful; it's more likely 10% better perf/watt at iso-process and iso-perf.
 

soresu

If the comparison is made at different nodes then 15% better perf is not that impressive; that's what they would get for, say, a 5-6nm to 4nm shrink at the same uarch.
Quoted from the bottom of the X4 announcement relative to the 15% perf bump over X3:

Performance claims are for SPECRate®2017_int_base. Comparing Peak SPECRate®2017_int_base performance for Cortex-X3-based Android flagship device shipping as of March 2023 vs Cortex-X4: 2MB L2, 8MB L3, 3.4GHz, 100ns latency
 

soresu

There also seems to be considerable attention paid to security features in this generation:

Through TCS23, Arm remains committed to evolving platform security through new advanced technologies and techniques to increase security assurance. TCS23 is designed to support the Android Virtualization Framework (AVF), which was introduced with Android 13, as one of its key security features. AVF, which is only supported on ARM64-based devices, provides secure and private execution environments for executing code. This is ideal for advanced use cases that require stronger security and privacy assurance to user data.


For Pointer Authentication (PAC) and Branch Target Identification (BTI), which work together to improve control flow integrity by eliminating almost all ROP and JOP attacks, we managed to reduce the performance cost associated with both security features, so it is negligible for the new Cortex-X4 and Cortex-A720 CPU cores. Moreover, through PAC enhancements, including the new QARMA3 algorithm, the performance impact of PAC and BTI is now reduced to less than one percent for Cortex-A520 CPU cores.
 

Abwx

Quoted from the bottom of the X4 announcement relative to the 15% perf bump over X3:

I'll look at the details, but a bigger uarch providing 10% better perf at iso-frequency will inherently consume at least 10% more power; there's no miracle.

Edit: they are talking about a tape-out on the N3E process, so I guess those figures come from an N3E test chip. They also mention reduced leakage, which can only be obtained with a new process compared to the reference:



 
Reactions: Tlh97 and soresu

soresu

I'll look at the details, but a bigger uarch providing 10% better perf at iso-frequency will inherently consume at least 10% more power; there's no miracle.
I didn't imply otherwise.

I think you confused my comment about the area optimised A720 with X4.



Comparing Arm Cortex-A720 "area optimized" SPECint_base2006 performance and Cortex-A78. Cortex-A720 using 32KB L1, 128KB L2, 2MB L3 and Cortex-A78 using 32KB L1, 256KB L2, 2MB L3 (iso-process, iso-frequency).
Edit: Though to be sure there are no claims of a power advantage made here, only area parity and perf superiority over A78.
 

Abwx

I didn't imply otherwise.

I think you confused my comment about the area optimised A720 with X4.

View attachment 81156


Edit: Though to be sure there are no claims of a power advantage made here, only area parity and perf superiority over A78.

Actually they state better perf at the same frequency and process for the A720 efficiency core; I thought it was a statement for all cores, including the X one.




 

Lodix

Senior member
Jun 24, 2016
340
116
116
@Abwx If you read the article, it will be much easier to understand, and much better than trying to guess things on your own.

ARM usually makes their numbers/presentations very similar every year.

They mostly compare things at ISO conditions because their architecture is process-agnostic, and each client (Qualcomm, Samsung, Mediatek) will use a different configuration.

Other examples compare the whole implementation of one year versus the next one. In this case, the 15% single-core performance increase for the X4 is for a configuration of 3.4GHz and 2MB L2 cache compared to the Mediatek or Qualcomm flagship of 2023, measured in SPECRate 2017_int_base.

The TSMC 3nm tape-out is just an announcement, a demonstration of their partnership. ARM sells "packages" of their IP already implemented with different configurations on different process nodes. But no comparison was made with a 3nm SoC.
 

Attachments

  • Arm Client Tech Days CPU Presentation_Final-22.png
    646.6 KB · Views: 13
  • Arm Client Tech Days CPU Presentation_Final-09.png
    1.7 MB · Views: 15

Lodix

Mediatek and Samsung did. I'm not sure about Qualcomm, but probably.

Sent from my SM-S918B using Tapatalk
 

dark zero

Platinum Member
Jun 2, 2015
2,655
138
106
Mediatek and Samsung did. I'm not sure about Qualcomm, but probably.

Sent from my SM-S918B using Tapatalk
Qualcomm did it too.

Actually...
- X4 seems interesting; it's the only one that got real performance improvements.
- A720 seems to be the new "small" core, since it only improved efficiency.
- A520 only gets nerfs (one less ALU) and claims better energy consumption... which doesn't seem so good after all. What's more, it's recommended to use only 2 of them.

Seems that the A520 "small" core has been handed a death sentence.
 
Reactions: Tlh97 and NTMBK

Abwx

@Abwx If you read the article, it will be much easier to understand, and much better than trying to guess things on your own.

ARM usually makes their numbers/presentations very similar every year.

They mostly compare things at ISO conditions because their architecture is process-agnostic, and each client (Qualcomm, Samsung, Mediatek) will use a different configuration.

Other examples compare the whole implementation of one year versus the next one. In this case, the 15% single-core performance increase for the X4 is for a configuration of 3.4GHz and 2MB L2 cache compared to the Mediatek or Qualcomm flagship of 2023, measured in SPECRate 2017_int_base.

The TSMC 3nm tape-out is just an announcement, a demonstration of their partnership. ARM sells "packages" of their IP already implemented with different configurations on different process nodes. But no comparison was made with a 3nm SoC.

Thanks for the tips, that makes sense, but at the same time they state in the first slide you linked that these projections are pre-silicon estimates, so they're not based on actual physical implementations on test chips using the same processes as previous designs.

Indeed, the new uarch is undoubtedly quite a bit bigger, and I'm not sure it would make sense to use previous processes; if anything was using, say, a 5nm node, it would be logical to implement the new designs on 3nm, hence the 3nm tape-out to get functional SDKs for future products.

IIRC Arm uses Synopsys and Cadence design flows for its uarches, which should cover TSMC's 3nm, as stated, and Samsung's equivalent node, which is mainly for in-house designs but has also been used by some other SoC designers.
 

ikjadoon

Member
Sep 4, 2006
126
194
126
Not sure why so many are down on the A520. Arm released its iso-node perf/W graphs.

These cores' only goal is ultra-low-power and the tiniest space possible. They need to get licensed 1000s of times in $20 - $40 devices. I tend to think of them as "always-on co-processors for workloads without dedicated silicon".



The headline iso-node #s have also been released for the A720.

Iso-node
A520: +8% perf at same power; -22% power at same perf (vs A510)
A720: +4.5% perf at same power; -20% power at same perf (vs A715)

These are micro-architectural + cache efficiency improvements and not from TSMC N3E. To me, these are significant efficiency improvements in a single generation.
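
As a rough sketch of how those two iso-node figures translate into perf/W, here is the arithmetic in Python. This is my own back-of-the-envelope calculation, not something Arm published, and the function names are made up for illustration.

```python
def perf_per_watt_gain_same_power(perf_gain):
    """Perf/W gain when perf rises by perf_gain and power is unchanged."""
    return (1.0 + perf_gain) / 1.0 - 1.0


def perf_per_watt_gain_same_perf(power_reduction):
    """Perf/W gain when power drops by power_reduction and perf is unchanged."""
    return 1.0 / (1.0 - power_reduction) - 1.0


# Headline iso-node figures quoted above, written as fractions.
cores = {
    "A520 vs A510": {"perf_gain": 0.08, "power_reduction": 0.22},
    "A720 vs A715": {"perf_gain": 0.045, "power_reduction": 0.20},
}

for name, figures in cores.items():
    at_power = perf_per_watt_gain_same_power(figures["perf_gain"])
    at_perf = perf_per_watt_gain_same_perf(figures["power_reduction"])
    print(f"{name}: {at_power:+.1%} perf/W at same power, "
          f"{at_perf:+.1%} perf/W at same perf")
```

The two results per core differ because they are taken at different points on the perf/power curve: a 20% power cut at the same perf works out to 25% better perf/W, while +4.5% perf at the same power is only 4.5% better perf/W.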
 

hemedans

Senior member
Jan 31, 2015
207
102
116
Not sure why so many are down on the A520. Arm released its iso-node perf/W graphs.

These cores' only goal is ultra-low-power and the tiniest space possible. They need to get licensed 1000s of times in $20 - $40 devices. I tend to think of them as "always-on co-processors for workloads without dedicated silicon".



The headline iso-node #s have also been released for the A720.

Iso-node
A520: +8% perf at same power; -22% power at same perf (vs A510)
A720: +4.5% perf at same power; -20% power at same perf (vs A715)

These are micro-architectural + cache efficiency improvements and not from TSMC N3E. To me, these are significant efficiency improvements in a single generation.
A510 was a huge downgrade compared to A55.

So A520 being better than A510 means nothing; it should be more efficient than A55 so that SoC makers would use it.
 

NTMBK

Lifer
Nov 14, 2011
10,264
5,117
136
Not sure why so many are down on the A520. Arm released its iso-node perf/W graphs.

These cores' only goal is ultra-low-power and the tiniest space possible. They need to get licensed 1000s of times in $20 - $40 devices. I tend to think of them as "always-on co-processors for workloads without dedicated silicon".



The headline iso-node #s have also been released for the A720.

Iso-node
A520: +8% perf at same power; -22% power at same perf (vs A510)
A720: +4.5% perf at same power; -20% power at same perf (vs A715)

These are micro-architectural + cache efficiency improvements and not from TSMC N3E. To me, these are significant efficiency improvements in a single generation.
Because we fully expect "8 core" Android phones using 8 A520 cores and nothing else. It's just too damn slow.
 
Reactions: Tlh97 and hemedans

ikjadoon

A510 was a huge downgrade compared to A55.

So A520 being better than A510 means nothing; it should be more efficient than A55 so that SoC makers would use it.

From the data Arm shared, the A510 isn't less efficient than the A55. At the same perf, A510 uses 20% less power than the A55. It extends perf at higher clocks & worse power, but clocks would seemingly be an OEM choice (and has some power benefits, according to Andrei & Arm vis a vis stopping workloads from jumping to a higher voltage plan on the middle-core cluster).



I've seen Geekerwan's data, but he's showing entire platform power, which seemingly adds other power draw from Qualcomm/MediaTek or even Arm, but unrelated to the A510 compute cores.

So it's odd that Arm shows an improvement in perf / W, but others seemingly (though not conclusively) show a regression. Too high clocks? Worse cache?

For example, from Geekerwan & AnandTech, compare the dark blue (Qualcomm 8G1's A510 @ 1.8 GHz w/ ?? KB L2$ on Samsung 4LPE) vs the light blue (Qualcomm 888's 4x A55 @ 1.8 GHz w/ 128KB L2$ on Samsung 5LPE). Unsure why Qualcomm refused to disclose the L2 cache on the A510, but they did implement the merged cores. Eerily, the only L2 cache option below 128KB for the A510 is "0KB" L2 cache, but...surely Qualcomm isn't that insane.

A510 L2 cache arrangements: Optional, 128KB, 192KB, 256KB, 384KB, 512KB

The platform power draw of the A510 on the 8G1 is noticeably higher for the same perf vs the A55 on the 888.

However, if Geekerwan's methodology lines up and Qualcomm didn't go postal on the A510 L2 cache, I'd agree there was a regression.


//

Because we fully expect "8 core" Android phones using 8 A520 cores and nothing else. It's just too damn slow.

Right, I sympathize there. And perhaps only 4x merged cores, at that.

But these A510 cores aren't really designed to solely power an entire modern smartphone: that seems like OEMs' penny-pinching more than Arm's mistake, as there are A700-class cores that are a lot closer in perf to what Apple delivers in its cores.
 
Reactions: Tlh97 and hemedans

hemedans

From the data Arm shared, the A510 isn't less efficient than the A55. At the same perf, A510 uses 20% less power than the A55. It extends perf at higher clocks & worse power, but clocks would seemingly be an OEM choice (and has some power benefits, according to Andrei & Arm vis a vis stopping workloads from jumping to a higher voltage plan on the middle-core cluster).



I've seen Geekerwan's data, but he's showing entire platform power, which seemingly adds other power draw from Qualcomm/MediaTek or even Arm, but unrelated to the A510 compute cores.


Talking about Geekerwan, Dr Ian also uploaded a video about the SD 8 Gen 2 with an improved version of Andrei's graph from the AnandTech reviews.


And this is Andrei's graph:


In Dr Ian's video there's not much difference between A55 and A510, but someone told me that in Andrei's graph we should look at watts instead of joules.
 
Reactions: ikjadoon

BorisTheBlade82

Senior member
May 1, 2020
667
1,022
136
In Dr Ian's video there's not much difference between A55 and A510, but someone told me that in Andrei's graph we should look at watts instead of joules.
Then these people don't know what they are talking about. Using joules consumed and performance for the axes is absolutely the right thing to do and the watts bubbles are of much less relevance.
In a fixed workload like SPEC joules give you a simple and undisputed answer about the energy efficiency.

BTW
I did something similar with CBR23 and you can find it in this very forum: http://www.portvapes.co.uk/?id=Latest-exam-1Z0-876-Dumps&exid=thread...-efficiency-of-x86-cpu-architectures.2597905/
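
To make the joules-versus-watts point concrete, here is a minimal sketch with made-up numbers. The two cores, their average power, and their runtimes are hypothetical, not measurements from any of the reviews mentioned in this thread.

```python
def energy_joules(avg_power_watts, runtime_seconds):
    """Energy used to finish a fixed workload: average power times runtime."""
    return avg_power_watts * runtime_seconds


# Hypothetical runs of the same fixed workload on two cores.
runs = {
    "core A": {"avg_power_watts": 0.30, "runtime_seconds": 100.0},
    "core B": {"avg_power_watts": 0.40, "runtime_seconds": 60.0},
}

for name, run in runs.items():
    joules = energy_joules(run["avg_power_watts"], run["runtime_seconds"])
    print(f"{name}: {run['avg_power_watts']:.2f} W for "
          f"{run['runtime_seconds']:.0f} s = {joules:.1f} J")
```

In this example core B draws more watts but finishes sooner and ends up using fewer joules, so ranking the cores by watts alone would pick the wrong one for a fixed workload.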
 