Future ARM Cortex + Neoverse µArchs Discussion


FlameTail

Diamond Member
Dec 15, 2021
4,094
2,465
106
Will consumer ARM CPUs (Cortex, Apple Silicon, Oryon) ever get 256 bit vector SIMD? If so, when?
 

soresu

Diamond Member
Dec 19, 2014
3,273
2,549
136
Will consumer ARM CPUs (Cortex, Apple Silicon, Oryon) ever get 256 bit vector SIMD? If so, when?
Not in a single cycle, but in total ops yes.

X1 has 4x 128 bit NEON units as did Apple's equivalent at the time.

Vector SIMD is also pretty much redundant language I believe.

SIMD is basically identical math operations on multiple numbers/data in 1 instruction.
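
To put rough numbers on that, here is a minimal sketch in plain Python. It is a toy model and not how NEON is actually programmed; the function names are made up for illustration, and the "2x 256-bit" design is purely hypothetical. It only shows the lane idea and why four 128-bit units can match a 256-bit design in total work per cycle.

```python
# Toy model of SIMD: one "instruction" applies the same operation to every lane.
# All names here are made up for illustration.

def lanes(vector_bits, element_bits):
    """Number of elements that fit in one vector register."""
    return vector_bits // element_bits

def simd_add(a, b):
    """Model of a single vector add: the same add on each pair of lanes."""
    if len(a) != len(b):
        raise ValueError("both vectors need the same number of lanes")
    return [x + y for x, y in zip(a, b)]

def peak_bits_per_cycle(units, vector_bits):
    """Peak vector bits processed per cycle if every unit is kept busy."""
    return units * vector_bits

print(lanes(128, 32))                            # 32-bit lanes in a 128-bit register
print(simd_add([1, 2, 3, 4], [10, 20, 30, 40]))  # one add across four lanes
print(peak_bits_per_cycle(4, 128))               # 4x 128-bit units, as quoted for X1
print(peak_bits_per_cycle(2, 256))               # hypothetical 2x 256-bit design
```

The last two lines come out equal, which is the "in total ops yes" point: the peak is the same, it just takes more instructions in flight to reach it.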

No idea what the Oryon/Phoenix µArch specifics are at this point; we will probably find out at this year's Hot Chips.
 

naukkis

Senior member
Jun 5, 2002
930
808
136
Simply not feasible without destroying clock frequency scaling.

It only becomes feasible if the ISA splits the register file to support that many instructions per clock. Only RISC-V supports such a scheme, at least to some extent.
 
Reactions: lightmanek
Jul 27, 2020
20,419
14,087
146
Even worse would be power consumption.
They could use a thermoelectric thingamajig to use the heat from one core to power another core

 

SarahKerrigan

Senior member
Oct 12, 2014
735
2,035
136
Even worse would be power consumption. And the fact you'd have to predict 3 or 4 branches per cycle on average...

Just elide short branches to predicates to increase median BB length to 8-12 or higher, and software-pipeline the crap out of loops as a source of cheap low-branch ILP. Foolproof. Worked great last time.

Well, except when it didn't.
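
As a back-of-the-envelope check on the branch numbers, here is a rough sketch. It assumes every basic block ends in exactly one branch and that the machine sustains its full width every cycle, both of which are simplifications. The 16-wide core and the basic-block length of 5 are assumed figures for illustration, not measurements of any real core.

```python
# Rough estimate: branches the front end must predict per cycle.
# Assumes one branch per basic block and full-width sustained throughput.

def branches_per_cycle(width, avg_bb_len):
    """Instructions per cycle divided by instructions per basic block."""
    if avg_bb_len <= 0:
        raise ValueError("basic block length must be positive")
    return width / avg_bb_len

WIDTH = 16  # hypothetical very wide core

for bb_len in (5, 8, 12):  # 5 is an assumed baseline; 8 and 12 are the range quoted above
    rate = branches_per_cycle(WIDTH, bb_len)
    print(f"BB length {bb_len:2d}: about {rate:.2f} branches per cycle")
```

Under these assumptions, stretching basic blocks from 5 to 12 instructions cuts the prediction rate from a bit over 3 per cycle to a bit over 1, which is the appeal of predication on paper.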
 

FlameTail

Diamond Member
Dec 15, 2021
4,094
2,465
106
If going wider is off the table, then what are ARM CPU makers gonna do with their flagship cores to get significant increases in IPC?

ARM Cortex X4 : 10 wide
Apple A17 P : 9 wide
Qualcomm Oryon : 8 wide (guess)

These ARM CPUs are already much wider than x86 rivals.
 

soresu

Diamond Member
Dec 19, 2014
3,273
2,549
136
If going wider is off the table, then what are ARM CPU makers gonna do with their flagship cores to get significant increases in IPC?
ARM got a lot out of 4 wide µArch, as have AMD.

It's far too early to talk about what they all need to do in the future given a much wider foundation to build from.

Likewise Apple were pretty stable at 6 wide for years and got good gains for a while.
 

SarahKerrigan

Senior member
Oct 12, 2014
735
2,035
136
If going wider is off the table, then what are ARM CPU makers gonna do with their flagship cores to get significant increases in IPC?

ARM Cortex X4 : 10 wide
Apple A17 P : 9 wide
Qualcomm Oryon : 8 wide (guess)

These ARM CPUs are already much wider than x86 rivals.

Prefetch
Branch prediction
Cache capacity and performance
Functional unit mix and latencies
Structure sizes
Improved fusion
Multithreading?

You know, nothing much. Clearly, going 130-wide is all that's needed.
 

naukkis

Senior member
Jun 5, 2002
930
808
136
The main bottleneck for going wider is the register file. If the whole register file is general purpose for all instructions, going wider needs more ports into that big, fat, uniform register file. The solution is simple: split the register file so that not every instruction has access to all of it.

For the hardware, that means the ALUs and register files become independent of each other, and adding more such separate complexes to the execution core hardly increases its complexity at all. For software, it means a thread has sub-threads with fast access to their own part of the registers and somewhat slower interaction with the other parts of the register file.

RISC-V has given some thought to this register file splitting scheme. So far all RISC-V hardware is still built for the unified register file model, but RISC-V at least offers a pathway to split register file designs too.
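
A rough sketch of the port argument, under stated assumptions: each instruction reads two sources and writes one destination, and register file area grows with entries times ports squared (a common first-order rule of thumb, not a figure for any real design). The helper names are hypothetical, and the model ignores the extra cost of moving values between clusters.

```python
# First-order model of register file cost, unified vs split into clusters.
# Assumptions: 2 reads + 1 write per instruction; area ~ entries * ports^2.

def ports(width, reads=2, writes=1):
    """Ports needed so `width` instructions can access the file each cycle."""
    return width * (reads + writes)

def relative_area(entries, n_ports):
    """Unitless area estimate under the ports-squared rule of thumb."""
    return entries * n_ports ** 2

def unified(width, entries):
    """One register file visible to every instruction."""
    p = ports(width)
    return p, relative_area(entries, p)

def split(width, entries, clusters):
    """Registers and issue width divided evenly across independent clusters."""
    p = ports(width // clusters)
    return p, clusters * relative_area(entries // clusters, p)

print(unified(8, 128))   # (ports per file, relative area)
print(split(8, 128, 4))  # same width and entries, four clusters
```

With these numbers the split design needs 6 ports per file instead of 24. The catch the model leaves out is the slower cross-cluster traffic mentioned above.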
 

FlameTail

Diamond Member
Dec 15, 2021
4,094
2,465
106
3 more months to go until Cortex X5 launch.

There was a leaked GB6 score of the D9400 with X5 scoring only 2700 in ST. That's disappointing, and it doesn't sound like performance befitting the title of "custom-core killer".

Even if the performance is not gonna catch up to Apple (assuming the leak holds true), I'd still call it a W if ARM can match or exceed Apple's efficiency.

Because, while X4 brought a substantial performance improvement, they sold their efficiency to the devil to do so.
 

soresu

Diamond Member
Dec 19, 2014
3,273
2,549
136

Repository containing architecture details of cores.

Cortex X4 is 10 wide executor and 10 wide dispatcher.

The widest core in this list.

Crazy.

Will Blackhawk (Cortex X5) go even wider?
IMHO unlikely.

When we saw A72 -> A73 it was Austin -> Sophia and we saw a 'regression' in core width from 3 wide to 2 wide, but a more efficient core overall, and Sophia got even more still out of 2 wide with A75.

If X5/Blackhawk is truly Sophia's ground-up core then I would not be surprised to see a similar change back to a 7-9 wide design, or at worst no change from 10 wide.

I would hope also that the A730/Chaberton core is a significant leap in perf/watt over A720 given how meager the improvements have been to their big A cores of late - as it would be a shame to catch Apple on the high end only to lose miserably to them on the low end.
 
Reactions: Tlh97

soresu

Diamond Member
Dec 19, 2014
3,273
2,549
136
They could use a thermoelectric thingamajig to use the heat from one core to power another core
Sadly thermoelectric efficiency is truly terrible for either cooling or power generation - a problem that NASA has been seeking a definitive solution to since likely before any of us were born.

If it were even just as efficient as a modern steam turbine generator system then radioisotope thermoelectric generators (RTGs) for Mars rovers could be at least 3 times lower volume/mass for the same output wattage.
 

naukkis

Senior member
Jun 5, 2002
930
808
136
IMHO unlikely.

When we saw A72 -> A73 it was Austin -> Sophia and we saw a 'regression' in core width from 3 wide to 2 wide, but a more efficient core overall, and Sophia got even more still out of 2 wide with A75.
If I remember correctly they increased decode width to 3 with A75.
 