Well, if that is the case I guess we'll be stuck with x86 more or less forever, unless something changes. No point taking the penalty of switching ISA if the benefit is a mere 10% performance increase.
If you run Windows, and use software published across many different years, x86 is all you want, either way.
If not that, what's the big deal with switching? It's a pain on unpopular chips, since much software gets packaged so long as it compiles, but if many others have the same chips (or comparable ones), what's this penalty you're worried about? You're not alone in that, but geez, come on over to the FOSS side, and whip yourself up a little desktop or server on an old Mac, or an RPi, or something. It's not that bad. Really.
But what about power efficiency, then? Could a completely new ISA make any gains there? I mean, ARM is chosen for more or less all really low-power devices (smartphones and the like), so I guess its ISA must have an advantage over x86 when it comes to power efficiency? Is it, e.g., easier to implement a power-efficient CPU with the ARM ISA than with the x86 ISA?
Yes, but Intel has been willing to throw lots of R&D at the problem of x86's front end, which historically has had to run at full speed and is inherently low-ILP*. ARM's ISA has several features, most notably the folded shift+add and conditional instructions, that allowed it to offer decent performance without cache, and to use fewer instructions per unit of work than other simple in-order RISCs. RISCs in general, with fixed instruction sizes, also make it easier to build wide decoders. But if you want a wide OOOE CPU that needs to stay 100+ cycles ahead of your RAM, all of that amounts to very little, and RISCs need a bit more cache bandwidth and size (L1D for constants and such, L1I for bigger instructions), so the minor advantages even out.
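To make the shift+add point concrete, here's a toy sketch (Python standing in for assembly; the mnemonics in the comments are illustrative, not exact encodings): forming the address of `a[i]` for 4-byte elements takes one ARM data-processing instruction, but two dependent instructions on a plain RISC without shifted operands.

```python
# Toy register machine illustrating ARM's folded shift: one instruction
# does the work a plain RISC needs two for. Register names and mnemonics
# are illustrative, not real encodings.

regs = {"r1": 0x1000, "r2": 7}   # r1 = base address, r2 = index

# ARM-style:  add r0, r1, r2, LSL #2   (shift folded into the operand)
arm_instructions = 1
regs["r0"] = regs["r1"] + (regs["r2"] << 2)

# Plain in-order RISC: a separate shift, then an add -- two dependent
# instructions occupying two issue slots.
risc_instructions = 2
t0 = regs["r2"] << 2
r0_risc = regs["r1"] + t0

assert regs["r0"] == r0_risc == 0x101C   # same result either way
print(arm_instructions, risc_instructions)  # 1 2
```

Same answer, fewer instructions: on a cacheless in-order core, where every fetch costs you, that's a real win; on a wide OOOE core it's mostly noise.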
ARM has had advantages in their CPU designs, which have cyclically turned into ISA advantages for those kinds of CPUs. But when you're talking about 1GHz+, 2+ wide OOOE CPUs, that's all crap. x86 was designed for low-memory, cost-sensitive devices, and has some benefits and problems from it. ARM was designed for low-power, cost-sensitive devices, and has some pros and cons from that. But neither can make your RAM faster.
POWER, with its virtual memory, or MIPS/SPARC implemented in a way that respects register windows, would have some difficulties. But other than that, what needs to be worked out will differ, more than any one ISA being flat-out superior. And both x86 and ARM carry some legacy from the days of expensive xtors (transistors) and RAM, but both also either did important things pretty well (like virtual memory), or left them open to programmer/compiler interpretation (both allow more than one way to handle memory moving, setting, and copying, for instance).
IoW, the big issue with performance is that changing addresses of DRAM arrays is slow: as they get smaller, R may get much smaller, but L and/or C get relatively larger as it's all packed closer together. Adding more pins for more, narrower channels gets more expensive per unit of bandwidth (and won't use standard DIMMs), and still only offers minor gains, since you won't know which channel the data you need next will be on. The farther ahead of RAM you need to stay (which depends on clock speed and desired IPC), the more cache you'll need, and the better your speculation will need to be (also, a bigger cache than your workload actually needs can allow for more aggressive and/or sloppier prefetching). The actual concepts behind those are surprisingly simple, though far from intuitive; actually implementing them so they can advance every clock with a rolling history, while switching fairly few gates per cycle... frankly, that's impressive as all hell, to me.
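You can put rough numbers on "staying ahead of RAM" with the latency-bandwidth product (Little's law). The figures below are illustrative round numbers, not measurements of any particular part:

```python
# Back-of-the-envelope: how far ahead of DRAM must a core run?
# All numbers are illustrative round figures, not any specific CPU.

mem_latency_ns = 100     # ballpark DRAM round-trip latency
channel_gb_s = 25.6      # one DDR4-3200 channel at peak, in GB/s
line_bytes = 64          # cache line size

# Little's law: to keep the channel busy across that latency, this many
# bytes (and thus cache-line requests) must be in flight at once.
bytes_in_flight = mem_latency_ns * channel_gb_s   # GB/s * ns = bytes
lines_in_flight = bytes_in_flight / line_bytes
print(f"~{lines_in_flight:.0f} outstanding cache lines")  # ~40

# At 4 GHz, 100 ns is 400 cycles; at 4 instructions/cycle, the machine
# must find ~1600 instructions of useful work (or prefetch well enough)
# just to hide one demand miss.
ghz, ipc = 4.0, 4.0
cycles = mem_latency_ns * ghz
print(f"{cycles:.0f} cycles, ~{cycles * ipc:.0f} instructions to hide")
```

None of that arithmetic cares whether the instructions were x86 or ARM, which is the point: the memory wall dwarfs the ISA.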
Many of the older advantages and disadvantages, like register and stack allocation, slow memory copying**, and interlocking between arithmetic and address operations, have largely been either designed away, thanks to research and cheaper xtors, or compiled away, thanks to research and cheaper memory.
* You must decode instruction 1's length before you can start on instruction 2, and 2 before 3, and so on; with ARM, you just have to watch for Thumb, and otherwise decode 32 bits at a time.
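The footnote's dependency structure can be sketched in a few lines. This is a toy model, not a real x86 or ARM decoder: here the first byte of each "instruction" simply encodes its length, standing in for the real prefix/opcode/ModRM dance.

```python
# Toy model of the decode-serialization footnote. Instruction lengths
# are made up; only the dependency structure matters.

def variable_length_starts(stream):
    """x86-style: each instruction's start depends on the previous one's
    decoded length, so finding boundaries is inherently serial."""
    starts, pos = [], 0
    while pos < len(stream):
        starts.append(pos)
        # Real x86 derives length from prefixes/opcode/ModRM/SIB; here
        # the first byte just encodes the length, as a stand-in.
        pos += stream[pos]
    return starts

def fixed_width_starts(stream, width=4):
    """ARM-style (ignoring Thumb): every instruction is `width` bytes,
    so all boundaries are known up front and N decoders can attack N
    instructions in parallel."""
    return list(range(0, len(stream), width))

# Fake byte stream: first byte of each "instruction" is its length.
stream = [3, 0, 0, 1, 2, 0, 4, 0, 0, 0]
print(variable_length_starts(stream))   # [0, 3, 4, 6]
print(fixed_width_starts(bytes(8)))     # [0, 4]
```

The `while` loop's data dependence (each `pos` comes from the previous iteration) is exactly what hardware length-decoders have to break with speculation or brute force; the fixed-width version has no loop-carried dependence at all.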
** Well, this isn't a problem that's gone away, but it's now a software/hardware optimization problem, not a "these CPUs are really bad at this because of the ISA" thing.