The Official AVX2 Thread

beginner99 · Jun 16, 2012

beginner99 said:
Noob QUestion:

If AVX2 really is such a killer, why hasn't it been implemented like 10 years ago?

ShintaiDK said:
Same reason we didnt have SB/IB 10 years ago.

And that reason is?

The reason for not having an SB 10 years ago is process technology. 4 cores with integrated GPU ain't really great on like 130 nm.

However if AVX2 increases performance 3x or more (without using 3x more die space) one would think it should have been implemented a very long time ago?

bronxzv · Jun 16, 2012

piesquared said:
Where's the official HSA and OpenCl threads?

here is one OpenCL thread: http://forums.anandtech.com/showthread.php?t=2251392

it looks like it's a case of Open CL using mostly the CPU, one more example showing well that we must be very careful when comparing native code on the CPU with Open CL implementations

Magic Carpet · Jun 16, 2012

beginner99 said:
However if AVX2 increases performance 3x or more (without using 3x more die space) one would think it should have been implemented a very long time ago?

GPU competition?

ShintaiDK · Jun 16, 2012

beginner99 said:
And that reason is?

The reason for not having an SB 10 years ago is process technology. 4 cores with integrated GPU ain't really great on like 130 nm.

However if AVX2 increases performance 3x or more (without using 3x more die space) one would think it should have been implemented a very long time ago?

I thought it was because we didnt have the design knowledge at the time.

Maybe you should call all the other companies that makes different products too. They obviously also wasted alot of time on middle designs.

Martimus · Jun 16, 2012

CPUarchitect said:
Only one of these technologies will prevail. There is no room for both since both attempt to cover the need for general purpose throughput computing. History shows that incompatible competing technologies cannot coexist. Think about AMD64 versus IA64: Itanium is practically dead. Think about 3DNow! versus SSE: Bulldozer no longer supports 3DNow!.

So the question now is which is the superior throughput computing technology: homogeneous AVX2+ or heterogeneous GPGPU? And yes, both companies will support both for a while, but they have a different idea of what to focus on. There's a lot at stake for AMD since it's sacrificing CPU performance to make the GPU more powerful, in an attempt to make GPGPU more attractive. Not just that, it's also sacrificing graphics performance. As illustrated by NVIDIA's Fermi and Kepler, graphics and GPGPU require different architectures. HSA leans very much toward GPGPU, which compromises graphics.

Intel doesn't make any sacrifices. It already has a superior CPU architecture and it will be the first to add high throughput performance to it using AVX2. Even when AMD implements AVX2, there will still be a big difference in computing density because of Bulldozer's shared SIMD cluster architecture. There's also no sign of Intel sacrificing graphics performance for the sake of GPGPU. And last but definitely not least, AVX2 is much easier to adopt by developers than GPGPU, and will offer more consistent performance across system configurations.

Easier, yes, but it will never be easy. In fact heterogeneous computing becomes harder when things scale up. So they're fighting an uphill battle. The only way to guarantee that it doesn't suffer from bad latency and bandwidth scaling, is to fully merge the GPU technology into the CPU. And that's what AVX2 already does!

It's no coincidence that Intel's Knights Corner chip, which is pretty much a GPU architecture (minus the graphics components), uses an instruction set that has a very close resemblance to AVX2.

So it's inevitable that things will converge into a single architecture. All general purpose computing will happen on the CPU. The GPU either has to become fully focused on graphics, or the programmable shaders too get processed on the CPU and the GPU decays into some fixed-function units that act as peripheral components which assist the CPU in graphics processing.

AMD desperately wants the CPU and GPU to remain heterogeneous, but in doing so it ironically converges them closer together, making the case for AVX2 and its successors.

I completely disagree. The two are very different and aren't competing at all. They even work together. I just don't understand the AVX2 or Open CL talks, since Open CL is a method for many different processors to use the same code, while AVX2 is just an Intel construct that Open CL can use. Maybe someone can explain why they feel the two things are competing?

denev2004 · Jun 16, 2012

ShintaiDK said:
I thought it was because we didnt have the design knowledge at the time.

Maybe you should call all the other companies that makes different products too. They obviously also wasted alot of time on middle designs.

I guess the ability of designing DLP software could also be an issue..

denev2004 · Jun 16, 2012

Martimus said:
I completely disagree. The two are very different and aren't competing at all. They even work together. I just don't understand the AVX2 or Open CL talks, since Open CL is a method for many different processors to use the same code, while AVX2 is just an Intel construct that Open CL can use. Maybe someone can explain why they feel the two things are competing?

AVX represents a simple vector way by applying it to normal x86 CPU to enhance theoretical performance. OpenCL can use it but mostly OpenCL refer to the way the nonhomogeneous way AMD use.

Martimus · Jun 16, 2012

If it is an either or thing, which it is not, then I would expect Open CL to take greater hold than AVX2. Open CL can be run on millions of current processors and devices while AVX2 isnt even available yet and will only run on future devices. We have seen how such things have fared in the past.

Ofcourse the two arent even competing in the same space so that isn't an issue.

CPUarchitect · Jun 16, 2012

Martimus said:
I completely disagree. The two are very different and aren't competing at all. They even work together. I just don't understand the AVX2 or Open CL talks, since Open CL is a method for many different processors to use the same code, while AVX2 is just an Intel construct that Open CL can use. Maybe someone can explain why they feel the two things are competing?

OpenCL isn't even mentioned in my post. And that's because the discussion is not about AVX2 against OpenCL at all. It's about homogeneous versus heterogeneous throughput computing. That said, AVX2 is homogeneous and OpenCL is aimed at heterogeneous computing.

But OpenCL will likely become a victim of AVX2. Homogeneous computing devices can run heterogeneous code, but heterogeneous ones can't run all homogeneous code. Basically, OpenCL is an API and language specification which imposes necessary limitations so it can run on a GPU. Since homogeneous computing with AVX2 doesn't have the GPU's limitations, it also doesn't require OpenCL.

Of course nothing is preventing developers to continue using OpenCL as an abstraction layer for homogeneous throughput computing. But I don't think that's going to happen. AVX2 can be used to accelerate existing programming languages. So there's no need for developer to take a detour and rewrite code for OpenCL.

Hence you shouldn't think of it as AVX2 versus OpenCL. OpenCL is not the enemy of AVX2 in any way. But it will become redundant once the dominant throughput computing technology turns out to be homogeneous instead of heterogeneous.

CPUarchitect · Jun 16, 2012

Martimus said:
If it is an either or thing, which it is not, then I would expect Open CL to take greater hold than AVX2. Open CL can be run on millions of current processors and devices while AVX2 isnt even available yet and will only run on future devices. We have seen how such things have fared in the past.

The past doesn't support that expectation. For instance AMD's 3DNow! capable CPUs were released more than a year before Intel added support for SSE. But SSE became the dominant SIMD technology, simply because it was superior. AMD ended up also implementing SSE, and with Bulldozer the 3DNow! support was removed entirely.

Likewise we're still in the early days of throughput computing technology. In fact AMD's HSA roadmap won't even be finished before 2014. They need that technology to make GPGPU computing more widely applicable. AVX2 will arrive in 2013, and is more versatile than HSA will ever be.

bronxzv · Jun 16, 2012

deleted

Martimus · Jun 16, 2012

CPUarchitect said:
The past doesn't support that expectation. For instance AMD's 3DNow! capable CPUs were released more than a year before Intel added support for SSE. But SSE became the dominant SIMD technology, simply because it was superior. AMD ended up also implementing SSE, and with Bulldozer the 3DNow! support was removed entirely.

Likewise we're still in the early days of throughput computing technology. In fact AMD's HSA roadmap won't even be finished before 2014. They need that technology to make GPGPU computing more widely applicable. AVX2 will arrive in 2013, and is more versatile than HSA will ever be.

3D Now was a AMD only thing, and only a small percentage of processors supported it. OpenCL is supported by nearly every modern processor and GPU, so a very large percentage of systems support it. In this situation Intel is in the position of weakness like AMD was with 3D Now.

Also MMX was released before 3D Now. Neither of those extensions actually gained any traction though.

Also, AMD is by no means the driving force behind Open CL. I find this AMD vs. Intel debate of Open CL to be hilarious. Both Intel and AMD will support both AVX2 and Open CL. The major difference is that current processors from both camps already support Open CL and current GPUs from both nVidia and AMD support Open CL.

podspi · Jun 16, 2012

CPUarchitect said:
But OpenCL will likely become a victim of AVX2. Homogeneous computing devices can run heterogeneous code, but heterogeneous ones can't run all homogeneous code.

Wouldn't it be the opposite?

Homogeneous computing devices can only run the code designed for them.

For example, as far as I know, an x86-CPU cannot perform, in Windows, hardware accelerated graphics. An APU (heterogeneous device) can.

CPUarchitect · Jun 16, 2012

Martimus said:
3D Now was a AMD only thing, and only a small percentage of processors supported it. OpenCL is supported by nearly every modern processor and GPU, so a very large percentage of systems support it. In this situation Intel is in the position of weakness like AMD was with 3D Now.

Once again you're making this about AVX2 versus OpenCL, which it isn't. But even if it was, "supporting" OpenCL doesn't mean a thing. The GTX 680 supports OpenCL but it loses against a quad-core CPU, even before AVX2. And the average GPU is much weaker.

It's really about whether homogeneous or heterogeneous throughput computing is superior. And for heterogeneous computing to have any chance at all, it needs affordable GPUs to have higher efficiency. This requires support for things like a uniform address space and preemption. These features won't be available until 2014. CPUs with AVX2 support will arrive a year earlier.

Last but not least, Intel has an 80% market share, so they're definitely not in a position of weakness. In the not too distant future, 80% of all new x86 systems will have great homogeneous throughput computing performance. Sooner or later AMD will have to implement AVX2 as well and it will be 100% of all systems. Meanwhile how many systems will have worthy heterogeneous computing support? The desktop Haswell chip will be limited to a GT2, which means the CPU with AVX2 will have more raw computing power than the iGPU. And that's without taking into account that the CPU has far fewer bottlenecks, and is much easier to develop for.

Also MMX was released before 3D Now. Neither of those extensions actually gained any traction though.

MMX was used extensively in multimedia codecs and audio drivers. It was a monumental success.

Martimus · Jun 16, 2012

CPUarchitect said:
Once again you're making this about AVX2 versus OpenCL, which it isn't. But even if it was, "supporting" OpenCL doesn't mean a thing. The GTX 680 supports OpenCL but it loses against a quad-core CPU, even before AVX2. And the average GPU is much weaker.

It's really about whether homogeneous or heterogeneous throughput computing is superior. And for heterogeneous computing to have any chance at all, it needs affordable GPUs to have higher efficiency. This requires support for things like a uniform address space and preemption. These features won't be available until 2014. CPUs with AVX2 support will arrive a year earlier.

Last but not least, Intel has an 80% market share, so they're definitely not in a position of weakness. In the not too distant future, 80% of all new x86 systems will have great homogeneous throughput computing performance. Sooner or later AMD will have to implement AVX2 as well and it will be 100% of all systems. Meanwhile how many systems will have worthy heterogeneous computing support? The desktop Haswell chip will be limited to a GT2, which means the CPU with AVX2 will have more raw computing power than the iGPU. And that's without taking into account that the CPU has far fewer bottlenecks, and is much easier to develop for.

MMX was used extensively in multimedia codecs and audio drivers. It was a monumental success.

I have a computer that can run Open CL right now. I am pretty sure you will too. I wont have one that will run AVX2 for who knows how long. My agency sure won't write code for AVX2 if it means they need to upgrade every single computer in 2013 to make use of it. The fact that practically everything built within the last few years will bet able to run Open CL code and make use of it would make companies far more likely to code for it. 80% of new computers is one thing, but that is still a rather small percentage of total userbase.

CPUarchitect · Jun 16, 2012

podspi said:
Wouldn't it be the opposite?

Homogeneous computing devices can only run the code designed for them.

For example, as far as I know, an x86-CPU cannot perform, in Windows, hardware accelerated graphics. An APU (heterogeneous device) can.

A CPU can execute absolutely anything. Including advanced graphics, e.g. through SwiftShader. It even runs Crysis if you want, and variants of it power WebGL in Google Chrome and Adobe's Stage3D for Flash when your GPU isn't adequate.

AVX2 will massively improve the CPU's ability to perform tasks the GPU used to be better at. In particular the gather support means the CPU will no longer have to read non-sequential memory elements one at a time, but can read 8 of them with a single instruction.

denev2004 · Jun 16, 2012

podspi said:
Wouldn't it be the opposite?

Homogeneous computing devices can only run the code designed for them.

For example, as far as I know, an x86-CPU cannot perform, in Windows, hardware accelerated graphics. An APU (heterogeneous device) can.

I think nether would be true. OpenCL is design to use JIT to make most modern CPU or GPU can run OpenCL program

CPUarchitect · Jun 17, 2012

Martimus said:
I have a computer that can run Open CL right now. I am pretty sure you will too. I wont have one that will run AVX2 for who knows how long. My agency sure won't write code for AVX2 if it means they need to upgrade every single computer in 2013 to make use of it. The fact that practically everything built within the last few years will bet able to run Open CL code and make use of it would make companies far more likely to code for it. 80% of new computers is one thing, but that is still a rather small percentage of total userbase.

Great. Like I said before, OpenCL is not the enemy of AVX2.

But you have to look longer term. If homogeneous computing prevails over heterogeneous computing, then sooner or later there will be no point in using OpenCL. It imposes limitations, and homogeneous throughput computing technology can be used with any programming language.

The "it's not available yet" argument never pans out. Before you know it, competing technologies are both widely available, and the superior one prevails.

bronxzv · Jun 17, 2012

CPUarchitect said:
e.g. through SwiftShader. It even runs Crysis if you want

do you have support for DX10/DX11 now ?

beginner99 · Jun 17, 2012

Martimus said:
My agency sure won't write code for AVX2 if it means they need to upgrade every single computer in 2013 to make use of it.

AFAIK a recompile is enough. No need to change any code or specifically code for it

Arzachel · Jun 17, 2012

CPUarchitect said:
The GTX 680 supports OpenCL but it loses against a quad-core CPU, even before AVX2.

BenchPress said:
Else why is it that a 3000 GFLOPS GPU loses against a 230 GFLOPS CPU? And that's still a CPU without AVX2, and the average system will have a much weaker GPU!

It's a bit hypocritical to knowingly cherry pick to prove a point.

GTX680
I7-3770K

Add the 30% gains from the AMD APP and the GTX680 beats the i7 in two benchmarks out of three. What's even funnier is that the HD7750 scores pretty close to the GTX680 in the first two benchmarks

pelov · Jun 17, 2012

CPUarchitect said:
But you have to look longer term. If homogeneous computing prevails over heterogeneous computing, then sooner or later there will be no point in using OpenCL. It imposes limitations, and homogeneous throughput computing technology can be used with any programming language.

Limitations on hardware support, for instance? That's a limitation. In fact, that's quite a big limitation, wouldn't you say? x86-only proprietary ISA means it's only limited to Intel/AMD chips. Ease of programming/language be damned if your approach only works on 1% of hardware and outside of your target area (target area being mobile).

It's not HSA that's closing the doors here. HSA is meant to allow openCL to prosper and openCL is ubiquitous and has and will always outpace AVX2 and any other x86-derived ISA and nor does it need a Microsoft crutch as it lends itself across OSes.

You seem to have things a bit backwards here.

edit - just to clarify some things that people are misinterpreting with regards to openCL/HSA.

Heterogeneous systems architecture (HSA) is meant to provide a stable and rough guideline for architectures so openCL, a programming language, can thrive.

Now you don't need to apply a HSA approach to your architecture to enjoy openCL programming because openCL is pretty much supported everywhere nowadays, whether ARM to x86, AMD or nVidia, desktops to phones to car entertainment systems. As was already shown, openCL can run on x86 Intel chips but not on its on-die GPU. Thus with openCL you can run code without the GPUs involvement at all or it can be coded in such a manner where the workload is mixed (think HPC or CUDA acceleration if you're more familiar with that).

What you don't get with openCL/HSA that you do with AVX2 is "ease of programming." The issue with this "ease of programming" is that the programming is wrapped around x86 (the other issue is that it's not "ease of programming" but familiarity), a set of ISAs and a subsequently derived architecture that most developers nowadays are avoiding in favor of ARM due to its inherent difficulties. With AVX2 you're only given one option and that's x86. That's it. If you want a wide range of hardware you must rely on only two hardware manufacturers, AMD and Intel, and due to their absence in everything mobile (tablets and phones) developers would lose out on revenue because they would only be targeting x86-based hardware. Furthermore, it's also extremely unlikely we see anything with AVX2 down at the phone/tablet level until 2-3 years down the road and even then you've got to wait for the market share to increase (IF it increases. As that in itself is a big bet) in order for x86-relevancy to come to fruition and AVX2 to take hold. That's a lot of ifs and relying on a currently unfavorable ISA where one manufacturer enjoys a monopoly. So why in the world would developers outside of workstation/server applications bother with AVX2 and x86?

They won't. The answer is as easy as opening up your android/apple app store and realizing just how much work has been put into those and just how far they've gotten in the past 3-4 years compared to the ages its taken x86 to get where it is today. Throw in that most devices are tablets/phones that are sold (and for software people pay for. When's the last time you paid for x86 software outside of Windows?) along with ARM now part of the HSA foundation and an already ubiquitous programming language then you'll realize the odds aren't in favor of a proprietary ISA that's sitting on hardware in the wrong segment of the market.

AVX2 isn't going to take hold outside of anywhere but the desktop and server. No one questions that. x86 derived hardware is essentially the only choice you've got for that segment (although ARM's market share now surpasses AMD's in server which means you've got an option now). The reliance on x86 is already way too embedded for it to change overnight. The issue AVX2 has is if openCL takes off then you'll see even more GPU leveraging in the same segment of the market -- though I suppose we've already seen this in HPC where CUDA dominates and openCL has also stuck its foot in recently. But where x86 dominates in desktop and server the roles are reversed in the tablet/phone segment where it's ARM that dominates with no x86 alternatives (there technically are but they are even less meaningful than the ARM-based alternatives for server and the market share is smaller).

At the end of the day, AVX2 has one huge hindrance that people overlook: it requires Intel/AMD hardware but we don't live in an Intel/AMD-only world anymore and it's not the desktop and server that's sexy but rather ARM and mobile as the booming market.

CPUarchitect · Jun 17, 2012

Arzachel said:
It's a bit hypocritical to knowingly cherry pick to prove a point.

What counts is that it does prove the point. A high-end GPU is losing, in an OpenCL benchmark, against a CPU that doesn't even have AVX2 yet. That's no less than three reasons why GPGPU will have a really hard time going mainstream:

1) The average GPU is much weaker.
2) OpenCL is specifically designed for the GPU, not the CPU.
3) AVX2 doubles the throughput and adds gather support.

In other words, right now these three arguments should make this a cherry picked benchmark strongly in favor of GPGPU, yet some high-end GPUs are losing!

pelov · Jun 17, 2012

Technically that GPU is actually much faster in GPU-accelerated tasks but it's CUDA rather than openCL which still doesn't bode well for AVX2 because it's not just about AVX2 vs. openCL/HSA but rather AVX2 vs. GPGPU. Now in terms of GPGPU battles it's CUDA vs openCL where openCL has more traction outside of HPC.

And you know you're cherry-picking yet still denying it

It's a topic that's been beaten to death but if your argument is that a GTX680 sucks in openCL therefore GPGPU has no future then you're either

A - completely clueless, or

B - cherry-picking hardware and situations and purposely ignoring the bigger picture.

CPUarchitect · Jun 17, 2012

pelov said:
Limitations on hardware support, for instance? That's a limitation. In fact, that's quite a big limitation, wouldn't you say? x86-only proprietary ISA means it's only limited to Intel/AMD chips.

AVX2 won't be the only homogeneous throughput computing technology. ARM's NEON extension already has FMA, and adding gather support should be relatively straightforward. Also, from a performance/Watt perspective it makes more sense to widen the vectors than to add more cores. The end result would be the ARM equivalent of AVX2. Let's call this NEON2. They'd be stupid not to already started working on this.

Also, while there wouldn't be binary compatibility between AVX2 and NEON2, that's not the case for OpenCL either. The compatibility is achieved at the programming language level. And since homogeneous throughput computing supports any programming language, it's far more versatile than OpenCL!

HSA is meant to allow openCL to prosper and openCL is ubiquitous and has and will always outpace AVX2...

No it doesn't. Some high-end GPUs running OpenCL are losing against CPUs, prior to AVX2. You can't ignore this. Mainstream GPUs are much weaker too. And while HSA indeed aims to improve GPGPU efficiency, it will not be widely supported.

The problem with HSA is that it requires sacrifices to graphics performance. NVIDIA has decided to back away from GPGPU in the consumer market, as evident by the Kepler design sacrifices. And Intel isn't going to implement HSA either; it wants their iGPUs to concentrate on graphics, while for general purpose throughput computing they have AVX2. Hence HSA will be limited to a fraction of the market, also decreasing the "cross-platform" value of OpenCL.

Homogeneous throughput computing, spearheaded by AVX2, is far more promising than GPGPU will ever be.

The Official AVX2 Thread

Diamond Member

Senior member

Diamond Member

Lifer

Diamond Member

Member

Member

Diamond Member

Senior member

Senior member

Senior member

Diamond Member

Golden Member

Senior member

Diamond Member

Senior member

Member

Senior member

Senior member

Diamond Member

Senior member

Diamond Member

Senior member

Diamond Member

Senior member