The Official AVX2 Thread

Munky · Jun 15, 2012

It's funny to watch people preach AVX2 as some sort of holy grail. Like Intel preached MMX about 15 years ago; that was supposed to be the holy grail back then. Or SSE 1, 2 and 3. Or Larrabee. That worked out real well, didn't it.

GPGPU has already proven itself in high performance applications, especially those that lend themselves to parallel computing. The move to HSA will only solidify its adoption by developers.

Munky · Jun 15, 2012

bronxzv said:
in modern engines the trend is clearly toward more complex hybrid renderers with not only triangles but PBR (point based representation) and/or voxels for light source rays / GI effects, IBR (image based rendering) to capitalize on temporal coherence for far details + a lot more tricks, the new frontier is programmer's productivity, not raw FLOPS or "triangles per second" anymore

Modern engines are all about pixel shaders. Game engines have always been about approximation of the real world, favoring speed over accuracy. No one cares if an engine realistically models the real world, as long as it looks real enough and runs fast enough. I already said the same thing back when Intel was promising to turn the world upside down with Larrabee and real time ray tracing, and I will say the same thing now.

podspi · Jun 15, 2012

Munky said:
It's funny to watch people preach AVX2 as some sort of holy grail. Like Intel preached MMX about 15 years ago; that was supposed to be the holy grail back then. Or SSE 1, 2 and 3. Or Larrabee. That worked out real well, didn't it.

GPGPU has already proven itself in high performance applications, especially those that lend themselves to parallel computing. The move to HSA will only solidify its adoption by developers.

To be fair, MMX, SSE, etc WERE pretty useful for some specific tasks. I'm sure AVX2 will be very useful as well. The only point of my post was that AVX2 and HSA are NOT competing technologies... AVX2 is a set of new instructions (that will likely be adopted by all x86 vendors eventually), while HSA tries to tie together multiple architectures. Again, people are focusing on APUs, but this also includes hardware off die. For example, if you have some frankenchip with x86 cores, ARM cores, and GPU cores, what is the sanest way to program for such a thing?

And before people say that is stupid, then why are there large, respected companies that aren't AMD (who atm isn't really that large or respected) signing up as well? I won't pretend I'm some programming guru or hardware designer, I'm not. Most of my programming is done in various scripting languages and I'm doing lots of data manipulation and analysis. But I am going to assume that these people see a future in heterogeneous computing.

While Intel is certainly following everybody into heterogeneous computing, I can fully understand why they are not interested in it. They are very close to an effective monopoly for x86, and they like it like that. A system that allows for different architectures (not just x86 and GPU, but also ARM, MIPS, whatever) isn't in their best interest (hence x86 GPUs and HPC boards). But that has nothing to do with AVX2, at least with regard to AMD, because AMD makes x86 CPUs and will have it too. And yes, I suppose you could make the argument they are competitors in that Intel wants to kill off all competing architectures, but I do not think they will be successful.

ShintaiDK · Jun 15, 2012

Munky said:
It's funny to watch people preach AVX2 as some sort of holy grail. Like Intel preached MMX about 15 years ago; that was supposed to be the holy grail back then. Or SSE 1, 2 and 3. Or Larrabee. That worked out real well, didn't it.

GPGPU has already proven itself in high performance applications, especially those that lend themselves to parallel computing. The move to HSA will only solidify its adoption by developers.

SSE2 is now mandatory in Win64 and has replaced x87. Have you tried AVX with, for example, Linpack?

Kinda silly to read comments like yours.

And HSA isn't even completed. It's still on the drawing board and it might not even materialize.

Munky · Jun 15, 2012

ShintaiDK said:
SSE2 is now mandatory in Win64 and has replaced x87. Have you tried AVX with, for example, Linpack?

Kinda silly to read comments like yours.

And HSA isn't even completed. It's still on the drawing board and it might not even materialize.

Link to your silly assertion? It just so happens that all x86 64-bit processors support SSE2, where's the link that says SSE2 is explicitly required?

ShintaiDK · Jun 15, 2012

Munky said:
Link to your silly assertion? It just so happens that all x86 64-bit processors support SSE2, where's the link that says SSE2 is explicitly required?

http://msdn.microsoft.com/en-us/library/windows/hardware/ff545910(v=vs.85).aspx

You could also simplify it by saying that MMX, 3DNow! and x87 aren't supported in long mode, according to AMD's AMD64 specs.

Olikan · Jun 15, 2012

SocketF said:
Yes, Sandra only runs on the CPU cores and doesn't utilize the GPU memory controller.
Just compare Sandra scores of the FX4xxx series to the FX8xxx series, the memory bandwidth doubles, however both have the very same memory controller type.
Conclusion: The memory controller is one of the best parts, no problems there ;-)

IIRC, Trinity has the same memory controller for the GPU and CPU...
unlike Llano...

Munky · Jun 15, 2012

ShintaiDK said:
http://msdn.microsoft.com/en-us/library/windows/hardware/ff545910(v=vs.85).aspx

That's for drivers. Regular applications with x87 code run just fine on Windows 7 64-bit.

AtenRa · Jun 15, 2012

Olikan said:
IIRC, Trinity has the same memory controller for the GPU and CPU...
unlike Llano...

Llano has a single IMC for both the CPU and iGPU cores, just like Trinity.

Llano

http://www.anandtech.com/show/4476/amd-a83850-review

Trinity

http://www.anandtech.com/show/5831/amd-trinity-review-a10-4600m-a-new-hope

ShintaiDK · Jun 15, 2012

Munky said:
That's for drivers. Regular applications with x87 code run just fine on Windows 7 64-bit.

64bit applications? No. 32bit applications? Yes.

http://web.archive.org/web/20030916...wnloadableAssets/AMD_TechEdEMEA2003_Final.pdf

Olikan · Jun 15, 2012

AtenRa said:
Llano has a single IMC for both the CPU and iGPU cores, just like Trinity.

what i was trying to say... is about the Unified North Bridge

yes, i had to google

bronxzv · Jun 15, 2012

Munky said:
No one cares if an engine realistically models the real world, as long as it looks real enough and runs fast enough.

which is the exact purpose of using voxels for near-field global illumination approximation, I was just pointing out that triangles aren't the only primitive anymore

lamedude · Jun 15, 2012

You can use x87 if you really want to. MS's and Intel's compilers force at least SSE2 in 64-bit, so unless a program requires extended precision you're probably not going to see it.

BenchPress · Jun 15, 2012

Munky said:
It's funny to watch people preach AVX2 as some sort of holy grail. Like Intel preached MMX about 15 years ago; that was supposed to be the holy grail back then. Or SSE 1, 2 and 3. Or Larrabee. That worked out real well, didn't it.

AVX2 is not comparable to MMX/SSE. The latter are for vertical vectorization, AVX2 is for horizontal vectorization. And it's horizontal vectorization that used to give GPUs an advantage at SPMD throughput computing.
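One way to read the distinction being drawn here, sketched in plain Python rather than real SIMD code. The function names and the 4-lane and 8-lane widths are assumptions made for illustration, and each list operation only stands in for a single vector instruction. The first function packs one object's x, y, z into a vector; the second gives every lane its own loop iteration, which is the SPMD style GPUs use.

```python
def dot_per_object(a, b):
    """One object per vector: the lanes hold x, y, z (plus padding) of a single pair."""
    out = []
    for (ax, ay, az), (bx, by, bz) in zip(a, b):
        prod = [ax * bx, ay * by, az * bz, 0.0]  # one 4-lane multiply, one lane wasted
        out.append(sum(prod))                    # cross-lane reduction for every object
    return out


def dot_per_lane(ax, ay, az, bx, by, bz, width=8):
    """One loop iteration per lane (SPMD style): inputs are separate x/y/z arrays."""
    out = []
    for i in range(0, len(ax), width):
        s = slice(i, i + width)
        # Each comprehension below models one `width`-lane vector instruction.
        xx = [p * q for p, q in zip(ax[s], bx[s])]
        yy = [p * q for p, q in zip(ay[s], by[s])]
        zz = [p * q for p, q in zip(az[s], bz[s])]
        out.extend(x + y + z for x, y, z in zip(xx, yy, zz))
    return out


pairs_a = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]
pairs_b = [(7.0, 8.0, 9.0), (1.0, 0.0, 1.0)]
print(dot_per_object(pairs_a, pairs_b))
print(dot_per_lane([1.0, 4.0], [2.0, 5.0], [3.0, 6.0],
                   [7.0, 1.0], [8.0, 0.0], [9.0, 1.0]))
```

Both give the same results. The difference is that the per-lane version keeps every lane busy and needs no per-object reduction, but it only works on arbitrary loops if the hardware can also do things like gathers and full-width integer operations.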

Even so, MMX and SSE worked out really well. They've been indispensable for the PC multimedia revolution (music and video), and the majority of games depend on them for greatly improving performance. As explained above, Larrabee didn't succeed as a GPU, but it did succeed as an HPC device! And so there's no reason to doubt AVX2's ability to revolutionize the CPU's throughput computing capabilities.

GPGPU has already proven itself in high performance applications, especially those that lend themselves to parallel computing. The move to HSA will only solidify its adoption by developers.

GPGPU hasn't proven a whole lot. Else why is it that a 3000 GFLOPS GPU loses against a 230 GFLOPS CPU? And that's still a CPU without AVX2, and the average system will have a much weaker GPU!
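For reference, both GFLOPS figures are theoretical single-precision peaks, and they can be roughly reproduced with simple arithmetic. The sketch below assumes the commonly quoted specs for a Core i7-3820 (4 cores, 3.6 GHz base, one 8-wide AVX add plus one 8-wide AVX multiply per core per cycle) and a GTX 680 (1536 shader cores, about 1.006 GHz base, one fused multiply-add per core per cycle). The helper name is made up, and the last CPU line is a hypothetical chip at the same clock with two 8-wide FMA units per core, not a measured product.

```python
def peak_gflops(units, clock_ghz, flops_per_cycle_per_unit):
    """Theoretical peak GFLOPS = execution units x clock (GHz) x FLOPs per cycle per unit."""
    return units * clock_ghz * flops_per_cycle_per_unit


# Assumed specs: 4 cores, 3.6 GHz, 8 adds + 8 multiplies per cycle per core.
cpu_avx = peak_gflops(4, 3.6, 16)

# Hypothetical: same core count and clock, two 8-wide FMAs per cycle per core.
cpu_fma = peak_gflops(4, 3.6, 32)

# Assumed specs: 1536 shader cores, 1.006 GHz, one FMA (2 FLOPs) per cycle each.
gpu = peak_gflops(1536, 1.006, 2)

print(f"CPU with AVX:     {cpu_avx:7.1f} GFLOPS")
print(f"CPU with FMA:     {cpu_fma:7.1f} GFLOPS (hypothetical)")
print(f"GPU:              {gpu:7.1f} GFLOPS")
print(f"GPU / CPU (AVX):  {gpu / cpu_avx:7.1f}x")
```

These are paper numbers only. They say nothing about how much of the peak a given benchmark actually reaches on either device.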

For HSA to succeed, AMD needs good GPGPU performance from all vendors. NVIDIA's consumer Kepler architecture has taken a step back from GPGPU, to concentrate on better graphics performance. And Intel's iGPUs aren't great at GPGPU either. So why would a developer spend lots of effort developing for HSA, when that only results in reaching a fraction of the market? HSA won't even be ready before 2014. AVX2 will become available in 2013. AMD is facing an impossible task.

pelov · Jun 15, 2012

BenchPress said:
GPGPU hasn't proven a whole lot. Else why is it that a 3000 GFLOPS GPU loses against a 230 GFLOPS CPU? And that's still a CPU without AVX2, and the average system will have a much weaker GPU!

If you're going to compare GPGPU performance with anything nVidia then it had better be Tesla or it isn't worth mentioning. It had also better be CUDA or the benchmark isn't worth mentioning. It would be like comparing a CUDA benchmark with GCN GPUs. It makes no sense.

Compare that CPU to the 7970 at the top or the equally priced 7870. That's a fair fight.

Who's the winner there?

BenchPress · Jun 15, 2012

pelov said:
If you're going to compare GPGPU performance with anything nVidia then it had better be Tesla or it isn't worth mentioning.

Know a lot of consumers with a Tesla card then?

GPGPU isn't going to succeed in the consumer market unless all mainstream consumer GPUs become much more flexible for general purpose computing. But that's not going to happen. NVIDIA isn't going to sacrifice graphics performance, and neither will Intel.

Compare that CPU to the 7970 at the top or the equally priced 7870. That's a fair fight.

A cutting-edge 3000 GFLOPS GPU is losing to a 230 GFLOPS CPU. That's the only relevant reality here. You can't say GPGPU is a success and then ignore this. And again, the average GPU isn't a GTX 680 or HD 7970. It's much weaker. So a CPU with AVX2 is going to win across the whole board.

Seriously, if it takes that much raw power to defeat a 230 GFLOPS CPU, how can you claim GPGPU is the future? Doesn't it seem more promising to give the CPU some more raw power? Something like AVX2, perhaps?

pelov · Jun 15, 2012

BenchPress said:
A cutting-edge 3000 GFLOPS GPU is losing to a 230 GFLOPS CPU. That's the only relevant reality here. You can't say GPGPU is a success and then ignore this. And again, the average GPU isn't a GTX 680 or HD 7970. It's much weaker. So a CPU with AVX2 is going to win across the whole board.

So I'm the one ignoring the benchmarks yet you won't look up at AMD's scores with the 7870, a GPU that costs the same as your CPU, and only pay attention to the GPUs optimized with CUDA in mind?

Got it. Thanks for clarifying. I knew I was a little confused.

AtenRa · Jun 15, 2012

BenchPress said:
Know a lot of consumers with a Tesla card then?

GPGPU isn't going to succeed in the consumer market unless all mainstream consumer GPUs become much more flexible for general purpose computing. But that's not going to happen. NVIDIA isn't going to sacrifice graphics performance, and neither will Intel.

A cutting-edge 3000 GFLOPS GPU is losing to a 230 GFLOPS CPU. That's the only relevant reality here. You can't say GPGPU is a success and then ignore this. And again, the average GPU isn't a GTX 680 or HD 7970. It's much weaker. So a CPU with AVX2 is going to win across the whole board.

Core i7 3820 is not an average CPU either. You also don't take into consideration the HD 7870, which is in the same price range as Core i7 CPUs. Is it because it just obliterates the Core i7 3820 in this application?

BenchPress · Jun 15, 2012

pelov said:
So I'm the one ignoring the benchmarks yet you won't look up at AMD's scores with the 7870, a GPU that costs the same as your CPU, and only pay attention to the GPUs optimized with CUDA in mind?

Again, it doesn't matter how well AMD does. For HSA to succeed, they need GPGPU to be adopted by lots of developers. And this means it has to become worthwhile across all vendors involved. But the vast majority of developers won't rewrite their software for HSA when they can get a bigger performance boost for more consumers by simply recompiling for AVX2.

Just look at the ROI. HSA doesn't stand a chance.

Got it. Thanks for clarifying. I knew I was a little confused.

You're welcome.

Abwx · Jun 15, 2012

BenchPress said:
Else why is it that a 3000 GFLOPS GPU loses against a 230 GFLOPS CPU?

Let's see the charts at this link:

This is a score, not a rendering time...

So much for the 230 GFLOPS CPU...

pelov · Jun 15, 2012

I wonder if we can find some OpenCL GPU-accelerated programs that are already here that outperform optimized CPU-only software on comparable platforms and $$$...

Hmmm... well that's quite an improvement, wouldn't you say? And unlike AVX2 which is sometime in the future, you can have this now. Also unlike AVX2, only AMD supports this and Intel doesn't whereas AMD will almost certainly support AVX2.

But wait! there's more!

Photoshop too? Wh0ataete~!@!@

What was the max theoretical throughput of AVX2? 4x throughput of SSE4? Looks like that won't be enough to match these results we've seen here on a single APU without discrete graphics, never mind an AMD APU with AVX2 as well and coupled with a GCN GPU, or, God forbid we see an AMD APU with AVX2 and shared memory address space with GCN on-die graphics!!! Could you imagine the slaughter?

Developers are already supporting OpenCL and will likely support HSA. In fact, if these are the improvements with GPGPU on a Llano without HSA, can you imagine the Trinity benchmarks? Or the Kaveri benchmarks? Sweet Jesus! And OpenCL already has more traction in the consumer space than CUDA has ever had.

Tip from somebody who's been hanging around tech forums since the 90s:

Don't ever get excited about an instruction set. They always promise to deliver the world (or so the fanboys claim) and they always disappoint. I can't for the life of me remember the last time an instruction set delivered on its purported promises yet every single time we see a new instruction set, regardless of the microarchitecture, there's always someone thumping it as the best thing since the fleshlight.

BenchPress · Jun 15, 2012

AtenRa said:
Core i7 3820 is not an average CPU either.

Indeed, but cheaper quad-cores exist that achieve nearly the same performance.

You also don't take into consideration the HD 7870, which is in the same price range as Core i7 CPUs. Is it because it just obliterates the Core i7 3820 in this application?

No, it's because I'm evaluating the claim that GPGPU is more promising than AVX2. This can't be true unless all GPUs in this price class outperform the CPU. Note once more that this is a CPU without AVX2!

BenchPress · Jun 15, 2012

Abwx said:
This is a score, not a rendering time...

Yes, now look at the score for the 3000 GFLOPS GTX 680. Pathetic! How can GPGPU possibly be the future when such a last-gen high-end GPU loses against a 230 GFLOPS CPU without AVX2?

pelov · Jun 15, 2012

BenchPress said:
Yes, now look at the score for the 3000 GFLOPS GTX 680. Pathetic! How can GPGPU possibly be the future when such a last-gen high-end GPU loses against a 230 GFLOPS CPU without AVX2?

Easy. Compare a GTX680 or GTX580 in CUDA and that 3820.

I'll help you out here.

The score for the 3820 is 0. With AVX2 it would also be 0.

BenchPress · Jun 15, 2012

pelov said:
I wonder if we can find some OpenCL GPU-accelerated programs that are already here that outperform optimized CPU-only software on comparable platforms and $$$...

Sigh. I've already covered why those benchmarks are worthless: First of all, they're comparing a 125 Watt FX-8150 + 200 Watt 7970, against a 35 Watt mobile dual-core i5. And despite that, in several cases the GPU's lead really isn't that big, and you have to keep in mind that AVX2 doubles the arithmetic throughput but also requires 18 times fewer instructions for gather! In other cases the results can really only be explained by running blatantly unoptimized CPU code, just like NVIDIA has done with PhysX. They should at the very least have taken the effort to run OpenCL on the CPU to be able to tell how it compares.
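To make the gather point concrete, here is a rough Python model of what a gather does: it fills each vector lane from an arbitrary table index. Without hardware gather, every lane costs at least a separate load and an insert; with it, the whole thing is one instruction. The function names are made up, and the step counts are a simplified model rather than the exact instruction sequence behind the 18x figure.

```python
def gather_scalar(table, indices):
    """Emulated gather: one scalar load and one lane insert per element."""
    lanes = [0.0] * len(indices)
    steps = 0
    for lane, idx in enumerate(indices):
        value = table[idx]   # scalar load from an arbitrary address
        lanes[lane] = value  # insert the value into its vector lane
        steps += 2           # modeled cost: load + insert
    return lanes, steps


def gather_vector(table, indices):
    """Model of a hardware gather: every lane is filled by a single instruction."""
    return [table[i] for i in indices], 1


table = [i * 0.5 for i in range(16)]
indices = [3, 0, 7, 7, 12, 1, 15, 8]  # 8 lanes, as in a 256-bit vector of floats

scalar_lanes, scalar_steps = gather_scalar(table, indices)
vector_lanes, vector_steps = gather_vector(table, indices)
print(scalar_lanes == vector_lanes, scalar_steps, vector_steps)
```

Fewer instructions doesn't automatically mean the same factor in speed, since the loads themselves still have to happen, but it does remove a lot of overhead from table lookups and indirect addressing inside vectorized loops.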
