hamunaptra
Senior member
- May 24, 2005
Interlagos has 16 cores.
128-bit mode = 4 x 16 = 64 single-precision operations per cycle
256-bit mode = 8 x 8 = 64 single-precision operations per cycle
So, whether you are in legacy mode or AVX mode, the number of operations is the same.
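The arithmetic above can be sketched in a few lines of C. This is only an illustration: the function names are made up, and reading the "16" as cores (one 128-bit unit each) and the "8" as modules whose shared FPU handles one 256-bit operation is an assumption, not something the post spells out.

```c
/* Hypothetical helpers, not from any AMD or Intel library. */

/* Number of 32-bit single-precision lanes in a vector of the given width. */
int sp_lanes(int vector_bits) {
    return vector_bits / 32;
}

/* Peak single-precision operations per cycle:
   lanes per vector operation times the number of units issuing one per cycle. */
int peak_sp_ops_per_cycle(int vector_bits, int units) {
    return sp_lanes(vector_bits) * units;
}
```

Calling `peak_sp_ops_per_cycle(128, 16)` and `peak_sp_ops_per_cycle(256, 8)` both give 64, which is the point being made: the wider mode halves the number of independent units but doubles the lanes per operation.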
Now, for the fun part.
If you are utilizing FMA4, you can do a fused multiply-accumulate (a = b + c*d) all in one cycle, whereas it would take SB 2 cycles to achieve the same thing.
And here are a few of the things that BD's FMACs can do that SB can't:
1. Run a 128-bit AVX and an SSE operation on the same cycle
2. Run two 128-bit AVX operations on the same cycle
3. Run two 128-bit AVX and an SSE operation on the same cycle
4. Run a 256-bit AVX and an SSE on the same cycle
I know so little about the folding software, but if it utilizes a lot of SSE, you should see a big boost from BD, because Intel recommends recompiling for AVX and changing all SSE instructions to AVX-128. Plus, for AVX-128 they recommend actually padding the instruction (filling bits 128 through 255 with zeros), which means that even though you have 256-bit-wide registers, you can only run one 128-bit operation through at a time.
Plus, with FMACs being more flexible (they can do an FADD or an FMUL on any cycle), you are better off because you get higher efficiency.
Whoa, what? It can run a 256-bit AVX and an SSE operation in parallel in a single clock cycle?? That's news to me!
I thought it wouldn't be able to do such a thing. I know there are 2 other pipes besides the FMACs, but afaik those were for INT SIMD / MMX-type workloads...
How can BD do a single 256-bit AVX and an SSE operation alongside each other?