Because the CPU can't handle such HPC workloads. You need massive bus width, gigabytes of attached, very fast RAM and thousands of processors! Furthermore, you can gang GPUs together much as HPC clusters do today, scaling to hundreds of GPUs across PCIe lanes. You can at best hope to mimic that with CPUs, but AMD has already snatched up SeaMicro for exactly that. Count cores however you want; at the end of the day the GPU has far more of them than the on-die CPU does.
In contrast, for the GPU to become any better at GPGPU it has to sacrifice a considerable amount of graphics performance. It basically has to become more like a CPU. But that's downright silly. If becoming more like the CPU is the answer then why not let the CPU handle these workloads in the first place? AVX2 was the only missing bit to make that happen.
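Just so we're talking about the same thing: the AVX2 case rests on Haswell pairing 256-bit vectors with fused multiply-add (the FMA3 extension ships alongside AVX2). Here's a minimal sketch of the kind of loop that's supposed to enable -- my own illustration, not anything from the post, and `saxpy_avx2` is just a name I made up; compile with something like g++ -O2 -mavx2 -mfma:

```cpp
// Minimal sketch of the kernel AVX2/FMA enables on a Haswell core:
// y[i] = a*x[i] + y[i] (SAXPY), 8 single-precision lanes per instruction.
#include <immintrin.h>
#include <cstddef>

void saxpy_avx2(float a, const float* x, float* y, size_t n) {
    __m256 va = _mm256_set1_ps(a);            // broadcast scalar a to 8 lanes
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 vx = _mm256_loadu_ps(x + i);   // load 8 floats from x
        __m256 vy = _mm256_loadu_ps(y + i);   // load 8 floats from y
        // one fused multiply-add: vy = va*vx + vy (2 flops/lane/instruction)
        vy = _mm256_fmadd_ps(va, vx, vy);
        _mm256_storeu_ps(y + i, vy);
    }
    for (; i < n; ++i)                        // scalar tail for leftovers
        y[i] = a * x[i] + y[i];
}
```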
It does sacrifice performance, and that's partly why GCN is behind Kepler as far as gaming goes (but still kicks the CPU's ass). Tesla, on the other hand, is completely HPC-focused and trounces everything. Unlike the CPU, which is a general processing unit (a jack of all trades, and it has to be), HPC GPUs do only one thing well: raw GFLOPS in either single or double precision -- and yes, that includes FMA. Instruction sets are only a CPU advantage as long as they stay exclusive to the CPU, but GPUs pick up the same capabilities, so any potential advantage there disappears. Unlike CPUs, they don't lug around legacy ISA baggage and are strictly limited to FP tasks, so adopting AVX2's additions for GPGPU is not just likely but almost a certainty. Using AVX2 as a reason why CPUs will catch up won't work then.
Your benchmark only showed the same thing I mentioned in the post you're arguing with:
They are co-processors with specific purposes, and the CPU has never been and will never be able to reach those same levels of performance or efficiency. For moderate OpenCL/CUDA work like Photoshop and video editing, GPU-accelerated browsing, and light gaming, on-die GPUs are more than capable, but past that it doesn't make sense.
Then there's this gem off Intel's own site...
Intel® MIC products give developers a key advantage: They run on standard, existing programming tools and methods.
Intel® MIC architecture combines many Intel® CPU cores onto a single chip. Developers can program these cores using standard C, C++, and FORTRAN source code. The same program source code written for Intel® MIC products can be compiled and run on a standard Intel® Xeon processor. Familiar programming models remove developer-training barriers, allowing the developer to focus on the problems rather than software engineering.
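And that pitch is easy to picture: a completely ordinary C++/OpenMP loop -- this is my own sketch, not code from Intel's site -- builds with any -fopenmp compiler for a regular Xeon, and per the quote above the same source is supposed to recompile as-is for the MIC card:

```cpp
// Ordinary C++/OpenMP dot product -- no kernel language, no separate
// device source, just the usual pragma. Build: g++ -O2 -fopenmp dot.cpp
#include <omp.h>
#include <vector>
#include <cstdio>

int main() {
    const size_t n = 1 << 24;
    std::vector<double> x(n, 1.5), y(n, 2.0);
    double dot = 0.0;

    // Standard OpenMP reduction across however many cores the chip has
    #pragma omp parallel for reduction(+:dot)
    for (long i = 0; i < (long)n; ++i)
        dot += x[i] * y[i];

    std::printf("dot = %f on up to %d threads\n", dot, omp_get_max_threads());
    return 0;
}
```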
So while CPUs are getting better at tasks that are almost entirely GPU/GPGPU based, so are GPUs. One of those, though, isn't being hamstrung by legacy ISAs, TDP, socket compatibility and various other hardware constraints. If you saturate the PCIe lanes, then just increase the bandwidth and keep going, or keep adding GPUs. If Intel were as confident in Haswell and AVX2 as you are, they wouldn't have bothered with the Knights Corner co-processor.
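And the "keep adding GPUs" part isn't hand-waving; the CUDA runtime API already makes it mechanical. A rough host-side sketch (my illustration, not from the post -- and note you'd want cudaMallocHost-pinned buffers for the transfers to truly overlap): each card gets its own slice of the data and its own stream, so every extra GPU brings its own PCIe link's worth of bandwidth:

```cpp
// Spread a buffer across every GPU in the box (host-side C++ only).
// Build with nvcc, or any C++ compiler linked against cudart.
#include <cuda_runtime.h>
#include <vector>
#include <cstdio>

int main() {
    int ndev = 0;
    cudaGetDeviceCount(&ndev);               // e.g. 2, 4, 8... cards
    if (ndev == 0) { std::printf("no CUDA devices\n"); return 1; }

    const size_t n = 1 << 26;                // total elements to move
    std::vector<float> host(n, 1.0f);        // pageable; pin for full overlap
    std::vector<float*> dbuf(ndev);
    std::vector<cudaStream_t> stream(ndev);
    size_t chunk = n / ndev;                 // tail elements ignored for brevity

    for (int d = 0; d < ndev; ++d) {
        cudaSetDevice(d);                    // talk to card d
        cudaStreamCreate(&stream[d]);
        cudaMalloc((void**)&dbuf[d], chunk * sizeof(float));
        // copy over card d's own PCIe link; copies to different devices
        // proceed side by side instead of queuing behind one another
        cudaMemcpyAsync(dbuf[d], host.data() + d * chunk,
                        chunk * sizeof(float),
                        cudaMemcpyHostToDevice, stream[d]);
    }
    for (int d = 0; d < ndev; ++d) {
        cudaSetDevice(d);
        cudaStreamSynchronize(stream[d]);    // wait for card d's transfer
        cudaFree(dbuf[d]);
        cudaStreamDestroy(stream[d]);
    }
    std::printf("spread %zu floats across %d GPUs\n", n, ndev);
    return 0;
}
```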