These are good points. I'm actually quite curious how the Atom team fit everything into such a small area. Some parts of the core are even bigger than Golden Cove's (L1 cache and issue ports). We already know that the Willow Cove core (which is almost the same as Sunny Cove) uses more than 10mm2 of die space. 2015 Skylake used 8mm2 of die space, which would be 3~4mm2 on 10ESF assuming a 50~70% shrink. But Gracemont uses 1.5~2mm2. Surprising.
Caches of that small a capacity take up minimal space. Part of the reason it has more issue ports is that on the Cove cores each port is more multi-function, while on the Atom-based chips the ports are dedicated.
10nm Skylake should be close to 4.5mm2.
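For anyone who wants to check the shrink numbers in this exchange, here is a quick Python sketch of the arithmetic. It assumes "X% shrink" means the area is reduced by X%, which is my reading and not something stated above; the function names are made up for illustration.

```python
def area_after_shrink(area_mm2, shrink_fraction):
    """Area left after reducing it by shrink_fraction (0.5 means a 50% shrink)."""
    return area_mm2 * (1.0 - shrink_fraction)


def implied_shrink(old_area_mm2, new_area_mm2):
    """Fractional area reduction implied by going from old to new area."""
    return 1.0 - new_area_mm2 / old_area_mm2


if __name__ == "__main__":
    skylake_14nm = 8.0  # mm2, figure quoted in the thread
    for shrink in (0.5, 0.6, 0.7):
        scaled = area_after_shrink(skylake_14nm, shrink)
        print(f"{shrink:.0%} shrink -> {scaled:.1f} mm2")
    # Shrink implied by the 4.5mm2 estimate
    print(f"4.5 mm2 implies a {implied_shrink(skylake_14nm, 4.5):.0%} shrink")
```

Under that reading, a 50% shrink gives 4mm2 and a 70% shrink gives 2.4mm2, while the 4.5mm2 estimate corresponds to roughly a 44% shrink. Either way, the result is well above the 1.5~2mm2 quoted for Gracemont.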
There's also a significant die-area and performance-per-clock penalty in reaching insane clocks. Goldmont and later chips have a 13-stage pipeline. The Cove cores are at 18 stages on a uop cache miss and around 14 on a hit. Golden Cove has 1 more. So you are not only adding a bit of extra logic, you are also reducing performance per clock by a noticeable amount.
Another reason for the larger area is simply the spacing required. The GPUs and Atom cores shrunk by 2.5x+ on 14nm, and probably a bit over 2x on 10nm, but the main cores always ended up in the 2x range. I speculate the extra spacing reduces hotspots, which I suspect would otherwise cut into clock headroom at such ridiculous frequencies.
Of course, if you want the absolute max performance and can let a desktop chip use 200-plus watts, that's the way to go.
Maybe for desktops they'll continue to use super-high-clock, high-power chips, but mobile will be dominated by -mont successors that perform only slightly below Cove in perf/clock.
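To make the pipeline-depth point a bit more concrete, here is a rough Python sketch of the textbook first-order model, where each branch mispredict costs about one pipeline's worth of cycles. The base CPI, branch fraction, and mispredict rate are purely illustrative assumptions, not measurements of any of these cores, and real cores differ in many other ways.

```python
def cycles_per_instruction(base_cpi, branch_fraction, mispredict_rate, pipeline_stages):
    """First-order CPI model: every mispredicted branch costs ~pipeline_stages cycles."""
    return base_cpi + branch_fraction * mispredict_rate * pipeline_stages


if __name__ == "__main__":
    # Illustrative assumptions only (not measured values)
    base_cpi = 0.5
    branch_fraction = 0.2
    mispredict_rate = 0.03

    baseline = cycles_per_instruction(base_cpi, branch_fraction, mispredict_rate, 13)
    for stages in (13, 14, 18):
        cpi = cycles_per_instruction(base_cpi, branch_fraction, mispredict_rate, stages)
        print(f"{stages} stages: CPI {cpi:.3f}, IPC relative to 13 stages {baseline / cpi:.3f}")
```

The only takeaway is the direction: with everything else held equal, the deeper pipeline loses some performance per clock, and the loss grows with the mispredict rate.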
With this, I agree that Sandy Bridge is revolutionary in these respects.
Considering Sandy Bridge's gains are similar to what we'll get with Golden Cove, I'd say it's a much better way of doing things. Of course new ideas don't just fall from the sky.
Just making the core bigger is how you fall into the square-root law of diminishing returns.
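The square-root law here is what is usually called Pollack's rule: single-thread performance grows roughly with the square root of the area (or complexity) spent on the core. It is a rule of thumb, not a law of physics. A minimal Python sketch, with a made-up function name:

```python
import math


def relative_performance(area_ratio):
    """Pollack's rule of thumb: performance scales ~ sqrt(core area ratio)."""
    return math.sqrt(area_ratio)


if __name__ == "__main__":
    for ratio in (1.5, 2.0, 4.0):
        gain = relative_performance(ratio)
        print(f"{ratio:.1f}x area -> ~{gain:.2f}x performance")
```

So doubling the core area buys only about 41% more performance, and it takes 4x the area to double it.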
Sandy Bridge's new ideas:
-uop cache
-Physical Register Files
-Rethinking of the branch target buffer to increase effective history size
-Using existing integer SIMD ports to double FP performance in a die-efficient manner
-Use of an efficient, simple interconnect called the Ring Bus
-Significantly improved Turbo mode, Turbo 2.0
Due to those changes, we saw amazing gains in the mobile space: 60% in the H space and 30%-plus in the U space, at lower power and with better battery life.
They overhauled almost every aspect in an efficient manner. Golden Cove just does more of the same.