Hmm, isn't 5 twice 2.5? The graphs you linked are informative. But even more informative would be simulations of a range of workloads, with different cache hit rates and different cache latencies.
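For instance, a back-of-the-envelope average-memory-access-time (AMAT) sweep makes the point. All the hit rates, latencies, and the 40-cycle miss penalty below are made-up illustrative numbers, not measurements of any real part:

```python
def amat(hit_rate, hit_latency, miss_penalty):
    """Average memory access time, in cycles."""
    return hit_rate * hit_latency + (1.0 - hit_rate) * miss_penalty

# Sweep a few hypothetical workloads (hit rates) against the two
# L2 latencies under discussion, with a fixed 40-cycle miss penalty.
for hit_rate in (0.80, 0.95, 0.99):
    for l2_latency in (2.5, 5.0):
        print(f"hit={hit_rate:.2f} lat={l2_latency} -> "
              f"AMAT={amat(hit_rate, l2_latency, 40.0):.2f}")
```

At a 99% hit rate the 2.5-cycle cache wins big (2.88 vs. 5.35 cycles), but at 80% the miss penalty dominates and the gap narrows (10.0 vs. 12.0). That's exactly why the answer depends on the workload.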
[Shared L2…]
This paints too simplistic a picture of what they actually did. First, they still have L2 and L3 (and even L4); it's just that L3 and L4 are now "virtual", implemented by dynamically sharing SRAM with L2. Still, L3 (let alone L4) latency is higher than L2 latency, typically much higher. Second, and this is what your argument neglects, there is a bunch of QoS not-so-secret sauce in place to keep the extent of sharing in check, IOW to maintain a prioritized reserve of low-latency L2 cache for each active core. As long as a core is not completely idle, it retains L2 capacity which other cores can't displace. So in this regard, these mainframe CPUs don't really differ from what @Win2012R2 claimed about server CPUs.
Meanwhile, Xeon's shared L3 cache has become so unwieldy, i.e. cache latency has become so nonuniform across the entirety of the L3 cache, that they now offer the option of dividing the SoC into NUMA domains (clustering modes), such that groups of cores preferably access the nearest subsets of the overall L3 cache. Oh, and this option is on by default. Turns out, making shared caches bigger gets tricky if you also want to keep them fast.
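To see why that clustering helps, here's a toy model of average L3 hit latency. The near/far cycle counts and the near-hit fractions are pure assumptions for illustration, not actual Xeon figures:

```python
# Toy model: average L3 hit latency with and without sub-NUMA-style
# clustering. All cycle counts below are assumed, illustrative values.
NEAR_LAT = 33.0   # assumed: cycles to L3 slices in the core's own cluster
FAR_LAT = 60.0    # assumed: cycles to L3 slices in remote clusters

def avg_l3_latency(near_fraction):
    """Average L3 hit latency given the fraction of hits served
    by the nearest slices."""
    return near_fraction * NEAR_LAT + (1.0 - near_fraction) * FAR_LAT

# Uniform hashing across 4 clusters: only 1/4 of hits land nearby.
print(f"unclustered: {avg_l3_latency(0.25):.2f} cycles")
# Clustering keeps most of a core's working set in nearby slices.
print(f"clustered:   {avg_l3_latency(0.90):.2f} cycles")
```

With these made-up numbers, the unclustered average is 53.25 cycles versus 35.70 clustered, at the cost of each core seeing a smaller effective L3, which is the usual NUMA trade-off.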