I wanted him to answer the question, since the claim was that the front end was not efficient because the SMT yield was greater. What do you mean by this? GF cancelled 7nm FinFET.
SMT is symmetric. The front end and L1 have changed in Zen2, and the front end is much more capable despite still decoding only 4 instructions per cycle. They doubled the op cache.
https://www.anandtech.com/show/14525/amd-zen-2-microarchitecture-analysis-ryzen-3000-and-epyc-rome/8
They feed the L2 instruction code effectively using a new unit. They have also shrunk L1i (halved, from 64 KB to 32 KB) but doubled the op cache size (2K to 4K entries).
The op cache can also now feed up to 8 ops per cycle versus the old maximum of 6. The link above covers the new Zen2; this link covers Zen1: https://www.anandtech.com/show/1057...lers-micro-op-cache-memory-hierarchy-revealed
The decoder still chugs only 4 per cycle. It may or may not bottleneck performance; it just depends on how the op cache is performing. They did a great job there, so I think it's likely rare that the 4-wide decode is an issue. Because Zen3 is likely mobile focused, I think doubling up the decoder might not happen unless there are energy efficiency tricks. (I remember Kaveri getting doubled-up decode, and I think that has a lot to do with why those little APUs made such good space heaters and why 8c Steamroller was cancelled.) Maybe they are designing a 6-wide decode.
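To put rough numbers on "it depends how the op cache is performing", here is a back-of-envelope sketch. It is only a toy model under my own assumptions: ops come either from the op cache at 8 per cycle or from the decoder at 4 per cycle (the two figures above), and nothing else limits the front end. The function name and the op-cache fractions are made up for illustration; they are not measured Zen2 numbers.

```python
def frontend_ops_per_cycle(op_cache_fraction, op_cache_width=8.0, decode_width=4.0):
    """Average ops/cycle when a given fraction of ops comes from the op cache.

    Toy model (assumption, not AMD data): each op costs 1/width cycles on
    whichever path supplies it, so the average rate is a weighted harmonic
    mean of the two widths.
    """
    if not 0.0 <= op_cache_fraction <= 1.0:
        raise ValueError("op_cache_fraction must be between 0 and 1")
    cycles_per_op = (op_cache_fraction / op_cache_width
                     + (1.0 - op_cache_fraction) / decode_width)
    return 1.0 / cycles_per_op


# Illustrative fractions only; real hit rates depend on the workload.
for fraction in (0.0, 0.5, 0.8, 0.95, 1.0):
    rate = frontend_ops_per_cycle(fraction)
    print(f"op cache supplies {fraction:.0%} of ops -> {rate:.2f} ops/cycle")
```

In this model the 4-wide decoder only drags the average down noticeably when a large share of ops miss the op cache, which is the point being made above. It ignores dispatch width, branch mispredicts and everything in the back end.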
This is total speculation and likely totally wrong; but it's my best guess.
I think they will widen the core a little more and do 4-way multithreading, so with four threads and a wider core they may need to widen the decoder. I don't think it will be SMT4 though; I think they will add a "Threadrip" mode that allows a pair of opportunistic threads to run on top of SMT2 and help keep the execution units busy. This would be similar to big.LITTLE in the Acorn (ARM) world. These small threads would run completely without speculation (taking turns pausing on branches), and out-of-order execution would be very limited.
For consumers, enabling Threadrip would be a benefit: a quad-core APU would have 8 strong threads and 8 "small" threads. Building a kernel with -j16 would be a big speedup over a kernel build with only SMT2 enabled. Little threads would also have no vulnerability to speculative-execution attacks. The OS could use them for itself and for system processes. Browsers could be made to use them (e.g. offloading the incessant and useless JavaScript threads associated with background tabs). Little threads would be most useful for high-latency and FPU-heavy code, and could be useful in parallel compute datacentres.
Zen3 would be primarily mobile focused and secondarily server focused; hopefully the first product to see this core would be a quad-core sub-10W APU, followed by 2-CCX chiplets for the server and consumer markets.
As far as product lines go, I think that unlike the 3000 gen, consumer 4000 MCMs would be strictly single-CPU-chiplet APUs, with an 8 CU Vega built into the IOX, and available with both Zen2 and Zen3 chiplets, with Zen2 arriving in late H1 and Zen3 in late H2 or early 2021.
The statement I replied to:
"The SMT yield is higher, which means that the front end is still not efficiently feeding the execution units."