Too bad about 7nm+ at GloFo => power-performance-area (PPA) target of 40% power reduction, 10% performance boost and 10% area compaction through standard cell library richness focusing on parasitic reduction, physical design incorporating EUV elements and cell drive strength granularity
What do you mean by this? GF cancelled 7nm FinFET.
So what is feeding the 2nd thread in SMT? Isn't it the same front end? IPC has increased, and on top of that SMT yield has also increased. That points to an improved front end compared with the previous generation.
SMT is symmetric. The front end and L1 have changed in Zen2, and it's much more capable despite the same old 4 decodes/cycle. They almost doubled the op cache.
I think Zen 3 goes wider. Right now it can do what, decode 4 as well as dispatch 2 uops for a total of 6? I could see them adding a decoder. Beyond that I'd have to look at the slides to see where some of the bottlenecks might be.
https://www.anandtech.com/show/14525/amd-zen-2-microarchitecture-analysis-ryzen-3000-and-epyc-rome/8
They feed the L2 instruction code effectively using a new unit. They have shrunk L1i (halved?) but doubled the op-cache size.
The op cache can also now feed up to 8 ops per cycle versus the old maximum of 6. The link above covers Zen2 and the link below covers Zen1:
https://www.anandtech.com/show/1057...lers-micro-op-cache-memory-hierarchy-revealed
The decoder still chugs through only 4 per cycle. That may or may not bottleneck performance; it depends on how the op cache is performing. They did a great job, so I think it's likely rare that the 4-wide decode is an issue. Because Zen3 is likely mobile focused, I think doubling up the decoder might not happen unless there are energy efficiency tricks. (I remember Kaveri getting doubled-up decode, and I think it has a lot to do with why these little APUs made such good space heaters and why 8c Steamroller was cancelled.) Maybe they are designing a 6-wide decode.
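To put rough numbers on how much the op cache hides the 4-wide decode, here is a back-of-the-envelope sketch in Python. It is a toy model, not how the hardware actually works: it assumes the op cache delivers 8 ops per cycle on a hit, the decoders deliver 4 ops per cycle on a miss (treating one instruction as one op), and it ignores dispatch limits and everything in the back end. The function name and the hit rates are made up for illustration.

```python
def front_end_ops_per_cycle(op_cache_hit_rate, op_cache_width=8, decode_width=4):
    """Toy estimate of sustained front-end throughput in ops per cycle.

    Assumption: a fraction `op_cache_hit_rate` of ops comes from the op cache
    at `op_cache_width` ops/cycle, the rest from the decoders at
    `decode_width` ops/cycle. Dispatch and back-end limits are ignored.
    """
    if not 0.0 <= op_cache_hit_rate <= 1.0:
        raise ValueError("hit rate must be between 0 and 1")
    # Average cycles needed to deliver one op, weighted by where it comes from.
    cycles_per_op = (op_cache_hit_rate / op_cache_width
                     + (1.0 - op_cache_hit_rate) / decode_width)
    return 1.0 / cycles_per_op


if __name__ == "__main__":
    # Hypothetical hit rates, chosen only to show the trend.
    for hit_rate in (0.0, 0.5, 0.8, 0.95, 1.0):
        rate = front_end_ops_per_cycle(hit_rate)
        print(f"op cache hit rate {hit_rate:.2f}: {rate:.2f} ops/cycle")
```

Under these assumptions the front end only falls back to 4 ops per cycle when the op cache misses all the time, which is the sense in which the decoder width would rarely be the limit.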
This is total speculation and likely totally wrong, but it's my best guess.
I think they will widen the core a little more and do 4-way multithreading, so with four threads and a wider core they may need to widen the decoder. I don't think it will be SMT4 though. Instead I think they will add a "Threadrip" mode that allows a pair of opportunistic threads to run on top of SMT2 and help keep the execution units busy. This would be similar to big.LITTLE in the Acorn (ARM) world. These small threads would run completely without speculation (taking turns pausing on branches), and out-of-order execution would be very limited.
For consumers, enabling Threadrip would be a benefit: a quadcore APU would have 8 strong threads and 8 "small" threads. Building a kernel with -j16 would be a big speedup over a kernel build with only SMT2 enabled. Little threads would also have no vulnerability to speculative execution attacks. The OS can use them for itself and for system processes. Browsers can be made to use them (e.g. offloading the incessant and useless JavaScript threads associated with background tabs). Little threads would be most useful for high-latency and FPU-heavy code and could be useful in parallel compute datacentres.
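How big the -j16 speedup would be depends entirely on how fast a small thread is relative to a strong one, and that number is unknown. A minimal sketch in Python, assuming the build is perfectly parallel and that small threads only use execution slots the strong threads leave idle (so they do not slow the strong threads down). The function name and the relative speeds are hypothetical.

```python
def build_speedup(strong_threads, small_threads, small_thread_relative_speed):
    """Toy estimate of build speedup from adding small threads.

    Assumptions: the build scales perfectly with thread throughput, each
    strong thread has throughput 1.0, each small thread has throughput
    `small_thread_relative_speed`, and small threads never slow strong ones.
    """
    baseline = float(strong_threads)
    with_small = strong_threads + small_threads * small_thread_relative_speed
    return with_small / baseline


if __name__ == "__main__":
    # Hypothetical relative speeds for a small thread, for illustration only.
    for relative_speed in (0.1, 0.25, 0.5):
        speedup = build_speedup(8, 8, relative_speed)
        print(f"small thread at {relative_speed:.2f}x: {speedup:.2f}x build speedup")
```

With 8 strong and 8 small threads this reduces to 1 plus the relative speed of a small thread, so the gain is only "big" if the small threads turn out to be reasonably quick.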
Zen3 would be primarily mobile focused and secondarily server focused, and hopefully the first to see this core would be a quadcore sub-10W APU, followed by 2 CCX chiplets for the server and consumer markets.
As for product lines, I think that unlike the 3000 gen, consumer 4000/5000 MCMs would be strictly single-CPU-chiplet APUs, with an 8 CU Vega built into the IOX, and available for both Zen2 and Zen3 chiplets--Zen2 arriving late this year and Zen2 5000 in late H2 or early 2021. A monolithic mobile quadcore Zen3 4000 APU would arrive mid 2020, and mid H2 for AM4.