> That's everything since Kaveri.

Oh so a current gen Zen 4 APU would have access to the full RAM for running ollama? That's great to learn. So it's better to wait for a large-RAM Halo than going for an Nvidia GPU laptop.
That's aperture. It slides.
Kaveri long predates Apple efforts.
Gotta give Timmy props for marketing their stuff like magic.
> Oh so a current gen Zen 4 APU would have access to the full RAM for running ollama? That's great to learn. So it's better to wait for a large-RAM Halo than going for an Nvidia GPU laptop.

Kinda the selling point, yes.
> Kinda the selling point, yes.

Sure, that's why I'm interested in Strix Halo; it seems it'll be the only contender to Macs for "medium"-sized models requiring 64-128 GB of RAM and good speeds. (It's either that or 4x4090, which is another product entirely. Also, inference speeds are quite good with the M1...3 GPUs and NPUs; 4x4090 would be overkill in that sense.)
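To put that 64-128 GB figure in context, here's a back-of-the-envelope sketch (my own arithmetic, not from the thread; the quantization level and the ~20% runtime overhead are assumptions):

```python
# Rough resident size of a quantized LLM: weights plus an assumed ~20%
# for KV cache and activations. Illustration only, not vendor numbers.

def model_footprint_gb(n_params: float, bits_per_weight: float,
                       overhead: float = 1.2) -> float:
    """Approximate memory footprint in GB."""
    weight_bytes = n_params * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# A 70B-parameter model at 4-bit quantization:
print(round(model_footprint_gb(70e9, 4), 1))   # 42.0 GB -> fits in 64 GB
# The same model unquantized at 16-bit:
print(round(model_footprint_gb(70e9, 16), 1))  # 168.0 GB -> out of reach even at 128 GB
```

Under those assumptions a 4-bit 70B model sits comfortably in 64 GB of unified memory, which is roughly the class of model such machines are aimed at.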
Bandwidth is lacking though, since it's only a 256-bit LPDDR5X setup.
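For rough context on the bandwidth point, a quick peak-bandwidth calculation (my own arithmetic; the 8000 MT/s LPDDR5X rate and the 512-bit comparison bus are assumptions for illustration):

```python
# Theoretical peak DRAM bandwidth: (bus width in bytes) x (transfers per second).
# Both configurations below are assumed examples, not confirmed specs.

def peak_bw_gbps(bus_width_bits: int, transfer_rate_mtps: int) -> float:
    """Peak bandwidth in GB/s for a given bus width and transfer rate."""
    return bus_width_bits / 8 * transfer_rate_mtps / 1000

print(peak_bw_gbps(256, 8000))  # 256.0 GB/s -- a 256-bit LPDDR5X-8000 setup
print(peak_bw_gbps(512, 6400))  # 409.6 GB/s -- a 512-bit bus at LPDDR5-6400
```

Since token generation is largely bandwidth-bound, that gap matters more than raw compute for local inference.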
> That would be interesting, especially since we have not had a consumer product use an X node.

I think this is also an interesting tidbit:

TSMC Details N4X Process for HPC: Extreme Performance at Minimum Leakage (www.anandtech.com)

> While N4X offers significant performance enhancements compared to N4 and N4P, it continues to use the same SRAM, standard I/O, and other IPs as N4P, which enables chip designers to migrate their designs to N4X easily and cost-effectively. Meanwhile, keeping in mind N4X's IP compatibility with N4P, it is logical to expect the transistor density of N4X to be more or less in line with that of N4P. Though given the focus of this technology, expect chip designers to use it to get extreme performance rather than maximum transistor density and small chip dimensions.
> In particular, N4X adds four new devices on top of the N4P device offerings, including ultra-low-voltage transistors (uLVT) for applications that need to be very efficient, and extremely-low-threshold-voltage transistors (eLVT) for applications that need to work at high clocks. For example, N4X uLVT with overdrive offers 21% lower power at the same speed when compared to N4P eLVT, whereas N4X eLVT in OD offers 6% higher speed for critical paths when compared to N4P eLVT.
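The quoted percentages translate into simple arithmetic; here's a toy example (my own; the 10 W and 5.0 GHz baselines are invented for illustration):

```python
# Toy arithmetic on the quoted N4X device figures. Baselines are made up.

def n4x_ulvt_power(n4p_elvt_power_w: float) -> float:
    """N4X uLVT with overdrive: 21% lower power than N4P eLVT at the same speed."""
    return n4p_elvt_power_w * (1 - 0.21)

def n4x_elvt_speed(n4p_elvt_clock_ghz: float) -> float:
    """N4X eLVT in OD: 6% higher speed on critical paths than N4P eLVT."""
    return n4p_elvt_clock_ghz * 1.06

print(round(n4x_ulvt_power(10.0), 2))  # 7.9 W for a block that drew 10 W on N4P eLVT
print(round(n4x_elvt_speed(5.0), 2))   # 5.3 GHz where N4P eLVT topped out at 5.0
```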
> RAM has to be temporarily dedicated to the GPU, unlike Macs? (Could be misunderstanding something?)

That's not how OS-managed memory allocation works!
> Now that AMD has separated Zen 5 Client into Classic and Dense, they can customize Classic to use eLVT for higher clocks while Dense use uLVT for higher efficiency at lower clocks.

Oohhhh, you're way smarter than the average poster here.
> Zen3 literally piled on the port count.
> There's a ton more EUs, just more specialized ones.

Zen 3 added IO, it didn't add ALUs. Is there a reason why you left that out of their quote? More IO is a wider design, but it isn't the same as more execution units. Neither of you is wrong.
> What did I just read...?

Their point is pretty simple, so idk why people are being overly dramatic.
> It makes even more sense with Adroc's remarks that the delta in battery life between Z5 Mobile and Snap Elite isn't as big.

It already made sense considering everyone knows about Zen 4c, and it would be really unwise to assume AMD aren't going to use every tool TSMC has to make Zen 5c as efficient as possible. No company would leave tech sitting on the table when it's available to them and their competitors are also using it.
> That's not how OS-managed memory allocation works!

...I'm basing this off what I think I know about how Windows allocates memory between the CPU and iGPU, but I should've asked others about this instead.
> Now that AMD has separated Zen 5 Client into Classic and Dense, they can customize Classic to use eLVT for higher clocks while Dense use uLVT for higher efficiency at lower clocks.

Zen6 takes this even further, if you can figure out the one area AMD is not entirely leading in.
> Zen3 literally piled on the port count.

But they didn't increase the PRF read/write port count, which is the fundamental limit of execution concurrency within the core. That's why I say Zen 1 through Zen 4 are fundamentally the same execution width; with the work in Zen 3 they brought average port usage closer to the peak.
> From generation to generation, Zen was larger in terms of the logic used and the number of transistors used for it. Zen 3 compared to Zen 2 is generally a redesign of the control logic and the algorithms contained in it, and an expansion of about 14%. This proves that the logic in Zen 2 was not designed optimally for the amount of resources.

Yes, that is true of any core, because it takes thousands of engineering hours per 0.1% of performance, so it is truly a function of time. You have to bank what you've got and ship at some point. This is true of any engineering exercise.
> Zen6 takes this even further, if you can figure out the one area AMD is not entirely leading in.

In what way does Zen6 take it further? And are such details already known about Zen6?
> I haven't been following this thread, so I have a quick question for the forum members. I'm looking to acquire an HP Z6 G5 7995WX workstation, but before I do I'm curious when the Zen 5 version (128 cores?) of the Threadripper chip is expected. Thanks!

Almost 2 years if not more, I'm afraid. Best bets are a used Genoa or a regular Zen 4 Threadripper.
> Almost 2 years if not more, I'm afraid. Best bets are a used Genoa or a regular Zen 4 Threadripper.

Aren't the Zen 5 EPYCs coming later this year? Since the platform doesn't change(?) from Zen 4 to Zen 5, I was hoping the next Threadripper would be released early to mid next year. I can't wait even that long anyways. Thanks!
> Zen6 takes this even further, if you can figure out the one area AMD is not entirely leading in.

Just a total dart throw, but one thing it would be nice to see AMD do is match Intel in reducing idle power consumption via the use of a low-power core. I recall there being an AMD patent a ways back where the cache of a big core was shared with a small core, or something to that effect, where in essence the workload could be passed between the cores without much penalty.

If Zen 6's IOD uses N4X, then you could make that small core pretty power-efficient by using those uLVT transistors and shutting down the compute die. Plus, the use of Infinity Link or whatever AMD calls it to connect the compute die to the IOD may allow the small-core approach to work. Given that Adroc hinted that Zen 6 desktop is more mobile-like than ever, this idea is not too far-fetched in my opinion.
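The hand-off idea can be sketched as a toy threshold scheduler (purely my own illustration; the thresholds and the little core's capacity are invented, not from the patent):

```python
# Toy big.LITTLE scheduler: light work parks on the low-power core, heavy work
# migrates to the big core. The shared cache is what would make migration
# cheap; this sketch only models the placement decision.

LITTLE_CAP = 0.3   # assumed little-core capacity as a fraction of the big core
MIGRATE_UP = 0.8   # assumed utilization threshold for moving up

def pick_core(utilization: float, current: str) -> str:
    """Return which core should run the workload next."""
    if current == "little" and utilization > MIGRATE_UP:
        return "big"      # workload outgrew the little core
    if current == "big" and utilization < LITTLE_CAP:
        return "little"   # light enough to save idle power
    return current        # stay put to avoid needless migrations

core = "little"
for load in [0.1, 0.2, 0.95, 0.9, 0.2, 0.1]:
    core = pick_core(load, core)
    print(f"load={load} -> {core}")
```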
> Aren't the Zen 5 EPYCs coming later this year? Since the platform doesn't change(?) from Zen 4 to Zen 5, I was hoping the next Threadripper would be released early to mid next year. I can't wait even that long anyways. Thanks!

If you can get one then sure, Turin is always the first choice.
> Aren't the Zen 5 EPYCs coming later this year? Since the platform doesn't change(?) from Zen 4 to Zen 5, I was hoping the next Threadripper would be released early to mid next year. I can't wait even that long anyways. Thanks!

I wouldn't be holding my breath.
> ...I recall there being an AMD patent a ways back where they had the cache of a big core be shared with a small core or something to that effect where in essence the workload could be passed between the cores without much penalty.

Edit: found the article.

AMD patents a task transition method between BIG and LITTLE processors - VideoCardz.com
> This patent reminds me a lot of A10 Fusion, where the little cores ended up being almost transparent to software, because if a workload was heavy enough it would transition from the little cores to the big cores and primarily use those instead (not sure how that was determined by the OS/hardware, but I wouldn't be surprised if it relied upon the types of instructions run (like in that patent) or the workload duration).

You forgot to mention the funniest part: A10 had the LITTLEs grow off the bigs like a cancerous tumor.
> But they didn't increase the PRF read/write port count, which is the fundamental limit of execution concurrency within the core. That's why I say Zen 1 through Zen 4 are fundamentally the same execution width; with the work in Zen 3 they brought average port usage closer to the peak.

> No, what they did do with Zen3 is to use macro-ops instead of micro-ops - they reduced PRF usage by letting macro-ops transfer data directly between execution units, so they could increase concurrency without increasing PRF throughput.

They did do what I said; that's why they had dedicated branch execution units. I'm also not 100% sure about some people's thinking on Zen 3. AMD have had mop and op fusion for a long time; I wouldn't be so quick to assume that mop and op fusion didn't exist in Zen 1/2, because they existed as far back as Bulldozer.
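The macro-op point can be illustrated with a toy count of register-file traffic (my own model, nothing like AMD's actual pipeline): fusing a compare with its dependent branch removes the intermediate flags round-trip through the PRF.

```python
# Toy PRF-traffic model. Each instruction is (op, prf_writes, prf_reads).
# With fusion enabled, an adjacent cmp+jcc pair becomes one macro-op whose
# flags pass directly between execution units instead of through the PRF.

def prf_traffic(instrs, fuse=False):
    """Total PRF accesses (writes + reads) for an instruction stream."""
    writes = reads = 0
    i = 0
    while i < len(instrs):
        op, w, r = instrs[i]
        if fuse and op == "cmp" and i + 1 < len(instrs) and instrs[i + 1][0] == "jcc":
            reads += r   # the fused op still reads its source operands
            i += 2       # consume both halves; no flags write, no flags read
            continue
        writes += w
        reads += r
        i += 1
    return writes + reads

stream = [("cmp", 1, 2), ("jcc", 0, 1)]   # cmp writes flags, jcc reads them
print(prf_traffic(stream))                # 4 PRF accesses unfused
print(prf_traffic(stream, fuse=True))     # 2 PRF accesses with fusion
```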