Netburst was a bit of a braindead design: most of the resources that could have benefited SMT were eaten by its replay mechanism. If an instruction's data wasn't ready, instead of waiting for the data Netburst re-issued the instruction repeatedly until its data arrived. No wonder those CPUs ran...
Old single-threaded game engines were synced to the frame rate. In such a case the instructions per frame are pretty much constant. Multithreaded engines run the game/physics simulation asynchronously from the rendering engine, so instructions aren't totally tied to fps, but for the visible part that determines fps they still pretty much are, at...
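That decoupling is usually done with a fixed physics timestep plus an accumulator. A toy Python simulation (all numbers made up) shows physics stepping at ~60 Hz regardless of the render rate:

```python
def simulate(frame_ms, total_ms, physics_dt_ms=1000 // 60):
    # Fixed-timestep physics decoupled from a variable render rate:
    # physics "catches up" in whole steps however long a frame takes.
    accumulated = physics_steps = frames = 0
    for _ in range(total_ms // frame_ms):
        accumulated += frame_ms
        while accumulated >= physics_dt_ms:
            accumulated -= physics_dt_ms
            physics_steps += 1
        frames += 1
    return frames, physics_steps
```

Rendering at 40 fps or 100 fps both end up running the same ~62 physics steps over one simulated second; only the visual frame count changes.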
The game executes a given number of instructions per frame, if stupid things like spinning on locks are excluded. And if they're included, the performance that matters is still fps, not the count of useless instructions executed. So yeah, when comparing game performance, measure fps over any non-revealing metric.
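To make the fps-vs-IPC point concrete, here's a toy Python calculation (the numbers are made up): spin-wait instructions inflate the retired-instruction count and thus IPC, while the useful work per frame is unchanged:

```python
def ipc(instructions, cycles):
    # Retired instructions per clock cycle.
    return instructions / cycles

def frame_ipc(useful_instr, spin_instr, cycles):
    # A frame stalled on a lock retires extra spin-loop instructions
    # that produce no extra frames: IPC goes up, fps does not.
    return ipc(useful_instr + spin_instr, cycles)
```

Same frame, same cycles: with zero spin instructions IPC is 1.0, with an equal amount of spinning it's 2.0, yet the player sees exactly the same frame rate.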
IPC can be calculated for a game too, including the cycles stalled on locking, as said before. But the 5700 vs 5700X comparison removes that argument: both CPUs share the same 8-core CCX with similar clocks, and the 5700 actually has a slightly lower locking penalty since it has a built-in northbridge vs the external one in...
RDRAM was only used with Willamette and early Northwood; Intel switched to DDR RAM with Granite Bay long before Prescott. 32-bit Xeons had a 36-bit physical address space, hence the 64 GB RAM limit.
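The 64 GB figure follows directly from the address width, as a quick Python check shows:

```python
# 36 physical address bits -> 2**36 addressable bytes = 64 GiB.
PHYS_BITS = 36
max_ram_bytes = 2 ** PHYS_BITS
assert max_ram_bytes == 64 * 2 ** 30
```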
You only need to compare the 5700 against the 5700X. The 5700 has slightly better memory latencies from its unified design but only half the cache. It can also be compared to Zen 2 with its comparable 16 MB CCX caches. https://www.techspot.com/review/2802-amd-ryzen-5700/
Zen 3 doubled the per-thread cache. That's always a massive uplift for cache-sensitive applications, and in some places it gave a 100% IPC uplift. Direct AMD quote from the link posted before: "It also transitioned to a new "unified complex" design that brought 8 cores and 32MB of L3 cache into a single group of...
Separate cache pools waste much of the cache capacity on duplication. And AMD made it clear that the unified 32 MB cache pool of Zen 3 is responsible for most of the gaming speedup. https://www.amd.com/en/technologies/zen-core
The uop cache is about the energy-efficiency difference between decoding an instruction and fetching it from the mop cache. A mop cache takes silicon area, so for area efficiency it's better without one; simple instruction sets like ARM can live without a mop cache, but x86 just needs it to...
Didn't AMD say that the 512-bit FPU is optional in Zen 5 designs? Using 512-bit FPU pipelines and a 512-bit load/store engine and then trying to optimize that design for maximum density seems like a pretty backwards way to optimize a design.
The problem is that Intel extracted everything they could from the chip, and went too far: the chips became unstable. A big part of that is SMT; removing SMT from the chip design will reduce the chip's hot spots by simplifying critical-path routing. Have to wonder if anybody has actually tested what...
The point to note is that in Intel's hybrid CPU designs the best-case HT gain isn't 30% but at best 10%. It should be plainly obvious that SMT should be dropped from the P-cores, which should instead target the best possible single-thread speed, and highly-threaded workloads should be left to...
No, what they did with Zen 3 was use macro-ops instead of micro-ops: they reduced PRF usage by letting macro-ops transfer data directly between execution units, so they could increase concurrency without increasing PRF throughput.
In a hybrid CPU configuration the big cores are there for the best per-thread performance. If they still want to use SMT, the right cores to have it on are the E-cores: splitting the slow cores' performance in half for the best n-thread performance while still maintaining good single-thread performance elsewhere. Having SMT on their...
You do know that what you proposed means disabling HT. Splitting each core into two virtual cores = HT on; one thread per core = HT off. HT can of course be "disabled" by parking one core of each pair. But to get 100% single-thread performance, HT has to be disabled completely, since some hardware resources...
This was specifically a solution for utilizing wider cores. Vectorization (SIMD) only works when there are no dependencies between data; basically, dependencies have to be resolved at compile time. With loop unrolling it's also possible to resolve dependencies (calculate or predict variables from...
Many parts of loops aren't vectorizable but can be unrolled. Compilers do unroll loops to extract parallelism, but compile-time unrolling is more limited than runtime unrolling. Hardware loop unrolling is a pretty complicated scheme but has been known for ages, and today's hardware already has loop caches, which is...
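The vectorizable/unrollable split comes down to dependency structure, which a quick Python sketch can show (the structure is what matters, not the language): the first loop has fully independent iterations and maps straight onto SIMD lanes, while the second carries a dependency across iterations and so can only be unrolled, not naively vectorized:

```python
def double_each(a):
    # No cross-iteration dependency: each output needs only a[i],
    # so the iterations can run in parallel (SIMD-friendly).
    return [x * 2 for x in a]

def prefix_sum(a):
    # Loop-carried dependency: out[i] needs out[i-1], so iterations
    # cannot run independently; a compiler can only unroll this loop.
    out, total = [], 0
    for x in a:
        total += x
        out.append(total)
    return out
```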
There's something that might give good results on very wide cores and isn't utilized yet: hardware loop unrolling. Complex to do, but once done it makes it possible to run every iteration of a loop on its own hardware, making good use of very wide execution hardware. Though a proper ISA...
Everything can be buggy or just not work, so CPUs usually have the ability to switch off almost all performance features. But loading data through speculated pointers... why the hell does everything have to be general-purpose on today's CPUs? Why doesn't the ISA implement separate registers for data and...
The whole point of an NPU is optimized hardware for very short datatypes with simplified instructions. The FPU does very complex instructions on long datatypes, exactly the opposite optimization point from an NPU. And the FPU isn't actually a co-processor anymore; it's part of the ISA and it cannot be removed...
The speculation was about replacing the FPU with an NPU. The FPU doesn't usually even support FP16 math: its single precision is 32-bit and its double precision 64-bit (or more). OK, FPU SIMD today does also support FP16, but that's just an outlier for things like AI, which is what the NPU is aimed at. So no, you can't replace...
An NPU is the exact opposite of an FPU. Floating-point numbers have a floating point, so the representable range can be huge, like from 2^-64 to 2^64, and calculations can mix opposite extremes. An NPU instead relies on extremely short integers, like 4 or 8 bits, giving only 16 or 256 values. If we...
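A minimal Python sketch of what that 256-value range means in practice, assuming the common scale-factor quantization scheme used for int8 inference:

```python
def quantize_int8(x, scale):
    # int8 has only 256 representable values; a shared scale factor
    # maps the float range of interest onto [-128, 127].
    q = round(x / scale)
    return max(-128, min(127, q))

def dequantize(q, scale):
    # Recover an approximation; error is bounded by one scale step.
    return q * scale
```

Anything outside the chosen range simply saturates, which is exactly why NPU math works for neural-net weights but can't stand in for general floating point.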
Actually, higher leakage in a CPU means higher clocking potential. CPU manufacturers sometimes give away golden overclocking samples that are too leaky to be sold. Metal resistance grows with temperature, so keeping the silicon as cold as possible gives the transistors more current to switch...
I speculated in the RISC-V threads that CPU designs should move towards split register file designs. VISC is nothing more than a split-register-file core with a software abstraction layer, which isn't the way I think a split-register CPU should be made; instead it should rely on an ISA that allows using split...
Yeah, Intel should probably just have made Golden Cove much bigger and run 40 of them @ 2 GHz with at least 4-way SMT to be competitive with their rival's smaller cores. With such a strategy they'd soon be out of business. Actually they already are; have to wonder which parties are actually buying their...
That's absolutely not the core's fault. Intel packs them that way to easily add 4-core complexes to existing ring/mesh networks, but nothing stops them from implementing different L2 cache versions. They could for example make a 60-core E-core part with a fast 12 MB L2 per core in...
Intel's big cores are just wasted in their server-grade CPUs: they are enormous because they're designed to run at 6 GHz, yet they're clocked at about 3 GHz in the big configurations. Running them at 3 GHz with HT means their single-thread performance is actually lower than their small-core rivals'...
Intel HT is symmetrical threading. Both threads are equal: every other clock cycle, instructions are fed from the other thread. There is no primary/secondary thread; both threads execute at a bit more than half the speed of a single thread running alone on that core.
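That round-robin fetch can be modeled in a few lines of Python (a toy model, not how actual pipeline arbitration works in detail): each thread ends up with roughly half the core's fetch bandwidth, with neither one privileged.

```python
def round_robin_fetch(cycles):
    # Symmetric SMT toy model: alternate fetch between two threads
    # every cycle, so each gets ~half the core's fetch slots.
    fetched = [0, 0]
    for cycle in range(cycles):
        fetched[cycle % 2] += 1
    return fetched
```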
The big cores in hybrid designs are there to offer better per-thread performance. Using HT nullifies that, since HT splits a core's per-thread performance roughly in half. The only beneficial case for HT in those hybrid designs is massively parallelized loads where single-thread performance doesn't matter, and if...
Actually, 386 protected mode is just the same as the 286's, expanded to 32 bits, with a paging unit added. The 386 just allows misusing its segmentation through overlapping segments, which can be as big as the addressable memory. That sure makes programming easier, but it's actually a shame because 286/386...
x86 is hard to translate to any other ISA, yet Apple and Windows are doing it fine today for AArch64. ARM to RISC-V and the other way around is no big deal at all; you should expect near-native performance.
Legacy support ain't so important nowadays; it's pretty much enough to have an emulation layer for existing software. Google is building a RISC-V version of Android, and when it's ready and there's competitive hardware, RV will be a pretty viable alternative to ARM for phones...
The basic principle should be KISS: keep it simple, stupid. At Intel they time after time develop the exact opposite of that: iAPX 432, Itanium and so on. Probably no one is really on top of the design, so they put everything ever invented on the design list and try to...
They made the ISA implement everything a compiler could do, to make the core forward-proof without recompilation and to make the execution hardware as simple as possible: a strictly in-order design. They failed at both: code was not optimized for future hardware without recompiling, and the execution hardware...