Recent content by camel-cdr

  1. Discussion RISC V Latest Developments Discussion [No Politics]

    I asked Greg a few things about Veyron afterward: They can fuse non-adjacent instructions, so no compiler support is needed. LMUL is handled late in the pipeline, so LMUL>1 instructions are scheduled as single instructions and only split at issue.
  2. Discussion RISC V Latest Developments Discussion [No Politics]

    The talk is up on YouTube: Biggest additional info, not present in the slides: They plan to release an Athena (8x Ascalon) devboard and even a laptop for people to buy as a development platform.
  3. Discussion RISC V Latest Developments Discussion [No Politics]

    I used the numbers from the M4 Geekerwan video, which was 11.72@4.47GHz.
  4. Discussion RISC V Latest Developments Discussion [No Politics]

    This graph is actually the biggest source of SPEC2006/GHz estimates for newer processors I've seen so far. Since RISC-V companies still mostly publish SPEC2006/GHz, I added some additional cores into the graph (without release date):
  5. Discussion RISC V Latest Developments Discussion [No Politics]

    Full slides here: https://riscv.or.jp/wp-content/uploads/Japan_RISC-V_day_Spring_2025_compressed.pdf The reported SPECint scores for Ascalon don't really match up with "Projected Zen5 performance in 2024". Callandor looks insane though, 16 wide decode with 2-ahead branch predictor and...
  6. Discussion ARM Cortex/Neoverse IP + SoCs (no custom cores) Discussion

    It's worse than that: apart from the A64FX, every SVE implementation reuses its NEON ALUs for SVE, AFAIK. So on the Neoverse V1, you can use four-issue 128-bit NEON or two-issue 256-bit SVE, which makes the gain from SVE minimal.
  7. Discussion ARM Cortex/Neoverse IP + SoCs (no custom cores) Discussion

    Take the AVX10 spec, change the encodings of AVX10/128, AVX10/256 and AVX10/512 to overlap, remove the 0.1% of instructions that don't make sense anymore, and add an instruction that returns the vector length. Now you have a scalable vector ISA and it's possible to write length-agnostic code. This maps...
  8. Discussion ARM Cortex/Neoverse IP + SoCs (no custom cores) Discussion

    https://godbolt.org/z/13xc73T3n You just need to include v in the march string. I also added -mrvv-max-lmul=dynamic, because that ends up with better codegen (tries to maximize LMUL). IMO it should be the default option.
  9. Discussion RISC V Latest Developments Discussion [No Politics]

    @Nothingness Give me some micro benchmarks where you see RISC-V codegen as lacking, and I'll try to benchmark it on XiangShanV2 and XiangShanV3 in the next couple of days.
  10. Discussion RISC V Latest Developments Discussion [No Politics]

    clang O2 vectorizes it on both Arm and RISC-V: https://godbolt.org/z/TGvWKWch3 Both do 4 adds per loop, but Arm takes 10 instructions, while RISC-V takes 8. If we are fair and expand the load pair and the LMUL=2 instructions, then we get Arm at 12 uops and RISC-V at 20 uops. clang currently defaults...
  11. Discussion RISC V Latest Developments Discussion [No Politics]

    Except the important part, the inner loop, is 8 instructions for RISC-V and 6 instructions for Arm, however it's 8 uops for both: https://godbolt.org/z/vMv4G98zf Oh, and the RISC-V inner loop is 22 bytes, while the Arm inner loop is 24 bytes.
  12. Discussion RISC V Latest Developments Discussion [No Politics]

    I've responded to the original Twitter thread, so I'll just copy-paste my comment on r/riscv that paraphrases the answers: Regarding RVC decode complexity I think the decode is missing part of the picture. For a fixed-size ISA to be competitive it needs to have more complex...
  13. Discussion RISC V Latest Developments Discussion [No Politics]

    I think it's because that's what the other RISC-V vendors use; as soon as one of them switches to SPEC2017 the others will likely follow. Keep in mind that Tenstorrent for Ascalon and SiFive for the P870 both report >18 SPECint2006/GHz, but Tenstorrent has 8-wide decode, while the P870 has 6-wide decode (and...
  14. Question Are scalable vectors the ultimate solution to fixed width SIMD?

    The lane size is usually the vector length though. Or to put it differently, the width of the vector execution units, specifically for vector permutations, is usually the same as the vector length. Look at AVX512, there you've got a low latency high throughput vpermb, heck even vpermi2b, which...
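
The length-agnostic idea from post 7 can be illustrated with a small strip-mined loop. This is only a sketch in portable C, not real AVX10, SVE or RVV code: `hw_vector_bytes` and `query_vl` are hypothetical stand-ins for the hardware vector width and for an instruction that returns the vector length, and the inner scalar loop merely models one vector add over `vl` lanes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: vector register width in bytes. Real hardware would fix
   this; here it is a variable so the same code can be run at any width. */
static size_t hw_vector_bytes = 16;

/* Hypothetical stand-in for a "return the vector length" instruction:
   how many int32 elements the next vector operation should process. */
static size_t query_vl(size_t remaining) {
    size_t max_elems = hw_vector_bytes / sizeof(int32_t);
    return remaining < max_elems ? remaining : max_elems;
}

/* Length-agnostic add: the loop never hard-codes the vector width. */
void vla_add(int32_t *dst, const int32_t *a, const int32_t *b, size_t n) {
    while (n > 0) {
        size_t vl = query_vl(n);
        /* Models a single vector add over vl lanes. */
        for (size_t i = 0; i < vl; i++)
            dst[i] = a[i] + b[i];
        dst += vl;
        a += vl;
        b += vl;
        n -= vl;
    }
}
```

Changing `hw_vector_bytes` from 16 to 32 or 64 leaves `vla_add` untouched and yields the same results, which is the property a scalable vector ISA is after.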