Info PrimeGrid Challenges 2025


StefanR5R

Elite Member
Dec 10, 2016
6,392
9,857
136
How is ~8 hours for the type units in the upcoming contest ?
PrimeGrid 9.51 Generalized Cullen/Woodall (LLR) (mt) llrGCW_662112230_0 02:03:44 (15:55:12) 96.49 25.935 05:52:17 09d,21:56:08 8C Running Turin
Closer to 8.6 hours average, that is with lasso in 8 x 8 config.
As far as I can tell from merely 8 tasks which are running here for parsnip, the workunit sizes of current GCW tasks are varying a lot.
Bash:
cd boinc/; grep FFT slots/*/stderr.txt
slots/0/stderr.txt:Using zero-padded AVX-512 FFT length 3200K, Pass1=1K, Pass2=3200, clm=1, 8 threads.
slots/1/stderr.txt:Using zero-padded AVX-512 FFT length 2880K, Pass1=192, Pass2=15K, clm=4, 8 threads.
slots/2/stderr.txt:Using zero-padded AVX-512 FFT length 3200K, Pass1=1K, Pass2=3200, clm=1, 8 threads.
slots/3/stderr.txt:Using zero-padded AVX-512 FFT length 3200K, Pass1=1K, Pass2=3200, clm=1, 8 threads.
slots/4/stderr.txt:Using zero-padded AVX-512 FFT length 3M, Pass1=192, Pass2=16K, clm=2, 8 threads.
slots/5/stderr.txt:Using zero-padded AVX-512 FFT length 2880K, Pass1=192, Pass2=15K, clm=4, 8 threads.
slots/6/stderr.txt:Using zero-padded AVX-512 FFT length 3456K, Pass1=192, Pass2=18K, clm=4, 8 threads.
slots/7/stderr.txt:Using zero-padded AVX-512 FFT length 3200K, Pass1=1K, Pass2=3200, clm=1, 8 threads.
The client's task duration estimates 1½ hours after start:
slot 0 – 09:26
slot 1 – 07:39
slot 2 – 08:58
slot 3 – 09:15
slot 4 – 07:35
slot 5 – 07:39
slot 6 – 09:01
slot 7 – 09:16
This is a 9554P at 400W PPT limit. Core clock is 3.5 GHz (median), power draw at the wall is 480 W. Linux + affinity_mgr.sh.
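As a rough cross-check of the figures above, here is a minimal Python sketch that averages the eight client estimates and groups them by the FFT length reported in stderr.txt. The slot-to-FFT mapping and the hh:mm values are copied from the listings above; the function names are made up for illustration, and the client's estimates are only estimates, not final run times.

```python
from collections import defaultdict

# Client estimates per slot (hh:mm) and FFT length per slot, copied from the post above.
ESTIMATES = {0: "09:26", 1: "07:39", 2: "08:58", 3: "09:15",
             4: "07:35", 5: "07:39", 6: "09:01", 7: "09:16"}
FFT_LENGTH = {0: "3200K", 1: "2880K", 2: "3200K", 3: "3200K",
              4: "3M", 5: "2880K", 6: "3456K", 7: "3200K"}


def to_minutes(hhmm: str) -> int:
    """Convert an 'hh:mm' string to whole minutes."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def mean_hours(values) -> float:
    """Mean of a collection of 'hh:mm' durations, in hours."""
    minutes = [to_minutes(v) for v in values]
    return sum(minutes) / len(minutes) / 60


def mean_hours_by_fft() -> dict:
    """Mean estimate in hours for each FFT length."""
    groups = defaultdict(list)
    for slot, hhmm in ESTIMATES.items():
        groups[FFT_LENGTH[slot]].append(hhmm)
    return {fft: mean_hours(vals) for fft, vals in groups.items()}


print(f"overall mean: {mean_hours(ESTIMATES.values()):.2f} h")
for fft, hours in sorted(mean_hours_by_fft().items()):
    print(f"{fft:>6}: {hours:.2f} h")
```

The overall mean comes out at about 8.6 hours, in line with the average quoted at the top of the post, while the per-FFT means show how much the individual workunits differ.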

Out of curiosity, is anyone else who runs PrimeGrid on Windows (rather than on Linux) using Process Lasso also? And if so, does it offer an explicit "bind to last level cache domains" setting, or does the user have to come up with something else which yields the same effect?
Anybody? No?

I don't expect anyone to post their potentially super-secret :-) Process Lasso recipe here. My curiosity would already be satisfied if anybody could confirm with authority that it is in fact possible to configure Process Lasso precisely for the purpose of binding all threads of one process to the logical CPUs of one CCX (that is, of course, for multiple concurrent processes which start and end at arbitrary times, on machines with multiple CCXs).

From a quick look at bitsum.com, it seems as if the closest thing one could do is to define "CPU sets" for each CCX, then add all of these CPU sets to the "process match" corresponding with PRST, and then hope that the 8 CPU time consuming threads of each PRST process are indeed latching onto one and the same CPU set most of the time. (There are more than 8 threads spawned by PRST, but only 8 — or however many the user configured at the PrimeGrid site or via app_config — are performing the actual computation.) But the CPU sets feature of Process Lasso is limited to "current" settings in the gratis trial version; "always" rules are reserved to the paid Pro version.

@Markfw, if you haven't done so already in the past, one way to check if Process Lasso is likely doing what it is meant to do is:
  • Suspend all but one PrimeGrid task.
  • Watch per-CPU utilization in task manager. On Windows, the result should be (I think) that the logical CPUs 0, 2, 4, 6, 8, 10, 12, 14 are used almost all the time, and others only minimally.
  • Unsuspend another PrimeGrid task, and logical CPUs 16, 18, 20, 22, 24, 26, 28, 30 should get busy.
  • And so on, when you unsuspend more tasks, or suspend some again. It could also be that tasks stay running on CCXs with higher CPU IDs; the IDs above are those of the cores of the first two CCXs.
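The check described above can be sketched in Python as follows. This is only an illustration under stated assumptions: logical CPUs are numbered consecutively with 16 per CCX (8 cores with 2 SMT threads each), the machine has 128 logical CPUs, and the load readings are made up rather than measured. The function names and thresholds are hypothetical; real readings would have to be taken from Task Manager or a monitoring tool.

```python
LOGICAL_CPUS_PER_CCX = 16  # assumption: 8 cores x 2 SMT threads, numbered consecutively


def busy_ccxs(load_percent, busy_threshold=50.0, min_busy_cpus=8):
    """Return the indices of CCXs that have at least min_busy_cpus busy logical CPUs."""
    busy = []
    for start in range(0, len(load_percent), LOGICAL_CPUS_PER_CCX):
        ccx_loads = load_percent[start:start + LOGICAL_CPUS_PER_CCX]
        n_busy = sum(1 for load in ccx_loads if load >= busy_threshold)
        if n_busy >= min_busy_cpus:
            busy.append(start // LOGICAL_CPUS_PER_CCX)
    return busy


def made_up_readings(tasks_on_ccxs, total_cpus=128):
    """Hypothetical per-CPU loads: one well-pinned 8-thread task per listed CCX."""
    loads = [2.0] * total_cpus  # near-idle background load
    for ccx in tasks_on_ccxs:
        first = ccx * LOGICAL_CPUS_PER_CCX
        for cpu in range(first, first + LOGICAL_CPUS_PER_CCX, 2):
            loads[cpu] = 100.0  # even-numbered logical CPUs = one thread per core
    return loads


print(busy_ccxs(made_up_readings([0])))     # one task running
print(busy_ccxs(made_up_readings([0, 1])))  # a second task unsuspended
```

If the threads of a task were scattered over several CCXs instead, no single CCX would reach eight busy logical CPUs and the function would return an empty list for that task.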

Besides, at a computer which is running PrimeGrid PRST, and which does not run e.g. Folding@Home on the side, and which is plugged into a wall power meter, the power meter will show a high and very steady power consumption when CPU affinities are set suitably, and a lower and notably fluctuating power consumption if no CPU affinities are set. The more CCXs a computer has got, the wilder will the power fluctuations be without CPU affinities.

A reminder of the most prominent alternatives on Windows:
  • EPYCs and Threadrippers can be set to 1 CCX = 1 NUMA domain in the BIOS. Caveat: I don't know how effective Windows' NUMA handling is; this method works nicely with Linux at least.
  • Computers with a reasonably low number of CCXs could run one BOINC client instance per CCX, with CPU affinity defined for the BOINC client process.
  • Pavel Atnashev's AffinityWatcher (github link)
  • pschoefer's Powershell script (SG forum, 2022 season thread, #535), perhaps with xii5ku's extras (SG forum, 2023 season thread, #485)
If I had a Windows computer at PrimeGrid or other projects which profit from CPU affinities, I would strongly gravitate to the last option in this list, somehow. :-)
I still am a fan of tools which are made to do specifically what they are meant to achieve. :-)
 
Reactions: TennesseeTony

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
26,954
15,931
136
Well, since I am using simple settings, here is the Turin lasso screen. No SMT (simulated by lasso) and as I said the average time is somewhere close to or less than I posted before. We will see how things are going in about 3 more hours.

Bad news is, it's 35F outside, and with the AC on and almost nothing else running (F@H will be very bad for me this week), it's 79F inside. I can't stand it much cooler. God help my electric bill this week. Most of the windows are open too, to assist in cooling! Same GHz as yours, ~3.5 GHz, except the Turin runs 2.1. That's with an AIO and a fan cooling the VRMs to 70C, CPUs at 60C. Without the fan, the VRMs were pegged at 100C!!!

Lastly, that picture is of remote control using Supermicro software.

 

StefanR5R

Elite Member
Dec 10, 2016
6,392
9,857
136
Well, since I am using simple settings, here is the Turin lasso screen.
The topmost CPU utilization graph looks good insofar as all physical cores are running one software thread each.

The process list below this is looking both good…
  • The CPU sets should follow the formula
    [ (0;2;4;6;8;10;12;14) + {0 or 16 or 32 or 48 or 64 or 80 or 96 or 112} ].
    Only then each of these eight CPU sets would correspond to one CCX exactly. At least that's according to what I understood how Windows is numbering the logical CPUs. The CPU affinity lists which your screenshot is showing are indeed following this formula.
…and bad at the same time:
  • The CPU sets to which PRST processes are bound each occur twice. Best would be if each PRST process had an individual CPU set which differs from all others.
It seems like Process Lasso is doing the desired job only partially. Edit: Or Process Lasso has got some sort of UI bug in which it displays CPU affinities of the 1st NUMA node only, not of the second NUMA node. After all, the topmost CPU utilization graph clearly shows that both NUMA nodes are loaded.
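The formula above can be written out as a short Python sketch. It assumes, as the post does, that Windows numbers the two SMT threads of a core consecutively and that each CCX has 8 cores; the function names are made up for illustration and none of this is Process Lasso's own API. The second helper flags the problem described above, namely CPU sets that are assigned to more than one process.

```python
from collections import Counter


def ccx_cpu_set(ccx_index, cores_per_ccx=8, smt_threads=2):
    """First logical CPU of each core in one CCX, e.g. (0, 2, ..., 14) for CCX 0."""
    base = ccx_index * cores_per_ccx * smt_threads
    return tuple(base + smt_threads * core for core in range(cores_per_ccx))


def expected_cpu_sets(num_ccx=8):
    """One CPU set per CCX, following the formula in the post."""
    return [ccx_cpu_set(i) for i in range(num_ccx)]


def duplicated_sets(assigned_sets):
    """CPU sets that occur more than once among the sets assigned to processes."""
    counts = Counter(tuple(sorted(s)) for s in assigned_sets)
    return [cpu_set for cpu_set, n in counts.items() if n > 1]


for cpu_set in expected_cpu_sets():
    print(";".join(str(cpu) for cpu in cpu_set))
```

Feeding `duplicated_sets` the affinity lists read off the process list would return an empty list in the ideal case, where every PRST process has a CPU set of its own.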

One step which *may* (perhaps) help with that would be to reboot into BIOS, go to "Advanced" --> "ACPI Settings" --> set "ACPI SRAT L3 Cache As NUMA Domain" to "Enabled", and boot back into Windows. After that, the bottom line of Process Lasso should say 8 NUMA nodes instead of the current 2 NUMA nodes. Each CCX will be a NUMA node then. And I hope that the NUMA node boundaries directly translate to process scheduling boundaries. (On Linux, the latter would be true by default without extra tools. As soft boundaries though, not as hard boundaries; but the overall performance effect would be close to as if they were hard boundaries.)

Do you have the computer plugged into a power meter? Or is HWMonitor or a similar software showing CPU socket power consumption? — If yes, scheduling optimizations would show up as lower power fluctuations and higher steady power draw.
 
Last edited:

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
26,954
15,931
136
The topmost CPU utilization graph looks good insofar as all physical cores are running one software thread each.

The process list below this is looking both good…
  • The CPU sets should follow the formula
    [ (0;2;4;6;8;10;12;14) + {0 or 16 or 32 or 48 or 64 or 80 or 96 or 112} ].
    Only then each of these eight CPU sets would correspond to one CCX exactly. At least that's according to what I understood how Windows is numbering the logical CPUs. The CPU affinity lists which your screenshot is showing are indeed following this formula.
…and bad at the same time:
  • The CPU sets to which PRST processes are bound each occur twice. Best would be if each PRST process had an individual CPU set which differs from all others.
It seems like Process Lasso is doing the desired job only partially. Edit: Or Process Lasso has got some sort of UI bug in which it displays CPU affinities of the 1st NUMA node only, not of the second NUMA node. After all, the topmost CPU utilization graph clearly shows that both NUMA nodes are loaded.

One step which *may* (perhaps) help with that would be to reboot into BIOS, go to "Advanced" --> "ACPI Settings" --> set "ACPI SRAT L3 Cache As NUMA Domain" to "Enabled", and boot back into Windows. After that, the bottom line of Process Lasso should say 8 NUMA nodes instead of the current 2 NUMA nodes. Each CCX will be a NUMA node then. And I hope that the NUMA node boundaries directly translate to process scheduling boundaries. (On Linux, the latter would be true by default without extra tools. As soft boundaries though, not as hard boundaries; but the overall performance effect would be close to as if they were hard boundaries.)

Do you have the computer plugged into a power meter? Or is HWMonitor or a similar software showing CPU socket power consumption? — If yes, scheduling optimizations would show up as lower power fluctuations and higher steady power draw.
It's 400 watts TDP, that's all I know about power; no meters on any boxes right now. Since the tasks are so similar in elapsed time, I can only assume lasso is working as intended.

I don't even have all CPUs done with one unit yet, but the top 5 users worldwide are 3 of our team ! So I don't want to mess with success right now.
1___markfw______TeAm AnandTech_______3 088 689.19___58
2___Icecold_____TeAm AnandTech_______2 012 475.08___40
3___tng_________Antarctic Crunchers__1 767 318.29___38
4___EA6LE_______Romania______________1 126 570.67___21
5___crashtech___TeAm AnandTech_________826 747.07___15
 

TennesseeTony

Elite Member
Aug 2, 2003
4,290
3,751
136
www.google.com
How is ~8 hours for the type units in the upcoming contest ?
PrimeGrid 9.51 Generalized Cullen/Woodall (LLR) (mt) llrGCW_662112230_0 02:03:44 (15:55:12) 96.49 25.935 05:52:17 09d,21:56:08 8C Running Turin

8 hours? Ha! My Gold 6138 runs one task in 7 hours, using, uhm, all 20 cores. lol

I just ain't got the 'cache' to play with the big dogs.

edit: Just for giggles, Broadwell running 21 cores (avx-2) is coming in at about 9 hours. I expect to come in 2nd (from last) this challenge.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
26,954
15,931
136
8 hours? Ha! My Gold 6138 runs one task in 7 hours, using, uhm, all 20 cores. lol

I just ain't got the 'cache' to play with the big dogs.

edit: Just for giggles, Broadwell running 21 cores (avx-2) is coming in at about 9 hours. I expect to come in 2nd (from last) this challenge.
I thought the 3.5 GHz and AVX-512 (at half speed) was what gave my "herd" the power, but this must not use AVX-512 at all. Why the Turin does almost as well as the Genoa, I have no idea, since it runs a lot slower (2.1 vs 3.5 GHz).
 

Icecold

Golden Member
Nov 15, 2004
1,142
1,083
146
I thought the 3.5 GHz and AVX-512 (at half speed) was what gave my "herd" the power, but this must not use AVX-512 at all. Why the Turin does almost as well as the Genoa, I have no idea, since it runs a lot slower (2.1 vs 3.5 GHz).
It heavily uses AVX 512. A 9950x is significantly faster on these tasks than a 7950x would be, for example.
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,566
4,483
75
I thought the 3.5 GHz and AVX-512 (at half speed) was what gave my "herd" the power, but this must not use AVX-512 at all. Why the Turin does almost as well as the Genoa, I have no idea, since it runs a lot slower (2.1 vs 3.5 GHz).
Genoa has emulated AVX512. It runs each AVX512 instruction in two parts, which is only slightly better than no AVX512 at all. Turin has real AVX512. That's why it's almost as fast as the Genoa.
 
Reactions: Markfw

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
26,954
15,931
136
Genoa has emulated AVX512. It runs each AVX512 instruction in two parts, which is only slightly better than no AVX512 at all. Turin has real AVX512. That's why it's almost as fast as the Genoa.
I guess my impression is based on my experience, in that my Milan was WAY overpowered by Genoa, and Rome was also way down there. I wish I could afford a non-ES Turin as they are faster than Genoa. And unlike some previous competitions, my Turin is running way below 95C and 100C (CPU and VRM temps respectively), at 60 and 65, so that's not the reason for the speed. The below pic is fully loaded with 8x8 tasks of the current competition.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
26,954
15,931
136
blew a circuit breaker. Not sure how much I lost. Up now.
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,566
4,483
75
Day 1 stats:

Rank___Credits____Username
1______10181995___markfw
2______10159765___Icecold
6______4215596____crashtech
8______1846502____cellarnoise2
20_____880013_____w a h
42_____371246_____Orange Kid
62_____235007_____Ken_g6
78_____173166_____waffleironhead
90_____120644___10esseeTony
140____49384______johnnevermind

Rank__Credits____Team
1_____28233321_>_TeAm AnandTech
2_____7962360_+__Czech National Team
3_____7282494_+__Antarctic Crunchers
4_____7108837__[H]ard|OCP
 

TennesseeTony

Elite Member
Aug 2, 2003
4,290
3,751
136
www.google.com
"Ruuun Forest! Ruuun!" - Jenny, aka, Jen-naaaa.

"...and the race is on, and here comes Pride in the back-stretch..." - Don Rollins, song writer

Perseverance is not a long race, it is many short races one after the other- Walter Elliot

Plodding wins the race- Aesop

The trouble with the rat race is that, even if you win, you're still a rat- Lily Tomlin

I always try to get the best result out of it, I'm not there for to just sit in second or third....I want to win every single race, and I will always go for it.- Max Verstappen
 
Last edited:

StefanR5R

Elite Member
Dec 10, 2016
6,392
9,857
136
I guess my impression is based on my experience, in that my Milan was WAY overpowered by Genoa, and Rome was also way down there.
As far as PrimeGrid LLR/LLR2/PRST/Genefer-CPU are concerned, this is what the Zen generations brought to the table:
  • Zen 2 over Zen 1: 7nm process instead of 14nm, enabled twice the SIMD width per core and much higher perf/W to go with it
  • Zen 3 over Zen 2: CCX size = 8 cores & 32 MB LLC instead of 4 cores & 16 MB LLC; this is important for most PrimeGrid subprojects but the smallest (not many PG subprojects are left which work well with 16 MB cache per task)
  • Zen 4 over Zen 3: 5nm process instead of 7nm and respectively improved perf/W, introduction of AVX512 instruction support (yet unchanged SIMD execution width per core) netting lower frontend pressure in SIMD workloads
  • Zen 5 over Zen 4: 4nm process instead of 5nm with a corresponding small perf/W increase, SIMD width per core doubled to 512 bits wide pipelines (only usable if the program emits actual AVX512 code; that is, AVX2 programs get to see about the same width as with Zen 4 — it's not exactly the same width as other core structures and datapaths changed also, e.g. cache bandwidths)
The above pertains to desktop Ryzen, EPYC, and Threadripper. Mobile Ryzen is in many cases cut down versus the above (cache; SIMD execution unit width). OTOH sometimes mobile is a process step ahead of desktop.

except the Turin runs 2.1
Ah, I missed this. Do you remember what clocks this Turin makes when it runs something like WCG MCM? I am asking because in a power-limited scenario like with all mid- to high-core-count EPYCs, if PrimeGrid is not "lassoed" optimally, the CPU core clocks are higher than if properly optimized. Not optimized: Cores wait for RAM a lot and don't spend much energy hence are automatically clocked higher. Optimized: Cores spend a lot of energy quickly due to heavy SIMD utilization, and are therefore clocked lower compared to more generic program code.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
26,954
15,931
136
Ah, I missed this. Do you remember what clocks this Turin makes when it runs something like WCG MCM? I am asking because in a power-limited scenario like with all mid- to high-core-count EPYCs, if PrimeGrid is not "lassoed" optimally, the CPU core clocks are higher than if properly optimized. Not optimized: Cores wait for RAM a lot and don't spend much energy hence are automatically clocked higher. Optimized: Cores spend a lot of energy quickly due to heavy SIMD utilization, and are therefore clocked lower compared to more generic program code.
I rarely have the Turin turned on, to save power in total, mostly only 7950x and 9950x. I can't remember what speed it runs in WCG.
 
Last edited:

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
26,954
15,931
136
well, I had a 7950x offline for god knows how long. It rebooted trying to force win 11 on me. Damn windows.

edit: was just checking the box again, and noticed: we have approx as many points as the next 4 teams put together !!
 
Last edited:
Jul 27, 2020
24,142
16,837
146
I kinda wish I had my own property here in UAE with a solar panel array. The one thing you can almost bet on having every single day here is sunshine. Lots and lots of it!

And maybe a thermoelectric generator to convert the heat from the sun into electricity.

Now if I only had a few million lying around here somewhere, I could get started on such a project
 
Reactions: TennesseeTony

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,566
4,483
75
I kinda wish I had my own property here in UAE with a solar panel array. The one thing you can almost bet on having every single day here is sunshine. Lots and lots of it!

And maybe a thermoelectric generator to convert the heat from the sun into electricity.

Now if I only had a few million lying around here somewhere, I could get started on such a project
Here's a start:


It may not be available in your area quite that way by electrical code, though.
 
Reactions: igor_kavinski
Jul 27, 2020
24,142
16,837
146
It may not be available in your area quite that way by electrical code, though.
Yeah, haven't seen that anywhere here. It's crazy how open-minded and future-interested Europeans are, though. If they had a population approaching that of China+Russia+India, the European Union would've already established a moonbase by now and would be working on starting a Mars colony.
 
Reactions: TennesseeTony

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,566
4,483
75
Day 2 stats:

Rank___Credits____Username
1______22264004___Icecold
2______21656564___markfw
6______9321612____crashtech
7______7503827____cellarnoise2
11_____4695133____w a h
18_____2681471____ChelseaOilman
36_____1131428____Orange Kid
91_____463135_____waffleironhead
103____392114_____Ken_g6
105____389991_____mmonnin
107____385859_____johnnevermind
161____120644___10esseeTony
248____864________[TA]Skillz

Rank__Credits____Team
1_____71006652___TeAm AnandTech
2_____24455062___Czech National Team
3_____18396609___[H]ard|OCP
4_____14886156___SETI.Germany

Somebody, who shall remain nameless (but who is currently in last place on our team) issued a challenge on the PrimeGrid Discord, and now they're trying to get everybody else to join one team to challenge us. I guess we'll see what happens.
 

cellarnoise

Senior member
Mar 22, 2017
807
436
136
Day 2 stats:

Rank___Credits____Username
1______22264004___Icecold
2______21656564___markfw
6______9321612____crashtech
7______7503827____cellarnoise2
11_____4695133____w a h
18_____2681471____ChelseaOilman
36_____1131428____Orange Kid
91_____463135_____waffleironhead
103____392114_____Ken_g6
105____389991_____mmonnin
107____385859_____johnnevermind
161____120644___10esseeTony
248____864________[TA]Skillz

Rank__Credits____Team
1_____71006652___TeAm AnandTech
2_____24455062___Czech National Team
3_____18396609___[H]ard|OCP
4_____14886156___SETI.Germany

Somebody, who shall remain nameless (but who is currently in last place on our team) issued a challenge on the PrimeGrid Discord, and now they're trying to get everybody else to join one team to challenge us. I guess we'll see what happens.
I don't see a "deep sheet" on this list? Guess we will find out
 
Reactions: Ken g6