On a whim, I started running MilkyWay@Home's "N-Body Simulation" today.
On a dual-socket EPYC 7452 (2x 155 W PPT), I am running 4-threaded tasks, with all SMT threads in use and the CPU affinity of the tasks aligned with CCXs. Average core clocks are 2.6 GHz. Edit: for the time being, the host runs 50% MilkyWay and 50% Asteroids (two client instances with MW@H and socket affinities, one client instance with A@H without affinities).
On a single-socket EPYC 9554P (360 W PPT), I am running 8-threaded tasks, with all SMT threads in use and the CPU affinity of the tasks aligned with CCXs. Average core clocks are 3.6 GHz. A small part of the current workload on this host consists of Asteroids@home.
(I configured the thread count per task via app_config.xml; there is no setting for this on the MW@H web site. By default, the application starts either 16 threads or as many threads as the number of logical CPUs which BOINC is allowed to use, whichever is less. The thread count refers to the number of computational worker threads; each task launches one additional thread, but that one consumes only a sub-second amount of CPU time, i.e. it sleeps almost all the time.)
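For reference, a minimal sketch of such an app_config.xml, here limiting each task to 4 worker threads. The app_name "milkyway_nbody" and the plan class "mt" are assumptions on my part; verify both against client_state.xml on your own host.

```xml
<!-- Sketch only: app_name and plan_class must match the entries
     in client_state.xml for your host. -->
<app_config>
  <app_version>
    <app_name>milkyway_nbody</app_name>
    <plan_class>mt</plan_class>
    <avg_ncpus>4</avg_ncpus>
    <cmdline>--nthreads 4</cmdline>
  </app_version>
</app_config>
```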
Results from client instance on socket 0 of the dual-7452:
Run time (sec) | CPU time (sec) | Credit | CPU time per Run time | Credit per Run time | Credit per CPU time |
4630.24 | 14132.19 | 493.01 | 3.05 | 0.106 | 0.035 |
4467.86 | 13834.82 | 494.6 | 3.10 | 0.111 | 0.036 |
4511.08 | 13898.48 | 585.07 | 3.08 | 0.130 | 0.042 |
4350.68 | 13644.57 | 487.95 | 3.14 | 0.112 | 0.036 |
4653.54 | 14272.68 | 475.13 | 3.07 | 0.102 | 0.033 |
Results from client instance on socket 1 of the dual-7452:
Run time (sec) | CPU time (sec) | Credit | CPU time per Run time | Credit per Run time | Credit per CPU time |
4427.66 | 14053.83 | 498.26 | 3.17 | 0.113 | 0.035 |
4615.23 | 14315.8 | 491.47 | 3.10 | 0.106 | 0.034 |
4392.79 | 13786.02 | 486.87 | 3.14 | 0.111 | 0.035 |
4372.72 | 13903.69 | 486.58 | 3.18 | 0.111 | 0.035 |
4662.52 | 14655.39 | 476.96 | 3.14 | 0.102 | 0.033 |
Results from the 9554P:
Run time (sec) | CPU time (sec) | Credit | CPU time per Run time | Credit per Run time | Credit per CPU time |
1749.13 | 11345.92 | 207.24 | 6.49 | 0.118 | 0.018 |
4988.38 | 35653.47 | 700.93 | 7.15 | 0.141 | 0.020 |
5045.94 | 36031.08 | 626.63 | 7.14 | 0.124 | 0.017 |
1536.01 | 10843.42 | 215.54 | 7.06 | 0.140 | 0.020 |
6499.84 | 46624.73 | 845.3 | 7.17 | 0.130 | 0.018 |
4866.2 | 34857.28 | 700.06 | 7.16 | 0.144 | 0.020 |
1533.58 | 10750.01 | 194.52 | 7.01 | 0.127 | 0.018 |
1488.21 | 10427.97 | 191.67 | 7.01 | 0.129 | 0.018 |
6778.63 | 48483.06 | 946.37 | 7.15 | 0.140 | 0.020 |
5048.96 | 36133.04 | 756.07 | 7.16 | 0.150 | 0.021 |
4931.97 | 35358.92 | 721.99 | 7.17 | 0.146 | 0.020 |
6762.51 | 48365.44 | 846.41 | 7.15 | 0.125 | 0.018 |
4966.59 | 35623.22 | 776.77 | 7.17 | 0.156 | 0.022 |
4900.25 | 35102.64 | 671.11 | 7.16 | 0.137 | 0.019 |
4909.32 | 35267.84 | 680.48 | 7.18 | 0.139 | 0.019 |
6688.28 | 47963.89 | 871.52 | 7.17 | 0.130 | 0.018 |
1517.63 | 10711.25 | 192.42 | 7.06 | 0.127 | 0.018 |
The first three columns were copied from the results tables on the MW@H web site.
The fourth column, CPU time per Run time, corresponds well with the average CPU utilization which I see in "top" or "htop". As you can see, scaling isn't very good even at these low thread counts.
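For clarity, the three derived columns are simple ratios of the copied ones; a small Python sketch, using the first row of the socket-0 table:

```python
# Derived columns from one result row:
#   CPU time / Run time  (~ average CPU utilization of the task)
#   Credit / Run time    (throughput in credit per wall-clock second)
#   Credit / CPU time    (efficiency in credit per CPU second)

def derived_columns(run_time, cpu_time, credit):
    """Return (cpu_per_run, credit_per_run, credit_per_cpu)."""
    return (cpu_time / run_time, credit / run_time, credit / cpu_time)

# First row of the socket-0 table above:
cpu_per_run, credit_per_run, credit_per_cpu = derived_columns(4630.24, 14132.19, 493.01)
print(round(cpu_per_run, 2), round(credit_per_run, 3), round(credit_per_cpu, 3))
# -> 3.05 0.106 0.035
```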
Now to the subject of this thread:
- As you can see, the 3.6 GHz Zen 4 host gets only about half the credit per CPU time of the 2.6 GHz Zen 2 host. It should be the other way around: the Zen 4 host ought to earn more credit per CPU time, given its faster cores.
- Likewise, the Zen 4 host gets merely 1.23 times the credit per run time of the Zen 2 host, even though it dedicates twice as many, and faster, CPUs to each task.
Edit: in case 4-threaded tasks on the Zen 4 host don't work out, I will have to test without Asteroids@home in the mix.