Help with F@H Client

Ionstream

Member
Nov 19, 2016
55
24
51
Hi guys, came across Ryan's article on distributed computing recently, and I thought I'd put my new rig to some use. I came across some extra spicy chips on eBay a few months back, and have built a dual-socket system for my undergraduate studies. It's not doing much right now though, so helping some boffins crunch their numbers seemed to be the most appropriate thing to do.

F@H doesn't seem to be using all my CPUs though, which is a shame. F@H is only using 32 logical cores on the first, and nothing on the 2nd. Does anyone know what's going on?

Setup:
2x E5-2686v3
64GB RAM
Win10 Pro
F@H v7 client
 
Reactions: Orange Kid

Kiska

Golden Member
Apr 4, 2012
1,022
290
136
You might have to go to Advanced Control -> Configure, and either edit the CPU slot to match how many threads you have, or add another CPU slot to the client.
 

Ionstream

Member
Nov 19, 2016
55
24
51
Hmmm, I did manage to successfully add a slot, but it didn't seem to do much, apart from splitting the load from 2 slots across a single CPU. Also, I noticed that F@H only recognises 32 cores on my system, when I should have 36 cores / 72 threads.

Task Manager shows that the load is either placed on Node 0 or 1, but never both at the same time. What's going on?
 

Pokey

Platinum Member
Oct 20, 1999
2,766
457
126
This is way over my head but here is a FAH Forum thread that might help you: Link
 

Ionstream

Member
Nov 19, 2016
55
24
51
Actually, the 2-slot solution appears to be working perfectly. It's just that F@H won't switch nodes until the work unit has completed. As for the 4 extra cores not showing up, it apparently has something to do with F@H on Windows being a 32-bit application.

Also, I vaguely recall Windows being unable to assign more than 64 cores simultaneously to an application. Came across the issue when researching which Xeon to get.
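
For anyone following along, here is a rough sketch of how Windows processor groups would carve up a box like this. It assumes the documented limit of 64 logical processors per group and that each socket ends up as its own group once the total exceeds 64. The function name and the even per-socket split are illustrative assumptions, not something read out of the F@H client.

```python
MAX_PER_GROUP = 64  # Windows limit on logical processors in one processor group


def processor_groups(sockets, cores_per_socket, threads_per_core):
    """Estimate processor-group sizes for a multi-socket machine.

    Assumption: if everything fits in 64 logical processors there is one
    group; otherwise each socket (NUMA node) becomes its own group.
    """
    per_socket = cores_per_socket * threads_per_core
    total = sockets * per_socket
    if total <= MAX_PER_GROUP:
        return [total]
    if per_socket > MAX_PER_GROUP:
        raise ValueError("a single socket exceeds one group; not handled here")
    return [per_socket] * sockets


print(processor_groups(2, 18, 2))  # Hyper-Threading on
print(processor_groups(2, 18, 1))  # Hyper-Threading off
```

With Hyper-Threading on, the sketch gives two groups of 36, which would fit the load sitting on Node 0 or Node 1 but never both, since on Windows 10 a process starts out inside a single group. With Hyper-Threading off, all 36 logical processors fit in one group.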

Overall, I'm quite pleased with this little build. It's cranking out roughly 200k ppd (CPU) on medium settings.

Cheers for the help
 

Ionstream

Member
Nov 19, 2016
55
24
51
@Kiska That returns an error message. All good now though

http://i.imgur.com/fvSyVAo.png

EDIT 1: zzz looks like I spoke too soon :/
EDIT 2: Forums seem to indicate that there's a shortage of large work units, which would explain why all the work is being loaded onto 1 node, and why my PPD just fell. I'll have to test this out.
 

TennesseeTony

Elite Member
Aug 2, 2003
4,220
3,649
136
I've not actually tried this, but I think it will work: Let all the work finish by pressing the finish button, then, instead of 2 huge CPU slots, make a bunch of 8 thread slots, until you reach the logical limit.

You may also want to disable hyper-threading to eliminate that issue with Windows.
 

Ionstream

Member
Nov 19, 2016
55
24
51
Should I let the program decide on allocation, or should I specify 8 CPUs per slot?

EDIT: Under task manager, F@H cores have an affinity with a certain processor group. I've tried to set it to both (since I have 2), but it's either one or the other.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,722
14,750
136
WOW, 72 threads. I only have 24 thread systems, and it seems that so far you have exceeded my knowledge level, but good luck !
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,722
14,750
136
72/8=9 As silly as it sounds, maybe see if it will allow you to have (up to) 9 CPU slots, each slot manually set for 8 threads. Should have no trouble getting 8 thread tasks, assuming it works.

Or maybe 3 24's ? (less work) I know that 24 works fine.
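
The arithmetic above can be sketched as a tiny helper that splits a thread count into equal slots plus a remainder. The function name is made up for illustration, and it says nothing about whether the client will actually accept the result.

```python
def plan_slots(total_threads, threads_per_slot):
    """Split total_threads into slots of threads_per_slot, plus one smaller
    slot for any remainder."""
    if threads_per_slot <= 0:
        raise ValueError("threads_per_slot must be positive")
    full, rest = divmod(total_threads, threads_per_slot)
    slots = [threads_per_slot] * full
    if rest:
        slots.append(rest)
    return slots


print(plan_slots(72, 8))   # nine 8-thread slots
print(plan_slots(72, 24))  # three 24-thread slots
print(plan_slots(72, 32))  # two 32-thread slots and an 8-thread leftover
```

Both 8 and 24 divide 72 evenly, so neither option leaves threads idle. A 32-thread slot size would leave an 8-thread remainder.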
 

Ionstream

Member
Nov 19, 2016
55
24
51
Hmmm, trouble is, F@H refuses to acknowledge my system has more than 32 cores/threads. I've tried quite a few combinations, but the darn thing just divides work slots against a hard limit of 32. When I switch off HT, both nodes are near maximum load, but I lose out on that juicy performance boost. It seems that the only way around this is running another instance of F@H.

Interestingly, switching off HT shaves 20W (out of 120W) off power consumption.
 

Ionstream

Member
Nov 19, 2016
55
24
51
Hehe, thanks. Task Manager reads 72 logical cores. I still feel it's the x86 implementation of F@H that's the problem. No biggie though, as I could always run SETI along with F@H, and still have cores left for other tasks.
 

bds71

Member
Nov 29, 2016
60
31
46
I thought the "big adv" units were Linux only (the ones that actually require 16 "whole" cores)? Or are you not trying to run those? As far as not seeing all the cores, I would wager you have the right of it (32-bit programming limitation), but that would be a guess.

Good luck with this. If you figure it out, keep us posted. Perhaps the information can help someone down the road!
 

Ionstream

Member
Nov 19, 2016
55
24
51
YEEAAAAH BOIIIIS I GOT DIS THANG WERKIN

http://imgur.com/a/8ZEfJ

It's a two-step process:
1. Set slots and cores under configuration
2. Split group of active cores (e.g. FahCore_a7.exe) into each CPU, by changing processor group affinity of thread
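
A minimal sketch of the planning part of step 2, assuming two processor groups of 36 logical processors each and four hypothetical 18-thread slots (the actual slot sizes aren't stated here). It only works out which group each slot's FahCore process should be moved to. Applying it still means changing the affinity by hand in Task Manager, and the function name is invented for illustration.

```python
def assign_groups(slot_threads, group_size, group_count):
    """First-fit: place each slot in the first processor group that still has
    enough free logical processors. Returns one group index per slot."""
    free = [group_size] * group_count
    assignment = []
    for threads in slot_threads:
        for group in range(group_count):
            if free[group] >= threads:
                free[group] -= threads
                assignment.append(group)
                break
        else:
            # No single group can hold this slot without spanning groups.
            raise ValueError(f"no group has {threads} free logical processors")
    return assignment


print(assign_groups([18, 18, 18, 18], group_size=36, group_count=2))
```

For the hypothetical four 18-thread slots this gives two slots per group. A layout such as 32 + 32 + 8 would fail in this sketch, because after the two big slots neither group has 8 logical processors left.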

With 2 a4 cores and 2 a7 cores, my rig is cranking out 160k PPD at full power.

Edit:
Since Windows dynamically assigns cores, starting a new WU may require resetting the processor group affinities. I'm trying to find out how to stop this from happening.
 
Reactions: TennesseeTony

StefanR5R

Elite Member
Dec 10, 2016
5,673
8,195
136
As a point of comparison, dual E5-2690 v4 on Linux here (2x 14 cores). As I mentioned in the December race thread, I tried one, two, and three CPU slots (each getting all, or half, or a third of the overall number of threads), and found that a single CPU slot utilizing all threads gets better PPD than two slots that are only half as powerful, and so on. I don't recall what the precise difference was; maybe 20 % drop when going from 1 to 2 slots.

The mainboard always applies turbo frequency when under load, i.e. for this CPU 3.2 GHz as all-core turbo for normal INT and FP arithmetic, and 2.9 GHz for AVX. So, utilizing 56 CPU threads for one slot gives roughly:

300 k PPD with the a4 core,
800 k PPD with the a7 core.

(One a4 unit with 1250000 steps takes about 120 minutes to complete; one a7 unit with 80000 steps takes just 15 minutes. Good thing that I mostly receive a7 WUs. Bad thing: it seems as if the proportion of a4 WUs is gradually increasing now...)

I guess the quick-return bonus system works nonlinearly and thereby favors fewer WUs completed quickly over more WUs completed slowly in parallel. If that is true, then the division of your box into four slots may cost quite a few PPD.
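
To put a rough number on that guess, here is a sketch that assumes the commonly cited form of the quick-return bonus, points = base * max(1, sqrt(k * deadline / elapsed)), and also assumes that a WU's run time scales inversely with the threads given to its slot (real cores scale worse than that). The base points, k, deadline and run time below are made-up placeholders, not values from any actual project.

```python
import math


def ppd(base_points, k, deadline_days, elapsed_days):
    """Points per day for one slot under the assumed bonus formula."""
    bonus = max(1.0, math.sqrt(k * deadline_days / elapsed_days))
    return base_points * bonus / elapsed_days


def relative_ppd(n_slots, base_points=1000.0, k=0.75,
                 deadline_days=3.0, one_slot_days=0.05):
    """Total PPD of n equal slots relative to one slot using every thread.

    Assumption: each of the n slots takes n times longer per WU.
    """
    single = ppd(base_points, k, deadline_days, one_slot_days)
    split = n_slots * ppd(base_points, k, deadline_days, one_slot_days * n_slots)
    return split / single


for n in (1, 2, 4):
    print(n, round(relative_ppd(n), 3))
```

Under these assumptions, total PPD falls off as 1/sqrt(n): about 0.71 of the single-slot figure with two slots and 0.5 with four. That is in the same ballpark as the drop I remember seeing, though the real numbers depend on each project's k factor and deadline.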
 

Ionstream

Member
Nov 19, 2016
55
24
51
Hmm I'll give this thing a try. 3 GHz for all cores sounds pretty darn nifty. Mine tops out at 2.3 GHz with all 18 cores active. I guess you really pay for going one step up from the E5-268x - 269x series. Can't really complain though, I paid only $1000 SGD for both chips ^^.

I must explain though, the Windows client will not accept any input more than 32 cores, which is why I split my slots into 4. I'll halve that, and see what happens.

P.S. How hot are your CPU VRMs getting? I have one side at 75C, and another hitting 85C, and I cannot figure out what's causing the temperature difference. My board does not have a case, and the one which is warmer actually receives more airflow.
 

StefanR5R

Elite Member
Dec 10, 2016
5,673
8,195
136
I'll reply re: GHz and VRMs later today...

Meanwhile, one of my two dual-Xeon boxes got stuck with "Failed to get assignment from [xyz]: Empty work server assignment" for the last three hours. I have seen this happening already in November. So I switched this machine from one 56-thread slot to two 28-thread slots; same failure. Then I downgraded it to three 16-thread slots, and now it is receiving work again.

Alas, this 3-slot box as well as the other dual-Xeon box and a Broadwell-E PC are wasting their time with 0xa4 work only. Apparently there is no 0xa7 work left for the time being, and even 0xa4 work is being handed out in smaller portions now. (Strangely though, my other dual-Xeon is still receiving 0xa4 work for its single 56-thread slot – it does show intermittent "Empty work server assignment" log lines but recovers after retrying for 20 or 30 minutes.)

Estimated PPD on the 3x 16-thread slots: about 3x 65 000 PPD = 195 000 PPD (in contrast to the mentioned 300 000 PPD which a single 0xa4 worker would earn on a 56-thread slot).
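
The comparison above, written out as a quick sketch. The helper name is made up; the numbers are the estimates quoted in this post.

```python
def split_penalty(single_slot_ppd, per_slot_ppd, n_slots):
    """Return (total PPD of the split setup, fractional loss vs. one big slot)."""
    total = per_slot_ppd * n_slots
    return total, 1 - total / single_slot_ppd


total, loss = split_penalty(single_slot_ppd=300_000, per_slot_ppd=65_000, n_slots=3)
print(total, f"{loss:.0%}")

# Three 16-thread slots cover only 48 of the 56 threads.
print(56 - 3 * 16, "threads idle")
```

Part of the gap comes from the 8 threads that three 16-thread slots leave unused on a 56-thread box.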

PS:
I have no experience with Windows on multi-socket machines, hence I have no suggestion for how to deal with those process scheduler limitations, or whatever it is that prevents a maximum thread count on your box.
 
Reactions: Ionstream

Ionstream

Member
Nov 19, 2016
55
24
51
I always seem to receive at least one 0xa7 work unit at any given time, and it usually returns around 200k PPD with 32 threads assigned to it. 0xa4 work units are something else though, because the returns are a paltry 80k PPD if I'm lucky.
 

StefanR5R

Elite Member
Dec 10, 2016
5,673
8,195
136
No 0xa7 WUs for me today. Again I tried slots with 56, 32, 27, 24, 20, 16, 8 threads and am receiving WUs only for slots with 24, 20, 16, 8 threads, and only 0xa4 WUs for now. According to announcements like this, 27-thread WUs should be available - but not to me evidently.

Current estimations:
BDW-E, 20 threads @ 4.0 GHz: 120 k PPD
BDW-EP, 24+24+8 threads @ 3.2 GHz: 125 + 125 + 30 = 280 k PPD
BDW-EP, 20+20+16 threads @ 3.2 GHz: 100 + 100 + 60 = 260 k PPD

Ionstream said: "3 GHz for all cores sounds pretty darn nifty. Mine tops out at 2.3 GHz with all 18 cores active. I guess you really pay for going one step up from the E5-268x - 269x series. Can't really complain though, I paid only $1000 SGD for both chips ^^."

These boxes were built for simulations which do not scale very well with core count, hence the 2690 v4 looked like a good compromise between core count and frequency.

(Off topic: I do wonder whether a small cluster of high-frequency HEDT nodes with a low-latency interconnect, such as Infiniband, wouldn't beat the dual-socket Xeon boxes with this particular application. But Infiniband switches are expensive, and without switch you can at best create a three-node cluster using dualport cards. A Gigabit Ethernet cluster of fast single-socket nodes is slower than a dual-socket Xeon box though.)

Ionstream said: "How hot are your CPU VRMs getting? I have one side at 75C, and another hitting 85C, and I cannot figure out what's causing the temperature difference. My board does not have a case, and the one which is warmer actually receives more airflow."

The boards (Supermicro X10DAX) only have three temperature sensors: two for the CPUs and one for "overall system temperature". I have no idea where that third sensor is located. Also, the VRMs are impossible to reach at the moment, so I can't get a manual temperature reading.

However, the boards are specified for 160 W TDP CPUs (the TDP of the E5-2690 v4 is 135 W), and even allow for a little bit of BCLK overclocking and mild overvolting (but I don't use either). Also, they are mounted in cases such that the VRMs look to be subjected to a good amount of forced convection.

[edit: BDW-E estimation]
 

Ionstream

Member
Nov 19, 2016
55
24
51
I've stopped folding recently, as running games requires higher clocks, which isn't possible with fully loaded cores. I have noticed the lack of 0xa7 WUs though. 0xa4s aren't worth the heat and power, so it's not a huge loss for me.

Off-topic, I'm running an X10DRL-i, which has temperature sensors for the CPUs, memory, PCH, system, VRMs and the environment. I would've bought the X10DAi for the extra x16 slots, but reviews on Newegg seem to suggest that the boards are prone to failure. No OC options here unfortunately.

I've yet to build my case, so it's all open air cooling for the moment (I should probably stick a fan in). Those 140mm Noctuas look really tempting though, much to the chagrin of my wallet.
 