Getting the most PPD out of your hardware for F@H


biodoc

Diamond Member
Dec 29, 2005
At least with bigger/newer Nvidia GPUs, FahCore_21.exe almost always loads one CPU hardware thread fully. Therefore I am wondering whether the CPU's single-core performance can bottleneck GPU PPD.

To test that, I still plan to figure out how to set up FAHClient to repeatedly process the same copy of a given Work Unit.

But for now I turned to FAHBench to get somewhat closer to an answer. The current FAHBench v2 supposedly uses the very same code as FahCore_21. FAHBench v2 comes with three different built-in WUs, and it is also possible to add custom WUs to FAHBench which can be derived from "real" Folding@Home WUs. I don't know how well the three built-in WUs reflect typical Folding@Home WUs.

During benchmarking, FAHBench also loads one CPU hardware thread fully. So, there is at least a distinct chance that any CPU single-thread performance bottleneck would also be showing up in FAHBench.

Software used in the tests:
FAHBench v2.2.5, OpenMM version 6.2-core21-0.0.17
options: OpenCL, single precision, accuracy check enabled, NaN check disabled, 60 s run length
Nvidia driver version 272.06
Windows 7​

Hardware:
Core i7-6950X, HT off, EIST off
reference GTX 1080
factory-overclocked GTX 1070 (Gainward Phoenix GS, presumably 170 W TDP)
both cards in 16-lane PCIe 3.0 slots​

In all tests, the GPU performance cap was shown to be Voltage (not power or temperature), according to GPU-Z. Temperatures remained moderate, and cards ran at about 1.9 GHz (1080) and 2.0 GHz (1070).

All FAHBench scores shown below in absolute numbers are averages from three consecutive runs. Variability between those triple runs was reasonably low. The percentages in the table are simply the score at the given CPU clock divided by the score at 4.0 GHz CPU clock.
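
For clarity, the arithmetic behind the tables is just this (a minimal Python sketch; the triple-run scores in it are made-up placeholders, not my raw data):
Code:
# Average three consecutive FAHBench runs and express each CPU clock's
# score relative to the 4.0 GHz baseline.
def average(scores):
    return sum(scores) / len(scores)

# hypothetical raw triple-run scores per CPU clock, for illustration only
runs = {
    "4.0 GHz": [110.2, 109.8, 110.1],
    "3.0 GHz": [103.1, 102.8, 103.0],
}

baseline = average(runs["4.0 GHz"])
for clock, scores in runs.items():
    avg = average(scores)
    print(f"{clock}: {avg:.0f}  ({100 * avg / baseline:.0f} %)")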

Work Unit: dhfr
Code:
CPU clock          4.0 GHz   3.5 GHz   3.0 GHz   2.5 GHz   2.0 GHz   1.5 GHz
----------------------------------------------------------------------------
GTX 1080 scores     110       106       103        98        92        82
                  (100 %)    (97 %)    (94 %)    (89 %)    (84 %)    (75 %)
----------------------------------------------------------------------------
GTX 1070 scores     106       103       100        95        89        80
                  (100 %)    (98 %)    (94 %)    (90 %)    (84 %)    (76 %)

Work Unit: dhfr-implicit
Code:
CPU clock          4.0 GHz   3.5 GHz   3.0 GHz   2.5 GHz   2.0 GHz   1.5 GHz
----------------------------------------------------------------------------
GTX 1080 scores     519       517       516       514       515       520
                  (100 %)   (100 %)    (99 %)    (99 %)    (99 %)   (100 %)
----------------------------------------------------------------------------
GTX 1070 scores     477       476       475       473       476       480
                  (100 %)   (100 %)   (100 %)    (99 %)   (100 %)    (99 %)

Work Unit: nav
Code:
CPU clock          4.0 GHz   3.5 GHz   3.0 GHz   2.5 GHz   2.0 GHz   1.5 GHz
----------------------------------------------------------------------------
GTX 1080 scores     14.4      14.4      14.3      14.3      14.2      14.1
                  (100 %)   (100 %)   (100 %)    (99 %)    (99 %)    (98 %)
----------------------------------------------------------------------------
GTX 1070 scores     14.7      14.6      14.6      14.6      14.5      14.4
                  (100 %)    (99 %)    (99 %)    (99 %)    (99 %)    (98 %)

So, there is practically no drop-off in the dhfr-implicit and nav tests, while the dhfr test shows ~5 % loss of performance when going from 4.0 to 3.0 GHz CPU clock, and ~10 % loss at 2.5 GHz. Not as pronounced as I suspected.

It remains to be seen how this scales in FAHClient with typical WUs.

Another use for FAHBench is to test which Nvidia drivers give maximum ppd on FAH with the caveat that the results may not translate to all FAH WUs.

For most of the race, I was using Nvidia driver version 370.28 on my rig with 2x GTX 1080s on Linux Mint. That driver version was the first to support the coolbits option for controlling fan speed and overclocking the GTX 1080 on Linux.

Recently I decided to upgrade to the 384.111 driver to see if I could increase my ppd on the 1080s. For the FAHBench tests, I shut down folding.

370.28 driver results:

FAHBench: single precision/DHFR score was 144.85
9414 WU TPF: 43 seconds

384.111 driver results:

FAHBench: single precision/DHFR score was 135.781
9414 WU TPF: 47 seconds
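
For context on what such a TPF change means in PPD terms: with the quick-return bonus, PPD scales roughly with TPF^-1.5, so going from 43 s to 47 s costs about 12-13 % PPD rather than the ~9 % suggested by the TPF difference alone. A rough sketch of that relationship; the base credit, k factor and deadline below are invented placeholders, not the real project 9414 values:
Code:
import math

def estimated_ppd(tpf_seconds, base_credit, k_factor, deadline_days):
    # Quick-return bonus: credit = base * sqrt(k * deadline / WU_time),
    # with WU_time = TPF * 100 frames. Constants here are hypothetical.
    wu_days = tpf_seconds * 100 / 86400
    bonus = max(1.0, math.sqrt(k_factor * deadline_days / wu_days))
    return base_credit * bonus / wu_days      # credit per WU * WUs per day

base, k, deadline = 9405, 0.75, 3.0           # placeholder project constants
print(estimated_ppd(47, base, k, deadline) / estimated_ppd(43, base, k, deadline))
# ~0.87, i.e. (43/47)**1.5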

As a note of caution for those with 10-series Ti cards, the 370 driver will not work. You'll need a more recent driver version.
 

StefanR5R

Elite Member
Dec 10, 2016
Regarding the PCIe bus, I have personally found that maximum performance requires PCIe v2.0 @8x or better. Performance drops off sharply at 4x, and nearly dies at 1x.
On an LGA2011-3 based host with 3x GTX 1080Ti in PCIe v3 x16/ x16/ x8 slots, I monitored Folding@Home's PCIe usage of each GPU (a) under Windows and (b) under Linux, each time for more than an hour, using the nvidia-smi command line tool with 1 second reporting period.

The numbers are: average / 90%-quantile / peak.
RX = PCIe reception, TX = PCIe transmission (I guess RX is when data are loaded onto the card, and TX is when results are copied out of the card.) Application: FahCore_21.
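
In case anyone wants to reproduce this, below is a sketch of how such per-second numbers can be collected and summarized via nvidia-smi's dmon mode. The column layout of "nvidia-smi dmon -s t" can differ between driver versions, so the field positions are an assumption; check the header your driver prints.
Code:
# Sample PCIe throughput once per second via "nvidia-smi dmon -s t" and
# report average / 90%-quantile / peak.
import subprocess, statistics

def sample_pcie(gpu_index=0, samples=3600):
    proc = subprocess.Popen(
        ["nvidia-smi", "dmon", "-i", str(gpu_index),
         "-s", "t", "-d", "1", "-c", str(samples)],
        stdout=subprocess.PIPE, text=True)
    rx, tx = [], []
    for line in proc.stdout:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue                              # skip header lines
        try:
            rx.append(float(fields[1]))           # assumed: rxpci in MB/s
            tx.append(float(fields[2]))           # assumed: txpci in MB/s
        except (IndexError, ValueError):
            continue
    return rx, tx

def summarize(values):
    q90 = statistics.quantiles(values, n=10)[-1]  # 90% quantile
    return sum(values) / len(values), q90, max(values)

rx, tx = sample_pcie(gpu_index=0, samples=3600)
print("RX avg/90%/peak [MB/s]:", summarize(rx))
print("TX avg/90%/peak [MB/s]:", summarize(tx))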

OS: Windows 7 Pro 64bit, Nvidia driver 387.92
RX: 5,200 / 6,300 / 7,100 MB/s
TX: 780 / 890 / 1,200 MB/s​

OS: Linux Mint 64bit, Nvidia driver 384.111
RX: 150 / 170 / 240 MB/s
TX: 60 / 90 / 110 MB/s​

This means that you need PCIe v3 x8 on Windows in order to avoid bus bandwidth bottlenecks entirely (on big GPUs), whereas the measurements make it seem as if even PCIe v1 x1 could be adequate if using Linux. However, it's possible that my method of measurement missed some peaks, and that relatively more headroom is needed on slower links in order to keep application performance up.

Data rates of PCIe versions:
Code:
lanes      x1     x2     x4     x8     x16
------------------------------------------------
PCIe v3   985  1,970  3,940  7,880  15,760  MB/s
PCIe v2   500  1,000  2,000  4,000   8,000  MB/s
PCIe v1   250    500  1,000  2,000   4,000  MB/s
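
These per-lane rates follow directly from the transfer rates and line encodings (2.5 and 5 GT/s with 8b/10b encoding for v1 and v2, 8 GT/s with 128b/130b for v3); a quick check:
Code:
# Per-lane PCIe data rate = transfer rate * encoding efficiency / 8 bits.
generations = {
    "PCIe v1": (2.5e9, 8 / 10),      # 2.5 GT/s, 8b/10b
    "PCIe v2": (5.0e9, 8 / 10),      # 5.0 GT/s, 8b/10b
    "PCIe v3": (8.0e9, 128 / 130),   # 8.0 GT/s, 128b/130b
}

for name, (transfer_rate, efficiency) in generations.items():
    per_lane = transfer_rate * efficiency / 8 / 1e6       # MB/s per lane
    print(name, [f"x{n}: {per_lane * n:,.0f} MB/s" for n in (1, 2, 4, 8, 16)])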
 

StefanR5R

Elite Member
Dec 10, 2016
Earlier this week I performed a quick & dirty experiment, checking PPD per Watt on Pascal GPUs at different power limit settings.

Test setup:
3x GTX 1080Ti, X99 / Xeon E5, Windows 7, driver 387.92
GPUs are reference design cards but watercooled
no clock offset set on GPU core clock and memory clock
no other load besides 3 F@H GPU slots
1,200 W platinum PSU
a cheap (but according to reviews very accurate) power meter made by Brennenstuhl​

I tried three power limit settings, each with one random WU on each GPU. (Hence "quick & dirty": the PPD as well as the performance per Watt may have differed considerably between these WUs, making the three tests not reliably comparable.)
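
For reference, the board power limit can also be set directly in watts with nvidia-smi (on Windows as well as Linux, given admin/root rights and a card that allows it); a minimal sketch, not necessarily the tool used for these tests:
Code:
# Minimal sketch: set a board power limit in watts with nvidia-smi.
# 200 W corresponds to the 80 % setting of a 250 W reference GTX 1080Ti.
import subprocess

def set_power_limit(gpu_index, watts):
    subprocess.run(["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)],
                   check=True)

for gpu in range(3):          # three cards in this system
    set_power_limit(gpu, 200)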

Test 1: Power limit 100 % (250 W board power limit)
estimated PPD 3.46 M
system power draw 785 W at the wall
= 4,400 PPD/Watt
average GPU core Voltage was 1.06 V​

Test 2: Power limit 80 % (200 W board power limit)
estimated PPD 3.17 M .................................. 92 % of test 1
system power draw 720 W at the wall ........ 92 % of test 1
= 4,400 PPD/Watt .................................... same as in test 1
average GPU core Voltage was 1.04 V​

Test 3: Power limit 68 % (170 W board power limit)
estimated PPD 2.92 M .................................. 84 % of test 1
system power draw 635 W at the wall ........ 81 % of test 1
= 4,600 PPD/Watt ........................................ 104 % of test 1
average GPU core Voltage was 0.89 V​
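
The PPD/Watt figures above are simply estimated PPD divided by measured wall power; spelled out:
Code:
# PPD per Watt and ratios relative to test 1, from the numbers above.
tests = {
    "test 1 (100 %)": (3.46e6, 785),
    "test 2 ( 80 %)": (3.17e6, 720),
    "test 3 ( 68 %)": (2.92e6, 635),
}

ref_ppd, ref_watts = tests["test 1 (100 %)"]
for name, (ppd, watts) in tests.items():
    eff = ppd / watts
    print(f"{name}: {eff:,.0f} PPD/Watt "
          f"({100 * eff * ref_watts / ref_ppd:.0f} % of test 1)")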

Comments, conclusions:
  • The difference between tests 1 and 2 is so small because, even without a lowered power limit, F@H uses only about 87 % of the board power of a GTX 1080Ti (on Windows; it should be more on Linux, with accordingly higher PPD). And as you can see, the board BIOSes drove the GPUs at almost the same Voltage in these two tests.
  • Only in the 3rd test did the Voltage go down considerably, and the PPD/Watt up, as it should.
  • It would be interesting to learn what standard Voltage the BIOSes of Pascal Quadro and Tesla cards are applying.
  • Note that the power draw at the wall includes the energy consumption of the CPU, mainboard, VRMs, cooling system, and PSU inefficiency.
  • This quick test made the overall system performance per Watt go up by only 4 %, which is disappointing. The major problem here is that the system power overhead (CPU and so on) remained basically the same. Furthermore, I suspect that the GPU BIOS had difficulty maintaining an optimum regime of clocks and voltages in the third test, with such a low power target.
Test 1:
GPU power consumption: 630 W (according to GPU sensors, average from hwinfo64)
hence, 785 - 630 = 155 W system power overhead​

Test 2:
GPU power consumption: 575 W (according to GPU sensors, average from hwinfo64)
hence, 720 - 575 = 145 W system power overhead​

Test 3:
GPU power consumption: 370 W (according to GPU sensors, average from hwinfo64)
hence, 635 - 370 = 265 W system power overhead
This data point looks suspicious. I suppose the sensors' readings or hwinfo64's averaging were inaccurate.​

Test 1: Power limit 100 % (250 W board power limit)
Code:
PRCG   estimated PPD
--------------------
11728      1,190,000
11719      1,140,000
11719      1,130,000
--------------------
           3,460,000

averages from hwinfo64:
GPU Temp  core Voltage  gpu clock    core load   power
------------------------------------------------------
  45 °C      1.062 V     1.90 GHz       87 %     203 W
  52 °C      1.061 V     1.88 GHz       87 %     213 W
  47 °C      1.061 V     1.91 GHz       87 %     215 W

at the wall: 785 W


Test 2: Power limit 80 % (200 W board power limit)
Code:
PRCG   estimated PPD
--------------------
11720      1,050,000
11720      1,050,000
11719      1,070,000
--------------------
           3,170,000

averages from hwinfo64:
GPU Temp  core Voltage  gpu clock    core load   power
------------------------------------------------------
  44 °C      1.058 V     1.90 GHz       85 %     192 W
  50 °C      1.020 V     1.84 GHz       87 %     193 W
  45 °C      1.047 V     1.90 GHz       85 %     191 W

at the wall: 720 W


Test 3: Power limit 68 % (170 W board power limit)
Code:
PRCG   estimated PPD
--------------------
11726      1,080,000
11715        910,000
11711        930,000
--------------------
           2,920,000

averages from hwinfo64:
GPU Temp  core Voltage  gpu clock    core load   power
------------------------------------------------------
  37 °C      0.854 V     1.15 GHz       52 %     109 W
  42 °C      0.954 V     1.56 GHz       71 %     151 W
  37 °C      0.874 V     1.19 GHz       51 %     110 W

at the wall: 635 W


Remarks:
I neglected to check whether or not the card IDs in F@H match those in hwinfo64.
GPU temperatures are systematically different because of the order of waterblocks and radiators in the cooling system. I did not run the pump at full speed, which would have made the temperatures a bit more (but not fully) uniform.
 