Question Hyperthreading on or off?

jamesdsimone

Senior member
Dec 21, 2015
916
256
136
I've been trying to find out whether there is a way to tell when disabling hyperthreading will help performance. I assume it depends on what is running and how many cores the CPU has?
 

StefanR5R

Elite Member
Dec 10, 2016
6,522
10,179
136
Benchmarking your particular workload on your particular hardware would be the ideal way.

Given a parallel workload consisting of several processes/ several program instances (maybe, but not necessarily, in several containers or VMs), a first question would be whether the available RAM bandwidth is enough for all the instances, or even better, whether the CPU cache is large enough for that number of instances. If not, run fewer concurrent instances if you can, obviously.

Then, if each workload instance is multi-threaded, next questions would be (a) on the software side/ data side, to which thread count the instance can scale without incurring unacceptable overhead, (b) how many threads are needed in order to sustain good utilization of your CPUs' execution units.

And then we are, en passant, getting to the topic of HyperThreading or SMT. After the points made above, it depends on the instruction profile of your workload on the one hand, and on the architecture of your CPUs on the other hand — and sometimes on whether performance or power efficiency is your priority — whether or not HyperThreading or SMT is beneficial in your workload–hardware combo.

(By "hardware" I also mean its particular settings, e.g. fixed clock or fixed power limit, and at what levels.)
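
As a rough illustration of the "benchmark your own workload" advice, here is a minimal timing sketch in Python. The function name `median_wall_time` and the placeholder command are my own assumptions, not anything from a particular tool. The idea would be to run it once with HT/SMT enabled and once with it disabled, with identical settings otherwise, and compare the medians.

```python
import statistics
import subprocess
import sys
import time


def median_wall_time(command, runs=3):
    """Run `command` `runs` times and return the median wall-clock time in seconds.

    `command` is an argument list as accepted by subprocess.run.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the workload exits with an error,
        # so a failed run is never counted as a fast one.
        subprocess.run(command, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


if __name__ == "__main__":
    # Placeholder workload: replace with your real command line (e.g. an encode job).
    placeholder = [sys.executable, "-c", "sum(range(10**6))"]
    print(f"median: {median_wall_time(placeholder):.3f} s")
```

Using the median of a few runs is just one way to dampen run-to-run noise; for long jobs a single run each way may already be telling.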
 

jamesdsimone

The only workloads I run that really need maximum CPU performance are gaming and Handbrake. This is mostly in relation to Xeons. They all have quad-channel memory, so I assume memory bandwidth is not a limiting factor. I can test Handbrake. I know it uses 8 cores at 100% but not 12. Neither my E5-2695 V2 nor my 5900X scales unless I run two encodes at the same time. Is there software that can tell me the number of threads something is using? Unfortunately, the Windows install on my E5-2695 V2 got corrupted, so I have to fix that before I can bench it with hyperthreading disabled.
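
On the thread-count question, one possible approach is a small Python script using the third-party `psutil` package (an assumption: it has to be installed separately, e.g. with pip). The helper name `thread_counts` is made up for this sketch. It lists the number of OS threads of every process whose name contains a given fragment.

```python
try:
    import psutil  # third-party package, not part of the standard library
except ImportError:
    psutil = None


def thread_counts(name_fragment):
    """Return (pid, name, thread count) for processes whose name contains the fragment."""
    if psutil is None:
        raise RuntimeError("psutil is not installed")
    matches = []
    for proc in psutil.process_iter(["pid", "name", "num_threads"]):
        name = proc.info["name"] or ""
        if name_fragment.lower() in name.lower():
            # num_threads can be None if the OS denies access to that process.
            matches.append((proc.info["pid"], name, proc.info["num_threads"]))
    return matches


if __name__ == "__main__":
    # "handbrake" is only an example fragment; use whatever the process is called.
    for pid, name, threads in thread_counts("handbrake"):
        print(pid, name, threads)
```

Note that this counts all threads the process has created, including idle ones, so it will usually be higher than the number of threads that are actually keeping cores busy.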
 

StefanR5R

Handbrake scaling was discussed in another thread or two already, I think.

Virtually no games employ as many CPU-time-intensive threads as an E5-2695 v2 has physical cores. Unless the games of your choice are Chess and Go. It is a safe bet to either disable SMT in the BIOS for gaming sessions, or to bind the game to half of the logical CPUs such that HyperThreads remain unused by the game. I believe Windows numbers HT siblings as even and odd number pairs. (That is: physical core 0 = logical CPUs 0 and 1; physical core 1 = logical CPUs 2 and 3; and so on.) But I may be mistaken about Windows' CPU numbering scheme.
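
To make the "bind the game to half of the logical CPUs" idea concrete, here is a small Python sketch that builds an affinity bitmask with one logical CPU per physical core. It assumes the even/odd sibling numbering described above (which may not hold on every system), and the function name `one_thread_per_core_mask` is made up for illustration.

```python
def one_thread_per_core_mask(physical_cores):
    """Bitmask selecting logical CPUs 0, 2, 4, ... (one per physical core).

    Assumes HT siblings are numbered as pairs (2k, 2k+1); that is an
    assumption about the OS numbering scheme, not a guarantee.
    """
    mask = 0
    for core in range(physical_cores):
        mask |= 1 << (2 * core)  # pick the even-numbered sibling of each core
    return mask


if __name__ == "__main__":
    # 12 physical cores, as on an E5-2695 v2
    print(format(one_thread_per_core_mask(12), "X"))  # prints 555555
```

On Windows, a hexadecimal mask like this can be given to cmd's `start /affinity`, e.g. `start /affinity 555555 game.exe`, where `game.exe` is a placeholder. If the numbering scheme on your machine is different, the bit pattern would have to be adjusted accordingly.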

You could also leave HT enabled and the game freely scheduled on all logical CPUs, and rely on Windows' process scheduler to do The Right Thing. I have my doubts about Windows's process scheduler's abilities, but then, I don't use Windows a lot myself, and not on servers, and not for games. Perhaps somebody else could comment.
 

jamesdsimone

Got the Windows issue fixed and did some tests with HT on and off. There isn't a clear performance difference, but the temps are significantly different: around 61-62 °C with HT off and 69 °C with HT on. That must mean the CPU is doing more work with HT on, since it's drawing more power.
 

Z O X

Junior Member
Oct 31, 2022
12
7
51
My system (1680V2@4.4 with 6800XT and rebar) gets 500 more points in Time Spy with HT disabled.
Switching to xAPIC interrupt mode + HT off gives the best latency and snappiness, but some games (e.g. Helldivers 2) will use more than eight cores when available, and in this case it loses around 10 FPS in CPU-heavy situations.
 
Jul 27, 2020
25,179
17,508
146
A nice example of HT?



That's Librecalc working hard to do a vlookup of 70k records against a column of about the same number of records.

Funny that the HT thread isn't getting utilized over 30%.
 
Jul 27, 2020
25,179
17,508
146
Librecalc was taking too long. Tried it for 1000 records. From that, estimated time came out to be 94 frickin' minutes.

Opened the sheet in Excel. Immediately saw that the formats of the columns were different. Changed both to text and completed in less than 3 minutes.

Opensource ain't what it's made out to be.
 

StefanR5R

Funny that the HT thread isn't getting utilized over 30%.
1.) Total utilization is shown to be 67%. The Xeon Gold 6248R is a 24c/48t processor. 67% utilization is most likely caused by your program spawning exactly 32 worker threads (not counting I/O threads, GUI threads...), since 32/48 ≈ 67%. 32 happens to be a power of 2, so that's not completely arbitrary or coincidental. Programmers tend to love powers of 2. Especially those who program machines which are based on "bit"-wise data processing. A traditional "bit" can have 2 states. And from there, a lot of power-of-2 based math starts... :-)

2.) The per-CPU utilization graphs indicate that the OS is pursuing a policy of keeping one hardware thread of each physical core as busy as possible and the other hardware thread of each physical core as idle as possible, and of doing this as evenly as possible across all cores. Which makes some sense.

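From 1. + 2. follows <100% load on half of the logical CPUs and <33% load on the other half: 24 of the 32 workers occupy one hardware thread per core, and the remaining 8 are spread over the 24 sibling threads (8/24 ≈ 33%). The loads stay below these figures because each worker thread is itself <100% busy.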

If your program could be taught to spawn either 24 or 48 workers instead of 32 workers, and your dataset is amenable to that, your program *might* work more efficiently on this 6248R in one of these alternative cases.
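
The arithmetic behind points 1. and 2. can be sketched in a few lines of Python. This is a simplified model under two assumptions: every worker is fully busy, and the scheduler fills one hardware thread per core before touching any sibling. The function name `expected_load` is made up for this example, and the results are upper bounds rather than predictions.

```python
def expected_load(physical_cores, workers, threads_per_core=2):
    """Return (total, primary, sibling) utilization as fractions between 0 and 1.

    Model: workers first occupy one hardware thread per physical core;
    any workers beyond that are spread evenly over the sibling threads.
    """
    logical = physical_cores * threads_per_core
    workers = min(workers, logical)            # cannot load more CPUs than exist
    primary_busy = min(workers, physical_cores)
    sibling_busy = workers - primary_busy
    sibling_count = logical - physical_cores
    total = workers / logical
    primary = primary_busy / physical_cores
    sibling = sibling_busy / sibling_count if sibling_count else 0.0
    return total, primary, sibling


if __name__ == "__main__":
    for workers in (24, 32, 48):
        total, primary, sibling = expected_load(24, workers)
        print(f"{workers} workers: total {total:.0%}, "
              f"primary {primary:.0%}, sibling {sibling:.0%}")
```

For 24 cores and 32 workers the model gives about 67% total, 100% on the primary threads and about 33% on the siblings. With 24 workers the siblings stay idle, and with 48 workers everything is loaded.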
 
Jul 27, 2020
25,179
17,508
146
If your program could be taught to spawn either 24 or 48 workers instead of 32 workers, and your dataset is amenable to that, your program *might* work more efficient on this 6248R in one of these alternative cases.
Not a custom program. It's Librecalc, the last version in the 7 series. The current 2024 series branch has multithreading borked (my guess) as it won't utilize more than 3 threads during vlookups.
 
Jul 27, 2020
25,179
17,508
146
67% utilization is most likely caused by your program's spawning exactly 32 worker threads (not counting I/O threads, GUI threads...). 32 happens to be a power of 2, so that's not completely arbitrary or coincidental.
Your powers-of-2 explanation makes sense, since it will load a 4C/8T i7-4770 to 100%.
 