Hyperthreading Revisited

Xenon14

Platinum Member
Oct 9, 1999
2,065
0
0
Every analysis of Hyper-Threading that I read online seems to revolve around specific tasks, whether gaming or encoding, and how well those tasks handle multithreading. But there is never any discussion of actual system performance when you run A LOT of parallel threads.

Further, HT discussions seem to focus on a desktop user, and conclusions always seem to be that most will not notice a difference between an overclocked i5 with 4 cores versus an i7 that has 4 additional HT virtual cores enabled.

But what about people who have an ultrabook with a dual-core processor? Does a dual-core ULV i7 provide a material advantage with HT enabled versus disabled?

More specifically, will HT provide tangible improvements with the following workload (by improvements I simply mean a smoother/non-interrupted user experience):
If I have Google Chrome open with 20 tabs, Opera Browser open & running IRC client, BitTorrent, VPN & Antivirus, MS Outlook, Microsoft Word, Excel (And by excel I mean really extensive financial models with thousands of calculations and sensitivity analysis), Tweetdeck, and possibly a video running in VLC.
 

wand3r3r

Diamond Member
May 16, 2008
3,180
0
0
It seems it would be an advantage for less than high end solutions.

Looking at gaming results, it is clear that dual cores benefit greatly from Hyper-Threading.
http://www.techbuyersguru.com/CPUgaming.php

The key is getting more multithreaded applications, which is tremendously hard to do. Your browser example seems very easy to multithread, but on the other hand each tab generally doesn't need much processing power.
 

Hulk

Diamond Member
Oct 9, 1999
5,099
3,609
136
I'll take a stab at this.

Of course the best way to evaluate the effectiveness of hyperthreading for a given CPU/software combination is to review tests of that combination. But since it's hard to find the exact combination we can make some predictions based on processor architecture and software.

In the simplest terms, Hyperthreading uses "unused" processor resources to execute a second logical thread. Logical because there aren't actually two physical cores, but that's how your OS sees it. Modern Intel processors are very wide machines, capable of executing 4 or even 5 instructions in parallel given the right code. But often not all of those resources can be utilized, and that is where Hyperthreading comes in. Sometimes the instruction pipes can't all be filled because instructions are waiting on dependent results, or for other reasons. Those idle pipes can be used to execute another thread, and a Hyperthreaded processor has the resources to track both threads.
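To make the "unused resources" idea concrete, here is a toy sketch (not a model of any real Intel core). It assumes a core that can issue up to 4 instructions per cycle and two threads whose per-cycle counts of ready instructions are made-up numbers; the function name `issued_per_cycle` is hypothetical. It also ignores that work which cannot issue in one cycle would really carry over to the next.

```python
def issued_per_cycle(ready_a, ready_b=None, width=4):
    """Return how many instructions issue each cycle on a core `width` slots wide.

    ready_a / ready_b list how many independent instructions each thread
    has ready in each cycle (hypothetical numbers, not measurements).
    """
    if ready_b is None:
        ready_b = [0] * len(ready_a)
    issued = []
    for a, b in zip(ready_a, ready_b):
        from_a = min(a, width)            # thread A fills slots first
        from_b = min(b, width - from_a)   # thread B takes whatever is left over
        issued.append(from_a + from_b)
    return issued


# Made-up workloads: a 0 or 1 stands for a cycle spent mostly stalled.
thread_a = [4, 1, 0, 2, 1, 0, 3, 1]
thread_b = [1, 2, 3, 0, 2, 4, 1, 2]

alone = sum(issued_per_cycle(thread_a))
shared = sum(issued_per_cycle(thread_a, thread_b))
slots = 4 * len(thread_a)
print(f"one thread : {alone}/{slots} issue slots used")
print(f"two threads: {shared}/{slots} issue slots used")
```

With these invented numbers the single thread leaves most slots empty, and the second thread soaks up a good share of them without slowing the first one down. Real cores also share caches, buffers and decoders between the two threads, which this sketch leaves out.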

Since there aren't physically two cores, HT will never perform as well as two actual cores, but in some cases it can do very well, especially with the newest, widest cores like Haswell (I would predict). HT is known to do very well on video compression because the task is very "parallelizable." That is, all of those execution ports in the processor can be filled.

Generally dual core HT processors show a larger benefit from HT than do 4 core processors because most code these days only handles 4 threads very efficiently. So it's not that dual cores have better HT, it's more that software is better written generally to take advantage of 4 cores rather than 8 in a quad HT processor.

If you look at the Anandtech Bench you'll see that a HT enabled 3220 Ivy is actually faster than a Q6600 true quad. Yes the Ivy has a clockspeed advantage, but even given that the HT 3220 does amazingly well.
 

moonbogg

Lifer
Jan 8, 2011
10,731
3,440
136
Someone said that stacking more cores onto a CPU was like trying to make an airplane by adding wings to a train. We need new tech, not stacked old tech...right?!
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
SMT was and still is more performance in return for die area than moar cores and will remain so for the foreseeable future.
 

BenchPress

Senior member
Nov 8, 2011
392
0
0
Note that Haswell has improved Hyper-Threading performance. It has four arithmetic execution ports, instead of three (which we were stuck with since Core 2). What's more, they're arranged so that it's really two pairs of ports with equal capabilities (for scalar integer instructions). This is important to Hyper-Threading because when one thread occupies a port, there's always a second equivalent available for the other thread.

So while single-threaded IPC increased by about 10%, multi-threaded IPC increased by about 20%!
 

Concillian

Diamond Member
May 26, 2004
3,751
8
81
More specifically, will HT provide tangible improvements with the following workload (by improvements I simply mean a smoother/non-interrupted user experience):
If I have Google Chrome open with 20 tabs, Opera Browser open & running IRC client, BitTorrent, VPN & Antivirus, MS Outlook, Microsoft Word, Excel (And by excel I mean really extensive financial models with thousands of calculations and sensitivity analysis), Tweetdeck, and possibly a video running in VLC.

I wouldn't consider using less than a 2C/4T computer these days.

Yes, HT has significant benefit in parallel loading.

The cases where performance improves without HT are cases where the software is not capable of efficiently taking advantage of more than "X" cores, where "X" is equal to or less than the number of physical cores.

So when benchmarking specific software, it may be a disadvantage to have HT enabled, but when you start adding background tasks, then HT would again have an advantage.
If you talk about a single user opening many tasks, it gets a little difficult to pin down. Mainly because most of those tasks will be doing nothing in the background. 20 tabs is a lot of tabs, but the majority of them are just sitting in memory doing nothing on the CPU. So while it's an advantage, in practice it's a small one provided you have enough cores to handle the main tasks.

2C / 4T seems to handle general computing (even heavy usage) pretty well. 4/4 is a good, but more expensive substitute, and 4/8 is really only a significant advantage in special usage that's more demanding than "general computing" apps, even several, since background tasks are usually very good at using minimal CPU resources.

As a result I definitely would value HT on a dual differently from HT on a quad. I think a lot of people feel similarly.
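
A back-of-the-envelope way to see why HT on a dual is worth more than HT on a quad is sketched below. Everything here is an assumption for illustration: the function `effective_cores` is hypothetical, and `smt_gain=0.25` (a core with both logical threads busy delivers 25% more than with one) is a placeholder, not a measured figure.

```python
def effective_cores(busy_threads, physical_cores, smt=True, smt_gain=0.25):
    """Rough throughput in 'core equivalents' for a given number of busy threads.

    smt_gain is the assumed extra throughput a core delivers when both of
    its logical threads are busy (0.25 is a placeholder, not a measurement).
    """
    on_own_core = min(busy_threads, physical_cores)
    if not smt:
        return float(on_own_core)
    # Threads beyond the physical core count share a core with a sibling thread.
    sharing = min(max(busy_threads - physical_cores, 0), physical_cores)
    return on_own_core + smt_gain * sharing


for label, cores, smt in [("2C/2T", 2, False), ("2C/4T", 2, True),
                          ("4C/4T", 4, False), ("4C/8T", 4, True)]:
    row = [effective_cores(n, cores, smt) for n in (2, 4, 8)]
    print(label, "with 2 / 4 / 8 busy threads:", row)
```

Under these assumptions, software that keeps 4 threads busy gains from HT on a dual core but gains nothing from HT on a quad; the quad's HT only shows up once 5 or more threads are genuinely busy at the same time.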
 

Hulk

Diamond Member
Oct 9, 1999
5,099
3,609
136
Note that Haswell has improved Hyper-Threading performance. It has four arithmetic execution ports, instead of three (which we were stuck with since Core 2). What's more, they're arranged so that it's really two pairs of ports with equal capabilities (for scalar integer instructions). This is important to Hyper-Threading because when one thread occupies a port, there's always a second equivalent available for the other thread.

So while single-threaded IPC increased by about 10%, multi-threaded IPC increased by about 20%!


This is a really good observation. As more and more effort is given to increasing IPC it's just a fact that a lot of these additional resources aren't going to be used much of the time. And that is where HT comes in. It was interesting that HT disappeared with Conroe/Penryn and then reappeared with Nehalem. Nehalem isn't really much wider than Conroe but I guess Conroe was a big enough architecture change without having to incorporate HT as well.

I think HT was developed as a kind of "band-aid" for NetBurst, right? Something to do with keeping the execution ports busy during those costly P4 pipeline stalls.
 

VirtualLarry

No Lifer
Aug 25, 2001
56,570
10,202
126
I'm wondering about the more subjective benefit of HT, rather than the objective benchmarks. Does a rig with HT "feel" smoother?
 

BSim500

Golden Member
Jun 5, 2013
1,480
216
106
wand3r3r - Investigating gaming effects is clear dual cores have major benefits with hyper threading.
http://www.techbuyersguru.com/CPUgaming.php
That's a very good article. HT can make quite a big difference in performance depending on the app or game. Other apps that benefit include WinRAR (i3-3220 = 3,557KB/s WITH HT vs 3,098KB/s NO HT, a 15% difference or the equivalent of 450-500MHz of extra clock). When it works well, it's almost like having 2.5 cores. It does benefit i3's more than i7's in many apps & games though, due to the diminishing returns of core scaling in general.

Eg, take Deus Ex: Human Revolution mentioned in above article (min / avg / max fps):-

1-core = 53 / 106 / 149
2-core = 88 / 138 / 211 (30% vs 1C)
3-core = 113 / 155 / 218 (12% vs 2C)
4-core = 112 / 155 / 217 (0% vs 3C)
http://benchmark3d.com/deus-ex-human-revolution-benchmark/2
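
The percentages in brackets can be reproduced from the average-fps column. The short sketch below does that, and also works out the same gains for minimum fps; the helper name `gain_percent` is just for illustration.

```python
# (min, avg, max) fps per active core count, copied from the figures above
fps = {1: (53, 106, 149), 2: (88, 138, 211),
       3: (113, 155, 218), 4: (112, 155, 217)}


def gain_percent(before, after):
    """Percentage change going from `before` to `after`."""
    return (after / before - 1) * 100


for cores in (2, 3, 4):
    prev, cur = fps[cores - 1], fps[cores]
    min_gain = gain_percent(prev[0], cur[0])
    avg_gain = gain_percent(prev[1], cur[1])
    print(f"{cores - 1}C -> {cores}C: min {min_gain:+.0f}%, avg {avg_gain:+.0f}%")
```

The average-fps gains come out at roughly 30%, 12% and 0%, matching the figures quoted. The minimum-fps gains are larger for the first two steps (about 66% and 28%), which fits the point that extra threads help most during CPU spikes.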

Above dual core, in many games, most of the extra performance gain comes from the third core. With only very few exceptions, if you Google "x core scaling" where x is whatever game, you'll see the same thing over and over (Techspot seems to do a lot of these). That means i3's generally do punch well above their weight compared to Pentium's and will show more improvement than i7's whose full 4 cores will have already soaked up most of the 2-core vs 3-core load that makes the most difference.

VirtualLarry - I'm wondering about the more subjective benefit of HT, rather than the objective benchmarks. Does a rig with HT "feel" smoother?
In my experience it does. I have a lowly i3-530 (OC'd to 4.2GHz) which comfortably averages 60+ fps in Bioshock Infinite, Dishonored, DXHR, Skyrim, etc. It feels very smooth (mainly due to HT raising minimum frame rates during CPU spikes that would normally load 2 cores to 100%). Switching off Hyper-Threading causes WinRAR to drop almost 18% on mine. But an i7 may show less of a difference in games that only use 4 cores.

Benchpress - Note that Haswell has improved Hyper-Threading performance. It has four arithmetic execution ports, instead of three (which we were stuck with since Core 2). What's more, they're arranged so that it's really two pairs of ports with equal capabilities (for scalar integer instructions). This is important to Hyper-Threading because when one thread occupies a port, there's always a second equivalent available for the other thread.

So while single-threaded IPC increased by about 10%, multi-threaded IPC increased by about 20%!
Indeed. It'll be interesting to see how the new Haswell i3's stack up, especially the new i3-4340 3.6GHz, which with a +5% BCLK OC (3.78GHz) will be almost 15% faster than a standard i3-3220 3.3GHz Ivy in clock speed alone, and probably +20% faster including Haswell's better IPC. The new i3's also include AES-NI (which was previously reserved for i5's and up), and Haswell i3's match Ivy i3's at 55W TDP, whereas the 84W Haswell quads are higher than Ivy's 77W.

For those unaware, the new Haswell i3's due this autumn were shown in Intel's leaked slides:-
http://4.bp.blogspot.com/-dVI3mYfmJRw/Uboi5GjGUzI/AAAAAAAAAB0/NeAiGONavd4/s1600/i5.jpg

Haswell i3's are actually starting to look like a genuine "tock upgrade" over the previous generation's i3's and not just a "same clock rate refresh" glorified "tick-plus". Only sad thing is the crippled BCLK overclocking for all non-K chips (a 4.5GHz i3 would have been a really nice budget overclocker (3.6GHz x 1.25 BCLK gearing = 4.5GHz)).
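
For anyone who wants to check the clock arithmetic, here is a quick sketch. It assumes performance scales linearly with clock speed and IPC, which is a simplification, and the 5% IPC uplift is an assumed value chosen only to show where a figure of about +20% could come from. The function name `relative_speed` is hypothetical.

```python
def relative_speed(clock_ghz, base_clock_ghz, ipc_gain=0.0):
    """Speed relative to a baseline chip, assuming performance scales
    linearly with clock speed and IPC (a simplification)."""
    return (clock_ghz / base_clock_ghz) * (1 + ipc_gain)


i3_3220 = 3.3              # GHz, Ivy Bridge baseline
i3_4340_oc = 3.6 * 1.05    # GHz, stock 3.6GHz with a +5% BCLK overclock

clock_only = relative_speed(i3_4340_oc, i3_3220)
with_ipc = relative_speed(i3_4340_oc, i3_3220, ipc_gain=0.05)  # assumed 5% IPC uplift

print(f"overclocked i3-4340: {i3_4340_oc:.2f} GHz")
print(f"clock speed alone: +{(clock_only - 1) * 100:.1f}%")
print(f"with assumed 5% IPC gain: +{(with_ipc - 1) * 100:.1f}%")
print(f"hypothetical 1.25x BCLK gearing: {3.6 * 1.25:.1f} GHz")
```

The clock-only advantage works out to about 14.5%. If the IPC gain were closer to the 10% mentioned earlier in the thread, the combined figure would be nearer 26% than 20%.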
 

Emulex

Diamond Member
Jan 28, 2001
9,759
1
71
SMT was and still is more performance in return for die area than moar cores and will remain so for the foreseeable future.


What i'd like to see in virtualization or hardware is reverse-SMT.

256 cores acting as 16 - it would rock for licensing purposes and allow you to scale with Project Moonshot ( et others ).

Good applications might be control plane scaling - maybe you have a software defined network and or storage that needs to be dynamically scaled up/down in power to match FLOW demands?

Is reverse Hyperthreading just a dead idea?
 

WhoBeDaPlaya

Diamond Member
Sep 15, 2000
7,414
402
126
I'm wondering about the more subjective benefit of HT, rather than the objective benchmarks. Does a rig with HT "feel" smoother?
In games, normal desktop use, or ???
Comparing my i7 920 at home to my 3570K at work, I don't notice any difference for normal desktop use (browser tabs, SSH sessions, vid playing, etc.)
 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
What i'd like to see in virtualization or hardware is reverse-SMT.

256 cores acting as 16 - it would rock for licensing purposes and allow you to scale with Project Moonshot ( et others ).

Good applications might be control plane scaling - maybe you have a software defined network and or storage that needs to be dynamically scaled up/down in power to match FLOW demands?

Is reverse Hyperthreading just a dead idea?

It is a dead idea. It is not a good way to get performance if you are concerned with performance/cost or performance/watt.

Reverse hyperthreading takes die-space. That same die-space could be allocated towards beefing up compute units for traditional performance increases (wider compute for ILP, more cache, etc).

It just comes down to what other things you could do if you told the design engineers that they have an extra 10W budget or an extra 100mm^2 silicon budget...they would not choose "reverse hyperthreading" as the thing to implement with that budget because it would not give them as much performance gains as would come from just adding yet-another solitary core or another 10MB of cache, etc.

Reverse hyperthreading is something that won't become relevant until a lot of other performance avenues are thoroughly exhausted. Like having 16GB of on-die cache or 48 cores that are 12-issue wide and stuff like that.
 

NTMBK

Lifer
Nov 14, 2011
10,400
5,635
136
IDC - in your opinion, how well would SMT "stack" with CMT? I'm thinking in the context of some future AMD part, where they beefed up the decode/FPUs etc. enough to handle 4 threads within a single module. Would it make sense? Would having all those resources shared give it solid single-threaded performance, while having lots of threads to throw around? Or is it a dead end like Sun's Niagara?
 

JimmiG

Platinum Member
Feb 24, 2005
2,024
112
106
It does, on a 1-core P4.

Actually the P4 was ill suited for HT. It was just one of the many desperate things Intel did to try to improve performance during the later days of Netbust. It actually makes more sense with the Core architecture since it's much wider and has more resources to spare, but HT's poor reputation seems to be sticking.

I run a mobile i7 2C/4T machine at work. It works very well for multi-tasking etc. It does feel very responsive even when running CPU intensive tasks.

There are certainly instances where HT helps, and others where it makes no difference.
http://anandtech.com/bench/Product/836?vs=837

Maybe HT will be more useful for gaming etc. once the next-gen consoles come out, as they use 4 module/8 integer units. This may cause developers to spend more time optimizing for multiple cores, but I doubt they will be able to push 4/8 cores to 100% all the time. With 100% load for cores 1-2, 50% for cores 3-4 and ~10-25% on the remaining cores, a 4C/8T CPU would be perfect.
 

Concillian

Diamond Member
May 26, 2004
3,751
8
81
Actually the P4 was ill suited for HT. It was just one of the many desperate things Intel did to try to improve performance during the later days of Netbust. It actually makes more sense with the Core architecture since it's much wider and has more resources to spare, but HT's poor reputation seems to be sticking.


He does have a point though in that general smoothness is improved significantly for low core CPUs when they have HT. 1C / 2T and 2C / 4T see the most frequent benefit.

People simply don't do things that require more than 2 strong and 2 weak cores unless they do heavy crunching like handbrake encoding, rendering, simulations, etc... Those kinds of things don't really fall under general computing. My wife notes my computer is better. She has no idea what's in the computers or why, but she definitely notices my computer feels faster. Hers is an e7200 @ 3.5 GHz, no slouch, but certainly not cutting edge. Dual without HT. Mine is an i3-530 @ 4GHz. So is it HT or is it the extra speed + a little IPC that she notices? Who really knows, but I think there's something to having 2C / 4T. Intel seems to think so too, since even though i3 and i5 seem like they should be the staple 2C / 4T in notebooks, Intel still gives some 2C / 4T CPUs an i7 branding. They feel like 2C / 4T is good enough to get the top of the line i7 branding.

I feel that there is some subjective benefit to HT in general computing. But like anything subjective, there's not a good way to provide evidence other than circumstantial.

In gaming, we also see that 3-4 cores seem to be the optimum. Again 2C / 4T processors are the best balance, with 4C / 4T performing equal or nearly equal to 4C / 8T. I believe a big part of that is due to the design of current consoles and the need for semi-common development trees. If that assumption is true, then with PS4 and XBO being approximately 4C / 8T or 8 weak cores, we should see 4C / 8T CPUs have a much bigger advantage than they do now when the next round of game engines comes out. However the 8 cores on the consoles may be so weak that 4C / 4T CPUs will cope pretty well. This is the most interesting aspect of the future to me, since I see little happening in general computing software that will benefit from faster or wider CPUs.

So the value of Hyperthreading is a sliding scale. On a 1 core CPU it's critically important. On a 2 core CPU it's important enough that Intel gives top branding to some 2C / 4T CPUs, and on a 4 core CPU it loses a lot of its value for general computing, and really only holds value for very specific tasks like crunching, rendering, encoding and such.
 

SPBHM

Diamond Member
Sep 12, 2012
5,065
418
126
Actually the P4 was ill suited for HT. It was just one of the many desperate things Intel did to try to improve performance during the later days of Netbust.


Ill suited or not, it worked; it's just that most software back in the day was optimized for a single core/CPU...

As for HT being an act of desperation, I don't think so. It was built into the architecture; Intel just decided not to enable it for some time.

this is from 2002


(this is a little optimistic let's say, "best case scenario", there were issues and cases of performance loss, but still, it worked)
it's an equivalent to the Northwood P4.


But as said earlier, his point was that the fewer resources you have, the more you will notice the benefit of HT, and that's probably true. Perhaps the best example would be a Celeron G460 (single-core Sandy Bridge with HT): you will definitely notice a gain in basic web browsing and such. Even on the old P4, running Win 7 with newer software, I think it would be easier to see the benefit of HT than it was when HT was introduced.

even with my i3 I clearly noticed the benefit of HT in some regular usage situations.
 

JimmiG

Platinum Member
Feb 24, 2005
2,024
112
106
But as said earlier, his point was that the fewer resources you have, the more you will notice the benefit of HT, and that's probably true. Perhaps the best example would be a Celeron G460 (single-core Sandy Bridge with HT): you will definitely notice a gain in basic web browsing and such. Even on the old P4, running Win 7 with newer software, I think it would be easier to see the benefit of HT than it was when HT was introduced.

even with my i3 I clearly noticed the benefit of HT in some regular usage situations.

I think the ideal CPU for HT is one with few, but very wide cores. Dual-core Haswell variants should benefit greatly.
 

Cerb

Elite Member
Aug 26, 2000
17,484
33
86
Actually the P4 was ill suited for HT. It was just one of the many desperate things Intel did to try to improve performance during the later days of Netbust. It actually makes more sense with the Core architecture since it's much wider and has more resources to spare, but HT's poor reputation seems to be sticking.
The first Willamettes had HT (well, technically Foster, but basically the same thing), in 2001, yet it was a desperate move in the later days?

The P4 was ill-suited to running any code that isn't today a candidate for being GPGPU, HT or no. HT kinda sucked due to being starved for cache, and decoders, and ports to issue to, and taking several cycles for ALU ops that took 1 cycle on every other CPU, but the same was true for non-HT P4s, until they beefed up the cache, which helped a lot with HT.
 