AMD multi threading

turion64

Junior Member
Mar 3, 2005
2
0
0
AMD is riding high in most benchmarks thanks to its innovative x86 technology; in gaming especially, AMD has beaten Intel badly.
If multithreaded games become a reality, would that be an advantage for Intel? Does AMD have plans to counter Intel's HT tech?
 

silverpig

Lifer
Jul 29, 2001
27,703
12
81
2 things:

1. The A64's pipeline is much shorter than the P4's. The P4 benefits from HT because there are a lot of "gaps" in its pipeline when only one thread is running, and the second thread fills these "gaps". The A64's pipe is much shorter and doesn't have as many "gaps" to fill, so it wouldn't benefit from HT very much.

2. Dual cores.
 

AbAbber2k

Diamond Member
Mar 1, 2005
6,474
1
0
Intel's current processors will likely benefit more in games coded for multithreading, but I doubt it'd be enough to push past AMD; if anything, it'd close the performance gap in AMD-favored games.

As for future products (multicore), it's hard to say, since both have dual cores on the horizon. I don't really see HT suddenly making Intel king of the hill again, though.
 

Algere

Platinum Member
Feb 29, 2004
2,157
0
0
Originally posted by: AbAbber2k
As for future products (multicore), it's hard to say, since both have dual cores on the horizon. I don't really see HT suddenly making Intel king of the hill again, though.
IDK... If a single-core P4 with HT can beat a single-core A64 in DC projects like Folding@home, then when dual core arrives one can imagine a dual-core Pentium with HT (on each core) outperforming a dual-core A64.
 

Calin

Diamond Member
Apr 9, 2001
3,112
0
0
I think Hyperthreading won't help the P4 much - at most something like 20%. For compute-intensive, memory-bandwidth-insensitive processes like those, I think the advantage will remain similar.
 

MetalStorm

Member
Dec 22, 2004
148
0
0
Originally posted by: Algere
Originally posted by: AbAbber2k
As for future products (multicore), it's hard to say, since both have dual cores on the horizon. I don't really see HT suddenly making Intel king of the hill again, though.
IDK... If a single-core P4 with HT can beat a single-core A64 in DC projects like Folding@home, then when dual core arrives one can imagine a dual-core Pentium with HT (on each core) outperforming a dual-core A64.

The only Pentium chips that will have both dual core and HT will be the EE chips. Fancy forking out $1000 for one of those bad boys? I didn't think so.

The dual cored EE chips might win at some things, I think rendering especially, but as for games, until they become multi-threaded it won't make a difference, and even when they are multi-threaded, if they're only 2 threads, then HT won't make any difference.
 

Algere

Platinum Member
Feb 29, 2004
2,157
0
0
Originally posted by: MetalStorm
Originally posted by: Algere
Originally posted by: AbAbber2k
As for future products (multicore), it's hard to say, since both have dual cores on the horizon. I don't really see HT suddenly making Intel king of the hill again, though.
IDK... If a single-core P4 with HT can beat a single-core A64 in DC projects like Folding@home, then when dual core arrives one can imagine a dual-core Pentium with HT (on each core) outperforming a dual-core A64.

The only Pentium chips that will have both dual core and HT will be the EE chips. Fancy forking out $1000 for one of those bad boys? I didn't think so.

The dual cored EE chips might win at some things, I think rendering especially, but as for games, until they become multi-threaded it won't make a difference, and even when they are multi-threaded, if they're only 2 threads, then HT won't make any difference.
It was never a matter of price (hence why I didn't say dual-core Pentium 4 with HT), only performance/technology, and who knows, Intel could eventually migrate HT'd dual cores into the mainstream. As for games, if they're designed for more than 2 threads, there would probably be demand for 3+ core processors then. IIRC, according to Gabe (HL2), it would've been difficult to code for HT (time constraints and cost for little perf. boost). If that's any indication of future games + HT, well...

However for DC projects & the like which already benefit from the Pentium 4's logical CPUs (HT) & are programmed to take advantage of more than 2 threads, I see possibilities here.
 

MetalStorm

Member
Dec 22, 2004
148
0
0
If you have a look on HardOCP they have some links to a new sort of card that is designed to boost the physics processing power of a rig by 100 times and it certainly looks very promising.

Basically, it's another add-in card like the GPU, but dubbed the PPU, as it's a physics processing unit. It will allow games that use a certain game engine (one that will be used in Unreal 3) to have far more objects interacting: rather than the 30-40 that high-end PCs manage at the moment, they claim to be able to handle between 30,000 and 40,000 interacting objects, which will basically mean you can have deformable walls and pretty much everything else. It's certainly looking very interesting anyway.

The real point of mentioning it, though, is that it will take quite a lot of load off the CPU, and I imagine physics is one of the few things that can be split into different threads. So if these cards catch on, I'm not sure what will be left to split up for dual cores.
 

mdchesne

Banned
Feb 27, 2005
2,810
1
0
D'ya see that beer commercial out recently? I think it was a Miller bash by Coors (or some other beer company, it doesn't matter).

But anyways, the commercial goes like this:

"Recently, Miller (or w/e) came out with a new beer with better taste called Miller (or whatever) Select. People have been asking us whether we're going to make another beer. We respond simply, 'we made it right the first time'"
lol, fits nicely

My juxtaposition:

"Recently, AMD fanatics have asked us whether we are going to make another model processor in response to the Intel hyperthreading technology. We respond simply, 'we got it right the first time'"

Be really funny if AMD pulled something like this in their advertising
 

Peter

Elite Member
Oct 15, 1999
9,640
1
0
Remember that Hyperthreading isn't a magic trick. It's an (admittedly elegant) crutch to overcome a design weakness in the P4 ... which the K8 doesn't have to begin with.
 

BitByBit

Senior member
Jan 2, 2005
474
2
81
I'm skeptical as to whether AMD wouldn't benefit from multi-threading.
If it were true that pipeline depth affected a processor's ability to execute multiple threads simultaneously, then surely Prescott would be a far better multitasker than Northwood?
The way I understand HT is that it allows more of a processor's execution resources to be used at once.
Intel and AMD go about spreading their execution resources differently; Intel adopted a deep approach, AMD a wide approach.
With the Athlon's ability to do more in parallel, it seems intuitive that AMD would benefit from HT at least as much as, if not more than, Intel has.

As far as non-HT multitasking performance goes, I seem to remember AMD taking the multitasking benchmarks pre-P4C.
With HT disabled on the upcoming Pentium 'D', it looks like AMD could once again lead in this area.
 

ghackmann

Member
Sep 11, 2002
39
0
0
Originally posted by: BitByBit
I'm skeptical as to whether AMD wouldn't benefit from multi-threading.
If it were true that pipeline depth affected a processor's ability to execute multiple threads simultaneously, then surely Prescott would be a far better multitasker than Northwood?
I think you're misunderstanding how the pipeline affects Hyperthreading. The P4 has a worse branch predictor than the Athlon and a much longer pipeline, so pipeline stalls are much more expensive on the P4 than on the Athlon. The only thing Hyperthreading does for the P4 is find something for the CPU to do while waiting for the pipeline stall to be resolved, when it would normally be sitting there idle. So, the shorter pipeline stalls on the Athlon mean it stands to gain much less from Hyperthreading.

All other things being equal, the 31-stage pipeline on the Prescott means it would use Hyperthreading more often than Northwood. But they improved the branch predictor on the Prescott core, so you end up getting more expensive stalls that happen less often. How this affects application performance depends on the app. Either way, I wouldn't consider falling back on Hyperthreading more often to be "better at multitasking", because again, it's just finding something to do instead of wasting an excessive amount of time.
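
The "fill the gaps" argument can be put into a rough toy model. This is only a sketch under loud assumptions: each thread is taken to waste a fixed fraction of cycles on stalls, the two threads stall independently, and a second hardware thread can use any cycle the first one wastes. The function names and the stall fractions (0.20 and 0.05) are made up for illustration and are not measurements of any P4 or Athlon.

```python
# Toy model only: independent stalls, no contention for execution units.
# The stall fractions below are hypothetical, not measured values.

def utilization(stall: float, threads: int) -> float:
    """Fraction of cycles in which at least one thread is able to issue."""
    return 1.0 - stall ** threads


def smt_gain(stall: float) -> float:
    """Throughput of two threads relative to one thread on the same core."""
    return utilization(stall, 2) / utilization(stall, 1)


for label, stall in [("many stalls", 0.20), ("few stalls", 0.05)]:
    print(f"{label}: stall fraction {stall:.2f}, two-thread gain {smt_gain(stall):.2f}x")
```

In this model the gain works out to 1 + stall, so a core that stalls less has less to win back. It deliberately ignores execution width, which is the other side of the argument in this thread.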
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
The P4 has a worse branch predictor than the Athlon
Do you have numbers to back that up?

When you say a longer pipeline means stalls are more expensive - what exactly would cause this? Thinking about in-order execution, I'd expect that a wider, shorter pipe would be hit harder by a single cycle stall than a narrower, deeper pipe (since fewer instructions could have been committed in a given cycle anyway). Out of order execution makes performance analysis with stalls really complicated too. When you say stalls, do you mean pipeline flushes? That would make more sense to me.
 

ghackmann

Member
Sep 11, 2002
39
0
0
Originally posted by: CTho9305
The P4 has a worse branch predictor than the Athlon
Do you have numbers to back that up?
Not off the top of my head, but I have seen several benchmarks to that effect.

Originally posted by: CTho9305
When you say a longer pipeline means stalls are more expensive - what exactly would cause this? Thinking about in-order execution, I'd expect that a wider, shorter pipe would be hit harder by a single cycle stall than a narrower, deeper pipe (since fewer instructions could have been committed in a given cycle anyway). Out of order execution makes performance analysis with stalls really complicated too. When you say stalls, do you mean pipeline flushes? That would make more sense to me.
To be honest, I'm not sure -- I'm not that familiar with the internals of the Netburst architecture specifically, so I'm just repeating what I've been told by people in the know. I assume they meant pipeline flushing.
 

ribbon13

Diamond Member
Feb 1, 2005
9,343
0
0
Originally posted by: Algere
However for DC projects & the like which already benefit from the Pentium 4's logical CPUs (HT) & are programmed to take advantage of more than 2 threads, I see possibilities here.

Actually HT hinders DC...
 

Algere

Platinum Member
Feb 29, 2004
2,157
0
0
Originally posted by: ribbon13
Originally posted by: Algere
However for DC projects & the like which already benefit from the Pentium 4's logical CPUs (HT) & are programmed to take advantage of more than 2 threads, I see possibilities here.

Actually HT hinders DC...
How so?

It's known you get more S@H WUs done with HT enabled than with it disabled on the Pentium 4. Lots of ppl to back that up in the DC forum.

 

ribbon13

Diamond Member
Feb 1, 2005
9,343
0
0
Originally posted by: Algere

How so?

It's known you get more S@H WUs done with HT enabled than with it disabled on the Pentium 4. Lots of ppl to back that up in the DC forum.

When you try to run two units at once each unit takes longer to finish, it adds a global delay. I've been trying to find out where I heard it exactly. I'm pissed because I know I read it on these forums, but the search engine is a piece of #%^$@%^!!11one1
 

Algere

Platinum Member
Feb 29, 2004
2,157
0
0
Originally posted by: ribbon13
Originally posted by: Algere

How so?

It's known you get more S@H WUs done with HT enabled than with it disabled on the Pentium 4. Lots of ppl to back that up in the DC forum.

When you try to run two units at once each unit takes longer to finish, it adds a global delay. I've been trying to find out where I heard it exactly. I'm pissed because I know I read it on these forums, but the search engine is a piece of #%^$@%^!!11one1
Of course, for single-threaded performance you'll take a marginal hit, but we're talkin' DC now, where the aim is to crunch out as many WUs as possible. Yes, one WU will be slower to finish with HT enabled than without, but IDK if you're forgettin' that with HT you have 2 WUs crunching @ the same time. For instance...

HT disabled = 2.5 hours/WU x 1 process = 5 WU completed in 12.5 hours
HT enabled = 3 hours/WU x 2 processes (HT) = 8 WU completed in 12 hours
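
The arithmetic in that example can be checked with a few lines. This is a minimal sketch using only the illustrative figures from the post (2.5 hours per WU alone, 3 hours per WU with two running side by side). The function name is made up for this example.

```python
def work_units_completed(hours_per_wu: float, concurrent: int, window_hours: float) -> int:
    """Whole work units finished in the window when `concurrent` units run side by side."""
    per_slot = int(window_hours // hours_per_wu)  # finished units per logical CPU
    return per_slot * concurrent


ht_off = work_units_completed(hours_per_wu=2.5, concurrent=1, window_hours=12.5)
ht_on = work_units_completed(hours_per_wu=3.0, concurrent=2, window_hours=12.0)

print(f"HT disabled: {ht_off} WU in 12.5 hours")
print(f"HT enabled:  {ht_on} WU in 12 hours")
```

Each individual WU is slower with HT on, but the count per day is what matters for DC, and that goes up.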


Hint: Look at "Join TeAm AnandTech S@H" in my sig (Step 7).
 

BitByBit

Senior member
Jan 2, 2005
474
2
81
The only thing Hyperthreading does for the P4 is find something for the CPU to do while waiting for the pipeline stall to be resolved, when it would normally be sitting there idle. So, the shorter pipeline stalls on the Athlon mean it stands to gain much less from Hyperthreading.

So you're saying the only function of HT is to mask the impact of pipeline stalls?
Considering that the P4 gets a ~20% increase in performance for tasks like encoding and rendering, that's a lot of stalls!
My case for the Athlon and HT is that while it has only a 12-stage pipeline, its execution core is much wider.
(This, incidentally, is why the Athlon has a much higher IPC, and not because of its shorter pipeline.)
Tasks like encoding and rendering have low ILP, and thus can't make proper use of the Athlon's wider execution core, which leaves execution units idle.
This is where HT comes in.
 

kpb

Senior member
Oct 18, 2001
252
0
0
I'd recommend reading http://arstechnica.com/articles/paedia/cpu/hyperthreading.ars

The fact is that, because of the design differences, the Athlon is less likely to experience a stall and is less affected by it when it does experience one. The most common cause of a stall is going to be waiting for memory access. So the pipeline stall isn't really going to be x number of cycles but rather x ns for the memory to be accessed. A lower-clocked processor therefore loses fewer cycles to the same wait, and the Athlon 64's integrated memory controller has lower latency, so it has a shorter wait to get data from memory.
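
To make the memory-wait point concrete: if the wait is a fixed amount of wall-clock time, the number of cycles lost is just latency times clock speed. The sketch below assumes that simple relation. The latencies and clock speeds are hypothetical placeholders chosen to show the shape of the argument, not measured figures for any real chip.

```python
def stall_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Cycles lost while waiting latency_ns for memory (1 GHz is 1 cycle per ns)."""
    return latency_ns * clock_ghz


# Hypothetical numbers only: a higher-clocked chip with a slower memory path
# versus a lower-clocked chip with a faster, on-die memory controller.
high_clock = stall_cycles(latency_ns=100.0, clock_ghz=3.8)
low_clock = stall_cycles(latency_ns=60.0, clock_ghz=2.4)

print(f"high clock, longer latency:  about {high_clock:.0f} cycles lost per miss")
print(f"lower clock, shorter latency: about {low_clock:.0f} cycles lost per miss")
```

Both effects push the same way: the same miss costs more cycles on the faster-clocked part, and a shorter latency shrinks the cost further.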

The other cause is going to be a branch misprediction, which will typically cause the entire pipeline to be flushed and start refilling. A 2-wide, 31-long pipeline (aka Prescott) is going to flush a lot more instructions and take longer to fill back up than a 3-wide, 13-long (aka A64) processor. Those numbers aren't accurate but demonstrate the idea.
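
A rough way to see the flush cost is to count how many instruction slots are in flight when the pipeline is thrown away. The sketch below uses the poster's own illustrative width and depth figures, which the post itself says aren't accurate. Treating in-flight work as width times depth and refill time as roughly the depth are simplifying assumptions, and the function names are made up for the example.

```python
def in_flight_slots(width: int, depth: int) -> int:
    """Upper bound on instruction slots discarded by a full pipeline flush."""
    return width * depth


def refill_cycles(depth: int) -> int:
    """Rough number of cycles before the first new instruction reaches the end of the pipe."""
    return depth


# Illustrative figures from the post, not real microarchitecture data.
prescott_like = in_flight_slots(width=2, depth=31)
a64_like = in_flight_slots(width=3, depth=13)

print(f"2-wide, 31-deep: up to {prescott_like} slots flushed, ~{refill_cycles(31)} cycles to refill")
print(f"3-wide, 13-deep: up to {a64_like} slots flushed, ~{refill_cycles(13)} cycles to refill")
```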
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
Eh? SMT wasn't introduced to overcome a design weakness in P4 (there are plenty of other features that specifically address Netburst glass jaws). It's been around since Willamette. And all processors will benefit from SMT. The only reason not to do it is design time and validation effort.
 

SuperTool

Lifer
Jan 25, 2000
14,000
2
0
Originally posted by: dmens
Eh? SMT wasn't introduced to overcome a design weakness in P4 (there are plenty of other features that specifically address Netburst glass jaws). It's been around since Willamette. And all processors will benefit from SMT. The only reason not to do it is design time and validation effort.

Actually SMT has been around since Alpha EV8. I much prefer CMT to SMT. If you are going to replicate the registers and other logic to run 2 threads on one core, I say just go ahead and replicate the entire core, and save yourself the time. It may cost a bit more area, but you will get linear scaling on independent threads instead of an uncertain performance improvement that depends on how much functional unit contention there is between SMT threads.
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
CMT doubles die size, SMT is about 5% more area. And the two can be combined.
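
The trade-off being argued here can be sketched as throughput per unit of die area. The area figures are the ones quoted in this thread (about 5% for SMT, double for a second core), the 2x speedup is the linear-scaling case for independent threads, and the 20% SMT speedup is an assumption borrowed from the ballpark mentioned earlier in the thread. None of these are measured values, and the function name is made up for the example.

```python
def throughput_per_area(speedup: float, area: float) -> float:
    """Relative throughput divided by relative die area (single plain core = 1.0)."""
    return speedup / area


# Assumed inputs: ~20% SMT gain for ~5% area, 2x gain for 2x area with a second core.
smt = throughput_per_area(speedup=1.20, area=1.05)
cmt = throughput_per_area(speedup=2.00, area=2.00)

print(f"SMT: {smt:.2f} throughput per unit area")
print(f"CMT: {cmt:.2f} throughput per unit area")
```

Under these assumptions SMT comes out ahead per unit area while the second core wins on absolute throughput (2.0x against 1.2x), which is why combining the two is attractive.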
 