Conroe and Athlon 64


Furen

Golden Member
Oct 21, 2004
1,567
0
0
Originally posted by: dmens
... people getting paid more than I do run lots of simulations to figure out how to tweak the knobs.

Is that jealousy I hear? Heh.

Damn Mark, if you don't want to speculate then don't. We're free to discuss whatever we want and having your self-righteous "people don't have anything better to do" comments is pretty annoying.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,059
15,994
136
Well, after the 1 zillionth post on how the next generation chips MIGHT do, it's also really annoying. I have as much right to say that is annoying as you do to be annoying.
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
Is that jealousy I hear?

Not really. No PhD = no architecture definition.

btw mark, not everyone here is randomly guessing the performance. Plenty of people on this forum can make educated proposals. Don't get pissy because you don't know enough to join the discussion.
 

Furen

Golden Member
Oct 21, 2004
1,567
0
0
Originally posted by: Markfw900
Well, after the 1 zillionth post on how the next generation chips MIGHT do, it's also really annoying. I have as much right to say that is annoying as you do to be annoying.

Yes, of course... If you bothered to read the posts before commenting you'd notice that we say almost nothing about how Conroe might do. We talk about how absurd that article is (the thing is a freaking joke) and discuss FSB/HT, FB-DIMMs, and FP performance in the K7/K8 and P6, hardly things that don't exist or that we can't be fairly certain about. Regardless, if you don't want to read anything about Conroe then just don't freaking click on the topics that say "Conroe" and then complain because people are talking about it.
 

pedramrezai

Member
Sep 5, 2005
59
0
0
The way most of the people are thinking is just the way Intel has planned. I am an AMD fan but I realize competition is good for customers. I was shocked by Core's performance, but the use of a handicapped AMD system really annoyed me. First, reviewers have already shown that RD580 or solutions with dual 16x can deliver up to 10-15% more performance when paired with high-end, bandwidth-hungry video cards. Second, we have been hearing of dual-core optimizations in display drivers for some time but were unable to see anything significant until we saw Conroe's performance; I am quite suspicious over some hefty optimizations in intel-cooked display driver. Time will tell. Third, this might be the beginning of a new SSEx game with unfair optimizations for a new technology.
I am surprised how people are trashing the current as well as future AMD64 technology. But remember that Core is not out yet, and all this might be down to optimizations that have granted it this performance level. Moreover, the current AMD64 technology is almost 3 years old and the new AM2 will update its specs. AMD did not like DDR2's high latency; what they are looking for is its higher frequency, which can be paired with the new AM2 FSB: for Athlon 64 and Sempron a 333MHz FSB that pairs with DDR2 667, and for the FX parts a 400MHz FSB pairing with DDR2 800. If DDR1 could reach these frequencies you could see the real potential of AMD64 now. This kind of bandwidth will give Core a hard time. Also remember that AMD is increasing cache (L2 and maybe L3). Shared cache is also something that will be seen in future products and will bring huge performance gains. Based on the preliminary data of 200/266 async single-channel bandwidth of 3500MB/s, a memory bandwidth of >10k is expected in the final product, and if Intel was going to compare its future platform, it was not fair to compare it with an infrastructure more than 2 years old. I am sure the new AM2 will restore AMD's reputation. But we all must remember that this competition between major players is good for end users.
 

MrSpadge

Member
Sep 29, 2003
100
6
0
Pedramrezai,

>or solutions with dual 16x can deliver up to 10-15% more performance

This is similar to my aunt saying "never ever buy non-"BIO"-bananas, because the regular ones CAN get the insect poison stuff UP TO 17 times!". So did your banana get it 17 times or none at all? I didn't read any recent reviews about crossfire / SLI 8x/16x performance, because I don't care about it, so I have to ask you: does dual 16x deliver 10 - 15% more performance in fairly CPU-bound resolutions like 1024 and 1280? Because otherwise it doesn't matter for these tests.

>I am quite suspicious over some hefty optimizations in intel-cooked display driver

They're using standard drivers as far as I know / the reviewers said. Even if there was some Conroe-optimization done, don't you think it would find its way into the regular drivers?

>this might be the beginning of a new SSEx game with unfair optimizations for a new technology.

So far I didn't read anything about special new instructions for Conroe. Seems a bit weird if intel put no SSE update in, but it may be possible. With that being said, Conroe did get a hefty SSE1/2/3 performance update: single cycle 128 Bit fp! (I guess for add and mul, but certainly not div) This is something quite important which doesn't require software optimization to work (though they would help to work well).

Please don't speak about an FSB when you're referring to the A64. FSB implies dependencies which aren't there with the A64. Same for the term "async" - there are no "async penalties" on A64. But back to topic: AM2 + DDR2 will help to close the gap between Conroe and A64 a bit, but possibly not as much as you think. Do you still remember how much faster S939 was compared to S754? Normally 0 - 5%, sometimes (games) 10%. And that was already doubling the memory bandwidth. This time we need (still expensive) DDR2 800 4-4-4 to keep the latency constant and double the theoretical bandwidth again. This may well give games a 10% boost (quite important for these X2 - Conroe benchies), but it won't do anything for media encoding, rendering etc (thanks to clever algorithms).
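
The bandwidth and latency arithmetic behind that paragraph can be sketched as follows. This is only a rough illustration: it assumes 64-bit (8-byte) memory channels, and it assumes the DDR400 baseline runs at CL2, which the post implies but does not state. The function names are made up for this example.

```python
def peak_bandwidth_gbs(transfers_mt_s, channels, bus_bytes=8):
    """Theoretical peak bandwidth in GB/s (assumes 8-byte-wide channels)."""
    return transfers_mt_s * 1e6 * bus_bytes * channels / 1e9


def cas_latency_ns(cas_cycles, transfers_mt_s):
    """CAS latency in nanoseconds; the I/O clock is half the DDR transfer rate."""
    clock_mhz = transfers_mt_s / 2
    return cas_cycles * 1000 / clock_mhz


# (label, transfer rate in MT/s, channels, CAS cycles); CL2 for DDR400 is an assumption
configs = [
    ("S754, single-channel DDR400 CL2", 400, 1, 2),
    ("S939, dual-channel DDR400 CL2", 400, 2, 2),
    ("AM2, dual-channel DDR2 800 CL4", 800, 2, 4),
]

for label, rate, channels, cas in configs:
    bandwidth = peak_bandwidth_gbs(rate, channels)
    latency = cas_latency_ns(cas, rate)
    print(f"{label}: {bandwidth:.1f} GB/s peak, {latency:.1f} ns CAS")
```

Under these assumptions each step doubles the theoretical peak bandwidth while the CAS latency in nanoseconds stays the same, which is the point being made about DDR2 800 4-4-4.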

Concerning cache: look at how amazingly well the Duron and Semprons do with their small caches. K7/8 has never been dependent on large L2 caches, thanks to the large L1 cache. Or negatively formulated: a bigger L2/3 won't help it as much as it does for intel. And while increasing the cache size further, you'll get diminishing returns. Add that to the fact that a shared L3 has to be slower than L2, and you won't get anything like "huge performance gains". Games and server apps should profit the most here, maybe with 5 - 10% as an educated guess.
But don't expect any increase in cache size before 65nm at the end of 2006 / beginning of 2007. The 1MB L2 90nm X2s are already large (=expensive) enough.

To sum things up: it's reasonable to expect AM2 to close the gap a bit, but Conroe is clearly the superior design, with its P-M efficiency and its damn powerful FPU. And remember that a 2.66GHz Conroe was compared to a 2.8GHz X2. Intel won't have much trouble scaling Conroe to 3GHz (EE on roadmaps). To catch up, AMD needs substantial changes in the A64 design, which we won't see before 65nm. One rumor I've seen talks about doubling the FPU units of K8. Together with the infrastructure to feed them with instructions, this might be what is necessary. Of course I'd like to include multi threading into such a design.

Regards, MrS
 

Keysplayr

Elite Member
Jan 16, 2003
21,211
50
91
Originally posted by: Markfw900
I really think all this supposition is a waste. Wait until it comes out, and then discuss. I guess some people have nothing better to do than BS on things that are in the future....

Well, that's something you're just gonna have to deal with now isn't it?
Nobody grabs your hand, moves your mouse over just conroe threads and smashes your index finger with a ball pein hammer to enter the thread do they? You may be tired of the threads, but it still appears you want to see what everyone is talking about in these threads.

 

pedramrezai

Member
Sep 5, 2005
59
0
0
MrS,
Thanks for the clarifications. I do not remember where I saw the difference between dual 16x and dual 8x, but as far as I can remember the difference was significant. As far as I know the new Core architecture brings a new instruction set, SSE4. Finally, I am quite optimistic about AMD's 333/400 HTT pairing with 667/800 parts. Whatever happens, this competition is beneficial for end users. Maybe it's time for K9.
Regards, Pedram
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,786
136
I think conroe is using a very similar architecture compared to amd 64. short pipeline, more work done per clock, low mhz, low power consumption. This is testament to how successful a64 is. They in my opinion basically copied amd 64 by reverting back to classic Pentium 3 based designs: pentium-m etc.

No, it means Intel badly screwed up with latest incarnation of the Netburst architecture. The architecture DOES have potential. How different would it be now if Prescott DID REACH 5GHz and similar/lower power?? Tejas at 7-10GHz for that matter.

P6 architecture SUCKS. It's why Intel went with Netburst, since ANOTHER design happened to be coming that they thought was better. A little later a separate design team decided they needed a mobile architecture. They were not gonna take Netburst, with its extraordinary clock speeds and aggressive designs.

Pentium M outperformed 256KB L2 cache Tualatin Celeron per clock by 30-50%!!

Prescott performed badly thanks to more pipeline stages, and MUCH, I mean, MUCH higher L1 and L2 latencies. If Intel had just given it the same L1 and L2 latencies as Northwood, Prescott would have out-performed Northwood PER-CLOCK. It wouldn't be by much, but it would be better. They just went too far with Prescott.

We might see Netburst technologies in the future: maybe not the double pumped ALU, but maybe the Trace Cache, I heard it has a lot of potential, pipelines will also increase slightly as it progresses, probably close to Northwood.
 

MrSpadge

Member
Sep 29, 2003
100
6
0
Right, now that you say it, I actually remember I did read that SSE4 will be included *doh*. I guess it's a minor update, like SSE3 was to SSE2. Damn it, I want an in-depth article on Conroe!

AMD doesn't need to increase the HTT clock to 400MHz to support DDR2 800. On A64, the mem clock is derived by dividing the CPU clock by a divider. You choose a smaller divider and you get a higher mem clock, independent of HTT clock. That's why I suppose AMD will officially support DDR2 800 as soon as it's readily available (not just as expensive overclockers mem) and has a JEDEC spec (don't know whether that's already the case). As long as there are no serious issues with the integrated mem controllers, this should be possible for all AM2 CPUs with a simple BIOS update, just like they added support for ~250MHz DDR1 in Rev. E.
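
A minimal sketch of that divider relationship, for illustration only. The post says just that the memory clock is the CPU clock divided by a divider; the assumptions added here are that the divider is a whole number and that it is rounded up so the memory never runs above its target. The function name is hypothetical.

```python
import math


def memory_clock_from_cpu(cpu_mhz, target_mem_mhz):
    """Return (divider, actual memory clock in MHz).

    Assumption: the divider is an integer, rounded up so the resulting
    memory clock never exceeds the requested target.
    """
    divider = math.ceil(cpu_mhz / target_mem_mhz)
    return divider, cpu_mhz / divider


# The HTT reference clock does not appear anywhere in the calculation.
for cpu_mhz, target_mhz in [(2400, 200), (2400, 400), (2600, 400)]:
    divider, actual = memory_clock_from_cpu(cpu_mhz, target_mhz)
    print(f"CPU {cpu_mhz} MHz, target {target_mhz} MHz: "
          f"divider {divider}, memory at {actual:.1f} MHz")
```

Under the round-up assumption, a 2400MHz CPU reaches a 400MHz memory clock (DDR2 800) with a divider of 6, while a 2600MHz CPU would land slightly below the target with a divider of 7.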

@inteluser2000

I roughly agree with what you said. However, I'd suggest another reason for "why Prescott sucked" as the primary one: they DOUBLED the number of core logic transistors. On the one hand, they made it perform the same as Northwood, clock for clock, with a much longer pipeline. That's almost brilliant. On the other hand they needed double the amount of core logic transistors to achieve the same performance - now that sucks, because these transistors eat so much energy. And so it happened that Prescott couldn't stretch its wings and use the frequency headroom added by the longer pipeline and higher latency caches, just because it was getting too hot. I wonder what a 90nm Northwood with SSE3 and Prescott branch prediction & HT could have done. 5 - 10% faster per clock and reaching 4 - 4.5GHz without major issues on air?

MrS
 

Fox5

Diamond Member
Jan 31, 2005
5,957
7
81
Originally posted by: stevty2889
Very interesting article. I don't think the FSB is going to be as much of a limiter for Conroe as they seem to believe however. While netburst chips are very bandwidth hungry, the FSB doesn't seem to have nearly as much impact on pentium-m's, and conroe should be a lot more similar to a pentium-m than to a netburst chip.

Conroe's 4 issue core should make it more bandwidth hungry than netburst...however wide cores are much harder to optimize for than deep cores, many algorithms just can't go parallel.

Still, I see AMD only holding an advantage in multisocket systems, Intel's shared cache idea is genius and will keep their multicore chips ahead of AMD's.
 

pedramrezai

Member
Sep 5, 2005
59
0
0
MrS,
As far as I can remember SSE2 was a major update for intel; at least they claimed it was a major instruction set, and AMD was forced to include it in its products. Maybe SSE4 will have the same fate.
I know that they do not need to increase HTT in order to use faster memories, but it only makes sense to use faster memories with faster HTT on a 1:1 basis. I'll give you an example: WinRAR 2.50 has a benchmark tool and here are some benchmarks:

Athlon FX60 @ 13x200 = 2600 (Dual DDR400) = 457 Kb/s (Neoseeker) Cache = 2x1MB
Athlon FX60 @ 11x250 = 2750 (Dual DDR500) = 530 Kb/s (Neoseeker) Cache = 2x1MB
Sempron 1.6GHz @ 8x300 = 2400 (Single DDR600) = 605 Kb/s (My system) Cache = 1x256KB
Athlon64 @ 12x300 = 2400 (Dual DDR600, ocz4800) = 632 Kb/s (Neoseeker) Cache = 1x?

As you see, at a certain frequency it is only mem bandwidth that allows higher performance. A single core A64@2400 with 50% more bandwidth gives 38% more performance than a dual core FX60. I think we must wait to see how future A64s will take advantage of DDR2.
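
The percentages quoted above can be checked with a few lines of arithmetic. The scores and memory speeds are taken from the list in this post; the helper name is made up, and the sketch says nothing about why the scores differ (CPU clock, core count, and memory timings also vary between the rows).

```python
# (system, DDR rating, WinRAR benchmark score) as listed in the post
results = [
    ("Athlon FX60 13x200, dual DDR400", 400, 457),
    ("Athlon FX60 11x250, dual DDR500", 500, 530),
    ("Sempron 8x300, single DDR600", 600, 605),
    ("Athlon64 12x300, dual DDR600", 600, 632),
]


def percent_gain(new, old):
    """Relative increase of new over old, in percent."""
    return (new / old - 1) * 100


_, base_ddr, base_score = results[0]
for name, ddr, score in results[1:]:
    print(f"{name}: memory clock {percent_gain(ddr, base_ddr):+.0f}%, "
          f"score {percent_gain(score, base_score):+.0f}% vs FX60 at DDR400")
```

For the last row this reproduces the figures in the post: a 50% higher memory clock and roughly 38% higher score than the FX60 at DDR400.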
 

Fox5

Diamond Member
Jan 31, 2005
5,957
7
81
Originally posted by: pedramrezai
MrS,
As far as I can remember SSE2 was a major update for intel; at least they claimed it was a major instruction set, and AMD was forced to include it in its products. Maybe SSE4 will have the same fate.
I know that they do not need to increase HTT in order to use faster memories, but it only makes sense to use faster memories with faster HTT on a 1:1 basis. I'll give you an example: WinRAR 2.50 has a benchmark tool and here are some benchmarks:

Athlon FX60 @ 13x200 = 2600 (Dual DDR400) = 457 Kb/s (Neoseeker) Cache = 2x1MB
Athlon FX60 @ 11x250 = 2750 (Dual DDR500) = 530 Kb/s (Neoseeker) Cache = 2x1MB
Sempron 1.6GHz @ 8x300 = 2400 (Single DDR600) = 605 Kb/s (My system) Cache = 1x256KB
Athlon64 @ 12x300 = 2400 (Dual DDR600, ocz4800) = 632 Kb/s (Neoseeker) Cache = 1x?

As you see, at a certain frequency it is only mem bandwidth that allows higher performance. A single core A64@2400 with 50% more bandwidth gives 38% more performance than a dual core FX60. I think we must wait to see how future A64s will take advantage of DDR2.

Eh, that may not be just bandwidth, as the frequency increases the latency tends to decrease as well. (unless much laxer memory timings are used, as with ddr2)
 

stevty2889

Diamond Member
Dec 13, 2003
7,036
8
81
Originally posted by: keysplayr2003
Originally posted by: Markfw900
I really think all this supposition is a waste. Wait until it comes out, and then discuss. I guess some people have nothing better to do than BS on things that are in the future....

Well, that's something you're just gonna have to deal with now isn't it?
Nobody grabs your hand, moves your mouse over just conroe threads and smashes your index finger with a ball pein hammer to enter the thread do they? You may be tired of the threads, but it still appears you want to see what everyone is talking about in these threads.

You do realize this is a month old thread that started before any of Conroe's info was widely available....so when he made that statement, there was very very limited info on conroe..
 

MBrown

Diamond Member
Jul 5, 2001
5,726
35
91
Just read the article. I know it's old but from what I read and what I know about conroe now, it looks like AMD is just going to be increasing speed to compete with conroe. It was an interesting article anyway.
 

darkdemyze

Member
Dec 1, 2005
155
0
0
Heh, I just read it, again, even older but meh..

I agree that it was very biased toward AMD, again with a lot of overemphasis on FSB limits. Even being an AMD fan, I can see that. At this point it can be seen that a lot of the claims aren't true. Although interesting, the article seemed to miss quite a few key points.

Originally posted by: dmens
A few nitpicks:

1. The author misinterpreted 4-issue. Intel nomenclature defines "issue" as the pipeline width from the frontend to the backend, whereas others define that as scheduler to execution, which intel refers to as "dispatch". Given that definition, the claim that the FSB will affect 4-wide issue is bunk, since that is instruction fetch, barely affected by bandwidth or even latency issues.
2. I fail to see how AMD64 should be considered "next generation" compared to merom, especially when no justification is given. Of course AMD has technologies that merom lack, but the exact same can be said vice versa.
3. The author makes a comparison between merom's FSB frequency and AMD64's on-die memory controller frequency, which is kind of meaningless.
4. Large caches are not the only way to design around lower memory bandwidth. Hard/soft prefetchers do wonders on many workloads... sometimes wiping out the entire latency differential. Naturally that exposes glass jaws, but as long as the common case is fast, the processor will do ok.
5. If anything, 64-bit registers will shrink text size, since the compiler will generate less text for code which manipulates 64/128 bit values. As for larger datasets, is that really true? Since the registers and memory are still accessible at smaller granularities, programmers will not suck up more memory for the sake of it. Although I agree that programs are hogging more memory, in which case, a larger cache will help just as much as anything else.
6. The article makes no mention of FB-DIMM, which conroe/woodcrest will support.
7. The author asserts that the FSB will limit the efficiency of merom's wider issue width. True, but only if merom's buffer structures are narrow in depth... but they're not (sorry I cannot give numbers). Combine that with a high speed FSB, smarter prefetcher, software assist, etc, I believe merom's glass jaws due to memory fetch bandwidth will not be anywhere as severe as the article says.

I agree that CSI will be an equalizer, but the article definitely puts an overemphasis on the FSB, which only makes a performance difference at the high end. As mentioned above, there are many ways to design around the low fetch bandwidth.

Well said.
 