Barcelona 'openssl speed' benchmarks

bryanW1995 · Sep 30, 2007

Originally posted by: Viditor

Originally posted by: SniperDaws
I'd hate to get into an argument with you lot. Ewwwww, proper geeks; I wouldn't have a leg to stand on, you'd blind me with science and win every time. All I know is the K10 isn't going to be the big deal everyone is expecting it to be.

And you know this because the Magic 8 Ball told you so?

We should stop all this arguing over which CPU company is "better" and just put Viditor and SniperDaws in the ring and let them have at it. The last one who can run small FFTs and 3DMark06 for 30 minutes straight is the winner.

bfdd · Sep 30, 2007

Thing is, as soon as Barc and Phenom are big enough to deliver, if they even do, we should start hearing about and seeing Intel's next-gen Nehalem. I honestly think that if what AMD's bringing to the table barely competes with Intel's current lineup, how are they going to fare next time around? It's not like AMD is going to be tossing another new architecture out by next year to compete with Nehalem; maybe higher clocks, but even then I have a feeling they'll be left in the dust.

DrMrLordX · Sep 30, 2007

Originally posted by: bfdd
Thing is, as soon as Barc and Phenom are big enough to deliver, if they even do, we should start hearing about and seeing Intel's next-gen Nehalem. I honestly think that if what AMD's bringing to the table barely competes with Intel's current lineup, how are they going to fare next time around? It's not like AMD is going to be tossing another new architecture out by next year to compete with Nehalem; maybe higher clocks, but even then I have a feeling they'll be left in the dust.

As the OP has pointed out, K10 already seems to do pretty well against Intel processors in OpenSSL. Furthermore, K10 processors should be strutting their stuff by November if what Kubicki says is correct. Any hype Intel starts over Nehalem will be just that until '08 at the earliest.

Nemesis 1 · Sep 30, 2007

Don't know if this link will work, but I will try it.

http://babelfish.altavista.com/babelfish/tr

Nah, it won't transfer. But you guys can use the translation tool to read the entire article.

Basically it reads like an AMD advertisement.

I choose Babelfish. For good reason. For over a year that's all K10 has been. Just Babel!

We have already seen 2 other reviews of K10, albeit not on the best platform. Nevertheless, I think it would be easy to find a German site that would, to no surprise, skew review results. Now let's just wait for Phenom and cut the BS.

Then we will all know the true facts instead of all the AEG type hype.

dmens · Sep 30, 2007

Originally posted by: zpdixon42

dmens wrote: uh, yeah, that'd explain why c2d is kicking ass in just about every other integer benchmark.

I have contributed some optimized assembly code to OpenSSL; I know what I am talking about. Its RSA implementation does NOT use the FPU at all. Heck, look at its source code, this is an open-source project. The world is not black & white. It's not like Core has to win every integer benchmark and K10 has to win every floating point benchmark. There are dozens of architectural differences (see list in my previous post) which might, in some cases, give an advantage to a processor you wouldn't expect to perform better than its competitor.

if you knew what you were talking about, you wouldn't have listed any of the items that you did, the real reason is something else.

Viditor · Sep 30, 2007

Originally posted by: dmens

Originally posted by: Viditor
If you re-read Kris's blog, you'll see that the performance enhancement was actually a bug fix...

"These processors, manufactured after work-week 30 (WW30 for those who work in the corporate world) include errata fixes not present in the chips reviewed on September 10th"

The fixes resulted in a net 5%+ gain in performance...

sorry i find that almost impossible. a 5% performance deviation from modeling should have been found almost immediately and fixed well before qualification.

that is unless this 5% gain is a frequency fix, but that of course means the performance of the supposed "BA" stepping will be the same as the ones reviewed when clocked at the same frequency.

I'm not sure what you mean here dmens...

are you saying that the errata should have been fixed a long time ago?
If so, I agree...and I believe that was one of Kris' points in the article. However that doesn't mean that BA hasn't fixed it in time for shipping (no matter how last minute).

Are you saying that AMD shouldn't have sent the B1s to anyone for review?
Indisputably true...as I said, a VERY botched launch.

Are you saying that you can't imagine what kind of errata would cause a 5% performance loss (besides a freq errata)?

I can think of quite a few myself...and I'm certainly no expert.
For instance, the B0 stepping was supposedly only able to perform half of the 128bit loads/clock with anything over 1.3GHz...
The errata # for B1 was supposedly #281 (which you can see is no longer on the production errata sheet). Sorry but I can't document this as it was told to me in an e-mail, so use the proper amount of salt at your discretion...
Obviously they used a BIOS workaround for the B1 pre-production sample, and how efficient this work-around was is also unknown...but I have no problem believing that fixing the problem itself could yield a significant increase.

You should also note that B1 is not listed as a production chip either. The only 2 steppings listed are BA and B2 on the errata sheet.

zpdixon42 · Oct 1, 2007

dmens wrote:
if you knew what you were talking about, you wouldn't have listed any of the items that you did, the real reason is something else.

Don't insult me please. The difference in L1 cache size seems, at least, a plausible explanation.

There are simple experiments that any programmer can reproduce to demonstrate the impact of the L1 cache size and associativity level. By writing a loop accessing the first few bytes of many cache lines and by tuning it so the number of accessed lines doesn't fit in the Core microarchitecture L1 cache (but still fits in the larger K10 one), it will perform 5x-10x faster on K10. Similarly this loop could be modified to fit in the cache of both processors but could exploit the higher 8-way associativity level of Core to run much faster than on K10 (2-way cache). A cache line is 64 bytes (2**6) on Core and K10, so bits 0 to 5 of a memory address represent the offset in the cache line. The 64-kB 2-way set associative data cache on K10 holds 65536/64=1024 lines, which means there are 1024/2=512 sets (2**9), so bits 6 to 14 of a memory address represent the set number. The remaining bits (15 to 31) are the tag number. Since the K10 cache can only store 2 cache lines with the same set number in the same set, then by accessing repetitively in a loop, say 3 different memory addresses having the same set number but a different tag (e.g. 0x200000, 0x400000, 0x600000) you will end up with a lot of cache misses on K10, and a much slower execution speed compared to Core (on the order of 5x-10x).
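Both experiments can be tried on paper before touching real hardware. Here is a minimal Python sketch that only simulates the L1 data cache. The assumptions are loud ones: 64-byte lines, 64 kB 2-way for K10, 32 kB 8-way for Core, strict LRU replacement in every set, and no prefetcher, victim cache, L2 or virtual-to-physical translation. The names `split_address`, `count_misses`, `K10_L1D` and `CORE_L1D` are made up for this illustration. It counts misses, not cycles, so it says nothing about the 5x-10x figure itself.

```python
from collections import OrderedDict

LINE_SIZE = 64  # bytes per cache line, as stated in the post


def split_address(addr, cache_bytes, ways, line_size=LINE_SIZE):
    """Return (tag, set_index, offset) for a set-associative cache."""
    num_sets = cache_bytes // (line_size * ways)
    offset = addr % line_size
    set_index = (addr // line_size) % num_sets
    tag = addr // (line_size * num_sets)
    return tag, set_index, offset


def count_misses(addresses, cache_bytes, ways, line_size=LINE_SIZE):
    """Count misses for an access trace, ASSUMING strict LRU per set."""
    sets = {}  # set_index -> OrderedDict of resident tags, oldest first
    misses = 0
    for addr in addresses:
        tag, set_index, _ = split_address(addr, cache_bytes, ways, line_size)
        lines = sets.setdefault(set_index, OrderedDict())
        if tag in lines:
            lines.move_to_end(tag)  # hit: now the most recently used line
        else:
            misses += 1
            if len(lines) >= ways:
                lines.popitem(last=False)  # evict the least recently used line
            lines[tag] = True
    return misses


# Assumed L1 data cache geometries (simplified models, not vendor data).
K10_L1D = dict(cache_bytes=64 * 1024, ways=2)   # 512 sets
CORE_L1D = dict(cache_bytes=32 * 1024, ways=8)  # 64 sets

# Experiment 1: three addresses with the same set number but different tags.
conflict_trace = [0x200000, 0x400000, 0x600000] * 1000

# Experiment 2: touch one byte in each line of a 48 kB buffer, ten passes.
capacity_trace = [i * LINE_SIZE for i in range(768)] * 10

print(split_address(0x200000, **K10_L1D))        # (64, 0, 0)
print(count_misses(conflict_trace, **K10_L1D))   # 3000: every access misses
print(count_misses(conflict_trace, **CORE_L1D))  # 3: only the cold misses
print(count_misses(capacity_trace, **K10_L1D))   # 768: only the cold misses
print(count_misses(capacity_trace, **CORE_L1D))  # 7680: every access misses
```

In this model the three conflicting addresses thrash the 2-way set while the 8-way set holds all of them, and the 48 kB buffer fits the 64 kB cache but not the 32 kB one. Real chips use other replacement policies and have prefetchers, so measured numbers will differ.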

Now of course the 5x-10x speed differences I mention above are never observed in real-world applications; they represent the absolute worst case. But it still demonstrates that simple microarchitectural characteristics that nobody pays attention to can be responsible for significant performance advantages or disadvantages.

bryanW1995 · Oct 1, 2007

Originally posted by: dmens

Originally posted by: zpdixon42

dmens wrote: uh, yeah, that'd explain why c2d is kicking ass in just about every other integer benchmark.

I have contributed some optimized assembly code to OpenSSL; I know what I am talking about. Its RSA implementation does NOT use the FPU at all. Heck, look at its source code, this is an open-source project. The world is not black & white. It's not like Core has to win every integer benchmark and K10 has to win every floating point benchmark. There are dozens of architectural differences (see list in my previous post) which might, in some cases, give an advantage to a processor you wouldn't expect to perform better than its competitor.

if you knew what you were talking about, you wouldn't have listed any of the items that you did, the real reason is something else.

what's the real reason?

AlabamaCajun · Oct 1, 2007

The problem with trying to force L1 cache is that the look ahead architecture can load the data faster than the time it takes for the code that needs it to drop through the pipes. It might show if we optimize for Opteron, but it seems that Intel's large L2 keeps up even with the FSB bottleneck.

Personally I don't see why people think rooting for Intel is a sport like a favorite team. It's a monopolistic company for godsakes.

dmens · Oct 1, 2007

Originally posted by: zpdixon42
Don't insult me please. The difference in L1 cache size seems, at least, a plausible explanation.

There are simple experiments that any programmer can reproduce to demonstrate the impact of the L1 cache size and associativity level. By writing a loop accessing the first few bytes of many cache lines and by tuning it so the number of accessed lines doesn't fit in the Core microarchitecture L1 cache (but still fits in the larger K10 one), it will perform 5x-10x faster on K10. Similarly this loop could be modified to fit in the cache of both processors but could exploit the higher 8-way associativity level of Core to run much faster than on K10 (2-way cache). A cache line is 64 bytes (2**6) on Core and K10, so bits 0 to 5 of a memory address represent the offset in the cache line. The 64-kB 2-way set associative data cache on K10 holds 65536/64=1024 lines, which means there are 1024/2=512 sets (2**9), so bits 6 to 14 of a memory address represent the set number. The remaining bits (15 to 31) are the tag number. Since the K10 cache can only store 2 cache lines with the same set number in the same set, then by accessing repetitively in a loop, say 3 different memory addresses having the same set number but a different tag (e.g. 0x200000, 0x400000, 0x600000) you will end up with a lot of cache misses on K10, and a much slower execution speed compared to Core (on the order of 5x-10x).

Now of course the 5x-10x speed differences I mention above are never observed in real-world applications; they represent the absolute worst case. But it still demonstrates that simple microarchitectural characteristics that nobody pays attention to can be responsible for significant performance advantages or disadvantages.

victim cache?

anyways, my point is that the benchmark you used is a known performance issue even on c2d, whereas the items you listed are generics, none of which are responsible for that much skew off the average. sorry if i sounded abrupt.

dmens · Oct 1, 2007

Originally posted by: Viditor
I'm not sure what you mean here dmens...

are you saying that the errata should have been fixed a long time ago?
If so, I agree...and I believe that was one of Kris' points in the article. However that doesn't mean that BA hasn't fixed it in time for shipping (no matter how last minute).

Are you saying that AMD shouldn't have sent the B1s to anyone for review?
Indisputably true...as I said, a VERY botched launch.

Are you saying that you can't imagine what kind of errata would cause a 5% performance loss (besides a freq errata)?

I can think of quite a few myself...and I'm certainly no expert.
For instance, the B0 stepping was supposedly only able to perform half of the 128bit loads/clock with anything over 1.3GHz...
The errata # for B1 was supposedly #281 (which you can see is no longer on the production errata sheet). Sorry but I can't document this as it was told to me in an e-mail, so use the proper amount of salt at your discretion...
Obviously they used a BIOS workaround for the B1 pre-production sample, and how efficient this work-around was is also unknown...but I have no problem believing that fixing the problem itself could yield a significant increase.

You should also note that B1 is not listed as a production chip either. The only 2 steppings listed are BA and B2 on the errata sheet.

to clarify, i doubt amd would allow a 5% performance drop to be seen by the public with promised fixes, it would've been fixed quickly very early in the qualification stage. imho, letting the public or even customers know of such issues is not good for customer confidence in products, and hence not good business.

savageseb · Oct 1, 2007

Originally posted by: AlabamaCajun
The problem with trying to force L1 cache is that the look ahead architecture can load the data faster than the time it takes for the code that needs it to drop through the pipes. It might show if we optimize for Opteron, but it seems that Intel's large L2 keeps up even with the FSB bottleneck.

Personally I don't see why people think rooting for Intel is a sport like a favorite team. It's a monopolistic company for godsakes.

And i couldnt agree more, muahauhahaha

my first post at anandtech! wooooooo, lols

Viditor · Oct 1, 2007

Originally posted by: dmens

Originally posted by: Viditor
I'm not sure what you mean here dmens...

are you saying that the errata should have been fixed a long time ago?
If so, I agree...and I believe that was one of Kris' points in the article. However that doesn't mean that BA hasn't fixed it in time for shipping (no matter how last minute).

Are you saying that AMD shouldn't have sent the B1s to anyone for review?
Indisputably true...as I said, a VERY botched launch.

Are you saying that you can't imagine what kind of errata would cause a 5% performance loss (besides a freq errata)?

I can think of quite a few myself...and I'm certainly no expert.
For instance, the B0 stepping was supposedly only able to perform half of the 128bit loads/clock with anything over 1.3GHz...
The errata # for B1 was supposedly #281 (which you can see is no longer on the production errata sheet). Sorry but I can't document this as it was told to me in an e-mail, so use the proper amount of salt at your discretion...
Obviously they used a BIOS workaround for the B1 pre-production sample, and how efficient this work-around was is also unknown...but I have no problem believing that fixing the problem itself could yield a significant increase.

You should also note that B1 is not listed as a production chip either. The only 2 steppings listed are BA and B2 on the errata sheet.

to clarify, i doubt amd would allow a 5% performance drop to be seen by the public with promised fixes, it would've been fixed quickly very early in the qualification stage. imho, letting the public or even customers know of such issues is not good for customer confidence in products, and hence not good business.

I agree that it was a very bad move on AMD's part...but it does seem evident that the stepping they shipped for review cannot be the production stepping (the errata sheet alone seems to bear that out, even if Kris, the developer, and the AMD engineer hadn't mentioned it).

I can only imagine the causes...
Remember that the review systems are organized by marketing, and H Richard had already announced his leaving...this alone would cause confusion in the dept.
It could also be that the only working review systems they had contained the BIOS for B1 (including the patch for the errata), and they were too slow off the mark getting a working mobo (especially the BIOS) ready for launch. This seems to me to be the most likely scenario as the first chip of the BA stepping wasn't turned out until WW30 (end of July).

Nemesis 1 · Oct 1, 2007

Maybe this will help some of you guys out. I don't know, but this is the best debate I have read on this subject.

http://aceshardware.freeforums.org/viewtopic.php?t=178

Acanthus · Oct 1, 2007

It looks like the fanbois finally found their one benchmark to parade around, yay.

In the meantime I'll sit on my hands and wait for the clockspeed ceiling to rise on Barc.

Viditor · Oct 1, 2007

Originally posted by: Nemesis 1
Maybe this will help some of you guys out. I don't know, but this is the best debate I have read on this subject.

http://aceshardware.freeforums.org/viewtopic.php?t=178

Thanks for that Nemesis...
There is some good stuff in there (at least until George Ou showed up...)

The poster "someoneelse" had some interesting info...
"BA is B1 with the northbridge fix. FWIW they found the erratum after B2 taped out"

Phynaz · Oct 1, 2007

Originally posted by: Viditor

Originally posted by: Nemesis 1
Maybe this will help some of you guys out. I don't know, but this is the best debate I have read on this subject.

http://aceshardware.freeforums.org/viewtopic.php?t=178

Thanks for that Nemesis...
There is some good stuff in there (at least until George Ou showed up...)

The poster "someoneelse" had some interesting info...
"BA is B1 with the northbridge fix. FWIW they found the erratum after B2 taped out"

I find it odd that you find posts by "someoneelse" more interesting than posts by George Ou. An anonymous person versus an accomplished technology journalist.

Of course George Ou doesn't support your pro-AMD position.

Oh well, I suppose you will now be spreading the rumor of a northbridge fix in some stepping or another.

Viditor · Oct 1, 2007

Originally posted by: Phynaz

Originally posted by: Viditor

Originally posted by: Nemesis 1
Maybe this will help some of you guys out. I don't know, but this is the best debate I have read on this subject.

http://aceshardware.freeforums.org/viewtopic.php?t=178

Thanks for that Nemesis...
There is some good stuff in there (at least until George Ou showed up...)

The poster "someoneelse" had some interesting info...
"BA is B1 with the northbridge fix. FWIW they found the erratum after B2 taped out"

I find it odd that you find posts by "someoneelse" more interesting than posts by George Ou. An anonymous person versus an accomplished technology journalist.

Of course George Ou doesn't support your pro-AMD position.

Oh well, I suppose you will now be spreading the rumor of a northbridge fix in some stepping or another.

LOL...And I find it predictable that you would consider George Ou "an accomplished technology journalist".

BTW, what the poster was saying is that the difference between B1 and BA was a Northbridge fix (whether this is true or not I have no idea, but it's interesting).
However, if you look at page 12 of the errata sheet, table 7 shows steppings DR-BA and DR-B2 as the production steppings as of Sept 07...stepping B1 (what was reviewed) is not even listed as it was an Engineering Sample only.

Keysplayr · Oct 1, 2007

Why doesn't everyone just post what they "want" to see in this thread? It would cut down a lot on the beating-around-the-bush posts. I mean, I can see why certain members' arguments are what they are. But what is the point? Seriously?

Viditor · Oct 1, 2007

Originally posted by: keysplayr2003
Why doesn't everyone just post what they "want" to see in this thread? It would cut down a lot on the beating-around-the-bush posts. I mean, I can see why certain members' arguments are what they are. But what is the point? Seriously?

To answer your question from my own POV, I think it's important to keep asking for the most information we can get...
There are obviously quite a few holes in our knowledge of Barcelona at this point, and I feel that mentioning what they might be and discussing them can only help us as we pass along our requests to review sites like AT.
At the end of the day, it's only places like AT that are going to answer some of these questions...

Phynaz · Oct 1, 2007

Originally posted by: keysplayr2003
Why doesn't everyone just post what they "want" to see in this thread? It would cut down a lot on the beating-around-the-bush posts. I mean, I can see why certain members' arguments are what they are. But what is the point? Seriously?

Hey, good idea! I think I'll start that thread.

CTho9305 · Oct 1, 2007

Originally posted by: Viditor

Originally posted by: keysplayr2003
Why doesn't everyone just post what they "want" to see in this thread? It would cut down a lot on the beating-around-the-bush posts. I mean, I can see why certain members' arguments are what they are. But what is the point? Seriously?

To answer your question from my own POV, I think it's important to keep asking for the most information we can get...
There are obviously quite a few holes in our knowledge of Barcelona at this point, and I feel that mentioning what they might be and discussing them can only help us as we pass along our requests to review sites like AT.
At the end of the day, it's only places like AT that are going to answer some of these questions...

Given that Barcelonas are available now, couldn't you just buy one and run the tests you're interested in? With targeted tests you can determine a lot of microarchitectural details (see microarchitecture.pdf).

Markfw · Oct 1, 2007

$340 for a CPU and $270 for a motherboard, just to test? Oh, and then memory?

Phynaz · Oct 1, 2007

Originally posted by: Viditor

Originally posted by: Phynaz

Originally posted by: Viditor

Originally posted by: Nemesis 1
Maybe this will help some of you guys out. I don't know, but this is the best debate I have read on this subject.

http://aceshardware.freeforums.org/viewtopic.php?t=178

Thanks for that Nemesis...
There is some good stuff in there (at least until George Ou showed up...)

The poster "someoneelse" had some interesting info...
"BA is B1 with the northbridge fix. FWIW they found the erratum after B2 taped out"

I find it odd that you find posts by "someoneelse" more interesting than posts by George Ou. An anonymous person versus an accomplished technology journalist.

Of course George Ou doesn't support your pro-AMD position.

Oh well, I suppose you will now be spreading the rumor of a northbridge fix in some stepping or another.

LOL...And I find it predictable that you would consider George Ou "an accomplished technology journalist".

BTW, what the poster was saying is that the difference between B1 and BA was a Northbridge fix (whether this is true or not I have no idea, but it's interesting).
However, if you look at page 12 of the errata sheet, table 7 shows steppings DR-BA and DR-B2 as the production steppings as of Sept 07...stepping B1 (what was reviewed) is not even listed as it was an Engineering Sample only.

Your post implies that George Ou is not exactly what I say he is. Why do you say that?
Or is Theo Valich more your type of journalist?

bradley · Oct 1, 2007

George Ou? The same guy who writes for ZDNet, and defends the dumbed-down concepts behind monopolies and oligopolies with great fervor and zeal? I didn't realize he was so respected, or that many visited ZDNet for technology information.

http://www.amdzone.com/index.p...4c6f4a72f61f0169525e20

http://www.amdzone.com/modules...&file=article&sid=8018
