Barcelona 'openssl speed' benchmarks


bryanW1995

Lifer
May 22, 2007
11,144
32
91
Originally posted by: Viditor
Originally posted by: SniperDaws
id hate to get into an argument with you lot, Ewwwww proper geeks, i wouldnt have a leg to stand on, you'd blind me with science and win every time. all i know is the K10 isnt going to be the big deal everyone is expecting it to be.

And you know this because the Magic 8 Ball told you so?
We should stop all this arguing over which cpu company is "better" and just put viditor and sniperdaws in the ring and let them have at it. Last one who can run small fft's and 3dmark 06 for 30 min straight is the winner
 

bfdd

Lifer
Feb 3, 2007
13,312
1
0
Thing is, as soon as Barc and Phenom are big enough to deliver, if they even do, we should start hearing about and seeing Intel's next-gen Nehalem. I honestly think that if what AMD's bringing to the table barely competes with Intel's current lineup, how are they going to fare next time around? It's not like AMD is going to be tossing another new architecture out by next year to compete with Nehalem; maybe higher clocks, but even then I have a feeling they'll be left in the dust.
 

DrMrLordX

Lifer
Apr 27, 2000
22,590
12,476
136
Originally posted by: bfdd
Thing is, as soon as Barc and Phenom are big enough to deliver, if they even do, we should start hearing about and seeing Intel's next-gen Nehalem. I honestly think that if what AMD's bringing to the table barely competes with Intel's current lineup, how are they going to fare next time around? It's not like AMD is going to be tossing another new architecture out by next year to compete with Nehalem; maybe higher clocks, but even then I have a feeling they'll be left in the dust.

As the OP has pointed out, K10 already seems to do pretty well against Intel processors in OpenSSL. Furthermore, K10 processors should be strutting their stuff by November if what Kubicki says is correct. Any hype Intel starts over Nehalem will be just that until '08 at the earliest.

 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Don't know if this link will work but I will try it.

http://babelfish.altavista.com/babelfish/tr

Nah, it won't transfer. But you guys can use the translation tool to read the entire article.

Basically it reads like an AMD advertisement.

I chose Babelfish for good reason. For over a year that's all K10 has been. Just Babel!

We have already seen 2 other reviews of K10, albeit not on the best platform. Nevertheless, I think it would be easy to find a German site that would, to no one's surprise, skew review results. Now let's just wait for Phenom and cut the BS.

Then we will all know the true facts instead of all the AEG-type hype.
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
Originally posted by: zpdixon42
dmens wrote: uh, yeah, that'd explain why c2d is kicking ass in just about every other integer benchmark.

I have contributed some optimized assembly code to OpenSSL, I know what I am talking about. Its RSA implementation does NOT use the FPU at all. Heck look at its source code, this is an opensource project. The world is not black & white. It's not like Core has to win every integer benchmark and K10 has to win every floating point benchmark. There are dozens of architectural differences (see list in my previous post) which might advantage, in some cases, a processor you wouldn't expect to perform better than its competitor.

if you knew what you were talking about, you wouldn't have listed any of the items that you did, the real reason is something else.
 

Viditor

Diamond Member
Oct 25, 1999
3,290
0
0
Originally posted by: dmens
Originally posted by: Viditor
If you re-read Kris's blog, you'll see that the performance enhancement was actually a bug fix...

"These processors, manufactured after work-week 30 (WW30 for those who work in the corporate world) include errata fixes not present in the chips reviewed on September 10th"

The fixes resulted in a net 5%+ gain in performance...

sorry i find that almost impossible. a 5% performance deviation from modeling should have been found almost immediately and fixed well before qualification.

that is unless this 5% gain is a frequency fix, but that of course means the performance of the supposed "BA" stepping will be the same as the ones reviewed when clocked at the same frequency.

I'm not sure what you mean here dmens...

are you saying that the errata should have been fixed a long time ago?
If so, I agree...and I believe that was one of Kris' points in the article. However that doesn't mean that BA hasn't fixed it in time for shipping (no matter how last minute).

Are you saying that AMD shouldn't have sent the B1s to anyone for review?
Undisputably true...as I said, a VERY botched launch.

Are you saying that you can't imagine what kind of errata would cause a 5% performance loss (besides a freq errata)?

I can think of quite a few myself...and I'm certainly no expert.
For instance, the B0 stepping was supposedly only able to perform half of the 128bit loads/clock with anything over 1.3GHz...
The errata # for B1 was supposedly #281 (which you can see is no longer on the production errata sheet). Sorry but I can't document this as it was told to me in an e-mail, so use the proper amount of salt at your discretion...
Obviously they used a BIOS workaround for the B1 pre-production sample, and how efficient this work-around was is also unknown...but I have no problem believing that fixing the problem itself could yield a significant increase.

You should also note that B1 is not listed as a production chip either. The only 2 steppings listed are BA and B2 on the errata sheet.
 

zpdixon42

Junior Member
Sep 17, 2007
8
0
0
dmens wrote:
if you knew what you were talking about, you wouldn't have listed any of the items that you did, the real reason is something else.

Don't insult me please. The difference in L1 cache size seems, at least, a plausible explanation.

There are simple experiments that any programmer can reproduce to demonstrate the impact of L1 cache size and associativity. Write a loop that accesses the first few bytes of many cache lines, and tune it so that the accessed lines no longer fit in the Core microarchitecture's L1 cache but still fit in K10's larger one: it will perform 5x-10x faster on K10. Similarly, this loop could be modified to fit in the cache of both processors but exploit the higher 8-way associativity of Core to run much faster than on K10 (2-way cache). A cache line is 64 bytes (2**6) on Core and K10, so bits 0 to 5 of a memory address represent the offset in the cache line. The 64-kB 2-way set associative data cache on K10 holds 65536/64=1024 lines, which means there are 1024/2=512 sets (2**9), so bits 6 to 14 of a memory address represent the set number. The remaining bits (15 to 31) are the tag. Since the K10 cache can only store 2 cache lines with the same set number, accessing repeatedly in a loop, say, 3 different memory addresses having the same set number but a different tag (e.g. 0x200000, 0x400000, 0x600000) will cause a lot of cache misses on K10, and a much slower execution speed compared to Core (on the order of 5x-10x).

Now of course the 5x-10x speed differences I mention above are never observed in real-world applications; they represent the absolute worst case. Still, it demonstrates that simple microarchitectural characteristics that nobody pays attention to can be responsible for significant performance advantages or disadvantages.
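
The address arithmetic in the post above can be checked with a small model. The following is only a sketch under stated assumptions: it models a single L1 data cache with true LRU replacement (the real replacement policies are not given in the thread), it ignores the L2, any victim-cache behavior, and hardware prefetching, and it assumes a 32-kB 8-way L1 for Core. The function names `split_address` and `count_misses` are made up for this illustration. It counts misses, not execution time, so it says nothing about the 5x-10x figure itself.

```python
from collections import OrderedDict

LINE_BYTES = 64  # cache line size on both Core and K10, per the post


def split_address(addr, cache_bytes, ways, line_bytes=LINE_BYTES):
    """Return (tag, set_index, offset) for a set-associative cache."""
    num_sets = cache_bytes // (line_bytes * ways)
    offset = addr % line_bytes
    set_index = (addr // line_bytes) % num_sets
    tag = addr // (line_bytes * num_sets)
    return tag, set_index, offset


def count_misses(addresses, cache_bytes, ways, line_bytes=LINE_BYTES):
    """Count misses for an access trace, ASSUMING true LRU replacement."""
    sets = {}  # set_index -> OrderedDict of resident tags, oldest first
    misses = 0
    for addr in addresses:
        tag, set_index, _ = split_address(addr, cache_bytes, ways, line_bytes)
        resident = sets.setdefault(set_index, OrderedDict())
        if tag in resident:
            resident.move_to_end(tag)  # hit: mark as most recently used
        else:
            misses += 1
            if len(resident) == ways:
                resident.popitem(last=False)  # evict least recently used
            resident[tag] = True
    return misses


# The three conflicting addresses from the post, accessed round-robin.
trace = [0x200000, 0x400000, 0x600000] * 1000

# Cache geometries: 64 kB 2-way (K10, from the post); 32 kB 8-way (Core, assumed).
k10_misses = count_misses(trace, cache_bytes=64 * 1024, ways=2)
core_misses = count_misses(trace, cache_bytes=32 * 1024, ways=8)
print("K10-like 64 kB 2-way:", k10_misses, "misses out of", len(trace))
print("Core-like 32 kB 8-way:", core_misses, "misses out of", len(trace))
```

In this model all three addresses land in set 0 of both caches with different tags. Two ways cannot hold three lines that are touched round-robin under LRU, so every access misses in the 2-way case, while the 8-way case only takes the three initial cold misses.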
 

bryanW1995

Lifer
May 22, 2007
11,144
32
91
Originally posted by: dmens
Originally posted by: zpdixon42
dmens wrote: uh, yeah, that'd explain why c2d is kicking ass in just about every other integer benchmark.

I have contributed some optimized assembly code to OpenSSL, I know what I am talking about. Its RSA implementation does NOT use the FPU at all. Heck look at its source code, this is an opensource project. The world is not black & white. It's not like Core has to win every integer benchmark and K10 has to win every floating point benchmark. There are dozens of architectural differences (see list in my previous post) which might advantage, in some cases, a processor you wouldn't expect to perform better than its competitor.

if you knew what you were talking about, you wouldn't have listed any of the items that you did, the real reason is something else.
what's the real reason?

 

AlabamaCajun

Member
Mar 11, 2005
126
0
0
The problem with trying to force L1 cache is that the look-ahead architecture can load the data faster than the time it takes for the code that needs it to drop through the pipes. It might show if we optimize for Opteron, but it seems that Intel's large L2 keeps up even with the FSB bottleneck.

Personally I don't see why people think rooting for Intel is a sport like a favorite team. It's a monopolistic company for godsakes.
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
Originally posted by: zpdixon42
Don't insult me please. The difference in L1 cache size seems, at least, a plausible explanation.

There are simple experiments that any programmer can reproduce to demonstrate the impact of L1 cache size and associativity. Write a loop that accesses the first few bytes of many cache lines, and tune it so that the accessed lines no longer fit in the Core microarchitecture's L1 cache but still fit in K10's larger one: it will perform 5x-10x faster on K10. Similarly, this loop could be modified to fit in the cache of both processors but exploit the higher 8-way associativity of Core to run much faster than on K10 (2-way cache). A cache line is 64 bytes (2**6) on Core and K10, so bits 0 to 5 of a memory address represent the offset in the cache line. The 64-kB 2-way set associative data cache on K10 holds 65536/64=1024 lines, which means there are 1024/2=512 sets (2**9), so bits 6 to 14 of a memory address represent the set number. The remaining bits (15 to 31) are the tag. Since the K10 cache can only store 2 cache lines with the same set number, accessing repeatedly in a loop, say, 3 different memory addresses having the same set number but a different tag (e.g. 0x200000, 0x400000, 0x600000) will cause a lot of cache misses on K10, and a much slower execution speed compared to Core (on the order of 5x-10x).

Now of course the 5x-10x speed differences I mention above are never observed in real-world applications; they represent the absolute worst case. Still, it demonstrates that simple microarchitectural characteristics that nobody pays attention to can be responsible for significant performance advantages or disadvantages.

victim cache?

anyways, my point is that the benchmark you used is a known performance issue even on c2d, whereas the items you listed are generics, none of which are responsible for that much skew off the average. sorry if i sounded abrupt.
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
Originally posted by: Viditor
I'm not sure what you mean here dmens...

are you saying that the errata should have been fixed a long time ago?
If so, I agree...and I believe that was one of Kris' points in the article. However that doesn't mean that BA hasn't fixed it in time for shipping (no matter how last minute).

Are you saying that AMD shouldn't have sent the B1s to anyone for review?
Undisputably true...as I said, a VERY botched launch.

Are you saying that you can't imagine what kind of errata would cause a 5% performance loss (besides a freq errata)?

I can think of quite a few myself...and I'm certainly no expert.
For instance, the B0 stepping was supposedly only able to perform half of the 128bit loads/clock with anything over 1.3GHz...
The errata # for B1 was supposedly #281 (which you can see is no longer on the production errata sheet). Sorry but I can't document this as it was told to me in an e-mail, so use the proper amount of salt at your discretion...
Obviously they used a BIOS workaround for the B1 pre-production sample, and how efficient this work-around was is also unknown...but I have no problem believing that fixing the problem itself could yield a significant increase.

You should also note that B1 is not listed as a production chip either. The only 2 steppings listed are BA and B2 on the errata sheet.

to clarify, i doubt amd would allow a 5% performance drop to be seen by the public with promised fixes, it would've been fixed quickly very early in the qualification stage. imho, letting the public or even customers know of such issues is not good for customer confidence in products, and hence not good business.

 

savageseb

Junior Member
Aug 1, 2007
1
0
0
Originally posted by: AlabamaCajun
The problem with trying to force L1 cache is that the look-ahead architecture can load the data faster than the time it takes for the code that needs it to drop through the pipes. It might show if we optimize for Opteron, but it seems that Intel's large L2 keeps up even with the FSB bottleneck.

Personally I don't see why people think rooting for Intel is a sport like a favorite team. It's a monopolistic company for godsakes.

And i couldnt agree more, muahauhahaha

my first post at anandtech! wooooooo, lols
 

Viditor

Diamond Member
Oct 25, 1999
3,290
0
0
Originally posted by: dmens
Originally posted by: Viditor
I'm not sure what you mean here dmens...

are you saying that the errata should have been fixed a long time ago?
If so, I agree...and I believe that was one of Kris' points in the article. However that doesn't mean that BA hasn't fixed it in time for shipping (no matter how last minute).

Are you saying that AMD shouldn't have sent the B1s to anyone for review?
Undisputably true...as I said, a VERY botched launch.

Are you saying that you can't imagine what kind of errata would cause a 5% performance loss (besides a freq errata)?

I can think of quite a few myself...and I'm certainly no expert.
For instance, the B0 stepping was supposedly only able to perform half of the 128bit loads/clock with anything over 1.3GHz...
The errata # for B1 was supposedly #281 (which you can see is no longer on the production errata sheet). Sorry but I can't document this as it was told to me in an e-mail, so use the proper amount of salt at your discretion...
Obviously they used a BIOS workaround for the B1 pre-production sample, and how efficient this work-around was is also unknown...but I have no problem believing that fixing the problem itself could yield a significant increase.

You should also note that B1 is not listed as a production chip either. The only 2 steppings listed are BA and B2 on the errata sheet.

to clarify, i doubt amd would allow a 5% performance drop to be seen by the public with promised fixes, it would've been fixed quickly very early in the qualification stage. imho, letting the public or even customers know of such issues is not good for customer confidence in products, and hence not good business.

I agree that it was a very bad move on AMD's part...but it does seem evident that the stepping they shipped for review cannot be the production stepping (the errata sheet alone seems to bear that out, even if Kris, the developer, and the AMD engineer hadn't mentioned it).

I can only imagine the causes...
Remember that the review systems are organized by marketing, and H Richard had already announced his leaving...this alone would cause confusion in the dept.
It could also be that the only working review systems they had contained the BIOS for B1 (including the patch for the errata), and they were too slow off the mark getting a working mobo (especially the BIOS) ready for launch. This seems to me to be the most likely scenario as the first chip of the BA stepping wasn't turned out until WW30 (end of July).
 

Acanthus

Lifer
Aug 28, 2001
19,915
2
76
ostif.org
It looks like the fanbois finally found their one benchmark to parade around, yay.

In the meantime I'll sit on my hands and wait for the clockspeed ceiling to rise on Barc.
 

Viditor

Diamond Member
Oct 25, 1999
3,290
0
0
Originally posted by: Nemesis 1
Maybe this will help some of you guys out . I don't know but this is the best debate I have read on this subject.

http://aceshardware.freeforums.org/viewtopic.php?t=178

Thanks for that Nemesis...
There is some good stuff in there (at least until George Ou showed up...)

The poster "someoneelse" had some interesting info...
"BA is B1 with the northbridge fix. FWIW they found the erratum after B2 taped out"
 

Phynaz

Lifer
Mar 13, 2006
10,140
819
126
Originally posted by: Viditor
Originally posted by: Nemesis 1
Maybe this will help some of you guys out . I don't know but this is the best debate I have read on this subject.

http://aceshardware.freeforums.org/viewtopic.php?t=178

Thanks for that Nemesis...
There is some good stuff in there (at least until George Ou showed up...)

The poster "someoneelse" had some interesting info...
"BA is B1 with the northbridge fix. FWIW they found the erratum after B2 taped out"

I find it odd that you find posts by "someoneelse" more interesting than posts by George Ou. An anonymous person versus an accomplished technology journalist.

Of course George Ou doesn't support your pro-AMD position.

Oh well, I suppose you will now be spreading the rumor of a northbridge fix in some stepping or another.

 

Viditor

Diamond Member
Oct 25, 1999
3,290
0
0
Originally posted by: Phynaz
Originally posted by: Viditor
Originally posted by: Nemesis 1
Maybe this will help some of you guys out . I don't know but this is the best debate I have read on this subject.

http://aceshardware.freeforums.org/viewtopic.php?t=178

Thanks for that Nemesis...
There is some good stuff in there (at least until George Ou showed up...)

The poster "someoneelse" had some interesting info...
"BA is B1 with the northbridge fix. FWIW they found the erratum after B2 taped out"

I find it odd that you find posts by "someoneelse" more interesting than posts by George Ou. An anonymous person versus an accomplished technology journalist.

Of course George Ou doesn't support your pro-AMD position.

Oh well, I suppose you will now be spreading the rumor of a northbridge fix in some stepping or another.

LOL...And I find it predictable that you would consider George Ou "an accomplished technology journalist".

BTW, what the poster was saying is that the difference between B1 and BA was a Northbridge fix (whether this is true or not I have no idea, but it's interesting).
However, if you look at page 12 of the errata sheet, table 7 shows steppings DR-BA and DR-B2 as the production steppings as of Sept 07...stepping B1 (what was reviewed) is not even listed as it was an Engineering Sample only.
 

Keysplayr

Elite Member
Jan 16, 2003
21,211
50
91
Why doesn't everyone just post what they "want" to see in this thread? It would cut down a lot on the beating-around-the-bush posts. I mean, I can see why certain members' arguments are what they are. But what is the point? Seriously?
 

Viditor

Diamond Member
Oct 25, 1999
3,290
0
0
Originally posted by: keysplayr2003
Why doesn't everyone just post what they "want" to see in this thread? It would cut down a lot on the beating-around-the-bush posts. I mean, I can see why certain members' arguments are what they are. But what is the point? Seriously?

To answer your question from my own POV, I think it's important to keep asking for the most information we can get...
There are obviously quite a few holes in our knowledge of Barcelona at this point, and I feel that mentioning what they might be and discussing them can only help us as we pass along our requests to review sites like AT.
At the end of the day, it's only places like AT that are going to answer some of these questions...
 

Phynaz

Lifer
Mar 13, 2006
10,140
819
126
Originally posted by: keysplayr2003
Why doesn't everyone just post what they "want" to see in this thread? It would cut down a lot on the beating-around-the-bush posts. I mean, I can see why certain members' arguments are what they are. But what is the point? Seriously?


Hey, good idea! I think I'll start that thread.
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Originally posted by: Viditor
Originally posted by: keysplayr2003
Why doesn't everyone just post what they "want" to see in this thread? It would cut down a lot on the beating-around-the-bush posts. I mean, I can see why certain members' arguments are what they are. But what is the point? Seriously?

To answer your question from my own POV, I think it's important to keep asking for the most information we can get...
There are obviously quite a few holes in our knowledge of Barcelona at this point, and I feel that mentioning what they might be and discussing them can only help us as we pass along our requests to review sites like AT.
At the end of the day, it's only places like AT that are going to answer some of these questions...

Given that Barcelonas are available now, couldn't you just buy one and run the tests you're interested in? With targeted tests you can determine a lot of microarchitectural details (see microarchitecture.pdf).
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,035
15,984
136
$340 for a cpu and $270 for a motherboard, just to test? Oh, and then memory?
 

Phynaz

Lifer
Mar 13, 2006
10,140
819
126
Originally posted by: Viditor
Originally posted by: Phynaz
Originally posted by: Viditor
Originally posted by: Nemesis 1
Maybe this will help some of you guys out . I don't know but this is the best debate I have read on this subject.

http://aceshardware.freeforums.org/viewtopic.php?t=178

Thanks for that Nemesis...
There is some good stuff in there (at least until George Ou showed up...)

The poster "someoneelse" had some interesting info...
"BA is B1 with the northbridge fix. FWIW they found the erratum after B2 taped out"

I find it odd that you find posts by "someoneelse" more interesting than posts by George Ou. An anonymous person versus an accomplished technology journalist.

Of course George Ou doesn't support your pro-AMD position.

Oh well, I suppose you will now be spreading the rumor of a northbridge fix in some stepping or another.

LOL...And I find it predictable that you would consider George Ou "an accomplished technology journalist".

BTW, what the poster was saying is that the difference between B1 and BA was a Northbridge fix (whether this is true or not I have no idea, but it's interesting).
However, if you look at page 12 of the errata sheet, table 7 shows steppings DR-BA and DR-B2 as the production steppings as of Sept 07...stepping B1 (what was reviewed) is not even listed as it was an Engineering Sample only.

Your post implies that George Ou is not exactly what I say he is. Why do you say that?
Or is Theo Valich more your type of journalist?

 