Intel Bensley Platform Preview
Posted on: 11/07/2005 06:00 AM

SiSoft Sandra 2005 (cont'd)

I have been a huge fan of Ace's precompiled Linpack binary for a long time. Recently, though, we've started running into the limits of the benchmark, as it only supports datasets up to 2MB. Since a lot of the modern processors we look at around here have caches larger than 2MB, Linpack doesn't let us see how bandwidth scales all the way out to main memory. I think SiSoft saw an opening here, and while their Cache/Memory benchmark isn't Linpack, it does what we need it to do.
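For anyone wondering what a dataset-size sweep looks like in principle, here is a minimal Python sketch. It is not how Sandra or Linpack measure anything; the function names and the 64MB-per-size default are assumptions made up for illustration, and interpreter overhead will blur the results at small sizes.

```python
import time

def copy_bandwidth_mb_s(size_bytes, min_bytes_moved=64 * 1024 * 1024):
    """Estimate copy bandwidth for one dataset size by timing repeated buffer copies."""
    src = bytearray(size_bytes)
    # Repeat small buffers many times so every size moves a similar total amount of data.
    repeats = max(1, min_bytes_moved // size_bytes)
    start = time.perf_counter()
    for _ in range(repeats):
        dst = bytes(src)  # one full pass over the buffer (read plus write)
    elapsed = time.perf_counter() - start
    return (size_bytes * repeats) / elapsed / 1e6

def sweep(sizes, min_bytes_moved=64 * 1024 * 1024):
    """Return {dataset size in bytes: estimated MB/s} for each size in the sweep."""
    return {size: copy_bandwidth_mb_s(size, min_bytes_moved) for size in sizes}
```

Calling `sweep([4 * 1024, 256 * 1024, 16 * 1024 * 1024])` walks from a cache-sized buffer toward one that should spill into main memory, which is the same shape of experiment the graph plots.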

This graph is interesting to look at for a few reasons. The first, most obvious reason is to see how the bandwidth falls as the datasets move through the caches and out into main memory. The second, more interesting thing to look at is the actual memory bandwidth. I wish there were a better way for me to scale the graph, because the differences in main memory bandwidth don't look quite as dramatic as they are. Way out there in the 1GB dataset range, the Nocona Xeons are chugging along at 1850MB/s, the Dempsey Xeons at 4430MB/s, and the Opterons are still managing 5485MB/s!
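Since the graph's scale hides how big those gaps are, a few lines of Python make the ratios explicit. The only inputs are the three 1GB-dataset figures quoted above; the variable and function names are just for illustration.

```python
# Main memory bandwidth at the 1GB dataset size, in MB/s, as quoted in the text.
bandwidth_mb_s = {
    "Nocona Xeon": 1850,
    "Dempsey Xeon": 4430,
    "Opteron": 5485,
}

def ratio(faster, slower):
    """How many times more bandwidth `faster` delivers than `slower`."""
    return bandwidth_mb_s[faster] / bandwidth_mb_s[slower]

print(f"Dempsey vs Nocona:  {ratio('Dempsey Xeon', 'Nocona Xeon'):.2f}x")  # 2.39x
print(f"Opteron vs Dempsey: {ratio('Opteron', 'Dempsey Xeon'):.2f}x")      # 1.24x
print(f"Opteron vs Nocona:  {ratio('Opteron', 'Nocona Xeon'):.2f}x")       # 2.96x
```

In other words, Dempsey on Blackford moves roughly 2.4 times the data Nocona does, and the Opterons still hold about a 24% lead over Dempsey.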

Obviously Intel has done some major work to the memory controller on the Blackford chipset, but it really looks like nothing can beat the Opterons' on-die controller.

Black & Scholes Kernel
In 1973, Black and Scholes developed a model for estimating the value of a stock option, which has been refined over the years to remove several assumptions, thus making techniques based on the model very accurate. Today, financial analysts rely on algorithms based on the Black and Scholes technique to determine the price of a stock option.

This benchmark consists of a kernel that implements a derivative of the Black and Scholes technique. The code was developed at SunGard, and utilizes a continued fraction technique, which is more accurate than the more traditional polynomial approximation technique.
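To give a feel for the math involved, here is a minimal sketch of the textbook Black and Scholes price for a European call option with no dividends. This is not the SunGard kernel: it evaluates the normal distribution with Python's built-in error function rather than a continued fraction, and the function and parameter names are assumptions chosen for readability.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution, computed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes_call(spot, strike, rate, volatility, years):
    """Textbook Black-Scholes price of a European call (no dividends).

    spot: current stock price; strike: exercise price;
    rate: annual risk-free rate; volatility: annual volatility;
    years: time to expiry in years.
    """
    sqrt_t = math.sqrt(years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * volatility ** 2) * years) / (volatility * sqrt_t)
    d2 = d1 - volatility * sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * years) * norm_cdf(d2)
```

For example, `black_scholes_call(100, 100, 0.05, 0.2, 1.0)` comes out at about 10.45. The accuracy of the whole thing hinges on how the normal distribution is approximated, which is where the choice between a continued fraction and a polynomial matters.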

The workload for this benchmark comes in the form of loop iterations internal to the code. The number of steps used in calculating the option price is set to 1e8 (100,000,000) by default. This value can be changed via a command-line parameter. The number of threads to use can also be specified as a command-line parameter.
The Black Scholes kernel benchmark is one that Jim and I were turned on to by the guys at Intel. It makes such a good benchmark because it is a real-world application and it is completely scalable. Whether you have two processors or thirty-two, it doesn't matter: just specify the number of threads and the number of steps and have at it. We did have a few problems initially, as the application Intel provided would not run on our 64-bit operating systems at all. Luckily, the good folks at Intel also provided us with the source code, and with a little help from our IRC/forum personality "AssKoala", we now have our very own 64-bit native binary (optimized and compiled in Visual Studio 2005 with the default Microsoft compiler).
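The "pick a step count, pick a thread count" idea can be sketched roughly as follows. This is a toy stand-in, not the benchmark's source: the `--steps` and `--threads` flag names, the default of two threads, and the placeholder math inside `partial_sum` are all assumptions. Only the 1e8 default step count comes from the description above. Note also that CPython threads won't actually speed up CPU-bound work like this, so the sketch shows how the work is divided rather than the scaling itself.

```python
import argparse
import math
from concurrent.futures import ThreadPoolExecutor

def split_range(total_steps, num_threads):
    """Split range(total_steps) into num_threads contiguous (start, stop) chunks."""
    base, extra = divmod(total_steps, num_threads)
    chunks, start = [], 0
    for i in range(num_threads):
        stop = start + base + (1 if i < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks

def partial_sum(chunk):
    """Placeholder workload: some floating-point math per step (not option pricing)."""
    start, stop = chunk
    return sum(math.exp(-0.5 * (i % 7) ** 2) for i in range(start, stop))

def run(total_steps, num_threads):
    """Hand each thread its own chunk of steps and combine the partial results."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return sum(pool.map(partial_sum, split_range(total_steps, num_threads)))

def parse_args(argv=None):
    """Hypothetical command line: both flag names are made up for this sketch."""
    parser = argparse.ArgumentParser(description="Toy step/thread workload")
    parser.add_argument("--steps", type=int, default=100_000_000)
    parser.add_argument("--threads", type=int, default=2)
    return parser.parse_args(argv)
```

Because every step is independent, the chunks never need to talk to each other until the final sum, which is why a workload shaped like this scales so cleanly with thread count in a natively threaded binary.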

You may be looking at the graph above and thinking that the comparison between the processors used isn't even fair. But if you look closer, you begin to see a different picture. As I mentioned, this application is extremely scalable. The more threads you add, the faster it is. Taking that into account, we can assume almost a 100% speedup moving from two threads to four, and if you remember that the Opteron 246s run at an actual 2.0GHz clock speed... Well, I'll let you figure out the rest.
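If you'd rather let a script do the figuring, the back-of-the-envelope projection looks something like this. The 100-second run time and the 3.0GHz comparison clock are made-up placeholders, not measured results, and scaling run time inversely with clock speed is a simplifying assumption. Only the 2.0GHz Opteron 246 clock and the near-100% speedup come from the text.

```python
def project_time(seconds, from_threads, to_threads, efficiency=1.0):
    """Project run time at a new thread count.

    efficiency=1.0 means perfect scaling (doubling the threads halves the time);
    lower values model a speedup that falls a little short of that.
    """
    speedup = 1.0 + efficiency * (to_threads / from_threads - 1.0)
    return seconds / speedup

def scale_for_clock(seconds, from_ghz, to_ghz):
    """Rescale a run time to another clock, assuming time is inversely proportional to frequency."""
    return seconds * from_ghz / to_ghz

HYPOTHETICAL_TWO_THREAD_SECONDS = 100.0  # placeholder, not a measured result
OPTERON_246_GHZ = 2.0                    # clock speed stated in the article
HYPOTHETICAL_TARGET_GHZ = 3.0            # placeholder comparison clock

four_threads = project_time(HYPOTHETICAL_TWO_THREAD_SECONDS, 2, 4)
clock_adjusted = scale_for_clock(four_threads, OPTERON_246_GHZ, HYPOTHETICAL_TARGET_GHZ)
print(f"Projected at four threads: {four_threads:.1f}s")
print(f"Projected at four threads and the target clock: {clock_adjusted:.1f}s")
```

Plug in the two-thread result from the graph and whatever clock speed you want to compare against, and drop `efficiency` slightly below 1.0 if you want to be conservative about the scaling.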
