Linux 2.6 and Hyper-Threading
Published on 2004-02-23 12:10:07 By: Jim_

The last time we got together, we took a brief look at the technology behind hyper-threading and at its performance in a variety of Windows benchmarks. At the end of that article I vowed to bring you the Linux side of the story at a later date. Well, friends, that time has arrived.

Since the last installment, my testbed has strengthened considerably with the addition of two 3.2GHz Xeons with 1MB of L3 cache and a 3.2E Pentium 4 'Prescott'. The latter features an upgraded implementation of hyper-threading, and Intel was kind enough to let us take a look at the impact it may have on Linux performance.

Anyone who's ever set out to perform Linux benchmarks quickly realizes the difficulties involved in such an undertaking, not only with the availability of quality benchmarks (or lack thereof), but also in the way the test system(s) are configured. Most of the Linux benchmarks that I see on hardware review sites are simple things like kernel compiles or povray... maybe a game benchmark or two. Those certainly have merit, but I wanted to try to do things differently for this article. I wanted to get more involved with server-oriented benchmarks to really see what hyper-threading brings to that market. Don't get me wrong, we're still going to take a look at compiling performance and even media-encoding performance, but those won't be the most interesting results you'll see here today.

One of the other problems that I see in articles featuring Linux benchmarks is a lack of detail when it comes to system configuration, compile options, versions of software and libraries, and so forth. Consider this article an act of full disclosure. I'm going to give you as much background information as humanly possible so you really get a sense of how my test systems were configured. Curious how I compiled Apache? I'm going to tell you. What version of GCC was I using? You'll know. How did I configure my kernel? You can peek at the .config file for yourself.

Prescott's Improved Hyper-Threading

While the general consensus on Prescott's performance in its current state (namely at its current clock speed) has not been glowing, we should take a moment to go over the architectural changes meant to improve hyper-threading. The simplest change Intel made to Prescott was increasing the size of the processor's L1 and L2 caches. The L1 data cache was bumped from 8KB to 16KB and the L2 from 512KB to 1024KB. As I discussed previously, when you have two logical processors competing for shared resources, more is better.

I don't consider myself to be a processor architecture guru, so whenever a new processor rolls along I generally refer to Ace's Hardware. They were kind enough to break down the other more subtle changes which should result in improved hyper-threading performance. From their Prescott review:

  • 64K address aliasing has been upgraded to 4M aliasing
  • Store Buffers have been increased from 24 to 32
  • Load Request Buffers have doubled from 4 to 8
  • Write Combining Buffers increased from 6 to 8
  • Floating point schedulers (x87/SSE/SSE2/SSE3) now have 4 more entries in the queue to find more parallelism
  • Additional WC Buffers. Instead of sending small pieces of data to the AGP video card, these pieces of data are stored together in buffers, and sent through in one big burst. This helps to preserve FSB bandwidth as the bandwidth of the FSB is more efficiently used (less overhead from one big burst than from many small ones, fewer bus turnarounds, etc.)

Additionally, two new instructions were added to Prescott: MONITOR and MWAIT. These should improve performance by increasing efficiency while also decreasing power consumption. Based on the Ace's Hardware write-up, it appears we'll have to wait for an OS patch before these improvements can be realized. From what I gather, the decreased power consumption will certainly be welcome.
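If you'd like to see whether your own chip advertises these features, here is a minimal Python sketch, not anything taken from my test setup. It assumes the Linux kernel's /proc/cpuinfo flag names: "ht" for the hyper-threading capability bit, "pni" for SSE3, and "monitor" for MONITOR/MWAIT. The function names are my own, purely for illustration. Keep in mind that the "ht" flag only says the processor reports the capability; it doesn't tell you whether hyper-threading is enabled in the BIOS or used by the kernel.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags listed for the first processor entry."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    # No "flags" line found (for example, empty or non-x86 input).
    return set()


def summarize(cpuinfo_text):
    """Report which of the features discussed here the CPU advertises."""
    flags = cpu_flags(cpuinfo_text)
    return {
        "ht_flag": "ht" in flags,            # hyper-threading capability bit
        "sse3": "pni" in flags,              # the kernel lists SSE3 as "pni"
        "monitor_mwait": "monitor" in flags  # MONITOR/MWAIT support
    }
```

On a Linux box you would feed it the real file, along the lines of `summarize(open("/proc/cpuinfo").read())`. Taking the text as an argument rather than opening the file inside the function also lets you try it against a saved copy from another machine.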

The first thing we have to do is break down the configuration of my test systems. We'll not only delve into the hardware used, but also peer into my OS and software configuration with a magnifying glass.

 