Linux 2.6 and Hyper-Threading - Kernel Compiles and MP3 Encoding
Published on 2004-02-23 12:10:07 By: Jim_

We'll get the boring kernel compile graph out of the way first. We'll break down the compiling performance on a processor-by-processor basis afterwards.

Starting with the 3.2GHz Prescott and its improved Hyper-Threading (HT), a simple 'make' finishes the compile of our 2.6.2 kernel in 327.06 seconds. When we enable HT and specify two processes with 'make -j 2', we see a decrease in compile time of ~23% (~75 seconds).
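For anyone who wants to check the math, here is a minimal sketch of how a percentage like that is worked out. The function name is my own, and the HT-enabled time is only implied by the rounded ~75 second saving, so treat it as an estimate rather than a measured number.

```python
def percent_decrease(baseline_s, new_s):
    """Return how much shorter new_s is than baseline_s, as a percentage."""
    return (baseline_s - new_s) / baseline_s * 100.0


baseline = 327.06            # plain 'make' on the Prescott, from the text
saved = 75.0                 # approximate saving quoted in the text
with_ht = baseline - saved   # implied 'make -j 2' time (estimate, not measured)

print(f"{percent_decrease(baseline, with_ht):.1f}% decrease in compile time")
```

With those inputs the result lands at roughly 23%, which matches the figure quoted above.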

The Northwood-based 3.2GHz P4 performs better overall in our compile tests than the Prescott, but enabling HT and issuing a 'make -j 2' only shows an ~18.7% decrease in compile time (~56 seconds).

The Xeon results are a little more interesting. Look at what happens when we enable HT and run with 'make -j 2'. It actually takes longer to compile the code with HT enabled! Since issuing a 'make -j 4' with HT enabled seemingly fixes the problem, I'll assume the previous results are due to some scheduling confusion: I told make to run two simultaneous processes, but we had 4 logical processors awaiting work. The Xeons are the big winners in this test. Going from HT disabled and a simple 'make' to HT enabled with 'make -j 4', we got a ~57% decrease in overall compile time. That's impressive.
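A percentage decrease in time can be easier to judge as a speedup factor. The sketch below converts the rounded percentages quoted on this page; the helper name is hypothetical, and because the inputs are rounded the outputs are approximate.

```python
def speedup_from_decrease(pct_decrease):
    """Convert a percentage decrease in run time into a speedup factor."""
    return 1.0 / (1.0 - pct_decrease / 100.0)


# Rounded percentages taken from the text above.
results = [
    ("Prescott, HT + make -j 2", 23.0),
    ("Northwood, HT + make -j 2", 18.7),
    ("Dual Xeon, HT + make -j 4", 57.0),
]

for label, pct in results:
    print(f"{label}: {speedup_from_decrease(pct):.2f}x")
```

Keep in mind that the Xeon baseline was a single 'make' job on a two-processor machine, so part of that factor presumably comes from putting the second physical CPU to work and not from HT alone.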

You'll notice that I didn't add 'make -j 4' results for the two uniprocessor test machines. The reason behind this is simple enough; the numbers were literally identical to the 'make -j 2' results. This also holds true when going beyond 'make -j 4' on the dual Xeon machine. I tested all the way up to 16 simultaneous processes and after 'make -j 4', I didn't see any improvements.
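If you want to repeat this kind of sweep yourself, something like the following would do it. This is a rough sketch and not the script used for these numbers: the helper names are made up, it assumes it is pointed at a configured kernel tree, and it assumes you run 'make clean' between timings so each run starts from scratch.

```python
import os
import subprocess
import time


def make_command(jobs=None):
    """Build the argument list for plain 'make' or 'make -j N'."""
    return ["make"] if jobs is None else ["make", "-j", str(jobs)]


def time_command(cmd, cwd=None):
    """Run cmd to completion and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, cwd=cwd, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def default_jobs():
    """Number of logical processors the OS reports (HT siblings included)."""
    return os.cpu_count() or 1


# Example sweep (assumes KERNEL_DIR is a configured source tree):
# for jobs in (None, 2, 4):
#     subprocess.run(["make", "clean"], cwd=KERNEL_DIR, check=True)
#     print(jobs, time_command(make_command(jobs), cwd=KERNEL_DIR))
```

Matching the job count to the number of logical processors is consistent with what I saw here: two jobs was the ceiling on the uniprocessor boxes with HT, and four was the ceiling on the dual Xeon.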

As we move along to our next benchmark, BladeEnc MP3 encoding, I should provide some background information. If I'm going to do audio encoding at home, I generally use LAME (as do most of you, probably), but since it's not multi-threaded in the least, I went in search of a threaded encoder that could possibly show us some interesting results in our analysis of HT. A gentleman in our IRC channel (#2cpu on irc.freenode.net) pointed me to BladeEnc. The author decided to parallelize the application using the Message Passing Interface (MPI). The original intent was to split the process of MP3 encoding across several machines, but he does have this to say about its use on multiple-processor platforms:

"So does this scheme work with SMPs?

Absolutely. One of the nice things about MPI is that it intentionally doesn't distinguish between whether ranks are on the same physical machine or not. When you use MPI to send a message, you just rely on the MPI implementation to do the fastest thing to send the message to the destination rank (regardless of whether the source and destination ranks are on the same machine or not)."

Since it sounded relatively cool, I downloaded and configured LAM/MPI and compiled BladeEnc. To ensure BladeEnc would be built with the MPI compiler wrapper (mpicc) and not the default C compiler, I had to use a simple export command: "export CC=mpicc". Let's have a look at my results.
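The reason the export matters is that the build runs as child processes, and those only see variables that are in their environment. Here is a small sketch of that behavior. It doesn't invoke mpicc or BladeEnc's build at all; it just starts a child Python process with CC set and reports what the child sees. The function name is hypothetical.

```python
import os
import subprocess
import sys


def child_sees_cc(cc="mpicc"):
    """Start a child process with CC set and return the value it observes."""
    env = dict(os.environ, CC=cc)  # copy of our environment plus CC
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ.get('CC', ''))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


print(child_sees_cc())
```

A build script launched with the same environment would pick up CC the same way, provided it honors that variable.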

All in all, we see some moderate improvements on the Xeons with HT enabled and a slight decrease in encode time on the uniprocessor machines. Nothing earth-shattering, but we'll take the 12% decrease in encode time on the Xeons.
