Exploring Hyper-Threading Performance - Conclusion
Published on 2004-01-07 13:44:04 By: Jim_

The first issue I want to address in my conclusion doesn't actually pertain to this article. I want us all to slide back in time for a few seconds, back to our first look at hyper-threading performance. HT performance looked rosy until we got to our DivX benchmark, where we unveiled the shock that HT actually hampered performance. At the time, I really couldn't get a sense of why that would be the case. Well, now I'm ready to take a stab at it. Here goes:

The DivX benchmark, and for the most part all video encoding, is all about accessing system memory as fast as possible. The data sets are so large that they can't always be held in the processor's cache, so the processor is constantly hitting the bus to fetch data from main memory. The processors used in that article were 2.0 and 2.4GHz Xeons. All four logical processors in those test machines were sharing a plodding 400MHz bus. There probably just wasn't enough bandwidth available to meet the needs of four logical processors all wanting access to the bus at the same time.

I can't stress enough how important I think this information is when you try to understand hyper-threading performance and how we're going to see it scale in the future. Let's now jump back to the present and discuss the DVD encoding benchmarks we showed you on the previous page. This time around, there wasn't a performance penalty on the Xeons, but these Xeons were running on a 533MHz bus. There was more bandwidth available for the logical processors to share, and the system was less apt to have its bus and cache bandwidth completely exhausted. I say cache bandwidth because as each of the logical processors is trying to fetch data from main memory, that data has to be stored somewhere, right? We already know that cache space is shared amongst the logical processors, so you can imagine the sort of thrashing that would be going on in such a scenario.
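To make the thrashing idea a little more concrete, here is a toy sketch in Python. It is not a model of the Xeon's actual cache: it assumes a fully associative cache with strict LRU replacement, and the capacity and working-set sizes are made-up numbers chosen so that one thread's data fits but two threads' data does not. The function names are hypothetical as well.

```python
from collections import OrderedDict

def hit_rate(accesses, capacity):
    """Replay cache-line addresses through a toy LRU cache and return the hit fraction."""
    cache = OrderedDict()
    hits = 0
    for addr in accesses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used line
            cache[addr] = True
    return hits / len(accesses)

def looping_stream(thread_id, working_set, passes):
    """One thread walking its own working set from start to end, several times over."""
    return [(thread_id, line) for _ in range(passes) for line in range(working_set)]

def interleave(a, b):
    """Alternate accesses from two threads, as if two logical processors shared the cache."""
    return [addr for pair in zip(a, b) for addr in pair]

# Hypothetical sizes, in cache lines: one working set fits, two together do not.
CAPACITY = 512
WORKING_SET = 384
PASSES = 10

solo = looping_stream(0, WORKING_SET, PASSES)
shared = interleave(solo, looping_stream(1, WORKING_SET, PASSES))

solo_rate = hit_rate(solo, CAPACITY)
shared_rate = hit_rate(shared, CAPACITY)

print(f"one thread alone:       {solo_rate:.0%} hits")
print(f"two threads interleaved: {shared_rate:.0%} hits")
```

Under these assumptions the lone thread misses only on its first pass, while the two interleaved threads keep evicting each other's lines and never get a hit, so every access goes out over the bus. Real caches are set-associative and real access patterns are messier, so treat this purely as an illustration of the mechanism.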

This being 2CPU.com, I'm sure a lot of people were curious why I included a Pentium 4 processor in my testing. The reason was quite simple: it runs on an 800MHz FSB. Keeping the whole concept of future scaling in mind, it made sense for me to include its hyper-threading performance. We really don't have to look any further than the DVD encoding benchmark to get a sense of how its bus speed impacts hyper-threading. We saw a whopping 30% decrease in encoding time with HT enabled on the 3.2GHz P4C. We were using an application that is certainly multi-threaded in TMPGEnc, so each logical processor had plenty of work to do and they both had plenty of bandwidth available to share.
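For a rough sense of scale, the sketch below works out the theoretical peak bandwidth of each bus mentioned so far and divides it evenly among the logical processors sharing it. The assumptions are mine: a 64-bit (8-byte) data bus, the quoted MHz figure treated as the effective transfer rate, four logical processors on both Xeon configurations (the 533MHz system's CPU count is assumed to match the earlier dual-Xeon machines), and a perfectly even split that no real workload would produce. These are back-of-the-envelope peak numbers, not measurements from our tests.

```python
BUS_WIDTH_BYTES = 8  # assumption: 64-bit front-side data bus

def peak_bandwidth_gb_s(effective_mhz):
    """Theoretical peak bus bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return effective_mhz * 1e6 * BUS_WIDTH_BYTES / 1e9

def per_logical_share(effective_mhz, logical_cpus):
    """Peak bandwidth per logical processor, assuming a perfectly even split."""
    return peak_bandwidth_gb_s(effective_mhz) / logical_cpus

# (label, effective bus MHz, logical processors sharing the bus)
systems = [
    ("Dual Xeon, 400MHz bus", 400, 4),
    ("Dual Xeon, 533MHz bus", 533, 4),  # CPU count assumed, see lead-in
    ("Pentium 4, 800MHz bus", 800, 2),
]

for label, mhz, cpus in systems:
    total = peak_bandwidth_gb_s(mhz)
    share = per_logical_share(mhz, cpus)
    print(f"{label}: {total:.2f} GB/s peak, {share:.2f} GB/s per logical CPU")
```

With these assumptions, each logical processor on the 400MHz Xeon system gets about 0.8 GB/s of peak bandwidth, versus roughly 3.2 GB/s on the Pentium 4, a fourfold difference that lines up with the direction of the results above.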

Since Intel hasn't released Xeons on an 800MHz FSB to this point, we'll have to wait and see what sort of impact it'll have on hyper-threading. I think it's rather clear from our testing and observations, though, that it will at the very least decrease the possibility of bus contention and probably help boost performance in applications where large data sets are being used.

What we do have available to us at this point are 3.2GHz Xeons with 1MB of L3 cache. Hooz recently acquired a pair of these from Intel and he's been playing around with them a little bit. Soon enough we'll be able to publish these same benchmarks on that test system. While they still run on a 533MHz FSB, the added cache should decrease the number of fetches the logical processors have to make to system memory and thus, in a roundabout way, decrease the possibility of resource contention (bus and cache).

Intel does have the Pentium 4 Extreme Edition available, which also sports L3 cache (2MB of it), and I've mentioned to my new best friend George Alfs over at Intel that I'd love to have an opportunity to test these theories with one of those processors.

One of the cool things that I've discovered from all the research and testing required to patch this article together is that hyper-threading performance will continue to improve in the future, not only through technological innovation in hyper-threading itself but also through improvements to other areas of processor design. As bus speeds increase and more cache becomes available on die, hyper-threading is going to be more and more efficient. It appears to be somewhat of a symbiotic engineering relationship.

In Part II of this series, I'm going to be taking a look at hyper-threading under Linux 2.6. I hope to not only test PHP/Apache/MySQL performance but also look into how hyper-threading affects GCC compilation performance, and I may even dabble with POV-Ray. You'll want to check back for that one.

Thanks for your time. I hope this article has been interesting and insightful to you all. It certainly has been an educational experience for me.

Jim Kirk

Related Articles

http://www.intel.com/technology/hyperthread/
http://www.intel.com/business/bss/products/hyperthreading/server/index.htm
http://arstechnica.com/paedia/h/hyperthreading/hyperthreading-1.html
