Exploring Hyper-Threading Performance - Conclusion

The first issue I want to address in my conclusion actually doesn't even
pertain to this article. I want us all to slide back in time for a
few seconds, back to our
first look at hyper-threading performance. HT performance looked
rosy until we got to our DivX
benchmark, where we unveiled the shock that HT actually hampered
performance. At the time, I really couldn't get a sense of why that
would be the case. Well now, I'm ready to take a stab at it.
Here goes:

The DivX benchmark, and for the most part all video encoding, is all about
accessing system memory as fast as possible. The data sets are so
large that they can't always be stored in the processor's cache,
so it's constantly hitting the bus to fetch data from main memory.
The processors being used in that article were 2.0 and 2.4 GHz Xeons.
All four logical processors in those test machines were sharing one
plodding 400MHz bus. There probably just was not enough bandwidth
available to meet the needs of four logical processors all wanting
access to the bus at the same time.

I can't stress enough how important I think this information actually is when
you try to understand hyper-threading performance and how we're going
to see it scale in the future. Let's now jump back to the present
and discuss the DVD encoding benchmarks we showed you on the previous
page. This time around, there wasn't a performance penalty on the
Xeons, but these Xeons were running on a 533MHz bus.

There was more bandwidth available for the logical processors to share and
the system was less apt to have its bus and cache bandwidth completely
exhausted. I say cache bandwidth because as each of the logical processors
fetches data from main memory, that data has to be stored somewhere,
right? We already know that cache space is shared amongst the logical
processors so you can imagine the sort of thrashing that would be
going on in such a scenario.

This being 2CPU.com, I'm sure a lot of
people were curious why I included a Pentium 4 processor in my testing.
The reason was quite simple: it runs on an 800MHz FSB. Keeping the
whole concept of future scaling in mind, it made sense for me to include
its hyper-threading performance. We really don't have to look any
further than the DVD encoding benchmark to get a sense of how its
bus speed impacts hyper-threading. We saw a whopping 30% decrease
in encoding time with HT enabled on the 3.2GHz P4C. We were using
an application that is certainly multi-threaded in TMPGEnc, so each
logical processor had plenty of work to do and they both had plenty
of bandwidth available to share.

Since Intel hasn't released Xeons on an 800MHz FSB to this point, we'll
have to wait and see what sort of impact it'll have on hyper-threading
but I think it's rather clear from our testing and observations that
it will at the very least decrease the possibility of bus contention
and probably help boost performance in applications where large data
sets are being used.

What we do have available to us at this point are 3.2GHz Xeons with
1MB of L3 cache. Hooz recently acquired a pair of these from Intel
and he's been playing around with them a little bit. Soon enough we'll
be able to publish these same benchmarks on that test system. While
they still run on a 533MHz FSB, the added cache should decrease the
number of fetches the logical processors have to make to system memory
and thus decrease the possibility of resource contention (bus and
cache) in a roundabout way.

Intel does have the Pentium 4 Extreme Edition available that also sports
2MB of L3 cache, and I've mentioned to my new best friend George Alfs
over at Intel that I'd love to have an opportunity to test these theories
with one of those processors.

One of the cool things that I've discovered from all the research and
testing required to patch this article together is that hyper-threading
performance will continue to improve in the future not only through
technological innovation in hyper-threading itself but also through improvements
to other areas of processor design. As bus speeds increase, and more
cache becomes available on die, hyper-threading is going to be more
and more efficient. It appears to be something of a symbiotic engineering
relationship.

In Part II of this series, I'm going to be taking a look at hyper-threading
under Linux 2.6. I hope to not only test PHP/Apache/MySQL performance
but also look into how hyper-threading affects GCC compilation performance
and I may even dabble with POV-Ray. You'll want to check back for that
one.

Thanks for your time. I hope this article has been interesting and insightful
to you all. It certainly has been an educational experience for me.

Related Articles
http://www.intel.com/technology/hyperthread/
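
To put rough numbers on the bus-sharing theory from this conclusion, here is a
back-of-the-envelope sketch in Python. It is only an illustration under stated
assumptions, not a measurement: it assumes a 64-bit (8-byte) front-side bus,
treats the quoted "MHz" figures as millions of transfers per second, assumes
the dual Xeon systems put four logical processors on one bus while the
Pentium 4 puts two on its bus, and splits theoretical peak bandwidth evenly.
The function names are made up for this example.

```python
# Rough peak-bandwidth share per logical processor on a shared front-side bus.
# Assumption: the bus is 64 bits (8 bytes) wide and the quoted "MHz" rating
# is the effective transfer rate in millions of transfers per second.
BUS_WIDTH_BYTES = 8


def peak_bandwidth_gb_s(bus_mt_s):
    """Theoretical peak bus bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return bus_mt_s * 1_000_000 * BUS_WIDTH_BYTES / 1e9


def share_per_logical_cpu(bus_mt_s, logical_cpus):
    """Even split of peak bandwidth across the logical CPUs on one bus."""
    return peak_bandwidth_gb_s(bus_mt_s) / logical_cpus


# (label, bus rating, logical processors sharing that bus) -- the processor
# counts are assumptions based on the systems described in this article.
configs = [
    ("Dual Xeon, 400MHz bus, HT on", 400, 4),
    ("Dual Xeon, 533MHz bus, HT on", 533, 4),
    ("Pentium 4, 800MHz bus, HT on", 800, 2),
]

for label, bus, cpus in configs:
    share = share_per_logical_cpu(bus, cpus)
    print(f"{label}: {share:.2f} GB/s per logical CPU")
```

Real-world throughput will be lower than these theoretical peaks, and the
logical processors never split the bus perfectly evenly, but the relative gap
between the three configurations is the point of the exercise.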