Exploring Hyper-Threading Performance - Technical Details

As most of you are aware, hyper-threading (Intel's implementation of simultaneous multithreading, or SMT) is the ability of one physical processor to execute two threads concurrently. It does this by presenting itself to the operating system as two logical processors. The concept of true multitasking certainly isn't foreign to the readers of 2CPU.com, as we've been singing the praises of symmetric multiprocessing for this reason for years. One of the many common misconceptions we've had to deal with over the years is that "SMP is only beneficial when the application being used is multi-threaded." Any geek wise to the kung-fu of SMP would debunk that myth based on the multitasking abilities of a duallie alone. Moving forward, it's important (and somewhat obvious) to note that a CPU that supports hyper-threading is not going to match the performance of two physical processors rated at the same speed. The simple reason is that the two logical processors that make up your hyper-threaded CPU have to share resources, namely the execution engine, cache, and access to the system bus.

Intel's documentation states that moving from one logical processor to two on the same physical CPU can account for a performance increase in the neighborhood of 10-30%, depending on the application and the situation. Let me run through a general explanation of why this might be the case. Let's say that a thread is being executed on the first logical processor. It's consuming resources (execution units, cache, and possibly accessing memory via the bus) as efficiently as possible to execute that particular thread. At the same time, a second thread is scheduled to the second logical processor. It too requires access to execution units and cache, and it may also need access to the bus.
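Intel's 10-30% figure can be turned into a rough statement about how much each thread is allowed to slow down when it has to share the CPU. The sketch below is a toy model, not Intel's method: it assumes both threads are identical and each runs at the same fraction `s` of the speed it would have alone. The function names `combined_throughput` and `per_thread_speed_for_gain` are made up for illustration.

```python
def combined_throughput(s: float, threads: int = 2) -> float:
    """Total work per unit time, relative to ONE thread running alone.

    Toy assumption: every thread sharing the physical CPU runs at the same
    fraction `s` (0 < s <= 1) of its standalone speed.
    """
    return threads * s


def per_thread_speed_for_gain(gain: float, threads: int = 2) -> float:
    """Invert the model: the per-thread speed needed for a fractional gain.

    gain=0.10 means the pair gets 10% more work done than one thread alone.
    """
    return (1.0 + gain) / threads


if __name__ == "__main__":
    for gain in (0.10, 0.30):
        s = per_thread_speed_for_gain(gain)
        print(f"{gain:.0%} gain -> each thread at {s:.0%} of standalone speed")
    # If contention drags each thread below half speed, sharing is a net loss.
    print(f"s = 0.45 -> combined throughput {combined_throughput(0.45):.2f}")
```

Under this model, a 10-30% gain means each thread keeps roughly 55-65% of its standalone speed. Once contention pushes each thread below 50%, the pair gets less total work done than a single thread running alone.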
When both threads need the same resource at the same moment, a conflict arises, and one of two things may happen: the initial thread executing on the first logical processor may slow down, or the performance of the second thread, running on the second logical processor, may be limited.

Intel is quick to point out that the combined performance of these two threads will probably outweigh that of a single processor that doesn't support HT. It's analogous to two people in moderate shape being able to pile more wood in total than a single person who's in great shape.

Above, I mentioned that each logical processor can be halted independently of the other. It's interesting to note that if a logical processor is idle with no threads awaiting execution, it will still compete for shared resources unless it's specifically instructed to halt.
This is actually one of the reasons that Windows XP is strongly recommended as the operating system of choice for CPUs supporting hyper-threading (the other being licensing limitations). Windows XP issues a halt much more aggressively than Windows 2000, which can potentially improve performance as the battle for shared resources rages on.

In my first article, we noticed that our DivX encode performance suffered when hyper-threading was enabled on both of our test systems. The performance decrease was in the neighborhood of 10-16%. At the time, I was really curious why that would be the case. Now that we've looked into how our hyper-threaded CPUs work, some light is being shed on that quandary. I think Hannibal said it best in his hyper-threading article at Ars Technica:
There are obviously a lot of smart people at Intel, and they understand the issues involved when two logical processors share resources (and ultimately data). As such, there are oodles of documentation available to help software developers structure their applications so that they take advantage of hyper-threading's positives while side-stepping the negatives. The ultimate example of a positive situation for a CPU that supports hyper-threading would be one in which an application both consumes and produces data. The thread affinities could be set so that a producer thread and a consumer thread execute on the two logical processors of the same physical CPU. This way, they could share data easily through the common cache and be less apt to interfere with one another.

There certainly is a lot of promise here. I would think that the performance gains of hyper-threading will only increase as time goes on and more software is written to take advantage of what it brings to the table. It's not as if situations like this haven't arisen before: we went through the same waiting game with MMX, 3DNow!, SSE and so on. The more comfortable software developers become with the technologies available to them, the better performance we can expect. To give you another lame analogy, it's somewhat like console game development. With each generation of games for a particular console, the developers learn how to squeeze more out of the platform, and as a result the eye candy generally improves.

Let's stop all this technical babble and dig into some graphs. I'll first detail our test systems and we'll get started.
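The producer/consumer arrangement mentioned above can be sketched as follows. This is a minimal illustration under several assumptions. It targets Linux, where `os.sched_setaffinity` exists and `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list` lists the logical processors that share a physical core. The helper names (`parse_cpu_list`, `sibling_pair`, `pin_current_thread`, `run_pipeline`) are hypothetical. Because of CPython's global interpreter lock, the two threads won't truly execute Python code in parallel, so the example shows how the affinities could be set rather than demonstrating a speedup.

```python
import os
import queue
import threading

_DONE = object()  # sentinel telling the consumer that production has finished


def parse_cpu_list(text):
    """Parse a Linux CPU list such as '0,4' or '0-1' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def sibling_pair(cpu=0):
    """Return two logical CPUs sharing cpu's physical core, or None if unknown."""
    path = f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list"
    try:
        with open(path) as f:
            siblings = parse_cpu_list(f.read())
    except (OSError, ValueError):
        return None  # not Linux, no sysfs, or unexpected format
    return (siblings[0], siblings[1]) if len(siblings) >= 2 else None


def pin_current_thread(cpu):
    """Best-effort pin of the calling thread to one logical CPU (Linux only)."""
    if not hasattr(os, "sched_setaffinity"):
        return
    try:
        # On Linux, pid 0 refers to the calling thread, not the whole process.
        os.sched_setaffinity(0, {cpu})
    except OSError:
        pass  # CPU not available to this process; run unpinned


def run_pipeline(items, cpus=None):
    """Producer squares each item; consumer sums the results."""
    q = queue.Queue(maxsize=64)  # bounded, so producer and consumer stay in step
    result = []

    def producer():
        if cpus:
            pin_current_thread(cpus[0])
        for x in items:
            q.put(x * x)
        q.put(_DONE)

    def consumer():
        if cpus:
            pin_current_thread(cpus[1])
        total = 0
        while True:
            x = q.get()
            if x is _DONE:
                break
            total += x
        result.append(total)

    workers = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return result[0]


if __name__ == "__main__":
    pair = sibling_pair()
    print("sibling logical CPUs:", pair)
    print("sum of squares:", run_pipeline(range(1000), pair))
```

Reading `thread_siblings_list` is what keeps the two threads on the same physical CPU. Pinning them to two arbitrary CPU numbers could just as easily put them on different physical processors, where they would no longer share a cache.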