Exploring Hyper-Threading Performance - Technical Details
Published on 2004-01-07 13:44:04 By: Jim_

As most of you are aware, hyper-threading (otherwise known as simultaneous multithreading, or SMT) is the ability of one physical processor to execute two threads concurrently. The concept of true multitasking certainly isn't foreign to the readers of 2CPU.com, as we've been singing the praises of symmetric multi-processing for this reason for years. One of the many common misconceptions we've had to deal with over the years has been that "SMP is only beneficial when the application being used is multi-threaded." Any geek wise to the kung-fu of SMP would certainly try to debunk that myth based on the multitasking abilities of a duallie alone.

Moving forward, it's important (and somewhat obvious) to note that a CPU that supports hyper-threading is not going to match the performance of two physical processors rated at the same speed. The simple reason is that the two logical processors that make up your hyper-threaded CPU have to share resources, namely the execution engine, cache, and access to the system bus.

Intel's documentation says that moving from one logical processor to two on the same physical CPU can account for a performance increase in the neighborhood of 10-30%, depending on the application and the situation. Let me run through a general explanation of why the gain is that modest: Let's say that a thread is being executed on the first logical processor. It's consuming resources (execution units, cache, and possibly memory via the bus) as efficiently as possible. At the same time, a second thread is scheduled to the second logical processor. It too requires access to execution units and cache, and it may also need access to the bus. A conflict arises and one of two things may happen: the initial thread executing on the first logical processor may slow down, or the performance of the second thread, running on the second logical processor, may be limited.

Intel is certainly quick to point out that the combined performance of these two threads will probably outweigh that of a single processor which doesn't support HT. This could be analogous to two people in moderate shape being able to pile more wood in total than a single person who's in great shape.
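To put rough numbers on that claim, here's a back-of-the-envelope sketch in Python. The per-thread speeds are made-up inputs that I picked to land inside Intel's quoted 10-30% range; they are not measurements from our test systems, and the function name is just something I invented for the illustration.

```python
def combined_speedup(per_thread_speed):
    """Total throughput of two co-scheduled threads, relative to one thread
    running alone on the same physical CPU (1.0 means no gain).

    per_thread_speed is the fraction of its solo speed that each thread
    keeps while sharing the core. It is a hypothetical input, not a
    measured value.
    """
    return 2 * per_thread_speed


# Each thread slows down when it has to share, but two of them are running.
for speed in (0.50, 0.55, 0.60, 0.65):
    gain = (combined_speedup(speed) - 1.0) * 100
    print(f"each thread at {speed:.0%} of solo speed -> {gain:+.0f}% total throughput")
```

The point of the toy model is simply that each thread can run well below its solo speed and the pair can still come out ahead, as long as each one keeps more than half of what it would get with the core to itself.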

Above, I mentioned that each logical processor can be halted independently of the other. It's interesting to note that if a logical processor is idle with no threads awaiting execution, it will still compete for shared resources unless it's specifically instructed to halt. This is actually one of the reasons that Windows XP is strongly recommended as the operating system of choice for CPUs supporting hyper-threading (the other being licensing limitations). Windows XP will issue a halt much more aggressively than Windows 2000, which can potentially improve performance as the battle for shared resources continues to wage on.

In my first article, we noticed that performance suffered during our DivX encode when hyper-threading was enabled on both of our test systems. The performance decrease was in the neighborhood of 10-16%. At the time, I was really curious why that would be the case. Now that we've looked into how our hyper-threaded CPUs work, some light is being shed on that quandary. I think Hannibal said it best in his hyper-threading article at Ars Technica:

"... none of the caches know the difference between one logical processor and another, or between code from one thread or another. This means that one executing thread can monopolize virtually the entire cache if it wants to, and the cache, unlike the processor's scheduling queue, has no way of forcing that thread to cooperate intelligently with the other executing thread. The processor itself will continue trying to run both threads, though, issuing fetches from each one. This means that, in a worst-case scenario where the two running threads have two completely different memory reference patterns (i.e. they're accessing two completely different areas of memory and sharing no data at all) the cache will begin thrashing as data for each thread is alternately swapped in and out and bus and cache bandwidth are maxed out."
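To get a feel for the worst case described in that quote, here's a toy simulation in Python. Everything in it is an assumption made for illustration: a tiny fully associative LRU cache of 8 lines, two threads that each loop over 6 lines of their own, and strict access-by-access interleaving. Real Pentium 4 caches are much larger and set-associative, so treat this as a sketch of the idea and not a model of the hardware.

```python
from collections import OrderedDict


def miss_rate(accesses, capacity):
    """Replay a sequence of cache-line addresses through a tiny LRU cache
    and return the fraction of accesses that missed."""
    cache = OrderedDict()
    misses = 0
    for line in accesses:
        if line in cache:
            cache.move_to_end(line)        # hit: mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
    return misses / len(accesses)


CAPACITY = 8   # hypothetical cache size, in lines
PASSES = 100   # how many times each thread loops over its working set

# Two threads with disjoint working sets. Each set fits in the cache alone.
thread_a = list(range(0, 6))
thread_b = list(range(100, 106))

alone = thread_a * PASSES
# Interleave the two threads access by access, as if both were issuing fetches.
shared = [line
          for pair in zip(thread_a * PASSES, thread_b * PASSES)
          for line in pair]

print(f"thread A alone:   {miss_rate(alone, CAPACITY):.1%} misses")
print(f"A and B together: {miss_rate(shared, CAPACITY):.1%} misses")
```

In this setup thread A on its own misses only during its first pass, while the interleaved run misses on every single access, because the combined working set of 12 lines no longer fits in 8 and each thread keeps evicting the lines the other is about to need.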

There are obviously a lot of smart people at Intel, and they understand the issues involved when two logical processors are sharing resources (and ultimately data). As such, there are oodles of documentation available for software developers to ensure that applications are structured in a way that takes advantage of hyper-threading's positives while side-stepping the negatives. The ultimate example of a positive situation for a CPU that supports hyper-threading would be one in which an application both consumes data and produces data. The affinities could be set so that a consumer and producer thread could be executed on two logical processors on the same physical CPU. This way, they could share data easily and be less apt to interfere with one another.
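Here's a minimal sketch of that producer/consumer arrangement in Python. A few loud assumptions: I'm guessing that logical CPUs 0 and 1 are the two halves of one hyper-threaded physical CPU (the real numbering depends on your OS and BIOS), the pinning call used here, os.sched_setaffinity, only exists on Linux (on Windows the Win32 equivalent would be SetThreadAffinityMask), and the helper names are mine. Python threads also take turns because of the interpreter lock, so this shows the structure of the idea and won't demonstrate a real speedup.

```python
import os
import queue
import threading

# ASSUMPTION: logical CPUs 0 and 1 are siblings on one physical CPU.
# The real numbering depends on the OS and BIOS, so check it on your box.
SIBLINGS = {0, 1}


def pin_current_thread(cpus):
    """Best effort: restrict the calling thread to the given logical CPUs.
    Returns False (and does nothing) where pinning isn't available."""
    try:
        os.sched_setaffinity(0, cpus)   # Linux-only; 0 means the caller
        return True
    except (AttributeError, OSError, ValueError):
        return False


def producer(out_q, n):
    pin_current_thread(SIBLINGS)
    for i in range(n):
        out_q.put(i * i)     # produce some data
    out_q.put(None)          # sentinel: no more work


def consumer(in_q, results):
    pin_current_thread(SIBLINGS)
    while True:
        item = in_q.get()
        if item is None:
            break
        results.append(item + 1)   # consume it


def run(n=1000):
    # A small bounded queue keeps the consumer close behind the producer.
    q = queue.Queue(maxsize=64)
    results = []
    threads = [
        threading.Thread(target=producer, args=(q, n)),
        threading.Thread(target=consumer, args=(q, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling run(100) hands back the 100 consumed values in order. The design choice worth noticing is that both threads are confined to the same pair of sibling CPUs, so whatever the producer just wrote is likely to still be sitting in the cache the consumer reads from.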

There certainly is a lot of promise here. I would think that the performance gains of hyper-threading will only increase as time goes on and more software is written to take advantage of what it brings to the table. It's not like situations like this haven't arisen in the past. We've gone through the waiting game with MMX, 3DNow!, SSE and so on and so forth. Once software developers become more comfortable with the technologies available to them, we can expect better performance. To give you another lame analogy, it's somewhat like console game developers. With each generation of games for a particular console, the developers learn how to squeeze more out of the platform and as a result the eye candy generally improves.

Let's stop all this technical babble and dig into some graphs. I'll first detail our test systems and we'll get started.
