In the coming year, three new microarchitectures will grace the x86 world. This abundance of new designs is exciting, especially since each one embodies a different philosophy. At the high end, Sandy Bridge focuses on efficient per-core performance, while Bulldozer explicitly trades away some per-core performance for higher aggregate throughput. AMD’s Bobcat takes an entirely different road, emphasizing low power while retaining performance.
The complexity of these new systems is breathtaking. Consider this description of the Sandy Bridge memory subsystem:
The load buffer grew by 33% and can track 64 uops in-flight. Sandy Bridge’s store buffer increased slightly to 36 stores, for an overall 100 simultaneous memory operations, roughly two thirds of the number of the total uops in-flight. To put this in perspective, the number of memory uops in-flight for Sandy Bridge is greater than the entire instruction window for the Core 2 Duo. Again, like Nehalem, the load and store buffers are partitioned between threads.
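To make the arithmetic in that passage concrete, here is a minimal sketch that uses only the figures quoted above. The function and constant names are my own, purely for illustration, and the "previous" load buffer size is simply what a 33% growth to 64 entries implies, not a number taken from the article.

```python
# Figures as quoted in the passage above (Sandy Bridge).
LOAD_BUFFER_ENTRIES = 64    # "can track 64 uops in-flight"
STORE_BUFFER_ENTRIES = 36   # "increased slightly to 36 stores"
LOAD_BUFFER_GROWTH = 0.33   # "grew by 33%"


def memory_ops_in_flight(loads, stores):
    """Total simultaneous memory operations: load entries plus store entries."""
    return loads + stores


def implied_previous_size(current, growth):
    """Size before growing by `growth`, rounded to a whole entry (an inference)."""
    return round(current / (1 + growth))


total_in_flight = memory_ops_in_flight(LOAD_BUFFER_ENTRIES, STORE_BUFFER_ENTRIES)
previous_load_buffer = implied_previous_size(LOAD_BUFFER_ENTRIES, LOAD_BUFFER_GROWTH)

print("memory ops in flight:", total_in_flight)
print("implied previous load buffer size:", previous_load_buffer)
```

The sum matches the "overall 100 simultaneous memory operations" in the quote, and the growth figure works back to a load buffer of about 48 entries in the prior design.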
For the most part, the details of modern processor architectures are hidden from people like me. Even though most programmers would consider the low-level C-language server programming that I do to be very "close to the metal", there are still layers and layers below me:
- C runtime libraries
- Compiler-generated code
- Operating system APIs
- Device drivers
And then we get down to "the hardware" itself, which, as is clear from reading the RWT analysis, is extremely sophisticated and multi-layered as well.
It's a very well-written and fascinating whirlwind tour through the latest CPU architecture, and certainly worth your time to read.