What is the Intel i9 processor architecture

Intel Core i9 / i7 / i5 / i3

The Intel Core i7, i5 and i3 processors have been replacing their predecessors, the Core 2 Duo and Core 2 Quad, since the end of 2008. The Core i series was initially based on the Nehalem architecture, followed by Sandy Bridge in 2011 and Ivy Bridge in mid-2012. Then came Haswell (2013), Broadwell (2014/2015), Skylake (2015) and Kaby Lake (2016/2017). Since Skylake-X (2017) there has also been a Core i9, which Intel released in response to the announced AMD Ryzen Threadripper.

The names i3, i5, i7 and i9 say little about the absolute performance of a processor. They mainly serve as a relative classification: a Core i9 is more powerful than a Core i7, which in turn is more powerful than a Core i5, which is more powerful than a Core i3.
The different classes are intended for different requirements and purposes. There are also special versions for servers, desktop PCs and notebooks.
The Core i9 forms the performance and high-end segment, a position previously held by the Core i7. The Core i5 is positioned as a mid-range CPU. The Core i3 sits in the lower price segment and is suitable for typical office work, some Internet use and multimedia.

The particularities

  • Introduction of the QuickPath Interconnect (QPI)
  • serial connections to chipset and memory
  • Reintroduction of Hyper-Threading (HT)
  • integrated memory controller
  • native quad core
  • Turbo Boost

Processor architecture

The architecture of the Intel Core i CPU series is like a modular system. It allows many processor variants for different markets and applications, including servers, workstations and high-end desktop systems. The architecture also provides an integrated graphics controller for inexpensive desktop systems.

A special feature of the Core i CPUs is the integrated memory controller, which was originally a specialty of AMD CPUs. The optimal memory throughput is achieved with several memory modules of the same size.

Because of the increasing number of cores (more than 4), the cores, the L3 caches, several memory controllers and the I/O interfaces are linked with one another via a mesh of vertical and horizontal connections. However, transfer times differ when data has to travel over several hops within the CPU.
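
The effect of several hops can be illustrated with a small sketch. It assumes a hypothetical 4x5 grid of tiles in which data moves only along rows and columns, so the hop count is the Manhattan distance. The tile positions and the function name `mesh_hops` are invented for illustration and do not describe a real die layout.

```python
def mesh_hops(src, dst):
    """Hops between two tiles on a 2D mesh (Manhattan distance)."""
    (src_row, src_col), (dst_row, dst_col) = src, dst
    return abs(src_row - dst_row) + abs(src_col - dst_col)


# Hypothetical (row, column) positions on an assumed 4x5 grid of tiles.
core = (0, 0)
near_l3_cache = (0, 1)
far_l3_cache = (3, 4)

near_hops = mesh_hops(core, near_l3_cache)
far_hops = mesh_hops(core, far_l3_cache)
print(f"near cache: {near_hops} hop(s), far cache: {far_hops} hop(s)")
```

In this model a core reaches a neighbouring tile in one hop, while the opposite corner of the grid takes seven, which is why access times are not uniform.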

Intel Core i7 (old: Nehalem architecture)

The most important change is the integrated memory controller. This means that the main memory is connected directly to the processor and no longer via the chipset. External communication no longer takes place via a single front-side bus, but via the QuickPath Interconnect (QPI) with scalable links and an independent memory bus.
The structure of a core is optimized for the parallel processing of numerous threads.

Intel Core i5 (old: Nehalem architecture)

The Intel Core i5 is the smaller brother of the Intel Core i7 and is intended as a cheaper version for the mass market. In the Nehalem generation, for example, it has only two memory channels and the DMI interface, which connects it to the single-chip chipset called the Platform Controller Hub (PCH).
In this processor, Intel combines the actual CPU with a graphics processor (GPU). On closer inspection, Intel has placed a dual-core processor and a classic northbridge (chipset) with memory controller and integrated graphics unit in a common package. In this form, it is not really a unified CPU-GPU processor. That only arrives in later processor generations.

QPI - QuickPath Interconnect

QPI is a serial point-to-point interface that replaces the front-side bus (FSB). The interface is similar in kind to PCI Express (PCIe). In server processors, QPI is also suitable for coupling CPUs.
A full-width QPI port consists of 20 lanes in each direction, each of which transmits up to 6.4 Gbit/s. Because 4 of the 20 lanes carry CRC checksums, the usable data width is 2 bytes (16 bits). A full-width link can therefore transmit 12.8 GB/s per direction, or 25.6 GB/s in full duplex.
Full-width QPI on a 16x16 link is, at 6.4 Gbit/s, also faster than HyperTransport 3.0 at only 2.6 GHz (5.2 Gbit/s). Because each lane uses two signal lines and there is also a differential clock signal for each direction, a full-width QPI link requires a total of 84 signal lines.
Only the data to and from PCIe expansion cards and other onboard units flows through QPI.
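
The QPI figures above can be retraced with a few lines of arithmetic. This is only a sketch using the numbers from the text. The constant names are chosen for illustration, and the signal line count assumes one differential clock pair per direction, which is what makes the total come out to 84.

```python
# Figures taken from the text above.
LANES_PER_DIRECTION = 20   # connections per direction in a full-width port
CRC_LANES = 4              # of these, 4 carry CRC checksums
GBIT_PER_LANE = 6.4        # transfer rate per connection in Gbit/s

payload_lanes = LANES_PER_DIRECTION - CRC_LANES          # usable width in bits
gbyte_per_direction = payload_lanes * GBIT_PER_LANE / 8  # 8 bits per byte
gbyte_full_duplex = 2 * gbyte_per_direction

# Assumption: two wires per differential pair, one clock pair per direction.
WIRES_PER_PAIR = 2
DIRECTIONS = 2
signal_lines = DIRECTIONS * (LANES_PER_DIRECTION + 1) * WIRES_PER_PAIR

print(f"usable width: {payload_lanes} bits")
print(f"per direction: {gbyte_per_direction:.1f} GB/s")
print(f"full duplex: {gbyte_full_duplex:.1f} GB/s")
print(f"signal lines: {signal_lines}")
```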

Sandy Bridge Architecture (2nd Generation)

With Sandy Bridge, Intel goes one step further in integrating functions into the processor that were typically located outside the processor. The CPU and GPU sit on the same die.
The Sandy Bridge die is an elongated chip with 4 CPU cores, their integrated L1 and L2 caches and a shared L3 cache, which is also known as the last-level cache (LLC). The graphics core (GPU), which also uses the L3 cache, is located to the left of the CPU cores. Sharing the L3 cache relieves the main memory and saves power. The system agent with the memory controller and the input and output interfaces is located to the right of the CPU cores. The memory interface is arranged below. The areas of the chip communicate via a ring bus that has 1000 signal lines. Each CPU core has a connection to the ring bus, which can transfer up to 300 GB per second.

The base clock of the processor is set at 100 MHz. From it, a PLL circuit with a variable multiplier generates the operating frequency for all CPU cores, the main memory and the external interfaces. The Direct Media Interface (DMI) is a modified PCIe 2.0 x4 connection between CPU and chipset.
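
The relationship between the 100 MHz clock and the multiplier can be sketched as follows. The multiplier values are hypothetical and were picked only to show the arithmetic. They are not taken from a specific processor model.

```python
BASE_CLOCK_MHZ = 100  # clock stated in the text


def operating_frequency_mhz(multiplier, base_clock_mhz=BASE_CLOCK_MHZ):
    """Operating frequency as the 100 MHz clock times the multiplier."""
    return base_clock_mhz * multiplier


# Hypothetical multipliers, chosen only for illustration.
for multiplier in (16, 34, 38):
    mhz = operating_frequency_mhz(multiplier)
    print(f"multiplier {multiplier}: {mhz} MHz")
```
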
The GPU only supports DirectX 10.1 and has limited 3D performance. Although it is faster than earlier onboard graphics processors, it is still unsuitable for PC games. Even a low-cost graphics card has more power than the integrated GPU. The GPU performance is sufficient for notebooks and office computers.

In principle, Sandy Bridge's arithmetic units work no differently from Nehalem's. Because of AVX, however, the CPU cores have been redesigned from scratch, which also brought some detail improvements. Software that does not use AVX instructions benefits from some of these measures as well.

Ivy Bridge Architecture (3rd Generation)

Ivy Bridge is the name of the third Core i generation. Compared to the Sandy Bridge architecture, the Ivy Bridge processors have slight architectural differences and are somewhat faster and more economical at the same clock frequency. They have 26 percent less silicon area and 20 percent more transistors than Sandy Bridge processors. The additional transistors are mainly used for the HD 4000 graphics unit. Also new are the tri-gate transistors, in which the gate surrounds the channel between source and drain on three sides, thus reducing leakage currents and ensuring better energy efficiency.
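
The two percentages can be combined into a single figure for transistor density. The following sketch does only that arithmetic with the values from the text. The variable names are chosen for illustration.

```python
# Relative to Sandy Bridge (= 1.0), using the percentages from the text.
AREA_FACTOR = 1 - 0.26        # 26 percent less silicon area
TRANSISTOR_FACTOR = 1 + 0.20  # 20 percent more transistors

# Transistors per unit of area, relative to Sandy Bridge.
density_gain = TRANSISTOR_FACTOR / AREA_FACTOR
print(f"relative transistor density: {density_gain:.2f}x")
```

By this calculation the transistors are packed roughly 1.6 times as densely as on Sandy Bridge.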

The integrated graphics processor (HD 2500 or HD 4000) is compatible with DirectX 11 and OpenCL 1.1. The HD 4000 has 16 cores. In comparison, the HD 2000 has only 6 and the HD 3000 only 12 cores. Most Ivy Bridge processors, however, only have an HD 2500, which has only minor advantages over the HD 2000. If you want to play 3D games, you should not rely on the internal GPU but install a dedicated graphics card in the PC.
The HD 4000 is comparable to an entry-level graphics card. The smaller versions are hardly suitable for demanding games and are adequate at most for office work and video editing.
The integrated HD video transcoder Quick Sync Video is a special feature. With suitable software, it converts videos between certain formats very quickly.

With Ivy Bridge, not only the processors change but also the chipsets. Among other things, PCI Express 3.0 and USB 3.0 are introduced. PCI Express 3.0 is integrated directly into the CPU as a PCIe root complex. A memory controller for two DDR3 SDRAM channels is also integrated.
The chipset, known as the Platform Controller Hub (PCH), is coupled to the processor via the PCIe-like DMI (Direct Media Interface). Depending on the model, the PCH provides a maximum of 8 PCIe 2.0 lanes at 500 MB/s each.
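
A short calculation puts the lane figures into proportion. It is a sketch that assumes the DMI link is still the PCIe 2.0 x4 connection described for Sandy Bridge and that it runs at the same 500 MB/s per lane. The constant names are chosen for illustration.

```python
MB_PER_LANE = 500    # PCIe 2.0 lane, figure from the text
PCH_MAX_LANES = 8    # maximum number of PCIe 2.0 lanes on the PCH
DMI_LANES = 4        # assumption: x4 link as described for Sandy Bridge

pch_total_mb = PCH_MAX_LANES * MB_PER_LANE
dmi_total_mb = DMI_LANES * MB_PER_LANE

print(f"PCH lanes combined: {pch_total_mb} MB/s")
print(f"DMI link to the CPU: {dmi_total_mb} MB/s")
print(f"ratio: {pch_total_mb / dmi_total_mb:.1f}")
```

Under these assumptions the devices behind the PCH can together request twice as much bandwidth as the DMI link can carry, so they share the link when all are busy at once.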

Haswell architecture (4th generation)

Haswell or Haswell-E (Refresh) is the name for the fourth Core i generation. In addition to various improvements, the main focus is on Advanced Vector Extensions 2 (AVX2), which extends the vector units to 256-bit integer operations. Also new is Fused Multiply-Add (FMA), which combines a multiplication and an addition in one processing step. Overall, the processor cores have more functional units. The pipeline remains the same and is basically still based on the old Pentium Pro.
The Haswell processors are more economical thanks to a modified power supply in which the majority of the voltage converters are integrated directly into the processor. Intel calls this the Fully Integrated Voltage Regulator (FIVR).
The Haswell architecture is only around 10 percent faster and even more economical when idling. It only becomes really powerful with special software that supports AVX2 and FMA.
The integrated graphics unit (IGP) is called Iris Pro 5200 and works with an eDRAM that behaves like a cache-coherent L4 cache in the address space of the GPU and CPU. The Iris Pro 5200 does not quite match a mid-range GPU, but it clearly leaves entry-level graphics chips behind.

Broadwell architecture (5th generation)

Broadwell is the name for the fifth generation of the Core i. Its introduction in early 2015 brought dual-core CPUs for notebooks and economical mini PCs. There are also a few more powerful quad-core models for desktop PCs.
There are only minor improvements to the microarchitecture. Compared with Haswell, Broadwell has lower clock rates, lower power consumption and a more powerful graphics unit. The CPU cores can also use the additional eDRAM chip that is meant to speed up the GPU, so it acts as an L4 cache. FinFET transistors in 14 nm technology are used for production.
The Broadwell processors were only on the market for a short time because the processors of the Skylake architecture followed in the second half of 2015. Intel can be expected to keep several processor architectures on the market in parallel, with different architecture generations depending on the market segment.
Other than being more efficient, Broadwell CPUs have no advantage over Haswell CPUs. Anyone who needed more performance had to wait for the Skylake CPUs.

Skylake architecture (6th generation)

The CPUs of the Skylake architecture have been available since the second half of 2015. Skylake is the first Core architecture that has different chips with corresponding optimizations for servers, desktops and notebooks. The entire design has been revised and optimized mainly for energy efficiency.
Thanks to AVX-512 (also called AVX3) with 512-bit arithmetic units, the Skylake CPUs should double the performance potential per core and clock cycle compared to Haswell. The base clock is independent of the clock frequency of PCIe and of the DMI connection to the chipset. Together with the return to external voltage regulators, this pleases overclockers.
Skylake switches to DDR4 SDRAM, using DDR4-2133 with a bandwidth of 17 GByte/s per channel. DDR3 SDRAM in the form of DDR3-1600 with a bandwidth of 12.8 GByte/s per channel is also supported. New memory means a new CPU socket (LGA1151 on mainstream desktops) and thus new mainboards. The Intel chipsets then also support PCIe 3.0 and USB 3.1.
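
The two bandwidth figures follow from the transfer rate in the module name. The sketch below assumes the usual 64-bit (8-byte) wide memory channel and decimal units. The function name `channel_bandwidth_gbyte` is chosen for illustration.

```python
BYTES_PER_TRANSFER = 8  # assumption: 64-bit wide memory channel


def channel_bandwidth_gbyte(mega_transfers_per_second):
    """Peak bandwidth of one memory channel in GByte/s (decimal units)."""
    return mega_transfers_per_second * BYTES_PER_TRANSFER / 1000


for name, rate in (("DDR3-1600", 1600), ("DDR4-2133", 2133)):
    print(f"{name}: {channel_bandwidth_gbyte(rate):.1f} GByte/s")
```

These are peak values for a single channel. With two channels populated, the theoretical maximum doubles.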

Kaby Lake architecture / Skylake refresh (7th generation)

In principle, Kaby Lake is just a slightly improved Skylake and is conveniently pin-compatible with it. The roughly 12 percent increase in performance is primarily the result of higher base and turbo clock frequencies.
Of course, there are also a few new features, which mainly affect the integrated graphics unit. 4K videos in HEVC and VP9 with HDR10 are supported without burdening the CPU. This results in a quieter fan and longer battery runtimes.

Kaby Lake Refresh, Coffee Lake, and Cannon Lake Architecture (8th Generation)

Eighth-generation Core i processors, with up to 40 percent more computing power than their predecessors, have been available since September 2017. They appear under several code names, which occasionally causes confusion:

  • Kaby Lake Refresh (desktops and notebooks)
  • Coffee Lake S (six-core with 12 threads for desktop and notebook processors)
  • Cannon Lake (especially thin notebooks and fanless Windows tablets)

Ice Lake Architecture (9th Generation)

In mid-2018, Ice Lake will introduce a new processor architecture.

Core-i9-X (core: Skylake-X)

The Intel Core-i9-X is a high-performance processor with 10, 12, 14, 16 or 18 cores for desktop computers used for gaming and VR. Each core has the AVX-512 instruction set extension, wider data paths in the arithmetic units and a 1 MByte level 2 cache. Externally, there are four DDR4 memory channels and 44 PCIe 3.0 lanes.
Another special feature is that all Core X processors can be overclocked thanks to unlocked multipliers.

Mobile Core i processor with Radeon Vega GPU from AMD

The combination processor developed jointly by AMD and Intel consists of a Core i CPU from Intel and a Radeon Vega GPU from AMD. It is intended for mini PCs and slim gaming notebooks.
The Intel processor and the AMD graphics unit sit together on a carrier board (package) and communicate with each other via eight PCIe 3.0 lanes. The graphics chip is connected to the stacked High Bandwidth Memory 2 (HBM2) chips via an Embedded Multi-Die Interconnect Bridge (EMIB).
The powerful quad-core CPUs with AMD GPU and HBM2 memory replace the separate mobile graphics chip in slim notebooks. Intel promises enough performance for Full HD gaming and VR applications.

GPU - Graphics Processing Unit

The GPU, the integrated graphics unit, differs depending on the processor type (i3, i5, ...). All variants support DirectX, OpenGL and OpenCL. Some have their own eDRAM, which is attached via a dedicated bus. The GPU outputs HDMI and DisplayPort signals directly.

Power management

The processor has a built-in Power Control Unit (PCU), which contains an integrated microcontroller for power management. More than a million transistors are used for this alone. This makes it possible to switch off parts of the processor that are not needed so that they really consume no power. Sensors constantly monitor the temperature of the individual cores and the power consumption. If a core has nothing to do, it is sent into deep sleep.
A technique called Speed Shift was introduced with the Skylake architecture. It replaces the P-states managed by the operating system. Previously, the processor was only able to switch to turbo mode under full load. With Speed Shift, hardware algorithms now choose the clock frequency and voltage. This enables the system to react much more quickly to changing requirements.


Hyper-Threading

The Intel Core i9, i7, i5 and i3 have inherited Hyper-Threading (Simultaneous Multi-Threading, SMT) from the Pentium 4. However, it is an advanced form of Hyper-Threading that is much more efficient. With Hyper-Threading, each physical core appears as two logical cores, so the operating system detects twice as many cores as are physically present.
Hyper-Threading has a positive effect when the processing of a thread is delayed because data has to be reloaded. The core then switches internally and processes another thread. In this way, Hyper-Threading brings between 10 and 20 percent more performance.
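
The difference between physical and logical cores, and the size of the expected gain, can be sketched as follows. The helper names are hypothetical, and the throughput estimate simply applies the 10 to 20 percent from the text evenly to every core, which is a simplification. `os.cpu_count()` is the standard Python call that reports the number of logical CPUs seen by the operating system.

```python
import os


def logical_cores(physical_cores, hyper_threading=True):
    """Logical cores the operating system sees for a physical core count."""
    return physical_cores * 2 if hyper_threading else physical_cores


def estimated_throughput(physical_cores, gain):
    """Relative throughput, in units of one core without Hyper-Threading.

    Simplifying assumption: the gain applies evenly to every core.
    """
    return physical_cores * (1 + gain)


print("logical cores of a quad-core:", logical_cores(4))
print("throughput at 10 percent gain:", estimated_throughput(4, 0.10))
print("throughput at 20 percent gain:", estimated_throughput(4, 0.20))

# Number of logical CPUs on the machine running this script (may be None).
print("logical CPUs on this machine:", os.cpu_count())
```

The estimate shows that the doubled core count reported by the operating system does not mean doubled performance.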

Turbo Boost

Applications that do not use all cores are accelerated by clocking individual cores higher. The mechanism behind this is an automatic overclocking called Turbo Boost.
Multi-core processors typically run at a lower clock frequency than single-core or dual-core processors. This means that conventional programs that only use one core run more slowly. With Turbo Boost in multi-core processors, Intel offers a technology that supports both modern and old software well.
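
The principle can be shown with a toy model. The clock table below is entirely hypothetical and does not describe a real processor, and the sketch ignores temperature and power limits. It only illustrates that fewer active cores allow a higher clock.

```python
# Hypothetical values for a quad-core CPU in MHz, not from a real model.
BASE_CLOCK_MHZ = 3400
TURBO_CLOCK_MHZ = {1: 3800, 2: 3700, 3: 3600, 4: 3500}


def core_clock_mhz(active_cores, turbo_enabled=True):
    """Clock for the active cores in this simplified model."""
    if active_cores not in TURBO_CLOCK_MHZ:
        raise ValueError("active_cores must be between 1 and 4")
    if not turbo_enabled:
        return BASE_CLOCK_MHZ
    return TURBO_CLOCK_MHZ[active_cores]


for cores in sorted(TURBO_CLOCK_MHZ):
    print(f"{cores} active core(s): {core_clock_mhz(cores)} MHz")
```

In this model a program that uses only one core runs at 3800 MHz instead of 3400 MHz, which is the benefit for older single-threaded software described above.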

VT-d - Virtualization Technology for Directed I/O

VT-d is an extension to the virtualization functions. It increases the I/O throughput of virtual systems and makes it possible, for example, to pass expansion cards through to the guest systems. In principle, virtualization can also be emulated completely in software. But with additional virtualization instructions and the support of the processor, virtualization environments run much faster and more stably.

AVX - Advanced Vector Extensions

AVX is an instruction set extension introduced with the Sandy Bridge architecture. It is comparable to the SSE instruction set extension, but while SSE only processes data with a width of 128 bits, AVX processes data up to 256 bits wide. Software must explicitly support AVX in order to benefit from it. AVX is designed, for example, to speed up video data processing.
AVX is just a preliminary stage for further acceleration units in the future.
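
What the wider registers mean in practice can be shown by counting how many values fit into one register. This is a plain arithmetic sketch and not real vector code. The function name `elements_per_register` is chosen for illustration.

```python
def elements_per_register(register_bits, element_bits):
    """How many values of a given size fit into one vector register."""
    return register_bits // element_bits


for name, register_bits in (("SSE", 128), ("AVX", 256)):
    floats = elements_per_register(register_bits, 32)
    doubles = elements_per_register(register_bits, 64)
    print(f"{name}: {floats} x 32-bit or {doubles} x 64-bit values")
```

One AVX instruction can therefore operate on twice as many values as one SSE instruction, provided the software has been written or compiled to use it.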
