Overview
This article aims to explore the x86 architecture, especially its 64-bit version, mainly from the way I understand it.
Although I am basing it on the Intel manuals, some mistakes may have slipped through. If you find one, feel free to let me know. :)
Why x86 matters
What’s x86?
Basically, x86 is a family of Instruction Set Architectures (ISAs) implemented by many different processors.
| An ISA is a model that defines how software and hardware interact with each other: it specifies the set of instructions, the data types, and the memory access techniques. |
It was developed by Intel in 1978, starting with the historic 8086 microprocessor.
Why does it still dominate?
If you are reading this article on a desktop computer, its processor very likely implements x86. That is the answer: most modern desktop PCs and servers are built on the x86 processor family.
There are a number of ISAs that aim to be modern alternatives to x86.
ARM is one of those. It’s widely used for mobile devices, embedded systems and servers due to its versatility.
Versions context
Over the years, many improved versions of the original 8086 emerged. Since it was only a 16-bit microprocessor, it is not difficult to see why that was necessary.
8086
As mentioned, the 8086 is a 16-bit microprocessor, which means its registers and its data bus are 16 bits wide.
| A register is a small, very fast storage location inside the processor, used to temporarily hold the values it is working on. I'll go into a bit more detail later. A bus is any communication interface used to transfer data between the components of a system, including the processor. |
20-bit addresses
Despite being a 16-bit microprocessor, the 8086 could form 20-bit addresses by combining segment registers with offsets.
Each segment register holds a 16-bit value, and each segment spans 64 KiB (\(2^{16}\) bytes). The segment value is left-shifted by 4 bits and added to the 16-bit offset value to compute the physical address: \((Segment << 4) + Offset\).
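To make the formula concrete, here is a minimal Python sketch of the calculation. The function name `physical_address` is my own choice for illustration, and the final mask assumes the result is limited to the 20 address bits of the 8086.

```python
def physical_address(segment: int, offset: int) -> int:
    """Return the 20-bit physical address for a 16-bit segment:offset pair."""
    if not 0 <= segment <= 0xFFFF or not 0 <= offset <= 0xFFFF:
        raise ValueError("segment and offset must each fit in 16 bits")
    # Shift the segment left by 4 bits (multiply by 16), add the offset,
    # and keep only the low 20 bits (assumption: 8086-style 20-bit addresses).
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(physical_address(0x1234, 0x0010)))  # 0x12350
```

With segment 0x1234 and offset 0x0010, the shift gives 0x12340 and adding the offset gives 0x12350.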
80386
The first 32-bit Intel x86 processor, released in 1985. It was a remarkable evolution, updating the ISA and increasing the clock speed.
| A modern x86 processor does not start in 64-bit mode; it has to be switched into it. I'll comment on this in the next topic. |
AMD Opteron
The first 64-bit x86 processor, released in 2003. It introduced the x86-64 (or x64) ISA, allowing 64-bit computing.
It was a major expansion of the capabilities of x86 processors, enabling support for larger amounts of memory and improving performance.
| Note that the first 64-bit x86 processor was developed by AMD, not by Intel. |
ISA modes
A modern 64-bit x86 processor supports three main operating modes:
| Mode | Size | Introduced by |
|---|---|---|
| Real mode | 16 bits | Intel 8086 |
| Protected mode | 32 bits | Intel 80386 |
| Long mode | 64 bits | AMD Opteron |
Registers
One of the most confusing aspects of x86 is that registers are not independent; they overlap.
The same physical register can be accessed at different sizes, as shown in the following example.
movq $0x123456, %rax   /* load the immediate into the 64-bit register */
movb $0xFF, %al        /* overwrite only the lowest byte */
/* RAX = 0x1234FF */
| The assembly code above loads a value into the 64-bit version of the accumulator register. Writing to its lower-byte version changes the contents of the same physical register, resulting in RAX = 0x1234FF. |
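The overlap can also be modelled with plain bit masks. This is only a sketch in Python, not how the hardware is implemented; `write_low_byte` is a hypothetical helper name I picked for the example.

```python
MASK_64 = 0xFFFFFFFFFFFFFFFF


def write_low_byte(reg64: int, value: int) -> int:
    """Replace bits 0-7 of a 64-bit register value and keep bits 8-63."""
    return ((reg64 & ~0xFF) | (value & 0xFF)) & MASK_64


rax = 0x123456                   # models: movq $0x123456, %rax
rax = write_low_byte(rax, 0xFF)  # models: movb $0xFF, %al
print(hex(rax))                  # 0x1234ff
```

Only the lowest byte (0x56) is replaced; the remaining bits of the register keep their previous value.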
Core registers
| Type | 64 bits | 32 bits | 16 bits | Higher byte | Lower byte |
|---|---|---|---|---|---|
| Accumulator | RAX | EAX | AX | AH | AL |
| Base | RBX | EBX | BX | BH | BL |
| Counter | RCX | ECX | CX | CH | CL |
| Data | RDX | EDX | DX | DH | DL |
| Source | RSI | ESI | SI | - | SIL |
| Destination | RDI | EDI | DI | - | DIL |
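To see how the columns of the table relate to each other, the sketch below extracts each view of the accumulator from one 64-bit value. It is an illustration in Python with a hypothetical `register_views` helper; the bit ranges are the ones the names in the table refer to.

```python
def register_views(rax: int) -> dict:
    """Return the sub-register values visible inside a 64-bit RAX value."""
    return {
        "RAX": rax & 0xFFFFFFFFFFFFFFFF,  # all 64 bits
        "EAX": rax & 0xFFFFFFFF,          # bits 0-31
        "AX": rax & 0xFFFF,               # bits 0-15
        "AH": (rax >> 8) & 0xFF,          # bits 8-15 (higher byte of AX)
        "AL": rax & 0xFF,                 # bits 0-7 (lower byte of AX)
    }


views = register_views(0x1122334455667788)
for name, value in views.items():
    print(f"{name} = {value:#x}")
```

For the sample value, EAX is 0x55667788, AX is 0x7788, AH is 0x77 and AL is 0x88.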
Instructions
Instructions are encoded values that the processor decodes and executes.
In assembly, in this case the GNU Assembler, instruction names carry a suffix that states the data size:
| Size | Name | Symbol |
|---|---|---|
| 64 bits | Quadword | q |
| 32 bits | Long | l |
| 16 bits | Word | w |
| 8 bits | Byte | b |
movq %rbx, %rax
movb $8, %al
| $ is used to pass literal (immediate) values. |
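As a small illustration of the size suffixes (q, l, w and b in GNU Assembler's AT&T syntax), the Python sketch below splits a mnemonic into its base name and operand size. The names `SUFFIX_BITS`, `BASE_MNEMONICS` and `operand_size` are hypothetical, and the set of base names only covers the instructions used in this article.

```python
SUFFIX_BITS = {"q": 64, "l": 32, "w": 16, "b": 8}
# Assumption: only the instructions discussed in this article are recognised.
BASE_MNEMONICS = {"mov", "add", "sub", "inc", "dec"}


def operand_size(mnemonic: str) -> int:
    """Return the operand size in bits for a suffixed mnemonic such as 'movq'."""
    base, suffix = mnemonic[:-1], mnemonic[-1:]
    if base in BASE_MNEMONICS and suffix in SUFFIX_BITS:
        return SUFFIX_BITS[suffix]
    raise ValueError(f"no recognised size suffix in {mnemonic!r}")


print(operand_size("movq"))  # 64
print(operand_size("movb"))  # 8
```

Checking the base name first avoids misreading an unsuffixed `sub` as `su` plus a byte suffix.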
Core instructions
Data movement
- mov: Copies data from a source to a destination.
| Although it is commonly used just with registers, it is not restricted to them. |
movq $40, %rax
movq %rax, %rbx
Arithmetic
- add: Adds the source value to the destination and stores the result in the destination.
movl $50, %edi
addl $10, %edi
- sub: Subtracts the source value from the destination and stores the result in the destination.
movl $20, %edi
subl $10, %edi
- inc: Increments a value by one.
movl $60, %edi
incl %edi
- dec: Decrements a value by one.
movl $10, %edi
decl %edi
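The four arithmetic examples above can be checked with a small Python model of a 32-bit register. This is only a sketch: the helper names are hypothetical, processor flags are not modelled, and results are reduced modulo \(2^{32}\) to imitate a fixed-width register.

```python
MASK_32 = 0xFFFFFFFF


def add32(dst: int, src: int) -> int:
    """Model addl: dst = dst + src, truncated to 32 bits."""
    return (dst + src) & MASK_32


def sub32(dst: int, src: int) -> int:
    """Model subl: dst = dst - src, truncated to 32 bits."""
    return (dst - src) & MASK_32


def inc32(dst: int) -> int:
    """Model incl: dst = dst + 1."""
    return add32(dst, 1)


def dec32(dst: int) -> int:
    """Model decl: dst = dst - 1."""
    return sub32(dst, 1)


print(add32(50, 10))  # 60
print(sub32(20, 10))  # 10
print(inc32(60))      # 61
print(dec32(10))      # 9
```

Because the register has a fixed width, decrementing 0 in this model wraps around to 0xFFFFFFFF instead of producing a negative number.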