Overview
This article aims to uncover the x86 architecture, especially its 64-bit version, explained mainly the way I understand it.
Although I base it on the Intel manuals, some mistakes may slip through. If you find one, feel free to let me know.
Why x86 matters
What’s x86?
Basically, x86 is a family of Instruction Set Architectures (ISAs), implemented by many different processors over the years.
Note
An ISA is a model that defines how software and hardware interact with each other: it specifies a set of instructions, data types, registers and memory-addressing modes.
Intel introduced it in 1978 with the historic 8086 microprocessor.
Why does it still dominate?
If you are reading this article on a desktop computer, its processor is most likely based on the x86 ISA. That is the answer: most modern desktop computers and servers are built on the x86 processor family.
Several other ISAs aim to be modern alternatives to x86.
ARM is one of them. It is widely used in mobile devices, embedded systems and, increasingly, servers, thanks to its versatility and power efficiency.
Version history
Over the years, many improved versions of the original 8086 emerged. Since it was a 16-bit microprocessor, it is not difficult to see why they were necessary.
8086
As we now know, the 8086 is a 16-bit microprocessor: its registers and its data bus are 16 bits wide. Its address bus, however, is 20 bits wide.
Important
A bus is a communication interface used to transfer data between components of a computer, such as the CPU and memory.
20-bit addresses
Although it was a 16-bit microprocessor, the 8086 could form 20-bit addresses by combining a segment register with an offset.
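A minimal C model of this segment:offset scheme is below. It assumes the classic 8086 behavior: the 16-bit segment is shifted left by 4 bits and added to the 16-bit offset. Because the chip only has 20 address lines, the sum wraps around at 1 MiB. The function name `phys_addr_8086` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: models 8086 real-mode address translation.
 * The 16-bit segment is shifted left by 4 bits (multiplied by 16) and
 * added to the 16-bit offset. The 8086 has only 20 address lines, so
 * the result is truncated to 20 bits (it wraps around at 1 MiB). */
static uint32_t phys_addr_8086(uint16_t segment, uint16_t offset)
{
    uint32_t linear = ((uint32_t)segment << 4) + offset;
    return linear & 0xFFFFFu; /* keep only the low 20 bits */
}
```

Different segment:offset pairs can point to the same physical address. For example, 0x1000:0x0000 and 0x0FFF:0x0010 both resolve to 0x10000.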
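Each segment register and each offset is 16 bits wide, so a single segment spans $2^{16}$ bytes = 64 KiB. To compute the physical address, the segment value is shifted left by 4 bits (multiplied by 16) and added to the 16-bit offset. This reaches up to $2^{20}$ bytes = 1 MiB of memory: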
$$\text{Physical address} = (\text{Segment} \ll 4) + \text{Offset}$$
80386
The first 32-bit x86 processor, released by Intel in 1985. It was a remarkable evolution: it extended the ISA to 32 bits and increased the clock speed.
Tip
The clock speed is the number of cycles the processor goes through per unit of time. It is measured in hertz (Hz), that is, cycles per second. In each cycle, the processor can do work such as decoding instructions and performing operations.
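As a worked example with a hypothetical 16 MHz clock, the duration of one cycle (the period $T$) is the inverse of the frequency $f$:

$$T = \frac{1}{f} = \frac{1}{16 \times 10^{6}\ \text{Hz}} = 62.5 \times 10^{-9}\ \text{s} = 62.5\ \text{ns}$$

In other words, such a clock completes 16 million cycles every second.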
AMD Opteron
The first 64-bit x86 processor, released by AMD in 2003. It introduced the 64-bit extension of x86 (known as x86-64, AMD64 or x64), allowing 64-bit computing.
This was a major expansion of the capabilities of x86 processors. It enabled support for much larger amounts of memory and improved performance, for example by adding eight new general-purpose registers (R8 to R15).
Note
Intel did not develop the first 64-bit x86 processor; AMD did.
ISA modes
A modern 64-bit x86 processor can operate in three main modes:
| Mode | Width | Introduced by |
|---|---|---|
| Real mode | 16 bits | Intel 8086 |
| Protected mode | 32 bits | Intel 80386 |
| Long mode | 64 bits | AMD Opteron |
Registers
Registers are small storage locations inside the CPU. Unlike RAM, they are extremely fast and are used directly by instructions.
One of the most confusing aspects of x86 is that registers are not independent; they overlap. The same physical register can be accessed with different sizes, as the following example with RAX shows:
```asm
movq $0x123400, %rax
movb $0xFF, %al
# RAX = 0x1234FF
```
Note
The assembly code above moves a literal value into the 64-bit accumulator register, RAX. Writing to its lower byte, AL, also changes RAX, because both names refer to the same physical register. Only bits 0 to 7 are replaced, so the result is RAX = 0x1234FF.
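To make the overlap concrete, here is a simplified C model of how writes to the sub-registers of RAX affect the full 64-bit value on x86-64. It is a sketch that only does bit manipulation on a `uint64_t`, not real hardware access. The `write_*` function names are hypothetical.

```c
#include <stdint.h>

/* movb $imm, %al: replaces bits 0-7, keeps bits 8-63. */
static uint64_t write_al(uint64_t rax, uint8_t value)
{
    return (rax & ~UINT64_C(0xFF)) | value;
}

/* movb $imm, %ah: replaces bits 8-15, keeps everything else. */
static uint64_t write_ah(uint64_t rax, uint8_t value)
{
    return (rax & ~UINT64_C(0xFF00)) | ((uint64_t)value << 8);
}

/* movw $imm, %ax: replaces bits 0-15, keeps bits 16-63. */
static uint64_t write_ax(uint64_t rax, uint16_t value)
{
    return (rax & ~UINT64_C(0xFFFF)) | value;
}

/* movl $imm, %eax: in 64-bit mode, a 32-bit write zero-extends,
 * so bits 32-63 are cleared instead of preserved. */
static uint64_t write_eax(uint64_t rax, uint32_t value)
{
    (void)rax; /* the old upper half is discarded */
    return (uint64_t)value;
}
```

The last function is the surprising case. 8-bit and 16-bit writes preserve the remaining bits, but a 32-bit write clears the upper 32 bits of the 64-bit register.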
General-purpose registers
| Register | 64 bits | 32 bits | 16 bits | High byte (bits 8 to 15) | Low byte (bits 0 to 7) |
|---|---|---|---|---|---|
| Accumulator | RAX | EAX | AX | AH | AL |
| Base | RBX | EBX | BX | BH | BL |
| Counter | RCX | ECX | CX | CH | CL |
| Data | RDX | EDX | DX | DH | DL |
| Source index | RSI | ESI | SI | (none) | SIL |
| Destination index | RDI | EDI | DI | (none) | DIL |
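Reading the narrower names works the other way around. Each one is a slice of the same 64-bit value. A minimal sketch for the RAX row, using hypothetical `view_*` helper names:

```c
#include <stdint.h>

/* Each helper extracts the bits that the narrower register name
 * refers to, given the full 64-bit value of RAX. */
static uint32_t view_eax(uint64_t rax) { return (uint32_t)rax; }       /* bits 0-31 */
static uint16_t view_ax(uint64_t rax)  { return (uint16_t)rax; }       /* bits 0-15 */
static uint8_t  view_ah(uint64_t rax)  { return (uint8_t)(rax >> 8); } /* bits 8-15 */
static uint8_t  view_al(uint64_t rax)  { return (uint8_t)rax; }        /* bits 0-7  */
```

With RAX = 0x1122334455667788, these give EAX = 0x55667788, AX = 0x7788, AH = 0x77 and AL = 0x88. The other rows of the table follow the same pattern.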
Instructions
Instructions are encoded values that the processor decodes and executes, changing its state. The part of the encoding that identifies the operation is called the opcode.
In assembly (in this case, the GNU Assembler with AT&T syntax), the instruction name is followed by a suffix that indicates the operand size:
| Size | Name | Suffix | Example |
|---|---|---|---|
| 64 bits | Quadword | `q` | `movq %rax, %rbx` |
| 32 bits | Long | `l` | `movl %eax, %ebx` |
| 16 bits | Word | `w` | `movw %ax, %bx` |
| 8 bits | Byte | `b` | `movb %al, %bl` |
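Warning
You can also write an instruction without a size suffix, such as `mov %rax, %rbx`. In this case, the assembler uses the size of the register operands to encode it. If no operand is a register, as in `mov $1, (%rax)`, the size is ambiguous and the assembler reports an error.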
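A toy sketch of this inference follows. It assumes only the registers from the general-purpose table above. `REGS` and `infer_suffix` are hypothetical names, and a real assembler such as GAS handles many more cases (other registers, immediates, memory operands).

```c
#include <stddef.h>
#include <string.h>

/* Register name (without %) and the size suffix it implies. */
struct reg_info {
    const char *name;
    char suffix;
};

static const struct reg_info REGS[] = {
    {"rax", 'q'}, {"rbx", 'q'}, {"rcx", 'q'}, {"rdx", 'q'}, {"rsi", 'q'}, {"rdi", 'q'},
    {"eax", 'l'}, {"ebx", 'l'}, {"ecx", 'l'}, {"edx", 'l'}, {"esi", 'l'}, {"edi", 'l'},
    {"ax", 'w'},  {"bx", 'w'},  {"cx", 'w'},  {"dx", 'w'},  {"si", 'w'},  {"di", 'w'},
    {"ah", 'b'},  {"al", 'b'},  {"bh", 'b'},  {"bl", 'b'},  {"ch", 'b'},  {"cl", 'b'},
    {"dh", 'b'},  {"dl", 'b'},  {"sil", 'b'}, {"dil", 'b'},
};

/* Returns the suffix implied by a register operand such as "%rax",
 * or 0 if the operand is not one of the registers above. */
static char infer_suffix(const char *operand)
{
    if (operand[0] == '%')
        operand++; /* accept "%rax" as well as "rax" */
    for (size_t i = 0; i < sizeof REGS / sizeof REGS[0]; i++)
        if (strcmp(REGS[i].name, operand) == 0)
            return REGS[i].suffix;
    return 0;
}
```

For a memory operand like `(%rax)`, the sketch returns 0. There, the register only holds the address, so it says nothing about the size of the data being accessed.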
Core instructions
x86 provides a large number of instructions. Some of them are privileged: only the kernel can execute them, and ordinary programs are not allowed to.
However, a handful of them are used all the time by anyone doing low-level programming.
MOV
One of the most basic instructions, and the idea is pretty simple: it copies a value from a source to a destination.
```asm
# Fill our registers
movq $50, %rbx
movq $40, %rax
# Copy the value in RAX into RBX
movq %rax, %rbx
# RAX = 40
# RBX = 40
```
Important
Despite its name, MOV does not remove the value from RAX: after the instruction, RAX still holds 40. Yeah… It is a copy.
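A minimal C model of this copy behavior follows. It uses a toy register file, not the real CPU. The `enum reg` and `movq_reg` names are hypothetical, and the parameter order mirrors AT&T syntax (source first, then destination).

```c
#include <stdint.h>

/* A toy register file indexed by register name. */
enum reg { RAX, RBX, RCX, RDX, RSI, RDI, NUM_REGS };

/* movq %src, %dst: copies the 64-bit value of src into dst.
 * The source is only read, never cleared (MOV also leaves the flags alone). */
static void movq_reg(uint64_t regs[NUM_REGS], enum reg src, enum reg dst)
{
    regs[dst] = regs[src];
}
```

Replaying the example: after setting RBX to 50 and RAX to 40, calling `movq_reg(regs, RAX, RBX)` leaves both entries at 40.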
ADD
This one takes two values, adds them together and stores the result.
In AT&T syntax, the first operand is the source: its value is added. The second operand is the destination: it receives the sum of both values, replacing its previous content.
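The rule above can be sketched as a small C model for 32-bit operands (the `l` suffix). It also tracks the carry flag (CF), which x86 sets when an unsigned addition overflows. Only CF is modeled; real ADD updates several other flags too. `addl_model` and `struct add_result` are hypothetical names.

```c
#include <stdint.h>

/* New destination value plus the modeled carry flag. */
struct add_result {
    uint32_t value;
    int carry; /* CF: 1 if the unsigned sum did not fit in 32 bits */
};

/* addl %src, %dst  =>  dst = dst + src (AT&T order: source first).
 * uint32_t arithmetic wraps modulo 2^32, like a 32-bit register. */
static struct add_result addl_model(uint32_t src, uint32_t dst)
{
    struct add_result r;
    r.value = dst + src;
    r.carry = r.value < dst; /* wrapped around => unsigned overflow */
    return r;
}
```

With the values from the example, `addl_model(10, 50)` gives 60 with no carry. Adding 1 to 0xFFFFFFFF wraps to 0 and sets the carry.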
```asm
# Fill the source and destination values
movl $10, %esi
movl $50, %edi
# Add the source's value to the destination's
addl %esi, %edi
# ESI = 10
# EDI = 60
```
SUB
Used to subtract the source value from the destination, storing the result in the destination.
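A C sketch of the same rule for 32-bit operands follows. It also models two of the flags SUB sets: the carry flag (CF), which acts as a borrow when the source is larger than the destination as unsigned values, and the zero flag (ZF). `subl_model` and `struct sub_result` are hypothetical names.

```c
#include <stdint.h>

struct sub_result {
    uint32_t value;
    int borrow; /* CF: 1 when src > dst as unsigned values */
    int zero;   /* ZF: 1 when the result is 0 */
};

/* subl %src, %dst  =>  dst = dst - src (AT&T order: source first). */
static struct sub_result subl_model(uint32_t src, uint32_t dst)
{
    struct sub_result r;
    r.value = dst - src; /* wraps modulo 2^32 */
    r.borrow = src > dst;
    r.zero = r.value == 0;
    return r;
}
```

This is also the idea behind CMP. It performs the same subtraction but discards the result, keeping only the flags.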
```asm
movl $20, %edi
subl $10, %edi
# EDI = 10
```
INC
Used to increment a value by one.
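A C sketch of INC on a 32-bit value follows. One detail distinguishes it from `add $1`: INC does not modify the carry flag (CF), so the model passes the incoming carry through untouched. Only ZF and CF are modeled here. `incl_model` and `struct inc_result` are hypothetical names.

```c
#include <stdint.h>

struct inc_result {
    uint32_t value;
    int zero;  /* ZF: 1 when the result is 0 */
    int carry; /* CF: INC leaves it as it was */
};

/* incl %dst  =>  dst = dst + 1. */
static struct inc_result incl_model(uint32_t dst, int carry_in)
{
    struct inc_result r;
    r.value = dst + 1u; /* 0xFFFFFFFF wraps around to 0 */
    r.zero = r.value == 0;
    r.carry = carry_in; /* unchanged, unlike ADD */
    return r;
}
```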
```asm
movl $60, %edi
incl %edi
# EDI = 61
```
DEC
Used to decrement a value by one.
```asm
movl $10, %edi
decl %edi
# EDI = 9
```
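DEC is often used as a loop counter, because it sets the zero flag when the value reaches 0. The C sketch below models DEC on a 32-bit value and then replays a common countdown loop written with DEC and a conditional jump. The loop is a hypothetical illustration, and `decl_model` and `countdown_iterations` are hypothetical names.

```c
#include <stdint.h>

/* decl %dst  =>  dst = dst - 1 (wraps from 0 to 0xFFFFFFFF). */
static uint32_t decl_model(uint32_t dst)
{
    return dst - 1u;
}

/* Models this loop and returns how many times the body runs:
 *         movl $n, %ecx
 *   top:  ...loop body...
 *         decl %ecx
 *         jnz  top      # jump back while ECX != 0 (ZF == 0)
 * Assumes n > 0; with n == 0 the counter would wrap around. */
static unsigned countdown_iterations(uint32_t n)
{
    unsigned runs = 0;
    uint32_t ecx = n;
    do {
        runs++;                /* loop body */
        ecx = decl_model(ecx); /* decl %ecx */
    } while (ecx != 0);        /* jnz top */
    return runs;
}
```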