The x86 architecture

Exploring the x86 architecture


Overview

This article aims to explore the x86 architecture, especially its 64-bit version, mainly as I understand it.

Although I am basing it on the Intel manuals, some mistakes may have slipped through. If you find one, feel free to let me know.


Why x86 matters

What’s x86?

Basically, x86 is a family of Instruction Set Architectures (ISAs) implemented by many different processors.

Note
An ISA is a model that defines how software and hardware interact: it specifies a set of instructions, data types, and memory access mechanisms.

It was introduced by Intel in 1978, starting with the historic 8086 microprocessor.

Why does it still dominate?

If you are reading this article on a desktop, your processor is very likely x86-based. That’s the answer: most modern desktops and servers are built on the x86 processor family.

There are several ISAs that aim to be modern alternatives to x86.

ARM is one of those. It’s widely used in mobile devices, embedded systems, and servers due to its versatility.


Versions context

Over the years, many improved versions of the original 8086 emerged. Considering that it was a 16-bit microprocessor, it’s not difficult to see why that was necessary.

8086

Now that we know the 8086 is a 16-bit microprocessor, it is important to point out that its registers and its data bus were limited to 16 bits. Its address bus, however, was 20 bits wide.

Important

A bus is any communication interface used to transfer data between components of a system, including the processor.

20-bit addresses

Despite being a 16-bit microprocessor, the 8086 could form 20-bit addresses by combining segment registers with offsets.

Each segment register is 16 bits wide, and a 16-bit offset can reach 2^16 bytes = 64 KiB within a segment. The segment value is shifted left by 4 bits and added to the 16-bit offset to compute the physical address:

(Segment << 4) + Offset
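To make the formula concrete, here is a minimal Python sketch. It is my own illustration, not something taken from the Intel manuals. The name `physical_address` is made up for this example, and the final 20-bit mask is an addition of mine: it reflects the 20 address lines of the 8086, which make the sum wrap around at 1 MiB.

```python
def physical_address(segment: int, offset: int) -> int:
    """Real-mode address: (segment << 4) + offset, kept to 20 bits."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # Assumption: model the 8086's 20 address lines by wrapping at 1 MiB.
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(physical_address(0x1234, 0x0010)))  # 0x12350
print(hex(physical_address(0x1235, 0x0000)))  # 0x12350 (same byte, different pair)
print(hex(physical_address(0xFFFF, 0xFFFF)))  # 0xffef (wrapped past 1 MiB)
```

Notice that different segment:offset pairs can point to the same physical byte, because segments start every 16 bytes and overlap each other.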

80386

Intel’s first 32-bit x86 processor, released in 1985. It was a remarkable evolution, extending the ISA to 32 bits and increasing the clock speed.

Tip

The clock speed measures how many cycles the processor completes per second. It’s measured in hertz (Hz) and its multiples, such as MHz and GHz.

In each cycle, the processor can make progress on decoding and executing instructions.

AMD Opteron

The first 64-bit x86 processor, released in 2003. It introduced the 64-bit version of x86 (known as x86-64, x64, or AMD64), enabling 64-bit computing.

This was a major expansion of the capabilities of x86 processors, enabling support for much larger amounts of memory and improving performance.

Note
Intel didn’t develop the first 64-bit x86 processor. AMD did.

ISA modes

A modern 64-bit x86 processor supports three main operating modes:

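| Mode | Size | Introduced by |
| --- | --- | --- |
| Real mode | 16 bits | Intel 8086 |
| Protected mode | 32 bits | Intel 80386 (a 16-bit protected mode first appeared on the Intel 80286) |
| Long mode | 64 bits | AMD Opteron |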


Registers

Registers are small storage locations inside the CPU. Unlike RAM, they are extremely fast and are used directly by instructions.

One of the most confusing aspects of x86 is that registers are not independent; they overlap.

That’s because the same physical register can be accessed at different sizes, as the following example with RAX shows.

rax
movq $0x123400, %rax
movb $0xFF, %al

# RAX = 0x1234FF

Note

The assembly code above loads a literal value into RAX, the 64-bit view of the accumulator register.

Writing to AL, its lowest byte, changes the same physical register, so the value of RAX changes too.

$ marks a literal (immediate) value.
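The RAX example above can be modeled with a few bit masks. This is only a sketch in Python: the helper names `write_al`, `write_ax`, and `write_eax` are hypothetical, invented for illustration. It also shows one detail the text above does not cover: in 64-bit mode, writing the 32-bit view (EAX) clears the upper 32 bits of RAX, while writing the 8-bit or 16-bit views leaves the remaining bits alone.

```python
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def write_al(rax: int, value: int) -> int:
    """Replace bits 0-7 of RAX; the other 56 bits are preserved."""
    return (rax & ~0xFF & MASK64) | (value & 0xFF)


def write_ax(rax: int, value: int) -> int:
    """Replace bits 0-15 of RAX; the other 48 bits are preserved."""
    return (rax & ~0xFFFF & MASK64) | (value & 0xFFFF)


def write_eax(rax: int, value: int) -> int:
    """Replace bits 0-31 of RAX; the upper 32 bits are zeroed."""
    return value & 0xFFFF_FFFF


print(hex(write_al(0x123400, 0xFF)))                 # 0x1234ff
print(hex(write_ax(0x1122334455667788, 0xBEEF)))     # 0x112233445566beef
print(hex(write_eax(0x1122334455667788, 0xBEEF)))    # 0xbeef
```

The first line reproduces the movq/movb pair from the example: only the low byte changes.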

General-purpose registers

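| Type | 64 bits | 32 bits | 16 bits | High byte (bits 8-15) | Low byte (bits 0-7) |
| --- | --- | --- | --- | --- | --- |
| Accumulator | RAX | EAX | AX | AH | AL |
| Base | RBX | EBX | BX | BH | BL |
| Counter | RCX | ECX | CX | CH | CL |
| Data | RDX | EDX | DX | DH | DL |
| Source | RSI | ESI | SI | n/a | SIL |
| Destination | RDI | EDI | DI | n/a | DIL |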
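Reading the table from left to right, each narrower name is just a window onto the low bits of the 64-bit register. Here is a small Python sketch of that idea for the accumulator. The function name `views` is hypothetical, used only for this illustration.

```python
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def views(rax: int) -> dict:
    """Return every named view of the accumulator for a given RAX value."""
    rax &= MASK64
    return {
        "RAX": rax,                 # all 64 bits
        "EAX": rax & 0xFFFF_FFFF,   # bits 0-31
        "AX": rax & 0xFFFF,         # bits 0-15
        "AH": (rax >> 8) & 0xFF,    # bits 8-15
        "AL": rax & 0xFF,           # bits 0-7
    }


for name, value in views(0x1122334455667788).items():
    print(name, hex(value))
# RAX 0x1122334455667788
# EAX 0x55667788
# AX 0x7788
# AH 0x77
# AL 0x88
```

The same pattern applies to the other rows, except that RSI and RDI have no high-byte view.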


Instructions

Instructions are encoded values that the processor decodes and executes, changing its state. The part of the encoding that identifies the operation is called the opcode.

In assembly, in this case with the GNU Assembler (AT&T syntax), the instruction mnemonic is followed by a suffix that indicates the operand size:

| Size | Name | Suffix | Example |
| --- | --- | --- | --- |
| 64 bits | Quadword | q | movq |
| 32 bits | Long | l | addl |
| 16 bits | Word | w | andw |
| 8 bits | Byte | b | orb |

Warning

You can write an instruction without the size suffix, but it’s not recommended.

In that case, the assembler infers the operand size from the register operands, which becomes ambiguous when no register is involved.
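One way to think about the suffix is as a choice of how many low bits the instruction works on. The Python sketch below restates the table as a lookup and truncates a value to the chosen width. `SUFFIX_BITS` and `truncate` are hypothetical names for this illustration, not part of the GNU Assembler.

```python
# Operand width in bits for each GNU Assembler size suffix (from the table above).
SUFFIX_BITS = {"q": 64, "l": 32, "w": 16, "b": 8}


def truncate(value: int, suffix: str) -> int:
    """Keep only the low bits that fit in an operand of the given suffix."""
    bits = SUFFIX_BITS[suffix]
    return value & ((1 << bits) - 1)


print(hex(truncate(0x1234_5678_9ABC_DEF0, "q")))  # 0x123456789abcdef0
print(hex(truncate(0x1234_5678_9ABC_DEF0, "l")))  # 0x9abcdef0
print(hex(truncate(0x1234_5678_9ABC_DEF0, "w")))  # 0xdef0
print(hex(truncate(0x1FF, "b")))                  # 0xff
```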

Core instructions

x86 provides a large number of instructions. Some of them are privileged: only the kernel can execute them, because of what they do.

However, a handful of them are among the most used in low-level programming.

MOV

One of the most basic instructions. The idea is pretty simple.

It takes a source value and moves it to a destination.

# Fill our registers
movq $50, %rbx
movq $40, %rax

# Moves the value in RAX to RBX
movq %rax, %rbx

# RAX = 40
# RBX = 40

Important

The value in RAX is not exactly moved to RBX. It remains there.

Yeah... It is a copy.
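The copy behavior is easy to see in a toy model. This Python sketch keeps the registers in a dictionary and replays the three movq instructions from the example. The `mov` helper is hypothetical and only handles immediates and registers, not memory operands.

```python
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def mov(regs: dict, src, dst: str) -> None:
    """Copy src into register dst; src is a register name or an immediate."""
    value = regs[src] if isinstance(src, str) else src
    regs[dst] = value & MASK64  # the source register is left untouched


regs = {"rax": 0, "rbx": 0}
mov(regs, 50, "rbx")     # movq $50, %rbx
mov(regs, 40, "rax")     # movq $40, %rax
mov(regs, "rax", "rbx")  # movq %rax, %rbx

print(regs)  # {'rax': 40, 'rbx': 40}
```

The arguments are written source first, destination second, to match the AT&T operand order used in this article.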

ADD

This one takes two values, adds them, and stores the result.

The first operand is the source: the value to be added.

The second operand is the destination: the source is added to it, and it receives the result.

# Fill the source and destination values
movl $10, %esi
movl $50, %edi

# Add the source's value to the destination's
addl %esi, %edi

# ESI = 10
# EDI = 60
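As a rough model of what addl does to the destination, here is a short Python sketch. The function name `addl` simply mirrors the mnemonic. The sketch makes one addition of my own: the result is truncated to 32 bits, so it wraps around on overflow. The real instruction also updates the flags register, which this sketch ignores.

```python
MASK32 = 0xFFFF_FFFF


def addl(src: int, dst: int) -> int:
    """Return the new destination value: (dst + src) truncated to 32 bits."""
    return (dst + src) & MASK32


esi, edi = 10, 50
edi = addl(esi, edi)  # addl %esi, %edi
print(esi, edi)       # 10 60

# Overflow wraps around instead of growing past 32 bits.
print(hex(addl(1, 0xFFFF_FFFF)))  # 0x0
```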

SUB

Subtracts the source value from the destination and stores the result in the destination.

movl $20, %edi
subl $10, %edi

# EDI = 10

INC

Increments a value by one.

movl $60, %edi
incl %edi

# EDI = 61

DEC

Decrements a value by one.

movl $10, %edi
decl %edi

# EDI = 9
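SUB, INC, and DEC can be modeled the same way. The Python sketch below is an illustration under the same assumptions as before: the function names mirror the mnemonics, results are truncated to 32 bits, and the flags the real instructions set are not modeled.

```python
MASK32 = 0xFFFF_FFFF


def subl(src: int, dst: int) -> int:
    """Return (dst - src) truncated to 32 bits."""
    return (dst - src) & MASK32


def incl(dst: int) -> int:
    """Return dst + 1 truncated to 32 bits."""
    return (dst + 1) & MASK32


def decl(dst: int) -> int:
    """Return dst - 1 truncated to 32 bits."""
    return (dst - 1) & MASK32


print(subl(10, 20))       # 10
print(incl(60))           # 61
print(decl(10))           # 9

# Going below zero wraps around (two's complement).
print(hex(subl(30, 20)))  # 0xfffffff6
print(hex(decl(0)))       # 0xffffffff
```

The last two lines show why an unsigned 32-bit register never holds a negative number: the bit pattern 0xFFFFFFF6 is what -10 looks like in two's complement.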