Overview

This article aims to demystify the x86 architecture, especially its 64-bit version, mainly from the way I understand it.

Although I am basing it on the Intel manuals, some mistakes may have slipped through. If you find one, feel free to let me know. :)

Why x86 matters

What’s x86?

Basically, x86 is a family of Instruction Set Architectures (ISAs) implemented by many different processors.

An ISA is a model that defines how software and hardware interact with each other: it specifies a set of instructions, data types, and memory access techniques.

x86 was introduced by Intel in 1978, starting with the historic 8086 microprocessor.


Why does it still dominate?

If you are reading this article on a desktop, your processor’s ISA is very probably x86. That’s the answer: most modern personal desktops and servers are based on the x86 processor family.

There are several ISAs that aim to be modern alternatives to x86.

ARM is one of those. It’s widely used in mobile devices, embedded systems, and servers due to its versatility.

Versions context

As the years went by, many improved versions of the original 8086 emerged. Considering that it was a 16-bit microprocessor, it’s not difficult to imagine why that was necessary.


8086

As we know, the 8086 is a 16-bit microprocessor, which means its registers and its data bus were 16 bits wide.

A register is a small storage location inside the processor, used to hold values temporarily for quick access during processing. I’ll go into more detail later.

A bus is any communication interface used to transfer data between components of a system, including the processor.


20-bit addresses

Although it was a 16-bit microprocessor, the 8086 could form 20-bit addresses by combining segment registers and offsets.

Each segment register is 16 bits wide, and each segment spans 64 KiB (\(2^{16}\) bytes). The segment value is left-shifted by 4 bits and added to the 16-bit offset value to compute the physical address: \((Segment << 4) + Offset\).
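To make the formula concrete, here is a minimal C sketch of the calculation. The function name `physical_address` and the sample values are my own, chosen only for illustration. The final mask assumes the 8086's 20 address lines, so a sum above 0xFFFFF wraps around.

```c
#include <stdint.h>

/* Hypothetical helper: computes (segment << 4) + offset.
 * Only the low 20 bits are kept, assuming the 8086's 20 address lines. */
uint32_t physical_address(uint16_t segment, uint16_t offset)
{
    uint32_t linear = ((uint32_t)segment << 4) + offset;
    return linear & 0xFFFFFu;
}
```

For example, `physical_address(0x1234, 0x0010)` gives 0x12350, and so does `physical_address(0x1235, 0x0000)`: different segment and offset pairs can point to the same physical byte.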


80386

The first 32-bit x86 processor from Intel, released in 1985. It was a remarkable evolution, updating the ISA and increasing the clock speed.

A modern x86 processor does not start in 64-bit mode; software has to switch it there. I’ll comment on that in a later topic.

AMD Opteron

The first 64-bit x86 processor, released in 2003. It introduced the x86-64 (or x64) ISA, allowing 64-bit computing.

It was a major expansion of the capabilities of x86 processors, enabling support for larger amounts of memory and improving performance.

Note that it was AMD, not Intel, that developed the first 64-bit x86 processor.

ISA modes

A modern 64-bit x86 processor supports three main operating modes:

Mode            Size     Introduced by
Real mode       16 bits  Intel 8086
Protected mode  32 bits  Intel 80386
Long mode       64 bits  AMD Opteron

Registers

One of the most confusing aspects of x86 is that registers are not independent; they overlap.

The same physical register can be accessed at different sizes, as the following example shows.

Figure 1. Example with RAX
Example
movq $0x123456, %rax   /* load the immediate into the full 64-bit register */
movb $0xFF, %al        /* overwrite only the lowest byte */
/* RAX = 0x1234FF */

The assembly code above loads a value into the 64-bit version of the accumulator register.

Because AL is the lowest byte of that same physical register, writing to AL also modifies RAX.
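The overlap can also be modeled outside of assembly. The following C sketch is only an illustration, not how the hardware is implemented: it treats the register as a plain 64-bit value, and the helper names (`view_eax`, `view_ax`, `view_ah`, `view_al`, `write_al`) are hypothetical.

```c
#include <stdint.h>

/* Model of one general-purpose register as a plain 64-bit value.
 * The narrower names are just views of its low bits. */
uint32_t view_eax(uint64_t rax) { return (uint32_t)rax; }        /* bits 31..0 */
uint16_t view_ax(uint64_t rax)  { return (uint16_t)rax; }        /* bits 15..0 */
uint8_t  view_ah(uint64_t rax)  { return (uint8_t)(rax >> 8); }  /* bits 15..8 */
uint8_t  view_al(uint64_t rax)  { return (uint8_t)rax; }         /* bits 7..0  */

/* Writing AL replaces only the lowest byte; every other bit of RAX is kept. */
uint64_t write_al(uint64_t rax, uint8_t value)
{
    return (rax & ~(uint64_t)0xFF) | value;
}
```

With the values from the example above, `write_al(0x123456, 0xFF)` returns 0x1234FF. Reading that result back through the narrower views gives 0x34FF for AX, 0x34 for AH and 0xFF for AL.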


Core registers

Type         64 bits  32 bits  16 bits  Higher byte  Lower byte
Accumulator  RAX      EAX      AX       AH           AL
Base         RBX      EBX      BX       BH           BL
Counter      RCX      ECX      CX       CH           CL
Data         RDX      EDX      DX       DH           DL
Source       RSI      ESI      SI       -            SIL
Destination  RDI      EDI      DI       -            DIL


Instructions

Instructions are specially encoded values that the processor decodes and executes.

In assembly, in this case the GNU Assembler, instruction mnemonics carry a suffix that indicates the size of the data they operate on:

Size     Name      Symbol
64 bits  Quadword  q
32 bits  Long      l
16 bits  Word      w
8 bits   Byte      b

Example of usage
movq %rbx, %rax
movb $8, %al
The $ prefix marks a literal (immediate) value.

Core instructions

Data movement
  • mov: Copies data from a source to a destination.

Although it is most commonly used between registers, it is not restricted to them: an operand can also be an immediate value or a memory location.
movq $40, %rax
movq %rax, %rbx
Arithmetic
  • add: Adds the source value to the destination.

movl $50, %edi
addl $10, %edi

  • sub: Subtracts the source value from the destination.

movl $20, %edi
subl $10, %edi

  • inc: Used to increment a value by one.

movl $60, %edi
incl %edi

  • dec: Used to decrement a value by one.

movl $10, %edi
decl %edi
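
To double-check the results of the four snippets above, here is a small C model of the 32-bit forms. The C function names simply mirror the mnemonics and are hypothetical stand-ins. The parameters follow the operand order used in this article's GNU Assembler examples, source first and destination second, and the sketch ignores the flags that the real instructions also update.

```c
#include <stdint.h>

/* Hypothetical C stand-ins for addl, subl, incl and decl.
 * Each takes the current destination value and returns the new one. */
uint32_t addl(uint32_t src, uint32_t dst) { return dst + src; }  /* dst = dst + src */
uint32_t subl(uint32_t src, uint32_t dst) { return dst - src; }  /* dst = dst - src */
uint32_t incl(uint32_t dst) { return dst + 1u; }                 /* dst = dst + 1   */
uint32_t decl(uint32_t dst) { return dst - 1u; }                 /* dst = dst - 1   */
```

Following the snippets in order, `addl(10, 50)` leaves 60 in EDI, `subl(10, 20)` leaves 10, `incl(60)` leaves 61 and `decl(10)` leaves 9.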