x86 Assembler using MASM32 - 32-bit Assembler Basics






In this tutorial we will have a look at the basics of 32-bit x86 assembler programming using MASM32. We will review what we need to know to get started and after this tutorial you will be able to start programming right away. First of all we need to understand the basics of how a computer counts and how it relates to assembler.

The decimal numbering system that we use every day has what's called a base of 10, that is to say, it uses 10 different digits (0 to 9) to write every number. The binary numbering system that computers use has a base of 2, which means that there are only 2 digits in use, 0 and 1. A bit, or binary digit, is the smallest unit of data; it can have the value 0 or 1.

Binary numbers are arranged in collections of bits with the most basic of these being the byte which is 8 bits. Since there are two options for each of 8 bits the total number of combinations of the 8 bits is 2^8 = 256. This means that in decimal 1 byte can cover the values ranging from 0 to 255. As we add more bits the range increases and it also differs when we include negative (or signed) values. The following table gives an idea of the different sizes and their associated ranges. The table also includes the names of the data types and the codes that you use to define them in assembler programming.

Name                 Code     Bits   Bytes   Range
Byte                 DB       8      1       0 to 255
Signed Byte          SBYTE    8      1       -128 to 127
Word                 DW       16     2       0 to 65,535
Signed Word          SWORD    16     2       -32,768 to 32,767
Double Word          DD       32     4       0 to 4,294,967,295
Signed Double Word   SDWORD   32     4       -2,147,483,648 to 2,147,483,647
Far Word             DF       48     6       0 to 281,474,976,710,655
Quad Word            DQ       64     8       0 to 18,446,744,073,709,551,615
Ten Byte             DT       80     10      0 to 1,208,925,819,614,629,174,706,175
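
The ranges in the table follow directly from the number of bits. As a quick check, here is a minimal sketch in Python (used here only as a calculator, it is not MASM code). The function names are made up for this illustration, and the signed ranges assume two's-complement representation, which is what x86 processors use.

```python
def unsigned_range(bits):
    """Return (min, max) for an unsigned integer of the given width."""
    return 0, 2**bits - 1


def signed_range(bits):
    """Return (min, max) for a two's-complement signed integer."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


# Print the ranges for the three sizes we will use most often.
for name, bits in [("byte", 8), ("word", 16), ("double word", 32)]:
    print(name, unsigned_range(bits), signed_range(bits))
```

Half of the signed range is given to negative values, which is why a signed byte stops at 127 instead of 255.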

Computers that use a 32-bit memory space can address 2^32 = 4,294,967,296 bytes of memory, which is 4 GB. We will be using this type of memory space in our assembler programming. Please don't worry if you have a 64-bit computer with more memory; 32-bit assembler will still work.


Another important numbering system is the hexadecimal numbering system, which has 16 digits: 0-9 followed by A-F. Each hexadecimal digit represents a nibble (4 bits), which is half of a byte (8 bits), so hexadecimal gives a far more readable representation of binary-coded values than binary itself. For example, a single byte can have values ranging from 00000000 to 11111111 in binary form, but this is more conveniently written as 00 to FF in hexadecimal. Converting between numbering systems is not that difficult either. We will now have a look at converting a binary number to hexadecimal.

Consider the binary number 1101 and the following table.


Binary 1 1 0 1
Decimal 8 4 2 1

Each bit in the binary number stands for a particular decimal value, as shown in the table above. So the binary number 1101 represents in decimal (1x8 + 1x4 + 0x2 + 1x1) = 13.

In hexadecimal, 0-9 are the same as in decimal and A=10, B=11, C=12, D=13, E=14 and F=15, so the binary number 1101 = 13 in decimal and D in hex. Let's try a bigger number. How about 10101101?


Binary 1 0 1 0 1 1 0 1
Decimal 8 4 2 1 8 4 2 1

The last four bits (1101) are the same as before, so they give D. To calculate the first four we use the same method: 1010 = 1x8 + 0x4 + 1x2 + 0x1 = 10 = A. So the binary number 10101101 = AD in hex, and that is how easy it is to convert between binary and hexadecimal.
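
The method above can be sketched in a few lines of Python (again used only as a calculator, not as MASM code). The names binary_to_hex and nibble_to_hex are made up for this illustration. The sketch splits the binary number into groups of four bits and adds up the 8, 4, 2, 1 place values for each group.

```python
HEX_DIGITS = "0123456789ABCDEF"
PLACE_VALUES = (8, 4, 2, 1)


def nibble_to_hex(nibble):
    """Convert a 4-character binary string such as '1101' to one hex digit."""
    total = sum(value for bit, value in zip(nibble, PLACE_VALUES) if bit == "1")
    return HEX_DIGITS[total]


def binary_to_hex(binary):
    """Convert a binary string to hex by handling 4 bits at a time."""
    # Pad with zeros on the left so the length is a multiple of 4.
    padded = binary.zfill((len(binary) + 3) // 4 * 4)
    return "".join(nibble_to_hex(padded[i:i + 4]) for i in range(0, len(padded), 4))


print(binary_to_hex("1101"))      # D
print(binary_to_hex("10101101"))  # AD
```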



The 32-bit CPU Registers




The Data Registers are EAX, EBX, ECX and EDX, which are four 32-bit locations on the CPU that are used for arithmetic, logical, and other operations. These 32-bit registers can be used in three ways:

• As registers that hold 4 bytes or 32 bits of data.
• The lower 16 bits of each 32-bit register can be used to store 2 bytes or 16 bits. These registers are referred to as AX, BX, CX and DX.
• The lower and higher halves of the above 16-bit registers can be used as separate 8-bit registers called AH, AL, BH, BL, CH, CL, DH, and DL.
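
To see how the smaller registers overlap the larger one, here is a minimal sketch in Python that uses bit masks to pick the AX, AH and AL parts out of a 32-bit value. The function name register_views and the sample value 0x12345678 are made up for this illustration; on a real CPU these are simply different names for parts of the same register.

```python
def register_views(eax):
    """Split a 32-bit value into the parts named EAX, AX, AH and AL."""
    eax &= 0xFFFFFFFF          # keep only 32 bits
    ax = eax & 0xFFFF          # lower 16 bits of EAX
    ah = (ax >> 8) & 0xFF      # higher half of AX
    al = ax & 0xFF             # lower half of AX
    return {"EAX": eax, "AX": ax, "AH": ah, "AL": al}


views = register_views(0x12345678)
for name, value in views.items():
    print(name, hex(value))
# EAX 0x12345678
# AX 0x5678
# AH 0x56
# AL 0x78
```

Note that the upper 16 bits of EAX (0x1234 here) have no register name of their own.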

Some of these registers have specific uses in certain arithmetical operations:

• AX is the primary accumulator and is used in most input/output operations as well as in most arithmetical operations.
• BX is the base register and it is used in indexed addressing.
• CX (along with ECX) is the count register and it stores the current count of any loop operation.
• DX is the data register which is also used for input/output operations and various multiply and divide operations involving large values.

The Pointer Registers

The registers EIP, ESP and EBP and their corresponding lower 16-bit portions IP, SP and BP are known as the pointer registers.

• Instruction Pointer (IP) − The 16-bit IP register stores the offset address of the next instruction to be executed. IP in association with the Code Segment (CS) register (as CS:IP) gives the complete address of that instruction in the code segment.
• Stack Pointer (SP) − The 16-bit SP register provides the offset value within the program stack. SP in association with the Stack Segment (SS) register (SS:SP) refers to the current position of data or address within the program stack.
• Base Pointer (BP) − The 16-bit BP register mainly helps in referencing the parameter variables passed to a subroutine. The address in the SS register is combined with the offset in BP to get the location of the parameter. BP can also be combined with DI and SI as a base register for special addressing.

The Index Registers

The 32-bit index registers, ESI and EDI, and their 16-bit rightmost portions, SI and DI, are used for indexed addressing and are sometimes used in addition and subtraction. There are two index registers:

• Source Index (SI or ESI) is used as source index for string operations.
• Destination Index (DI or EDI) is used as destination index for string operations.
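
As a rough picture of what a string operation does with these two registers, here is a minimal sketch in Python of a repeated byte copy. The function copy_string and the bytearray standing in for memory are made up for this illustration. It assumes that the count is held in ECX and that the Direction Flag (DF), described in the flags section, decides whether the indexes move up or down.

```python
def copy_string(memory, esi, edi, ecx, df=0):
    """Mimic a repeated byte copy: move ECX bytes from [ESI] to [EDI]."""
    step = 1 if df == 0 else -1   # DF=0 moves upward, DF=1 moves downward
    while ecx > 0:
        memory[edi] = memory[esi]  # copy one byte from source to destination
        esi += step
        edi += step
        ecx -= 1
    return esi, edi, ecx


# Ten bytes of "memory": the text HELLO followed by five empty bytes.
memory = bytearray(b"HELLO") + bytearray(5)
esi, edi, ecx = copy_string(memory, esi=0, edi=5, ecx=5)
print(memory)          # bytearray(b'HELLOHELLO')
print(esi, edi, ecx)   # 5 10 0
```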

The Control Registers or Flags

The 32-bit instruction pointer register and the 32-bit flags register combined are considered as the control registers.

Many instructions involve comparisons and mathematical calculations that change the status of the flags, and some other conditional instructions test the value of these status flags to take the control flow to another location.

The common flag bits are:

• Overflow Flag (OF) − Indicates that the result of a signed arithmetic operation was too large or too small to fit in the destination, so the leftmost (sign) bit is wrong.
• Direction Flag (DF) − Determines left or right direction for moving or comparing string data. When the DF value is 0 the string operation takes left-to-right direction and when the value is set to 1 the string operation takes right-to-left direction.
• Interrupt Flag (IF) − Determines whether external interrupts such as keyboard entry are to be ignored or processed. It disables external interrupts when the value is 0 and enables them when set to 1.
• Trap Flag (TF) − Allows setting the operation of the processor in single-step mode. A debugger sets the trap flag so that you can step through the execution one instruction at a time.
• Sign Flag (SF) − Shows the sign of the result of an arithmetic operation. The sign is indicated by the leftmost bit of the result. A positive result clears the value of SF to 0 and a negative result sets it to 1.
• Zero Flag (ZF) − Indicates the result of an arithmetic or comparison operation. A non-zero result clears the zero flag to 0, and a zero result sets it to 1.
• Auxiliary Carry Flag (AF) − Set when an arithmetic operation causes a carry from bit 3 into bit 4; used for specialized arithmetic such as binary-coded decimal.
• Parity Flag (PF) − Indicates whether the number of 1-bits in the lowest byte of the result is even or odd. An even number of 1-bits sets the parity flag to 1 and an odd number of 1-bits clears the parity flag to 0.
• Carry Flag (CF) − Contains the carry of 0 or 1 out of the leftmost bit after an arithmetic operation. It also stores the last bit moved out by a shift or rotate operation.
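
To get a feel for the status flags, here is a minimal sketch in Python that adds two 8-bit values and works out the flags the way an 8-bit ADD would. The function name add8_flags is made up for this illustration, and only the six arithmetic flags are modelled. It follows the x86 convention that PF is 1 when the result byte contains an even number of 1-bits.

```python
def add8_flags(a, b):
    """Add two 8-bit values and return (result, flags) like an 8-bit ADD."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF   # only 8 bits fit in the destination
    flags = {
        "CF": int(total > 0xFF),                      # carry out of the leftmost bit
        "ZF": int(result == 0),                       # result is zero
        "SF": (result >> 7) & 1,                      # copy of the leftmost bit
        "AF": int((a & 0x0F) + (b & 0x0F) > 0x0F),    # carry from bit 3 into bit 4
        "PF": int(bin(result).count("1") % 2 == 0),   # 1 when the count of 1-bits is even
        # Signed overflow: both operands have a sign that the result does not have.
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }
    return result, flags


print(add8_flags(0x7F, 0x01))  # 127 + 1 does not fit in a signed byte, so OF is 1
print(add8_flags(0xFF, 0x01))  # 255 + 1 wraps around to 0, so CF and ZF are 1
```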

The Stack


The stack has the following properties:

• The stack is an area of memory for storing data temporarily.
• The data is said to be pushed onto, and popped from the top of the stack.
• To retrieve something from the stack, you must remove (pop) anything that was added after it.
• Stacks are often referred to as LIFO buffers, from their last-in-first-out operation.
• A program continually uses its stack to temporarily store and preserve return addresses, procedure arguments, memory data, flags, and registers.
• The operating system initializes ESP to the address of the first byte above the stack.
• The PUSH instruction stores its operand on the stack.
• The POP instruction retrieves the most recent pushed value.
• The ESP register always points to the top of the stack.
• As the program adds data to the stack, the stack grows downward from high memory to low memory.
• When items are removed from the stack, the stack shrinks upward from low to high memory.
• When a word value is pushed onto the stack, the processor decreases the ESP (Stack Pointer) register by 2.
• When a word value is popped off the stack, the processor increases the ESP register by 2.
• When a double word value is pushed onto or popped off the stack, the processor decreases or increases the ESP register by 4.
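
The behaviour of ESP described above can be modelled with a small sketch in Python. The class Stack32 and the starting address 0x1000 are made up for this illustration, and only double word pushes and pops are modelled. A real stack is just a region of memory that the processor addresses through ESP.

```python
class Stack32:
    """Toy model of a downward-growing stack addressed by ESP."""

    def __init__(self, top=0x1000):
        self.esp = top        # hypothetical address of the first byte above the stack
        self.memory = {}      # address -> 32-bit value

    def push(self, value):
        self.esp -= 4                               # a double word takes 4 bytes
        self.memory[self.esp] = value & 0xFFFFFFFF  # store at the new top

    def pop(self):
        value = self.memory.pop(self.esp)           # read the value at the top
        self.esp += 4                               # the stack shrinks upward
        return value


stack = Stack32()
stack.push(0x11111111)
stack.push(0x22222222)
print(hex(stack.esp))    # 0xff8
print(hex(stack.pop()))  # 0x22222222
print(hex(stack.esp))    # 0xffc
```

The value pushed last comes off first, which is the last-in-first-out behaviour from the list above.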

So, I think that is all I want to say at this stage about the basics of assembler programming. As we move forward we will talk about the other stuff that you need to know and how the ideas presented here are actually used in the process of developing programs.

In the next tutorial we will create the default program template for Visual MASM that we will use for every program that we write, so until then, enjoy.