[Book] PC Assembly Language
Background
It has been a while since my last book-notes article.
If you have never read one of them, here is a short introduction to the format.
These articles are where I keep notes on the books I read. Unlike my other articles, such as the malware analyses, this one is a “notebook”.
The content will be continuously updated.
Introduction
I started reading PC Assembly Language to strengthen my understanding of low-level execution, which is essential for analyzing bootkits (e.g., Petya), rootkits, and memory corruption vulnerabilities.
Unlike typical summaries, this article serves as a long-term notebook. However, I will also highlight concepts that are directly useful for malware analysis.
The content will be continuously updated as I read through the book.
The book
Why This Matters for Malware Analysis
Assembly language is not just another programming language; it is the ground truth of how programs execute.
For example:
- Bootkits run before the OS is loaded, so analyzing them requires an understanding of low-level execution
- Shellcode directly manipulates registers and memory
- Reverse engineering often requires reading compiler output in assembly
Therefore, understanding calling conventions, stack layout, and register usage is critical.
Chapter 1 - Introduction
Key Concept: C Calling Convention
One important concept introduced in this chapter is the cdecl calling convention.
This convention defines:
- How arguments are passed (stack)
- Who cleans the stack (caller)
- How return values are passed (EAX)
This is extremely important in reverse engineering because many malware samples rely on standard calling conventions.
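To make the three rules concrete, here is a minimal sketch in C that models a cdecl call on a toy stack. Everything in it (the `Cpu` struct, `push`, `callee_sub`, `call_sub_cdecl`, the dummy return address) is a hypothetical name made up for illustration, not code from the book. It also assumes the usual cdecl detail that arguments are pushed right to left, which the notes above do not mention, and it reads arguments relative to ESP instead of setting up EBP.

```c
#include <stdint.h>

/* Toy model of a 32-bit stack machine; all names are hypothetical. */
enum { STACK_WORDS = 16 };

typedef struct {
    uint32_t mem[STACK_WORDS]; /* stack memory, one slot per dword        */
    int      esp;              /* index of the top of stack (grows down)  */
    uint32_t eax;              /* register used for the return value      */
} Cpu;

void cpu_init(Cpu *c) { c->esp = STACK_WORDS; c->eax = 0; }

void push(Cpu *c, uint32_t v) { c->mem[--c->esp] = v; }

/* Callee: finds its arguments above the return address and leaves the
   result in EAX. It does not remove the arguments from the stack. */
void callee_sub(Cpu *c) {
    uint32_t a = c->mem[c->esp + 1]; /* first argument  */
    uint32_t b = c->mem[c->esp + 2]; /* second argument */
    c->eax = a - b;
}

/* Caller: pushes the arguments, calls, then cleans the stack itself. */
uint32_t call_sub_cdecl(Cpu *c, uint32_t a, uint32_t b) {
    push(c, b);            /* push b  (rightmost argument first)        */
    push(c, a);            /* push a                                    */
    push(c, 0xDEADBEEFu);  /* call: pushes a (dummy) return address     */
    callee_sub(c);
    c->esp += 1;           /* ret: pops the return address              */
    c->esp += 2;           /* add esp, 8: caller removes both arguments */
    return c->eax;         /* the result travels back in EAX            */
}
```

After `cpu_init(&c)`, a call such as `call_sub_cdecl(&c, 10, 3)` leaves the stack pointer where it started. That balanced pattern (pushes before a call, an `add esp, N` right after it) is what to look for in a disassembly when guessing how many arguments a function takes.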
The skeleton program below can be used as a starting point for any program that you want to develop:
1 | |
The author of the book wrote three support files that are included by other programs and are used throughout the book:
- cdecl.h
- cdecl.c
- asm_io.inc
The source code of these files is available in this GitHub repository.
The book was published years ago, and the platform the author used differs from today's platforms, so some of the build instructions may lead to unexpected errors. After some investigation, the corrected build procedure is shown below:
1 | |
Chapter 2 - Basic Assembly Language
2.1 - Integer Operations
The program below demonstrates how to use the I/O routines:
1 | |
2.2 - Control Structures
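The adc and sbb instructions make use of the carry flag, which holds the carry or borrow left by a previous add or sub. The adc instruction performs the following operation:
    operand1 = operand1 + carry flag + operand2
The sbb instruction performs:
    operand1 = operand1 - carry flag - operand2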
How are they used? Consider the sum of 64-bit integers in EDX:EAX and EBX:ECX. The following code would store the sum in EDX:EAX:
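    add eax, ecx    ; add the lower 32 bits
    adc edx, ebx    ; add the upper 32 bits plus the carry from the lower sum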
Subtraction is very similar. The following code subtracts EBX:ECX from EDX:EAX:
    sub eax, ecx    ; subtract the lower 32 bits
    sbb edx, ebx    ; subtract the upper 32 bits and the borrow
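For larger numbers, a loop can be used. In a summing loop it is convenient to use the adc instruction on every iteration, including the first one; to make that work, clear the carry flag with the clc instruction right before the loop starts.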
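The same carry chain can be modeled in C. This is only a sketch: the function names are mine, the carry flag is an ordinary variable, and each number is stored as 32-bit pieces with the least significant piece first.

```c
#include <stddef.h>
#include <stdint.h>

/* EDX:EAX + EBX:ECX, built from two 32-bit additions. */
uint64_t add64_via_adc(uint32_t edx, uint32_t eax, uint32_t ebx, uint32_t ecx) {
    uint32_t lo = eax + ecx;        /* add eax, ecx                   */
    uint32_t cf = lo < eax;         /* CF = 1 if the low half wrapped */
    uint32_t hi = edx + ebx + cf;   /* adc edx, ebx                   */
    return ((uint64_t)hi << 32) | lo;
}

/* EDX:EAX - EBX:ECX, built from two 32-bit subtractions. */
uint64_t sub64_via_sbb(uint32_t edx, uint32_t eax, uint32_t ebx, uint32_t ecx) {
    uint32_t lo = eax - ecx;        /* sub eax, ecx                   */
    uint32_t cf = eax < ecx;        /* CF = 1 if a borrow was needed  */
    uint32_t hi = edx - ebx - cf;   /* sbb edx, ebx                   */
    return ((uint64_t)hi << 32) | lo;
}

/* a[] += b[] over n 32-bit pieces; returns the carry out of the top piece. */
uint32_t add_limbs(uint32_t *a, const uint32_t *b, size_t n) {
    uint32_t cf = 0;                              /* what clc would do */
    for (size_t i = 0; i < n; i++) {
        uint64_t t = (uint64_t)a[i] + b[i] + cf;  /* adc on every pass */
        a[i] = (uint32_t)t;
        cf = (uint32_t)(t >> 32);
    }
    return cf;
}
```

Starting the loop with the carry cleared is what lets every iteration use the same add-with-carry step, so the first piece needs no special case.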
Comparison
In assembly, comparison does not directly return a boolean value. Instead, the result is stored in the FLAGS register.
This is different from high-level languages like C, where comparisons return true/false.
More precisely:
- cmp performs a subtraction internally
- The result is reflected in FLAGS (ZF, CF, SF, OF)
This means that control flow depends on how we interpret these flags.
For unsigned integers, two flags are important: the zero flag (ZF) and the carry flag (CF). When cmp computes the difference vleft - vright, the flags are set accordingly. If the difference is zero (vleft = vright), then ZF is set (i.e. 1) and CF is unset (i.e. 0). If vleft > vright, then ZF is unset and CF is unset (no borrow). If vleft < vright, then ZF is unset and CF is set (borrow).
For signed integers, three flags are important: the zero flag (ZF), the overflow flag (OF) and the sign flag (SF). The overflow flag is set if the result of an operation overflows (or underflows). The sign flag is set if the result of an operation is negative. If vleft = vright, ZF is set (just as for unsigned integers). If vleft > vright, ZF is unset and SF = OF. If vleft < vright, ZF is unset and SF != OF.
Why does SF = OF if vleft > vright? If there is no overflow, then the difference will have the correct value and must be non-negative. Thus, SF = OF = 0. However, if there is an overflow, the difference will not have the correct value (and in fact will be negative). Thus, SF = OF = 1.
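The flag rules above can be checked with a small model in C. This is a sketch under my own naming (`Flags`, `cmp32` and the four predicates are hypothetical helpers); it computes the flags the way a 32-bit cmp would and then applies the unsigned and signed readings to the same bits.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool zf, cf, sf, of; } Flags;

/* Flags left behind by `cmp vleft, vright` on 32-bit operands. */
Flags cmp32(uint32_t vleft, uint32_t vright) {
    uint32_t diff = vleft - vright;       /* cmp subtracts, then discards diff */
    Flags f;
    f.zf = (diff == 0);
    f.cf = (vleft < vright);              /* a borrow was needed */
    f.sf = (diff >> 31) != 0;             /* top bit of the result */
    /* Signed overflow: the operands have different signs and the result's
       sign differs from vleft's sign. */
    f.of = (((vleft ^ vright) & (vleft ^ diff)) >> 31) != 0;
    return f;
}

/* Unsigned reading of the flags. */
bool unsigned_above(Flags f) { return !f.zf && !f.cf; }
bool unsigned_below(Flags f) { return !f.zf && f.cf; }

/* Signed reading of the flags. */
bool signed_greater(Flags f) { return !f.zf && f.sf == f.of; }
bool signed_less(Flags f)    { return !f.zf && f.sf != f.of; }
```

Comparing 0x80000000 with 1 shows why both readings exist: as unsigned numbers the left side is above, while as signed numbers it is the most negative value and therefore less. The cmp itself is identical in both cases; only the branch that follows decides which interpretation applies.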
An example is shown below:
1 | |
The following example demonstrates how conditional branching is implemented using FLAGS:
1 | |
Another example is shown below:
    if (EAX >= 5)
        EBX = 1;
    else
        EBX = 2;
If EAX is greater than or equal to five, the ZF may be set or unset and SF will equal OF. Therefore, the pseudo code can be translated into assembly as follows:
1 | |
The above code is awkward. Fortunately, the 80x86 provides additional branch instructions that make these types of tests much easier.
1 | |
Loop
The 80x86 provides several instructions designed to implement for-like loops:
- loop: Decrements ECX; if ECX is not equal to 0, branches to the label
- loope, loopz: Decrements ECX (FLAGS register is not modified); if ECX is not equal to 0 and ZF = 1, branches to the label
- loopne, loopnz: Decrements ECX (FLAGS register is not modified); if ECX is not equal to 0 and ZF = 0, branches to the label
An example is shown below:
1 | |
This pseudo code can be translated into assembly as follows:
1 | |
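The behaviour of the loop family can also be modeled in C. The sketch below uses hypothetical function names and plain variables in place of registers: `sum_with_loop` mirrors a body followed by loop, and `find_byte` mirrors a comparison followed by loopne.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models:  mov eax, 0 / mov ecx, n / top: add eax, ecx / loop top */
uint32_t sum_with_loop(uint32_t n) {
    uint32_t eax = 0;
    uint32_t ecx = n;
    do {
        eax += ecx;            /* loop body */
    } while (--ecx != 0);      /* loop: decrement ECX, branch if ECX != 0 */
    return eax;
}

/* Models a scan that ends with loopne: keep going while ECX != 0 and the
   last comparison left ZF = 0. Returns the index of key, or -1 if absent.
   len must be at least 1. */
int32_t find_byte(const uint8_t *buf, uint32_t len, uint8_t key) {
    uint32_t ecx = len;
    uint32_t i = 0;
    bool zf;
    do {
        zf = (buf[i] == key);  /* cmp sets ZF on a match */
        i++;
    } while (--ecx != 0 && !zf);
    return zf ? (int32_t)(i - 1) : -1;
}
```

One consequence of decrementing before testing: if ECX is 0 on entry, it wraps to 0xFFFFFFFF and the body runs 2^32 times. The do/while form reproduces that, so callers of this model should pass a non-zero count, just as assembly code has to guard the loop.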
2.4 - Example: Finding Prime Numbers
1 | |
This pseudo code can be translated into assembly as follows:
1 | |
Note: Trying out different branch instructions is a good way to understand how the CPU and its flags treat signed and unsigned integers.
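For reference, here is a minimal C sketch of a prime search by trial division over odd numbers. I am assuming this is close to the algorithm the example implements; the function name `find_primes` and its buffer-based interface are mine, chosen so the result can be checked without any I/O. It is meant for modest limits only, since `factor * factor` and `guess += 2` are not protected against overflow.

```c
#include <stddef.h>

/* Stores the primes <= limit in out (capacity cap) and returns how many
   were stored. */
size_t find_primes(unsigned limit, unsigned *out, size_t cap) {
    size_t count = 0;
    if (limit >= 2 && count < cap) out[count++] = 2;   /* special cases */
    if (limit >= 3 && count < cap) out[count++] = 3;

    for (unsigned guess = 5; guess <= limit && count < cap; guess += 2) {
        unsigned factor = 3;
        /* Try odd factors until one divides guess or factor^2 reaches it. */
        while (factor * factor < guess && guess % factor != 0)
            factor += 2;
        if (guess % factor != 0)    /* no factor found: guess is prime */
            out[count++] = guess;
    }
    return count;
}
```

All variables here are unsigned, so a hand translation to assembly would pair each cmp with an unsigned branch (jb, ja and friends) rather than the signed ones (jl, jg).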
Chapter 3 - Bit Operations
Shift Operations
Logical shifts
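Logical shifts move the bits of a value left (shl) or right (shr) and fill the vacated positions with zeros. The number of positions to shift can either be a constant or be stored in the CL register. The last bit shifted out of the data is stored in the carry flag.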
An example is shown below:
1 | |
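A small C model makes the carry-flag behaviour of logical shifts easy to test. This is a sketch with hypothetical names (`ShiftResult`, `shl32`, `shr32`), and it only handles counts from 1 to 31.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint32_t value; bool cf; } ShiftResult;

/* Logical left shift; count must be in 1..31. */
ShiftResult shl32(uint32_t x, unsigned count) {
    ShiftResult r;
    r.cf = ((x >> (32 - count)) & 1u) != 0;  /* last bit pushed out on the left  */
    r.value = x << count;                    /* zeros enter on the right         */
    return r;
}

/* Logical right shift; count must be in 1..31. */
ShiftResult shr32(uint32_t x, unsigned count) {
    ShiftResult r;
    r.cf = ((x >> (count - 1)) & 1u) != 0;   /* last bit pushed out on the right */
    r.value = x >> count;                    /* zeros enter on the left          */
    return r;
}
```

For example, `shl32(0x80000001u, 1)` yields the value 0x00000002 with the carry flag set, because the top bit was the one shifted out.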
Arithmetic shifts
The arithmetic left shift behaves exactly like the logical left shift. For right shifts, however, the leftmost bit (the sign bit) is replicated in the vacated positions to preserve the sign of the number.
The instructions are:
- sal: Shift arithmetic left
- sar: Shift arithmetic right
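In C, right-shifting a negative signed value is implementation-defined, so the sketch below emulates sar on the raw 32-bit pattern instead. The name `sar32` is mine, and the count is assumed to be in the range 1 to 31.

```c
#include <stdint.h>

/* Arithmetic right shift of a 32-bit two's complement bit pattern;
   count must be in 1..31. */
uint32_t sar32(uint32_t x, unsigned count) {
    uint32_t shifted = x >> count;              /* logical shift first      */
    if (x & 0x80000000u)                        /* was the sign bit set?    */
        shifted |= ~(0xFFFFFFFFu >> count);     /* refill with copies of it */
    return shifted;
}
```

With this model, the pattern 0xFFFFFFF8 (-8) shifted right by one becomes 0xFFFFFFFC (-4), whereas a logical shift of the same pattern would give 0x7FFFFFFC, a large positive number. Note also that sar rounds toward negative infinity: 0xFFFFFFF9 (-7) shifted by one gives -4, while the C expression `-7 / 2` gives -3.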
Rotate shifts
Unlike logical or arithmetic shifts, no bits are lost, and there is no padding with zeros (or sign bit). The bits “wrap around” to the opposite side.
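The wrap-around is easy to express in C as two shifts joined with an OR. The helpers below are a sketch named after the x86 rol and ror instructions; the masking of the count is my own choice so that a count of 0 or 32 leaves the value unchanged.

```c
#include <stdint.h>

/* Rotate left: bits leaving on the left re-enter on the right. */
uint32_t rol32(uint32_t x, unsigned count) {
    count &= 31u;
    if (count == 0) return x;
    return (x << count) | (x >> (32 - count));
}

/* Rotate right: bits leaving on the right re-enter on the left. */
uint32_t ror32(uint32_t x, unsigned count) {
    count &= 31u;
    if (count == 0) return x;
    return (x >> count) | (x << (32 - count));
}
```

Because nothing is lost, rotating left and then right by the same count returns the original value, which is one reason simple obfuscation and hashing loops in malware often use rotates.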
Boolean Bitwise Operations
Chapter 4 - Subprograms
Simple Subprogram Example
A subprogram is an independent unit of code that can be used from different parts of a program. In other words, a subprogram is like a function in C.
A jump can be used to invoke the subprogram, but returning presents a problem. If the subprogram is to be used by different parts of the program, it must return to the section of code that invoked it. Thus, the jump back from the subprogram cannot be hard-coded to a label.
1 | |
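One way around the hard-coded label, sketched here as an assumption rather than taken from the notes above, is to store the return location in a register before jumping and have the subprogram jump through that register. The C model below imitates this with a tiny state machine: the `Label` enum stands in for code addresses, `ecx` holds the return location, and all names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical labels standing in for code addresses. */
typedef enum { SUBPROGRAM, RET1, RET2, DONE } Label;

/* Calls the same "subprogram" (it doubles EAX) from two different sites
   and returns the sum of both results. */
uint32_t run_two_calls(uint32_t a, uint32_t b) {
    uint32_t eax = a;
    uint32_t result = 0;
    Label ecx = RET1;          /* mov ecx, ret1 : where to come back to */
    Label pc = SUBPROGRAM;     /* jmp subprogram                        */

    while (pc != DONE) {
        switch (pc) {
        case SUBPROGRAM:
            eax *= 2;
            pc = ecx;          /* jmp ecx : indirect jump to the caller */
            break;
        case RET1:             /* first call site resumes here          */
            result = eax;
            eax = b;
            ecx = RET2;        /* mov ecx, ret2                         */
            pc = SUBPROGRAM;
            break;
        case RET2:             /* second call site resumes here         */
            result += eax;
            pc = DONE;
            break;
        case DONE:
            break;
        }
    }
    return result;
}
```

The subprogram never names its caller; it only follows whatever location it was handed. The call and ret instructions automate the same idea by keeping the return address on the stack instead of in a register.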