In the previous post we looked at some of the most commonly used registers, the stack, a few x86 instructions, and how to call a higher-level Win32 API function. We combined these to create an ASM program that displays a message box with a custom message and title.
In this post we'll look at some more interesting x86 instructions and construct the C strlen() function from scratch.
strlen()
The function expects a pointer to a null-terminated string and returns its length, i.e. the number of characters that make up the string, not counting the terminating null character. As a trivial example, let's consider the string "HELLO" shown below:
As portrayed in the image, the string "HELLO" consists of 5 characters followed by the NULL character, hence passing a pointer to this string to strlen() returns 5.
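To make the "HELLO" case concrete, here is a minimal sketch in C, the language strlen() comes from. The helper name hello_length is made up for illustration; only strlen() itself is the real library function. It checks that the string occupies 6 bytes in memory while strlen() reports 5.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns strlen() of the example string "HELLO". */
size_t hello_length(void)
{
    /* The array holds 6 bytes: 'H','E','L','L','O' and the terminating 0x00. */
    const char s[] = "HELLO";

    assert(sizeof s == 6);   /* storage includes the terminator      */
    assert(s[5] == '\0');    /* the byte right after 'O' is the NULL */

    return strlen(s);        /* counts characters up to, not including, 0x00 */
}
```

Calling hello_length() returns 5, one less than the number of bytes the string takes up.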
x86 Instructions
To keep the theory to a minimum, only the instructions required to achieve our goal, and some of their variations, will be covered in this section.
MOVe
The MOV instruction is probably the most common ASM instruction. As the name suggests, it moves data from one place to another. To be more precise, it copies the data rather than moving it, since the source value remains intact. The operands can be a register (e.g. EAX), a memory location (e.g. [EAX], the memory location pointed at by EAX) or an immediate value (e.g. 0FFh). The following are some examples:
    mov eax, 0F003h          ; immediate -> reg ; EAX = 0xF003
    mov dword [eax], 0F003h  ; immediate -> mem ; [EAX] = 0xF003
    mov ebx, eax             ; reg -> reg       ; EBX = EAX
    mov eax, [ebx]           ; mem -> reg       ; EAX = [EBX]
    mov [ebx], eax           ; reg -> mem       ; [EBX] = EAX
    mov eax, [eax+ebx*4]     ; mem -> reg       ; EAX = [EAX+EBX*4]
Note that in NASM a hexadecimal constant written with the h suffix must start with a digit, hence 0F003h rather than F003h.
Notice that data moves from right to left. This is the Intel syntax, and we'll be sticking exclusively to it. For those of you who are interested, the other major syntax is the AT&T syntax and is predominantly used in *nix environments.
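For readers more at home in C, the following sketch models two of the MOV forms above. The function names are hypothetical and the registers are represented by plain variables; it is an illustration of the semantics, not of how a compiler would emit these instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of: mov ebx, eax  (reg -> reg).
   The value is copied; the source keeps its value. */
uint32_t mov_reg_reg(uint32_t eax, uint32_t *ebx)
{
    assert(ebx != NULL);
    *ebx = eax;   /* EBX = EAX */
    return eax;   /* EAX is returned unchanged to show it stays intact */
}

/* Hypothetical model of: mov eax, [base + index*4]  (mem -> reg).
   Scaling the index by 4 is how a 32-bit array element is addressed. */
uint32_t mov_scaled_load(const uint32_t *base, uint32_t index)
{
    assert(base != NULL);
    return base[index];   /* reads the dword at base + index*4 */
}
```

The scaled form is the reason [EAX+EBX*4] shows up so often in compiled code: it is simply indexing into an array of dwords.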
ADD/SUB & INC/DEC
Arithmetic operations can be performed on registers and memory locations. We can add two registers with ADD, subtract an immediate value from a register or memory location with SUB, increment values with INC and decrement values with DEC.
    add esp, 14h             ; esp = esp + 0x14
    add ecx, eax             ; ecx = ecx + eax
    sub ecx, eax             ; ecx = ecx - eax
    sub esp, 0Ch             ; esp = esp - 0xC
    inc eax                  ; eax = eax + 1
    dec eax                  ; eax = eax - 1
    inc dword [eax]          ; [eax] = [eax] + 1
    add word [eax], 0FFFFh   ; [eax] = [eax] + 0xFFFF
    add dword [eax], 0FFFFh  ; [eax] = [eax] + 0xFFFF
Notice that when the destination is a memory location and the other operand is an immediate value (or there is no other operand, as with INC and DEC), the size of the memory we want to operate on has to be specified explicitly, because the assembler has no register to infer it from. Let's see why this is important.
Consider the last 2 of the examples above, which at first glance look deceptively equivalent. Say that EAX points to an arbitrary location in memory, and that reading a dword starting from this location gives 0x41424344. Since x86 is little-endian, reading a word starting from the same location gives the low half, 0x4344. So:
    add dword [eax], 0FFFFh => dword [eax] = 0x41424344 + 0xFFFF => dword [eax] = 0x41434343 & word [eax] = 0x4343 & CF=0
    add word [eax], 0FFFFh  => word [eax] = 0x4344 + 0xFFFF      => dword [eax] = 0x41424343 & word [eax] = 0x4343 & CF=1
Notice the differences. In the word case, the most significant half is untouched, since we explicitly told the instruction to operate on a word, and the Carry Flag has been set to 1 because the result does not fit in a word.
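The word versus dword arithmetic can be checked with a small C model. This is a sketch under stated assumptions: add_mem_imm and read_dword are hypothetical helpers, memory is modelled as a little-endian byte array, and only the Carry Flag is tracked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of "add <size> [mem], imm" on little-endian memory.
   size is 2 (word) or 4 (dword). Returns the resulting Carry Flag. */
int add_mem_imm(uint8_t *mem, int size, uint32_t imm)
{
    assert(mem != NULL && (size == 2 || size == 4));

    uint64_t mask = (size == 4) ? 0xFFFFFFFFull : 0xFFFFull;

    /* Load the operand byte by byte, least significant byte first. */
    uint64_t value = 0;
    for (int i = 0; i < size; i++)
        value |= (uint64_t)mem[i] << (8 * i);

    uint64_t sum = value + (imm & mask);

    /* Store back only `size` bytes; anything above them is left untouched. */
    for (int i = 0; i < size; i++)
        mem[i] = (uint8_t)(sum >> (8 * i));

    return sum > mask;   /* carry out of the operand size */
}

/* Reads a little-endian dword, used to inspect the result. */
uint32_t read_dword(const uint8_t *mem)
{
    assert(mem != NULL);
    return (uint32_t)mem[0] | (uint32_t)mem[1] << 8 |
           (uint32_t)mem[2] << 16 | (uint32_t)mem[3] << 24;
}
```

Starting from the bytes 44 43 42 41 (0x41424344 as a dword), a size of 4 leaves 0x41434343 with no carry, while a size of 2 leaves 0x41424343 and reports a carry, matching the two cases in the text.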
OR/XOR
These instructions perform a bitwise OR or eXclusive OR (XOR) of the operands and place the result in the first operand. The following are some self-explanatory examples:
    or ebx, ebx              ; ebx = ebx | ebx
    xor eax, [ecx]           ; eax = eax ^ [ecx]
    or [edx], eax            ; [edx] = [edx] | eax
    xor eax, 0Fh             ; eax = eax ^ 0xF
    or word [eax], 0FFFFh    ; [eax] = [eax] | 0xFFFF
As in the previous case, when a memory location is combined with an immediate value, the size has to be explicitly specified.
XOR has 2 interesting properties: XORing twice with the same value yields the original value, and XORing a value with itself results in zero. The latter is used to clear a register, whereas the former is sometimes used as a simple shellcode obfuscation technique to evade Antivirus programs.
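Both XOR properties are easy to verify in C. The sketch below uses made-up helper names (xor_buffer, xor_self) and an arbitrary single-byte key; it only illustrates the arithmetic and is not meant as a real obfuscation routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* XORs every byte of buf with key, in place. Applying it a second time
   with the same key restores the original bytes. */
void xor_buffer(uint8_t *buf, size_t len, uint8_t key)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        buf[i] ^= key;
}

/* Models "xor eax, eax": any value XORed with itself is zero. */
uint32_t xor_self(uint32_t eax)
{
    return eax ^ eax;
}
```

Running xor_buffer twice over the same bytes with the same key gives back the input, which is why a decoder stub only needs to know the key.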
TEST & CMP
TEST performs a bitwise AND but does not save the result, and CMP performs a SUB without saving the result. These instructions are used purely to set the flags, so that subsequent instructions can act differently depending on them.
    cmp eax, ecx             ; eax - ecx
    test [ebx], eax          ; [ebx] & eax
    cmp edx, 0FFh            ; edx - 0xFF
    test dword [eax], 0ABCDh ; [eax] & 0xABCD
Once again, make sure the size is specified when a memory location is paired with an immediate value.
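As a rough C model of the above, the sketch below tracks only the Zero Flag. The flags_t type and the test_op and cmp_op names are hypothetical; a real CPU updates several other flags as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, minimal flags register: only ZF is modelled. */
typedef struct {
    int zf;
} flags_t;

/* Models "test a, b": computes a & b, keeps only the flag. */
void test_op(flags_t *f, uint32_t a, uint32_t b)
{
    assert(f != NULL);
    f->zf = ((a & b) == 0);
}

/* Models "cmp a, b": computes a - b, keeps only the flag.
   ZF ends up set exactly when a == b. */
void cmp_op(flags_t *f, uint32_t a, uint32_t b)
{
    assert(f != NULL);
    f->zf = ((uint32_t)(a - b) == 0);
}
```

Neither function modifies its operands, which mirrors the point that TEST and CMP throw the result away and keep only the flags.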
JZ(JE), JNZ(JNE) & JMP
Jumps are used to control the flow of an ASM program, just like "if..then..else" and "switch" in higher-level programming languages. Jump if Zero (JZ) and Jump if Equal (JE) are interchangeable, since the Zero Flag is set to 1 when two equal values are compared; the same goes for Jump if Not Zero (JNZ) and Jump if Not Equal (JNE). JMP is an unconditional jump and transfers control regardless of the flag values.
    jmp procedure            ; jump to "procedure"
    je procedure             ; jump to "procedure" if ZF=1
    jz procedure             ; jump to "procedure" if ZF=1 (same as je)
    jnz procedure            ; jump to "procedure" if ZF=0 (same as jne)
INT
INT generates a software interrupt and takes a single operand, the interrupt number. We will not go through the numerous available interrupts, but we'll briefly mention two: 0x21 and 0x3. INT 0x21 invokes an MS-DOS API function selected by the values in the registers. INT 0x3 (opcode 0xCC in its one-byte form) is used by debuggers to set a breakpoint.
    int 3                    ; execution breaks here if attached to a debugger
A complete Intel Syntax Reference guide can be found here.
Constructing strlen()
1. Set CTR = 0
2. Is char at position CTR = 0x00 ?
3. If Yes GOTO 6
4. Increment CTR
5. GOTO 2
6. DONE: Return CTR
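Before turning to assembly, the six steps above can be sketched one-to-one in C. The name my_strlen is made up, and the gotos are deliberate so that each line maps onto one step; idiomatic C would use a loop instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name; a C rendering of the six steps, not the library strlen(). */
size_t my_strlen(const char *s)
{
    assert(s != NULL);

    size_t ctr = 0;              /* 1. Set CTR = 0                     */
check:
    if (s[ctr] == '\0')          /* 2. Is char at position CTR = 0x00? */
        goto done;               /* 3. If yes, GOTO 6                  */
    ctr++;                       /* 4. Increment CTR                   */
    goto check;                  /* 5. GOTO 2                          */
done:
    return ctr;                  /* 6. DONE: return CTR                */
}
```

The two labels play the same role that jump targets play in an assembly version of the loop.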
    ;nasm -fwin32 strlen.asm
    ;GoLink /entry _main strlen.obj
    ;Run under a debugger

    global _main

    section .data
        input db "What is the length of this string?",0 ; string to compute length on

    section .text
    _main:
        xor ecx, ecx            ; clear registers to be used
        xor ebx, ebx            ;
        mov edi, input          ; move pointer to start of input to edi
    check_next:
        mov bl, [edi+ecx]       ; move char to bl
        test bl, bl             ; check if char = 0x00 (i.e. end of string)
        jz done                 ; if ZF=1 (i.e. bl=0x00), jump to done
        inc ecx                 ; increment ecx (counter)
        jmp check_next          ; jmp to check_next
    done:
        int 3                   ; debugger interrupt
xor ???, ???
Clears the registers we are going to use. ECX is used both as a counter and as the index of the next character, while the lower part of EBX (BL) holds the character being examined.
mov edi, input
Moves a pointer to the start of the input string into EDI, i.e. [EDI] = "W".
check_next
A label marking the start of the loop.
mov bl, [edi+ecx]
Moves the character at [EDI+ECX] to the BL register.
test bl, bl
TEST performs a bitwise AND without storing the result. The only value that yields zero when ANDed with itself is zero. So, if BL = 0x0 then BL & BL = 0x0, hence ZF=1. If BL <> 0x0, then BL & BL <> 0x0, hence ZF=0.
jz done
If ZF=1, i.e. the previous result was 0x0, i.e. we hit the end of the string, jump to the "done" label.
inc ecx
Increments ECX, which holds the running count of the string length.
jmp check_next
Jumps back to check_next to examine the next character.
int 3
Breaks into the debugger so that we can inspect the ECX register which, at the point it reaches the interrupt, should contain the length of the string (debuggers display it in hexadecimal).
Command Prompt
C:\>nasm -fwin32 strlen.asm
C:\>GoLink /entry _main strlen.obj
GoLink.Exe Version 1.0.1.0 - Copyright Jeremy Gordon 2002-2014 - JG@JGnet.co.uk
Output file: strlen.exe
Format: Win32 Size: 1,536 bytes
C:\>
As you might have realized by now, we can't just run the executable. We need to run it under a debugger to be able to view the result in the ECX register. In a future post I might demonstrate how to format a register value and display it in a message box. It's more involved than it sounds.
The following screenshot shows the state of the registers after the debugger hits the INT instruction:
ECX contains 0x22, which translates to 34, the length of "What is the length of this string?". Play around with it; modify the string and run it step by step to get a better feel for ASM.
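If you change the string and want to know what to expect in ECX before opening the debugger, a quick cross-check against the C library is enough. The helper name expected_ecx is hypothetical; strlen() is the standard function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the value ECX should hold for a given input string. */
unsigned expected_ecx(const char *s)
{
    assert(s != NULL);
    return (unsigned)strlen(s);
}
```

For "What is the length of this string?" it returns 34, i.e. 0x22, matching the register value above.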