In today’s era, where malware (malicious software) has affected, and is still affecting, companies all over the world, it is recommended to be able to analyze malware residing on both Windows and Linux platforms.
However, we will focus only on how to analyze malware on Linux platforms. Many of the methods discussed here can also be applied to analyzing malware on Windows.
As a malware analyst, it is necessary to understand how compilers such as gcc compile a program.
So we will look at the following sub-topics:
Prerequisites:
Four Stages of Binary Compilation:
The four stages of C program compilation are as follows: preprocessing, compilation, assembly and linking.
By default, the gcc compiler compiles a file (or a group of files) into an executable program, as shown below:
gcc hello.c
The above command produces an executable named a.out by default. Inspecting it with the file utility (file a.out) gives the following output:
a.out: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=2ae6a46529ec896b291cd7671dc19ee578abf8bf, not stripped
Behind the scenes, gcc ran all four stages automatically to turn our simple C program into an executable file.
Preprocessing Stage:
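In this stage, the C preprocessor (cpp), which can be run as a separate program from the compiler, replaces each #include directive with the contents of the named header file, expands macros and strips comments. The result is conventionally saved in a file with the .i suffix.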
Execute the command below to preprocess hello.c and save the result to hello.i:
cpp hello.c > hello.i
Open the hello.i file to see the output. The excerpt below is not the complete output.
# 1 "hello.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 31 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 32 "<command-line>" 2
# 1 "hello.c"
# 1 "/usr/include/stdio.h" 1 3 4
# 27 "/usr/include/stdio.h" 3 4
# 1 "/usr/include/x86_64-linux-gnu/bits/libc-header-start.h" 1 3 4
# 33 "/usr/include/x86_64-linux-gnu/bits/libc-header-start.h" 3 4
# 1 "/usr/include/features.h" 1 3 4
# 424 "/usr/include/features.h" 3 4
# 1 "/usr/include/x86_64-linux-gnu/sys/cdefs.h" 1 3 4
# 442 "/usr/include/x86_64-linux-gnu/sys/cdefs.h" 3 4
# 1 "/usr/include/x86_64-linux-gnu/bits/wordsize.h" 1 3 4
# 443 "/usr/include/x86_64-linux-gnu/sys/cdefs.h" 2 3 4
# 1 "/usr/include/x86_64-linux-gnu/bits/long-double.h" 1 3 4
# 444 "/usr/include/x86_64-linux-gnu/sys/cdefs.h" 2 3 4
# 425 "/usr/include/features.h" 2 3 4
# 448 "/usr/include/features.h" 3 4
# 1 "/usr/include/x86_64-linux-gnu/gnu/stubs.h" 1 3 4
# 10 "/usr/include/x86_64-linux-gnu/gnu/stubs.h" 3 4
typedef unsigned char __u_char;
typedef unsigned short int __u_short;
typedef unsigned int __u_int;
typedef unsigned long int __u_long;
typedef signed char __int8_t;
typedef unsigned char __uint8_t;
typedef signed short int __int16_t;
typedef unsigned short int __uint16_t;
typedef signed int __int32_t;
typedef unsigned int __uint32_t;
typedef signed long int __int64_t;
typedef unsigned long int __uint64_t;
Compilation Stage:
In this stage, we will instruct gcc to emit an assembly file, preferably in Intel syntax, because it is arguably more readable than the AT&T syntax gcc uses by default.
You can do this by executing the following command:
gcc -S -masm=intel hello.c
It produces the assembly file hello.s. You can open it and view its contents; we will look at them in more detail later.
vim hello.s
.file "hello.c"
.intel_syntax noprefix
.text
.section .rodata
.LC0:
.string "Hello, World!"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
push rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
mov rbp, rsp
.cfi_def_cfa_register 6
lea rdi, .LC0[rip]
mov eax, 0
call printf@PLT
mov eax, 0
pop rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Debian 8.3.0-6) 8.3.0"
.section .note.GNU-stack,"",@progbits
As you can see, Intel’s assembly syntax is more readable than the AT&T syntax that gcc emits by default (gcc -S hello.c), shown below:
.file "hello.c"
.text
.section .rodata
.LC0:
.string "Hello, World!"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
leaq .LC0(%rip), %rdi
movl $0, %eax
call printf@PLT
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Debian 8.3.0-6) 8.3.0"
.section .note.GNU-stack,"",@progbits
Assembly Stage:
In this stage, the assembler converts the assembly code into an object file (hello.o). The -c option tells gcc to run the first three stages and stop before linking:
gcc -c hello.c
An object file can be described as a relocatable file because source files are compiled independently of each other. At compile time, the assembler cannot know the final memory addresses of code and data defined in other files (most legitimate programs consist of many individual files). It therefore leaves placeholders, called relocations, which make it possible for the linker to bring all the files together into an executable.
You can check whether a file is an object file by using the file utility:
file hello.o
hello.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
Again, we will look at stripped and non-stripped files in our next articles, because understanding them is essential for a malware analyst.
Linking Stage:
Here we link together all the individual object files to create an executable.
At this stage, the linker resolves symbolic references to static libraries. References to dynamic (shared) libraries are not resolved until the program is loaded into memory (and, with lazy binding, a function may not be resolved until it is first called).
We will discuss symbolic references to static and dynamic libraries in our next article.
To link, run gcc followed by the name of the C program (gcc hello.o would link the object file from the previous stage directly), then check the result with file a.out:
gcc hello.c
a.out: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=2ae6a46529ec896b291cd7671dc19ee578abf8bf, not stripped
The main difference between an object file and an executable file is that references to library functions are left unresolved in an object file but are resolved in an executable file (except for references to dynamic libraries, which are resolved when the program is loaded).
Now we understand how the gcc compiler compiles a program.
In our next article, we will take a look at the contents of both object and executable files, as well as static and dynamic libraries.
Written by: Michael Aboagye