Introduction to Linux Malware Analysis

In today's era, when malware (malicious software) has affected, and is still affecting, companies all over the world, it is valuable to be able to analyze malware that targets both Windows and Linux platforms.

In this article, however, we focus only on Linux. Many of the ideas discussed here also carry over to analyzing malware on Windows.

As a malware analyst, you need to understand how a compiler such as gcc turns source code into a program.

We will therefore look at the following sub-topics:

  • The four stages of binary compilation
  • The difference between an object file and an executable file

Prerequisites:

  • gcc compiler
  • Linux (Debian or Kali)
  • a simple C program to compile

Four Stages of Binary Compilation:

The four stages of C program compilation are as follows:

  • Preprocessing
  • Compilation
  • Assembly
  • Linking

By default, gcc runs all four stages in one go and turns one or more source files into an executable program. The program is named a.out unless you choose another name with -o:

gcc hello.c

The command writes an executable named a.out. Checking it with the file utility (file a.out) prints the following:

a.out: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=2ae6a46529ec896b291cd7671dc19ee578abf8bf, not stripped

gcc compiled our simple C program straight to an executable, without keeping the intermediate files produced by each stage. The sections below run the stages one at a time.

Preprocessing Stage:

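In this stage, the C preprocessor (cpp), a separate program from the compiler proper, handles the directives that start with #. It replaces each #include line with the contents of the named header (for hello.c, that means stdio.h and the system headers it pulls in), expands macros, and strips comments. The result is conventionally saved in a file with the .i suffix.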

Run the command below to preprocess hello.c and save the result in hello.i (gcc -E hello.c -o hello.i has the same effect):

cpp hello.c > hello.i

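Open hello.i to see the result. Below is an excerpt, not the complete output. Lines starting with # are line markers: each gives a line number and the file that the following text came from. The trailing flags mean 1 = entering a file, 2 = returning to a file, 3 = system header, and 4 = text to be treated as extern "C".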

# 1 "hello.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 31 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 32 "<command-line>" 2
# 1 "hello.c"
# 1 "/usr/include/stdio.h" 1 3 4
# 27 "/usr/include/stdio.h" 3 4
# 1 "/usr/include/x86_64-linux-gnu/bits/libc-header-start.h" 1 3 4
# 33 "/usr/include/x86_64-linux-gnu/bits/libc-header-start.h" 3 4
# 1 "/usr/include/features.h" 1 3 4
# 424 "/usr/include/features.h" 3 4
# 1 "/usr/include/x86_64-linux-gnu/sys/cdefs.h" 1 3 4
# 442 "/usr/include/x86_64-linux-gnu/sys/cdefs.h" 3 4
# 1 "/usr/include/x86_64-linux-gnu/bits/wordsize.h" 1 3 4
# 443 "/usr/include/x86_64-linux-gnu/sys/cdefs.h" 2 3 4
# 1 "/usr/include/x86_64-linux-gnu/bits/long-double.h" 1 3 4
# 444 "/usr/include/x86_64-linux-gnu/sys/cdefs.h" 2 3 4
# 425 "/usr/include/features.h" 2 3 4
# 448 "/usr/include/features.h" 3 4
# 1 "/usr/include/x86_64-linux-gnu/gnu/stubs.h" 1 3 4
# 10 "/usr/include/x86_64-linux-gnu/gnu/stubs.h" 3 4

typedef unsigned char __u_char;
typedef unsigned short int __u_short;
typedef unsigned int __u_int;
typedef unsigned long int __u_long;
typedef signed char __int8_t;
typedef unsigned char __uint8_t;
typedef signed short int __int16_t;
typedef unsigned short int __uint16_t;
typedef signed int __int32_t;
typedef unsigned int __uint32_t;
typedef signed long int __int64_t;
typedef unsigned long int __uint64_t;
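Besides copying header files, the preprocessor also expands macros and evaluates conditional directives. A small hypothetical file such as macros.c below makes this visible (its name and contents are illustrative and not part of the original article). Run cpp macros.c and look at the end of the output: the macros have been substituted and the comments are gone. Then repeat with cpp -DDEBUG macros.c and the LOG call turns into a puts call.

```c
/* macros.c: hypothetical file for exploring what cpp does */
#include <stdio.h>                 /* replaced by the text of stdio.h and the headers it includes */

#define GREETING "Hello, World!"   /* object-like macro: GREETING becomes the string literal */
#define SQUARE(x) ((x) * (x))      /* function-like macro: plain text substitution, no type checks */

#ifdef DEBUG
#define LOG(msg) puts(msg)         /* used only when preprocessed with -DDEBUG */
#else
#define LOG(msg) ((void)0)         /* otherwise every LOG(...) call disappears */
#endif

int square_plus_one(int n) {
    LOG(GREETING);                 /* after cpp without -DDEBUG: ((void)0); */
    return SQUARE(n) + 1;          /* after cpp: return ((n) * (n)) + 1; */
}
```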

Compilation Stage:

In this stage, the compiler translates the preprocessed C code into assembly language. We will ask gcc to stop after this stage and write the assembly in Intel syntax, which many analysts find more readable than the AT&T syntax gcc uses by default.

To do so, run the following command:

gcc -S -masm=intel hello.c

This produces the assembly file hello.s. You can open it to view its contents; we will look at them in more detail later.

vim hello.s

    .file   "hello.c"
    .intel_syntax noprefix
    .text
    .section    .rodata
.LC0:
    .string "Hello, World!"
    .text
    .globl  main
    .type   main, @function
main:
.LFB0:
    .cfi_startproc
    push    rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    mov rbp, rsp
    .cfi_def_cfa_register 6
    lea rdi, .LC0[rip]
    mov eax, 0
    call    printf@PLT
    mov eax, 0
    pop rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size   main, .-main
    .ident  "GCC: (Debian 8.3.0-6) 8.3.0"
    .section    .note.GNU-stack,"",@progbits

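For comparison, here is the same code in AT&T syntax, which plain gcc -S hello.c produces. Note the % prefix on registers, the $ prefix on immediate values, the size suffixes on mnemonics (pushq, movl), and the reversed operand order, with the source first and the destination last: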

    .file   "hello.c"
    .text
    .section    .rodata
.LC0:
    .string "Hello, World!"
    .text
    .globl  main
    .type   main, @function
main:
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    leaq    .LC0(%rip), %rdi
    movl    $0, %eax
    call    printf@PLT
    movl    $0, %eax
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size   main, .-main
    .ident  "GCC: (Debian 8.3.0-6) 8.3.0"
    .section    .note.GNU-stack,"",@progbits
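Two instructions in these listings are easier to read once you know the x86-64 System V calling convention used on Linux. First, lea rdi, .LC0[rip] loads the address of the "Hello, World!" string into rdi, the register that carries a function's first integer or pointer argument. Second, mov eax, 0 before call printf@PLT sets al to 0, the number of vector registers used for arguments, which the convention requires for variadic functions such as printf. To see the other argument registers, you could compile a hypothetical function like sum6 below (not from the original article) with gcc -S -masm=intel and look for rdi, rsi, rdx, rcx, r8 and r9 in the output.

```c
/* args.c: hypothetical example for studying the x86-64 System V calling convention.
   Compile with: gcc -S -masm=intel args.c */
long sum6(long a, long b, long c, long d, long e, long f) {
    /* Without optimization, gcc first stores the incoming argument
       registers rdi, rsi, rdx, rcx, r8 and r9 (a to f, in that order)
       into the stack frame, then reloads them to do the additions. */
    return a + b + c + d + e + f;   /* the return value is left in rax */
}
```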

Assembly Stage:

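In this stage, the assembler (as, which gcc runs for you) translates the assembly code into machine code and stores it in an object file, hello.o. The -c option tells gcc to stop after this stage instead of linking: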

gcc -c hello.c

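An object file is also called a relocatable file. Each source file is compiled and assembled independently of the others, so at this point the assembler cannot know the final memory addresses of functions and data defined in other files (most real-world programs consist of many source files), or even where its own code and data will end up. Instead, it leaves placeholders in the machine code together with relocation entries. These tell the linker where to fill in the real addresses once it brings all the files together into an executable.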
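To see relocation entries yourself, you could compile a small hypothetical file such as greet.c below (not part of the original article) with gcc -c greet.c, then list its relocations with readelf -r greet.o. You should find entries that refer to call_count and printf. These are places in the machine code whose final addresses only the linker can fill in.

```c
/* greet.c: hypothetical example for inspecting relocations.
   Build an object file with: gcc -c greet.c
   List its relocation entries with: readelf -r greet.o */
#include <stdio.h>

int call_count = 0;   /* global variable: the linker decides its final address */

int greet(const char *name) {
    call_count++;                          /* reference to call_count: needs a relocation */
    return printf("Hello, %s!\n", name);   /* printf lives in libc: the call needs a relocation */
}
```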

You can check whether a file is an object file with the file utility:

file hello.o

hello.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped

The "not stripped" part means the file still contains its symbol table. We will look at stripped and non-stripped files in a future article, because the difference matters a lot to a malware analyst.
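The "relocatable" label that file printed for hello.o corresponds to the e_type field of the ELF header. The hypothetical C functions below sketch that check; this is not how the file utility is implemented. They read a 64-bit ELF header using the definitions in glibc's <elf.h> and map e_type to a description. A PIE executable such as the a.out above has e_type ET_DYN, the same value shared libraries use. That is why file has to inspect further details before calling it a "pie executable".

```c
#include <elf.h>     /* Elf64_Ehdr, ELFMAG, ET_* constants (glibc on Linux) */
#include <stdio.h>
#include <string.h>

/* Describe an ELF file from its header's e_type field.
   Sketch only: assumes a little-endian x86-64 file read on a little-endian host. */
const char *elf_kind(const Elf64_Ehdr *h) {
    if (memcmp(h->e_ident, ELFMAG, SELFMAG) != 0)
        return "not an ELF file";
    if (h->e_ident[EI_CLASS] != ELFCLASS64)
        return "not a 64-bit ELF file";
    switch (h->e_type) {
    case ET_REL:  return "relocatable (object file)";
    case ET_EXEC: return "executable (fixed load address)";
    case ET_DYN:  return "shared object or PIE executable";
    case ET_CORE: return "core dump";
    default:      return "unknown e_type";
    }
}

/* Read the ELF header of the file at path and describe it.
   Returns NULL if the file cannot be opened or is shorter than an ELF64 header. */
const char *elf_kind_of_file(const char *path) {
    Elf64_Ehdr h;
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;
    size_t n = fread(&h, 1, sizeof h, f);
    fclose(f);
    if (n != sizeof h)
        return NULL;
    return elf_kind(&h);
}
```

For example, calling elf_kind_of_file("hello.o") from your own main should return the relocatable description for the object file built in this stage. Check for NULL before printing the result.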

Linking Stage:

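In the final stage, the linker (ld, which gcc runs for you) combines all the individual object files, together with the C runtime start-up code and the required libraries, into a single executable. Running gcc on the C source file without -c performs all four stages and ends with this linking step.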

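At this stage, the linker resolves symbolic references between the object files, as well as references to static libraries, whose code is copied into the executable. References to symbols in dynamic (shared) libraries, such as printf in libc, are not resolved until the program is loaded into memory. The dynamic loader (the interpreter /lib64/ld-linux-x86-64.so.2 in the file output) then resolves them, either at load time or on the first call.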

We will discuss symbolic references to static and dynamic libraries in our next article.

gcc hello.c
file a.out

a.out: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=2ae6a46529ec896b291cd7671dc19ee578abf8bf, not stripped

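The main difference between an object file and an executable file, then, is how references to functions and data defined elsewhere (in other object files or in libraries) are handled. In an object file they are still unresolved. In an executable they have been resolved, except for references into dynamic libraries, which are resolved when the program is loaded.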

We now understand how gcc compiles a program.

In our next article, we will look at the contents of object and executable files, as well as static and dynamic libraries.

Written by: Michael Aboagye