Learning Assembly Language With a Compiled C Program

In this post, I will investigate the relationship between a basic C source file and the output of the C compiler. I will use Windows Subsystem for Linux to test the code with the GCC compiler. As an example, I will use a simple C program, HelloWorld.c, and compile it in several different ways. We will look at the compiler's output in ELF (Executable and Linkable Format), which contains multiple sections that describe the program in greater detail: the executable code, the data needed for linking and relocation, and much more.

Let’s begin with the simple C program, which basically contains the following source code:

HelloWorld.c - Source Code
#include <stdio.h>

int main() {
    printf("Hello World!\n");
}

This program simply prints the message to the screen. Next, I’m going to compile this file with GCC using the following options:

-g: enable debugging information

-O0: do not optimize

-fno-builtin: do not use builtin function optimizations

gcc -g -O0 -fno-builtin -o HelloWorld HelloWorld.c

Running this command produces the executable file (ELF), which contains multiple sections whose details we can inspect (object code, link tables, debugging symbols). To check these details, I will use the objdump command:

  1. Let’s find the section that contains the source code by running objdump --source HelloWorld. The --source option lets us see the source code interleaved with the disassembly of the executable sections.

  2. To find the output of the program, I used objdump -s HelloWorld, which displays the full contents of every section. I was able to find the printed text in the .rodata section. .rodata holds read-only (constant) data, so to my understanding the string “Hello World!” (and the newline) is stored there as constant data.

These are just two sections of the ELF file. The file contains many more details, but this is the basis of what we need for now.

Recompiling The File With Different Variants

1. Adding The Compiler Option -static

After recompiling the code using gcc -g -O0 -fno-builtin -static -o HelloWorld_v1 HelloWorld.c, I noticed a few changes when inspecting the ELF file. First, the file size grew from 10KB to 893KB. This is because, with -static, the linker copies the required library code into the executable instead of leaving it to be loaded from shared libraries at run time, hence the increase in file size.

Secondly, there are changes in the main section: the callq line has changed from printf@plt to _IO_printf, and two lines have been added at the bottom. In the earlier build the executable was dynamically linked, which is why the call went to printf@plt. The @plt suffix refers to a small stub in the Procedure Linkage Table (PLT) that jumps through an entry in the Global Offset Table (GOT, think of it as a table of addresses), which the dynamic linker fills in with the real address of printf. In the static build, printf is already part of the executable, so no dynamic linking is needed and the call goes directly to the function.

2. Removing The Compiler Option -fno-builtin

When this option is removed (gcc -g -O0 -static -o HelloWorld_v2 HelloWorld.c), the compiler is allowed to treat printf as a builtin and optimize the call. What changed here is that the call to printf was replaced with a call to puts (printf@plt became puts@plt). puts() is a simpler function than printf(): it does not have to parse a format string, and it automatically appends a newline to the string, whereas printf() lets you control the formatting, with or without extra arguments. This seems to be more efficient, as I confirmed with a quick test using the time command.

3. Removing The Compiler Option -g

When compiling the code without the -g option (gcc -O0 -fno-builtin -o HelloWorld_v3 HelloWorld.c), I can see a small decrease in file size. This is because the debugging information sections are not included. I also noticed that the main section no longer shows the program source code, since objdump needs the debugging information to match instructions to source lines.

4. Adding Additional Arguments to The printf() Function

After adding a few arguments to the printf call and compiling with gcc -g -O0 -fno-builtin -o HelloWorld_v4 HelloWorld.c, I can see the arguments in the main section, each one assigned to a register. The registers are:

  1. %esi
  2. %edx
  3. %ecx
  4. %r8d
  5. %r9d
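The rest do not show up in registers because they are passed with pushq, which pushes them onto the stack. On x86-64 Linux, only the first six integer arguments are passed in registers (the format string takes the first one, %rdi), so any further arguments are placed on the stack, a last-in, first-out region of memory.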

5. Moving printf() to a Separate Function Called output()

When adding another function, we can expect a new section for it in the disassembly. As expected, a new output section has been created and the call to printf shows up there. In the main section, we can see the call to output().

6. Replacing -O0 with -O3

By replacing -O0 with -O3, we can expect slightly better performance from our program. In the disassembly, some of the mov instructions have been replaced with xor. This is because xor %eax,%eax is the preferred way to set a register to zero on modern x86 processors: it has a shorter encoding than mov $0x0,%eax and the CPU recognizes it as a zeroing idiom. Another noticeable change is the call to __printf_chk, which is an interface similar to printf except that it checks for stack overflow before computing a result.

So far this has been an interesting and somewhat challenging experience. Understanding assembly language is pretty tough compared to today's high-level languages.