Optimizing & Benchmarking SPO600 Project Stage 2 – Part 1

Reminder on Stage 1

In Stage 1, I discussed my plan and strategy for optimizing hashdeep, a project I found on GitHub. Hashdeep computes hash digests of files (hashing is one-way, so it is not really encryption). I'll try to optimize the SHA256 update function and measure its time performance on several platform architectures. In this post, I begin by comparing the results of alternate build options, and from there I will look for further ways to optimize the function.

Altered Build Options

To change the build options for this project, I must navigate to the src directory. Inside that directory there is a file called “Makefile”, which contains the build options used to compile the sources into ELF (Executable and Linkable Format) files. This saves programmers a tremendous amount of time when testing and rebuilding many files, because they can just type the “make” command, which rebuilds whatever is out of date instead of requiring each compile command to be typed manually. The Makefile is generated as part of configuring and building the project, so it is not present in a fresh checkout. The project currently builds with the -O2 optimization level; I will use Visual Studio Code to change that flag to -O3. Below are the current build flags for sha256deep.

CFLAGS = -pthread -g -O2 -MD -D_FORTIFY_SOURCE=2
            -Wpointer-arith -Wmissing-declarations -Wmissing-prototypes
            -Wshadow -Wwrite-strings -Wcast-align -Waggregate-return -Wbad-function-cast
            -Wcast-qual -Wundef -Wredundant-decls -Wdisabled-optimization
            -Wfloat-equal -Wmissing-format-attribute -Wmultichar -Wc++-compat -Wmissing-noreturn -funit-at-a-time
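The flag change described above can also be scripted rather than done by hand in an editor. The following is only a minimal sketch, assuming the optimization level appears as a separate `-O<n>` token on a line that starts with `CFLAGS`; the helper name `set_opt_level` is hypothetical and is not part of hashdeep.

```python
import re

def set_opt_level(makefile_text: str, level: str = "-O3") -> str:
    """Swap the -O<n> token on CFLAGS lines only; every other line is kept as-is."""
    out = []
    for line in makefile_text.splitlines(keepends=True):
        # Only touch assignments to CFLAGS (not CXXFLAGS, CPPFLAGS, ...).
        if re.match(r"\s*CFLAGS\s*[:+?]?=", line):
            # Match -O, -O0..-O3, -Os or -Og as a whole whitespace-delimited token.
            line = re.sub(r"(?<!\S)-O[0-3sg]?(?!\S)", level, line)
        out.append(line)
    return "".join(out)

# Shortened sample based on the flags listed above (CXXFLAGS line is illustrative).
sample = "CFLAGS = -pthread -g -O2 -MD -D_FORTIFY_SOURCE=2\nCXXFLAGS = -g -O2\n"
print(set_opt_level(sample), end="")
```

To apply it to the real file, read src/Makefile, pass its contents through the function and write the result back. The existing object files then need to be recompiled, since make does not notice a flag change on its own.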

First, I will measure again the overall CPU time it takes to hash 10 MB, 100 MB and 1 GB text files using the -O2 optimization flag that the project currently uses. Then I will change the build option to -O3, the highest of the standard numbered optimization levels, and repeat the measurements. I have also tested the new build flag on the other CPU architecture (x86_64, on Xerxes) and on two other servers (Betty & Charlie), which have the same architecture as Aarchie64 but more memory.
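A single run of the `time` command can be noisy, especially for the 10 MB file, where differences are only a few milliseconds. As a rough sketch of how the same Real/User/Sys figures could be collected over several runs, the snippet below uses Python's standard library on a Unix-like system. The helper names are hypothetical, and the demo command is a stand-in: in practice it would be replaced with the sha256deep binary and a test file.

```python
import os
import statistics
import subprocess
import sys
import time

def time_command(argv):
    """Run argv once and return (real, user, sys) in seconds, like the shell's `time`."""
    before = os.times()
    start = time.perf_counter()
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    real = time.perf_counter() - start
    after = os.times()
    # CPU time consumed by child processes that have finished and been waited for.
    user = after.children_user - before.children_user
    system = after.children_system - before.children_system
    return real, user, system

def median_timing(argv, runs=5):
    """Repeat the measurement and take the median of each column to damp noise."""
    samples = [time_command(argv) for _ in range(runs)]
    return tuple(statistics.median(col) for col in zip(*samples))

# Stand-in command (assumption); swap in the real binary and input file.
cmd = [sys.executable, "-c", "pass"]
real, user, system = median_timing(cmd, runs=3)
print(f"real {real:.3f}s  user {user:.3f}s  sys {system:.3f}s")
```

Taking the median of a few runs is one simple way to reduce the effect of caching and background load; it is a design choice for this sketch, not the procedure used for the numbers in this post.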

10MB File

| Server | Build | Real | User | Sys |
| --- | --- | --- | --- | --- |
| Xerxes (x86_64) | -O2 | 0m0.094s | 0m0.095s | 0m0.006s |
| Aarchie64 | -O2 | 0m0.092s | 0m0.066s | 0m0.016s |
| Betty | -O2 | 0m0.104s | 0m0.106s | 0m0.000s |
| Charlie | -O2 | 0m0.103s | 0m0.095s | 0m0.010s |
| Xerxes (x86_64) | -O3 | 0m0.093s | 0m0.094s | 0m0.004s |
| Aarchie64 | -O3 | 0m0.075s | 0m0.067s | 0m0.010s |
| Betty | -O3 | 0m0.103s | 0m0.105s | 0m0.000s |
| Charlie | -O3 | 0m0.103s | 0m0.095s | 0m0.010s |

100MB File

| Server | Build | Real | User | Sys |
| --- | --- | --- | --- | --- |
| Xerxes (x86_64) | -O2 | 0m0.861s | 0m0.882s | 0m0.048s |
| Aarchie64 | -O2 | 0m0.759s | 0m0.668s | 0m0.099s |
| Betty | -O2 | 0m0.996s | 0m0.959s | 0m0.060s |
| Charlie | -O2 | 0m1.005s | 0m0.895s | 0m0.129s |
| Xerxes (x86_64) | -O3 | 0m0.864s | 0m0.870s | 0m0.062s |
| Aarchie64 | -O3 | 0m0.705s | 0m0.662s | 0m0.060s |
| Betty | -O3 | 0m0.996s | 0m0.977s | 0m0.039s |
| Charlie | -O3 | 0m0.996s | 0m0.966s | 0m0.049s |

1GB File

| Server | Build | Real | User | Sys |
| --- | --- | --- | --- | --- |
| Xerxes (x86_64) | -O2 | 0m8.762s | 0m8.946s | 0m0.490s |
| Aarchie64 | -O2 | 0m7.690s | 0m6.799s | 0m1.013s |
| Betty | -O2 | 0m10.244s | 0m9.845s | 0m0.606s |
| Charlie | -O2 | 0m10.143s | 0m9.772s | 0m0.569s |
| Xerxes (x86_64) | -O3 | 0m8.790s | 0m8.938s | 0m0.535s |
| Aarchie64 | -O3 | 0m7.170s | 0m6.698s | 0m0.655s |
| Betty | -O3 | 0m10.147s | 0m9.822s | 0m0.520s |
| Charlie | -O3 | 0m10.144s | 0m9.866s | 0m0.470s |

Based on the build option change, we can see a slight improvement in total CPU time (User + Sys) on Aarchie64: 5 ms (milliseconds) for the 10mb file, 45 ms for the 100mb file and 459 ms for the 1gb file. This is pretty good considering that these are small files, and the saving should become more noticeable when hashing a very large file. Ranking all the tests above, Aarchie64 appears to be the fastest at hashing a file, followed by Xerxes, and then Betty and Charlie, which are nearly tied.
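The Aarchie64 savings quoted above can be re-derived from the reported User and Sys values. The sketch below is only an illustration of that arithmetic: it parses the `0m6.799s` style strings printed by `time` into whole milliseconds (to avoid floating-point rounding) and subtracts the -O3 total from the -O2 total. The helper names `to_ms` and `cpu_ms` are hypothetical.

```python
import re

def to_ms(stamp: str) -> int:
    """Convert a `time`-style value such as '0m6.799s' to whole milliseconds."""
    m = re.fullmatch(r"(\d+)m(\d+)\.(\d{3})s", stamp)
    if m is None:
        raise ValueError(f"unexpected time format: {stamp!r}")
    minutes, seconds, millis = map(int, m.groups())
    return (minutes * 60 + seconds) * 1000 + millis

def cpu_ms(pair) -> int:
    """Total CPU time (User + Sys) in milliseconds."""
    return sum(to_ms(value) for value in pair)

# (User, Sys) pairs for Aarchie64, copied from the results reported above.
aarchie64 = {
    "10MB":  {"O2": ("0m0.066s", "0m0.016s"), "O3": ("0m0.067s", "0m0.010s")},
    "100MB": {"O2": ("0m0.668s", "0m0.099s"), "O3": ("0m0.662s", "0m0.060s")},
    "1GB":   {"O2": ("0m6.799s", "0m1.013s"), "O3": ("0m6.698s", "0m0.655s")},
}

savings = {size: cpu_ms(r["O2"]) - cpu_ms(r["O3"]) for size, r in aarchie64.items()}
for size, saved in savings.items():
    base = cpu_ms(aarchie64[size]["O2"])
    print(f"{size}: {saved} ms saved ({100 * saved / base:.1f}% of the -O2 CPU time)")
```

Expressed as a share of the -O2 CPU time, the saving works out to roughly 6% at each file size, which suggests the gain scales with input size rather than being a fixed startup cost.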

On the x86_64 architecture (Xerxes), however, -O3 does not seem to help with larger files. There is no improvement in total CPU time for the 100mb and 1gb files; only the 10mb file improves, by about 3 ms (from 0.101 s to 0.098 s of User + Sys). I wonder why the flag option -O3 isn’t being used instead of -O2. I will ask the community on GitHub about this and will report back in stage 3.