Optimizing & Benchmarking SPO600 Project Stage 2 – Part 2

Algorithm and Profiling

Profiling a program enable us to see the total amount of time that is spend on all of the functions of a specific execution of an ELF file. Therefore, this means we can view which functions get calls the most and the execution time that is spent. It is useful to determine which function has the longest execution time, because it can be potentially optimized. To profile a program there is a special tool that is called gprof that can create a file graph containing all the information regarding the program that has been executed. Here is how I have used gprof on sha256deep and verified again that sha256_update function is constantly being called. 

 

To generate the gprof profile file, we need to add to the build option flag the following: –pg this will enable the gprof to generate the file which contains the info as described above. After the build is complete I did execute sha256deep normally. Now, this is the part where I can generate the file by typing the following command: gprof sha256deep > info. The info is the file that has all the information and can be accessed using any text editor. In summary, these are the necessary steps to generate the file:

  1. Add -pg to the build option (in the Makefile)
  2. Execute the ELF file: time ./sha256deep ../../1GB.test
  3. gprof sha256deep > out

Results:

  %   cumulative   self              self     total           

 time   seconds   seconds    calls  us/call  us/call  name    

  …

  0.49      6.16     0.03   131072     0.23    46.96  hash_update_sha256

   …

  0.08      6.17     0.01                                                  sha256_starts

  0.00      6.17     0.00           0.00     0.37              hash_final_sha256

  0.00      6.17     0.00           0.00     0.00              hash_init_sha256

   …

There are other options to visually see the results such as using gprof2dot, which is an open source tool that enables to generate a visual graph. Based on the results, I can see that hash_update_sha256 is being called the most (131072 calls). But, why is it hash_update_sha256 instead of update_sha256 like I have mentioned on stage 1? This is because that function hash_update_sha256 actually calls the sha256_update. I believe that to better understand which function gets called is it use gprof2dot as it is better to visualize it (as showen in the image to the right). Next, I will talk about my experience trying to optimize the function from stage 1.

void hash_update_sha256(void * ctx, const unsigned char *buf, size_t len)

{

  sha256_update((context_sha256_t *)ctx,buf,(uint32_t) len);

}

gprof2dot Visual Profile

I have examined the code for sha256_update function and it seems to me that it is already optimized for many reasons. First, the function is already using the appropriate and fastest memory copy method (memcpy). I have benchmarked with other functions instead of memcpy, such as memmove and memset, they do not improve the performance at all. I believe that there are ways to write a faster memcpy code using inline assembler, but since I have a little experience with it unfortunately, this couldn’t be implemented. Secondly, the function’s variables is using the appropriate data type of unsigned int (uint8_t & uint32_t) this is the most useful data-type when doing computations on bits or storing small values to save as much memory space as possible. This function calls sha256_process so this also couldn’t be changed. I do believe that this may be optimized using inline assembler but being a beginner on inline assembler is not enough because you need to truly understand what you are doing.

Summary

Even though I couldn’t optimize the code any further with the current algorithm, there are still improvements to time that it takes to encrypt a file, which I did on part 1. With the altered build option of -O3. I will check with the hashdeep community why this option isn’t being used and will create a PR with the build option changes to the Makefile and see if they accept it. Overall, this was a good experience to dive into a  fairly large project and investigate the ways on how to try to optimize it.

void sha256_update( context_sha256_t *ctx, const uint8_t *input, uint32_t length )
{
uint32_t left, fill;
if( ! length ) return;
left = ctx->total[0] & 0x3F;
fill = 64 – left;
ctx->total[0] += length;
ctx->total[0] &= 0xFFFFFFFF;
if( ctx->total[0] < length )
ctx->total[1]++;
if( left && length >= fill )
{
memcpy( (void *) (ctx->buffer + left),
   (const void *) input, fill );
sha256_process( ctx, ctx->buffer );
length -= fill;
input += fill;
left = 0;
}
while( length >= 64 )
{
sha256_process( ctx, input );
length -= 64;
input += 64;
}
if( length )
{
memcpy( (void *) (ctx->buffer + left),
   (const void *) input, length );
}
}