In this post, I will test the performance of a sound-scaling program using different algorithms. I will run a simple benchmark to measure how long the program takes and to check whether modifying the source code logic improves it. As a brief background: the program generates sound samples and scales the volume of the sound. It stores approximately 500,000 random sound samples in an array; I will later increase the array size to 5 million for testing purposes. All of this is being tested on AArch64.
Benchmarking Version 1
Let’s begin by testing the current state of the program to confirm it produces the correct output and to measure how long it takes to run. This is the first version of the source code; I will include the other versions with different algorithmic approaches later to see whether performance improves. The original source code below is fairly straightforward. We use the int16_t data type because digital sound is typically represented as 16-bit signed integers. Going through the code, what happens is this: we fill a large array of sound samples with random values (notice that samples can be negative). Next, we scale every sample in the array by a volume factor of 0.75 (the maximum is 1.00, which indicates full volume, and 0 is silent). When this is finished, we sum the values and print the result. Now, the task is to try different algorithms and see whether we can improve the time it takes to calculate and process all the samples.
As we can see, the program produces the same results on each run, but not the same amount of processing time (the command used to view the execution time is “time ./vol1”).
To measure exactly how much CPU time the program takes, I inserted clock() calls into the source code: one just before the volume-scaling loop and one right after it finishes. I recompiled the file, which is why the results have changed.
Pre-Calculated Look-up Table - Version 2
For this approach, I pre-calculated a lookup table (an array) of every possible sample value multiplied by the volume factor (0.75). Once the table is built, scaling each sample is just a matter of looking up its pre-scaled value. Comparing the two programs, this approach improves performance, though partly because it eliminates some code that is no longer needed.
Bit Shifting & Conversion - Version 3
For this approach, I created a fixed-point integer factor by multiplying the volume by a binary number representing the value 1.00 in fixed point; I used 0b100000000 (256), which represents 1.00. Then, in the loop, I shift the multiplied result back to the right. This also improves performance quite a bit. I have also included detailed results obtained with the command /usr/bin/time -v.
In conclusion, the fastest version is version 2 (506.7 ms), followed by version 3 (522.66 ms) and version 1 (651.36 ms). It’s worth mentioning that all of these versions were compiled with -O3 optimization enabled, which improves run time. The reason version 2 is fastest is that it eliminates a few data-type conversions and the scale_sample function call, meaning less processing and register work for the CPU.