Automatic vectorization is an interesting topic, mainly because of what it can do for performance. Traditionally, a processor works through a loop one element at a time: each iteration loads a value, operates on it, and stores the result, which adds up quickly for large data sets. Modern processors provide SIMD (single instruction, multiple data) instructions that apply the same operation to several values at once, and compilers have a feature called auto-vectorization that takes advantage of them. Its sole purpose is optimization: the compiler rewrites a suitable loop so that several elements are processed by one instruction instead of one after another. Understanding how to enable and correctly use auto-vectorization can lead to a significant performance increase.
I will demonstrate how to enable vectorization on a simple C program. One of my tasks was to create two integer arrays, each made of 1000 distinct random numbers. Next, I had to add the two arrays element by element into a third array. For example, if an element of arrayOne is 50 and the matching element of arrayTwo is 50, the matching element of arrayOneTwo is 100 (this is just an illustration, not the actual code). Finally, I had to sum all 1000 elements of the third array into a single integer variable. This takes several loops, and those loops are where vectorization occurs when I compile the file with the -O3 option, which enables GCC's optimizations, including auto-vectorization. You may read this article, which explains vectorization in greater detail.
This was tested on AArch64. The command I used to compile the code is gcc -O3 -o lab4 VectorizationLab.c (-O3 enables vectorization). To check whether vectorization succeeded, I ran objdump -d on the resulting binary and inspected the main section. The code on the left is the output with -O3 enabled. We can tell that it has been vectorized by looking at the SIMD registers and instructions: the registers at the bottom are written with a lane arrangement, starting with v0.4s, which means vector register v0 treated as four 32-bit lanes.
Overall, this was an interesting lab; vectorization is an interesting topic and a great feature. The simple program wasn't hard to create. The challenging part was inspecting the disassembly output and working out what is really happening there.