Inline Assembler & Traverso DAW!

Today’s blog is separated into two parts. In part A, I compare my previous sound sample program with a similar program provided by our professor, then test its performance against my own solution. You may want to read my previous blog about benchmarking and testing the digital sound sample program to get a basic understanding of the code. In part B, I discuss an open source program called Traverso DAW, focusing on its assembly code. Here is the source code provided to us.

Source Code

// vol_simd.c :: volume scaling in C using AArch64 SIMD
// Chris Tyler 2017.11.29-2018.02.20

#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include "vol.h"

int main() {

    int16_t*        in;     // input array
    int16_t*        limit;      // end of input array
    int16_t*        out;        // output array

    // these variables will be used in our assembler code, so we’re going
    // to hand-allocate which register they are placed in
    // Q: what is an alternate approach?
    register int16_t*   in_cursor   asm("r20"); // input cursor
    register int16_t*   out_cursor  asm("r21"); // output cursor
    register int16_t    vol_int     asm("r22"); // volume as int16_t

    int         x;      // array iterator
    int         ttl;        // array total

    in=(int16_t*) calloc(SAMPLES, sizeof(int16_t));
    out=(int16_t*) calloc(SAMPLES, sizeof(int16_t));

    printf("Generating sample data.\n");
    for (x = 0; x < SAMPLES; x++) {
        in[x] = (rand()%65536)-32768;
    }

// --------------------------------------------------------------------

    in_cursor = in;
    out_cursor = out;
    limit = in + SAMPLES ;

    // set vol_int to fixed-point representation of 0.75
    // Q: should we use 32767 or 32768 in next line? why?
    vol_int = (int16_t) (0.75 * 32767.0);

    printf("Scaling samples.\n");

    // Q: what does it mean to "duplicate" values in the next line?
    __asm__ ("dup v1.8h,%w0"::"r"(vol_int)); // duplicate vol_int into v1.8h

    while ( in_cursor < limit ) {
        __asm__ (
            "ldr q0, [%[in]],#16        \n\t"
            // load eight samples into q0 (v0.8h)
            // from in_cursor, and post-increment
            // in_cursor by 16 bytes

            "sqdmulh v0.8h, v0.8h, v1.8h    \n\t"
            // multiply each lane in v0 by v1*2
            // saturate results
            // store upper 16 bits of results into v0

            "str q0, [%[out]],#16       \n\t"
            // store eight samples to out_cursor
            // post-increment out_cursor by 16 bytes

            // Q: what happens if we remove the following
            // two lines? Why?
            : [in]"+r"(in_cursor)
            : "0"(in_cursor),[out]"r"(out_cursor)
            );
    }

// --------------------------------------------------------------------

    printf("Summing samples.\n");
    for (x = 0; x < SAMPLES; x++) {
        ttl=(ttl+out[x])%1000;
    }

    // Q: are the results usable? are they correct?
    printf("Result: %d\n", ttl);

    return 0;

}

Performance Comparison

I changed the number of samples in both programs to 5 million. Timing each one with the time command, Version 3, which uses the bit-shift algorithm, appears to run faster than the newer version.

Newer Version
Version 3


    // Q: what is an alternate approach?

A1: The alternative is to remove the explicit register bindings such as asm("r20") and let the compiler decide which registers to use. The program compiles and runs without errors this way.
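
As a rough sketch of that alternate approach, the fragment below passes the pointers and the volume to the inline assembler as named operands and leaves the register choice to the compiler. The function name scale8 and the operand name vol are made up for this illustration and are not part of vol_simd.c. The non-AArch64 branch is only a plain C stand-in so that the sketch also builds on other machines.

```c
#include <stdint.h>

// Hypothetical helper: scale eight samples by a fixed-point volume factor.
void scale8(const int16_t *in, int16_t *out, int16_t vol_int) {
#if defined(__aarch64__)
    // No asm("r20") style bindings here: the compiler picks the registers
    // and substitutes them for %[in], %[out] and %w[vol].
    __asm__ volatile (
        "dup v1.8h, %w[vol]           \n\t"
        "ldr q0, [%[in]]              \n\t"
        "sqdmulh v0.8h, v0.8h, v1.8h  \n\t"
        "str q0, [%[out]]             \n\t"
        : // no register outputs; the result is written through memory
        : [in] "r"(in), [out] "r"(out), [vol] "r"(vol_int)
        : "v0", "v1", "memory"
    );
#else
    // Portable stand-in with the same arithmetic as sqdmulh.
    // Assumes >> on a negative value is an arithmetic shift.
    for (int i = 0; i < 8; i++) {
        int64_t high = (2 * (int64_t)in[i] * (int64_t)vol_int) >> 16;
        if (high > INT16_MAX) high = INT16_MAX;
        if (high < INT16_MIN) high = INT16_MIN;
        out[i] = (int16_t)high;
    }
#endif
}
```

The clobber list ("v0", "v1", "memory") tells the compiler which vector registers the assembler code overwrites and that memory is modified, which the hand-allocated version leaves implicit.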

    // Q: should we use 32767 or 32768 in next line? why?

A2: The value should be 32767 because it is the largest value an int16_t (short) can hold; the range of the type is -32768 to 32767. With 32768, a volume factor of 1.0 would no longer fit.
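
To make the range argument concrete, here is a small sketch in plain C. The helper names to_q15 and fits_int16 are hypothetical and only exist for this example; the conversion itself mirrors the vol_int line in the source code.

```c
#include <stdint.h>

// Hypothetical helper: convert a volume factor between 0.0 and 1.0
// into the fixed-point form used for vol_int.
int16_t to_q15(double factor) {
    return (int16_t)(factor * 32767.0);
}

// Hypothetical helper: returns 1 if factor * scale still fits in an
// int16_t, 0 otherwise.
int fits_int16(double factor, double scale) {
    double v = factor * scale;
    return v >= INT16_MIN && v <= INT16_MAX;
}
```

With these helpers, to_q15(0.75) gives 24575, and fits_int16(1.0, 32768.0) reports that full volume would not fit if 32768 were used as the scale.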

Microsoft Data Table

    // Q: what does it mean to “duplicate” values in the next line?

A3: My understanding is that dup copies the value of operand %w0 (which is vol_int) into each of the eight 16-bit lanes of v1.8h. When I ran objdump -d to view the duplication, the disassembly showed the value of register w22 being inserted into v1.8h.
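
The following plain C sketch imitates what the two vector instructions do, one lane at a time. It is a scalar stand-in written for this post, not the actual SIMD code, and the function names dup8 and sqdmulh_lane are my own.

```c
#include <stdint.h>

// Scalar stand-in for "dup v1.8h, w22": copy one value into all 8 lanes.
void dup8(int16_t lanes[8], int16_t value) {
    for (int i = 0; i < 8; i++) {
        lanes[i] = value;
    }
}

// Scalar stand-in for one lane of "sqdmulh": double the product,
// keep the upper 16 bits of the 32-bit result, and saturate.
int16_t sqdmulh_lane(int16_t a, int16_t b) {
    int64_t doubled = 2 * (int64_t)a * (int64_t)b;
    int64_t high = doubled >> 16;   // assumes an arithmetic shift for negatives
    if (high > INT16_MAX) high = INT16_MAX;
    if (high < INT16_MIN) high = INT16_MIN;
    return (int16_t)high;
}
```

For example, a sample of 16384 scaled by vol_int = 24575 comes out as 12287, which is roughly 0.75 times the input. The doubling is what makes a 15-bit fraction line up with the upper 16 bits of the product.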

Results of Main Section

    // Q: what happens if we remove the following
    // two lines? Why?

A4: We get a compiler error. Those two lines declare the output and input operands, so without them the compiler cannot resolve the %[in] and %[out] operand names used inside the __asm__ (inline assembler) block.

Compiler Error

    // Q: are the results usable? are they correct?

A5: The results appear to be usable and seem to be correct, because the result falls within the range expected from the generated sample data.

File Execution

For part B, I had to choose an open source package and find its assembly-language code. Traverso DAW is a cross-platform multitrack audio recording and editing suite with support for CD mastering and non-linear processing. It is free software, licensed under the GNU General Public License. The source code can be downloaded from GitHub. I was able to find two assembly files, both with the .S file extension.

Traverso is cross-platform software (Windows, Linux, macOS), so it is compatible with most common CPU architectures.

As I mentioned, there are two assembly files, and they are mainly made up of SSE functions for 32-bit and 64-bit x86. SSE (Streaming SIMD Extensions) instructions perform the same operation on several data values at once, which can improve the performance of the program. It is hard to judge the complexity of the code for developers who have little experience with assembly. Although the assembly may benefit the software in terms of optimization and performance, the source has very little documentation, so it is virtually impossible for a beginner to interpret it fully. From my observation, I can tell that there are registers with assigned values and a couple of loops. I believe these keep data in fast temporary storage for quick access, which could improve the software's performance.
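
To give an idea of what "the same operation on several data values at once" looks like, here is a small sketch written with C intrinsics rather than assembly. It is not taken from Traverso's .S files. The function apply_gain is a hypothetical example of the kind of work an audio program might hand to SSE, and the plain loop at the end handles machines without SSE as well as any leftover samples.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Hypothetical routine: multiply every sample in buf by gain.
void apply_gain(float *buf, size_t n, float gain) {
    size_t i = 0;
#if defined(__SSE__)
    __m128 g = _mm_set1_ps(gain);             // gain copied into 4 lanes
    for (; i + 4 <= n; i += 4) {
        __m128 s = _mm_loadu_ps(buf + i);     // load 4 floats at once
        _mm_storeu_ps(buf + i, _mm_mul_ps(s, g));
    }
#endif
    for (; i < n; i++) {                      // leftover samples, or all of them without SSE
        buf[i] *= gain;
    }
}
```

Each pass through the SSE loop handles four samples with one multiply, which is the same idea as the eight-lane sqdmulh in part A, only with floats on x86.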