








April 15, 2023

Quad Delays 1.1 (update)


Version 1.1



This is the third update of the year, after MisterPug and Klang. This time it is for Quad Delays.

Quick recap: Quad Delays is a VST effect plugin that adds up to 4 delays. My goal was to recreate an effect we used in old trackers to improve the sound: the same track is added a second time, with an offset and with its volume reduced to a half or a quarter.

Here are the changes in version 1.1:
- The right delay now targets the right output channel by default
- Optimization (CPU usage)
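To make the tracker trick concrete, here is a minimal sketch in C. It is not the plugin's code: the function name mix_delays and its parameters are made up for illustration, and it assumes a simple feed-forward delay (no feedback) on a mono buffer, with offsets given in samples.

```c
#include <stddef.h>

/* Hypothetical helper, not taken from Quad Delays.
 * Each output sample is the dry input plus up to `taps` delayed copies
 * of the input, each copy scaled by its own gain (e.g. 0.5f or 0.25f). */
void mix_delays(const float *in, float *out, size_t n,
                const size_t *offsets, const float *gains, size_t taps)
{
    for (size_t i = 0; i < n; ++i) {
        float sample = in[i];                    /* dry signal */
        for (size_t t = 0; t < taps; ++t) {
            if (i >= offsets[t]) {               /* delayed copy has started */
                sample += gains[t] * in[i - offsets[t]];
            }
        }
        out[i] = sample;
    }
}
```

With offsets {2, 4} and gains {0.5f, 0.25f}, a single impulse at sample 0 comes back at half volume 2 samples later and at quarter volume 4 samples later.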


He came to get the body. He's killing us one at a time






Above is a slide from a presentation about optimization that I gave a long time ago at Ubisoft.

The topic was SIMD and intrinsics. The image on the slide comes from the movie Predator.

SIMD stands for Single Instruction, Multiple Data: a single instruction computes and updates several data elements at once.

Here is an example in assembly code: vaddps ymm0, ymm1, ymm2

vaddps: Add packed single-precision (32-bit) floating-point elements in ymm1 and ymm2, and store the results in ymm0.

The vaddps instruction does exactly what the for loop below does, but in a single, faster instruction:

FOR j := 0 to 7
    i := j * 32
    dst[i+31:i] := a[i+31:i] + b[i+31:i]
ENDFOR

[Source]
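For readers less used to the bit-range notation, the same loop can be written in plain C. This is only an illustrative sketch (the function name add8_scalar is made up): each of the 8 iterations adds one 32-bit float, which is the work that vaddps does in one go.

```c
#include <stddef.h>

/* Scalar equivalent of one vaddps on 256-bit registers:
 * 8 independent single-precision additions, one per iteration. */
void add8_scalar(const float a[8], const float b[8], float dst[8])
{
    for (size_t j = 0; j < 8; ++j) {
        dst[j] = a[j] + b[j];
    }
}
```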

Intrinsics let us write what is essentially assembly code without having to manage registers (CPU variables), the stack and the other bookkeeping that assembly code requires.

Here is the same example as above, this time written with intrinsics:

#include <immintrin.h> // AVX intrinsics

__m256 yPixels0 = _mm256_setr_ps(0.0f, 0.1f, 0.2f, 0.3f, 0.4f, 0.5f, 0.6f, 0.7f); // 8 floats
__m256 yPixels1 = _mm256_setr_ps(0.8f, 0.7f, 0.6f, 0.5f, 0.4f, 0.3f, 0.2f, 0.1f); // 8 floats
__m256 yPixels = _mm256_add_ps(yPixels0, yPixels1); // 8 additions in one instruction

We don't need to select the registers (e.g. ymm1) for the yPixels0, yPixels1 and yPixels variables. The compiler does it for us.
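In real code the data usually lives in memory rather than in variables, so a slightly fuller sketch loads the floats, adds them and stores the result. The function name add8_avx is made up, and the target("avx") attribute assumes GCC or Clang on an x86 machine (with another compiler you would enable AVX through its own options instead).

```c
#include <immintrin.h>

/* Adds 8 floats from a and 8 floats from b, writes 8 results to out.
 * The unaligned load/store variants are used so the pointers do not
 * need any particular alignment. */
__attribute__((target("avx")))   /* assumption: GCC/Clang attribute */
void add8_avx(const float *a, const float *b, float *out)
{
    __m256 va = _mm256_loadu_ps(a);               /* load 8 floats */
    __m256 vb = _mm256_loadu_ps(b);               /* load 8 floats */
    _mm256_storeu_ps(out, _mm256_add_ps(va, vb)); /* add and store */
}
```

This function must only be called on a CPU that supports AVX.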

SIMD and intrinsics are popular in gaming, video, audio and AI. Your graphics card (GPU) also uses this kind of instruction, but on a much larger and faster scale.

The first time I tried SIMD was on my Pentium II 266 MHz (MMX) in 199X.
MMX had 64-bit registers (MM0-MM7), and I used them for some pixel effects and even a 3D software rasterizer.

For Quad Delays, the compiler was already generating SIMD code, but not in the most optimal way, so I added/updated some sections with intrinsics and it now runs 2.03 times faster.

I also added a selector: if your computer doesn't support these SIMD instructions, the plugin falls back to the old version. Otherwise, it could crash...
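A selector of this kind could look like the sketch below. This is not the actual Quad Delays code: the function names are made up, the work being done is a simple buffer addition, and it assumes GCC or Clang on x86, where the __builtin_cpu_supports built-in is available.

```c
#include <immintrin.h>
#include <stddef.h>

/* Fallback path: plain loop, runs on any CPU. */
static void add_scalar(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

/* AVX path: 8 floats per iteration, then a scalar loop for the leftovers. */
__attribute__((target("avx")))
static void add_avx(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
    for (; i < n; ++i) out[i] = a[i] + b[i];
}

/* Selector: checks the CPU at run time and picks a path. Without this
 * check, an AVX instruction on an older CPU raises an illegal-instruction
 * fault and the host application crashes. */
void add_buffers(const float *a, const float *b, float *out, size_t n)
{
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx")) {
        add_avx(a, b, out, n);
    } else {
        add_scalar(a, b, out, n);
    }
}
```

In a real plugin the check would more likely be done once at load time and the chosen function kept in a pointer, rather than tested on every call.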


Roads? Where we're going, we don't need roads



The Quad Delays interface is heavily inspired by the movie Back to the Future.

You can find the references in the intro part (video above or the image comparison below).






Thanks for reading,

JS.