Sladky Player

Multithreaded Audio Output and Hardware DSP

A portable audio player written in Rust. It combines high-performance GUI rendering through the WGPU API, low-level audio output through the cross-platform Cubeb library, an asynchronous Symphonia decoder, and a fully customizable 32-bit float DSP pipeline.

ARCHITECTURE SPECIFICATION

Signal Path and Low-Level DSP

Audio signal processing is executed entirely in 32-bit Float stereo mode. The DSP pipeline is composed of consecutive independent modules performing precise mathematical correction of the source samples.

⚙️

1. Correction and Filtering (Pre-processing)

The SignalCorrector module aligns channel delays (Time Alignment) in the ring buffer with single-sample precision, handles polarity inversion, and performs downmixing to mono. The DynamicsPre module applies subsonic filtering (20 Hz HPF, 4th-order Linkwitz-Riley built from cascaded biquad filters) and ultrasonic filtering (22 kHz LPF), combined with Automatic Gain Control (AGC) driven by an RMS detector (targeting -16 dBFS).
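As an illustration of the subsonic stage described above, here is a minimal sketch of a 4th-order Linkwitz-Riley high-pass built from two identical 2nd-order Butterworth biquads (Q = 1/sqrt(2)), using the widely published RBJ cookbook coefficient formulas. The type names `Biquad` and `SubsonicFilter` are hypothetical and are not taken from the player's source.

```rust
use std::f32::consts::{FRAC_1_SQRT_2, PI};

/// Direct Form I biquad, coefficients normalized so that a0 = 1.
#[derive(Clone, Copy, Default)]
pub struct Biquad {
    b0: f32, b1: f32, b2: f32, a1: f32, a2: f32,
    x1: f32, x2: f32, y1: f32, y2: f32,
}

impl Biquad {
    /// 2nd-order high-pass (RBJ cookbook formulas).
    pub fn highpass(sample_rate: f32, cutoff_hz: f32, q: f32) -> Self {
        let w0 = 2.0 * PI * cutoff_hz / sample_rate;
        let (sin_w0, cos_w0) = w0.sin_cos();
        let alpha = sin_w0 / (2.0 * q);
        let a0 = 1.0 + alpha;
        Biquad {
            b0: (1.0 + cos_w0) / 2.0 / a0,
            b1: -(1.0 + cos_w0) / a0,
            b2: (1.0 + cos_w0) / 2.0 / a0,
            a1: -2.0 * cos_w0 / a0,
            a2: (1.0 - alpha) / a0,
            ..Default::default()
        }
    }

    pub fn process(&mut self, x: f32) -> f32 {
        let y = self.b0 * x + self.b1 * self.x1 + self.b2 * self.x2
            - self.a1 * self.y1
            - self.a2 * self.y2;
        self.x2 = self.x1;
        self.x1 = x;
        self.y2 = self.y1;
        self.y1 = y;
        y
    }
}

/// Hypothetical LR4 subsonic filter: two cascaded Butterworth high-pass stages.
pub struct SubsonicFilter {
    stages: [Biquad; 2],
}

impl SubsonicFilter {
    pub fn new(sample_rate: f32) -> Self {
        let stage = Biquad::highpass(sample_rate, 20.0, FRAC_1_SQRT_2);
        SubsonicFilter { stages: [stage, stage] }
    }

    pub fn process(&mut self, x: f32) -> f32 {
        self.stages.iter_mut().fold(x, |s, bq| bq.process(s))
    }
}
```

One filter instance is needed per channel, since each biquad carries its own state. A production filter at such a low cutoff would likely use a higher-precision topology than this f32 Direct Form I sketch.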

📉

2. Equalization and Multiband Saturation

A parametric 12-band equalizer uses peaking filters with a fixed quality factor (Q = 1.41) at center frequencies from 16 Hz to 16 kHz. The Goodizer module splits the signal into 3 bands (LPF crossover at 250 Hz and HPF at 4000 Hz), each with independent compression (adjustable attack and release times, threshold, and ratio) and non-linear saturation (tanh wave-shaping) based on the selected mode (A, B, C, D).
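The per-band processing described above can be sketched as an envelope-tracked compressor followed by a tanh saturator. This is an assumption-laden illustration, not the Goodizer's actual code: the names `BandCompressor` and `saturate`, the peak (rather than RMS) detector, and the `drive` parameter standing in for the A/B/C/D modes are all hypothetical.

```rust
/// Hypothetical single-band feed-forward compressor with a peak envelope follower.
pub struct BandCompressor {
    threshold_db: f32,
    ratio: f32,
    attack_coeff: f32,
    release_coeff: f32,
    envelope: f32,
}

impl BandCompressor {
    pub fn new(sample_rate: f32, threshold_db: f32, ratio: f32, attack_ms: f32, release_ms: f32) -> Self {
        // One-pole smoothing coefficient for a time constant given in milliseconds.
        let coeff = |ms: f32| (-1.0 / (ms * 0.001 * sample_rate)).exp();
        Self {
            threshold_db,
            ratio,
            attack_coeff: coeff(attack_ms),
            release_coeff: coeff(release_ms),
            envelope: 0.0,
        }
    }

    pub fn process(&mut self, x: f32) -> f32 {
        let level = x.abs();
        // Rise with the attack time, fall with the release time.
        let coeff = if level > self.envelope { self.attack_coeff } else { self.release_coeff };
        self.envelope = level + coeff * (self.envelope - level);

        let env_db = 20.0 * self.envelope.max(1e-9).log10();
        let over_db = env_db - self.threshold_db;
        if over_db <= 0.0 {
            return x; // below threshold: unity gain
        }
        // Above threshold the output level rises by only 1/ratio dB per input dB.
        let gain_db = over_db / self.ratio - over_db;
        x * 10f32.powf(gain_db / 20.0)
    }
}

/// tanh wave-shaper normalized so that an input of 1.0 maps to 1.0.
/// `drive` must be greater than zero.
pub fn saturate(x: f32, drive: f32) -> f32 {
    (x * drive).tanh() / drive.tanh()
}
```

For example, with a -20 dB threshold and a 4:1 ratio, a sustained full-scale input settles at -15 dB, because the 20 dB of overshoot is reduced to 5 dB.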

🧠

3. SoundUP RT Psychoacoustic Upscaling

A multi-component engine. A subharmonic resonator (LPF at 100 Hz) generates low-frequency harmonics, using ADAA (Antiderivative Anti-Aliasing) to suppress the aliasing that its non-linearity would otherwise produce. The Haas decorrelator (HPF at 800 Hz, delay of 1.2 ms) widens the stereo image. The ATR module isolates transients (6.5 kHz band) using fast and slow envelope detectors to dynamically accentuate their onset phase. The HF exciter performs polynomial synthesis of even and odd harmonics above 11 kHz combined with modulated noise.
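The fast/slow envelope idea behind the ATR module can be sketched as follows. This is a hedged illustration only: the struct name `TransientAccent`, the time constants, and the gain law are assumptions, and the 6.5 kHz band filter that would precede the detector is omitted.

```rust
/// Hypothetical transient accentuator driven by two envelope followers.
pub struct TransientAccent {
    fast: f32,
    slow: f32,
    fast_coeff: f32,
    slow_coeff: f32,
    amount: f32,
}

impl TransientAccent {
    pub fn new(sample_rate: f32, fast_ms: f32, slow_ms: f32, amount: f32) -> Self {
        let coeff = |ms: f32| (-1.0 / (ms * 0.001 * sample_rate)).exp();
        Self {
            fast: 0.0,
            slow: 0.0,
            fast_coeff: coeff(fast_ms),
            slow_coeff: coeff(slow_ms),
            amount,
        }
    }

    /// Returns the gain to apply to the current sample, in [1, 1 + amount].
    pub fn gain(&mut self, x: f32) -> f32 {
        let level = x.abs();
        self.fast = level + self.fast_coeff * (self.fast - level);
        self.slow = level + self.slow_coeff * (self.slow - level);
        // During an onset the fast follower runs ahead of the slow one.
        let onset = (self.fast - self.slow).max(0.0);
        1.0 + self.amount * onset / (self.fast + 1e-9)
    }
}
```

On a sudden onset the gain jumps toward `1 + amount`, then relaxes back to unity as the slow envelope catches up, so sustained material is left unchanged.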

🎧

4. Binaural Spatial Rendering

The Hrtf3DVirtualizer module performs Mid/Side decomposition. The Mid channel is processed via a peaking filter (3 kHz, Q=1.0) to simulate ear canal resonance and a notch filter (6.5 kHz, Q=2.0). Acoustic distance is emulated by early reflections (delay of 0.9 ms, level at -15.5 dB). The Side channel is corrected by an 8 kHz notch filter. Crosstalk is delayed by ITD (0.28 ms, 13 samples at 48 kHz) and filtered using a high-shelf filter to simulate head acoustic shadowing (-10 dB above 1.5 kHz).
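The building blocks named above (Mid/Side decomposition, millisecond-to-sample conversion for the ITD and early-reflection delays, and dB-to-linear gain) reduce to a few small helpers. The sketch below uses hypothetical function and type names, and the 0.5 scaling in the Mid/Side encoder is one common convention rather than a confirmed detail of the player.

```rust
/// Mid/Side encode: Mid is the channel average, Side is half the difference.
pub fn encode_mid_side(l: f32, r: f32) -> (f32, f32) {
    (0.5 * (l + r), 0.5 * (l - r))
}

/// Inverse of `encode_mid_side`.
pub fn decode_mid_side(m: f32, s: f32) -> (f32, f32) {
    (m + s, m - s)
}

/// Converts a delay in milliseconds to the nearest whole number of samples.
pub fn delay_in_samples(delay_ms: f32, sample_rate: f32) -> usize {
    (delay_ms * 0.001 * sample_rate).round() as usize
}

/// Converts a level in dB to a linear gain factor.
pub fn db_to_gain(db: f32) -> f32 {
    10f32.powf(db / 20.0)
}

/// Fixed integer-sample delay line (a length of 0 is treated as 1 sample).
pub struct DelayLine {
    buf: Vec<f32>,
    pos: usize,
}

impl DelayLine {
    pub fn new(samples: usize) -> Self {
        Self { buf: vec![0.0; samples.max(1)], pos: 0 }
    }

    /// Writes `x` and returns the sample written `buf.len()` calls earlier.
    pub fn process(&mut self, x: f32) -> f32 {
        let out = self.buf[self.pos];
        self.buf[self.pos] = x;
        self.pos = (self.pos + 1) % self.buf.len();
        out
    }
}
```

With these helpers, `delay_in_samples(0.28, 48_000.0)` gives the 13-sample ITD quoted above (0.28 ms x 48 kHz = 13.44, rounded), and the 0.9 ms early reflection comes out at 43 samples.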

RUST

// Lock-free real-time audio callback: Cubeb output fed by SPSC ring buffers
pub fn process_audio_callback(
    output_buffer: &mut [f32],
    cons_main: &mut Consumer<f32>,
    cons_amb: &mut Consumer<f32>,
    dsp: &mut DspProcessor,
    quick_dsp: &mut QuickDspState,
    settings: &SharedAudioSettings,
) -> isize {
    // Interleaved stereo: two f32 samples per frame.
    let frames_requested = output_buffer.len() / 2;
    // NOTE: allocating inside a real-time callback can block; a scratch buffer
    // preallocated by the caller would avoid this.
    let mut amb_buffer = vec![0.0f32; output_buffer.len()];

    let amb_popped = cons_amb.pop_slice(&mut amb_buffer);
    let popped = cons_main.pop_slice(output_buffer);
    // On underrun, output silence rather than stale buffer contents.
    output_buffer[popped..].fill(0.0);

    let vol = settings.get_float(&settings.volume);
    let bal = settings.get_float(&settings.balance_lr);
    let pitch = settings.get_float(&settings.pitch);
    let reverb_wet = settings.get_float(&settings.reverb_wet);
    let reverb_room = settings.get_float(&settings.reverb_room);
    let crossfeed = settings.get_float(&settings.crossfeed);

    // Balance attenuates only the channel opposite to the pan direction.
    let l_gain = if bal > 0.0 { 1.0 - bal } else { 1.0 };
    let r_gain = if bal < 0.0 { 1.0 + bal } else { 1.0 };

    for i in (0..popped).step_by(2) {
        let l = output_buffer[i];
        let r = output_buffer[i + 1];

        // Quick effects first, then the main DSP chain.
        let (l_pt, r_pt) = quick_dsp.pitch_shifter.process(l, r, pitch);
        let (l_rev, r_rev) = quick_dsp.reverb.process(l_pt, r_pt, reverb_wet, reverb_room);
        let (l_cf, r_cf) = quick_dsp.crossfeed.process(l_rev, r_rev, crossfeed);
        let (proc_l, proc_r) = dsp.process_stereo(l_cf, r_cf);

        let mut out_l = proc_l * l_gain;
        let mut out_r = proc_r * r_gain;
        // Mix in the ambient layer where samples are available.
        if i < amb_popped {
            out_l += amb_buffer[i];
            out_r += amb_buffer[i + 1];
        }
        output_buffer[i] = out_l * vol;
        output_buffer[i + 1] = out_r * vol;
    }
    frames_requested as isize
}

// 3-Band Linkwitz-Riley Crossover with Envelope Tracked Compression
impl Goodizer {
    pub fn process_frame(&mut self, l: f32, r: f32) -> (f32, f32) {
        if self.amount <= 0.0 {
            return (l, r);
        }
        let (dry_l, dry_r) = (l, r);

        // 1. Per-channel crossover (LPF at 250 Hz, HPF at 4000 Hz)
        let low_l = self.lpf_l.process(l);
        let low_r = self.lpf_r.process(r);

        let high_l = self.hpf_l.process(l);
        let high_r = self.hpf_r.process(r);

        // The mid band is whatever remains after removing the low and high bands.
        let mid_l = l - low_l - high_l;
        let mid_r = r - low_r - high_r;

        // 2. Per-channel compression of the separated frequency bands
        let comp_low_l = self.c_low_l.process(low_l);
        let comp_low_r = self.c_low_r.process(low_r);

        let comp_mid_l = self.c_mid_l.process(mid_l);
        let comp_mid_r = self.c_mid_r.process(mid_r);

        let comp_high_l = self.c_high_l.process(high_l);
        let comp_high_r = self.c_high_r.process(high_r);

        // 3. Recombine the bands and blend the processed signal with the dry one
        let mix_l = comp_low_l + comp_mid_l + comp_high_l;
        let mix_r = comp_low_r + comp_mid_r + comp_high_r;

        (
            dry_l * (1.0 - self.amount) + mix_l * self.amount,
            dry_r * (1.0 - self.amount) + mix_r * self.amount,
        )
    }
}

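// ADAA Non-Linear Waveshaping (Antiderivative Anti-Aliasing)
// ln(cosh(x)) is the antiderivative of tanh(x).
fn ln_cosh(x: f32) -> f32 {
    let ax = x.abs();
    // Stable identity: ln(cosh(x)) = |x| + ln(1 + e^(-2|x|)) - ln(2).
    // It is exact at x = 0 and does not overflow for large |x|.
    ax + (-2.0 * ax).exp().ln_1p() - std::f32::consts::LN_2
}

fn adaa_tanh(x: f32, x_prev: f32) -> f32 {
    let diff = x - x_prev;
    if diff.abs() < 1e-5 {
        // The difference quotient is ill-conditioned here:
        // fall back to tanh evaluated at the midpoint.
        (0.5 * (x + x_prev)).tanh()
    } else {
        // Finite difference of the antiderivative ln(cosh(x)) between consecutive samples
        (ln_cosh(x) - ln_cosh(x_prev)) / diff
    }
}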

AUDIO ENGINE STREAM MONITOR

Interface/API: WASAPI Exclusive (Bit-Perfect)
Output Format: 32-bit Float @ 384000 Hz
Hardware Buffer: 50 samples (~0.13 ms latency at 384 kHz)
Resampling DSP: Sinc Interpolation (64000 Taps)
Quantize Dither: NS9 Noise Shaping
PCM-to-DSD SDM: ASDM7EC (DSD512)
Clipping Guard: Soft Saturation ON (Thresh: -1.0 dB)
GUI Backend: WGPU Vulkan 144Hz

Lock-Free Multithreaded SPSC Architecture

The real-time audio output thread is completely isolated from the GUI thread. Decoding, resampling, dynamic crossfading along customizable Bezier curves, and ambient synthesis are executed in parallel threads, exchanging audio data via lock-free SPSC (Single Producer Single Consumer) ring buffers allocated on the heap. Because the audio callback never waits on a lock, audio delivery to the sound card stays stable and the risk of buffer underruns and playback stuttering is minimized.
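To make the SPSC idea concrete, here is a minimal sketch of a lock-free single-producer single-consumer ring for `f32` samples, written in safe Rust by storing each sample's bit pattern in an `AtomicU32`. The type `SpscRing` is hypothetical and is not the buffer the player uses. The sketch also does not enforce the one-producer, one-consumer rule at the type level, so callers must uphold it themselves.

```rust
use std::sync::atomic::{AtomicU32, AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical SPSC ring buffer of f32 samples (stored as bit patterns).
pub struct SpscRing {
    slots: Box<[AtomicU32]>,
    head: AtomicUsize, // next slot to read; written only by the consumer
    tail: AtomicUsize, // next slot to write; written only by the producer
}

impl SpscRing {
    /// One extra slot distinguishes "full" from "empty".
    pub fn with_capacity(capacity: usize) -> Arc<Self> {
        let slots = (0..capacity + 1).map(|_| AtomicU32::new(0)).collect();
        Arc::new(Self {
            slots,
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
        })
    }

    /// Producer side. Returns false when the ring is full; never blocks.
    pub fn push(&self, sample: f32) -> bool {
        let tail = self.tail.load(Ordering::Relaxed);
        let next = (tail + 1) % self.slots.len();
        if next == self.head.load(Ordering::Acquire) {
            return false;
        }
        self.slots[tail].store(sample.to_bits(), Ordering::Relaxed);
        // Release publishes the slot write to the consumer.
        self.tail.store(next, Ordering::Release);
        true
    }

    /// Consumer side. Returns None when the ring is empty; never blocks.
    pub fn pop(&self) -> Option<f32> {
        let head = self.head.load(Ordering::Relaxed);
        if head == self.tail.load(Ordering::Acquire) {
            return None;
        }
        let bits = self.slots[head].load(Ordering::Relaxed);
        // Release hands the slot back to the producer.
        self.head.store((head + 1) % self.slots.len(), Ordering::Release);
        Some(f32::from_bits(bits))
    }
}
```

In use, the decoder thread would hold one `Arc` clone and call `push`, while the audio callback holds another and calls `pop`. Neither side ever takes a lock, so the callback's worst-case cost is a handful of atomic loads and stores.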

CRYSTALLIZATION

Audio Upscaling

Sound reconstruction driven by SoundUP algorithms.
Optimized for real-time mode.

Portability and Performance

The player compiles into a single static executable file without external runtime dependencies. All configuration and DSP pipeline settings are stored locally in settings.json. Hardware acceleration via the WGPU API keeps interface rendering responsive with minimal CPU overhead.

DOWNLOAD FOR WINDOWS (X64)

Player Architecture Specifications
Subsystem	Implementation	Threading model
Decoder	Symphonia (MP3, FLAC, M4A, OPUS, WAV)	Asynchronous background thread
Resampler	Windowed sinc (Kaiser window), up to 1M taps	Interpolation in the decoder thread
Output interface	Cubeb (WASAPI Shared / Exclusive, ASIO)	Real-time audio thread
Buffering	Lock-free ring buffer (SPSC, heap-allocated)	Lock-free inter-thread exchange
GUI rendering	WGPU API (Vulkan / DX12) + WGSL shaders	Main render thread (144 FPS)

Technical FAQ on Sladky Player Architecture

What is the difference between WASAPI Shared and Exclusive modes in the player?: In Shared mode, the player passes the decoded audio stream to the Windows system mixer, which resamples it to the system-configured format and mixes it with sounds from other applications. In Exclusive mode, the audio engine takes sole ownership of the hardware device. Other applications cannot use the device while it is held, and the stream bypasses the Windows mixer and reaches the DAC unmodified (Bit-Perfect), avoiding the mixer's forced resampling.
Why does Exclusive mode throw an error on certain sample rates?: In exclusive mode, the hardware audio card driver only accepts sample rate and bit depth parameters that are physically supported by your DAC chip. If the player settings are set to 384 kHz while the device supports a maximum of 192 kHz, the stream initialization will fail on the hardware level. In this case, the player will safely redirect the stream back to Shared mode.
How does the multithreaded SPSC scheme work, and why does it prevent pops and clicks?: The signal path is divided into three isolated threads: decoding/resampling, ambient background synthesis, and low-level audio output. These are linked via lock-free Single Producer Single Consumer (SPSC) ring buffers allocated on the heap. The audio callback retrieves ready-to-play samples from the buffer in microseconds, without waiting for disk read operations or heavy decoding mathematics, which prevents buffer underruns and associated clicking artifacts.
Why is there an integrated sigma-delta modulator (PCM-to-DSD)?: The modulator converts the PCM stream into a 1-bit DSD format (up to DSD512) in real time. The mathematical algorithm pushes quantization noise far above the audible limit (Noise Shaping), preserving the useful signal in ideal linear phase. Schemes with 'EC' (Error Correction) post-filtering also inject weak analog dither to smooth out DAC non-linearities, placing a heavier load on the processor than standard algorithms.
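The principle behind PCM-to-DSD conversion can be shown with the simplest possible modulator. The sketch below is a first-order sigma-delta loop, which is far simpler than a 7th-order design such as the ASDM7EC mode named above; the names `SigmaDelta1` and `dsd_rate_hz` are hypothetical. It only demonstrates that the average of the 1-bit stream tracks the input while the quantization error is pushed toward high frequencies.

```rust
/// Hypothetical first-order sigma-delta modulator (illustration only).
pub struct SigmaDelta1 {
    integrator: f32,
}

impl SigmaDelta1 {
    pub fn new() -> Self {
        Self { integrator: 0.0 }
    }

    /// Takes one oversampled input in [-1, 1] and emits a 1-bit value as +1.0 or -1.0.
    pub fn step(&mut self, x: f32) -> f32 {
        // 1-bit quantizer on the accumulated error.
        let bit = if self.integrator >= 0.0 { 1.0 } else { -1.0 };
        // Feed the quantization error back into the integrator.
        self.integrator += x - bit;
        bit
    }
}

/// DSD bit rate for a given multiple of the 44.1 kHz base rate.
pub fn dsd_rate_hz(multiple: u32) -> u32 {
    multiple * 44_100
}
```

Because the integrator stays bounded, the running mean of the output bits converges to the input value. `dsd_rate_hz(512)` gives 22,579,200 bits per second per channel, which shows why higher-order modulators at DSD512 are demanding on the CPU.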

What does the SoundUP RT algorithm actually do during upscaling?: Unlike a simple equalizer, which can only boost what is already present, SoundUP RT deterministically synthesizes new high frequencies. The module analyzes the tonal base of the Mid/Low range and projects its overtones into the HF range using non-linear polynomial synthesis. The Clarity (ATR) module calculates the dynamic envelope of transients to emphasize attacks, the Haas decorrelator delays the HF content of one channel by 1.2 ms to widen the stereo image, and the exciter mixes in modulated pink noise to create an airy sensation.
What is ADAA in the sub-bass resonator, and why is it necessary?: When subharmonics are created with non-linear saturation (such as tanh) in a digital environment, aliasing inevitably occurs: the harmonics generated above the Nyquist frequency fold back into the audible spectrum and muddy the bass. ADAA (Antiderivative Anti-Aliasing) addresses this by working with the antiderivative of the non-linear function, which for tanh is ln(cosh(x)). Instead of evaluating tanh at each sample, it outputs the difference of the antiderivative between consecutive samples divided by the difference of the samples themselves. This behaves like applying the non-linearity to a continuous signal and averaging over the sample interval, which substantially reduces aliasing and keeps the sub-bass deep and clean.
How does binaural 3D audio (HRTF) work?: Activating HRTF mode splits the stereo signal into Mid (center) and Side (stereo width) components. The Mid channel is filtered to simulate ear canal resonance (a boost at 3 kHz) with a notch at 6.5 kHz, and supplemented with an early reflection (0.9 ms delay, -15.5 dB) to reduce the 'in-the-head' localization effect. The Side channel is corrected by an 8 kHz notch filter. The crosstalk signal sent to the opposite ear is delayed by the interaural time difference (ITD, 0.28 ms) and passed through a head-shadowing high-shelf filter (attenuating HF by 10 dB above 1.5 kHz), establishing a realistic binaural space.
How does the peak limiter protect against clipping?: The peak limiter uses a 2 ms look-ahead buffer. The player delays the main audio stream by this duration, allowing it to analyze peak amplitudes before they reach the output. If a threshold violation is detected, fast gain reduction is triggered (0.5 ms attack) without hard-clipping the waveform. If the overload is too severe, soft saturation kicks in, smoothly rounding off peaks along a hyperbolic tangent (tanh) curve, which prevents harsh digital distortion.
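A simplified sketch of the two protection stages described in the limiter answer follows. It makes several assumptions: the names `LookaheadLimiter` and `soft_saturate` are hypothetical, gain reduction is applied instantly instead of with a 0.5 ms attack ramp, and the peak search is a plain O(N) scan per sample. Treat it as an illustration of look-ahead limiting and tanh soft saturation, not as the player's limiter.

```rust
use std::collections::VecDeque;

/// Converts a level in dB to a linear gain factor.
pub fn db_to_gain(db: f32) -> f32 {
    10f32.powf(db / 20.0)
}

/// Linear below `threshold` (which must be < 1.0); above it, peaks are
/// rounded off along a tanh curve that approaches full scale (1.0).
pub fn soft_saturate(x: f32, threshold: f32) -> f32 {
    let mag = x.abs();
    if mag <= threshold {
        return x;
    }
    let headroom = 1.0 - threshold;
    let shaped = threshold + headroom * ((mag - threshold) / headroom).tanh();
    shaped.copysign(x)
}

/// Hypothetical look-ahead limiter: the signal is delayed so that gain
/// reduction can begin before a peak reaches the output.
pub struct LookaheadLimiter {
    delay: VecDeque<f32>,
    threshold: f32,
}

impl LookaheadLimiter {
    pub fn new(lookahead_samples: usize, threshold: f32) -> Self {
        Self {
            delay: VecDeque::from(vec![0.0; lookahead_samples]),
            threshold,
        }
    }

    pub fn process(&mut self, x: f32) -> f32 {
        self.delay.push_back(x);
        // Largest peak in the window, including the sample about to be output.
        let peak = self.delay.iter().fold(0.0f32, |m, s| m.max(s.abs()));
        let delayed = self.delay.pop_front().unwrap_or(0.0);
        let gain = if peak > self.threshold { self.threshold / peak } else { 1.0 };
        delayed * gain
    }
}
```

At 48 kHz, the 2 ms look-ahead corresponds to 96 samples, and a -1.0 dB threshold is `db_to_gain(-1.0)`, roughly 0.891. The output of `process` never exceeds the threshold because the peak estimate always includes the sample being released.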

How are gapless playback (Gapless) and smooth transitions (Crossfade) implemented?: The decoder works in advance. As the current track nears completion, background decoding of the next file is initiated. If Crossfade is enabled, the player smoothly reduces the volume of the fading audio stream and increases the volume of the incoming one using cubic Bezier curve tables defined in the settings. This ensures no micro-pauses, clicks, or delays during track changes.
What is the difference between WSOLA and Phase Vocoder algorithms when shifting pitch?: The WSOLA algorithm operates in the time domain: it slices the signal into overlapping segments, finds points of maximum cross-correlation (waveform similarity), and overlaps them with a shift. This method is ideal for vocals and instruments because it preserves the phase structure of attacks. The Phase Vocoder algorithm converts the signal to the frequency domain via Fourier transform, corrects the phases of the bins to compensate for the time shift, and performs the inverse transform. It is better suited for complex polyphonic mixes but can cause phase smearing artifacts.
How is the playlist database structured, and why are recursive queries used?: The database (music.db) contains relational link tables. Playlists support a hierarchical structure using a parent playlist field. When a parent playlist is selected, the player executes a recursive CTE query that automatically traverses the nesting tree downwards, gathers IDs of all child playlists, and returns a flat track list while preserving the user's custom playback order.
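The recursive traversal in the playlist answer can be sketched in two equivalent forms: a recursive CTE of the kind SQLite supports, and the same walk over in-memory rows. The schema here is entirely assumed, since the actual layout of music.db is not documented: the table names `playlists` and `playlist_tracks`, the columns `parent_id`, `playlist_id`, `track_id`, and `position`, and the ordering clause are hypothetical.

```rust
/// Hypothetical query: collect a playlist and all of its descendants,
/// then return their tracks in stored order. `?1` is the root playlist id.
pub const PLAYLIST_TRACKS_SQL: &str = "
WITH RECURSIVE subtree(id) AS (
    SELECT id FROM playlists WHERE id = ?1
    UNION ALL
    SELECT p.id FROM playlists p JOIN subtree s ON p.parent_id = s.id
)
SELECT pt.track_id
FROM playlist_tracks pt
JOIN subtree s ON pt.playlist_id = s.id
ORDER BY pt.playlist_id, pt.position;
";

/// In-memory stand-in for a row of the assumed `playlists` table.
pub struct Playlist {
    pub id: u32,
    pub parent_id: Option<u32>,
}

/// Same traversal as the CTE: start at `root`, then repeatedly append the
/// children of every id found so far. Assumes the parent links form a tree.
pub fn subtree_ids(playlists: &[Playlist], root: u32) -> Vec<u32> {
    let mut result = vec![root];
    let mut i = 0;
    while i < result.len() {
        let current = result[i];
        result.extend(
            playlists
                .iter()
                .filter(|p| p.parent_id == Some(current))
                .map(|p| p.id),
        );
        i += 1;
    }
    result
}
```

The CTE's anchor member corresponds to seeding `result` with the root, and its recursive member corresponds to each pass of the `while` loop. A cycle in the parent links would make the in-memory version loop forever, so real data would need a guard.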

Why is the player built on WGPU instead of Electron, and what is the actual RAM consumption?: Electron runs a full Chromium instance, consuming 150-300 MB of RAM just for the browser engine. Our WGPU-based graphics engine communicates with the video card directly via low-level APIs. The player executable is lightweight and consumes about 6-8 MB of RAM at the application level. However, total process memory as reported by the OS will be higher (around 40-70 MB), because modern graphics APIs (DirectX 12 or Vulkan) make the video driver reserve significant memory blocks for swapchains, pipeline compilation, and texture caching.
How does the spectrum visualizer work, and does it add system overhead?: A mono sum of samples from the audio stream is copied into a real-time ring buffer. A dedicated background thread retrieves blocks (ranging from 512 to 8192 samples depending on FFT settings in the UI), applies a Hann window, and computes the spectrum of each block using the Fast Fourier Transform (FFT). The resulting frequency amplitudes are asynchronously packed into a texture. Spectrogram rendering and bar decay occur on the GPU using custom WGSL shaders, keeping CPU overhead minimal.
What does the thin blue line on the spectrogram indicate?: This is a visual indicator of the actual cutoff frequency of the source audio file. The player scans the frame spectrum, finds the range above which the energy level drops below -96 dB, and displays this boundary. This allows you to instantly distinguish original high-quality files from artificially upscaled low-quality ones (for example, a 128 kbps MP3 saved as a 96 kHz FLAC, where the cutoff line will clearly freeze at 16 kHz).
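The cutoff detection in the last answer amounts to finding the highest FFT bin whose level is still above a floor, and the visualizer's windowing step is a standard Hann window. The sketch below assumes the magnitudes are linear amplitudes normalized so that full scale is 1.0 and cover bins 0 through N/2 of an N-point FFT. The function names are hypothetical, and the FFT itself is left out.

```rust
use std::f32::consts::PI;

/// Symmetric Hann window of the given length.
pub fn hann_window(len: usize) -> Vec<f32> {
    if len < 2 {
        return vec![1.0; len];
    }
    (0..len)
        .map(|n| 0.5 - 0.5 * (2.0 * PI * n as f32 / (len - 1) as f32).cos())
        .collect()
}

/// Returns the frequency of the highest bin at or above `floor_db`,
/// or None if every bin is below the floor.
/// `magnitudes` holds bins 0..=N/2 of an N-point FFT (full scale = 1.0).
pub fn cutoff_frequency_hz(magnitudes: &[f32], sample_rate: f32, floor_db: f32) -> Option<f32> {
    let fft_size = magnitudes.len().checked_sub(1)? * 2;
    if fft_size == 0 {
        return None;
    }
    let floor = 10f32.powf(floor_db / 20.0);
    // Scan from the top of the spectrum downwards.
    let last = magnitudes.iter().rposition(|&m| m >= floor)?;
    Some(last as f32 * sample_rate / fft_size as f32)
}
```

For a 96 kHz file analyzed with a 2048-point FFT, each bin spans 46.875 Hz. If energy above the -96 dB floor stops at bin 341, the detected cutoff is about 15,984 Hz, which matches the roughly 16 kHz ceiling of the upconverted 128 kbps MP3 in the example above.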