Spectrogram Merger

Spectrogram Merging & Morphing

Professional tool for audio hybridization. Deterministic spectral merging with advanced phase reconstruction. Powered by Rust and WGPU.

Core Technologies

📊

Synchronization & Splitting

The algorithm automatically aligns tracks with sub-millisecond precision using phase cross-correlation. Audio is then split into low (LF), mid (MF), and high (HF) frequency bands.

🧠

Intelligent Masking

Amplitudes are not merged by simple addition. The system generates dynamic masks (PSF, Wiener, Softmax) that are modulated in real time.

🔄

Phase Reconstruction

The main challenge of spectrum merging is phase distortion. Merger uses a hybrid adaptive approach: Bregman divergence optimization (Bregman-KL) for harmonics and LWS for percussive transients.

🎛️

Studio Post-Processing

The final stage includes HF envelope transplantation, suppression of narrow-band resonances, and EMD (Empirical Mode Decomposition) filtering of high-frequency artifacts.

Adaptive STFT Resolution

Different ranges require different resolutions. Bass is processed with windows up to 4096 samples, while transients are processed with 512-sample windows, minimizing time and frequency anomalies.

📉

Bregman Divergences

Iterative gradient-descent optimization of the phase grid minimizes a Bregman divergence between target and current magnitudes for tonal signals.

🔬

Empirical Mode Decomposition (EMD)

The algorithm decomposes the high-frequency signal into intrinsic mode functions (IMFs), selectively smoothing those identified as unnatural artifacts or metallic ringing.

🥁

Transient Isolation

HPSS isolates the percussive elements, and LWS calculates a consistent phase for them in a single pass, preventing phase chaos and preserving the impact of transient hits.

Architecture

Language: Rust

Memory safety, no garbage collector, and performance on par with C/C++. An ideal choice for heavy DSP tasks.

Acceleration: WGPU

Cross-platform graphics and compute API. Runs STFT matrix operations and phase iterations on thousands of GPU cores.

Determinism

No black boxes or AI. Mathematically precise algorithms guarantee 100% repeatability of results given identical settings.

Interface: Iced / egui

Modern, reactive GUI rendered directly via WGPU alongside the calculations, keeping the interface responsive while processing runs.

Memory Safety

Thanks to Rust's ownership system, the application is protected against memory-safety bugs such as use-after-free and against data races during multi-threaded audio processing.

Cross-Platform

A single codebase compiles for Windows, macOS, and Linux, ensuring identical behavior of the DSP engine and hardware renderer across all platforms.

Full Visual Control

Evaluate phase coherence and spectral balance before and after merging with built-in high-resolution analyzers.

Download Spectrogram Merger

The program is delivered as a cross-platform desktop application, requiring no internet connection or complex installations. Calculations are performed locally.

Technical FAQ on Spectrogram Merging

Architecture & Filtering

How does spectral merging fundamentally differ from normal signal summation (mixing)?
Normal mixing in the time domain adds signals linearly, which inevitably leads to mutual masking of overlapping frequencies, phase cancellation (comb filter effect), and muddiness in the mix. Spectral merging converts signals to the time-frequency plane using Short-Time Fourier Transform (STFT). Based on mathematical masks (PSF, Wiener, Softmax), the algorithm divides the energy of each bin individually. This allows isolating and combining the useful components of one source with the spectral components of another without destructive interference.
Why is pre-splitting the audio signal into three frequency bands (Low, Mid, High) necessary?
Different frequency ranges call for different time-frequency resolutions. Low frequencies (bass, kick) need fine frequency resolution, which requires large analysis windows (2048–4096 samples). High frequencies (transients, sibilants) need high temporal resolution to avoid attack blurring and pre-echo, which is achieved with 512–1024 sample windows. Splitting into bands with Linkwitz-Riley or biquad filters lets each spectral zone be processed with its own window size and hop, minimizing time and frequency artifacts.
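
The trade-off between window length, frequency resolution, and time span can be checked with simple arithmetic. The sketch below is illustrative only; the function names and the 48 kHz sample rate used in the note are assumptions, not values taken from the product.

```rust
/// Width of one STFT frequency bin in Hz: sample_rate / window_len.
pub fn bin_width_hz(sample_rate: f64, window_len: usize) -> f64 {
    sample_rate / window_len as f64
}

/// Time span covered by one analysis window, in milliseconds.
pub fn window_duration_ms(sample_rate: f64, window_len: usize) -> f64 {
    1000.0 * window_len as f64 / sample_rate
}
```

At an assumed 48 kHz, a 4096-sample window gives bins about 11.7 Hz wide but spans about 85 ms, while a 512-sample window spans about 10.7 ms with bins 93.75 Hz wide.
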
How do adaptive crossover filters work based on spectral centroid?
Instead of fixed crossover cutoff frequencies, the algorithm continuously calculates the spectral centroid (center of gravity of the spectrum) of the combined signal. The smoothed time trend of the centroid dynamically modulates the cutoff frequencies of the filters within designated ranges. This allows dynamically expanding the MF band when bright vocals appear, or narrowing it in favor of the bass zone when low frequencies dominate.
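
A minimal sketch of the idea, assuming one magnitude frame per update. The `AdaptiveCutoff` type, its one-pole smoothing, and the linear `scale` mapping from centroid to cutoff are hypothetical choices made for illustration; the product's actual mapping is not documented here.

```rust
/// Spectral centroid in Hz of one magnitude frame holding bins 0..=N/2.
pub fn spectral_centroid(mags: &[f32], sample_rate: f32) -> f32 {
    if mags.len() < 2 {
        return 0.0;
    }
    // With N/2 + 1 bins, bin k sits at k * sample_rate / N.
    let bin_hz = sample_rate / (2.0 * (mags.len() - 1) as f32);
    let mut weighted = 0.0f32;
    let mut total = 0.0f32;
    for (k, &m) in mags.iter().enumerate() {
        weighted += k as f32 * bin_hz * m;
        total += m;
    }
    if total > 0.0 { weighted / total } else { 0.0 }
}

/// Hypothetical controller: smooths the centroid and maps it to a cutoff.
pub struct AdaptiveCutoff {
    pub smoothed_hz: f32,
    /// Smoothing factor in (0, 1]; 1.0 disables smoothing.
    pub alpha: f32,
    /// Assumed mapping: cutoff = scale * smoothed centroid.
    pub scale: f32,
    /// Designated range the cutoff may move within.
    pub min_hz: f32,
    pub max_hz: f32,
}

impl AdaptiveCutoff {
    /// Feeds one centroid measurement and returns the clamped cutoff in Hz.
    pub fn update(&mut self, centroid_hz: f32) -> f32 {
        self.smoothed_hz += self.alpha * (centroid_hz - self.smoothed_hz);
        (self.scale * self.smoothed_hz).clamp(self.min_hz, self.max_hz)
    }
}
```

Calling `update` once per frame makes the cutoff follow the smoothed centroid trend while never leaving the `min_hz..max_hz` range.
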

Masking & Modulations

What is the difference between Weights (Softmax), Wiener, and PSF masks?
Each mask solves a different task. Weights (Softmax) distributes weights exponentially according to the magnitude difference and a curvature coefficient (k), giving a flexible, controllable non-linear separation. Wiener builds a minimum mean-square-error estimate from the power spectral densities of the sources, which suppresses noise. PSF (Phase-Sensitive Filter) takes the phase coherence of the two signals into account: if a bin is out of phase between the sources, the mask reduces its amplitude contribution, preventing cancellation during summation.
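
The three masks can be sketched per bin for two sources A and B. These are textbook forms written under assumptions: the `Complex` helper and function names are hypothetical, the Wiener mask is reduced to a power ratio, and the phase-sensitive mask is computed against the plain sum A + B. The product's exact formulas may differ.

```rust
/// Minimal complex number; the standard library does not provide one.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Complex {
    pub re: f32,
    pub im: f32,
}

impl Complex {
    pub fn norm_sqr(self) -> f32 {
        self.re * self.re + self.im * self.im
    }
}

/// Softmax weight of source A over two magnitudes; `k` sets the curvature.
/// Equivalent to a logistic function of the magnitude difference.
pub fn softmax_mask(mag_a: f32, mag_b: f32, k: f32) -> f32 {
    1.0 / (1.0 + (-k * (mag_a - mag_b)).exp())
}

/// Wiener-style mask: A's share of the total power in the bin.
pub fn wiener_mask(mag_a: f32, mag_b: f32) -> f32 {
    let (pa, pb) = (mag_a * mag_a, mag_b * mag_b);
    if pa + pb > 0.0 { pa / (pa + pb) } else { 0.5 }
}

/// Phase-sensitive mask of A against the sum Y = A + B:
/// |A|/|Y| * cos(angle(A) - angle(Y)) = Re(A * conj(Y)) / |Y|^2,
/// clamped to [0, 1]. Returns 0.0 when the sum cancels completely.
pub fn psf_mask(a: Complex, b: Complex) -> f32 {
    let y = Complex { re: a.re + b.re, im: a.im + b.im };
    let den = y.norm_sqr();
    if den <= 1e-20 {
        return 0.0;
    }
    ((a.re * y.re + a.im * y.im) / den).clamp(0.0, 1.0)
}
```

With equal magnitudes the softmax mask is 0.5 regardless of `k`; raising `k` pushes the split toward a hard winner-takes-all decision. The phase-sensitive mask drops toward zero as the two bins approach opposite phase.
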
How do mask modulators work based on spectral flux (SuperFlux), tonality, and local SNR?
The base mask is modulated by the physical properties of the signal. SuperFlux calculates the difference of log spectra over frequency and time, identifying sharp attacks and stationary zones; it forces the mask to 0.5 on transients to preserve attacks. Tonality (Flatness) measures the flatness of the spectrum: tonal components are preserved more strictly, while noise components are smoothed. The SNR modulator estimates the noise level using minimum statistics in a sliding window and weakens the mask of the noisy source.
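
A simplified sketch of two of these modulators follows. It assumes magnitude or power frames as plain slices; `superflux_like` is a reduced SuperFlux-style measure (log magnitudes, a three-bin maximum filter on the previous frame, positive differences only), and `pull_to_half` is a hypothetical way to move a mask toward 0.5 on transients. The SNR modulator is not shown.

```rust
/// Spectral flatness: geometric mean / arithmetic mean of the power spectrum.
/// Close to 1.0 for noise-like frames, close to 0.0 for tonal frames.
pub fn spectral_flatness(power: &[f32]) -> f32 {
    if power.is_empty() {
        return 0.0;
    }
    let eps = 1e-12f32; // guards ln(0) and division by zero
    let n = power.len() as f32;
    let log_mean = power.iter().map(|&p| (p + eps).ln()).sum::<f32>() / n;
    let mean = power.iter().sum::<f32>() / n + eps;
    log_mean.exp() / mean
}

/// SuperFlux-style onset strength between two non-negative magnitude frames.
/// The previous frame is maximum-filtered over three bins so that slow
/// pitch drift between neighbouring bins is not counted as an onset.
pub fn superflux_like(prev: &[f32], cur: &[f32]) -> f32 {
    let n = prev.len().min(cur.len());
    let mut flux = 0.0f32;
    for k in 0..n {
        let lo = k.saturating_sub(1);
        let hi = (k + 1).min(n - 1);
        let reference = prev[lo..=hi].iter().cloned().fold(0.0f32, f32::max);
        let diff = (1.0 + cur[k]).ln() - (1.0 + reference).ln();
        if diff > 0.0 {
            flux += diff;
        }
    }
    flux
}

/// Moves a mask value toward 0.5 as `transient` goes from 0.0 to 1.0.
pub fn pull_to_half(mask: f32, transient: f32) -> f32 {
    let t = transient.clamp(0.0, 1.0);
    mask + t * (0.5 - mask)
}
```

A normalized onset strength can be passed as `transient`, so that the mask is left alone in stationary zones and reaches 0.5 on strong attacks.
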

Phase Reconstruction

What is Local Weighted Sums (LWS) and why is it a non-iterative method?
LWS is an analytical method that reconstructs the phase directly from STFT consistency, minimizing discrepancies between overlapping frames (phase inconsistency) without iterations. It works instantly and is used as a reference method for mixing phase angles of two signals proportionally to their amplitude masks. This prevents phase chaos typical of simple mathematical addition.
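
The sketch below is not LWS itself. It only illustrates the mask-proportional mixing of two phase angles mentioned above, assuming the angles are combined as weighted unit phasors so that wrap-around at ±π is handled correctly. The function name is hypothetical.

```rust
/// Mixes two phase angles (radians) with non-negative weights by summing
/// weighted unit phasors and taking the angle of the result.
pub fn mix_phase(theta_a: f32, theta_b: f32, w_a: f32, w_b: f32) -> f32 {
    let re = w_a * theta_a.cos() + w_b * theta_b.cos();
    let im = w_a * theta_a.sin() + w_b * theta_b.sin();
    im.atan2(re)
}
```

Averaging the raw angles 3.0 and -3.0 would give 0, which points in the opposite direction; the phasor form returns an angle at ±π instead.
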
How do iterative phase reconstruction methods work (Bregman, RTISI-LA)?
They search for a consistent phase by repeatedly passing the signal through inverse and forward transforms (ISTFT/STFT). RTISI-LA uses a look-ahead buffer and a dynamic alpha weight coefficient calculated from the local root-mean-square error, which keeps latency low. The Bregman method (Bregman-KL / IS) minimizes a Bregman divergence between target and current magnitudes, updating the phase grid by gradient descent at each step.
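
The objective functions can be sketched on their own, reading KL and IS as the generalized Kullback-Leibler and Itakura-Saito divergences (an assumption about the abbreviations). Only the divergence between target and current magnitudes is shown; the phase update and the STFT/ISTFT round trip are omitted, and the function names are hypothetical.

```rust
/// Generalized Kullback-Leibler divergence D(target || current),
/// summed over magnitude bins: t * ln(t / c) - t + c.
pub fn kl_divergence(target: &[f32], current: &[f32]) -> f32 {
    let eps = 1e-12f32; // avoids ln(0) and division by zero
    target
        .iter()
        .zip(current)
        .map(|(&t, &c)| {
            let (t, c) = (t + eps, c + eps);
            t * (t / c).ln() - t + c
        })
        .sum()
}

/// Itakura-Saito divergence D(target || current),
/// summed over magnitude bins: t / c - ln(t / c) - 1.
pub fn is_divergence(target: &[f32], current: &[f32]) -> f32 {
    let eps = 1e-12f32;
    target
        .iter()
        .zip(current)
        .map(|(&t, &c)| {
            let r = (t + eps) / (c + eps);
            r - r.ln() - 1.0
        })
        .sum()
}
```

Both values are zero when the current magnitudes match the target and grow as they drift apart, which is what an iterative solver would drive down.
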
Why separate phase into harmonic and percussive components (HPSS mixing)?
Harmonics (sustained tonal signals) and percussion (sharp transients) require different mathematical approaches. The phase of harmonics must be smooth over time, which suits Bregman-KL or Explicit Relation methods. The phase of percussion must be aligned across frequency, which is where LWS or SPSI fit best. The engine separates the input STFT into H (Harmonic) and P (Percussive) layers using median filtering or NMF, reconstructs their phases independently with the method suited to each, and sums them back, eliminating transient blurring and tonal jitter.
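
A minimal sketch of the median-filtering variant of the separation step, assuming a magnitude spectrogram stored as `spec[frame][bin]` with equal-length frames. The function name, the soft-mask formula, and the `half` window parameter are illustrative assumptions; NMF and the phase reconstruction itself are not shown.

```rust
/// Median of a small window (upper middle element for even lengths).
fn median(values: &mut [f32]) -> f32 {
    values.sort_by(|a, b| a.total_cmp(b));
    values[values.len() / 2]
}

/// Soft harmonic mask per (frame, bin); the percussive mask is 1.0 - mask.
/// `half` is the half-width of both median windows.
pub fn harmonic_mask(spec: &[Vec<f32>], half: usize) -> Vec<Vec<f32>> {
    let frames = spec.len();
    let bins = if frames > 0 { spec[0].len() } else { 0 };
    // 0.5 is kept wherever both medians are zero (no evidence either way).
    let mut mask = vec![vec![0.5f32; bins]; frames];
    for t in 0..frames {
        for k in 0..bins {
            // Median along time keeps horizontal (tonal) ridges.
            let (t0, t1) = (t.saturating_sub(half), (t + half).min(frames - 1));
            let mut along_time: Vec<f32> = (t0..=t1).map(|i| spec[i][k]).collect();
            let h = median(&mut along_time);
            // Median along frequency keeps vertical (transient) ridges.
            let (k0, k1) = (k.saturating_sub(half), (k + half).min(bins - 1));
            let mut along_freq: Vec<f32> = spec[t][k0..=k1].to_vec();
            let p = median(&mut along_freq);
            let (h2, p2) = (h * h, p * p);
            if h2 + p2 > 0.0 {
                mask[t][k] = h2 / (h2 + p2);
            }
        }
    }
    mask
}
```

Multiplying the STFT by the mask and by its complement yields the H and P layers, which can then be handed to different phase reconstruction methods.
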

Post-Processing & Cleanup

What is Empirical Mode Decomposition (EMD) and how does it clean the HF spectrum?
Complex mixing of high frequencies can produce metallic beating and rustling artifacts. EMD decomposes the high-frequency signal in the time domain into a set of intrinsic mode functions (IMFs). The algorithm analyzes the instantaneous frequency (via Hilbert transform) and entropy of each mode. Modes identified as noise artifacts are selectively weakened or smoothed over time, cleaning the upper range without losing detail.
How do envelope transplantation and resonance suppression work?
Envelope transplantation estimates the macro-shape of the spectrum (its envelope) in the LF region using linear predictive coding (LPC) or cepstral analysis and extrapolates it to the HF region, correcting local balance distortions. Resonance suppression continuously searches for ultra-narrow peaks with a high Q-factor and high phase stability over time (measured via the variance of local group delay, LGD), then dynamically attenuates them, removing whistling from the mix.
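
A heavily simplified sketch of the resonance suppression idea, working on a single magnitude frame. It is a stand-in under stated assumptions: a bin counts as a narrow peak when it exceeds the median of its neighbourhood by a fixed ratio, and the phase-stability (LGD variance) test described above is not implemented. The function name and all parameters are hypothetical.

```rust
/// Attenuates bins that stand far above the median of their neighbourhood.
/// `half` is the neighbourhood half-width in bins, `ratio` the peak-to-median
/// threshold, and `gain` the factor applied to detected peaks.
/// Returns the number of attenuated bins.
pub fn suppress_narrow_peaks(mags: &mut [f32], half: usize, ratio: f32, gain: f32) -> usize {
    // Detection runs on a snapshot so earlier attenuation does not bias later bins.
    let snapshot = mags.to_vec();
    let n = snapshot.len();
    let mut hits = 0;
    for k in 0..n {
        let (lo, hi) = (k.saturating_sub(half), (k + half).min(n - 1));
        let mut window: Vec<f32> = snapshot[lo..=hi].to_vec();
        window.sort_by(|a, b| a.total_cmp(b));
        let floor = window[window.len() / 2];
        if snapshot[k] > ratio * floor.max(1e-9) {
            mags[k] *= gain;
            hits += 1;
        }
    }
    hits
}
```

A wider `half` makes the local floor estimate more robust but also lets broader spectral features through, so it effectively sets how narrow a peak must be to qualify.
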