Changelog

Version 0.9.2 (17.02.2025)

  • Performance Optimization via Numba JIT (Adaptive STFT): The computational loop of the adaptive STFT, _adaptive_stft_numba_loop, is now fully compiled to machine code by the Numba compiler (decorated with @jit(nopython=True, parallel=True, cache=True)). Hann window generation for each frame has been moved to the internal JIT function _numba_hanning_window_internal.
  • Signal Energy Conservation: Implemented automatic normalization of generated window weights of arbitrary length by their sum of squares directly within the JIT loop. This guarantees a stable audio signal amplitude when transitioning between windows of different lengths over time.
  • Adaptive 2D Mask Smoothing: The main loop of Gaussian mask smoothing with a time-varying radius is now compiled into the high-performance _numba_adaptive_gaussian_2d_jit function, allowing resource-intensive 2D filtering of high-resolution matrices in real time.
  • Launch Stability Improvements: Moved initialization of the global unhandled exception interceptor sys.excepthook to the stage before the QApplication instance is created. This ensures reliable capture and logging of errors during the startup of the Qt graphics subsystem.
  • Resolved Numba Warnings in Softmax: Added explicit extraction of the real part of complex numbers (np_module.real) before returning arrays to prevent Numba compiler warnings about losing the imaginary part during operations on spectrograms.
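
The window normalization from the Signal Energy Conservation entry above can be illustrated with a minimal NumPy sketch. It is not the project's JIT code: the function name energy_normalized_hann is hypothetical, the periodic Hann form is an assumption, and "normalization by the sum of squares" is read here as scaling each window to unit energy.

```python
import numpy as np

def energy_normalized_hann(length: int) -> np.ndarray:
    """Return a Hann window scaled so that its sum of squares equals 1."""
    if length < 1:
        raise ValueError("length must be positive")
    if length == 1:
        return np.ones(1, dtype=np.float32)
    n = np.arange(length)
    # Periodic Hann window (an assumption; a symmetric variant divides by length - 1).
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * n / length)
    # Divide by the root of the sum of squares so every length carries unit energy.
    window /= np.sqrt(np.sum(window ** 2))
    return window.astype(np.float32)

short = energy_normalized_hann(256)
long = energy_normalized_hann(4096)
```

Because both windows carry the same energy, a frame analyzed with the short window and one analyzed with the long window contribute comparable amplitude, which is the property the entry describes.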

Version 0.9.1 (15.02.2025)

  • Logging Architecture Redesign (PrintLogger): Developed a lightweight PrintLogger to replace standard file handlers. The setup_logging function now outputs messages directly to sys.stdout and sys.stderr with forced flush() calls. This prevents file locking and permission issues in isolated sandbox environments (e.g., PyInstaller builds).
  • Strict Parameter Typing and Enum Support in API: Refactored the generate_parameter_descriptions function to automatically export enumeration metadata to the API. Parameters typed as Enums (like StereoIPDSource or EMDMethod) are now serialized to their base string values, eliminating invalid parameter transfers. Separated literal types and Enum classes inside typed_dicts.py.
  • Crossover Mathematical Bug Fix: Fixed the high-pass band calculation in dsp/filters/time_varying.py. The formula H_hp[:, n] = (H_hp1 * H_hp2).astype(np.float32) was corrected to its mathematically accurate form, ensuring complete Linkwitz-Riley complementarity for the three-band crossover and eliminating phase dips in the HF range.
  • Fallback Protections in Boundary Conditions: Added explicit array dimension checks before calling np.argpartition and np.squeeze in process_band_adaptive_phase. This prevents IndexError and ValueError crashes when processing degenerate spectrograms (with a single frequency bin or a single frame).
  • Phase Blending Stability in Quiet Sections: Added a small epsilon constant to the denominator when normalizing Softmax weights in calculate_blend_weights, preventing NaN values (0/0 division) during absolute silence.
  • Chunk Processing Boundary Control: Added strict boundary control when slicing 2D frequency filter matrices (H_lp_tf, H_bp_tf, H_hp_tf) inside the parallel audio chunk processing worker, fixing crashes at the boundary of the last chunk caused by length rounding errors.
  • Mandatory Numba Dependency: Promoted numba to a mandatory dependency (REQUIRED_PACKAGES) since compiled JIT functions are now critical for meeting adaptive STFT performance targets.
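
The epsilon guard from the Phase Blending Stability entry above can be sketched as follows. This is an illustration only: normalize_weights and the value of EPS are hypothetical, and the real calculate_blend_weights may compute its raw weights differently.

```python
import numpy as np

EPS = 1e-10  # hypothetical value; the changelog only mentions "a small epsilon"

def normalize_weights(raw_weights: np.ndarray, eps: float = EPS) -> np.ndarray:
    """Normalize non-negative weights along axis 0 so each column sums to at most 1."""
    total = np.sum(raw_weights, axis=0, keepdims=True)
    # Without eps, a column of zeros (absolute silence) would evaluate 0 / 0 = NaN.
    return raw_weights / (total + eps)

# Two candidate weights for three frames; the last frame is silent.
raw = np.array([[3.0, 1.0, 0.0],
                [1.0, 1.0, 0.0]])
weights = normalize_weights(raw)
```

In the silent frame all weights stay at zero instead of becoming NaN, so downstream blending remains finite.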

Version 0.9.0 (11.02.2025)

  • Suppressing Vertical HF Artifacts (New DSP Module): Developed an algorithm for detecting and smoothing local spectral spikes (transient vertical bands in the HF range) inside the hf_peak_suppression.py module. The threshold for anomaly detection is calculated dynamically based on the median of spectral flux and its standard deviation.
  • Frequency 2D Anomaly Smoothing: For detected problem frames, the complex values of STFT bins are smoothed with a 1D Gaussian filter along the frequency axis, applied separately to the real and imaginary parts. Implemented an Envelope Protection mechanism to limit maximum attenuation and preserve audio brightness.
  • Stereo Parameters Migration to Enums: Replaced the raw STEREO_IPD_SOURCE string config with a strict StereoIPDSource Enum (featuring MIX and DOMINANT values). Updated LWS blending and CoupledIPD phase coupling algorithms to safely handle types, eliminating processing failures.
  • GUI Parameter Controls for Anomaly Detector: Added slider controls to the Parameter Editor for the HF spike suppression algorithm (toggles for activation, frequency range, sensitivity, smoothing strength, and envelope protection).
  • Isolating Channel Filtering Failures: Improved exception handling in the preview orchestrator. If Butter filter execution fails on one of the audio channels (due to insufficient length or non-numeric data), the worker isolates the crash, marks that channel as inactive, and continues processing the remaining channels without crashing the thread.
  • Adaptive Caching for Band Processing: Spectrograms and masks computed for each band during preview are now cached in RAM with a unique hash key tied to masking parameters. If EQ or Influence Map settings change, this cache layer is reused, reducing CPU load.
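
The dynamic threshold from the HF artifact suppression entry above could look roughly like the sketch below. The function name detect_flux_spikes, the sensitivity parameter, and the exact combination "median + sensitivity * standard deviation" are assumptions; the changelog only states that the median and the standard deviation of the spectral flux are used.

```python
import numpy as np

def detect_flux_spikes(magnitude: np.ndarray, sensitivity: float = 3.0) -> np.ndarray:
    """Return a boolean array marking frames whose spectral flux is anomalous.

    magnitude: array of shape (n_bins, n_frames) with HF-band magnitudes.
    """
    # Spectral flux: summed positive magnitude increase between consecutive frames.
    diff = np.diff(magnitude, axis=1, prepend=magnitude[:, :1])
    flux = np.sum(np.maximum(diff, 0.0), axis=0)
    # Dynamic threshold built from the median and standard deviation of the flux.
    threshold = np.median(flux) + sensitivity * np.std(flux)
    return flux > threshold

rng = np.random.default_rng(0)
mag = 0.1 + 0.01 * rng.random((64, 200))
mag[:, 120] += 1.0  # synthetic vertical spike in a single frame
spikes = detect_flux_spikes(mag)
```

Frames flagged this way would then be the candidates for the frequency-axis smoothing described in the following entry.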

Version 0.8.3 (17.10.2024)

  • Interactive Coordinate and Level Tracking (Tooltip): Implemented real-time cursor tracking on the Matplotlib canvas. Cursor coordinates are dynamically mapped to frame and frequency bin indices. A floating tooltip shows time (s), frequency (Hz), and the exact point level in decibels (for spectrograms) or the linear contribution coefficient (for masks).
  • RAM Caching for Preview Band Processing: Integrated an optimization layer for background previews. Before running heavy filtering, a hash key is generated based on file paths, segment boundaries, and parameter values. If the key matches, spectrograms and masks are retrieved instantly from RAM, accelerating Mid/HF band toggling in the GUI.
  • EMD Parameters Migration to Profile Level: Empirical Mode Decomposition (EMD) parameters were moved from global settings to profile editor settings (under "Post-processing: Phase and EMD" section). This allows custom EMD smoothing parameters to be applied to different stages within the same batch.
  • New Artifact Suppression Parameters: Added a SHARPNESS_REDUCTION_FREQ_START parameter to the editor to control the shelf frequency in the sharpness reduction plugin.
  • Shifting Frequency Thresholds Upwards: Shifted HPSS percussion cleaning, phase smoothing, and spectral tilt correction thresholds higher (above 10,000 Hz) to maximize mid-range transparency and detail preservation.
  • Safe Multichannel Audio Transposition: Array transposition in the input preparation module now happens strictly after peak normalization and spectral balancing, eliminating axis alignment failures.
  • Fixing Array Lengths in A-Weighting: Added a mandatory call to librosa.util.fix_length to align weighting arrays with the current number of frames, preventing dimensional mismatch errors during complex STFT steps.
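
The hash-keyed preview cache from the RAM Caching entry above can be sketched with the standard library alone. The helper names make_cache_key and get_or_compute, the SHA-256 choice, and the example parameter names are assumptions, not the project's actual implementation.

```python
import hashlib
import json

def make_cache_key(file_paths, segment, params) -> str:
    """Build a deterministic key from file paths, segment boundaries, and parameters."""
    payload = {
        "files": [str(p) for p in file_paths],
        "segment": [float(segment[0]), float(segment[1])],
        "params": params,
    }
    # sort_keys makes the key independent of dict insertion order;
    # default=str covers values such as Enums or Paths.
    encoded = json.dumps(payload, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

_cache = {}

def get_or_compute(key, compute):
    """Return the cached value for key, computing and storing it on a miss."""
    if key not in _cache:
        _cache[key] = compute()
    return _cache[key]

key_a = make_cache_key(["a.wav", "b.wav"], (10.0, 18.0), {"N_FFT": 2048, "HOP": 512})
key_b = make_cache_key(["a.wav", "b.wav"], (10.0, 18.0), {"HOP": 512, "N_FFT": 2048})
key_c = make_cache_key(["a.wav", "b.wav"], (10.0, 18.0), {"N_FFT": 4096, "HOP": 512})
```

Identical inputs yield the same key regardless of dictionary ordering, while any changed parameter value produces a new key and therefore a fresh computation.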

Version 0.8.2 (14.10.2024)

  • Seamless Playback Resumption (A/B Testing): Integrated a playback_start_time attribute into AppState. When switching tabs or altering EQ parameters, the program calculates the current playback position, stops the audio stream, and instantly restarts it with sample-accurate offsets, facilitating immediate A/B comparisons.
  • Adaptive Frequency-Dependent Percussion Smoothing in HPSS: Implemented a frequency-dependent weight generation algorithm with a sigmoid transition. The percussion signal is filtered in parallel by two median filters of different sizes: the smaller one preserves sharp, undistorted transients in the LF/MF range, while the larger one applies the maximum window size in the HF range to suppress phase artifacts.
  • Decoupling Preview Logic: The process_segment_for_preview function was refactored to use the BatchGroupData and EditTarget metadata structures instead of direct string file paths. Finding paths to merge sources (AudioSR/FlashSR) is now performed dynamically within the preview module.
  • Optional Instrumental Track Export: Added a "Save Instr." checkbox to the group widget. When activated, the saving worker runs a parallel finalization pipeline for the accumulated instrumental signal and exports it as a separate WAV file with an _INSTRUMENTAL suffix.
  • Removing Complex Values in Softmax: Added explicit np.real extraction before casting to np.float32 during weight calculations, eliminating warnings about losing the imaginary part during amplitude spectrogram mixing.

Version 0.8.1 (13.10.2024)

  • Transition from ZIP Archives to Disk-Based Project Groups: Discarded temporary ZIP archiving in batch mode to reduce disk I/O overhead. The MVSepZipWorker class was rewritten into MVSepPrepWorker; demixing outputs and params_*.json configurations are now saved directly to structured folders inside the Temp_SpectrogramMerger directory.
  • Auto-Generating Parameters on Folder Import: Replaced the ZIP import button with a "Load Groups..." button. When a directory is selected, the program scans the file structure and automatically generates missing parameter JSON configuration files using default values.
  • WOLA-based Spectrogram and Mask Reconstruction: Implemented a system of accumulating input complex spectrograms and calculated masks into OLA buffers in parallel during chunk processing. On stream completion, precise normalization is performed against the sum of squares of the synthesis windows according to the WOLA formula.
  • WOLA Integration with Mask-Guided Phase Iteration: Full-resolution complex spectrograms and masks are now passed directly to the finalizer. This allows the mask-guided iterative phase refinement algorithm to work with mathematically accurate spectral data, completely eliminating the need to recalculate them from the synthesized time signal.
  • Psychoacoustic Masking Threshold Model: Integrated a new _calculate_masking_threshold_simplified_peaks algorithm. Using scipy.signal.find_peaks, the program identifies maskers above the Absolute Threshold of Hearing (ATH) and calculates individual masking thresholds on the Bark scale based on the MPEG-1 spreading function.
  • Clipping Protection in LUFS Normalization: Refactored normalize_loudness: if the peak level exceeds 0 dBFS after applying gain to reach the target LUFS, the gain factor is proportionally scaled down to a safe limit to prevent hard clipping.
  • Safe Audio Export: Built an exception interceptor into the audio writing block during dithering and noise shaping. In case of failure, the algorithm automatically falls back to safe clipping of the original data in the [-1.0, 1.0] range, preventing output file corruption.
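
The WOLA normalization mentioned in the reconstruction entry above divides the overlap-added result by the accumulated sum of squared synthesis windows. The sketch below shows that step on a time-domain signal for simplicity; the project applies it to spectrogram and mask buffers, and the function name wola_reconstruct is hypothetical.

```python
import numpy as np

def wola_reconstruct(frames: np.ndarray, window: np.ndarray, hop: int) -> np.ndarray:
    """Overlap-add windowed frames and normalize by the summed squared window.

    frames: array of shape (n_frames, frame_len) that already carries the analysis window.
    """
    n_frames, frame_len = frames.shape
    out_len = (n_frames - 1) * hop + frame_len
    signal = np.zeros(out_len)
    norm = np.zeros(out_len)
    for i in range(n_frames):
        start = i * hop
        # Apply the synthesis window and accumulate into the OLA buffers.
        signal[start:start + frame_len] += frames[i] * window
        norm[start:start + frame_len] += window ** 2
    # Skip samples where the accumulated window energy is practically zero (edges).
    safe = norm > 1e-8
    signal[safe] /= norm[safe]
    return signal

frame_len, hop = 512, 128
window = np.hanning(frame_len)
x = np.random.default_rng(1).standard_normal(4096)
starts = range(0, len(x) - frame_len + 1, hop)
frames = np.stack([x[s:s + frame_len] * window for s in starts])
y = wola_reconstruct(frames, window, hop)
```

Away from the signal edges, y matches x to floating-point precision, because each sample is weighted by the same squared-window sum in the numerator and the denominator.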

Version 0.8.0 (11.10.2024)

  • Architectural Reorganization (Modular Refactoring): Split the project codebase into isolated packages (introducing core/band_processor/, dsp/features/, dsp/hpss/, dsp/masking/, dsp/phase/, dsp/postprocessing/, gui/creation/tabs/, io/, and utils/ structures).
  • Base Background Worker Class: Designed an abstract BaseWorker class for QThread background execution. Integrated logging.LoggerAdapter to automatically inject a unique task_id into log messages, simplifying multi-threaded debug tracing.
  • Adaptive Phase Blending via TF Features: Developed the process_band_adaptive_phase algorithm. Based on frame-by-frame tonality and SNR analysis, the engine calculates weights for different phase reconstruction methods (LWS, RTISI-LA, Bregman, Complex, Blend) and blends phase candidates geometrically in the time domain.
  • New Phase Recovery Methods:
    • ExplicitRelation: phase reconstruction based on explicit mathematical relationships between the derivatives of log-magnitude and phase over time and frequency with 2D least-squares integration via FFT (solving the Poisson equation).
    • SPSI: Single Pass Spectrogram Inversion method with quadratic interpolation of spectral peaks for accurate instantaneous frequency estimation.
    • Bregman Projections: phase optimization supporting generalized Bregman divergences for different beta values (Itakura-Saito, Kullback-Leibler, Frobenius).
    • Mask-Guided Phase Iteration: iterative pulling of the mix phase toward the dominant source phase with weights proportional to mask confidence.
  • NMF-HPSS Signal Separation: Implemented a harmonic-percussive separation algorithm based on Non-negative Matrix Factorization (NMF) with multiplicative updates and L2 regularization of temporal activation smoothness.
  • 10-Stage Mix Post-Processing Cascade (finalizer.py): Implemented a coordinated finalization pipeline for the blended mix:
    1. Spectral Tilt Correction: aligns the HF slope with the target dB/octave tilt using an error-function transition.
    2. HPSS Refinement: isolates and smooths harmonics temporally (Gaussian) and percussion spectrally (median) to eliminate phase chatter.
    3. HF Envelope Smoothing: applies 1D temporal smoothing to HF bin envelopes.
    4. Phase Local Smoothing: applies 2D local smoothing (Gaussian/median) to phase in the HF range to reduce harshness.
    5. Resonance Suppression: detects whistling peaks via Q-factor and phase stability, applying adaptive notch filters.
    6. Envelope Transplantation: extrapolates the LF/MF envelope to the HF range using LPC or cepstral estimation, followed by magnitude correction.
    7. Perceptual Masking: attenuates spectral components lying below the psychoacoustic masking threshold (MPEG-1 model).
    8. Phase Retrieval: runs iterative phase retrieval constrained by Gaussian-smoothed magnitudes.
    9. Post-filtering: filters crossover boundaries using Linkwitz-Riley filters.
    10. Peak Normalization: performs final peak level normalization to 0 dBFS.
  • Disk I/O Caching: Audio file reading has been optimized using an @lru_cache decorator with a 16-segment size limit, drastically reducing disk read loads during preview updates. Moved dithering and noise shaping to a dedicated dithering.py module.
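
The read cache from the Disk I/O Caching entry above can be approximated with functools.lru_cache. In this sketch read_segment_cached and its dummy loader are hypothetical stand-ins for the project's reader; only the decorator and the 16-entry limit come from the entry.

```python
from functools import lru_cache

import numpy as np

load_calls = 0  # counts simulated disk reads, for demonstration only

@lru_cache(maxsize=16)
def read_segment_cached(path: str, start_frame: int, num_frames: int) -> np.ndarray:
    """Return an audio segment; identical requests are served from memory."""
    global load_calls
    load_calls += 1
    # Stand-in for a real disk read: a deterministic dummy signal.
    data = np.arange(start_frame, start_frame + num_frames, dtype=np.float32)
    data.setflags(write=False)  # cached arrays are shared, so make them read-only
    return data

first = read_segment_cached("song.wav", 0, 1024)
second = read_segment_cached("song.wav", 0, 1024)    # cache hit, no new read
third = read_segment_cached("song.wav", 1024, 1024)  # different segment, new read
```

All arguments must be hashable for lru_cache to work, which is why the sketch keys on a path string and integer frame positions. Marking the returned array read-only protects against one caller modifying data that another caller later receives from the cache.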

Version 0.7.0 (05.10.2024)

  • Multi-Scale STFT Analysis: Implemented an adaptive STFT resolution system based on local signal transience. The orchestrator computes three STFT grids with different window sizes (12, 46, and 93 ms) and matching hop sizes.
  • Dynamic Time-Frequency Scale Blending: Based on a global transience measure, the engine computes frame-by-frame blending weights: alpha0 (for HF/transients - smallest window), alpha2 (for LF/stationary tones - largest window), and alpha1 (intermediate window). Complex matrices are blended before masking, minimizing pre-echo artifacts on attacks while preserving frequency resolution on stationary tones.
  • Transience Evaluation: Added a calculate_transience_measure function to features.py to estimate transience using Spectral Flatness Change, Spectral Entropy Change, or SuperFlux methods.
  • Streaming Dithering and Noise Shaping (PCM Quantization): Integrated triangular dither (TPDF) and 1st-order noise shaping into the audio writing pipeline. When exporting to integer PCM formats (16/24/32-bit), symmetric noise of 1 LSB amplitude is added, and the quantization error is filtered through a feedback loop (feedback coefficient 0.85) to shift noise toward high frequencies.
  • Mandatory Float PCM Export: Configured temporary audio files to be written in raw pcm_f32le format (32-bit Float) via FFmpeg to prevent precision loss.
  • Arbitrary Segment Audio Reading: Expanded the read_audio function with start_frame and num_frames parameters, enabling direct reading of arbitrary segments without loading the entire track into RAM.
  • Safe Parameter Profile Parser: Input fields for lists, tuples, and dicts in the GUI now use Python's safe literal evaluator ast.literal_eval instead of standard eval, eliminating code execution risks when importing profile configurations.
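
The dithering entry above combines TPDF dither with first-order error feedback. A minimal sketch of that idea, assuming float input in [-1, 1] and a hypothetical function name quantize_with_dither (only the 0.85 coefficient and the 1 LSB dither amplitude are taken from the entry):

```python
import numpy as np

def quantize_with_dither(x: np.ndarray, bits: int = 16, feedback: float = 0.85,
                         seed: int = 0) -> np.ndarray:
    """Quantize float audio in [-1, 1] to signed integers with TPDF dither
    and first-order error-feedback noise shaping."""
    rng = np.random.default_rng(seed)
    scale = 2 ** (bits - 1) - 1            # full-scale integer value
    lo, hi = -scale - 1, scale
    out = np.empty(len(x), dtype=np.int64)
    error = 0.0
    for n, sample in enumerate(x):
        # Subtract the scaled previous quantization error (noise shaping).
        target = sample * scale - feedback * error
        # TPDF dither: difference of two uniform variables, spanning +/- 1 LSB.
        dither = rng.random() - rng.random()
        q = int(np.clip(np.round(target + dither), lo, hi))
        error = q - target                 # error of this sample, fed back next step
        out[n] = q
    return out

t = np.arange(1000) / 48000.0
pcm = quantize_with_dither(0.5 * np.sin(2 * np.pi * 440.0 * t))
```

With this feedback structure the quantization noise is filtered by 1 - 0.85 z^-1, a high-pass response, which moves noise energy toward high frequencies as the entry describes.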

Version 0.6.0 (01.10.2024)

  • Moving to Multi-Stage Processing Groups (Stages): Rewrote the batch manager to process a unified list of projects (zip_list_combined) instead of independent ZIP archives. Created a composite FileGroupWidget to display file groups (projects).
  • Individual Stage Status in the GUI: The right side of the group widget dynamically renders processing rows for each active Stage matching the group type (schemes: single_apollo, pair_apollo, single_sr, pair_sr). Each stage row has an editor button and a color-coded status label.
  • New Stage Type System (EditTarget): Introduced the EditTarget type, which describes individual editing stages (separate AudioSR, FlashSR and their merged outputs for vocals and instruments). AppState now tracks the currently active file group and stage.
  • Multi-Model Super-Resolution Pipeline (SR Merge): Added super-resolution merging stages (sr_merge_instr, sr_merge_vocal) to combine outputs of different super-resolution models (AudioSR and FlashSR). They operate directly on cached intermediate files within the same group.
  • Cascaded Rendering of Groups (BatchSaveWorkerThread): During final export, the pair_sr worker sequentially runs AudioSR merging, FlashSR merging, merges both models based on sr_merge stage parameters, sums the resulting signals in the time domain, and applies peak limiter normalization.
  • Cascaded Previews for SR Merge: Implemented rapid preview generation for sr_merge stages. PreviewWorker loads an 8-second segment of the inputs, merges the AudioSR segment, merges the FlashSR segment, and finally blends both intermediate outputs based on sr_merge parameters to display the final output on the Matplotlib canvas.
  • Onset Interpolation Fix: Added automatic interpolation of the transientness_degree_aligned array to match the frame count of the currently processed band, fixing shape mismatch errors during transient reassignment (TR).
  • Optimizing Softmax Mask Calculation: Removed the strict shape-matching requirement between the k coefficient and the spectrogram in calculate_softmax_weights, enabling 1D time-varying k(t) arrays to be broadcast during weight calculations.
  • Fixing Crashes in SNR Modulation: Added the missing n_fft argument and fixed the parameter order in apply_weights_modulations when calling noise estimation, preventing thread crashes when SNR masking is enabled.
  • Direct Parameter Sync: Removed signal blocking from parameter change slots. When any parameter is changed in the GUI, the active stage status is instantly updated to [Edited] and the new values are written to disk.
  • Tuning Default Parameters: The relative threshold for transient reassignment TR_MAGNITUDE_THRESHOLD_REL was changed from 1e-4 to 1e-5 for accurate transient localization. The target loudness is set to -23.0 LUFS (EBU R128). The HF N_FFT_HIGH was increased to 4096 with a hop of 512. The incoherence penalty factor was raised to 0.50.
  • Mandatory coloredlogs Dependency and Fallback GUI: The coloredlogs library is now required. In main.py, a check for PySide6 graphic backend availability is executed before importing GUI modules; if missing, a fallback error window is loaded via Tkinter, preventing silent crash loops.
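
The Onset Interpolation Fix above resamples a per-frame array to the frame count of the current band. One way to do this with linear interpolation is sketched below; match_frame_count is a hypothetical helper, and the project may use a different interpolation scheme.

```python
import numpy as np

def match_frame_count(values: np.ndarray, n_frames: int) -> np.ndarray:
    """Linearly resample a 1D per-frame array to n_frames entries."""
    if n_frames < 1:
        raise ValueError("n_frames must be positive")
    if len(values) == 0:
        return np.zeros(n_frames)
    if len(values) == n_frames:
        return values
    # Map both arrays onto the normalized time axis [0, 1] and interpolate.
    src = np.linspace(0.0, 1.0, num=len(values))
    dst = np.linspace(0.0, 1.0, num=n_frames)
    return np.interp(dst, src, values)

degree = np.array([0.0, 1.0, 0.0])
resampled = match_frame_count(degree, 5)
```

After resampling, the array can be combined element-wise with any per-frame quantity of the band without shape mismatch errors.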

Version 0.5.0 (06.09.2024)

  • Parallel ZIP Archive Lists in Batch Tab: The "Batch" tab layout has been split horizontally into two lists: "Music / Single" (zip_list_music) and "Vocals (Pair)" (zip_list_vocal).
  • Synchronizing Paired Selection: Implemented a select_paired_item_slot slot: when a music ZIP is clicked, the program automatically finds and highlights the corresponding vocal ZIP in the paired list (and vice versa).
  • Inline Parameter Editing in List Rows: Integrated custom BatchZipItemWidget controls directly into list rows, enabling rapid inline configuration of Genre, Accent, Performance, and AI Target settings without entering the full editor. Added color-coded status states: white (waiting), light-blue (analyzed), light-green (edited).
  • Simplified MP3 Imports: Replaced the old configuration sliders in Mp3ItemWidget with a single "Has Vocals" checkbox for easier batch importing.
  • Consolidating Noise Estimation Methods: Reorganized dsp/features.py to route all noise estimation requests through a unified estimate_noise function, wrapping LocalMinStats, GlobalPercentile, MedianFilterTime, and SpectralSubtractionAvg methods.
  • Fault-Tolerant Spectral Flux: Refactored calculate_robust_spectral_flux to gracefully fall back from LocalMedian to Max calculation if SciPy is missing.
  • SOCKS Proxy Support: Added the PySocks>=1.7.0 dependency. Added USE_PROXY and PROXY_URL configuration variables to route API calls over HTTP, HTTPS, or SOCKS proxies. SettingsManager applies these environment variables prior to initializing networking clients.

Version 0.4.1 (03.09.2024)

  • Portable Text Configuration (INI): Rewrote SettingsManager. Instead of relying on Windows Registry/QSettings, the application stores all global configurations, API credentials, and prompt templates in a plain-text merger_settings.ini file located in the application's root directory.
  • Rotating File Logging: Configured standard Python logging with file rotation (10 MB limit, up to 5 backups kept in the /logs folder). Critical DSP stages, warning notices, and API transaction calls are written to spectrogram_merger.log.
  • UI Log Pane Enhancements: Increased the height of the status_text log pane to 160px to display detailed DSP pipeline status reports comfortably.
  • Matplotlib Plot Tuning: Reduced the tight padding parameter of the Matplotlib canvas from 0.5 to 0.3 to maximize the visual rendering area. Adjusted colorbar paddings and proportions. Reduced the font size of parameter descriptions to 7pt to increase information density.
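
The rotation settings from the file logging entry above map directly onto the standard library's RotatingFileHandler. The sketch below is one possible configuration; the helper name build_rotating_handler, the log format, and the reading of "10MB" as 10 * 1024 * 1024 bytes are assumptions.

```python
import logging
import os
from logging.handlers import RotatingFileHandler

def build_rotating_handler(log_dir: str) -> RotatingFileHandler:
    """Create a file handler that rotates at 10 MB and keeps 5 backups."""
    os.makedirs(log_dir, exist_ok=True)
    handler = RotatingFileHandler(
        os.path.join(log_dir, "spectrogram_merger.log"),
        maxBytes=10 * 1024 * 1024,  # assumed binary megabytes
        backupCount=5,
        encoding="utf-8",
        delay=True,  # open the file lazily, on the first emitted record
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    return handler
```

A caller would attach the handler once at startup, for example with logging.getLogger().addHandler(build_rotating_handler("logs")).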

Version 0.4.0 (01.09.2024)

  • Dual-Track Saving Architecture (Music + Vocals): Rewrote the final rendering workflow in batch_save_worker.py. If a ZIP archive is marked as a "Pair" (pair), the worker runs separate pipelines:
    1. Loads and normalizes the original music WAVs, computing linear gains (gain_music).
    2. Applies those gains to the vocal WAVs to keep the balance.
    3. Blends music spectrograms (skipping finalization) and vocal spectrograms based on their individual parameter profiles.
    4. Sums the resulting time-domain signals of music and vocals.
    5. Applies the final peak limiter and writes the final mix to a single WAV file.
  • Automated Representative Audio Segment Identification: Developed a find_representative_segment algorithm in audio_analysis.py. It reads the first 300 seconds of audio and calculates local RMS envelopes and FFT-based Spectral Flatness. Using a sliding-window convolution, it locates the 8-second segment with the highest energy density and tonal stability, bypassing silent intros. This segment is then used as the test segment.
  • User Adjustments to Test Segment: Removed PREVIEW_START_SEC and PREVIEW_END_SEC from global configurations. Added dedicated spinboxes (test_start_adjust_spinbox, test_end_adjust_spinbox) to the GUI, allowing users to shift the automatically detected boundaries manually.
  • Hybrid Phase Recovery ('RTISI-LA-Hybrid'): Implemented a hybrid phase retrieval method: the engine computes a reference phase using LWS first, and then feeds it as a high-fidelity starting seed to the RTISI-LA algorithm. This minimizes metallic artifacts. Added RTISI_LA_BETA and RTISI_LA_ALPHA_SMOOTH_MS control parameters.
  • Spectral Mask Smoothing: Added the VERTICAL_SMOOTH_BLEND_FACTOR parameter to interpolate between the original mask and a frequency-smoothed version. Stripped technical suffixes (e.g., (Fast Recalc)) from band names to prevent KeyError exceptions.
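
The sliding-window search from the representative segment entry above can be sketched in a reduced, energy-only form. This is a simplification: the spectral flatness term is omitted, and find_loudest_segment with its frame_sec parameter is hypothetical rather than the project's find_representative_segment.

```python
import numpy as np

def find_loudest_segment(y: np.ndarray, sr: int, segment_sec: float = 8.0,
                         frame_sec: float = 0.1) -> tuple:
    """Return (start_sec, end_sec) of the window with the highest summed RMS."""
    frame_len = max(1, int(frame_sec * sr))
    n_frames = len(y) // frame_len
    if n_frames == 0:
        return 0.0, len(y) / sr
    # Local RMS envelope, one value per non-overlapping frame.
    frames = y[:n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    win = min(n_frames, max(1, int(round(segment_sec / frame_sec))))
    # Sliding-window sum of the envelope via convolution with a box kernel.
    score = np.convolve(rms, np.ones(win), mode="valid")
    start_frame = int(np.argmax(score))
    start = start_frame * frame_len / sr
    return start, start + win * frame_len / sr

sr = 1000
y = np.zeros(30 * sr)
y[12 * sr:20 * sr] = 0.5  # loud section after a silent intro
start_sec, end_sec = find_loudest_segment(y, sr)
```

A fuller version would combine the RMS score with a tonal stability score before taking the maximum, as the entry indicates.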

Version 0.3.2 (24.07.2024)

  • Automatic Temporal Audio Alignment: Integrated an align_audio_signals(y1, y2, sr, max_shift_ms, ...) function inside steps.py. The algorithm calculates the cross-correlation of mono versions of the signals using fast Fourier transforms (FFT) via scipy.signal.correlate(method='fft').
  • Time Alignment for Coherent Phase Overlap: The engine calculates the optimal alignment lag within the bounds defined by MAX_ALIGNMENT_SHIFT_MS. It shifts the waveforms on the timeline: padding the lagging signal with zeros at the beginning and trimming the leading signal, establishing phase coherence.
  • Adaptive Alignment Parameters: Added AUTO_ALIGN_AUDIO (activation flag, defaults to False) and MAX_ALIGNMENT_SHIFT_MS (maximum search lag, defaults to 50.0 ms) to ProcessingParams.
  • Dynamic GUI Visibility: Grouped alignment parameters together inside the Parameter Editor; they are dynamically shown or hidden based on the AUTO_ALIGN_AUDIO checkbox state.
  • Streamlining Progress Metrics: Increased total progress steps in merge_spectrograms from 10 to 11 to track the alignment phase. Added fallback logic to truncate unequal inputs if alignment is disabled.
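
The lag search from the alignment entries above can be sketched with NumPy's FFT routines. This stands in for the scipy.signal.correlate(method='fft') call named in the changelog rather than reproducing it; estimate_lag and its sign convention are assumptions, and inputs are expected to be non-empty mono arrays.

```python
import numpy as np

def estimate_lag(y1: np.ndarray, y2: np.ndarray, sr: int,
                 max_shift_ms: float = 50.0) -> int:
    """Return the lag (in samples) by which y1 is delayed relative to y2.

    A positive result means y1 lags behind y2; a negative one means it leads.
    """
    n = len(y1) + len(y2) - 1
    n_fft = 1 << (n - 1).bit_length()      # next power of two, avoids circular wrap
    spec = np.fft.rfft(y1, n_fft) * np.conj(np.fft.rfft(y2, n_fft))
    corr = np.fft.irfft(spec, n_fft)       # corr[k] = sum_n y1[n + k] * y2[n]
    max_shift = int(round(max_shift_ms * sr / 1000.0))
    max_shift = min(max_shift, min(len(y1), len(y2)) - 1)
    lags = np.arange(-max_shift, max_shift + 1)
    # Negative lags are stored at the end of the circular correlation buffer.
    return int(lags[np.argmax(corr[lags % n_fft])])

rng = np.random.default_rng(2)
reference = rng.standard_normal(2000)
delayed = np.concatenate([np.zeros(7), reference[:-7]])
lag = estimate_lag(delayed, reference, sr=1000, max_shift_ms=50.0)
```

Restricting the search to max_shift_ms keeps the estimate from locking onto distant, spurious correlation peaks, which matches the role of MAX_ALIGNMENT_SHIFT_MS.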

Version 0.3.1 (23.07.2024)

  • ZIP-Based Batch Processing Project Files: Introduced a new batch format using ZIP archives containing file1.wav, file2.wav, and params.json. Batch jobs are packed into archives in the Output_Zips_SpectrogramMerger directory.
  • Isolating Temporary Assets: Imported ZIPs are extracted into isolated subfolders inside the Temp_SpectrogramMerger directory, preventing file clutter.
  • Two-Tier Batch Processing GUI (tabs.py): Split the "Batch" tab into two panels using a QSplitter:
    1. Top Section ("API -> ZIP"): holds raw input files, triggers preliminary external demixing, and packages assets into ZIPs.
    2. Bottom Section ("ZIP Archives"): manages generated ZIP projects and runs batch processing and final exports.
  • Interactive Batch List Widget: Created a custom BatchZipItemWidget. It features compact dropdown selectors (Genre, Accent, Quality) and test segment spinboxes to configure project settings inline. Features color-coded status states: white (idle), blue (analyzed), green (edited).
  • Headless-Friendly DSP Operations: Designed a fallback qt_fallbacks.py module that emulates PySide6 widgets, signals, and slots when Qt is missing. This allows running the DSP pipeline on headless remote servers or virtual environments without X11 or Qt installed.
  • A-Weighting Mathematical Refinements: Fixed the apply_a_weighting function inside perceptual_models.py to handle frequency bins above a small epsilon ($f > \epsilon$). The 0 Hz bin (DC component) is forced to $-\infty$ dB, preventing division-by-zero errors in certain Librosa versions.
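
The DC handling from the A-Weighting entry above can be illustrated with the standard analog A-weighting formula. The function name a_weighting_db and the eps default are hypothetical; the project's apply_a_weighting may differ in detail.

```python
import numpy as np

def a_weighting_db(freqs_hz: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Return A-weighting gains in dB; bins at or below eps Hz are set to -inf."""
    f = np.asarray(freqs_hz, dtype=np.float64)
    out = np.full(f.shape, -np.inf)
    valid = f > eps
    f2 = f[valid] ** 2
    # Standard analog A-weighting magnitude response.
    num = (12194.0 ** 2) * f2 ** 2
    den = ((f2 + 20.6 ** 2)
           * np.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
           * (f2 + 12194.0 ** 2))
    # The +2.0 dB offset normalizes the curve to roughly 0 dB at 1 kHz.
    out[valid] = 20.0 * np.log10(num / den) + 2.0
    return out

gains = a_weighting_db(np.array([0.0, 100.0, 1000.0]))
```

Evaluating the formula only on the valid bins means the logarithm never sees a zero argument, so no division-by-zero or log-of-zero warnings are raised for the DC bin.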

Version 0.3.0 (04.07.2024)

  • Harmonic-Percussive Source Separation (HPSS) Integration: Integrated harmonic-percussive separation (librosa.decompose.hpss). When HPSS is activated, processing.py splits the inputs into harmonic and percussive parts, processing them as separate pipelines across bands and channels.
  • Adaptive Percussion Smoothing: Percussion components are smoothed using tailored parameters (scaling Gaussian sigma and shortening crossfade durations). This preserves transient sharpness while preventing phase smearing.
  • Dynamic Time Reassignment (TR): Removed static accumulation mode selections from reassignment.py. The choice between 'magnitude' and 'energy' integration is evaluated dynamically per frame based on transience.
  • Time-Varying k(t) and alpha(t) Coefficients: The Softmax curvature ($k$) and perceptual weighting balance ($\alpha$) have been converted into time-varying arrays, modulated by the onset envelope.
  • Centralized Typed Configuration (config/types.py): Created types.py to hold literals (MaskingMethod, PhaseMethod, NoiseEstimationMethod, ComponentType) and the ProcessingParams TypedDict. Removed obsolete options (REASSIGNMENT_MODE, USE_TRANSIENT_PROCESSING).
  • Batch Item Configuration Inspector: Added an "Item Settings" panel to the Batch tab. Users can configure Genre, Accent, Quality, and preview boundaries for each file in the batch individually.

Version 0.2.1 (29.04.2024)

  • Rapid Preview Recomputation Engine (fast_recalculation.py): Developed an algorithm to refresh Matplotlib plots rapidly when adjusting EQ sliders or influence maps. It bypasses heavy steps (crossover filtering, forward STFT, secondary modulations, and ISTFT) by applying gain updates directly to cached mid-band magnitudes.
  • Dynamic Preview Spinbox Factory: Moved PREVIEW_START_SEC and PREVIEW_END_SEC into param_widgets, generating them via the unified widget factory.
  • New Analytical Phase Retrieval Methods (phase.py):
    • LWS (Locally Weighted Sums): reconstructs phase from STFT consistency, blending phase angles geometrically based on amplitude mask ratios.
    • RTISI-LA: added a signature placeholder for real-time iterative spectrum inversion with look-ahead.
    • Expanded PHASE_METHOD with 'LWS' and 'RTISI-LA' options.
  • IBM and IRM Reference Masks: Added Ideal Binary Mask (IBM) and Ideal Ratio Mask (IRM) estimators to base_masks.py, dividing energy based on source dominance thresholds.
  • Noise-Subtracted Wiener Filtering: Integrated MedianFilterTime and SpectralSubtractionAvg noise estimators into base_masks.py to subtract noise power spectra before calculating Wiener masks.
  • Thread-Safe Parallel Processing (band_processor.py): Removed stop_event checks from the signatures of inner band processing tasks when invoked through joblib (loky), preventing serialization issues with Qt-bound objects.
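
The IBM and IRM estimators from the reference masks entry above follow common textbook definitions, sketched below. The function name ideal_masks, the threshold_db parameter, and the power-ratio form of the IRM are assumptions; base_masks.py may use different variants.

```python
import numpy as np

def ideal_masks(mag1: np.ndarray, mag2: np.ndarray, threshold_db: float = 0.0,
                eps: float = 1e-12):
    """Return (IBM, IRM) for source 1 given two magnitude spectrograms."""
    p1, p2 = mag1 ** 2, mag2 ** 2
    # IBM: 1 where source 1 dominates source 2 by more than threshold_db.
    ratio_db = 10.0 * np.log10((p1 + eps) / (p2 + eps))
    ibm = (ratio_db > threshold_db).astype(np.float32)
    # IRM (power-ratio form): share of the total energy that belongs to source 1.
    irm = (p1 / (p1 + p2 + eps)).astype(np.float32)
    return ibm, irm

m1 = np.array([[1.0, 0.1], [0.5, 0.0]])
m2 = np.array([[0.1, 1.0], [0.5, 0.0]])
ibm, irm = ideal_masks(m1, m2)
```

The eps terms keep both masks finite in bins where the two sources are silent.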

Version 0.2.0 (26.04.2024)

  • Centralized Parameter Validation (utils/validation.py): Designed a unified PARAMETER_METADATA registry detailing types, safe boundaries (min/max), step increments, and options for all settings.
  • Sanitizing Parameter Dictionaries: Moved clamping functions (_clamp, _clamp_int) to the validation module. Added validate_value and sanitize_parameters to clean configuration dictionaries and return warning lists.
  • Validating Profiles and GUI Widgets: Refactored profile loading in config/presets.py to use validate_value. The widget factory (gui/widgets/factory.py) queries PARAMETER_METADATA directly, preventing the GUI from falling out of sync with DSP limits.
  • 1D Temporal Median Noise Estimation: Added estimate_noise_median_filter_time to dsp/features.py, estimating noise by applying a 1D median filter along the time axis for each frequency bin.
  • Intro-Segment Average Noise Estimation: Added estimate_noise_spectral_subtraction_avg to dsp/features.py, computing a static noise profile from an initial silent segment (defined by NOISE_EST_SS_START_SEC).
  • Configuring Wiener Noise Sources: Added WIENER_NOISE_METHOD to configurations. Passing sr and hop_length to calculate_base_masks enables spectral subtraction. The Wiener filter and SNR modulator support all 4 noise estimation methods.
  • Adaptive Transient-Aware Mask Crossfading: Added WEIGHT_CROSSFADE_FRAMES_TRANSIENT. In dsp/masking/smoothing.py, apply_mask_smoothing uses the standard crossfade length WEIGHT_CROSSFADE_FRAMES on stationary segments, and a shortened crossfade length on transient segments to prevent smearing.
  • Verbose Logging of DSP Operations: Implemented a detailed logging sequence in dsp/masking/smoothing.py to trace applied modulations. apply_vertical_smoothing_crossfade queries the band name dynamically to construct clear status reports.
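
The registry-driven validation from the first three entries of this version can be sketched as follows. Everything here is illustrative: the registry contents, bounds, and defaults are invented, and the signatures of validate_value and sanitize_parameters are guesses at what the project's functions might look like.

```python
# Hypothetical registry entries; bounds and defaults are made up for this sketch.
PARAMETER_METADATA = {
    "WEIGHT_CROSSFADE_FRAMES": {"type": int, "min": 0, "max": 64, "default": 8},
    "WIENER_NOISE_METHOD": {
        "type": str,
        "default": "LocalMinStats",
        "options": ["LocalMinStats", "GlobalPercentile",
                    "MedianFilterTime", "SpectralSubtractionAvg"],
    },
}

def validate_value(name, value):
    """Return (clean_value, warning_or_None) for a single parameter."""
    meta = PARAMETER_METADATA.get(name)
    if meta is None:
        return value, f"{name}: unknown parameter, passed through unchanged"
    try:
        clean = meta["type"](value)
    except (TypeError, ValueError):
        return meta["default"], f"{name}: cannot convert {value!r}, using default"
    if "options" in meta and clean not in meta["options"]:
        return meta["default"], f"{name}: {clean!r} is not a valid option, using default"
    if "min" in meta and clean < meta["min"]:
        return meta["min"], f"{name}: {clean} clamped to minimum {meta['min']}"
    if "max" in meta and clean > meta["max"]:
        return meta["max"], f"{name}: {clean} clamped to maximum {meta['max']}"
    return clean, None

def sanitize_parameters(params):
    """Validate every entry and collect the warnings that were produced."""
    clean, warnings = {}, []
    for name, value in params.items():
        clean[name], warning = validate_value(name, value)
        if warning:
            warnings.append(warning)
    return clean, warnings

clean, warnings = sanitize_parameters(
    {"WEIGHT_CROSSFADE_FRAMES": 500, "WIENER_NOISE_METHOD": "Bogus"}
)
```

Keeping limits in one registry lets both the profile loader and the widget factory read the same bounds, which is the synchronization benefit the entries describe.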

Version 0.1.1 (23.04.2024)

  • Reorganizing Left-Panel Tabs: Renamed the "Single Merge" tab to "Parameter Editor". Removed manual file input fields (le_file1, le_file2, le_output), prioritizing batch-driven project workflows.
  • Editor Access Control: The Parameter Editor is locked by default, activating only when a project is selected from the batch queue.
  • Preview Panel Optimization: Removed the single-run processing button from the preview panel. Renamed "Test" to "Test Segment" and moved preview boundary selectors into the Parameter Editor under "Main/Test" settings.
  • High-Density GUI Layout: Reorganized the editor grid from a 2-column layout to a dense 4-column scheme. Merged separate Low, Mid, and High STFT parameter groups.
  • Support for Prepared Waveform Pairs: Split the "Add Files..." button into "Add MP3..." (with optional demixing) and "Add WAV Pair..." (for direct dual-file loading).
  • Automated Profile Loading: Selecting an item in the batch queue automatically loads its settings into the Parameter Editor and unlocks the tab.
  • Code Cleanups: Deleted gui/slots/file_slots.py as manual single-file merging is deprecated. Updated widget_state.py and profile_io.py to handle tab locking based on batch item status.

Version 0.1.0 (18.04.2024)

  • Initial Alpha Build: Laid down the foundations for modular processing, asynchronous execution, and external API integrations.
  • Modular Architecture: Divided the codebase into separate packages: configurations (config), orchestrators (core), DSP operations (dsp), interfaces (gui), I/O handling (io), and external APIs (services).
  • Asynchronous Background Execution (QThread): Moved expensive calculations to QThread background workers (ProcessingWorker, PreviewWorker) to prevent GUI freezes. Implemented basic external demixing integration.
  • Tab-Based Layout: Added "Batch Processing" and "Single Merge" tabs to the left-hand panel.
  • Dynamic Parameter Editor: Built a dynamic parameter editor based on a widget factory (factory.py), translating configuration structures into GUI controls.
  • Logarithmic EQ and Influence Sliders: Built specialized 12-band logarithmic sliders for equalizers and influence maps.
  • Matplotlib Visualization: Integrated a Matplotlib canvas to render spectrograms and blending masks.
  • 5th-Order Butterworth Crossover Filters: Implemented a 5th-order Butterworth crossover to divide input signals into Low, Mid, and High bands.
  • Base Blending Masks: Implemented Softmax and Wiener mask calculation.
  • Mask Modulation Filters: Added mask modulation filters based on Spectral Flatness, phase coherence, Spectral Flux, and Local Minimum noise statistics.
  • Phase Retrieval: Added Complex, Dominant, and Blend phase blending methods alongside Fast Griffin-Lim (FGLA) phase retrieval.
  • Temporal Reassignment (TR): Integrated a Time Reassignment algorithm to sharpen spectral transients.