<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DSP Archives - Jhonatan López</title>
	<atom:link href="https://www.jhonatanlopez.com/category/dsp/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.jhonatanlopez.com/category/dsp/</link>
	<description>Engineering &#38; Sound Design</description>
	<lastBuildDate>Fri, 08 Aug 2025 03:00:57 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	

<image>
	<url>https://www.jhonatanlopez.com/wp-content/uploads/2019/01/cropped-Logo-Web-Jhonatan2-1-32x32.png</url>
	<title>DSP Archives - Jhonatan López</title>
	<link>https://www.jhonatanlopez.com/category/dsp/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>NoiseRverb: Heritage Acoustics in a Plugin</title>
		<link>https://www.jhonatanlopez.com/heritage-acoustics-plugin/</link>
		
		<dc:creator><![CDATA[Jhonatan López]]></dc:creator>
		<pubDate>Sat, 31 May 2025 02:00:00 +0000</pubDate>
				<category><![CDATA[DSP]]></category>
		<category><![CDATA[Plugin]]></category>
		<guid isPermaLink="false">https://www.jhonatanlopez.com/?p=4853</guid>

					<description><![CDATA[<p>New blog post — I’ve been meaning to write this for a long time. It’s about a concept that has fascinated me from the moment I encountered it. I’ll also share a small contribution I made in this area with NoiseRverb, a heritage acoustics plugin that captures the sonic identity of historic churches in Quito [&#8230;]</p>
<p>The post <a href="https://www.jhonatanlopez.com/heritage-acoustics-plugin/">NoiseRverb: Heritage Acoustics in a Plugin</a> appeared first on <a href="https://www.jhonatanlopez.com">Jhonatan López</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<figure class="wp-block-image size-large"><img fetchpriority="high" decoding="async" width="1024" height="471" src="https://www.jhonatanlopez.com/wp-content/uploads/2025/05/image-1024x471.png" alt="NoiseRverb heritage acoustics plugin interface." class="wp-image-4854" srcset="https://www.jhonatanlopez.com/wp-content/uploads/2025/05/image-1024x471.png 1024w, https://www.jhonatanlopez.com/wp-content/uploads/2025/05/image-600x276.png 600w, https://www.jhonatanlopez.com/wp-content/uploads/2025/05/image-300x138.png 300w, https://www.jhonatanlopez.com/wp-content/uploads/2025/05/image-768x353.png 768w, https://www.jhonatanlopez.com/wp-content/uploads/2025/05/image.png 1137w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<p>New blog post — I’ve been meaning to write this for a long time. It’s about a concept that has fascinated me from the moment I encountered it. I’ll also share a small contribution I made in this area with <em>NoiseRverb</em>, a heritage acoustics plugin that captures the sonic identity of historic churches in Quito through convolution reverb.</p>



<p>We often associate heritage with visual beauty—particularly in architecture. This connection largely stems from the preservation of artistic landmarks due to their historical significance. As a result, many of these sites have become popular tourist attractions.</p>



<p>But what if we consider a different kind of heritage — one we can hear? Can sound hold historical value? Can it be preserved? The answer is yes. While this isn’t a new idea — researchers have studied it before — it’s starting to find its place in the audio industry as well.</p>



<p>While some heritage spaces are preserved through sound art, others live on through live recordings or songs. For example, the <em>Festival Internacional de Música Sacra</em> (FIMUSAQ) in Quito has helped preserve local culture through sacred music. While it&#8217;s not the only example globally, it clearly shows how sound can carry historical weight.</p>



<h2 class="wp-block-heading">What Is Heritage Acoustics?</h2>



<p>Heritage acoustics is a field that studies and protects the sound of historic spaces, combining architecture, physics, and sound engineering. In buildings from cathedrals to theatres, the sound is as iconic as the structure itself.</p>



<p>The goal is to study how sound behaves in these spaces. It involves looking at reverberation, reflections, absorption, and how sound waves move. There are three main techniques used:</p>



<ul class="wp-block-list">
<li><strong>On-site measurements</strong>: Microphones and speakers are placed in the space. Specific signals are played (like sine sweeps), and the room&#8217;s response is recorded. This captures real acoustic data.</li>



<li><strong>Impulse response (IR) capture</strong>: This technique records how a room reacts to a quick, broad sound. The result is a unique audio fingerprint of the space.</li>



<li><strong>Computer modelling</strong>: 3D simulations of the space are created using architectural drawings. This allows engineers to predict how sound will behave, even without access to the real location.</li>
</ul>



<p>These methods help protect the unique sound of historical buildings and let others experience it — even from far away.</p>
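


<p>To make the analysis side concrete: a standard first measurement on a captured impulse response is the reverberation time (RT60), which can be estimated with Schroeder’s backward integration. The Python sketch below is a textbook illustration (it assumes a mono IR as a NumPy array), not the exact procedure used in any particular study:</p>



<pre class="wp-block-code"><code>import numpy as np

def rt60_from_ir(ir, sr):
    """Estimate RT60 from a mono impulse response via Schroeder backward integration."""
    energy = np.cumsum(ir&#91;::-1] ** 2)&#91;::-1]      # Schroeder energy decay curve
    edc_db = 10 * np.log10(energy / energy&#91;0])   # normalise to 0 dB at t = 0
    t = np.arange(len(ir)) / sr

    # Fit a line to the -5 dB to -25 dB portion and extrapolate to -60 dB (a "T20" estimate)
    mask = (edc_db &lt;= -5) &amp; (edc_db &gt;= -25)
    slope, _ = np.polyfit(t&#91;mask], edc_db&#91;mask], 1)
    return -60.0 / slope</code></pre>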



<h2 class="wp-block-heading">Applications in the Music Industry</h2>



<p>The study of heritage acoustics has led to exciting tools for musicians and producers. These include:</p>



<ul class="wp-block-list">
<li>Convolution reverb plugins that recreate historic spaces.</li>



<li>Acoustic modelling for virtual concerts.</li>



<li>Restoring old recordings with their original acoustics.</li>



<li>Immersive museum or VR audio experiences.</li>
</ul>



<p>All of these are interesting, but I’ll focus on convolution reverb, as it’s where I work most.</p>



<h2 class="wp-block-heading">Convolution Reverb</h2>



<p>To understand convolution reverb, you need to know what an <strong>impulse response (IR)</strong> is. An IR is a recording of how a space reacts to a short, full-range sound — like a clap or a burst of noise. It contains key information about the room’s reflections and decay.</p>



<p>There are many ways to create an IR: clapping, popping a balloon, or playing a sine sweep. All of them capture the same room, though swept sines generally give the cleanest, highest signal-to-noise measurements. A sketch of the sweep method follows below.</p>
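


<p>Here is a minimal Python sketch of the swept-sine method (an exponential sweep plus its inverse filter, in the style of Farina’s technique). It is a simplified illustration, not the measurement chain used for any specific project:</p>



<pre class="wp-block-code"><code>import numpy as np

def exp_sweep(f1=20.0, f2=20000.0, T=10.0, sr=48000):
    """Exponential sine sweep from f1 to f2 over T seconds, plus its inverse filter."""
    t = np.arange(int(T * sr)) / sr
    R = np.log(f2 / f1)
    sweep = np.sin(2 * np.pi * f1 * T / R * (np.exp(t * R / T) - 1))
    # Inverse filter: time-reversed sweep with a 6 dB/octave amplitude compensation
    inverse = sweep&#91;::-1] * np.exp(-t * R / T)
    return sweep, inverse

sweep, inverse = exp_sweep()
# Play `sweep` through a speaker, record the room as `recorded`, then:
# ir = np.convolve(recorded, inverse)  # the room's impulse response</code></pre>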



<p>Convolution reverb takes this IR and applies it to any audio signal. The result is a realistic simulation of how that signal would sound in the original space. This is done using a mathematical process called convolution. It combines the original sound with the impulse response to recreate the acoustic experience.</p>
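


<p>In code, offline convolution takes only a few lines. The sketch below is illustrative rather than a plugin implementation (it assumes mono WAV files at the same sample rate, and the file names are hypothetical); a real-time plugin performs the same operation continuously on the incoming audio stream.</p>



<pre class="wp-block-code"><code>import numpy as np
import soundfile as sf
from scipy.signal import fftconvolve

# Hypothetical file names: a dry recording and a church impulse response (both mono)
dry, sr = sf.read('dry_vocal.wav')
ir, sr_ir = sf.read('church_ir.wav')
assert sr == sr_ir, "Resample the IR to match the signal first"

# Convolution: every sample of the dry signal excites the full room response
wet = fftconvolve(dry, ir)
wet /= np.max(np.abs(wet))  # normalise to avoid clipping

# Blend dry and wet to taste (50/50 here)
dry_padded = np.pad(dry, (0, len(wet) - len(dry)))
sf.write('vocal_in_church.wav', 0.5 * dry_padded + 0.5 * wet, sr)</code></pre>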



<p>This technique is popular for its realism. It lets musicians take a studio recording and place it in a real, historic space — without leaving their DAW.</p>



<h2 class="wp-block-heading">Heritage Acoustics Plugin</h2>



<p>As part of our exploration, I worked with engineers <strong>Analí Pinto</strong> and <strong>Fausto Espinoza</strong> to create a VST3 plugin called <em>NoiseRverb</em>. It’s based on impulse responses captured in seven churches in Quito: <strong>San Francisco, Basílica, Catedral, Compañía, Guápulo, El Sagrario, and Santo Domingo</strong>.</p>



<p>The plugin lets musicians and producers experience these spaces inside their usual music-making software.</p>



<p>We recorded the impulse responses on-site and processed them digitally. Then, we used real-time convolution to bring those spaces to life inside the plugin. The result is an authentic, immersive experience.</p>



<p><strong>NoiseRverb</strong> is free to download <a href="https://www.jhonatanlopez.com/sound-design/">here</a>.</p>



<p>Learn more about heritage acoustics and its role in cultural preservation in this <a class="" href="https://www.sciencedirect.com/topics/engineering/architectural-acoustics">introductory article on architectural acoustics</a> and in the <a class="" href="https://www.mdpi.com/2075-5309/15/15/2639">study on acoustic and perceptual variables in three heritage churches of Quito</a>.</p>



<p>The post <a href="https://www.jhonatanlopez.com/heritage-acoustics-plugin/">NoiseRverb: Heritage Acoustics in a Plugin</a> appeared first on <a href="https://www.jhonatanlopez.com">Jhonatan López</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>AllYouNeedIsSound 3: Spectral Representations and Feature Extraction</title>
		<link>https://www.jhonatanlopez.com/advanced-spectral-representations-audio-analysis/</link>
		
		<dc:creator><![CDATA[Jhonatan López]]></dc:creator>
		<pubDate>Mon, 24 Mar 2025 13:33:36 +0000</pubDate>
				<category><![CDATA[DSP]]></category>
		<category><![CDATA[analysis]]></category>
		<guid isPermaLink="false">https://www.jhonatanlopez.com/?p=4602</guid>

					<description><![CDATA[<p>Have you ever wondered how machines understand the nuances of sound? In my previous post, we explored spectral analysis and learned how spectrograms reveal the frequency content of audio signals using the Short-Time Fourier Transform (STFT). Now, let’s dive deeper into advanced spectral representations for audio analysis, including Mel Spectrograms, CQT, and HCQT, and show how they [&#8230;]</p>
<p>The post <a href="https://www.jhonatanlopez.com/advanced-spectral-representations-audio-analysis/">AllYouNeedIsSound 3: Spectral Representations and Feature Extraction</a> appeared first on <a href="https://www.jhonatanlopez.com">Jhonatan López</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<figure class="wp-block-image size-full"><img decoding="async" width="1024" height="1024" src="https://www.jhonatanlopez.com/wp-content/uploads/2025/03/allyouneedissound3.webp" alt="Three-dimensional waves representing spectral audio analysis in grey and light blue tones on a white background, with a stylized graphic equalizer." class="wp-image-4618" srcset="https://www.jhonatanlopez.com/wp-content/uploads/2025/03/allyouneedissound3.webp 1024w, https://www.jhonatanlopez.com/wp-content/uploads/2025/03/allyouneedissound3-300x300.webp 300w, https://www.jhonatanlopez.com/wp-content/uploads/2025/03/allyouneedissound3-100x100.webp 100w, https://www.jhonatanlopez.com/wp-content/uploads/2025/03/allyouneedissound3-600x600.webp 600w, https://www.jhonatanlopez.com/wp-content/uploads/2025/03/allyouneedissound3-150x150.webp 150w, https://www.jhonatanlopez.com/wp-content/uploads/2025/03/allyouneedissound3-768x768.webp 768w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption">A modern and minimalist interpretation of spectral audio analysis created with DALL·E.</figcaption></figure>



<p>Have you ever wondered how machines understand the nuances of sound? In my previous post, we explored <a href="https://www.jhonatanlopez.com/spectral-analysis/">spectral analysis</a> and learned how spectrograms reveal the frequency content of audio signals using the Short-Time Fourier Transform (STFT). Now, let’s dive deeper into advanced spectral representations for audio analysis, including Mel Spectrograms, CQT, and HCQT, and show how they can be used for perceptual audio analysis and feature extraction. These tools are essential for building machine learning models for tasks like audio classification, a field I’m currently exploring.</p>



<h2 class="wp-block-heading">Why Feature Extraction?</h2>



<p>Spectral analysis provides us with a visual map of audio frequencies, but for machine learning, we need compact, meaningful features that capture the essence of sound. Raw spectrograms are rich but high-dimensional, making them inefficient for direct use in models. By refining them into perceptually relevant or musically meaningful representations, we can extract features that align with how we hear or interpret audio. This is crucial for applications like genre classification, pitch detection, or environmental sound recognition.</p>



<h2 class="wp-block-heading">Advanced Spectral Representations</h2>



<p>Let’s explore three advanced spectral representations that address the limitations of STFT-based spectrograms: Mel Spectrograms, Constant-Q Transform (CQT), and Harmonic-CQT (HCQT). Each of these tools offers unique advantages for audio analysis and feature extraction.</p>



<h3 class="wp-block-heading">Mel Spectrogram (MEL) and Log-Mel Spectrogram (LMS)</h3>



<h4 class="wp-block-heading">What Are They?</h4>



<p>The Mel Spectrogram adapts the STFT to the Mel scale, a perceptual scale of pitch that reflects how humans hear frequency differences (e.g., we’re more sensitive to changes at lower frequencies). It compresses the frequency axis into Mel bins, reducing dimensionality while prioritizing auditory perception. The Log-Mel Spectrogram takes this further by applying a logarithmic transformation to the amplitude, mimicking the logarithmic response of our ears to loudness.</p>



<h4 class="wp-block-heading">Why Use Them?</h4>



<ul class="wp-block-list">
<li><strong>Perceptual Relevance:</strong> Mel Spectrograms align with human hearing, making them ideal for speech and music analysis.</li>



<li><strong>Machine Learning Ready:</strong> Log-Mel Spectrograms are compact and widely used as input features for deep learning models.</li>
</ul>



<h4 class="wp-block-heading">Example in Python</h4>



<pre class="wp-block-code"><code>```python
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
from google.colab import drive

# Mount Google Drive
drive.mount('/content/drive')

# Load audio
y, sr = librosa.load('/content/drive/My Drive/audio_files/sample.wav')

# Compute Mel Spectrogram
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
S_db = librosa.power_to_db(S, ref=np.max)  # Log-Mel Spectrogram

# Plot
plt.figure(figsize=(14, 5))
librosa.display.specshow(S_db, sr=sr, x_axis='time', y_axis='mel')
plt.colorbar(format='%+2.0f dB')
plt.title('Violin Log-Mel Spectrogram')
plt.show()</code></pre>



<ul class="wp-block-list">
<li><strong>n_mels=128:</strong> Number of Mel bins (adjustable based on your needs).</li>



<li><strong>Output:</strong> Time vs. Mel frequency, with colour showing log-amplitude.</li>
</ul>



<figure class="wp-block-image size-large is-resized"><img decoding="async" width="1024" height="445" src="https://www.jhonatanlopez.com/wp-content/uploads/2025/02/violinLogMelSpectrogram-1024x445.png" alt="Log-Mel Spectrogram for advanced spectral representations in audio analysis" class="wp-image-4555" style="width:656px;height:auto" srcset="https://www.jhonatanlopez.com/wp-content/uploads/2025/02/violinLogMelSpectrogram-1024x445.png 1024w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/violinLogMelSpectrogram-600x261.png 600w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/violinLogMelSpectrogram-300x130.png 300w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/violinLogMelSpectrogram-768x334.png 768w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/violinLogMelSpectrogram.png 1081w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption"><em><strong>Figure 1: </strong>Example Log-Mel Spectrogram generated from an audio file using the code above. The x-axis represents time, and the y-axis shows frequency, giving a visual representation of the sound’s intensity over time.</em></figcaption></figure>



<h3 class="wp-block-heading">Constant-Q Transform (CQT)</h3>



<h4 class="wp-block-heading">What Is It?</h4>



<p>The Constant-Q Transform (CQT) is an alternative to STFT that uses a logarithmic frequency scale, where the frequency resolution is constant relative to the center frequency (constant Q-factor). Unlike STFT’s fixed window size, CQT’s window size varies—longer for low frequencies, shorter for high ones.</p>



<h4 class="wp-block-heading">Why Use It?</h4>



<ul class="wp-block-list">
<li><strong>Musical Advantage:</strong> Its logarithmic scale matches the intervals of musical notes (e.g., octaves), making it perfect for pitch-related tasks like chord recognition or music transcription.</li>



<li><strong>Better Resolution:</strong> It captures low-frequency details (e.g., bass notes) better than STFT.</li>
</ul>



<h4 class="wp-block-heading">Example in Python</h4>



<p>The following example builds on the code from the previous section (same imports and audio file).</p>



<pre class="wp-block-code"><code>```python
# Compute CQT
C = librosa.cqt(y, sr=sr)
C_db = librosa.amplitude_to_db(abs(C), ref=np.max)

# Plot
plt.figure(figsize=(14, 5))
librosa.display.specshow(C_db, sr=sr, x_axis='time', y_axis='cqt_note')
plt.colorbar(format='%+2.0f dB')
plt.title('Violin Constant-Q Transform')
plt.show()</code></pre>



<p><strong>y_axis=&#8217;cqt_note&#8217;:</strong> Labels the y-axis with musical notes (e.g., C4, D4), emphasizing its musical focus.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="452" src="https://www.jhonatanlopez.com/wp-content/uploads/2025/02/violinCQT-1024x452.png" alt="Constant-Q Transform (CQT) for advanced spectral representations in audio analysis." class="wp-image-4554" srcset="https://www.jhonatanlopez.com/wp-content/uploads/2025/02/violinCQT-1024x452.png 1024w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/violinCQT-600x265.png 600w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/violinCQT-300x133.png 300w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/violinCQT-768x339.png 768w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/violinCQT.png 1064w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption"><em><strong>Figure 2:</strong> Example Constant-Q Transform generated from an audio file using the code above. The x-axis represents time, and the y-axis shows musical notation, giving a visual representation of the sound’s intensity over time.</em></figcaption></figure>



<h3 class="wp-block-heading">Harmonic-CQT (HCQT)</h3>



<h4 class="wp-block-heading">What Is It?</h4>



<p>The Harmonic Constant-Q Transform (HCQT) extends CQT by analysing harmonic structure. It computes a CQT at each of several harmonic multiples of a minimum frequency (the fundamental and its overtones) and stacks them into a 3D representation.</p>



<h4 class="wp-block-heading">Why Use It?</h4>



<ul class="wp-block-list">
<li><strong>Pitch-Related Applications:</strong> HCQT excels at separating harmonic content (e.g., a piano’s notes) from noise or percussive elements, ideal for pitch detection or source separation.</li>



<li><strong>Research Edge:</strong> It’s less common than Mel or CQT features and appears mainly in recent pitch-estimation research, so it keeps your feature pipeline close to the state of the art.</li>
</ul>



<h4 class="wp-block-heading">Note on Implementation</h4>



<p><code>librosa</code> doesn’t directly provide HCQT, but you can approximate it by computing CQTs at harmonic multiples manually or use an external library like <code>nnAudio</code>. Here are simplified examples using both approaches:</p>



<p>With <code>librosa</code>:</p>



<pre class="wp-block-code"><code>```python
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

# Load audio file
y, sr = librosa.load('/content/drive/My Drive/audio_files/sample.wav', sr=22050)  # Replace with your file path
hop_length = 512  # Number of samples between successive frames
harmonics = &#91;1, 2, 3]  # Harmonics to analyze (fundamental + overtones)

# Compute HCQT for the fundamental (h=1)
fmin = librosa.note_to_hz('C1') * 1  # Convert note C1 to Hz (~32.7 Hz)
n_bins = 60  # Total bins (5 octaves: 60/12 = 5)

# Check Nyquist limit (prevents aliasing)
nyquist_limit = fmin * (2 ** (n_bins / 12)) 
if nyquist_limit &lt; sr / 2:
    # Compute Constant-Q Transform
    cqt = librosa.cqt(y, sr=sr, hop_length=hop_length, 
                     fmin=fmin, n_bins=n_bins, bins_per_octave=12)
else:
    raise ValueError("Nyquist limit exceeded! Adjust parameters.")

# Convert CQT magnitude to decibels (normalized to max amplitude)
cqt_db = librosa.amplitude_to_db(np.abs(cqt), ref=np.max)

# Generate CQT frequency axis (logarithmic scale)
frequencies = librosa.cqt_frequencies(n_bins=n_bins, fmin=fmin, bins_per_octave=12)

# Plot the spectrogram
plt.figure(figsize=(14, 5))
librosa.display.specshow(cqt_db, sr=sr, hop_length=hop_length,
                        y_axis='cqt_hz', x_axis='time',  # Log-frequency axis
                        fmin=fmin, bins_per_octave=12, 
                        vmin=-80, vmax=0)  # dB range and optional colourmap add , cmap='viridis'
plt.colorbar(format='%+2.0f dB', label='Amplitude (dB)')
plt.ylim(frequencies&#91;0], frequencies&#91;-1])  # Set frequency axis limits
plt.title('Violin Harmonic-CQT (Fundamental) - Librosa')
plt.xlabel('Time (s)')
plt.ylabel('Frequency (Hz)')
plt.show()
```</code></pre>



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary><strong>Limitations:</strong></summary>
<p>&nbsp; &#8211; Tedious manual setup.</p>



<p>&nbsp; &#8211; No native harmonic stacking.</p>



<p>&nbsp; &#8211; Limited to CPU computation.</p>
</details>



<p><em>For efficient HCQT computation, we use <code>nnAudio</code>, a PyTorch-based library that leverages GPU acceleration. First, install it:</em></p>



<pre class="wp-block-code"><code>```python
pip install nnAudio
```</code></pre>



<p>Then, run the following code:</p>



<pre class="wp-block-code"><code>```python
import torch
from nnAudio.features.cqt import CQT
import matplotlib.pyplot as plt

# Parameters
sr = 22050  # Sample rate
hop_length = 512  # Hop size
n_bins = 60  # Number of frequency bins (reduced to avoid Nyquist issues)
fmin = 32.7  # Minimum frequency (C1 in Hz)
harmonics = &#91;1, 2, 3]  # Harmonics to compute

# Load audio (using librosa)
y, _ = librosa.load("/content/drive/My Drive/audio_files/sample.wav", sr=sr)

# Convert to PyTorch tensor
y_tensor = torch.tensor(y).float()

# Compute HCQT for each harmonic
hcqt = &#91;]
for h in harmonics:
    cqt = CQT(sr=sr, hop_length=hop_length, n_bins=n_bins,
              fmin=fmin * h, bins_per_octave=12, output_format='Magnitude')
    cqt_output = cqt(y_tensor)  # Shape: (1, n_bins, time)
    cqt_db = 20 * torch.log10(torch.clamp(cqt_output, min=1e-5))  # Avoid log(0)
    hcqt.append(cqt_db)

# Plot the fundamental harmonic
if hcqt:
    plt.figure(figsize=(14, 5))
    plt.imshow(hcqt&#91;0].squeeze().numpy(), aspect='auto', origin='lower', cmap='viridis', vmin=-80, vmax=0, interpolation='bilinear')
    plt.colorbar(format='%+2.0f dB')
    plt.title('Violin Harmonic-CQT (Fundamental) - nnAudio')
    plt.xlabel('Time')
    plt.ylabel('Frequency (bins)')
    plt.show()</code></pre>



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary><strong>Advantages:</strong></summary>
<p>&nbsp; &#8211; GPU Acceleration: Faster computation for large datasets.</p>



<p>&nbsp; &#8211; Native Harmonic Support: Streamlined parameter setup.</p>



<p>&nbsp; &#8211; PyTorch Integration: Direct compatibility with deep learning pipelines.</p>
</details>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="449" src="https://www.jhonatanlopez.com/wp-content/uploads/2025/02/harmonicCQTComparison1-1024x449.png" alt="Violin Harmonic-CQT (Fundamental) computed using Librosa, showing frequency and amplitude variations over time." class="wp-image-4550" srcset="https://www.jhonatanlopez.com/wp-content/uploads/2025/02/harmonicCQTComparison1-1024x449.png 1024w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/harmonicCQTComparison1-600x263.png 600w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/harmonicCQTComparison1-300x132.png 300w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/harmonicCQTComparison1-768x337.png 768w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/harmonicCQTComparison1.png 1072w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="453" src="https://www.jhonatanlopez.com/wp-content/uploads/2025/02/harmonicCQTComparison2-1024x453.png" alt="Violin Harmonic-CQT (Fundamental) computed using nnAudio, showing frequency and amplitude variations over time." class="wp-image-4551" srcset="https://www.jhonatanlopez.com/wp-content/uploads/2025/02/harmonicCQTComparison2-1024x453.png 1024w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/harmonicCQTComparison2-600x265.png 600w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/harmonicCQTComparison2-300x133.png 300w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/harmonicCQTComparison2-768x340.png 768w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/harmonicCQTComparison2.png 1063w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption"><strong>Figure 3: </strong>HCQT computed with librosa (top) vs. nnAudio (bottom). The nnAudio implementation offers cleaner harmonic separation due to GPU-optimized computation.<br><em>The axis are labelled different but basic programming configurations to plot are the same.</em></figcaption></figure>



<h2 class="wp-block-heading">What Do These Representations Tell Us?</h2>



<ul class="wp-block-list">
<li><strong>Mel/Log-Mel:</strong> Highlights perceptually significant frequencies (e.g., speech formants or musical timbre).</li>



<li><strong>CQT:</strong> Reveals musical structure (e.g., note transitions in a melody).</li>



<li><strong>HCQT:</strong> Isolates harmonic patterns (e.g., a chord’s overtones), distinguishing pitched sounds from noise.</li>
</ul>



<p>These features are more targeted than raw STFT spectrograms, making them powerful inputs for machine learning models.</p>
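


<p>As a small, hedged illustration of that last point, here is one simple way to turn a Log-Mel Spectrogram into a fixed-length feature vector for a classical classifier (deep learning models usually consume the 2D representation directly; the file path is a placeholder):</p>



<pre class="wp-block-code"><code>import librosa
import numpy as np

y, sr = librosa.load('/content/drive/My Drive/audio_files/sample.wav')
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
S_db = librosa.power_to_db(S, ref=np.max)

# Pool each Mel band over time: mean and standard deviation per band
features = np.concatenate(&#91;S_db.mean(axis=1), S_db.std(axis=1)])
print(features.shape)  # (256,) - a compact input for an SVM or random forest</code></pre>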



<h2 class="wp-block-heading">Reflection</h2>



<p>Exploring these spectral representations has been a transformative experience for me. Initially, I relied heavily on STFT, but discovering Mel Spectrograms showed me how aligning analysis with human perception could significantly boost classification accuracy—something I’m currently testing with various audio datasets. Implementing CQT was a revelation for its musical precision, though working with HCQT pushed my coding skills to the limit. I spent hours digging into research papers and experimenting with harmonic stacking to get it right. These challenges have deepened my understanding of audio feature extraction and increased my excitement for applying these techniques to machine learning models.</p>



<h2 class="wp-block-heading">Conclusion</h2>



<p>In this post, we’ve explored advanced spectral representations for audio analysis, including Mel Spectrograms, CQT, and HCQT, and seen how they can be used for perceptual analysis and feature extraction. These tools take us beyond waveforms and basic STFT spectrograms, offering perceptually and musically relevant features that are essential inputs for machine learning tasks.</p>



<h2 class="wp-block-heading">Additional Resources</h2>



<ul class="wp-block-list">
<li>Librosa Documentation:&nbsp;<a href="https://librosa.org/doc/">librosa.org/doc</a></li>



<li>nnAudio:&nbsp;<a href="https://kinwaicheuk.github.io/nnAudio/v0.2.0/index.html">nnAudio 0.2.0</a></li>



<li><a href="https://geoffroypeeters.github.io/deeplearning-101-audiomir_book">Deep Learning 101 for Audio-based MIR, ISMIR 2024 Tutorial</a>&nbsp;by Geoffroy Peeters et al. (2024).</li>



<li>Z. Rafii, &#8220;The Constant-Q Harmonic Coefficients: A timbre feature designed for music signals [Lecture Notes],&#8221; in IEEE Signal Processing Magazine, vol. 39, no. 3, pp. 90-96, May 2022, doi: 10.1109/MSP.2021.3138870.</li>



<li>K. W. Cheuk, H. Anderson, K. Agres and D. Herremans, &#8220;nnAudio: An on-the-Fly GPU Audio to Spectrogram Conversion Toolbox Using 1D Convolutional Neural Networks,&#8221; in IEEE Access, vol. 8, pp. 161981-162003, 2020, doi: 10.1109/ACCESS.2020.3019084.</li>
</ul>
<p>The post <a href="https://www.jhonatanlopez.com/advanced-spectral-representations-audio-analysis/">AllYouNeedIsSound 3: Spectral Representations and Feature Extraction</a> appeared first on <a href="https://www.jhonatanlopez.com">Jhonatan López</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>AllYouNeedIsSound 2: From Waveforms to Spectral Representations</title>
		<link>https://www.jhonatanlopez.com/spectral-analysis/</link>
		
		<dc:creator><![CDATA[Jhonatan López]]></dc:creator>
		<pubDate>Mon, 17 Mar 2025 16:48:23 +0000</pubDate>
				<category><![CDATA[DSP]]></category>
		<guid isPermaLink="false">https://www.jhonatanlopez.com/?p=4586</guid>

					<description><![CDATA[<p>In my last post, I showed how to load and visualize audio waveforms using Python. Now, let’s dive deeper into&#160;spectral analysis with Python, a powerful technique for understanding the frequency content of audio signals.&#160;By using this approach, we can uncover patterns and features that are essential for tasks like sound classification, speech recognition, and music [&#8230;]</p>
<p>The post <a href="https://www.jhonatanlopez.com/spectral-analysis/">AllYouNeedIsSound 2: From Waveforms to Spectral Representations</a> appeared first on <a href="https://www.jhonatanlopez.com">Jhonatan López</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="1024" height="1024" src="https://www.jhonatanlopez.com/wp-content/uploads/2025/03/allyouneedissound2.webp" alt="An abstract digital sound spectrum with smooth, flowing waves in gray and blue tones, featuring a clean and stylized equalizer on a white background." class="wp-image-4617" srcset="https://www.jhonatanlopez.com/wp-content/uploads/2025/03/allyouneedissound2.webp 1024w, https://www.jhonatanlopez.com/wp-content/uploads/2025/03/allyouneedissound2-300x300.webp 300w, https://www.jhonatanlopez.com/wp-content/uploads/2025/03/allyouneedissound2-100x100.webp 100w, https://www.jhonatanlopez.com/wp-content/uploads/2025/03/allyouneedissound2-600x600.webp 600w, https://www.jhonatanlopez.com/wp-content/uploads/2025/03/allyouneedissound2-150x150.webp 150w, https://www.jhonatanlopez.com/wp-content/uploads/2025/03/allyouneedissound2-768x768.webp 768w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption">A modern visualization of digital sound frequencies created with DALL·E.</figcaption></figure>



<p>In my last post, I showed <a href="https://www.jhonatanlopez.com/audio-digital-analysis-python/">how to load and visualize audio waveforms using Python</a>. Now, let’s dive deeper into&nbsp;spectral analysis with Python, a powerful technique for understanding the frequency content of audio signals.&nbsp;By using this approach, we can uncover patterns and features that are essential for tasks like sound classification, speech recognition, and music analysis.</p>



<h2 class="wp-block-heading">What is Spectral Analysis?</h2>



<p>Spectral analysis helps us break down an audio signal into its individual frequencies, making it easier to understand its components.&nbsp;For example, while a waveform shows amplitude over time, spectral analysis reveals the frequency components hidden within the sound.</p>



<h3 class="wp-block-heading">Why is Spectral Analysis Important?</h3>



<p>Frequencies are the building blocks of sound.&nbsp;Therefore, analysing them allows us to distinguish between different types of audio, such as a guitar note versus a drum beat.&nbsp;Additionally, this technique is crucial for tasks like music genre classification and speech recognition.</p>



<h2 class="wp-block-heading">Key Concepts</h2>



<h3 class="wp-block-heading">Spectrogram</h3>



<p>A spectrogram is a visual representation of how the frequencies in an audio signal change over time. It’s like a &#8220;heatmap&#8221; of sound, where:</p>



<ul class="wp-block-list">
<li>The x-axis represents time.</li>



<li>The y-axis represents frequency.</li>



<li>The colour intensity represents amplitude (e.g., brighter colours mean louder frequencies).</li>
</ul>



<h3 class="wp-block-heading">Short-Time Fourier Transform (STFT)</h3>



<p>The Short-Time Fourier Transform (STFT) is a mathematical tool used to create spectrograms. Unlike the standard Fourier Transform, which analyses the entire signal at once, the STFT breaks the audio into short, overlapping segments and applies the Fourier Transform to each segment. This allows us to see how frequencies evolve over time, making it ideal for analysing real-world audio, which is rarely steady like a pure tone.</p>



<h3 class="wp-block-heading">A Teaser for Future Posts</h3>



<p>While STFT-based spectrograms are powerful, they’re just the beginning. In future posts, we’ll explore advanced features like Mel spectrograms and MFCCs (Mel-Frequency Cepstral Coefficients), which are widely used in machine learning for audio classification.</p>



<h2 class="wp-block-heading">Practical Example: Computing and Visualizing a Spectrogram with Python</h2>



<p>Let’s put theory into practice. First, we’ll load an audio file using Librosa. Then, we’ll compute a Log-Mel spectrogram (built on the STFT) and visualize it. Finally, we’ll interpret the results to understand the audio’s frequency content. Here’s a step-by-step guide:</p>



<h3 class="wp-block-heading">0 Mount Google Drive</h3>



<p>You can omit this step if you are following on from my previous post and your Drive is already mounted.</p>



<pre class="wp-block-code"><code>```python
from google.colab import drive
drive.mount('/content/drive')
```</code></pre>



<h3 class="wp-block-heading">1 Load the Audio File</h3>



<pre class="wp-block-code"><code>```python
import librosa
import librosa.display
import numpy as np &nbsp;# Import numpy as np
import matplotlib.pyplot as plt

# Load an audio file
y, sr = librosa.load('/content/drive/path/to/your/audio.wav')
```</code></pre>



<h3 class="wp-block-heading">2 Compute the STFT and Convert to Decibels</h3>



<pre class="wp-block-code"><code>```python
# Compute the STFT and convert to decibels
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
S_db = librosa.power_to_db(S, ref=np.max) &nbsp;# Log-Mel Spectrogram
```</code></pre>
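


<p>Note that <code>librosa.feature.melspectrogram</code> computes the STFT internally and then maps it onto Mel frequency bins. If you want the raw, unweighted STFT spectrogram described earlier, a minimal variant looks like this (plot it with <code>y_axis='log'</code> in the next step):</p>



<pre class="wp-block-code"><code># Raw STFT spectrogram, without the Mel mapping
D = librosa.stft(y)
D_db = librosa.amplitude_to_db(np.abs(D), ref=np.max)</code></pre>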



<h3 class="wp-block-heading">3 Plot the Spectrogram</h3>



<pre class="wp-block-code"><code>```python
# Plot the spectrogram
plt.figure(figsize=(14, 5))
librosa.display.specshow(S_db, sr=sr, x_axis='time', y_axis='mel')
plt.colorbar(format='%+2.0f dB')
plt.title('Song Log-Mel Spectrogram')
plt.show()</code></pre>



<p>Here’s what the output looks like after running the code with a sample audio file:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="445" src="https://www.jhonatanlopez.com/wp-content/uploads/2025/02/logMelSpectrogram-1024x445.png" alt="Log-Mel Spectrogram generated with Python for spectral analysis." class="wp-image-4552" srcset="https://www.jhonatanlopez.com/wp-content/uploads/2025/02/logMelSpectrogram-1024x445.png 1024w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/logMelSpectrogram-600x261.png 600w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/logMelSpectrogram-300x130.png 300w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/logMelSpectrogram-768x334.png 768w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/logMelSpectrogram.png 1081w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption"><strong><em>Figure 1:</em></strong> <em>Example Log-Mel Spectrogram generated from an audio file using the code above. The x-axis represents time, and the y-axis shows frequency, giving a visual representation of the sound’s intensity over time.</em></figcaption></figure>



<h3 class="wp-block-heading">Understanding the Output</h3>



<ul class="wp-block-list">
<li><strong>Time (x-axis):</strong> Shows how the audio evolves over time.</li>



<li><strong>Frequency (y-axis):</strong> Shows the range of frequencies present in the audio.</li>



<li><strong>Colour Intensity:</strong> Represents amplitude (louder frequencies appear brighter).</li>
</ul>



<p>For example:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="445" src="https://www.jhonatanlopez.com/wp-content/uploads/2025/02/violinLogMelSpectrogram-1024x445.png" alt="Violin frequency analysis using Python and spectrograms" class="wp-image-4555" srcset="https://www.jhonatanlopez.com/wp-content/uploads/2025/02/violinLogMelSpectrogram-1024x445.png 1024w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/violinLogMelSpectrogram-600x261.png 600w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/violinLogMelSpectrogram-300x130.png 300w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/violinLogMelSpectrogram-768x334.png 768w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/violinLogMelSpectrogram.png 1081w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption"><em><strong>Figure 2:</strong> Example Log-Mel Spectrogram generated from an audio file using the code above on a sustained violin note. It appear as a horizontal line at a specific frequency.</em></figcaption></figure>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="445" src="https://www.jhonatanlopez.com/wp-content/uploads/2025/02/snareLogMelSpectrogram-1024x445.png" alt="Drum hit visualization with Python and spectral analysis" class="wp-image-4553" srcset="https://www.jhonatanlopez.com/wp-content/uploads/2025/02/snareLogMelSpectrogram-1024x445.png 1024w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/snareLogMelSpectrogram-600x261.png 600w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/snareLogMelSpectrogram-300x130.png 300w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/snareLogMelSpectrogram-768x334.png 768w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/snareLogMelSpectrogram.png 1081w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption"><em><strong>Figure 3:</strong> Example Log-Mel Spectrogram generated from an audio file using the code above on a drum hit. It appear as a vertical spike across multiple frequencies.</em></figcaption></figure>



<h2 class="wp-block-heading">What Does the Spectrogram Tell Us?</h2>



<p>Spectrograms provide a wealth of information that waveforms cannot. For instance:</p>



<ul class="wp-block-list">
<li><strong>Horizontal Lines:</strong> Indicate sustained tones, such as a violin note or a humming sound.</li>



<li><strong>Vertical Spikes:</strong> Represent short, sharp sounds, like a drum hit or a clap.</li>



<li><strong>Patterns:</strong> Repeated patterns in the spectrogram might correspond to musical rhythms or speech phonemes.</li>
</ul>



<p>These features provide valuable insights into the structure and content of audio signals, making spectrograms invaluable for tasks like sound classification, speech recognition, and music analysis.</p>



<h2 class="wp-block-heading">Reflection</h2>



<p>Learning spectral analysis has been a transformative experience for me. It opened my eyes to the complexity of audio signals and deepened my appreciation for the mathematical tools that make audio processing possible. One of the challenges I faced was understanding how to choose the right window size for the STFT. Too short, and the frequency resolution suffers; too long, and the time resolution becomes blurry. Through experimentation and research, I learned to balance these trade-offs.</p>
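


<p>If you want to see this trade-off yourself, a quick experiment is to render the same file with a short and a long analysis window side by side. This sketch uses plain STFT spectrograms, and the file path is a placeholder:</p>



<pre class="wp-block-code"><code>import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

y, sr = librosa.load('/content/drive/path/to/your/audio.wav')

# Short window: precise timing, blurry frequencies; long window: the reverse
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
for ax, n_fft in zip(axes, &#91;256, 4096]):
    D = librosa.amplitude_to_db(np.abs(librosa.stft(y, n_fft=n_fft)), ref=np.max)
    img = librosa.display.specshow(D, sr=sr, hop_length=n_fft // 4,
                                   x_axis='time', y_axis='log', ax=ax)
    ax.set_title(f'n_fft = {n_fft}')
fig.colorbar(img, format='%+2.0f dB', ax=axes)
plt.show()</code></pre>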



<p>This journey has reinforced my belief that spectral analysis is not just a technical skill but a gateway to understanding the rich, hidden world of sound. As I continue to explore advanced techniques like CQT and HCQT, I’m excited to share my discoveries and challenges in future posts.</p>



<h2 class="wp-block-heading">Conclusion</h2>



<p>Spectral analysis is a powerful tool for unlocking the frequency content of audio signals. Additionally, by moving beyond waveforms and exploring spectrograms, we can uncover patterns and features that are essential for tasks like sound classification, speech recognition, and music analysis. In this post, we’ve covered the basics of spectral analysis, introduced the Short-Time Fourier Transform (STFT), and demonstrated how to compute and visualize spectrograms using Python.</p>



<h2 class="wp-block-heading">Additional Resources</h2>



<ul class="wp-block-list">
<li><a href="https://librosa.org/doc/latest/index.html">Librosa Documentation:</a> A comprehensive guide to the Librosa library.</li>



<li><a href="https://colab.research.google.com/">Google Colab:</a> A free, cloud-based environment for running Python code.</li>



<li><a href="https://freesound.org/">Freesound.org:</a> A repository of free audio samples for experimentation.</li>



<li><a href="https://geoffroypeeters.github.io/deeplearning-101-audiomir_book">Deep Learning 101 for Audio-based MIR, ISMIR 2024 Tutorial</a> by Geoffroy Peeters et al. (2024).</li>



<li>Kinsler, L. E., Frey, A. R., Coppens, A. B., &amp; Sanders, J. V. (2000). Fundamentals of Acoustics (4th ed.). Wiley.</li>
</ul>
<p>The post <a href="https://www.jhonatanlopez.com/spectral-analysis/">AllYouNeedIsSound 2: From Waveforms to Spectral Representations</a> appeared first on <a href="https://www.jhonatanlopez.com">Jhonatan López</a>.</p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
