<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>DSP Archives - Jhonatan López</title>
	<atom:link href="https://www.jhonatanlopez.com/category/dsp/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.jhonatanlopez.com/category/dsp/</link>
	<description>Engineering &#38; Sound Design</description>
	<lastBuildDate>Fri, 08 Aug 2025 03:00:57 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	

<image>
	<url>https://www.jhonatanlopez.com/wp-content/uploads/2019/01/cropped-Logo-Web-Jhonatan2-1-32x32.png</url>
	<title>DSP Archives - Jhonatan López</title>
	<link>https://www.jhonatanlopez.com/category/dsp/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>NoiseRverb: Heritage Acoustics in a Plugin</title>
		<link>https://www.jhonatanlopez.com/heritage-acoustics-plugin/</link>
		
		<dc:creator><![CDATA[Jhonatan López]]></dc:creator>
		<pubDate>Sat, 31 May 2025 02:00:00 +0000</pubDate>
				<category><![CDATA[DSP]]></category>
		<category><![CDATA[Plugin]]></category>
		<guid isPermaLink="false">https://www.jhonatanlopez.com/?p=4853</guid>

					<description><![CDATA[<p>New blog post — I’ve been meaning to write this for a long time. It’s about a concept that has fascinated me from the moment I encountered it. I’ll also share a small contribution I made in this area with NoiseRverb, a heritage acoustics plugin that captures the sonic identity of historic churches in Quito [&#8230;]</p>
<p>The post <a href="https://www.jhonatanlopez.com/heritage-acoustics-plugin/">NoiseRverb: Heritage Acoustics in a Plugin</a> appeared first on <a href="https://www.jhonatanlopez.com">Jhonatan López</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<figure class="wp-block-image size-large"><img fetchpriority="high" decoding="async" width="1024" height="471" src="https://www.jhonatanlopez.com/wp-content/uploads/2025/05/image-1024x471.png" alt="NoiseRverb heritage acoustics plugin interface." class="wp-image-4854" srcset="https://www.jhonatanlopez.com/wp-content/uploads/2025/05/image-1024x471.png 1024w, https://www.jhonatanlopez.com/wp-content/uploads/2025/05/image-600x276.png 600w, https://www.jhonatanlopez.com/wp-content/uploads/2025/05/image-300x138.png 300w, https://www.jhonatanlopez.com/wp-content/uploads/2025/05/image-768x353.png 768w, https://www.jhonatanlopez.com/wp-content/uploads/2025/05/image.png 1137w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<p>New blog post — I’ve been meaning to write this for a long time. It’s about a concept that has fascinated me from the moment I encountered it. I’ll also share a small contribution I made in this area with <em>NoiseRverb</em>, a heritage acoustics plugin that captures the sonic identity of historic churches in Quito through convolution reverb.</p>



<p>We often associate heritage with visual beauty—particularly in architecture. This connection largely stems from the preservation of artistic landmarks due to their historical significance. As a result, many of these sites have become popular tourist attractions.</p>



<p>But what if we consider a different kind of heritage — one we can hear? Can sound hold historical value? Can it be preserved? The answer is yes. While this isn’t a new idea — researchers have studied it before — it’s starting to find its place in the audio industry as well.</p>



<p>While some heritage spaces are preserved through sound art, others live on through live recordings or songs. For example, the <em>Festival Internacional de Música Sacra</em> (FIMUSAQ) in Quito has helped preserve local culture through sacred music. While it&#8217;s not the only example globally, it clearly shows how sound can carry historical weight.</p>



<h2 class="wp-block-heading">What Is Heritage Acoustics?</h2>



<p>Heritage acoustics is a field that studies and protects the sound of historic spaces, combining architecture, physics, and sound engineering. In buildings from cathedrals to theatres, the sound is as iconic as the structure itself.</p>



<p>The goal is to study how sound behaves in these spaces. It involves looking at reverberation, reflections, absorption, and how sound waves move. There are three main techniques used:</p>



<ul class="wp-block-list">
<li><strong>On-site measurements</strong>: Microphones and speakers are placed in the space. Specific signals are played (like sine sweeps), and the room&#8217;s response is recorded. This captures real acoustic data.</li>



<li><strong>Impulse response (IR) capture</strong>: This technique records how a room reacts to a quick, broad sound. The result is a unique audio fingerprint of the space.</li>



<li><strong>Computer modelling</strong>: 3D simulations of the space are created using architectural drawings. This allows engineers to predict how sound will behave, even without access to the real location.</li>
</ul>



<p>These methods help protect the unique sound of historical buildings and let others experience it — even from far away.</p>
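


<p>To make the analysis side concrete: a standard first measurement on a captured impulse response is the reverberation time (RT60), which can be estimated with Schroeder’s backward integration. The Python sketch below is a textbook illustration (it assumes a mono IR as a NumPy array), not the exact procedure used in any particular study:</p>



<pre class="wp-block-code"><code>import numpy as np

def rt60_from_ir(ir, sr):
    """Estimate RT60 from a mono impulse response via Schroeder backward integration."""
    energy = np.cumsum(ir&#91;::-1] ** 2)&#91;::-1]      # Schroeder energy decay curve
    edc_db = 10 * np.log10(energy / energy&#91;0])   # normalise to 0 dB at t = 0
    t = np.arange(len(ir)) / sr

    # Fit a line to the -5 dB to -25 dB portion and extrapolate to -60 dB (a "T20" estimate)
    mask = (edc_db &lt;= -5) &amp; (edc_db &gt;= -25)
    slope, _ = np.polyfit(t&#91;mask], edc_db&#91;mask], 1)
    return -60.0 / slope</code></pre>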



<h2 class="wp-block-heading">Applications in the Music Industry</h2>



<p>The study of heritage acoustics has led to exciting tools for musicians and producers. These include:</p>



<ul class="wp-block-list">
<li>Convolution reverb plugins that recreate historic spaces.</li>



<li>Acoustic modelling for virtual concerts.</li>



<li>Restoring old recordings with their original acoustics.</li>



<li>Immersive museum or VR audio experiences.</li>
</ul>



<p>All of these are interesting, but I’ll focus on convolution reverb, as it’s where I work most.</p>



<h2 class="wp-block-heading">Convolution Reverb</h2>



<p>To understand convolution reverb, you need to know what an <strong>impulse response (IR)</strong> is. An IR is a recording of how a space reacts to a short, full-range sound — like a clap or a burst of noise. It contains key information about the room’s reflections and decay.</p>



<p>There are many ways to create an IR: clapping, popping a balloon, or playing a sine sweep. All of them capture the same room, though swept sines generally give the cleanest, highest signal-to-noise measurements. A sketch of the sweep method follows below.</p>
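


<p>Here is a minimal Python sketch of the swept-sine method (an exponential sweep plus its inverse filter, in the style of Farina’s technique). It is a simplified illustration, not the measurement chain used for any specific project:</p>



<pre class="wp-block-code"><code>import numpy as np

def exp_sweep(f1=20.0, f2=20000.0, T=10.0, sr=48000):
    """Exponential sine sweep from f1 to f2 over T seconds, plus its inverse filter."""
    t = np.arange(int(T * sr)) / sr
    R = np.log(f2 / f1)
    sweep = np.sin(2 * np.pi * f1 * T / R * (np.exp(t * R / T) - 1))
    # Inverse filter: time-reversed sweep with a 6 dB/octave amplitude compensation
    inverse = sweep&#91;::-1] * np.exp(-t * R / T)
    return sweep, inverse

sweep, inverse = exp_sweep()
# Play `sweep` through a speaker, record the room as `recorded`, then:
# ir = np.convolve(recorded, inverse)  # the room's impulse response</code></pre>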



<p>Convolution reverb takes this IR and applies it to any audio signal. The result is a realistic simulation of how that signal would sound in the original space. This is done using a mathematical process called convolution. It combines the original sound with the impulse response to recreate the acoustic experience.</p>
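


<p>In code, offline convolution takes only a few lines. The sketch below is illustrative rather than a plugin implementation (it assumes mono WAV files at the same sample rate, and the file names are hypothetical); a real-time plugin performs the same operation continuously on the incoming audio stream.</p>



<pre class="wp-block-code"><code>import numpy as np
import soundfile as sf
from scipy.signal import fftconvolve

# Hypothetical file names: a dry recording and a church impulse response (both mono)
dry, sr = sf.read('dry_vocal.wav')
ir, sr_ir = sf.read('church_ir.wav')
assert sr == sr_ir, "Resample the IR to match the signal first"

# Convolution: every sample of the dry signal excites the full room response
wet = fftconvolve(dry, ir)
wet /= np.max(np.abs(wet))  # normalise to avoid clipping

# Blend dry and wet to taste (50/50 here)
dry_padded = np.pad(dry, (0, len(wet) - len(dry)))
sf.write('vocal_in_church.wav', 0.5 * dry_padded + 0.5 * wet, sr)</code></pre>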



<p>This technique is popular for its realism. It lets musicians take a studio recording and place it in a real, historic space — without leaving their DAW.</p>



<h2 class="wp-block-heading">Heritage Acoustics Plugin</h2>



<p>As part of our exploration, I worked with engineers <strong>Analí Pinto</strong> and <strong>Fausto Espinoza</strong> to create a VST3 plugin called <em>NoiseRverb</em>. It’s based on impulse responses captured in seven churches in Quito: <strong>San Francisco, Basílica, Catedral, Compañía, Guápulo, El Sagrario, and Santo Domingo</strong>.</p>



<p>The plugin lets musicians and producers experience these spaces inside their usual music-making software.</p>



<p>We recorded the impulse responses on-site and processed them digitally. Then, we used real-time convolution to bring those spaces to life inside the plugin. The result is an authentic, immersive experience.</p>



<p><strong>NoiseRverb</strong> is free to download <a href="https://www.jhonatanlopez.com/sound-design/">here</a>.</p>



<p>Learn more about heritage acoustics and its role in cultural preservation in this <a class="" href="https://www.sciencedirect.com/topics/engineering/architectural-acoustics">introductory article on architectural acoustics</a> and in the <a class="" href="https://www.mdpi.com/2075-5309/15/15/2639">study on acoustic and perceptual variables in three heritage churches of Quito</a>.</p>



<p>The post <a href="https://www.jhonatanlopez.com/heritage-acoustics-plugin/">NoiseRverb: Heritage Acoustics in a Plugin</a> appeared first on <a href="https://www.jhonatanlopez.com">Jhonatan López</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>AllYouNeedIsSound 3: Spectral Representations and Feature Extraction</title>
		<link>https://www.jhonatanlopez.com/advanced-spectral-representations-audio-analysis/</link>
		
		<dc:creator><![CDATA[Jhonatan López]]></dc:creator>
		<pubDate>Mon, 24 Mar 2025 13:33:36 +0000</pubDate>
				<category><![CDATA[DSP]]></category>
		<category><![CDATA[analysis]]></category>
		<guid isPermaLink="false">https://www.jhonatanlopez.com/?p=4602</guid>

					<description><![CDATA[<p>Have you ever wondered how machines understand the nuances of sound? In my previous post, we explored spectral analysis and learned how spectrograms reveal the frequency content of audio signals using the Short-Time Fourier Transform (STFT). Now, let’s dive deeper into advanced spectral representations for audio analysis, including Mel Spectrograms, CQT, and HCQT, and show how they [&#8230;]</p>
<p>The post <a href="https://www.jhonatanlopez.com/advanced-spectral-representations-audio-analysis/">AllYouNeedIsSound 3: Spectral Representations and Feature Extraction</a> appeared first on <a href="https://www.jhonatanlopez.com">Jhonatan López</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<figure class="wp-block-image size-full"><img decoding="async" width="1024" height="1024" src="https://www.jhonatanlopez.com/wp-content/uploads/2025/03/allyouneedissound3.webp" alt="Three-dimensional waves representing spectral audio analysis in grey and light blue tones on a white background, with a stylized graphic equalizer." class="wp-image-4618" srcset="https://www.jhonatanlopez.com/wp-content/uploads/2025/03/allyouneedissound3.webp 1024w, https://www.jhonatanlopez.com/wp-content/uploads/2025/03/allyouneedissound3-300x300.webp 300w, https://www.jhonatanlopez.com/wp-content/uploads/2025/03/allyouneedissound3-100x100.webp 100w, https://www.jhonatanlopez.com/wp-content/uploads/2025/03/allyouneedissound3-600x600.webp 600w, https://www.jhonatanlopez.com/wp-content/uploads/2025/03/allyouneedissound3-150x150.webp 150w, https://www.jhonatanlopez.com/wp-content/uploads/2025/03/allyouneedissound3-768x768.webp 768w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption">A modern and minimalist interpretation of spectral audio analysis created with DALL·E.</figcaption></figure>



<p>Have you ever wondered how machines understand the nuances of sound? In my previous post, we explored <a href="https://www.jhonatanlopez.com/spectral-analysis/">spectral analysis</a> and learned how spectrograms reveal the frequency content of audio signals using the Short-Time Fourier Transform (STFT). Now, let’s dive deeper into advanced spectral representations for audio analysis, including Mel Spectrograms, CQT, and HCQT, and show how they can be used for perceptual audio analysis and feature extraction. These tools are essential for building machine learning models for tasks like audio classification, a field I’m currently exploring.</p>



<h2 class="wp-block-heading">Why Feature Extraction?</h2>



<p>Spectral analysis provides us with a visual map of audio frequencies, but for machine learning, we need compact, meaningful features that capture the essence of sound. Raw spectrograms are rich but high-dimensional, making them inefficient for direct use in models. By refining them into perceptually relevant or musically meaningful representations, we can extract features that align with how we hear or interpret audio. This is crucial for applications like genre classification, pitch detection, or environmental sound recognition.</p>



<h2 class="wp-block-heading">Advanced Spectral Representations</h2>



<p>Let’s explore three advanced spectral representations that address the limitations of STFT-based spectrograms: Mel Spectrograms, Constant-Q Transform (CQT), and Harmonic-CQT (HCQT). Each of these tools offers unique advantages for audio analysis and feature extraction.</p>



<h3 class="wp-block-heading">Mel Spectrogram (MEL) and Log-Mel Spectrogram (LMS)</h3>



<h4 class="wp-block-heading">What Are They?</h4>



<p>The Mel Spectrogram adapts the STFT to the Mel scale, a perceptual scale of pitch that reflects how humans hear frequency differences (e.g., we’re more sensitive to changes at lower frequencies). It compresses the frequency axis into Mel bins, reducing dimensionality while prioritizing auditory perception. The Log-Mel Spectrogram takes this further by applying a logarithmic transformation to the amplitude, mimicking the logarithmic response of our ears to loudness.</p>



<h4 class="wp-block-heading">Why Use Them?</h4>



<ul class="wp-block-list">
<li><strong>Perceptual Relevance:</strong> Mel Spectrograms align with human hearing, making them ideal for speech and music analysis.</li>



<li><strong>Machine Learning Ready:</strong> Log-Mel Spectrograms are compact and widely used as input features for deep learning models.</li>
</ul>



<h4 class="wp-block-heading">Example in Python</h4>



<pre class="wp-block-code"><code>```python
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
from google.colab import drive

# Mount Google Drive
drive.mount('/content/drive')

# Load audio
y, sr = librosa.load('/content/drive/My Drive/audio_files/sample.wav')

# Compute Mel Spectrogram
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
S_db = librosa.power_to_db(S, ref=np.max)  # Log-Mel Spectrogram

# Plot
plt.figure(figsize=(14, 5))
librosa.display.specshow(S_db, sr=sr, x_axis='time', y_axis='mel')
plt.colorbar(format='%+2.0f dB')
plt.title('Violin Log-Mel Spectrogram')
plt.show()</code></pre>



<ul class="wp-block-list">
<li><strong>n_mels=128:</strong> Number of Mel bins (adjustable based on your needs).</li>



<li><strong>Output:</strong> Time vs. Mel frequency, with colour showing log-amplitude.</li>
</ul>



<figure class="wp-block-image size-large is-resized"><img decoding="async" width="1024" height="445" src="https://www.jhonatanlopez.com/wp-content/uploads/2025/02/violinLogMelSpectrogram-1024x445.png" alt="Log-Mel Spectrogram for advanced spectral representations in audio analysis" class="wp-image-4555" style="width:656px;height:auto" srcset="https://www.jhonatanlopez.com/wp-content/uploads/2025/02/violinLogMelSpectrogram-1024x445.png 1024w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/violinLogMelSpectrogram-600x261.png 600w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/violinLogMelSpectrogram-300x130.png 300w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/violinLogMelSpectrogram-768x334.png 768w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/violinLogMelSpectrogram.png 1081w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption"><em><strong>Figure 1: </strong>Example Log-Mel Spectrogram generated from an audio file using the code above. The x-axis represents time, and the y-axis shows frequency, giving a visual representation of the sound’s intensity over time.</em></figcaption></figure>



<h3 class="wp-block-heading">Constant-Q Transform (CQT)</h3>



<h4 class="wp-block-heading">What Is It?</h4>



<p>The Constant-Q Transform (CQT) is an alternative to STFT that uses a logarithmic frequency scale, where the frequency resolution is constant relative to the center frequency (constant Q-factor). Unlike STFT’s fixed window size, CQT’s window size varies—longer for low frequencies, shorter for high ones.</p>



<h4 class="wp-block-heading">Why Use It?</h4>



<ul class="wp-block-list">
<li><strong>Musical Advantage:</strong> Its logarithmic scale matches the intervals of musical notes (e.g., octaves), making it perfect for pitch-related tasks like chord recognition or music transcription.</li>



<li><strong>Better Resolution:</strong> It captures low-frequency details (e.g., bass notes) better than STFT.</li>
</ul>



<h4 class="wp-block-heading">Example in Python</h4>



<p>The following example builds on the code from the previous section (same imports and audio file).</p>



<pre class="wp-block-code"><code>```python
# Compute CQT
C = librosa.cqt(y, sr=sr)
C_db = librosa.amplitude_to_db(abs(C), ref=np.max)

# Plot
plt.figure(figsize=(14, 5))
librosa.display.specshow(C_db, sr=sr, x_axis='time', y_axis='cqt_note')
plt.colorbar(format='%+2.0f dB')
plt.title('Violin Constant-Q Transform')
plt.show()</code></pre>



<p><strong>y_axis=&#8217;cqt_note&#8217;:</strong> Labels the y-axis with musical notes (e.g., C4, D4), emphasizing its musical focus.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="452" src="https://www.jhonatanlopez.com/wp-content/uploads/2025/02/violinCQT-1024x452.png" alt="Constant-Q Transform (CQT) for advanced spectral representations in audio analysis." class="wp-image-4554" srcset="https://www.jhonatanlopez.com/wp-content/uploads/2025/02/violinCQT-1024x452.png 1024w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/violinCQT-600x265.png 600w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/violinCQT-300x133.png 300w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/violinCQT-768x339.png 768w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/violinCQT.png 1064w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption"><em><strong>Figure 2:</strong> Example Constant-Q Transform generated from an audio file using the code above. The x-axis represents time, and the y-axis shows musical notation, giving a visual representation of the sound’s intensity over time.</em></figcaption></figure>



<h3 class="wp-block-heading">Harmonic-CQT (HCQT)</h3>



<h4 class="wp-block-heading">What Is It?</h4>



<p>The Harmonic Constant-Q Transform (HCQT) extends CQT by analysing harmonic structure. It computes a CQT at each of several harmonic multiples of a minimum frequency (the fundamental and its overtones) and stacks them into a 3D representation.</p>



<h4 class="wp-block-heading">Why Use It?</h4>



<ul class="wp-block-list">
<li><strong>Pitch-Related Applications:</strong> HCQT excels at separating harmonic content (e.g., a piano’s notes) from noise or percussive elements, ideal for pitch detection or source separation.</li>



<li><strong>Research Edge:</strong> It’s less common than Mel or CQT features and appears mainly in recent pitch-estimation research, so it keeps your feature pipeline close to the state of the art.</li>
</ul>



<h4 class="wp-block-heading">Note on Implementation</h4>



<p><code>librosa</code> doesn’t directly provide HCQT, but you can approximate it by computing CQTs at harmonic multiples manually or use an external library like <code>nnAudio</code>. Here are simplified examples using both approaches:</p>



<p>With <code>librosa</code>:</p>



<pre class="wp-block-code"><code>```python
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

# Load audio file
y, sr = librosa.load('/content/drive/My Drive/audio_files/sample.wav', sr=22050)  # Replace with your file path
hop_length = 512  # Number of samples between successive frames
harmonics = &#91;1, 2, 3]  # Harmonics to analyze (fundamental + overtones)

# Compute HCQT for the fundamental (h=1)
fmin = librosa.note_to_hz('C1') * 1  # Convert note C1 to Hz (~32.7 Hz)
n_bins = 60  # Total bins (5 octaves: 60/12 = 5)

# Check Nyquist limit (prevents aliasing)
nyquist_limit = fmin * (2 ** (n_bins / 12)) 
if nyquist_limit &lt; sr / 2:
    # Compute Constant-Q Transform
    cqt = librosa.cqt(y, sr=sr, hop_length=hop_length, 
                     fmin=fmin, n_bins=n_bins, bins_per_octave=12)
else:
    raise ValueError("Nyquist limit exceeded! Adjust parameters.")

# Convert CQT magnitude to decibels (normalized to max amplitude)
cqt_db = librosa.amplitude_to_db(np.abs(cqt), ref=np.max)

# Generate CQT frequency axis (logarithmic scale)
frequencies = librosa.cqt_frequencies(n_bins=n_bins, fmin=fmin, bins_per_octave=12)

# Plot the spectrogram
plt.figure(figsize=(14, 5))
librosa.display.specshow(cqt_db, sr=sr, hop_length=hop_length,
                        y_axis='cqt_hz', x_axis='time',  # Log-frequency axis
                        fmin=fmin, bins_per_octave=12, 
                        vmin=-80, vmax=0)  # dB range and optional colourmap add , cmap='viridis'
plt.colorbar(format='%+2.0f dB', label='Amplitude (dB)')
plt.ylim(frequencies&#91;0], frequencies&#91;-1])  # Set frequency axis limits
plt.title('Violin Harmonic-CQT (Fundamental) - Librosa')
plt.xlabel('Time (s)')
plt.ylabel('Frequency (Hz)')
plt.show()
```</code></pre>



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary><strong>Limitations:</strong></summary>
<p>&nbsp; &#8211; Tedious manual setup.</p>



<p>&nbsp; &#8211; No native harmonic stacking.</p>



<p>&nbsp; &#8211; Limited to CPU computation.</p>
</details>



<p><em>For efficient HCQT computation, we use <code>nnAudio</code>, a PyTorch-based library that leverages GPU acceleration. First, install it:</em></p>



<pre class="wp-block-code"><code>```python
pip install nnAudio
```</code></pre>



<p>Then, run the following code:</p>



<pre class="wp-block-code"><code>```python
import torch
from nnAudio.features.cqt import CQT
import matplotlib.pyplot as plt

# Parameters
sr = 22050  # Sample rate
hop_length = 512  # Hop size
n_bins = 60  # Number of frequency bins (reduced to avoid Nyquist issues)
fmin = 32.7  # Minimum frequency (C1 in Hz)
harmonics = &#91;1, 2, 3]  # Harmonics to compute

# Load audio (using librosa)
y, _ = librosa.load("/content/drive/My Drive/audio_files/sample.wav", sr=sr)

# Convert to PyTorch tensor
y_tensor = torch.tensor(y).float()

# Compute HCQT for each harmonic
hcqt = &#91;]
for h in harmonics:
    cqt = CQT(sr=sr, hop_length=hop_length, n_bins=n_bins,
              fmin=fmin * h, bins_per_octave=12, output_format='Magnitude')
    cqt_output = cqt(y_tensor)  # Shape: (1, n_bins, time)
    cqt_db = 20 * torch.log10(torch.clamp(cqt_output, min=1e-5))  # Avoid log(0)
    hcqt.append(cqt_db)

# Plot the fundamental harmonic
if hcqt:
    plt.figure(figsize=(14, 5))
    plt.imshow(hcqt&#91;0].squeeze().numpy(), aspect='auto', origin='lower', cmap='viridis', vmin=-80, vmax=0, interpolation='bilinear')
    plt.colorbar(format='%+2.0f dB')
    plt.title('Violin Harmonic-CQT (Fundamental) - nnAudio')
    plt.xlabel('Time')
    plt.ylabel('Frequency (bins)')
    plt.show()</code></pre>



<details class="wp-block-details is-layout-flow wp-block-details-is-layout-flow"><summary><strong>Advantages:</strong></summary>
<p>&nbsp; &#8211; GPU Acceleration: Faster computation for large datasets.</p>



<p>&nbsp; &#8211; Native Harmonic Support: Streamlined parameter setup.</p>



<p>&nbsp; &#8211; PyTorch Integration: Direct compatibility with deep learning pipelines.</p>
</details>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="449" src="https://www.jhonatanlopez.com/wp-content/uploads/2025/02/harmonicCQTComparison1-1024x449.png" alt="Violin Harmonic-CQT (Fundamental) computed using Librosa, showing frequency and amplitude variations over time." class="wp-image-4550" srcset="https://www.jhonatanlopez.com/wp-content/uploads/2025/02/harmonicCQTComparison1-1024x449.png 1024w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/harmonicCQTComparison1-600x263.png 600w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/harmonicCQTComparison1-300x132.png 300w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/harmonicCQTComparison1-768x337.png 768w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/harmonicCQTComparison1.png 1072w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="453" src="https://www.jhonatanlopez.com/wp-content/uploads/2025/02/harmonicCQTComparison2-1024x453.png" alt="Violin Harmonic-CQT (Fundamental) computed using nnAudio, showing frequency and amplitude variations over time." class="wp-image-4551" srcset="https://www.jhonatanlopez.com/wp-content/uploads/2025/02/harmonicCQTComparison2-1024x453.png 1024w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/harmonicCQTComparison2-600x265.png 600w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/harmonicCQTComparison2-300x133.png 300w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/harmonicCQTComparison2-768x340.png 768w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/harmonicCQTComparison2.png 1063w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption"><strong>Figure 3: </strong>HCQT computed with librosa (top) vs. nnAudio (bottom). The nnAudio implementation offers cleaner harmonic separation due to GPU-optimized computation.<br><em>The axis are labelled different but basic programming configurations to plot are the same.</em></figcaption></figure>



<h2 class="wp-block-heading">What Do These Representations Tell Us?</h2>



<ul class="wp-block-list">
<li><strong>Mel/Log-Mel:</strong> Highlights perceptually significant frequencies (e.g., speech formants or musical timbre).</li>



<li><strong>CQT:</strong> Reveals musical structure (e.g., note transitions in a melody).</li>



<li><strong>HCQT:</strong> Isolates harmonic patterns (e.g., a chord’s overtones), distinguishing pitched sounds from noise.</li>
</ul>



<p>These features are more targeted than raw STFT spectrograms, making them powerful inputs for machine learning models.</p>
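


<p>As a small, hedged illustration of that last point, here is one simple way to turn a Log-Mel Spectrogram into a fixed-length feature vector for a classical classifier (deep learning models usually consume the 2D representation directly; the file path is a placeholder):</p>



<pre class="wp-block-code"><code>import librosa
import numpy as np

y, sr = librosa.load('/content/drive/My Drive/audio_files/sample.wav')
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
S_db = librosa.power_to_db(S, ref=np.max)

# Pool each Mel band over time: mean and standard deviation per band
features = np.concatenate(&#91;S_db.mean(axis=1), S_db.std(axis=1)])
print(features.shape)  # (256,) - a compact input for an SVM or random forest</code></pre>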



<h2 class="wp-block-heading">Reflection</h2>



<p>Exploring these spectral representations has been a transformative experience for me. Initially, I relied heavily on STFT, but discovering Mel Spectrograms showed me how aligning analysis with human perception could significantly boost classification accuracy—something I’m currently testing with various audio datasets. Implementing CQT was a revelation for its musical precision, though working with HCQT pushed my coding skills to the limit. I spent hours digging into research papers and experimenting with harmonic stacking to get it right. These challenges have deepened my understanding of audio feature extraction and increased my excitement for applying these techniques to machine learning models.</p>



<h2 class="wp-block-heading">Conclusion</h2>



<p>In this post, we’ve explored advanced spectral representations for audio analysis, including Mel Spectrograms, CQT, and HCQT, and seen how they can be used for perceptual analysis and feature extraction. These tools take us beyond waveforms and basic STFT spectrograms, offering perceptually and musically relevant features that are essential inputs for machine learning tasks.</p>



<h2 class="wp-block-heading">Additional Resources</h2>



<ul class="wp-block-list">
<li>Librosa Documentation:&nbsp;<a href="https://librosa.org/doc/">librosa.org/doc</a></li>



<li>nnAudio:&nbsp;<a href="https://kinwaicheuk.github.io/nnAudio/v0.2.0/index.html">nnAudio 0.2.0</a></li>



<li><a href="https://geoffroypeeters.github.io/deeplearning-101-audiomir_book">Deep Learning 101 for Audio-based MIR, ISMIR 2024 Tutorial</a>&nbsp;by Geoffroy Peeters et al. (2024).</li>



<li>Z. Rafii, &#8220;The Constant-Q Harmonic Coefficients: A timbre feature designed for music signals [Lecture Notes],&#8221; in IEEE Signal Processing Magazine, vol. 39, no. 3, pp. 90-96, May 2022, doi: 10.1109/MSP.2021.3138870.</li>



<li>K. W. Cheuk, H. Anderson, K. Agres and D. Herremans, &#8220;nnAudio: An on-the-Fly GPU Audio to Spectrogram Conversion Toolbox Using 1D Convolutional Neural Networks,&#8221; in IEEE Access, vol. 8, pp. 161981-162003, 2020, doi: 10.1109/ACCESS.2020.3019084.</li>
</ul>
<p>The post <a href="https://www.jhonatanlopez.com/advanced-spectral-representations-audio-analysis/">AllYouNeedIsSound 3: Spectral Representations and Feature Extraction</a> appeared first on <a href="https://www.jhonatanlopez.com">Jhonatan López</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>AllYouNeedIsSound 2: From Waveforms to Spectral Representations</title>
		<link>https://www.jhonatanlopez.com/spectral-analysis/</link>
		
		<dc:creator><![CDATA[Jhonatan López]]></dc:creator>
		<pubDate>Mon, 17 Mar 2025 16:48:23 +0000</pubDate>
				<category><![CDATA[DSP]]></category>
		<guid isPermaLink="false">https://www.jhonatanlopez.com/?p=4586</guid>

					<description><![CDATA[<p>In my last post, I showed how to load and visualize audio waveforms using Python. Now, let’s dive deeper into&#160;spectral analysis with Python, a powerful technique for understanding the frequency content of audio signals.&#160;By using this approach, we can uncover patterns and features that are essential for tasks like sound classification, speech recognition, and music [&#8230;]</p>
<p>The post <a href="https://www.jhonatanlopez.com/spectral-analysis/">AllYouNeedIsSound 2: From Waveforms to Spectral Representations</a> appeared first on <a href="https://www.jhonatanlopez.com">Jhonatan López</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<figure class="wp-block-image size-full"><img loading="lazy" decoding="async" width="1024" height="1024" src="https://www.jhonatanlopez.com/wp-content/uploads/2025/03/allyouneedissound2.webp" alt="An abstract digital sound spectrum with smooth, flowing waves in gray and blue tones, featuring a clean and stylized equalizer on a white background." class="wp-image-4617" srcset="https://www.jhonatanlopez.com/wp-content/uploads/2025/03/allyouneedissound2.webp 1024w, https://www.jhonatanlopez.com/wp-content/uploads/2025/03/allyouneedissound2-300x300.webp 300w, https://www.jhonatanlopez.com/wp-content/uploads/2025/03/allyouneedissound2-100x100.webp 100w, https://www.jhonatanlopez.com/wp-content/uploads/2025/03/allyouneedissound2-600x600.webp 600w, https://www.jhonatanlopez.com/wp-content/uploads/2025/03/allyouneedissound2-150x150.webp 150w, https://www.jhonatanlopez.com/wp-content/uploads/2025/03/allyouneedissound2-768x768.webp 768w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption">A modern visualization of digital sound frequencies created with DALL·E.</figcaption></figure>



<p>In my last post, I showed <a href="https://www.jhonatanlopez.com/audio-digital-analysis-python/">how to load and visualize audio waveforms using Python</a>. Now, let’s dive deeper into&nbsp;spectral analysis with Python, a powerful technique for understanding the frequency content of audio signals.&nbsp;By using this approach, we can uncover patterns and features that are essential for tasks like sound classification, speech recognition, and music analysis.</p>



<h2 class="wp-block-heading">What is Spectral Analysis?</h2>



<p>Spectral analysis helps us break down an audio signal into its individual frequencies, making it easier to understand its components.&nbsp;For example, while a waveform shows amplitude over time, spectral analysis reveals the frequency components hidden within the sound.</p>



<h3 class="wp-block-heading">Why is Spectral Analysis Important?</h3>



<p>Frequencies are the building blocks of sound.&nbsp;Therefore, analysing them allows us to distinguish between different types of audio, such as a guitar note versus a drum beat.&nbsp;Additionally, this technique is crucial for tasks like music genre classification and speech recognition.</p>



<h2 class="wp-block-heading">Key Concepts</h2>



<h3 class="wp-block-heading">Spectrogram</h3>



<p>A spectrogram is a visual representation of how the frequencies in an audio signal change over time. It’s like a &#8220;heatmap&#8221; of sound, where:</p>



<ul class="wp-block-list">
<li>The x-axis represents time.</li>



<li>The y-axis represents frequency.</li>



<li>The colour intensity represents amplitude (e.g., brighter colours mean louder frequencies).</li>
</ul>



<h3 class="wp-block-heading">Short-Time Fourier Transform (STFT)</h3>



<p>The Short-Time Fourier Transform (STFT) is a mathematical tool used to create spectrograms. Unlike the standard Fourier Transform, which analyses the entire signal at once, the STFT breaks the audio into short, overlapping segments and applies the Fourier Transform to each segment. This allows us to see how frequencies evolve over time, making it ideal for analysing real-world audio, which is rarely steady like a pure tone.</p>



<h3 class="wp-block-heading">A Teaser for Future Posts</h3>



<p>While STFT-based spectrograms are powerful, they’re just the beginning. In future posts, we’ll explore advanced features like Mel spectrograms and MFCCs (Mel-Frequency Cepstral Coefficients), which are widely used in machine learning for audio classification.</p>



<h2 class="wp-block-heading">Practical Example: Computing and Visualizing a Spectrogram with Python</h2>



<p>Let’s put theory into practice. First, we’ll load an audio file using Librosa. Then, we’ll compute a Log-Mel spectrogram (built on the STFT) and visualize it. Finally, we’ll interpret the results to understand the audio’s frequency content. Here’s a step-by-step guide:</p>



<h3 class="wp-block-heading">0 Mount Google Drive</h3>



<p>You can omit this step if you are following on from my previous post and your Drive is already mounted.</p>



<pre class="wp-block-code"><code>```python
from google.colab import drive
drive.mount('/content/drive')
```</code></pre>



<h3 class="wp-block-heading">1 Load the Audio File</h3>



<pre class="wp-block-code"><code>```python
import librosa
import librosa.display
import numpy as np &nbsp;# Import numpy as np
import matplotlib.pyplot as plt

# Load an audio file
y, sr = librosa.load('/content/drive/path/to/your/audio.wav')
```</code></pre>



<h3 class="wp-block-heading">2 Compute the STFT and Convert to Decibels</h3>



<pre class="wp-block-code"><code>```python
# Compute the STFT and convert to decibels
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
S_db = librosa.power_to_db(S, ref=np.max) &nbsp;# Log-Mel Spectrogram
```</code></pre>
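


<p>Note that <code>librosa.feature.melspectrogram</code> computes the STFT internally and then maps it onto Mel frequency bins. If you want the raw, unweighted STFT spectrogram described earlier, a minimal variant looks like this (plot it with <code>y_axis='log'</code> in the next step):</p>



<pre class="wp-block-code"><code># Raw STFT spectrogram, without the Mel mapping
D = librosa.stft(y)
D_db = librosa.amplitude_to_db(np.abs(D), ref=np.max)</code></pre>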



<h3 class="wp-block-heading">3 Plot the Spectrogram</h3>



<pre class="wp-block-code"><code>```python
# Plot the spectrogram
plt.figure(figsize=(14, 5))
librosa.display.specshow(S_db, sr=sr, x_axis='time', y_axis='mel')
plt.colorbar(format='%+2.0f dB')
plt.title('Song Log-Mel Spectrogram')
plt.show()</code></pre>



<p>Here’s what the output looks like after running the code with a sample audio file:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="445" src="https://www.jhonatanlopez.com/wp-content/uploads/2025/02/logMelSpectrogram-1024x445.png" alt="Log-Mel Spectrogram generated with Python for spectral analysis." class="wp-image-4552" srcset="https://www.jhonatanlopez.com/wp-content/uploads/2025/02/logMelSpectrogram-1024x445.png 1024w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/logMelSpectrogram-600x261.png 600w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/logMelSpectrogram-300x130.png 300w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/logMelSpectrogram-768x334.png 768w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/logMelSpectrogram.png 1081w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption"><strong><em>Figure 1:</em></strong> <em>Example Log-Mel Spectrogram generated from an audio file using the code above. The x-axis represents time, and the y-axis shows frequency, giving a visual representation of the sound’s intensity over time.</em></figcaption></figure>



<h3 class="wp-block-heading">Understanding the Output</h3>



<ul class="wp-block-list">
<li><strong>Time (x-axis):</strong> Shows how the audio evolves over time.</li>



<li><strong>Frequency (y-axis):</strong> Shows the range of frequencies present in the audio.</li>



<li><strong>Colour Intensity:</strong> Represents amplitude (louder frequencies appear brighter).</li>
</ul>



<p>For example:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="445" src="https://www.jhonatanlopez.com/wp-content/uploads/2025/02/violinLogMelSpectrogram-1024x445.png" alt="Violin frequency analysis using Python and spectrograms" class="wp-image-4555" srcset="https://www.jhonatanlopez.com/wp-content/uploads/2025/02/violinLogMelSpectrogram-1024x445.png 1024w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/violinLogMelSpectrogram-600x261.png 600w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/violinLogMelSpectrogram-300x130.png 300w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/violinLogMelSpectrogram-768x334.png 768w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/violinLogMelSpectrogram.png 1081w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption"><em><strong>Figure 2:</strong> Example Log-Mel Spectrogram generated from an audio file using the code above on a sustained violin note. It appear as a horizontal line at a specific frequency.</em></figcaption></figure>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="445" src="https://www.jhonatanlopez.com/wp-content/uploads/2025/02/snareLogMelSpectrogram-1024x445.png" alt="Drum hit visualization with Python and spectral analysis" class="wp-image-4553" srcset="https://www.jhonatanlopez.com/wp-content/uploads/2025/02/snareLogMelSpectrogram-1024x445.png 1024w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/snareLogMelSpectrogram-600x261.png 600w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/snareLogMelSpectrogram-300x130.png 300w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/snareLogMelSpectrogram-768x334.png 768w, https://www.jhonatanlopez.com/wp-content/uploads/2025/02/snareLogMelSpectrogram.png 1081w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption"><em><strong>Figure 3:</strong> Example Log-Mel Spectrogram generated from an audio file using the code above on a drum hit. It appear as a vertical spike across multiple frequencies.</em></figcaption></figure>



<h2 class="wp-block-heading">What Does the Spectrogram Tell Us?</h2>



<p>Spectrograms provide a wealth of information that waveforms cannot. For instance:</p>



<ul class="wp-block-list">
<li><strong>Horizontal Lines:</strong> Indicate sustained tones, such as a violin note or a humming sound.</li>



<li><strong>Vertical Spikes:</strong> Represent short, sharp sounds, like a drum hit or a clap.</li>



<li><strong>Patterns:</strong> Repeated patterns in the spectrogram might correspond to musical rhythms or speech phonemes.</li>
</ul>



<p>These features provide valuable insights into the structure and content of audio signals, making spectrograms invaluable for tasks like sound classification, speech recognition, and music analysis.</p>



<h2 class="wp-block-heading">Reflection</h2>



<p>Learning spectral analysis has been a transformative experience for me. It opened my eyes to the complexity of audio signals and deepened my appreciation for the mathematical tools that make audio processing possible. One of the challenges I faced was understanding how to choose the right window size for the STFT. Too short, and the frequency resolution suffers; too long, and the time resolution becomes blurry. Through experimentation and research, I learned to balance these trade-offs.</p>
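


<p>If you want to see this trade-off yourself, a quick experiment is to render the same file with a short and a long analysis window side by side. This sketch uses plain STFT spectrograms, and the file path is a placeholder:</p>



<pre class="wp-block-code"><code>import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

y, sr = librosa.load('/content/drive/path/to/your/audio.wav')

# Short window: precise timing, blurry frequencies; long window: the reverse
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
for ax, n_fft in zip(axes, &#91;256, 4096]):
    D = librosa.amplitude_to_db(np.abs(librosa.stft(y, n_fft=n_fft)), ref=np.max)
    img = librosa.display.specshow(D, sr=sr, hop_length=n_fft // 4,
                                   x_axis='time', y_axis='log', ax=ax)
    ax.set_title(f'n_fft = {n_fft}')
fig.colorbar(img, format='%+2.0f dB', ax=axes)
plt.show()</code></pre>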



<p>This journey has reinforced my belief that spectral analysis is not just a technical skill but a gateway to understanding the rich, hidden world of sound. As I continue to explore advanced techniques like CQT and HCQT, I’m excited to share my discoveries and challenges in future posts.</p>



<h2 class="wp-block-heading">Conclusion</h2>



<p>Spectral analysis is a powerful tool for unlocking the frequency content of audio signals. Additionally, by moving beyond waveforms and exploring spectrograms, we can uncover patterns and features that are essential for tasks like sound classification, speech recognition, and music analysis. In this post, we’ve covered the basics of spectral analysis, introduced the Short-Time Fourier Transform (STFT), and demonstrated how to compute and visualize spectrograms using Python.</p>



<h2 class="wp-block-heading">Additional Resources</h2>



<ul class="wp-block-list">
<li><a href="https://librosa.org/doc/latest/index.html">Librosa Documentation:</a> A comprehensive guide to the Librosa library.</li>



<li><a href="https://colab.research.google.com/">Google Colab:</a> A free, cloud-based environment for running Python code.</li>



<li><a href="https://freesound.org/">Freesound.org:</a> A repository of free audio samples for experimentation.</li>



<li><a href="https://geoffroypeeters.github.io/deeplearning-101-audiomir_book">Deep Learning 101 for Audio-based MIR, ISMIR 2024 Tutorial</a> by Geoffroy Peeters et al. (2024).</li>



<li>Kinsler, L. E., Frey, A. R., Coppens, A. B., &amp; Sanders, J. V. (2000). Fundamentals of Acoustics (4th ed.). Wiley.</li>
</ul>
<p>The post <a href="https://www.jhonatanlopez.com/spectral-analysis/">AllYouNeedIsSound 2: From Waveforms to Spectral Representations</a> appeared first on <a href="https://www.jhonatanlopez.com">Jhonatan López</a>.</p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
