Rade Kutil - VO+PS Audio Processing (SS26)

Documents for the VO

PS-Exercises

Here is some guitar sound to use as test input, and also some speech sound.

See this demo program for how to write the exercises in Python. Send the solutions via the upload page (you will get a personal link via email) before Tuesday 22:00. Please send plain .py scripts only, not Jupyter notebooks or zip files.

Implement the bandpass filter with configurable $f_c$ and $f_d$.
Implement a three-way equalizer by first splitting the input signal with a low- and a high-pass filter with the same cut-off frequency, and then splitting the high-pass signal again in the same way. Multiply each channel by some (maybe time-varying) factor and add them back together. Check (and maybe prove) whether the input signal is unchanged if the factors are all equal to 1.
Implement a phaser with only one allpass. Modulate $f_c$ with a low-frequency oscillator.
Extend the phaser to four parallel allpasses (instead of sequential as in the lecture). This means that the input to all allpasses should be $ph_2$, and their outputs must be averaged (summed and divided by 4). There are separate parameters for each allpass. Modulate the $f_c$-parameters independently with non-harmonic low frequencies. Also, implement the feedback loop. The result should sound like this.
Implement an $m$-fold Wah-Wah effect with increasing $m$. Set $f_c$ to 3500Hz and use $q=0.5$. Start with $m=1$, and increase $m$ by 1 every 0.5 seconds. The result should sound like this.
Generate a 3-second rectangular wave (e.g. 50 times 1.0, 50 times -1.0, and repeat) and resample it (from e.g. fs=20000Hz) to a nearly identical sampling rate (e.g. 20002Hz) using linear, Lanczos and allpass interpolation. See if you can hear any amplitude fluctuations.
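As a starting point for the resampling exercise above, the following is a minimal sketch of the test signal and of the linear-interpolation variant only. The variable names are illustrative assumptions, and the Lanczos and allpass variants are not covered.

```python
import numpy as np

fs_in, fs_out, seconds = 20000, 20002, 3

# Rectangular wave: 50 samples at +1.0, 50 samples at -1.0, repeated.
period = np.concatenate([np.ones(50), -np.ones(50)])
x = np.tile(period, fs_in * seconds // len(period))

# Fractional read positions in the input, one per output sample.
n_out = fs_out * seconds
pos = np.arange(n_out) * fs_in / fs_out

# Linear interpolation between neighbouring input samples
# (np.interp clamps positions beyond the last input sample).
y_linear = np.interp(pos, np.arange(len(x)), x)
```

Because the fractional part of the read position drifts slowly, samples near the edges of the rectangle take intermediate values between -1 and +1, which is where amplitude fluctuations can come from.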
Implement a chorus effect, i.e. add 3 copies of the input sound, each delayed between 0 and 100 samples, modulated by LFOs with non-harmonic frequencies between 1 and 2 Hz. The result should sound like this.
Implement single-sideband modulation. By using the Hilbert transform (do NOT use the Fourier transform to calculate it), modulate the input sound by a sinusoid with increasing frequency, e.g. $ \cos(20\cdot 2\pi t^2)$ ($t$ in seconds). The result should sound like this.
Implement a noise gate/expander that uses a squarer as detector and reduces levels below -25dB by 1 dB per dB. (Hints: The maximum level is 0dB. Above -25dB, r=0dB (no change), below -25dB, r is linear; r(-25)=0 and r(-35)=-10, and so on.) To test it, read guit3.wav and fox.wav, and mix them together with guit3.wav divided by 10. Choose pretty short attack- and release-time parameters. The result should sound like this.

Caveat: The output of the squarer is a squared (power) quantity, so it is converted to dB by 10*log₁₀; for an unsquared amplitude it would be 20*log₁₀.
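The dB conversion and the static curve from the hints can be sketched as follows. This covers only the level computation and the gain curve, not the attack/release smoothing; the function names and the small floor `eps` (to avoid log of zero) are assumptions for illustration.

```python
import numpy as np

def power_to_db(p, eps=1e-12):
    """Convert a squared (power) detector output to dB: 10*log10."""
    return 10.0 * np.log10(np.maximum(p, eps))

def expander_gain_db(level_db, threshold_db=-25.0, slope=1.0):
    """Static curve r: 0 dB above the threshold, falling by `slope` dB per dB below it."""
    level_db = np.asarray(level_db, dtype=float)
    return np.minimum(0.0, slope * (level_db - threshold_db))

def db_to_factor(r_db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (np.asarray(r_db, dtype=float) / 20.0)
```

For example, `expander_gain_db(-35.0)` gives -10 dB, matching r(-35)=-10 from the hints, and an amplitude of 0.1 squared to 0.01 maps to -20 dB with `power_to_db`.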
The distortion function $g(x)$ should be designed so that it is $+1$ for $x\ge 1$, $-1$ for $x\le-1$, and $g(x)=ax+bx^3$ for $-1\le x\le 1$, where $a, b$ are chosen so that $g(1)=1$ and $g'(1)=0$. Then create a harmonic signal of 3 seconds with a fundamental frequency of $f=163\text{Hz}$ and 7 harmonics, where the amplitude of the $k$-th harmonic (at frequency $(k+1)f$) is $1/\sqrt{k+1}$. Use a sampling frequency of 3000Hz. Normalize the signal to $\pm 1$, then multiply it with an increasing gain from 0.5 at the beginning to 3.0 at the end. Feed this into the distortion function. But before distortion, upsample by a factor of 3 (you can use scipy.signal.resample_poly), and, after distortion, downsample again ($y_u$). Also, compare it to applying the distortion without up/downsampling ($y_r$). Concatenate $x, y_u, y_r$ for easier comparison.
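The two conditions on $g$ give $a+b=1$ (from $g(1)=1$) and $a+3b=0$ (from $g'(1)=0$), hence $a=3/2$ and $b=-1/2$. A minimal sketch of the resulting function, using clipping so that the cubic is only evaluated on $[-1,1]$:

```python
import numpy as np

def g(x):
    """Cubic soft clipper: 1.5*x - 0.5*x**3 on [-1, 1], saturating at +/-1 outside."""
    c = np.clip(x, -1.0, 1.0)
    return 1.5 * c - 0.5 * c**3
```

Since the cubic reaches exactly $\pm 1$ with zero slope at $x=\pm 1$, clipping the input first yields the constant branches without a separate case distinction.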
Implement an octaver and apply it to fox.wav. First, produce a low-pass filtered signal $l$ (first order, cut-off 50Hz). Then, whenever $l$ has a positive zero-crossing, flip a sign $s$ ($-1 \longleftrightarrow +1$). Finally, mix $3\cdot l\cdot s$ into the source signal. The result should sound like this.
Implement denoising based on STFT. The signal to denoise is fox.wav, the noise signal is guit3.wav, shortened to the length of fox.wav and multiplied by $0.1$. The denoising coefficients $c_w$ should be learned from the noise signal as the average of the absolute value of the coefficients in bin $w$ over all frames, multiplied by $2$. Concatenate the noisy and denoised signals for comparison. The result should sound like this.

The STFT (forward and inverse) is available in the scipy-library:

…
from scipy import signal
…
frameSize = 512
hopSize = frameSize // 4  # integer division, so that noverlap is an integer
_, _, X = signal.stft(x, fs, window='hann', nperseg=frameSize, noverlap=frameSize-hopSize)
…
_, y = signal.istft(X, fs, window='hann', nperseg=frameSize, noverlap=frameSize-hopSize)
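To check that these parameters give a consistent forward/inverse pair before modifying any coefficients, a round trip on a toy signal can be sketched as below. The 440Hz sine and the sampling rate of 16000Hz are assumptions for illustration, not part of the exercise.

```python
import numpy as np
from scipy import signal

fs = 16000                                # assumed sampling rate of the toy signal
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)           # one second of a 440 Hz sine

frameSize = 512
hopSize = frameSize // 4

# X has one row per frequency bin (frameSize//2 + 1) and one column per frame.
_, _, X = signal.stft(x, fs, window='hann', nperseg=frameSize,
                      noverlap=frameSize - hopSize)

# Without any modification of X, the inverse should reproduce x.
_, y = signal.istft(X, fs, window='hann', nperseg=frameSize,
                    noverlap=frameSize - hopSize)
max_error = np.max(np.abs(y[:len(x)] - x))
```

Any processing, such as shrinking the magnitudes in each bin, would then happen between the two calls.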
Implement time-stretching based on STFT. Use fox.wav.
First, implement an oscillator according to the digital resonator. Control the frequency $f$ with an LFO (low-frequency oscillator at 13Hz) between $0.01 f_s$ and $0.49 f_s$. Use $f_s = 5000$. There should be audible amplitude variations. Next, implement a complex oscillator, where a complex value (initially $=1$) is iteratively multiplied by $\exp(2\pi i f/f_s)$. Use the real part as the output signal. There should be no amplitude variations.
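A minimal sketch of the complex oscillator only (not the digital resonator), assuming the per-sample multiplier is the unit-magnitude exponential $\exp(2\pi i f/f_s)$ with $f$ in Hz. The 3-second duration and the sinusoidal LFO shape are assumptions for illustration.

```python
import numpy as np

fs = 5000
n = 3 * fs                                  # assumed duration: 3 seconds
t = np.arange(n) / fs

# 13 Hz LFO sweeping f between 0.01*fs and 0.49*fs.
f = fs * (0.25 + 0.24 * np.sin(2 * np.pi * 13 * t))

z = 1.0 + 0.0j                              # oscillator state, initially 1
y = np.empty(n)                             # output: real part of the state
mag = np.empty(n)                           # magnitude of the state, for checking
for k in range(n):
    y[k] = z.real
    mag[k] = abs(z)
    z *= np.exp(2j * np.pi * f[k] / fs)     # rotate by the current frequency
```

Each multiplier has magnitude 1, so `mag` stays at 1 up to rounding error regardless of how fast `f` changes.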
Produce a chirp signal of 3 seconds, starting at 400Hz and ending at 800Hz, via the inverse short-time Fourier transform (istft). First, generate the $C(v)$ function as a lookup array: take the Hann window (scipy.signal.windows.hann) with frame length $n$, zero-pad it (symmetrically) to 4 times the size, calculate the FFT (numpy.fft.rfft), and take the real part (the imaginary part should be $\approx 0$). Then, for each frame, fill 9 coefficients in the corresponding column of an STFT array around $nf$. Hint: $f=(400+400 t / m) / f_s$, where $t$ is the frame index, $m$ is the number of frames, and $f_s$ is the sampling rate. For the $C(v)$ lookup, you have to interpolate between $C[\lfloor 4v\rfloor]$ and $C[\lceil 4v\rceil]$.

Beware: Both fft and istft assume the origin at the beginning of the array/frame, so coefficients with odd indices have opposite signs. Therefore, multiply $C[k]$ by $(-1)^k$ for $k=0,1,2,\ldots$, and $X[t,w]$ by $(-1)^w$. Also, $C$ contains only the positive half, so when looking up $C(v)$ for $v<0$, just use $C(|v|)$.
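The construction of the lookup array and the interpolated lookup can be sketched as follows. This covers only $C$, not the filling of the STFT array or the sign correction of $X[t,w]$. The frame length of 512, the helper name `C_lookup`, and the use of the periodic window (`sym=False`, which makes the padded array exactly symmetric about its centre) are assumptions for illustration.

```python
import numpy as np
from scipy.signal.windows import hann

n = 512                                        # assumed frame length
w = hann(n, sym=False)                         # periodic Hann window (assumption)
padded = np.pad(w, (3 * n // 2, 3 * n // 2))   # symmetric zero-padding to 4*n
spec = np.fft.rfft(padded)                     # imaginary part should be ~0

# Undo the sign alternation caused by the origin at index 0.
signs = (-1.0) ** np.arange(len(spec))
C = spec.real * signs

def C_lookup(v):
    """Linear interpolation of C at position 4*|v| (C is sampled in quarter-bin steps)."""
    return np.interp(4.0 * np.abs(v), np.arange(len(C)), C)
```

With these choices, `C_lookup(0)` is the sum of the window, and `C_lookup(v)` is symmetric in `v` because only the absolute value is used.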
No Python this time. For a signal similar to $$x=(\ldots,0,0,1,2,1,0,-1,-2,-1,0,0,\ldots)\, ,$$ calculate the optimal linear prediction coefficients with the Levinson-Durbin algorithm by hand. No window function is used, i.e. it is constant 1. Calculate also the predictions. You can use the same link as for the Python uploads. You will be presented with an individual signal, where you have to enter all the calculated values. Click on the [1] to get it started. Press [Speichern] (Save) to see whether your input is correct. You can change values and continue later until the deadline. Also, click [Hilfe] (Help) at the top right corner to get info about entering formulas and reusing values as variables by hovering to the left of the input fields.
Implement formant changing with LPC. For blocks of, say, 1024 samples, calculate the first $m=50$ autocorrelation values $r_{xx}$. From these, solve the Toeplitz matrix system to calculate $p$. Calculate the prediction error signal. Choose a new $m_y=\left[\alpha m\right]$. Interpolate autocorrelation values by ryy = np.interp (np.linspace (0, len(rxx) - 1, my + 1), np.arange(len(rxx)), rxx). Solve this Toeplitz system as well. Apply the new $p_y$ to the above prediction error signal recursively. The $p$s have to be recalculated for each block.

Some hints:
Let the first block start at $\max(m,m_y)$ and the last at len(x)-blocksize at the latest.
Consider: scipy.signal.correlate
Consider: scipy.linalg.solve_toeplitz(...)
Consider: np.inner (x[t-1:t-m-1:-1], p)
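The three hints can be combined as in the following sketch, which computes the predictor for a single block and one prediction value. It does not include the formant change itself; the function names are illustrative assumptions.

```python
import numpy as np
from scipy import signal
from scipy.linalg import solve_toeplitz

def autocorr(block, m):
    """Autocorrelation values r[0..m] of one block (no window)."""
    full = signal.correlate(block, block, mode='full')
    mid = len(block) - 1                     # index of lag 0 in the 'full' output
    return full[mid:mid + m + 1]

def lpc(block, m):
    """Solve the symmetric Toeplitz system R p = r[1..m] for the predictor p."""
    r = autocorr(block, m)
    return solve_toeplitz(r[:m], r[1:m + 1])

def predict(x, t, p):
    """Prediction of x[t] from x[t-1], ..., x[t-m]; requires t > m."""
    m = len(p)
    return np.inner(x[t - 1:t - m - 1:-1], p)
```

Note that the reversed slice `x[t-1:t-m-1:-1]` is empty for `t == m`, because the stop index becomes -1; the sketch therefore assumes `t > m`.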
Implement a pitch detector based on the cepstrum. Calculate the STFT $X$ (framesize n=4096). Calculate $c=\operatorname{irfft}(\log(|X|+0.001))$ for each frame. Set $c[0]=0$. Find the first negative value, and set all values up to this position to zero (thus eliminating the 0-lag peak). Use only c[0:n//2]. Derive the frequency from the position of the largest peak. For the input signal, start with a single sin function with linearly increasing frequency from 50Hz to 2000Hz (hint: sin(np.pi*a*t**2) has frequency a*t at time t), then also include up to 7 harmonics with arbitrary phases and amplitudes. Maybe also add noise. Compare the detected pitches to the correct ones in a plot.
Implement inter-aural differences for a sound source rotating around the head. Calculate an angle $\alpha$ rotating at 0.5Hz. ITD and IID are based on $s=\sin(\alpha)$. For ITD, if $s\ge 0$, set the delay for the right channel to $s\cdot d$, where $d$ is the maximum delay (head size 18cm divided by speed of sound 343m/s), and the delay of the left channel to 0. If $s<0$, the left channel should be delayed in the same way. For IID, first produce a filtered version of the input signal with a high-pass filter with $f_c = 2\text{kHz}$. With the help of this, the right channel should be filtered by a high-pass shelving filter with $v=1-s$ if $s\ge 0$, and $v=1$ otherwise. The left channel is treated accordingly for $s<0$. Both IID and ITD are to be applied together. Test on fox.wav with headphones.
Convolve the input signal with two different white-noise signals (length about 4000 samples) to get a decorrelated stereo output. Test it with our test signals and also white noise as input signal. Play the output followed by a convolved but non-decorrelated (use the same white-noise for left and right) signal to hear the difference.
Implement Moorer's reverberator for a 2D room of 10m width and 15m depth. The sound source is at 2m from the left wall and 3m from the back wall, and the listener is 7m from the left and 6m from the back wall. Add the direct sound, three early reflections (one from left, right and back wall), and three IIR comb filters with a delay according to three modes of the room (n=(1,0),(0,1),(1,1)) and no low-pass filtering. Also, no all-pass filter. Choose the feedback of the comb filters at about 0.5 (also try 0.9), and mix their outputs to $y_1$ with the same factor. The speed of sound is 343m/s.
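The geometry of this exercise can be turned into delay times and mode frequencies as sketched below. The sketch assumes first-order image sources for the early reflections and the standard mode formula for a rectangular room, $f = \frac{c}{2}\sqrt{(n_x/L_x)^2+(n_y/L_y)^2}$. The coordinate convention and all names are assumptions for illustration, and the filters themselves are not included.

```python
import numpy as np

c = 343.0                          # speed of sound in m/s
width, depth = 10.0, 15.0          # room size in m
# Assumed coordinates: x measured from the left wall, y from the back wall.
source = np.array([2.0, 3.0])
listener = np.array([7.0, 6.0])

# First-order image sources: mirror the source at each wall.
images = {
    "direct": source,
    "left": np.array([-source[0], source[1]]),
    "right": np.array([2 * width - source[0], source[1]]),
    "back": np.array([source[0], -source[1]]),
}
# Travel time in seconds from each (image) source to the listener.
delays = {name: np.linalg.norm(pos - listener) / c for name, pos in images.items()}

def mode_frequency(nx, ny):
    """Frequency in Hz of the rectangular-room mode (nx, ny)."""
    return 0.5 * c * np.hypot(nx / width, ny / depth)

modes = {n: mode_frequency(*n) for n in [(1, 0), (0, 1), (1, 1)]}
```

Multiplying a delay by the sampling rate gives it in samples. One possible reading of "a delay according to the modes" is to use one period $1/f$ of each mode as the comb filter delay; this is an assumption, not something the exercise text states.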