Rade Kutil - VO+PS Audio Processing (SS20)

Documents for the VO

script

list of questions

PS-Exercises

Here is some guitar sound to use as test input, and also some speech sound.

Implement the bandpass filter with configurable $f_c$ and $f_d$.
Implement a three-way equalizer by first splitting the input signal with a low- and a high-pass filter with the same cut-off frequency, and then splitting the high-pass signal again in the same way. Multiply each channel by some (maybe time-varying) factor and add them back together. Check (and maybe prove) whether the output equals the input signal when all factors are equal to 1.
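The unity-gain check can be tried numerically before attempting a proof. The following is a minimal sketch in Python (not the MATLAB used in the course). It assumes the low- and high-pass filters are the complementary pair built from a first-order allpass $A$, i.e. $(x + A(x))/2$ and $(x - A(x))/2$; the filters in the lecture script may differ. The names `allpass1`, `split` and `equalizer3` are hypothetical.

```python
import math

def allpass1(x, fc, fs):
    """First-order allpass: y[n] = c*x[n] + x[n-1] - c*y[n-1] (assumed form)."""
    t = math.tan(math.pi * fc / fs)
    c = (t - 1.0) / (t + 1.0)
    y, x1, y1 = [], 0.0, 0.0
    for v in x:
        out = c * v + x1 - c * y1
        y.append(out)
        x1, y1 = v, out
    return y

def split(x, fc, fs):
    """Complementary low/high split: low = (x + A(x))/2, high = (x - A(x))/2."""
    a = allpass1(x, fc, fs)
    low = [(v + w) / 2 for v, w in zip(x, a)]
    high = [(v - w) / 2 for v, w in zip(x, a)]
    return low, high

def equalizer3(x, fs, f1, f2, g_low=1.0, g_mid=1.0, g_high=1.0):
    """Split at f1, split the high-pass part again at f2, weight and sum."""
    low, rest = split(x, f1, fs)
    mid, high = split(rest, f2, fs)
    return [g_low * l + g_mid * m + g_high * h
            for l, m, h in zip(low, mid, high)]
```

With this particular filter pair, low plus high equals the input sample by sample, so the sum of the three bands reproduces the input up to rounding error when all factors are 1. For other filter designs the result may not hold.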
Implement a phaser with only one allpass. Modulate $f_c$ with a low-frequency oscillator.
Extend the phaser to four allpasses with separate parameters for each allpass. Modulate the $f_c$ parameters independently with non-harmonic low frequencies. Also try to implement the feedback loop. The result should sound like this.
Implement a 4-fold Wah-Wah effect. Modulate $f_c$ with a low-frequency oscillator, and calculate $f_d$ in a constant-Q manner. The result should sound like this.
Generate a 5-second sine tone and resample it to a nearly identical sampling rate using linear, Lanczos, and allpass interpolation. Check whether you can hear any difference at high frequencies. (Test with a low sampling frequency.)
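As a starting point, the linear variant could look like the following sketch in Python (not the MATLAB used in the course). The function name `resample_linear` and the parameter `ratio` (output rate divided by input rate) are hypothetical; the Lanczos and allpass variants would replace only the interpolation step.

```python
import math

def resample_linear(x, ratio):
    """Resample x by linear interpolation; ratio = fs_out / fs_in."""
    n_out = int((len(x) - 1) * ratio) + 1
    y = []
    for m in range(n_out):
        pos = m / ratio                      # fractional read position in x
        i = min(int(pos), len(x) - 1)        # left neighbour
        frac = pos - i
        nxt = x[i + 1] if i + 1 < len(x) else x[i]
        y.append((1.0 - frac) * x[i] + frac * nxt)
    return y

# Example: a 5-second, 3 kHz tone at a low rate, resampled by a ratio close to 1.
fs = 8000
tone = [math.sin(2 * math.pi * 3000 * n / fs) for n in range(5 * fs)]
resampled = resample_linear(tone, 8100 / fs)
```

Linear interpolation acts as a low-pass filter on the signal, which is why differences between the methods are expected mainly for tones close to half the sampling frequency.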
Implement a stereo rotary speaker effect. The result should sound like this.
Implement a primitive vocoder based on the Hilbert transform: Read two sounds, e.g. guit3.wav and fox.wav. Transform both by a truncated Hilbert transform. Calculate the instantaneous amplitude of both (i.e. $\sqrt{x^2+y^2}$, where $x$ is the original signal, and $y$ is the Hilbert transform). Then substitute the amplitude of guit3 by the amplitude of fox (divide by the one and multiply by the other). The result should sound like this (not great, but the idea counts …).
Implement a compressor that uses a squarer as detector and limits the level at -30 dB. To test it, read guit3.wav and fox.wav, and mix them together with guit3.wav divided by 10 (radio host situation). Experiment with attack- and release-time parameters. The result should sound like this.

Some caveats: (1) The output of the squarer is converted to dB with 10*log₁₀ because the signal is already squared; for an unsquared amplitude it would be 20*log₁₀. (2) For the second averager, the roles of attack and release are reversed.
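Caveat (1) can be illustrated with a small detector sketch in Python (not the MATLAB used in the course). It assumes a one-pole averager with separate attack and release coefficients, which is one common form and may differ from the lecture script. The names `squarer_level_db`, `limiter_gain_db`, `g_attack` and `g_release` are hypothetical, and the second averager from caveat (2) is left out.

```python
import math

def squarer_level_db(x, g_attack, g_release, floor=1e-12):
    """Squarer detector followed by an attack/release averager, output in dB."""
    level = 0.0
    out = []
    for v in x:
        p = v * v                                   # squarer
        g = g_attack if p > level else g_release    # rising: attack, falling: release
        level = g * level + (1.0 - g) * p           # one-pole averager
        # 10*log10 because 'level' is a squared quantity (20*log10 for amplitudes)
        out.append(10.0 * math.log10(max(level, floor)))
    return out

def limiter_gain_db(level_db, threshold_db=-30.0):
    """Static limiter curve: attenuate by the amount the level exceeds the threshold."""
    return min(0.0, threshold_db - level_db)
```

For a constant input of 0.1 the detected level settles at $10\log_{10}(0.01) = -20$ dB, the same value as $20\log_{10}(0.1)$, so the limiter gain at a -30 dB threshold would be -10 dB.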
Implement the distortion transforms from the lecture notes (hard clipping, soft clipping, distortion), and test them on the guitar sound.
Implement an octaver and apply it to fox.wav. To correctly find positive zero-crossings, use a negative amplitude follower $x_n$: If the signal is less than $x_n$ (negative peak reached), set $x_n$ to the signal, else multiply $x_n$ by 0.999. Also use two state variables, $r$ (negative peak reached), and $s$ (sound on). $r$ is set when the signal gets less than $x_n$, and unset when it is set and the signal becomes positive. In the latter case, also flip $s$ (on to off or off to on). Pass the signal to the output only when it is positive and $s$ is set. The result should sound like this.
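The state machine described above can be sketched as follows, in Python rather than the MATLAB used in the course. The function name `octaver` and the parameter `decay` are hypothetical; the initial values (follower at 0, both flags off) are an assumption, since the text does not state them.

```python
import math

def octaver(x, decay=0.999):
    """Pass every second positive half-wave, as described in the exercise."""
    xn = 0.0     # negative amplitude follower
    r = False    # negative peak reached
    s = False    # sound on
    y = []
    for v in x:
        if v < xn:          # new negative peak
            xn = v
            r = True
        else:               # let the follower decay towards zero
            xn *= decay
        if r and v > 0:     # positive zero-crossing after a negative peak
            r = False
            s = not s
        y.append(v if (v > 0 and s) else 0.0)
    return y

# Example input: 8 periods of a sine with a period of 100 samples.
sine = [math.sin(2 * math.pi * n / 100) for n in range(800)]
halved = octaver(sine)
```

For this test sine the output contains only every second positive half-wave, so its fundamental period is 200 samples, one octave below the input.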
Implement the vocoder effect (mutation, morphing) based on STFT. You can use the following files: stft.m, istft.m.
Implement time-stretching based on STFT. Use fox.wav.
Implement pitch-shifting directly (keeping the hop-size the same). For a pitch-factor $k$ (range 0.5 to 2.0), multiply $\Delta\varphi$ by $k$, and also move each coefficient up (or down) in the frequency bins to position $k w$ (rounded). Bonus: Amplitude interpolation, avoid holes in the array.
Implement an oscillator according to the digital resonator. Control the frequency with an LFO (low frequency oscillator). For large and fast frequency variations there should be audible amplitude variations. Now determine the amplitude by a squarer-detector and an averager (equal attack and release). Correct the amplitude by dividing $x[t]$ and $x[t-1]$ by $(a-\bar{a})/10+1$, where $a$ is the detected amplitude and $\bar{a}$ is the desired expected amplitude.
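A minimal sketch of the oscillator core, in Python rather than the MATLAB used in the course. It assumes the digital resonator is the usual two-term recurrence $x[t] = 2\cos(\omega)\,x[t-1] - x[t-2]$; the lecture script may write it differently. The function name `resonator` is hypothetical, and the amplitude detection and correction steps are deliberately left out.

```python
import math

def resonator(freqs, fs):
    """Digital resonator x[t] = 2*cos(w[t])*x[t-1] - x[t-2].

    freqs holds one frequency in Hz per output sample, e.g. produced by an LFO.
    """
    w0 = 2.0 * math.pi * freqs[0] / fs
    # Initial state chosen so that a constant frequency yields sin(w0 * t).
    x1 = -math.sin(w0)         # x[-1]
    x2 = -math.sin(2.0 * w0)   # x[-2]
    out = []
    for f in freqs:
        w = 2.0 * math.pi * f / fs
        x = 2.0 * math.cos(w) * x1 - x2
        out.append(x)
        x2, x1 = x1, x
    return out

# Example: frequency swept by a 2 Hz LFO between 220 Hz and 660 Hz.
fs = 8000
lfo = [440.0 + 220.0 * math.sin(2 * math.pi * 2.0 * n / fs) for n in range(fs)]
swept = resonator(lfo, fs)
```

With a constant frequency the recurrence reproduces a sine of unit amplitude. When the coefficient changes from sample to sample the amplitude is no longer preserved, which is the effect the amplitude correction in the exercise is meant to counteract.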
No MATLAB this time. For a signal similar to $$x=(\ldots,0,0,1,2,1,0,-1,-2,-1,0,0,\ldots)\, ,$$ calculate the optimal linear prediction coefficients with the Levinson-Durbin algorithm by hand. No window function is used, i.e. it is constant 1. Also calculate the predictions. You will get a link via email to a web page with an individual signal, where you have to enter all the calculated values.
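To check hand calculations afterwards, the recursion can be sketched in a few lines of Python. This assumes the convention $\hat{x}[n] = \sum_{k=1}^{m} p_k\, x[n-k]$ with the normal equations $\sum_k p_k\, r_{|i-k|} = r_i$; the lecture script may use the opposite sign for the coefficients. The names `autocorr`, `levinson_durbin` and `predict` are hypothetical.

```python
def autocorr(x, m):
    """Autocorrelation r[0..m] of a finite signal (zero outside, no window)."""
    return [sum(x[i] * x[i + k] for i in range(len(x) - k)) for k in range(m + 1)]

def levinson_durbin(r, m):
    """Return (p, err): prediction coefficients p[0..m-1] and final error energy."""
    p = []
    err = r[0]
    for i in range(1, m + 1):
        acc = r[i] - sum(p[j] * r[i - 1 - j] for j in range(i - 1))
        k = acc / err                                   # reflection coefficient
        p = [p[j] - k * p[i - 2 - j] for j in range(i - 1)] + [k]
        err *= (1.0 - k * k)
    return p, err

def predict(x, p):
    """Predictions x_hat[n] = sum_k p[k-1] * x[n-k], with x taken as zero outside."""
    n_total = len(x) + len(p)
    return [sum(p[k] * x[n - 1 - k] for k in range(len(p))
                if 0 <= n - 1 - k < len(x))
            for n in range(n_total)]

x = [1, 2, 1, 0, -1, -2, -1]
r = autocorr(x, 2)            # [12, 8, 1]
p, err = levinson_durbin(r, 2)
```

For the example signal and order 2, the autocorrelation values are 12, 8 and 1. The first step gives $p_1 = 8/12 = 2/3$, and the second step gives the reflection coefficient $-13/20$ and the coefficients $p = (1.1, -0.65)$, which satisfy both normal equations.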
Implement the vocoder effect (mutation, morphing) with LPC. For blocks of, say, 1024 samples, calculate the first $m$ autocorrelation values. From that, form the Toeplitz matrix, and calculate $p$ with normal MATLAB equation solving. Do that for both signals, calculate the prediction error signal of one signal and apply $p$ of the other signal recursively. For each block, the $p$s have to be recalculated.
Implement a pitch detector based on autocorrelation. For each block of a signal, calculate the autocorrelation as $\operatorname{ifft}(|\operatorname{fft}(\operatorname{zeropad}(x))|^2)$, where $\operatorname{zeropad}$ extends the block with zeros to twice the size. (Only the first half of the result is used.) Then, find the first positive zero crossing of the autocorrelation. From there to the end, find the positive maximum. Finally, find the leftmost peak to the right of the first positive zero crossing that is higher than 80% of the maximum. From the position of the peak, calculate the frequency. For the input signal, start with a single sine function with linearly increasing frequency, then also include up to 7 harmonics with arbitrary phases and amplitudes. Maybe also add noise. Compare the detected pitches to the correct ones in a plot.
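The peak-picking steps can be sketched as follows, in Python rather than the MATLAB used in the course. To stay within the standard library, this sketch computes the autocorrelation as a direct sum instead of via the FFT; the values are the same as the first half of $\operatorname{ifft}(|\operatorname{fft}(\operatorname{zeropad}(x))|^2)$, only slower to obtain. The names `autocorr_direct` and `detect_pitch` are hypothetical, and a "positive zero crossing" is read here as a crossing from negative to non-negative values.

```python
import math

def autocorr_direct(x):
    """Autocorrelation for lags 0..N-1, computed as a direct sum (O(N^2))."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(n)]

def detect_pitch(block, fs, rel=0.8):
    """Return the detected frequency in Hz, or None if no peak is found."""
    r = autocorr_direct(block)
    # First crossing from negative to non-negative values.
    start = next((i for i in range(1, len(r)) if r[i - 1] < 0 <= r[i]), None)
    if start is None:
        return None
    peak_max = max(r[start:])
    # Leftmost local maximum right of the crossing above rel * maximum.
    for lag in range(start, len(r) - 1):
        if (r[lag] > rel * peak_max
                and r[lag] >= r[lag - 1] and r[lag] > r[lag + 1]):
            return fs / lag
    return None

# Example: one 512-sample block of a 200 Hz sine at 8 kHz (period 40 samples).
fs = 8000
block = [math.sin(2 * math.pi * 200 * n / fs) for n in range(512)]
f_detected = detect_pitch(block, fs)
```

Because the lag is an integer, the detected frequency is quantized to $f_s/\text{lag}$; interpolating around the peak would refine it but is not part of the exercise text.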
Convolve the input signal with two different white-noise signals (length about 4000 samples) to get a decorrelated stereo output. Test it with our test signals and also white noise as input signal. Play the output followed by a convolved but non-decorrelated (use the same white noise for left and right) signal to hear the difference.
Implement Moorer's reverberator for a one-dimensional room of 4 m length. The sound source is at 1 m from one end, and the listener is in the middle of the room. Add the direct sound, two early reflections (one from each end), and one comb filter with a delay according to the mode of the room (there is only one) and no low-pass filtering. Also, no all-pass filter. The speed of sound is 343 m/s.
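The delay times follow from the geometry, and the arithmetic can be sketched in Python (not the MATLAB used in the course). The sketch places the walls at 0 m and 4 m, the source at 1 m and the listener at 2 m. It also assumes that "delay according to the mode of the room" means one round trip between the walls, i.e. the period of the mode at $c/(2L)$; this is an interpretation, not something the text states. The function name `moorer_delays` is hypothetical.

```python
def moorer_delays(fs, length=4.0, source=1.0, listener=2.0, c=343.0):
    """Delays in samples for direct sound, two early reflections and the comb."""
    paths = {
        "direct": abs(listener - source),                           # 1 m
        "reflection_near_wall": source + listener,                  # via wall at 0 m: 3 m
        "reflection_far_wall": (length - source) + (length - listener),  # via wall at 4 m: 5 m
        "comb": 2.0 * length,                                       # round trip: 8 m (assumption)
    }
    return {name: round(fs * d / c) for name, d in paths.items()}

delays = moorer_delays(44100)
```

At 44.1 kHz the path lengths of 1 m, 3 m, 5 m and 8 m correspond to about 129, 386, 643 and 1029 samples. Gains for the reflections and the comb feedback still have to be chosen separately.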