What conclusions can we draw from the distribution of classes in the training set?
---
%% Cell type:markdown id: tags:
We have only a few examples for `dog`, `music`, and `non-machinery-impact`. We can expect lower performance in the detection of these classes.
%% Cell type:markdown id: tags:
## Audio basics
We will use two libraries for loading and playing audio signals:
1. [Librosa](https://librosa.github.io/librosa/index.html) is a Python package for music and audio processing.
2. [IPython.display.Audio](https://ipython.org/ipython-doc/stable/api/generated/IPython.display.html#IPython.display.Audio) lets you play audio directly in notebooks.
%% Cell type:markdown id: tags:
### Reading audio
Use [`librosa.load`](https://librosa.github.io/librosa/generated/librosa.core.load.html#librosa.core.load) to load an audio file into an audio array. It returns both the audio array and the sample rate:
Display the length of the audio array and sample rate:
%% Cell type:code id: tags:
``` python
print(x.shape)
print(sr)
```
%% Cell type:code id: tags:
``` python
import resampy
import vggish_params

# Resample the signal to the sample rate expected by VGGish
old_sr = sr
sr = vggish_params.SAMPLE_RATE
x = resampy.resample(x, old_sr, sr)
```
%% Cell type:markdown id: tags:
### Visualizing Audio
%% Cell type:markdown id: tags:
In order to display plots inside the Jupyter notebook, run the following commands:
%% Cell type:code id: tags:
``` python
%matplotlib inline
import matplotlib.pyplot as plt
```
%% Cell type:code id: tags:
``` python
import numpy as np

# Time axis in seconds: one point per sample
time_axis = np.arange(0, x.shape[0] / sr, 1 / sr)
plt.figure(figsize=(7, 3))
plt.plot(time_axis, x)
plt.title('waveform')
plt.ylabel('amplitude')
plt.xlabel('time (s)')
```
%% Cell type:markdown id: tags:
### Playing Audio
%% Cell type:markdown id: tags:
Using [`IPython.display.Audio`](http://ipython.org/ipython-doc/2/api/generated/IPython.lib.display.html#IPython.lib.display.Audio), you can play an audio signal directly in the notebook:
%% Cell type:code id: tags:
``` python
import IPython.display as ipd

ipd.Audio(x, rate=sr)  # play the audio array at the given sample rate
```
%% Cell type:markdown id: tags:
### Writing Audio
%% Cell type:markdown id: tags:
[`librosa.output.write_wav`](https://librosa.github.io/librosa/generated/librosa.output.write_wav.html#librosa.output.write_wav) saves a NumPy array to a WAV file.
%% Cell type:code id: tags:
``` python
librosa.output.write_wav('example.wav', x, sr)
```
%% Cell type:markdown id: tags:
## Mel spectrogram
%% Cell type:markdown id: tags:
In this project, we will work with a time-frequency representation of audio signals called the Mel spectrogram. It is computed as follows:
%% Cell type:markdown id: tags:
#### Framing
The waveform is converted into a sequence of successive overlapping frames.
%% Cell type:code id: tags:
``` python
# Define the parameters of the short-term analysis
# (window/hop lengths derived from the VGGish defaults -- an assumption,
# since the original definitions were lost from this cell)
window_length_samples = int(round(sr * vggish_params.STFT_WINDOW_LENGTH_SECONDS))
hop_length_samples = int(round(sr * vggish_params.STFT_HOP_LENGTH_SECONDS))
# Slice the waveform into successive overlapping frames, one frame per column
num_frames = 1 + (x.shape[0] - window_length_samples) // hop_length_samples
frame_starts = np.arange(num_frames) * hop_length_samples
X_frames = np.stack([x[s:s + window_length_samples] for s in frame_starts], axis=1)
# Taper each frame with a "periodic" Hann window to reduce spectral leakage
window = 0.5 - 0.5 * np.cos(2 * np.pi / window_length_samples *
                            np.arange(window_length_samples))  # "periodic" Hann
X_windowed_frames = X_frames * window[:, np.newaxis]
print(X_windowed_frames.shape)
plt.figure()
plt.plot(window)
plt.title('analysis window')
plt.xlabel('samples')
```
%% Cell type:markdown id: tags:
#### Discrete Fourier transform
%% Cell type:markdown id: tags:
The short-term Fourier transform (STFT) is computed by applying the discrete Fourier transform (DFT) on each windowed frame. The magnitude spectrogram is obtained by taking the modulus of the STFT matrix.
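As a sketch with NumPy, using synthetic frames in place of the windowed frames computed above (the sizes here are illustrative, not the notebook's actual parameters):
%% Cell type:code id: tags:
``` python
import numpy as np

# Synthetic stand-in for the windowed frames: shape (window_length, num_frames),
# one frame per column
window_length, num_frames = 400, 10
rng = np.random.default_rng(0)
X_windowed_frames = rng.standard_normal((window_length, num_frames))

# DFT of each windowed frame (rfft keeps only the non-negative frequencies) -> STFT matrix
stft = np.fft.rfft(X_windowed_frames, axis=0)

# Magnitude spectrogram: modulus of the STFT
S = np.abs(stft)
print(S.shape)  # (window_length // 2 + 1, num_frames) = (201, 10)
```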