A Recurrent Variational Autoencoder for Speech Enhancement
This repository contains the implementation of the speech enhancement method proposed in:
S. Leglaive, X. Alameda-Pineda, L. Girin, R. Horaud, "A Recurrent Variational Autoencoder for Speech Enhancement," in Proc. of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020.
It also provides scripts to create the QUT-WSJ0 dataset used in the experiments. You will need a copy of the WSJ0 and QUT-NOISE datasets (see references in the paper).
Audio examples are available here.
If you use this code, please cite the above-mentioned paper (BibTeX).
Repository structure
.
├── audio
│ ├── mix_qut_wsj0.wav
│ ├── mix_sunrise.wav
│ └── mix_thierry_roland.wav
├── environment.yml
├── LICENSE.txt
├── main.py
├── QUT_WSJ0_scripts
│ ├── create_QUT_WSJ0_json.py
│ ├── create_QUT_WSJ0_mixtures.py
│ ├── pyloudnorm
│ ├── QUT_WSJ0_test_dataset.json
│ ├── QUT_WSJ0_train_dataset.json
│ ├── QUT_WSJ0_val_dataset.json
│ └── resample_QUT.py
├── README.md
├── saved_model
│ ├── WSJ0_2019-07-15-10h01_RVAE_BRNNenc_BRNNdec_latent_dim=16
│ │ ├── final_model_RVAE_epoch145.pt
│ │ ├── loss.pdf
│ │ ├── loss_RVAE.pckl
│ │ ├── parameters.pckl
│ │ └── parameters.txt
│ ├── WSJ0_2019-07-15-10h14_RVAE_RNNenc_RNNdec_latent_dim=16
│ │ ├── final_model_RVAE_epoch121.pt
│ │ ├── loss.pdf
│ │ ├── loss_RVAE.pckl
│ │ ├── parameters.pckl
│ │ └── parameters.txt
│ └── WSJ0_2019-07-15-10h21_FFNN_VAE_latent_dim=16
│ ├── final_model_RVAE_epoch65.pt
│ ├── loss.pdf
│ ├── loss_RVAE.pckl
│ ├── parameters.pckl
│ └── parameters.txt
├── SE_algorithms.py
└── training
├── speech_dataset.py
├── train_BRNN_WSJ0.py
├── train_FFNN_WSJ0.py
├── train_RNN_WSJ0.py
└── VAEs.py
11 directories, 66 files
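
The `saved_model` directory contains three pretrained models (BRNN, RNN and FFNN VAEs, all with a 16-dimensional latent space). The snippet below is a minimal sketch of how such a checkpoint could be inspected; it assumes `parameters.pckl` is a plain pickled object and that the `.pt` file was written with `torch.save` (the model classes themselves are defined in `training/VAEs.py`).

```python
# Minimal inspection sketch (assumptions: parameters.pckl is a pickled Python
# object and the .pt checkpoint was written with torch.save; the model classes
# are defined in training/VAEs.py and may need to be importable for loading).
import pickle
import torch

model_dir = "saved_model/WSJ0_2019-07-15-10h01_RVAE_BRNNenc_BRNNdec_latent_dim=16"

# Training/architecture hyperparameters saved alongside the model.
with open(f"{model_dir}/parameters.pckl", "rb") as f:
    params = pickle.load(f)
print(params)

# Load the checkpoint on CPU; depending on how it was saved, this is either a
# state_dict or a fully serialized model.
checkpoint = torch.load(f"{model_dir}/final_model_RVAE_epoch145.pt",
                        map_location="cpu")
```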
Python files
- `main.py`: Main script to run the speech enhancement algorithms. If you just want to test the method quickly, run this script. Input and output audio files are located in the `audio` folder.
- `SE_algorithms.py`: Implementation of the speech enhancement algorithms (MCEM, PEEM, VEM).
- `./training/speech_dataset.py`: Custom PyTorch dataset for training.
- `./training/VAEs.py`: PyTorch implementation of the FFNN, RNN and BRNN variational autoencoders (VAEs).
- `./training/train_FFNN_WSJ0.py`: Script to train the FFNN VAE.
- `./training/train_RNN_WSJ0.py`: Script to train the RNN VAE.
- `./training/train_BRNN_WSJ0.py`: Script to train the BRNN VAE.
- `./QUT_WSJ0_scripts/resample_QUT.py`: Script to resample the QUT-NOISE dataset at 16 kHz. This is necessary before creating the QUT-WSJ0 dataset (the sketch after this list summarizes the order of the three QUT-WSJ0 scripts).
- `./QUT_WSJ0_scripts/create_QUT_WSJ0_json.py`: Script to create the JSON files for the training/validation/test sets of the QUT-WSJ0 dataset.
- `./QUT_WSJ0_scripts/create_QUT_WSJ0_mixtures.py`: Script to create the mixture wav files for the training/validation/test sets of the QUT-WSJ0 dataset.
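
As a reading aid, the sketch below shows the order in which the QUT-WSJ0 scripts are meant to be run: resample QUT-NOISE first, then build the JSON files, then synthesize the mixtures. It simply calls each script in sequence; any command-line arguments or paths the scripts expect are not shown here, so check each script before running it.

```python
# Sketch of the QUT-WSJ0 creation order described above. Each script is run as
# a plain subprocess in sequence; arguments/paths they may require are omitted
# here (check the scripts themselves).
import subprocess

for script in ("resample_QUT.py",              # 1) resample QUT-NOISE to 16 kHz
               "create_QUT_WSJ0_json.py",      # 2) build train/val/test JSON files
               "create_QUT_WSJ0_mixtures.py"): # 3) create the mixture wav files
    subprocess.run(["python", f"QUT_WSJ0_scripts/{script}"], check=True)
```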
Conda environment
`environment.yml` describes the conda environment that was used for the experiments.
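The environment can typically be recreated with `conda env create -f environment.yml` and activated with `conda activate` followed by the name defined in the file (standard conda usage, not specific to this repository).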
License
GNU Affero General Public License (version 3), see LICENSE.txt.