
A Recurrent Variational Autoencoder for Speech Enhancement

This repository contains the implementation of the speech enhancement method proposed in:

S. Leglaive, X. Alameda-Pineda, L. Girin, R. Horaud, A Recurrent Variational Autoencoder for Speech Enhancement, in Proc. of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020.

It also provides scripts to create the QUT-WSJ0 dataset used in the experiments. You will need a copy of the WSJ0 and QUT-NOISE datasets (see the references in the paper).

Audio examples are available here.

If you use this code, please cite the above-mentioned paper (BibTeX).

Repository structure

.
├── audio
│   ├── mix_qut_wsj0.wav
│   ├── mix_sunrise.wav
│   └── mix_thierry_roland.wav
├── environment.yml
├── LICENSE.txt
├── main.py
├── QUT_WSJ0_scripts
│   ├── create_QUT_WSJ0_json.py
│   ├── create_QUT_WSJ0_mixtures.py
│   ├── pyloudnorm
│   ├── QUT_WSJ0_test_dataset.json
│   ├── QUT_WSJ0_train_dataset.json
│   ├── QUT_WSJ0_val_dataset.json
│   └── resample_QUT.py
├── README.md
├── saved_model
│   ├── WSJ0_2019-07-15-10h01_RVAE_BRNNenc_BRNNdec_latent_dim=16
│   │   ├── final_model_RVAE_epoch145.pt
│   │   ├── loss.pdf
│   │   ├── loss_RVAE.pckl
│   │   ├── parameters.pckl
│   │   └── parameters.txt
│   ├── WSJ0_2019-07-15-10h14_RVAE_RNNenc_RNNdec_latent_dim=16
│   │   ├── final_model_RVAE_epoch121.pt
│   │   ├── loss.pdf
│   │   ├── loss_RVAE.pckl
│   │   ├── parameters.pckl
│   │   └── parameters.txt
│   └── WSJ0_2019-07-15-10h21_FFNN_VAE_latent_dim=16
│       ├── final_model_RVAE_epoch65.pt
│       ├── loss.pdf
│       ├── loss_RVAE.pckl
│       ├── parameters.pckl
│       └── parameters.txt
├── SE_algorithms.py
└── training
    ├── speech_dataset.py
    ├── train_BRNN_WSJ0.py
    ├── train_FFNN_WSJ0.py
    ├── train_RNN_WSJ0.py
    └── VAEs.py

11 directories, 66 files

Python files

  • main.py: Main script to run the speech enhancement algorithms. If you just want to test the method quickly, run this script. Input and output audio files are located in the audio folder.

  • SE_algorithms.py: Implementation of the speech enhancement algorithms (MCEM, PEEM, VEM).

  • ./training/speech_dataset.py: Custom PyTorch dataset class for training.

  • ./training/VAEs.py: PyTorch implementation of the FFNN, RNN, and BRNN variational autoencoders (VAEs).

  • ./training/train_FFNN_WSJ0.py: Script to train the FFNN VAE.

  • ./training/train_RNN_WSJ0.py: Script to train the RNN VAE.

  • ./training/train_BRNN_WSJ0.py: Script to train the BRNN VAE.

  • ./QUT_WSJ0_scripts/resample_QUT.py: Script to resample the QUT-NOISE dataset to 16 kHz. This step is required before creating the QUT-WSJ0 dataset.

  • ./QUT_WSJ0_scripts/create_QUT_WSJ0_json.py: Script to create the JSON files for the training/validation/test sets of the QUT-WSJ0 dataset.

  • ./QUT_WSJ0_scripts/create_QUT_WSJ0_mixtures.py: Script to create the mixture WAV files for the training/validation/test sets of the QUT-WSJ0 dataset.
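In this family of models, each short-time Fourier transform (STFT) frame of clean speech is modeled as a zero-mean complex Gaussian whose variance is produced by the VAE decoder, and training maximizes the evidence lower bound (ELBO): a reconstruction term plus a KL regularizer. The pure-Python sketch below illustrates these quantities only; the function names and shapes are ours, not the repository's API.

```python
import math
import random

random.seed(0)

def reparameterize(mu, logvar):
    # Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I),
    # which keeps the sampling step differentiable during training.
    return [m + math.exp(0.5 * lv) * random.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]

def gaussian_kl(mu, logvar):
    # KL( N(mu, diag(exp(logvar))) || N(0, I) ): the ELBO regularizer.
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, logvar))

def reconstruction_nll(x_power, v_s):
    # Negative log-likelihood of one STFT frame under a zero-mean complex
    # Gaussian speech model: sum_f [ log v_f + |x_f|^2 / v_f ] up to a
    # constant, where v_f is the decoder's variance for frequency bin f.
    return sum(math.log(v) + p / v for p, v in zip(x_power, v_s))

# Toy frame: the training loss is the NLL plus the KL term (negative ELBO).
mu, logvar = [0.1] * 16, [-0.2] * 16
z = reparameterize(mu, logvar)
x_power, v_s = [1.0, 2.0, 3.0], [1.1, 1.9, 3.2]
loss = reconstruction_nll(x_power, v_s) + gaussian_kl(mu, logvar)
```

Note that this reconstruction term equals the Itakura-Saito divergence between the observed power spectrogram and the decoder variances, up to terms that do not depend on the model.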
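The three enhancement algorithms (MCEM, PEEM, VEM) differ in how they infer the latent variables and estimate the noise parameters, but in this family of methods the clean-speech estimate is ultimately a Wiener-type filter built from the estimated speech and noise variances at every time-frequency bin. A minimal pure-Python sketch of that final filtering step (illustrative only, not the repository's API):

```python
def wiener_gain(v_s, v_n):
    # Wiener gain for one time-frequency bin, given the estimated
    # speech variance v_s and noise variance v_n. Always in (0, 1).
    return v_s / (v_s + v_n)

def enhance_frame(x_frame, v_s_frame, v_n_frame):
    # Posterior mean of the clean-speech STFT frame under the complex
    # Gaussian model: s_hat = v_s / (v_s + v_n) * x, bin by bin.
    return [wiener_gain(vs, vn) * x
            for x, vs, vn in zip(x_frame, v_s_frame, v_n_frame)]

# Toy mixture frame with two frequency bins.
x = [complex(1.0, -2.0), complex(0.5, 0.0)]
s_hat = enhance_frame(x, [3.0, 1.0], [1.0, 3.0])
```

The enhanced waveform is then obtained by applying an inverse STFT to the filtered frames.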

Conda environment

environment.yml describes the conda environment used for the experiments.

License

GNU Affero General Public License (version 3), see LICENSE.txt.