Commit c983b95b authored by Simon

update readme

parent acabbdf4
@@ -75,9 +75,18 @@ In order to guide you in this project, you have access to the following Jupyter
* `4-model-training.ipynb`: In this notebook, you will build and train a convolutional neural network (CNN) to perform urban sound tagging with [Keras](https://keras.io/). Using transfer learning, your CNN will build upon a model called [VGGish](https://github.com/tensorflow/models/tree/master/research/audioset/vggish) (a minimal transfer-learning sketch is given after this list). VGGish was trained on [AudioSet](https://github.com/tensorflow/models/tree/master/research/audioset), a dataset of over 2 million human-labeled 10-second YouTube video soundtracks, with labels taken from an ontology of more than 600 audio event classes. This represents more than 5,000 hours of audio. The method you will implement is based on ["Convolutional Neural Networks with Transfer Learning for Urban Sound Tagging"](http://dcase.community/documents/challenge2019/technical_reports/DCASE2019_Kim_107.pdf), proposed by Bongjun Kim (Department of Computer Science, Northwestern University, Evanston, Illinois, USA), which obtained the third-best score at the [DCASE 2019 Challenge, task 5](http://dcase.community/challenge2019/task-urban-sound-tagging).
_Troubleshooting_: If you get an error while trying to load the weights of VGGish into the model, try changing the version of the `h5py` package installed in the Google Colab environment:
```
!pip install h5py==2.10.0   # pin an older h5py compatible with the VGGish weights
import h5py
print(h5py.__version__)     # should print 2.10.0 (restart the runtime if it does not)
```
The versions of the packages required for this project are listed in `requirement.txt`.
* `5-model-testing.ipynb`: In this notebook, you will evaluate the performance of your trained CNN using standard metrics for [multi-label classification](https://en.wikipedia.org/wiki/Multi-label_classification) (a metrics sketch is given after this list). While developing your model, you should only use the validation set of the [SONYC-UST dataset](https://zenodo.org/record/2590742#.XIkTPBNKjuM). When you are satisfied with the performance on the validation set, you can evaluate the model on the test set. Absolutely avoid evaluating the model on the test set while developing: if you do, you will effectively start fitting the test set.
* `6-record-and-predict.py`: In this Python script (to be run locally), you will record an audio file, compute the features, and make the prediction using your already trained CNN. This is how you can first try to embed your trained urban sound tagging system on the [Nvidia Jetson Nano](https://developer.nvidia.com/embedded/jetson-nano-developer-kit). However, inference will probably be quite slow, so you can try converting your model to the [ONNX](https://onnx.ai/) format before using it for inference on the Jetson Nano (or even on a standard computer); a conversion sketch is given after this list. Try to integrate your system as well as possible in terms of latency and interface (controls, display, etc.).
* `audio-recording.py`: This script shows two ways of recording audio with [python-sounddevice](https://python-sounddevice.readthedocs.io/en/0.3.14/#) (a minimal recording example is given after this list).
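
For `4-model-training.ipynb`, here is a minimal sketch of the transfer-learning setup in Keras. It assumes a Keras port of VGGish is already available (passed in as the hypothetical variable `vggish`, producing the usual 128-dimensional embedding); the head size and the 8 coarse SONYC-UST classes are illustrative assumptions, not the reference implementation:

```
import tensorflow as tf
from tensorflow.keras import layers, models

def add_tagging_head(vggish, n_classes=8):
    """Freeze a pre-trained VGGish backbone and add a small
    multi-label classification head on top of its embedding."""
    vggish.trainable = False  # keep the AudioSet weights fixed
    x = layers.Dense(64, activation="relu")(vggish.output)
    # Sigmoid (not softmax): several sound sources can be active at once.
    out = layers.Dense(n_classes, activation="sigmoid")(x)
    model = models.Model(inputs=vggish.input, outputs=out)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model
```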
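
For `5-model-testing.ipynb`, the multi-label metrics could be computed with scikit-learn as sketched below. The choice of micro-averaged AUPRC and F1, and the 0.5 decision threshold, are assumptions; use whichever metrics the task specifies:

```
import numpy as np
from sklearn.metrics import average_precision_score, f1_score

# Dummy data: rows are clips, columns are classes.
y_true = np.array([[1, 0, 1], [0, 1, 0]])                # ground-truth labels
y_score = np.array([[0.9, 0.2, 0.6], [0.1, 0.8, 0.3]])   # sigmoid outputs

# Ranking metric: micro-averaged area under the precision-recall curve.
auprc = average_precision_score(y_true, y_score, average="micro")

# Thresholded metric: micro-averaged F1 at a 0.5 decision threshold.
f1 = f1_score(y_true, (y_score >= 0.5).astype(int), average="micro")
print(f"micro AUPRC = {auprc:.3f}, micro F1 = {f1:.3f}")
```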
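
For the ONNX route mentioned for `6-record-and-predict.py`, a minimal conversion-and-inference sketch with `tf2onnx` and `onnxruntime` might look as follows (the file names and input shape are placeholders; VGGish-style log-mel patches are 96x64):

```
import numpy as np
import tensorflow as tf
import tf2onnx
import onnxruntime as ort

# Convert the trained Keras model to ONNX (paths are placeholders).
model = tf.keras.models.load_model("urban_sound_tagger.h5")
tf2onnx.convert.from_keras(model, output_path="urban_sound_tagger.onnx")

# Run inference with ONNX Runtime.
sess = ort.InferenceSession("urban_sound_tagger.onnx",
                            providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
features = np.random.rand(1, 96, 64, 1).astype(np.float32)  # dummy input
probs = sess.run(None, {input_name: features})[0]
print(probs)
```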
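
Finally, the simpler (blocking) of the two recording approaches shown in `audio-recording.py` looks roughly like this; the sample rate, duration, and output file name are assumptions (SONYC-UST clips are 10 seconds long):

```
import sounddevice as sd
from scipy.io.wavfile import write

fs = 44100        # sample rate in Hz (assumed)
duration = 10     # record 10 s, matching the SONYC-UST clip length
print("Recording...")
audio = sd.rec(int(duration * fs), samplerate=fs, channels=1)
sd.wait()         # block until the recording is finished
write("recording.wav", fs, audio)
```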