Commit ed8a7847 authored by Simon

record and predict

parent 17b61d66
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Mon Sep 30 10:29:17 2019
This script records a 10-second audio file using sounddevice,
computes a log-Mel spectrogram, and predicts the tags using
a trained CNN.
@author: sleglaive
"""
import os
import oyaml as yaml
import numpy as np
from keras.models import model_from_json
import mel_features
import sounddevice as sd
from scipy.io.wavfile import write
# Run python3 -m sounddevice
# and choose the proper device
sd.default.device = 7
#%%============================================================================
############################# PATH TO YOUR NETWORK ############################
###============================================================================
model_architecture_file = 'your/path/to/model_architecture.json'
model_weights_file = 'your/path/to/best_model_weights.h5'
#%%
# =============================================================================
# Load taxonomy
# =============================================================================
# Load annotations and taxonomy
taxonomy_file = './dcase-ust-taxonomy.yaml'
with open(taxonomy_file, 'r') as f:
    taxonomy = yaml.load(f, Loader=yaml.Loader)
# get list of coarse labels from taxonomy
labels = [v for k,v in taxonomy['coarse'].items()]
#%%
# =============================================================================
# Capture audio and compute log-Mel spectrogram
# =============================================================================
# add your code here (a minimal sketch is given below as a starting point)
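# The lines below are only a minimal sketch, not a reference solution. They
# assume that the local `mel_features` module is the one from the VGGish
# codebase (exposing `log_mel_spectrogram`) and use VGGish-like front-end
# settings (64 mel bins, 25 ms window, 10 ms hop); adapt the sample rate and
# parameters to whatever your network was trained with.
samplerate = 44100      # recording sample rate (assumption, depends on your sound card)
seconds = 10            # duration of the recording in seconds

print('Recording started')
y = sd.rec(int(seconds * samplerate), samplerate=samplerate, channels=1)
sd.wait()               # wait until the recording is finished
print('Recording finished')

# optionally keep a copy of the recording on disk (arbitrary file name)
write('recording.wav', samplerate, y)

x = np.squeeze(y)       # 1-D mono waveform

# log-Mel spectrogram (VGGish-style parameters, assumed to match training)
log_mel = mel_features.log_mel_spectrogram(
    x,
    audio_sample_rate=samplerate,
    log_offset=0.01,
    window_length_secs=0.025,
    hop_length_secs=0.010,
    num_mel_bins=64,
    lower_edge_hertz=125,
    upper_edge_hertz=7500)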
#%%
# =============================================================================
# Load model
# =============================================================================
# add your code here
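# Minimal sketch, assuming the standard Keras workflow suggested by the two
# paths defined above: a JSON file for the architecture and an HDF5 file for
# the weights (model_from_json + load_weights).
with open(model_architecture_file, 'r') as f:
    model = model_from_json(f.read())
model.load_weights(model_weights_file)
model.summary()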
#%%
# =============================================================================
# Predict
# =============================================================================
# add your code here
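# Minimal sketch, not a reference solution. The input shape expected by your
# CNN depends on how it was trained (e.g. fixed-size log-Mel patches), so the
# reshaping below is only an assumption; the network is assumed to output one
# sigmoid score per coarse tag, in the same order as `labels`.
x_in = log_mel[np.newaxis, :, :, np.newaxis]   # (batch, time, mel, channel)
scores = model.predict(x_in)[0]                # one score per coarse label

threshold = 0.5                                # arbitrary decision threshold
for label, score in zip(labels, scores):
    marker = '<--' if score > threshold else ''
    print('{:30s} {:.2f} {}'.format(label, score, marker))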
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Mon Sep 30 11:41:12 2019
This script records an audio file using sounddevice.
@author: sleglaive
"""
import tempfile
import queue
import sys
import sounddevice as sd
import soundfile as sf
from scipy.io.wavfile import write
import numpy # Make sure NumPy is loaded before it is used in the callback
filename = tempfile.mktemp(prefix='recording_', suffix='.wav', dir='')
device = 7  # input device (numeric ID or substring); run "python3 -m sounddevice" to identify it
sd.default.device = device
samplerate = 44100
channels = 1
subtype = "PCM_24"
# record a fixed or infinite duration
recording_duration = "fixed" # "fixed" or "infinite"
if recording_duration == "fixed":

    seconds = 10  # Duration of recording
    print('Recording started')
    y = sd.rec(int(seconds * samplerate), samplerate=samplerate, channels=1)
    sd.wait()  # Wait until recording is finished
    print('\nRecording finished: ' + repr(filename))
    write(filename, samplerate, y)  # Save as WAV file

elif recording_duration == "infinite":

    q = queue.Queue()

    def callback(indata, frames, time, status):
        """This is called (from a separate thread) for each audio block."""
        if status:
            print(status, file=sys.stderr)
        q.put(indata.copy())

    try:
        # Make sure the file is opened before recording anything:
        with sf.SoundFile(filename, mode='x', samplerate=samplerate,
                          channels=channels, subtype=subtype) as file:
            with sd.InputStream(samplerate=samplerate, device=device,
                                channels=channels, callback=callback):
                print('#' * 80)
                print('press Ctrl+C to stop the recording')
                print('#' * 80)
                while True:
                    audio_block = q.get()
                    file.write(audio_block)
    except KeyboardInterrupt:
        print('\nRecording finished: ' + repr(filename))
@@ -55,7 +55,7 @@ During the development stage, you will work on [Google Colab](https://colab.rese
If you are not already familiar with Google Colab and Jupyter Notebooks, you can have a look at [this brief overview](https://colab.research.google.com/notebooks/basic_features_overview.ipynb).
In order to guide you in this project, you have access to the following Jupyter Notebooks.
In order to guide you in this project, you have access to the following Jupyter Notebooks and Python scripts:
* `1-dataset.ipynb`: This notebook introduces the [SONYC-UST dataset](https://zenodo.org/record/2590742#.XIkTPBNKjuM). You will use this dataset in the development stage of your system, i.e. before deploying it on the [Nvidia Jetson Nano](https://developer.nvidia.com/embedded/jetson-nano-developer-kit).
@@ -68,9 +68,11 @@ In order to guide you in this project, you have access to the following Jupyter
* `4-model-training.ipynb`: In this notebook, you will build and train a convolutional neural network (CNN) to perform urban sound tagging with [Keras](https://keras.io/). Using transfer learning, your CNN will build upon a model called [VGGish](https://github.com/tensorflow/models/tree/master/research/audioset/vggish). It was trained on [AudioSet](https://github.com/tensorflow/models/tree/master/research/audioset), a dataset of over 2 million human-labeled 10-second YouTube video soundtracks, with labels taken from an ontology of more than 600 audio event classes. This represents more than 5 thousand hours of audio. The method you will implement is based on ["Convolutional Neural Networks with Transfer Learning for Urban Sound Tagging"](http://dcase.community/documents/challenge2019/technical_reports/DCASE2019_Kim_107.pdf), proposed by Bongjun Kim (Department of Computer Science, Northwestern University, Evanston, Illinois, USA), which obtained the 3rd best score at the [DCASE 2019 Challenge, task 5](http://dcase.community/challenge2019/task-urban-sound-tagging).
* `5-model-testing.ipynb`: In this notebook, you will evaluate the performance of your trained CNN using standard metrics for [multi-label classification](https://en.wikipedia.org/wiki/Multi-label_classification).
* `5-model-testing.ipynb`: In this notebook, you will evaluate the performance of your trained CNN using standard metrics for [multi-label classification](https://en.wikipedia.org/wiki/Multi-label_classification). While developing your model, you should only use the validation set of the [SONYC-UST dataset](https://zenodo.org/record/2590742#.XIkTPBNKjuM). When you are satisfied with the performance on the validation set, you can evaluate the model on the test set. You should absolutely avoid evaluating the model on the test set while developing: if you do, you will start learning the test set and the final evaluation will no longer reflect generalization performance.
* `6-jetson-nano.ipynb`: In this notebook, you will learn some tools to help you embed your trained urban sound tagging system on the [Nvidia Jetson Nano](https://developer.nvidia.com/embedded/jetson-nano-developer-kit).
* `6-record-and-predict.py`: In this Python script (to be run locally), you will record an audio file, compute the features, and make the prediction using your already trained CNN. This is how you will embed your trained urban sound tagging system on the [Nvidia Jetson Nano](https://developer.nvidia.com/embedded/jetson-nano-developer-kit). Try to make the integration of your system as good as possible in terms of latency and interface (controls, display, etc.).
* `audio-recording.py`: This script shows two ways of recording audio with [python-sounddevice](https://python-sounddevice.readthedocs.io/en/0.3.14/#).
In each notebook, you may have to answer questions in *'text cells'*, or to write Python code in *'code cells'*.