# Urban Sound Tagging Project

.bold[Simon Leglaive]

class: middle, center

# Introduction

class: middle, center

# You + deep learning = ❤️




You spent more than 10 hours learning the basics of deep learning, including PyTorch practice on toy examples.

Now it's time to solve a real-world problem!


class: center, middle

<div style="text-align:center;margin-bottom:30px">
  <iframe width="700" height="400" src="https://www.youtube.com/embed/d-JMtVLUSEg" frameborder="0" allow="autoplay; encrypted-media" style="max-width:100%" allowfullscreen=""></iframe>
</div>

## Urban Sound Tagging

.small-vspace.left-column[<img src="http://d33wubrfki0l68.cloudfront.net/282c08f73c870b0d68e92024a0248ac73d051daa/91ec9/images/tasks/challenge2016/task4_overview.png" style="width: 470px;" />]

.right-column[Given a 10-second audio recording, predict the presence/absence of 8 urban sounds:
  1. `engine`
  2. `machinery-impact`
  3. `non-machinery-impact`
  4. `powered-saw`
  5. `alert-signal`
  6. `music`
  7. `human-voice`
  8. `dog`
]




This is a .bold[multi-label classification] problem.
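To make the multi-label setting concrete, here is a minimal NumPy sketch (an illustration, not the baseline's code). Each of the 8 tags gets its own sigmoid probability, so several tags can be present at once, and the loss is the binary cross-entropy averaged over tags. The logits, the targets and the 0.5 decision threshold are made-up values chosen for the example. In PyTorch, `torch.nn.BCEWithLogitsLoss` computes this kind of loss directly from the logits.

```python
import numpy as np

# The 8 tags of the task, in the order listed above.
CLASSES = ["engine", "machinery-impact", "non-machinery-impact", "powered-saw",
           "alert-signal", "music", "human-voice", "dog"]

def sigmoid(z):
    """Map raw scores (logits) to independent probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(probs, targets, eps=1e-7):
    """Binary cross-entropy averaged over all tags."""
    probs = np.clip(probs, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(targets * np.log(probs)
                          + (1.0 - targets) * np.log(1.0 - probs)))

# Hypothetical model outputs for one recording (assumed values).
logits = np.array([2.0, -3.0, -1.0, -4.0, 0.5, -2.0, 1.5, -3.0])
# Hypothetical ground truth: only `engine` and `human-voice` are present.
targets = np.array([1, 0, 0, 0, 0, 0, 1, 0], dtype=float)

probs = sigmoid(logits)
# One independent yes/no decision per tag (threshold 0.5 is an assumption).
predicted = [name for name, p in zip(CLASSES, probs) if p > 0.5]
loss = binary_cross_entropy(probs, targets)

print(predicted)
print(f"loss = {loss:.3f}")
```

Unlike softmax outputs in single-label classification, these probabilities do not need to sum to 1.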

## Machine listening

Machine listening focuses on developing algorithms to **analyze, interpret and understand audio data**, including speech, music, and environmental sounds.


It involves techniques from **signal processing** and **machine learning** to solve various tasks in:

- **Speech processing**

  Automatic speech recognition, speaker identification and recognition, speech enhancement, ...

- **Music information retrieval** (MIR)

  Chord and melody recognition, music genre classification, music recommendation, ...

- **Bio-** and **eco-acoustics**

  Animal call recognition, migration / biodiversity / noise pollution monitoring, ...

class: middle, center

# Audio signal representation

class: middle
## Real-world sounds

Real-world sounds are complex; we need **representations** that highlight their characteristics.

.center[Waveform representation of different sounds.]

class: middle

## Towards a “meaningful” representation

What are meaningful properties of an audio signal?

Let’s look at what musicians use to represent sounds: the musical score.

A succession of “audio events” with indicators of **pitch**, **dynamics**, **tempo**, and **timbre**.

class: middle, center

Tempo and rhythm relate to **time** (measured in seconds).

Pitch and timbre relate to **frequency** (measured in Hertz).

Dynamics relates to **intensity** or **power** (measured in decibels).



.alert-g[Given the waveform of an audio signal, we would like to compute a representation
highlighting the characteristics of the signal along these three dimensions.

**Such a representation is given by the spectrogram**.]


class: center, middle, black-slide

<iframe width="100%" height="100%" src="https://musiclab.chromeexperiments.com/Spectrogram/" frameborder="0" allowfullscreen></iframe>

class: middle

  .alert-g[It is easier to **discriminate** between different sounds from their spectrogram representation than from their waveform.]
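As a rough illustration, the sketch below computes a power spectrogram in decibels with NumPy, using a short-time Fourier transform: frequency along one axis, time along the other, and power as the value. The sampling rate, window length, hop size and the two synthetic tones are assumptions made for this example; the baseline system may compute its features differently.

```python
import numpy as np

def power_spectrogram_db(x, n_fft=512, hop=256):
    """Power spectrogram in dB, shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    # Cut the signal into overlapping, windowed frames.
    frames = np.stack([x[i * hop: i * hop + n_fft] * window
                       for i in range(n_frames)])
    # One spectrum per frame, then squared magnitude (power).
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Small constant avoids log(0); transpose to (frequency, time).
    return 10.0 * np.log10(power.T + 1e-10)

sr = 16000                               # assumed sampling rate in Hz
t = np.arange(sr) / sr                   # 1 second of audio
signals = {
    "low tone": np.sin(2 * np.pi * 440 * t),    # 440 Hz sinusoid
    "high tone": np.sin(2 * np.pi * 2000 * t),  # 2000 Hz sinusoid
}

freqs = np.fft.rfftfreq(512, d=1 / sr)   # center frequency of each bin
peaks = {}
for name, x in signals.items():
    S = power_spectrogram_db(x)
    # Frequency bin with the highest average level over time.
    peaks[name] = freqs[S.mean(axis=1).argmax()]
    print(name, S.shape, peaks[name])
```

The two waveforms are both plain oscillations, but in their spectrograms the energy sits in clearly different frequency bins, each within one bin width (`sr / n_fft` = 31.25 Hz) of the true tone frequency.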


.footnote[  .big[🧑‍🏫] .italic["From a pedagogical point of view, spectrograms are great for a deep learning project. <br>They can be (naively) seen as images (good for CNNs) or sequential data (good for RNNs)."]]

class: middle, center

# Project organization

class: middle

## Agenda

- **You should work outside class hours**.
- 6 in-class sessions are scheduled to help you and to evaluate you. 
  - 3rd session: Deadline + Evaluation 📌
  - 6th session: Deadline + Evaluation 📌

    See Edunao for the complete agenda.

- For in-class sessions to be useful for you, prepare material (figures, tables, reports, questions, clean code, ...).

class: middle

## Resources


  <img src="images/gitlab.svg" style="width: 50px;" />


- All resources and instructions (read them carefully!) are available on Gitlab.

- You are provided with a fully-functional baseline system. 
- **Your task is to improve upon this baseline and propose a better urban sound tagging system** 🚀

class: middle

## Tools

- You must work with

<img src="images/pytorch.png" style="width: 150px;" />

- You will use

  <img src="images/cs.jpeg" style="width: 70px;" />

  to have access to computational resources.

class: middle

## Evaluation

In brief:

- 2 intermediary deadlines and evaluated sessions
- 1 final technical report per group
- 1 final video per student

In detail:

- See Edunao

class: middle


The final performance of your system is not the objective and will not count toward your evaluation.

You should target a thoughtful, rigorous, organized and justified approach. This is what really matters, not the final scores.


class: middle, center

# Now, hands on!

