<!DOCTYPE html>
<html>
  <head>
    <title>Urban Sound Tagging Project</title>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
    <link rel="stylesheet" href="./assets/katex.min.css">
    <link rel="stylesheet" type="text/css" href="./assets/slides.css">
    <link rel="stylesheet" type="text/css" href="./assets/grid.css">
  </head>
  <body>

<textarea id="source">


class: center, middle

# Urban Sound Tagging Project


<br/><br/>
.bold[Simon Leglaive]
<br/><br/>
.tiny[CentraleSupélec]

---
class: middle, center

# Introduction

---
class: middle, center

# You + deep learning = &#10084;&#65039;

.vspace[

]

.alert-g[

You have spent more than 10 hours learning the basics of deep learning, including PyTorch practice on toy examples.

Now it's time to solve a real-world problem!

]

---
class: center, middle

<div style="text-align:center;margin-bottom:30px">
  <iframe width="700" height="400" src="https://www.youtube.com/embed/d-JMtVLUSEg" frameborder="0" allow="autoplay; encrypted-media" style="max-width:100%" allowfullscreen="">
  </iframe>
</div>

---
## Urban Sound Tagging


.small-vspace.left-column[<img src="http://d33wubrfki0l68.cloudfront.net/282c08f73c870b0d68e92024a0248ac73d051daa/91ec9/images/tasks/challenge2016/task4_overview.png" style="width: 470px;" />]

.right-column[Given a 10-second audio recording, predict the presence/absence of 8 urban sounds:
  1.  `engine` 
  2. `machinery-impact` 
  3. `non-machinery-impact` 
  4. `powered-saw` 
  5. `alert-signal` 
  6. `music` 
  7. `human-voice` 
  8. `dog`
]

.reset-column[

]

.small-nvspace[
]

This is a .bold[multi-label classification] problem.
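
---
class: middle

In a multi-label setting, the tags are not mutually exclusive: the model outputs one **independent probability per tag** (a sigmoid per class rather than a softmax over classes). Below is a minimal PyTorch sketch with dummy tensors; the names and sizes are illustrative, not taken from the baseline.

```python
import torch
import torch.nn as nn

num_tags = 8
logits = torch.randn(4, num_tags)                     # dummy model scores, shape (batch, tags)
targets = torch.randint(0, 2, (4, num_tags)).float()  # multi-hot presence/absence labels

loss = nn.BCEWithLogitsLoss()(logits, targets)  # one binary cross-entropy term per tag
probs = torch.sigmoid(logits)                   # independent probability per tag
preds = (probs > 0.5).float()                   # presence/absence decisions
```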

---
## Machine listening

.alert-g[
Machine listening focuses on developing algorithms to **analyze, interpret and understand audio data**, including speech, music, and environmental sounds.
]

--

It involves techniques from **signal processing** and **machine learning** to solve various tasks in
- **Speech processing**

  Automatic speech recognition, speaker identification and recognition, speech enhancement, ...
- **Music information retrieval** (MIR) 

  Chord and melody recognition, music genre classification, music recommendation, ...

- **Bio-** and **eco-acoustics**
  
  Animal call recognition, migration / biodiversity / noise pollution monitoring, ...


---
class: middle, center

# Audio signal representation

---
class: middle
## Real-world sounds

Real-world sounds are complex; we need **representations** that highlight their characteristics.

.grid[
.kol-1-2[.width-80[![](images/speech.png)]]
.kol-1-2[.width-80[![](images/piano.png)]]
.kol-1-2[.width-80[![](images/drums.png)]]
.kol-1-2[.width-80[![](images/dog.png)]]
.kol-1-2[.width-80[![](images/subway.png)]]
.kol-1-2[.width-80[![](images/havana.png)]]
]
.center[Waveform representation of different sounds.]
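
---
class: middle

To inspect such waveforms yourself, a recording can be loaded with `torchaudio`; the file name below is hypothetical.

```python
import torchaudio

# Load a clip; torchaudio.load returns the waveform tensor and its sample rate.
waveform, sample_rate = torchaudio.load("dog_bark.wav")  # hypothetical file
print(waveform.shape, sample_rate)  # e.g. torch.Size([1, 441000]) 44100
```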

---
class: middle

## Towards a “meaningful” representation

What are meaningful properties of an audio signal?

Let’s look at what musicians use to represent sounds: the musical score.
  
.center.width-80[![](images/partitions_meaningful.svg)]

A succession of “audio events” with indicators of **pitch**, **dynamics**, **tempo**, and **timbre**.

---
class: middle, center

Tempo and rhythm relate to **time** (measured in seconds).

Pitch and timbre relate to **frequency** (measured in Hertz).

Dynamics relates to **intensity** or **power** (measured in decibels).

.vspace[

]

.alert-g[Given the waveform of an audio signal, we would like to compute a representation
highlighting the characteristics of the signal along these three dimensions.

**Such a representation is given by the spectrogram**.

]
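
---
class: middle

As an illustration, a log-mel spectrogram can be computed with `torchaudio`. The parameter values below are common choices, not the project baseline's settings.

```python
import torch
import torchaudio

waveform = torch.randn(1, 10 * 44100)  # stand-in for a 10-second clip at 44.1 kHz

mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=44100, n_fft=2048, hop_length=512, n_mels=64
)
to_db = torchaudio.transforms.AmplitudeToDB()

spec = to_db(mel(waveform))  # shape: (1, 64, time_frames), log scale (dB)
```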

---
class: center, middle, black-slide

<iframe width="100%" height="100%" src="https://musiclab.chromeexperiments.com/Spectrogram/" frameborder="0" allowfullscreen></iframe>


---
class: middle

  .alert-g[It is easier to **discriminate** between different sounds from their spectrogram representation than from their waveform.]

.center.width-30[![](images/addll_TF.png)]


.footnote[  .big[🧑‍🏫] .italic["From a pedagogical point of view, spectrograms are great for a deep learning project. <br>They can be (naively) seen as images (good for CNNs) or as sequential data (good for RNNs)."]]
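
---
class: middle

A sketch of those two (naive) views of a batch of spectrograms; the tensor sizes are hypothetical.

```python
import torch

spec = torch.randn(4, 64, 431)    # (batch, mel_bins, time_frames), hypothetical sizes

# Image-like view for a CNN: add a singleton channel dimension.
cnn_input = spec.unsqueeze(1)     # (batch, 1, 64, 431)

# Sequence view for an RNN: one 64-dim feature vector per time frame.
rnn_input = spec.transpose(1, 2)  # (batch, 431, 64)
```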


---
class: middle, center

# Project organization

---
class: middle

## Agenda

- **You should work outside class hours**.
  
- 6 in-class sessions are scheduled to help you and to evaluate you. 
  - 3rd session: Deadline + Evaluation 📌
  - 6th session: Deadline + Evaluation 📌

  See Edunao for the complete agenda.


- To make the in-class sessions useful, prepare material in advance (figures, tables, reports, questions, clean code, ...).

---
class: middle

## Resources

  .alert-g[

  <img src="images/gitlab.svg" style="width: 50px;" />
  <br/>https://gitlab-research.centralesupelec.fr/sleglaive/urban-sound-tagging-project

  ]

- All resources and instructions (read them carefully!) are available on GitLab.

- You are provided with a fully functional baseline system. 
  
- **Your task is to improve upon this baseline and propose a better urban sound tagging system** 🚀

---
class: middle

## Tools


- You must work with

.center[
<img src="images/pytorch.png" style="width: 150px;" />
<br/>https://pytorch.org/
]

- You will use

  .center[
  <img src="images/cs.jpeg" style="width: 70px;" />
  <br/>https://mydocker.centralesupelec.fr/
  ]

  to access computational resources.

---
class: middle

## Evaluation

In brief:

- 2 intermediate deadlines with evaluated sessions
- 1 final technical report per group
- 1 final video per student


In detail:

- See Edunao

---
class: middle

.alert[

The final performance of your system is not the objective and will not count towards your evaluation.

Aim for a thoughtful, rigorous, organized, and well-justified approach: that is what really matters, not the final scores.

]


---
class: middle, center

# Now, hands on!


</textarea>
  <script src="./assets/remark-latest.min.js"></script>
  <script src="./assets/auto-render.min.js"></script>
  <script src="./assets/katex.min.js"></script>
  <script type="text/javascript">
      function getParameterByName(name, url) {
          if (!url) url = window.location.href;
          name = name.replace(/[\[\]]/g, "\\$&");
          var regex = new RegExp("[?&]" + name + "(=([^&#]*)|&|#|$)"),
              results = regex.exec(url);
          if (!results) return null;
          if (!results[2]) return '';
          return decodeURIComponent(results[2].replace(/\+/g, " "));
      }

      var options = {sourceUrl: getParameterByName("p"),
                     highlightLanguage: "python",
                     // highlightStyle: "tomorrow",
                     // highlightStyle: "default",
                     highlightStyle: "github",
                     // highlightStyle: "googlecode",
                     // highlightStyle: "zenburn",
                     highlightSpans: true,
                     highlightLines: true,
                     ratio: "16:9"};

      var renderMath = function() {
          renderMathInElement(document.body, {delimiters: [ // mind the order of delimiters(!?)
              {left: "$$", right: "$$", display: true},
              {left: "$", right: "$", display: false},
              {left: "\\[", right: "\\]", display: true},
              {left: "\\(", right: "\\)", display: false},
          ]});
      }
    var slideshow = remark.create(options, renderMath);
  </script>
  </body>
</html>