%% Cell type:markdown id: tags:
# The SONYC Urban Sound Tagging (SONYC-UST) dataset
%% Cell type:markdown id: tags:
## Preparing the dataset
%% Cell type:markdown id: tags:
### Download the dataset
Below, you will download the [SONYC-UST dataset](https://zenodo.org/record/2590742#.XIkTPBNKjuM) into the project's data folder (see the paths defined below). The dataset was released along with the following paper, which you should read:
> Cartwright, M., Mendez, A.E.M., Cramer, J., Lostanlen, V., Dove, G., Wu, H., Salamon, J., Nov, O., Bello, J.P. SONYC Urban Sound Tagging (SONYC-UST): A Multilabel Dataset from an Urban Acoustic Sensor Network. In Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2019. [PDF](https://dcase.community/documents/workshop2019/proceedings/DCASE2019Workshop_Cartwright_4.pdf)
%% Cell type:code id: tags:
``` python
import os

# Root folder of the project; adjust to your own environment
root_path = "/data/enseignement/2024-2025/3A-IA-DL/UST-project-kick-starter"
# Folder where the SONYC-UST data will be stored
ust_data_path = os.path.join(root_path, "data/ust-data/sonyc-ust")
```
%% Cell type:code id: tags:
``` python
if not os.path.isdir(ust_data_path):
    os.makedirs(ust_data_path, exist_ok=True)  # create a folder to store the data
    os.makedirs(os.path.join(ust_data_path, 'audio-dev'), exist_ok=True)  # create a folder to store the development data
    os.chdir(ust_data_path)
    # download the annotations, audio archives, taxonomy, and README from Zenodo
    !wget https://zenodo.org/record/3338310/files/annotations.csv
    !wget https://zenodo.org/record/3338310/files/audio-dev.tar.gz
    !wget https://zenodo.org/record/3338310/files/audio-eval.tar.gz
    !wget https://zenodo.org/record/3338310/files/dcase-ust-taxonomy.yaml
    !wget https://zenodo.org/record/3338310/files/README.md
    # extract the development audio into audio-dev/, then the evaluation audio in place
    os.chdir("audio-dev")
    !tar xf ../audio-dev.tar.gz
    os.chdir("..")
    !rm audio-dev.tar.gz
    !tar xf audio-eval.tar.gz
    !rm audio-eval.tar.gz
    os.chdir(root_path)
```
%% Cell type:markdown id: tags:
You should end up with the following file structure:
%% Cell type:raw id: tags:
```
data
+-- ust-data
| +-- sonyc-ust
| | +-- audio-dev
| | | +-- train
| | | | +-- 01_000006.wav
| | | | +-- ...
| | | +-- validate
| | | | +-- 00_000066.wav
| | | | +-- ...
| | +-- audio-eval
| | | +-- 00_010346.wav
| | | +-- ...
| | +-- annotations.csv
| | +-- dcase-ust-taxonomy.yaml
| | +-- README.md
```
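%% Cell type:markdown id: tags:
Before moving on, you can quickly verify that the download and extraction produced this structure. The cell below is a minimal sanity check (not part of the original release) that reuses the `ust_data_path` variable defined earlier.
%% Cell type:code id: tags:
``` python
# Check that the expected folders and files are present (assumes ust_data_path from above)
expected = [
    "audio-dev/train",
    "audio-dev/validate",
    "audio-eval",
    "annotations.csv",
    "dcase-ust-taxonomy.yaml",
    "README.md",
]
for rel in expected:
    full = os.path.join(ust_data_path, rel)
    print(("OK      " if os.path.exists(full) else "MISSING ") + full)
```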
%% Cell type:markdown id: tags:
## The SONYC Urban Sound Tagging Dataset
### Description
SONYC Urban Sound Tagging (SONYC-UST) is a dataset for the development and evaluation of machine listening systems for realistic urban noise monitoring. The audio was recorded from the [SONYC](https://wp.nyu.edu/sonyc) acoustic sensor network. Volunteers on the [Zooniverse](https://zooniverse.org) citizen science platform tagged the presence of 23 classes that were chosen in consultation with the New York City Department of Environmental Protection. These 23 fine-grained classes can be grouped into 8 coarse-grained classes. The recordings are split into three sets: training, validation, and testing. The training and validation sets are disjoint with respect to the sensor from which each recording came, and the testing set is displaced in time. For increased reliability, three volunteers annotated each recording, and members of the SONYC team subsequently created a set of ground-truth tags for the validation set using a two-stage annotation procedure in which two annotators independently tagged and then collectively resolved any disagreements.
%% Cell type:markdown id: tags:
### Audio data
The provided audio has been acquired using the SONYC acoustic sensor network for urban noise pollution monitoring. Over 50 different sensors have been deployed in New York City, and these sensors have collectively gathered the equivalent of 37 years of audio data, of which SONYC provides a small subset. The data was sampled by selecting the nearest neighbors on [VGGish](https://github.com/tensorflow/models/tree/master/research/audioset) features of recordings known to have classes of interest. All recordings are 10 seconds and were recorded with identical microphones at identical gain settings. To maintain privacy, the recordings in this release have been distributed in time and location, and the time and location of the recordings are not included in the metadata.
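%% Cell type:markdown id: tags:
As a quick check of these properties, the cell below loads one development recording and prints its sample rate and duration. This is a minimal sketch: it assumes the `soundfile` library is installed, and the filename is simply the first one from the file-structure listing above.
%% Cell type:code id: tags:
``` python
import soundfile as sf  # assumed installed; any audio I/O library would do

# Load one training recording and inspect its basic properties
example_wav = os.path.join(ust_data_path, "audio-dev", "train", "01_000006.wav")
audio, sr = sf.read(example_wav)
print(f"shape: {audio.shape}, sample rate: {sr} Hz, duration: {len(audio) / sr:.1f} s")
```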
%% Cell type:markdown id: tags:
### Label taxonomy
The label taxonomy is as follows:
1. engine
    * 1: small-sounding-engine
    * 2: medium-sounding-engine
    * 3: large-sounding-engine
    * X: engine-of-uncertain-size
2. machinery-impact
    * 1: rock-drill
    * 2: jackhammer
    * 3: hoe-ram
    * 4: pile-driver
    * X: other-unknown-impact-machinery
3. non-machinery-impact
    * 1: non-machinery-impact
4. powered-saw
    * 1: chainsaw
    * 2: small-medium-rotating-saw
    * 3: large-rotating-saw
    * X: other-unknown-powered-saw
5. alert-signal
    * 1: car-horn
    * 2: car-alarm
    * 3: siren
    * 4: reverse-beeper
    * X: other-unknown-alert-signal
6. music
    * 1: stationary-music
    * 2: mobile-music
    * 3: ice-cream-truck
    * X: music-from-uncertain-source
7. human-voice
    * 1: person-or-small-group-talking
    * 2: person-or-small-group-shouting
    * 3: large-crowd
    * 4: amplified-speech
    * X: other-unknown-human-voice
8. dog
    * 1: dog-barking-whining
The classes with an `X` code are used when an annotator could identify the coarse class but not the fine class, either because they were uncertain which fine class it was or because the fine class is not included in the taxonomy. `dcase-ust-taxonomy.yaml` contains this taxonomy in an easily machine-readable form.
**In this project, we are only interested in the 8 coarse-grained labels.**
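%% Cell type:markdown id: tags:
Since the taxonomy file is machine-readable, you can load it directly rather than retyping the list above. A minimal sketch, assuming PyYAML is installed and `ust_data_path` is defined as above:
%% Cell type:code id: tags:
``` python
import yaml  # PyYAML, assumed installed

# Parse the taxonomy file shipped with the dataset
taxonomy_path = os.path.join(ust_data_path, "dcase-ust-taxonomy.yaml")
with open(taxonomy_path) as f:
    taxonomy = yaml.safe_load(f)

# Inspect the parsed structure; its layout follows the taxonomy described above
print(taxonomy)
```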
%% Cell type:markdown id: tags:
### Data splits
This release contains a training subset (2351 recordings), a validation subset (443 recordings), and a test subset (274 recordings). The training and validation subsets are disjoint with respect to the sensor from which each recording came, and were chosen such that the distribution of the labels provided by citizen scientists was similar for each split. The sensors in the test set are not disjoint from those in the training and validation subsets, but the test recordings are displaced in time, occurring after all of the recordings in the training and validation subsets.
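%% Cell type:markdown id: tags:
These counts can be verified directly from `annotations.csv`. The sketch below (assuming pandas is installed) counts the number of distinct recordings per split; note that each recording appears on several rows, one per annotation.
%% Cell type:code id: tags:
``` python
import pandas as pd  # assumed installed

annotations = pd.read_csv(os.path.join(ust_data_path, "annotations.csv"))
# Count unique recordings per split (each recording has one row per annotation)
print(annotations.groupby("split")["audio_filename"].nunique())
```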
%% Cell type:markdown id: tags:
### Annotation data
The annotation data are contained in `annotations.csv` and encompass the training, validation, and test subsets. Each row in the file represents one multi-label annotation of a recording: it may be the annotation of a single citizen science volunteer, of a single SONYC team member, or the ground truth agreed upon by the SONYC team (see the *annotator_id* column description for more information).
#### Columns
*split*
: The data split. (*train*, *validate*)
*sensor_id*
: The ID of the sensor the recording is from. These have been anonymized to have no relation to geolocation.
*audio_filename*
: The filename of the audio recording.
*annotator_id*
: The anonymous ID of the annotator. If this value is positive, the annotation comes from a citizen science volunteer on the Zooniverse platform. If it is negative, it comes from a SONYC team member (only present for the validation set). If it is 0, it is the ground truth agreed upon by the SONYC team.
*(coarse_id)-(fine_id)\_(fine_name)_presence*
: Columns of this form indicate the presence of a fine-level class: `1` if present, `0` if not present. If `-1`, the class was not labeled in this annotation because it was performed by a SONYC team member, who only annotated one coarse group of classes at a time when annotating the validation set.
*(coarse_id)\_(coarse_name)_presence*
: Columns of this form indicate the presence of a coarse-level class: `1` if present, `0` if not present. If `-1`, the class was not labeled in this annotation, for the same reason as above. These columns are computed from the fine-level presence columns and are provided for convenience when training only on coarse-level classes.
*(coarse_id)-(fine_id)\_(fine_name)_proximity*
: Columns of this form indicate the proximity of a fine-level class. After indicating the presence of a fine-level class, citizen science annotators were asked to indicate the proximity of the sound event to the sensor. Only the citizen science volunteers performed this task, so this information is available for the training set but not for the validation set. This column can take one of four values: `near`, `far`, `notsure`, or `-1`. If `-1`, the proximity was not annotated, either because the annotation was not performed by a citizen science volunteer or because the volunteer did not indicate the presence of the class.
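%% Cell type:markdown id: tags:
Putting these column conventions together, the cell below sketches one way to derive a coarse-level multi-label target per training recording from the citizen science annotations. Taking the maximum over annotators is only one possible aggregation choice, not the one prescribed by the dataset authors; pandas is assumed to be installed and `ust_data_path` defined as above.
%% Cell type:code id: tags:
``` python
import pandas as pd  # assumed installed

annotations = pd.read_csv(os.path.join(ust_data_path, "annotations.csv"))

# Keep the citizen science annotations of the training split (annotator_id > 0)
train_ann = annotations[(annotations["split"] == "train") & (annotations["annotator_id"] > 0)]

# Coarse presence columns follow the pattern (coarse_id)_(coarse_name)_presence,
# whereas fine presence columns start with (coarse_id)-(fine_id)
coarse_cols = [c for c in annotations.columns
               if c.endswith("_presence") and c.split("_")[0].isdigit()]

# One possible aggregation: a coarse class is present if any annotator marked it
targets = train_ann.groupby("audio_filename")[coarse_cols].max().clip(lower=0)
print(targets.head())
```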
%% Cell type:markdown id: tags:
### Conditions of use
Dataset created by Mark Cartwright (1,2,3), Ana Elisa Mendez Mendez (1), Graham Dove (2), Jason Cramer (1), Vincent Lostanlen (1,2,4), Ho-Hsiang Wu (1), Justin Salamon (1,5), Oded Nov (6), Juan Pablo Bello (1,2,3)
1. Music and Audio Research Lab, New York University
2. Center for Urban Science and Progress, New York University
3. Department of Computer Science and Engineering, New York University
4. Cornell Lab of Ornithology
5. Adobe Research
6. Department of Technology Management and Innovation, New York University
The SONYC-UST dataset is offered free of charge under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) license:
https://creativecommons.org/licenses/by/4.0/