Summary
ASPED (Audio Sensing for PEdestrian Detection) is a large-scale audio and video dataset prepared for pedestrian detection using sound.
ASPED consists of almost 2,600 hours of audio, more than 3.4 million continuous frames in video,
and corresponding annotation of pedestrian count for each audio and video.
For more information, please take a look at our paper in progress.
Download ASPED v1.0
- We understand that our data package is quite extensive! Feel free to start with our mini test package for a trial. It includes one segment of our audio file along with the matching annotation file.
- Our model is available in our github repository, here.
File | Description | Size | Download |
---|---|---|---|
Mini Test Data Package |
ASPED v1.0 Session 1 Mini Package
|
* | Download |
Audio Data Package | ASPED v1.0 Session 1-5 Audio Recording (.MP4) ASPED 1.0 Session 1-5 Annotation (.CSV) |
580 GB (Approx. 3 TB after decoding) | Download |
Video Data Package | ASPED v1.0 Session 1-5 Video Recording (.MP4) | Approx. 2 TB | Download |
Metadata |
ASPED v1.0 Session 1-5 Metadata (.XLSX)
|
3.12 MB | Download |
Data Description
1. Session Details
Session | Date | Location | # of Video Recorders | Total Video Frames | # of Audio Recorders |
---|---|---|---|---|---|
Session 1 | May 24-26, 2023 | Cadell Courtyard | 1 | 160,473 | 6 |
Tech Walkway | 4 | 616,582 | 9 | ||
Session 2 | June 1-3, 2023 | Cadell Courtyard | 1 | 163,678/td> | 6 |
Tech Walkway | 3 | 460,926 | 6 | ||
Session 3 | June 7-9, 2023 | Cadell Courtyard | 1 | 163,914 | 6 |
Tech Walkway | 3 | 467,919 | 7 | ||
Session 4 | June 21-23, 2023 | Cadell Courtyard | 1 | 156,008 | 6 |
Tech Walkway | 4 | 586,494 | 9 | ||
Session 5 | June 28-30, 2023 | Cadell Courtyard | 1 | 163,903 | 6 |
Tech Walkway | 3 | 466,075 | 7 |
2. Audio Data
-
|--Metadata.xlsx |--Session_5242023 |-- Cadell |--Audio * each location has multiple recorders |--Recorder1_[device-name] |--[audio1].wav |--[audio2].wav ... |--Recorder2_[device-name] ... |--Labels * each location has one annotation file |--5-24-Cadell.csv |-- TechwayA |-- TechwayB |-- TechwayC |-- TechwayD |--Test_6012023 |--Test_6072023 |--Test_6212023 |--Test_6282023
ffmpeg -i /PATH/TO/FLAC.flac -o /PATH/TO/WAV.wav
or use python packages like 'Pydub'.
The WAV format audio will be about 3 TB in total and the FLAC format will be about 580 GB.
3. Video Data
-
|--Metadata.xlsx |--Session_5242023 |-- Cadell |--Video * each location has one camera |--Camera1_[device-name] |--[video1].MP4 |--[video2].MP4 ... |-- TechwayA |-- TechwayB |-- TechwayC |-- TechwayD |--Test_6012023 |--Test_6072023 |--Test_6212023 |--Test_6282023
We set up five video cameras to capture footage to determine the actual count of pedestrians walking past the audio recorders. Each recording session captured almost two days' worth of footage at a rate of 1 frame per second, leading to a combined total of approximately 10 days of recordings in the entire dataset. Each camera covered multiple audio recording devices, as depicted in the following map. You can find the map in the metadata file.
4. Annotation Data
Each annotation file lists the number of detected pedestrians within 1, 3, 6, and 9-meter radii around the monitored audio recorders.
Every row presents the timestamp, frame number, and the count of pedestrians for each audio recorder across the four specified radii.