Download ASPED.b

We are currently preparing the publicly available version of ASPED.b.

| File | Description | Size | Download |
| --- | --- | --- | --- |
| Mini Test Data Package | ASPED.b Session 1 Mini Package: one audio file (.flac), one annotation file (.csv), one metadata file (.xlsx) | — | Download link will be available shortly |
| Audio Data Package | ASPED.b Session 1-4 audio recordings (.flac) and annotations (.csv) | 240 GB (approx. 1 TB after decoding) | Download |
| Video Data Package | ASPED.b Session 1-4 video recordings (.mp4) | Approx. 1 TB | Download |
| Metadata | ASPED.b Session 1-4 metadata (.xlsx): session time and location map, device coordinates, start and end time of recordings | 989 KB | Download |

Data Description

  1. Session Details
  2. Audio Data
  3. Video Data
  4. Annotation Data

1. ASPED.b Session Details

| Session | Date | Location | # of Video Recorders | Total Video Frames | # of Audio Recorders | % of Bus-Obstructed Frames |
| --- | --- | --- | --- | --- | --- | --- |
| Session 1 | July 26-28, 2023 | Fifth Street | 6 | 916,825 | 8 | 2.32% |
| Session 2 | August 9-11, 2023 | Fifth Street | 6 | 932,842 | 7 | 2.47% |
| Session 3 | November 7-9, 2023 | Fifth Street | 4 | 536,822 | 4 | 4.96% |
| Session 4 | November 28-30, 2023 | Fifth Street | 4 | 560,024 | 5 | 2.51% |


2. Audio Data

|-- Session_07262023
|   |-- FifthSt_A
|   |   |-- Audio
|   |   |   |-- [Recorder1]
|   |   |   |   |-- 0001.flac
|   |   |   |   |-- 0002.flac
|   |   |   |   ...
|   |   |   |-- [Recorder2]
|   |   |   ...
|   |-- FifthSt_B
|   |-- FifthSt_C
|   |-- FifthSt_D
|   |-- FifthSt_E
|   |-- FifthSt_F
|-- Session_08092023
|-- Session_11072023
|-- Session_11282023

[Image: audio recorder]

All audio files are encoded as FLAC, a widely used lossless audio format. For more information about FLAC, please refer to its Wikipedia page.

For decoding, you can use ffmpeg by running
ffmpeg -i /PATH/TO/FLAC.flac /PATH/TO/WAV.wav
or use Python packages such as pydub. The audio is about 240 GB in FLAC format and about 1 TB once decoded to WAV.
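To decode an entire session while preserving the directory layout shown above, a small script can walk the tree and invoke ffmpeg per file. This is a minimal sketch, not part of the official dataset tooling; the paths in it are placeholders.

```python
import subprocess
from pathlib import Path

def flac_to_wav_cmd(flac_path, out_dir):
    """Build the ffmpeg command that decodes one FLAC file to WAV."""
    wav_path = Path(out_dir) / (Path(flac_path).stem + ".wav")
    return ["ffmpeg", "-i", str(flac_path), str(wav_path)]

def convert_session(session_dir, out_root, dry_run=True):
    """Mirror a session tree under out_root with decoded WAV files.

    With dry_run=True, only the ffmpeg commands are returned and
    nothing is executed -- useful for checking the ~1 TB of output
    will land where you expect before committing disk space.
    """
    session_dir, out_root = Path(session_dir), Path(out_root)
    cmds = []
    for flac in sorted(session_dir.rglob("*.flac")):
        out_dir = out_root / flac.parent.relative_to(session_dir)
        cmd = flac_to_wav_cmd(flac, out_dir)
        cmds.append(cmd)
        if not dry_run:
            out_dir.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)
    return cmds
```

Running `convert_session("Session_07262023", "Session_07262023_wav", dry_run=False)` would decode every recorder's clips into a parallel WAV tree.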



3. Video Data

|-- Session_07262023
|   |-- FifthSt_A
|   |   |-- Video          * each location has one camera
|   |   |   |-- 0001.mp4
|   |   |   |-- 0002.mp4
|   |   |   ...
|   |-- FifthSt_B
|   |-- FifthSt_C
|   |-- FifthSt_D
|   |-- FifthSt_E
|   |-- FifthSt_F
|-- Session_08092023
|-- Session_11072023
|-- Session_11282023

We set up video cameras, four to six per session (see the table above), to capture footage and determine the actual count of pedestrians walking past the audio recorders. Each camera captured almost 41 hours of footage per session at a rate of 1 frame per second, leading to a combined total of approximately 34 days of recordings across the dataset's 20 individual camera recordings (6 + 6 + 4 + 4 over the four sessions).

Each camera covered multiple audio recording devices, as depicted in the following map. You can find the map in the metadata file.

[Image: video camera installation]
[Map: Fifth Street recorder locations]

This dataset can be used either to reproduce the study or to train computer vision models for pedestrian detection.



4. Annotation Data

Each annotation file lists the number of detected pedestrians within 1-, 3-, 6-, and 9-meter radii around each monitored audio recorder (the "recorder" columns). For ASPED.b, the annotation file also indicates whether a vehicle is present within the 1-, 3-, 6-, or 9-meter radius of each recorder (the "view_recorder" columns). Lastly, the annotation file indicates whether the camera was obstructed by a bus in a given frame (the "busFrame" column).
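A typical preprocessing step is to drop frames flagged as bus-obstructed before using the pedestrian counts as labels. The sketch below assumes hypothetical column names modeled on the description above ("busFrame" is stated in the text, but "frame" and "recorder1_6m" are illustrative; check the actual CSV header).

```python
import csv
import io

# Toy rows mimicking the described schema; real column names may differ.
SAMPLE_CSV = """\
frame,recorder1_6m,view_recorder1_6m,busFrame
0,2,0,0
1,0,1,0
2,3,0,1
"""

def usable_frames(csv_text, ped_col="recorder1_6m"):
    """Return (frame, pedestrian count) pairs, skipping bus-obstructed frames."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [(int(r["frame"]), int(r[ped_col]))
            for r in rows if int(r["busFrame"]) == 0]
```

Here `usable_frames(SAMPLE_CSV)` keeps frames 0 and 1 and discards frame 2, whose camera view was blocked by a bus.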

[Sample Annotation Data]
We used the Masked-attention Mask Transformer (Mask2Former) model to detect and annotate the pedestrians passing by the audio recorders in the video recordings, with a prediction confidence threshold of 0.7. For this research, we used the Mask2Former version from OpenMMLab, which was trained on the Microsoft COCO dataset.
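The effect of the 0.7 cutoff can be illustrated with a toy post-filter over per-detection scores. The score values and dictionary layout below are invented for illustration and do not reflect Mask2Former's actual output format.

```python
def keep_pedestrians(detections, threshold=0.7):
    """Keep only 'person' detections whose confidence meets the threshold,
    mirroring the 0.7 cutoff applied during annotation."""
    return [d for d in detections
            if d["label"] == "person" and d["score"] >= threshold]

# Invented example detections for one video frame.
dets = [
    {"label": "person", "score": 0.93},  # kept
    {"label": "person", "score": 0.41},  # below threshold: dropped
    {"label": "bus",    "score": 0.88},  # not a pedestrian: dropped
]
```

`keep_pedestrians(dets)` retains only the first detection; the count of surviving detections per frame becomes the pedestrian label.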



5. Code

We also provide PyTorch utility files to help you work with the ASPED dataset. These files handle data loading, processing, and preparation for model training.

| File | Description | Download |
| --- | --- | --- |
| Dataset File | torch.utils.data.Dataset implementation for the ASPED.b dataset; accesses individual audio clips and their labels; efficiently handles large audio files that may be split into multiple parts | Download |
| DataModule File | pytorch_lightning.LightningDataModule that organizes the entire data pipeline; takes the Dataset above, splits it into training, validation, and test sets, and creates the DataLoaders that feed batches of data to the model; handles data augmentation and class balancing | Download |
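To illustrate the kind of train/validation/test splitting the DataModule performs, here is a minimal, framework-free sketch. The 80/10/10 ratios and the seed are illustrative only, not the values used in the provided files.

```python
import random

def split_indices(n, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle dataset indices and partition them into train/val/test lists.

    A fixed seed keeps the split reproducible across runs, which matters
    when comparing models trained on the same dataset.
    """
    assert abs(sum(ratios) - 1.0) < 1e-9, "ratios must sum to 1"
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

In the real DataModule, the resulting index lists would be wrapped in `torch.utils.data.Subset` objects and handed to DataLoaders.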