Publications

WORKSHOP (INTERNATIONAL)
Conformer-based sound event detection with semi-supervised learning and data augmentation

Koichi Miyazaki (Nagoya University), Tatsuya Komatsu, Tomoki Hayashi (Human Dataware Lab), Shinji Watanabe (Johns Hopkins University), Tomoki Toda (Nagoya University), Kazuya Takeda (Nagoya University)

Detection and Classification of Acoustic Scenes and Events (DCASE 2020)

November 02, 2020

Sound event detection (SED) is an active area of research, and this paper presents a Conformer-based system for the task. The Conformer is a convolution-augmented Transformer: its convolution modules extract local features from the audio signal, while self-attention captures global context. Because modeling and identifying different types of sound events requires both local and global information, this architecture is well suited to SED. Furthermore, inspired by BERT, which has been highly successful in natural language processing (NLP), the proposed method prepends a special token to the input sequence; this token aggregates information over the whole sequence and is used to predict weak (clip-level) labels. In addition, we apply semi-supervised learning to further improve the model and investigate the impact of several data augmentation techniques. Evaluated on the DCASE2020 Task 4 validation set, the proposed Conformer-based model achieves an event-based macro F1 score of 47.7%, significantly outperforming the baseline system's 34.8%.
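
To make the token-based weak-label idea concrete, the following is a minimal, hypothetical sketch (not the authors' code): a learnable BERT-style token is prepended to the frame sequence before a self-attention encoder, the token output predicts clip-level (weak) labels, and the frame outputs predict frame-level (strong) labels. The class name TokenSEDEncoder, the layer sizes, and the use of a plain nn.TransformerEncoder in place of full Conformer blocks are illustrative assumptions.

```python
# Illustrative sketch only: special-token aggregation for weak-label prediction.
import torch
import torch.nn as nn


class TokenSEDEncoder(nn.Module):
    def __init__(self, n_mels=64, d_model=144, n_heads=4, n_layers=4, n_classes=10):
        super().__init__()
        # CNN front-end for local time-frequency patterns (the Conformer's
        # convolution module plays a similar local-modeling role).
        self.frontend = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(), nn.AvgPool2d((1, 2)),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(), nn.AvgPool2d((1, 2)),
        )
        self.proj = nn.Linear(32 * (n_mels // 4), d_model)
        # Learnable special token, analogous to BERT's [CLS], prepended to the
        # frame sequence so self-attention can aggregate clip-level evidence.
        self.cls_token = nn.Parameter(torch.zeros(1, 1, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=256,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.strong_head = nn.Linear(d_model, n_classes)  # frame-level (strong) labels
        self.weak_head = nn.Linear(d_model, n_classes)    # clip-level (weak) labels

    def forward(self, mel):                                # mel: (batch, time, n_mels)
        x = self.frontend(mel.unsqueeze(1))                # (B, C, T, F')
        x = x.permute(0, 2, 1, 3).flatten(2)               # (B, T, C*F')
        x = self.proj(x)                                   # (B, T, d_model)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        h = self.encoder(torch.cat([cls, x], dim=1))       # (B, 1+T, d_model)
        weak = torch.sigmoid(self.weak_head(h[:, 0]))      # token output -> weak labels
        strong = torch.sigmoid(self.strong_head(h[:, 1:])) # frame outputs -> strong labels
        return strong, weak


if __name__ == "__main__":
    model = TokenSEDEncoder()
    mel = torch.randn(2, 500, 64)     # 2 clips, 500 frames, 64 mel bins
    strong, weak = model(mel)
    print(strong.shape, weak.shape)   # (2, 500, 10) and (2, 10)
```

In this sketch the weak head sees only the special token's output, so the token must attend over all frames to summarize the clip, which is the aggregation behavior described in the abstract.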

Paper: Conformer-based sound event detection with semi-supervised learning and data augmentation (external link)