Scene-Dependent Acoustic Event Detection with Scene Conditioning and Fake-Scene-Conditioned Loss - LY Corporation Research and Development

Publications

Conference (International): Scene-Dependent Acoustic Event Detection with Scene Conditioning and Fake-Scene-Conditioned Loss

Tatsuya Komatsu, Keisuke Imoto (Ritsumeikan University), Masahito Togami

2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2020)

2020.5.4

In this paper, we propose scene-dependent acoustic event detection (AED) with scene conditioning and a fake-scene-conditioned loss. The proposed method employs a multitask network that has not only an AED part but also an acoustic scene classification (ASC) part. The scenes predicted by ASC are used as an additional feature for scene conditioning of AED, so that the network learns the relationship between scenes and events. For efficient training, the proposed method incorporates a new AED loss function, the fake-scene-conditioned loss, in addition to the conventional AED loss. During training, the AED part is conditioned with fake scenes as well as with predicted and true scenes. The fake-scene-conditioned loss is calculated between the fake-scene-conditioned AED results and event labels from which the events that do not exist in the fake scenes have been removed. Whereas training with combinations of true scenes and events (i.e., the conventional AED loss) only reveals that an event is present in a scene, the fake-scene-conditioned loss lets the proposed method learn that an event is absent in a scene. Experimental results show that the proposed method improves AED performance compared with the baseline: an increase of 23% in the F1 score and a decrease of 56% in the false alarm rate for scenes where no event exists.
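The training objective described in the abstract can be sketched roughly as follows. This is a minimal, hypothetical Python illustration, not the authors' implementation. The scene and event vocabularies (`SCENES`, `EVENTS`), the scene-to-event table `SCENE_EVENTS`, the helper names, the choice to condition by appending a scene vector to the AED input, drawing one fake scene uniformly from the scenes other than the true one, and summing the three loss terms with equal weights are all assumptions made for illustration. The ASC loss of the multitask network is omitted.

```python
import math
import random

# Hypothetical vocabularies, for illustration only.
SCENES = ["home", "office", "street"]
EVENTS = ["dishes", "keyboard", "car"]

# Hypothetical table: scene -> indices of events that can occur in it
# (e.g. collected from training annotations; the paper's construction may differ).
SCENE_EVENTS = {"home": {0}, "office": {1}, "street": {2}}


def one_hot(scene):
    """Encode a scene name as a one-hot vector over SCENES."""
    return [1.0 if s == scene else 0.0 for s in SCENES]


def condition(features, scene_vector):
    """Scene conditioning (assumed form): append a scene vector to the AED input."""
    return list(features) + list(scene_vector)


def bce(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over event classes."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(probs)


def fake_scene_targets(event_labels, fake_scene):
    """Remove labels of events that do not exist in the fake scene (set them to 0)."""
    allowed = SCENE_EVENTS[fake_scene]
    return [y if k in allowed else 0.0 for k, y in enumerate(event_labels)]


def aed_loss(aed, features, true_scene, asc_posterior, event_labels, rng):
    """Conventional AED losses plus a fake-scene-conditioned loss (equal weights assumed).

    aed: callable mapping a conditioned input vector to per-event probabilities.
    asc_posterior: scene probabilities predicted by the ASC branch.
    """
    # Conventional loss: condition on the true scene and on the ASC prediction.
    loss = bce(aed(condition(features, one_hot(true_scene))), event_labels)
    loss += bce(aed(condition(features, asc_posterior)), event_labels)

    # Fake-scene-conditioned loss: condition on a scene other than the true one and
    # train against labels that keep only events possible in that fake scene, so the
    # network sees examples of "this event is absent in this scene".
    fake_scene = rng.choice([s for s in SCENES if s != true_scene])
    fake_out = aed(condition(features, one_hot(fake_scene)))
    loss += bce(fake_out, fake_scene_targets(event_labels, fake_scene))
    return loss
```

For example, a `home` clip labeled `[1.0, 0.0, 0.0]` (dishes present) that is conditioned on the fake scene `office` receives the target `[0.0, 0.0, 0.0]`, so a high "dishes" output under that condition is penalized. The conventional loss alone never produces such a negative example for a present event.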

Speech Processing

Paper: Scene-Dependent Acoustic Event Detection with Scene Conditioning and Fake-Scene-Conditioned Loss