Song Data Cleansing for End-to-End Neural Singer Diarization Using Neural Analysis and Synthesis Framework - LINE Yahoo Research and Development

Publications

Conference (International) Song Data Cleansing for End-to-End Neural Singer Diarization Using Neural Analysis and Synthesis Framework

Hokuto Munakata, Ryo Terashima, Yusuke Fujita

The 25th Annual Conference of the International Speech Communication Association (INTERSPEECH 2024)

2024.9.3

We propose a data cleansing method that uses a neural analysis and synthesis framework (NANSY++) to train an end-to-end neural diarization (EEND) model for singer diarization. The proposed method converts song data containing chorus singing, which is common in popular music and unsuitable for generating a simulated dataset, into solo singing data. The cleansing is based on NANSY++, a framework trained to reconstruct an input non-overlapped audio signal. We exploit the pre-trained NANSY++ to convert chorus singing into clean, non-overlapped audio. This process mitigates the mislabeling of chorus singing as solo singing and enables effective training of EEND models even when the majority of available song data contains chorus sections. We experimentally evaluated an EEND model trained on a dataset cleansed with the proposed method, using annotated popular duet songs. As a result, the proposed method improved the diarization error rate by 14.8 points.
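The reported gain is stated in points of diarization error rate (DER), so a small worked example of the metric may help. The sketch below is a simplified, frame-level DER in plain Python. It is not the paper's scoring setup: the function name `frame_der`, the 0/1 activity encoding, and the absence of a collar are all assumptions made for illustration. It sums missed, false-alarm, and confused singer frames, takes the best singer-label permutation, and divides by the total number of reference singer frames.

```python
from itertools import permutations


def frame_der(ref, hyp):
    """Simplified frame-level DER (hypothetical helper, no collar).

    ref, hyp: equal-length lists of frames; each frame is a tuple of
    0/1 activity flags, one flag per singer.
    """
    total_ref = sum(sum(frame) for frame in ref)
    if total_ref == 0:
        raise ValueError("reference contains no active singer frames")

    n_singers = len(ref[0])
    best_errors = None
    # Singer labels are arbitrary, so score every label permutation
    # of the hypothesis and keep the one with the fewest errors.
    for perm in permutations(range(n_singers)):
        miss = false_alarm = confusion = 0
        for r, h in zip(ref, hyp):
            h_perm = tuple(h[p] for p in perm)
            n_ref, n_hyp = sum(r), sum(h_perm)
            n_correct = sum(1 for a, b in zip(r, h_perm) if a and b)
            miss += max(n_ref - n_hyp, 0)
            false_alarm += max(n_hyp - n_ref, 0)
            confusion += min(n_ref, n_hyp) - n_correct
        errors = miss + false_alarm + confusion
        if best_errors is None or errors < best_errors:
            best_errors = errors
    return best_errors / total_ref


# Toy duet: frame 3 is an overlapped section in the reference.
ref = [(1, 0), (1, 0), (1, 1), (0, 1)]
hyp = [(1, 0), (1, 1), (1, 0), (0, 1)]
der = frame_der(ref, hyp)  # one false alarm + one miss over 5 reference frames
```

In this toy case the DER is 2 / 5 = 0.4, that is, 40%. An improvement of "14.8 points" refers to an absolute difference between two such percentages, not a relative reduction.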

Speech Processing

Paper : Song Data Cleansing for End-to-End Neural Singer Diarization Using Neural Analysis and Synthesis Framework

PDF : Song Data Cleansing for End-to-End Neural Singer Diarization Using Neural Analysis and Synthesis Framework