Publications
CONFERENCE (INTERNATIONAL)
Multi-channel separation of dynamic speech and sound events
Takuya Fujimura (Nagoya University), Robin Scheibler
The 24th Annual Conference of the International Speech Communication Association (INTERSPEECH 2023)
August 20, 2023
We propose a multi-channel separation method for moving sound sources. We build upon a recent beamformer for a moving speaker that relies on attention-based tracking: an attention mechanism computes time-varying spatial statistics, which enables tracking of the moving source. While this prior work aimed to extract a single target source, we estimate multiple sources simultaneously. Our main technical contribution is to introduce attention-based tracking into the iterative source steering algorithm for independent vector analysis (IVA), enabling joint estimation of multiple sources. We experimentally show that the proposed method greatly improves separation performance for moving speakers, including an absolute reduction of 27.2% in word error rate compared to time-invariant IVA. In addition, we demonstrate that the proposed method is effective as a pre-processing step for sound event detection, showing an improvement in F1 score of up to 4.7% on real recordings.
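The update the abstract refers to, iterative source steering (ISS) for IVA with per-frame weighted statistics, can be sketched as follows. This is a minimal NumPy illustration under simplified assumptions: the function name iss_sweep, the Laplacian per-frame weights, and the way the attention-derived weights att_weights are folded into those statistics are placeholders for illustration, not the paper's implementation, which computes time-varying spatial statistics with a learned attention mechanism.

```python
import numpy as np

def iss_sweep(Y, att_weights=None, eps=1e-10):
    """One sweep of iterative source steering (ISS) with per-frame weights.

    Y           : (n_src, n_freq, n_frames) complex STFT of current source estimates
    att_weights : optional (n_src, n_frames) attention-derived activity weights;
                  a stand-in for the paper's attention-based tracking (assumption)
    """
    n_src, n_freq, n_frames = Y.shape

    # Per-frame weights from a Laplacian source model: phi_k(t) = 1 / r_k(t)
    r = np.sqrt(np.sum(np.abs(Y) ** 2, axis=1)) + eps       # (n_src, n_frames)
    phi = 1.0 / r
    if att_weights is not None:
        # Fold attention weights into the statistics so they become time-varying
        # (a simplification of the paper's attention-based tracking)
        phi = phi * att_weights

    for s in range(n_src):
        # Weighted statistics between every source estimate and source s
        num = np.einsum("kt,kft,ft->kf", phi, Y, np.conj(Y[s])) / n_frames
        den = np.einsum("kt,ft->kf", phi, np.abs(Y[s]) ** 2) / n_frames + eps
        v = num / den                                        # (n_src, n_freq)
        v[s] = 1.0 - 1.0 / np.sqrt(den[s])                   # self term rescales source s
        # Rank-1 "steering" update: subtract a scaled copy of source s from all estimates
        Y = Y - v[:, :, None] * Y[s][np.newaxis, :, :]
    return Y
```

In practice such a sweep would be repeated for several iterations, and in the proposed method the statistics would change over time as the attention weights track the moving sources; the sketch above only illustrates the weighted rank-1 update itself.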
Paper: Multi-channel separation of dynamic speech and sound events