Publications
WORKSHOP (INTERNATIONAL) Evaluating Self-Supervised Speech Models Via Text-Based LLMs
Takashi Maekaku, Keita Goto, Jinchuan Tian (Carnegie Mellon University), Yusuke Shinohara, Shinji Watanabe (Carnegie Mellon University)
2025 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2025)
December 08, 2025
Self-Supervised Learning (SSL) has gained traction for its ability to learn rich representations at low labeling cost, applicable across diverse downstream tasks. However, assessing downstream-task performance remains challenging because of the cost of extra training and evaluation. Existing methods for task-agnostic evaluation also require extra training or hyperparameter tuning. We propose a novel evaluation metric using large language models (LLMs). By feeding discrete token sequences derived from SSL models, together with minimal domain cues, into LLMs, we obtain the mean log-likelihood; these cues guide in-context learning, making the score more reliable without extra training or hyperparameter tuning. Experimental results show a correlation between LLM-based scores and performance on the automatic speech recognition task. Additionally, our findings reveal that LLMs not only function as SSL evaluation tools but also provide inference-time embeddings that are useful for the speaker verification task.
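The core scoring step described above — averaging the log-likelihood an LLM assigns to a discrete token sequence — can be sketched as follows. This is an illustrative NumPy stand-in, not the paper's implementation: the function name, array shapes, and the assumption that per-position logits are already available from some language model are all ours.

```python
import numpy as np

def mean_log_likelihood(logits: np.ndarray, tokens: np.ndarray) -> float:
    """Mean log-likelihood of a discrete token sequence under an LLM.

    logits: (T, V) array, where logits[t] are the model's next-token
            logits predicting position t over a vocabulary of size V.
    tokens: (T,) array of observed discrete unit IDs (e.g. SSL tokens).
    """
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Log-probability of each observed token, averaged over positions.
    return float(log_probs[np.arange(len(tokens)), tokens].mean())

# Toy check: uniform logits over a vocabulary of 4 give -log(4) per token.
score = mean_log_likelihood(np.zeros((3, 4)), np.array([0, 1, 2]))
```

In practice the logits would come from a text-based LLM prompted with the domain cues and the SSL-derived token sequence; a higher (less negative) mean log-likelihood indicates the LLM finds the token stream more predictable.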
Paper: Evaluating Self-Supervised Speech Models Via Text-Based LLMs (external link)