Publications

Conference (International)

Lattice Path Edit Distance: A Romanization-aware Edit Distance for Extracting Misspelling-Correction Pairs from Japanese Search Query Log

Nobuhiro Kaji

The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023)

2023.12.8

Edit distance has been used successfully to extract training data for spelling correction models, i.e., misspelling-correction pairs, from search query logs in languages such as English. However, this success does not readily carry over to Japanese. Because Japanese text is typically entered through romanization-based input methods, misspellings are often dissimilar to their correct spellings on the surface. To address this problem, we introduce lattice path edit distance, which uses romanization lattices to efficiently consider all possible romanized forms of the input strings. Experiments on Japanese search query logs demonstrated that lattice path edit distance outperformed baseline methods, including the standard edit distance combined with an existing transliterator and morphological analyzer.
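The abstract does not spell out the algorithm. As a rough illustration of the general idea only, the following Python sketch computes the minimum edit distance over all pairs of paths through two character lattices, where each lattice is a directed acyclic graph whose paths spell alternative romanizations of a string. It uses a dynamic program over pairs of lattice nodes, which reduces to ordinary Levenshtein distance when each lattice is a single chain. The helper names `build_lattice` and `lattice_edit_distance`, the unit edit costs, and the toy romanizations are hypothetical assumptions; this is not the authors' implementation, and the paper's actual formulation may differ.

```python
from typing import List, Tuple

# Hypothetical lattice representation: out[u] lists outgoing edges (v, ch).
# Node ids are topologically ordered (every edge goes from u to v with u < v);
# node 0 is the start node and the last node is the end node.
Lattice = List[List[Tuple[int, str]]]


def build_lattice(segments: List[List[str]]) -> Lattice:
    """Hypothetical helper: build a character-level lattice from a sequence of
    segments, each given as a list of alternative non-empty spellings
    (e.g. ["shi", "si"] for two romanizations of the same kana)."""
    out: Lattice = [[]]
    start = 0
    for alternatives in segments:
        pending = []  # (node, last character) edges that must reach the segment end
        for spelling in alternatives:
            u = start
            for ch in spelling[:-1]:
                out.append([])
                v = len(out) - 1  # new intermediate node, always > u
                out[u].append((v, ch))
                u = v
            pending.append((u, spelling[-1]))
        out.append([])
        end = len(out) - 1  # allocated last, so it follows all intermediates
        for u, ch in pending:
            out[u].append((end, ch))
        start = end
    return out


def lattice_edit_distance(a: Lattice, b: Lattice) -> int:
    """Minimum unit-cost edit distance between any path through `a` and any
    path through `b`. d[i][j] is the cheapest alignment of some path prefix of
    `a` ending at node i with some path prefix of `b` ending at node j."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * m for _ in range(n)]
    d[0][0] = 0
    # Row-major order is valid because every transition increases i, j, or both.
    for i in range(n):
        for j in range(m):
            cost = d[i][j]
            if cost == inf:
                continue
            for i2, ca in a[i]:
                # Delete a character on the path through `a`.
                d[i2][j] = min(d[i2][j], cost + 1)
                for j2, cb in b[j]:
                    # Match (cost 0) or substitute (cost 1).
                    d[i2][j2] = min(d[i2][j2], cost + (ca != cb))
            for j2, _cb in b[j]:
                # Insert a character on the path through `b`.
                d[i][j2] = min(d[i][j2], cost + 1)
    return int(d[n - 1][m - 1])


if __name__ == "__main__":
    # Toy example: one kana segment romanized as either "shi" or "si".
    query = build_lattice([["shi", "si"], ["ta"]])
    target = build_lattice([["sita"]])
    print(lattice_edit_distance(query, target))
```

On plain strings, "shita" and "sita" differ by one edit, but because the first lattice also contains the path "sita", the sketch finds a zero-cost alignment and prints 0. Enumerating every romanized form explicitly would grow exponentially with the number of ambiguous segments, whereas the node-pair table here has size proportional to the product of the two lattices' node counts.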

Paper: Lattice Path Edit Distance: A Romanization-aware Edit Distance for Extracting Misspelling-Correction Pairs from Japanese Search Query Log