Publications

Conference (International) Over-penalization for Extra Information in Neural IR Models

Kota Usuha (Tsukuba univ.), Makoto P. Kato (Tsukuba univ.), Sumio Fujita

The 33rd ACM International Conference on Information and Knowledge Management (CIKM 2024)

2024.10.21

This paper presents our analysis of neural IR models, focusing on over-penalization for extra information (OPEX), the phenomenon in which adding a sentence to a document causes an unreasonable decline in the document's rank. We found that neural IR models suffer from OPEX, especially when the added sentence is similar to the other sentences in the document. To mitigate OPEX, we propose a window-based scoring approach that segments a document and aggregates the scores of the segments to compute the overall document score. We prove theoretically that the window-based scoring approach fully suppresses OPEX in the extreme case where each segment contains only a single sentence, and show empirically that this approach mitigates OPEX.
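
The window-based scoring idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how segment scores are aggregated, so max aggregation is assumed here. The function names `toy_score` and `window_score` are hypothetical, and `toy_score` is a simple length-normalized term-overlap scorer standing in for a neural IR model.

```python
from typing import Callable, List


def toy_score(query: str, text: str) -> float:
    """Hypothetical stand-in for a neural relevance model.

    Returns the fraction of tokens in `text` that are query terms, so
    unrelated extra text lowers the score (an OPEX-like behaviour).
    """
    query_terms = set(query.lower().split())
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(token in query_terms for token in tokens) / len(tokens)


def window_score(
    query: str,
    sentences: List[str],
    score_fn: Callable[[str, str], float],
    window: int = 1,
) -> float:
    """Score every window of `window` consecutive sentences, then aggregate.

    Assumption: aggregation is the maximum over segment scores; the
    abstract only says that segment scores are aggregated.
    """
    if not sentences:
        raise ValueError("document must contain at least one sentence")
    if window < 1:
        raise ValueError("window must be at least 1")
    # Slide one sentence at a time; a short document yields a single segment.
    last_start = max(len(sentences) - window, 0)
    segments = [
        " ".join(sentences[start:start + window])
        for start in range(last_start + 1)
    ]
    return max(score_fn(query, segment) for segment in segments)


query = "neural ranking"
doc = [
    "neural ranking models score documents",
    "they are trained on click logs",
]
extra = "the weather was pleasant"

# Scoring the whole document at once: the extra sentence lowers the score.
full_before = toy_score(query, " ".join(doc))
full_after = toy_score(query, " ".join(doc + [extra]))

# Single-sentence windows with max aggregation: the score does not drop.
win_before = window_score(query, doc, toy_score, window=1)
win_after = window_score(query, doc + [extra], toy_score, window=1)
```

Under these assumptions, appending a sentence with single-sentence windows only adds one more candidate to the maximum, so the document score cannot decrease. This matches the extreme case mentioned in the abstract, although the paper's own proof may rest on a different aggregation function.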

Paper: Over-penalization for Extra Information in Neural IR Models