Publications
CONFERENCE (INTERNATIONAL) Over-penalization for Extra Information in Neural IR Models
Kota Usuha (Tsukuba univ.), Makoto P. Kato (Tsukuba univ.), Sumio Fujita
The 33rd ACM International Conference on Information and Knowledge Management (CIKM 2024)
October 21, 2024
This paper presents our analysis of neural IR models, focusing on over-penalization for extra information (OPEX): the phenomenon where adding a sentence to a document causes an unreasonable decline in the document's rank. We found that neural IR models suffer from OPEX, especially when the added sentence is similar to the other sentences in the document. To mitigate OPEX, we propose a window-based scoring approach that segments a document and aggregates the scores of the segments to compute the overall document score. We theoretically proved that the window-based scoring approach fully suppresses OPEX in the extreme case where each segment contains only a single sentence, and empirically showed that this approach mitigates OPEX.
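The window-based scoring idea described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the naive sentence splitter, the `toy_score` stand-in for a neural IR model (a length-normalized term-overlap score, chosen because it visibly penalizes extra text), the window `size` and `stride` parameters, and the use of `max` as the aggregation function. The paper's actual segmentation and aggregation may differ.

```python
import re
from typing import Callable, Iterable, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter on ., ! or ? followed by whitespace (illustrative only)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def make_windows(sentences: List[str], size: int, stride: int) -> List[str]:
    """Group sentences into windows of `size` sentences, advancing by `stride`."""
    if size < 1 or stride < 1:
        raise ValueError("size and stride must be positive")
    windows = []
    for start in range(0, len(sentences), stride):
        windows.append(" ".join(sentences[start:start + size]))
        if start + size >= len(sentences):
            break  # the last window already reaches the end of the document
    return windows


def toy_score(query: str, text: str) -> float:
    """Hypothetical stand-in for a neural ranker: share of tokens that are query terms."""
    query_terms = set(query.lower().split())
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    return sum(t in query_terms for t in tokens) / len(tokens)


def window_score(
    query: str,
    document: str,
    scorer: Callable[[str, str], float] = toy_score,
    size: int = 1,
    stride: int = 1,
    aggregate: Callable[[Iterable[float]], float] = max,  # assumed aggregator
) -> float:
    """Score each window separately, then aggregate into one document score."""
    windows = make_windows(split_sentences(document), size, stride)
    if not windows:
        return 0.0
    return aggregate(scorer(query, w) for w in windows)


if __name__ == "__main__":
    query = "neural ranking"
    doc = "Neural ranking models score documents. They are trained on click logs."
    extended = doc + " The weather was pleasant that day."
    # Whole-document scoring versus window-based scoring, before and after
    # one extra sentence is appended.
    print(toy_score(query, doc), toy_score(query, extended))
    print(window_score(query, doc), window_score(query, extended))
```

Under these assumptions, with single-sentence windows and `max` aggregation, an added sentence only contributes one more candidate score, so the aggregated document score cannot decrease, whereas the whole-document `toy_score` drops as the text grows. This mirrors the extreme case mentioned in the abstract, but only for this toy setup.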
Paper: Over-penalization for Extra Information in Neural IR Models (external link)