Conference (International): Off-Policy Evaluation of Ranking Policies under Diverse User Behavior

Haruka Kiyohara (Tokyo Institute of Technology), Masatoshi Uehara (Cornell University), Yusuke Narita (Yale University), Nobuyuki Shimizu (Yahoo Japan Corporation), Yasuo Yamamoto (Yahoo Japan Corporation), Yuta Saito (Cornell University)



Ranking interfaces are everywhere in online platforms. There is thus an ever-growing interest in their Off-Policy Evaluation (OPE), which aims to accurately evaluate the performance of ranking policies using logged data. A de facto standard approach for OPE is Inverse Propensity Scoring (IPS), which provides an unbiased and consistent value estimate. However, it becomes extremely inaccurate in the ranking setup due to its high variance under large action spaces. To deal with this problem, previous works assume either independent or cascade user behavior, resulting in ranking-specific versions of IPS. While these estimators are somewhat effective in reducing the variance, all existing estimators apply a single universal assumption to every user, causing excessive bias and variance. Therefore, this work explores a far more general formulation in which user behavior is diverse and can vary depending on the user context. We show that the resulting estimator, which we call Adaptive IPS (AIPS), can be unbiased under any complex user behavior. Moreover, AIPS achieves the minimum variance among all unbiased IPS-based estimators. We further develop a procedure to identify, in a data-driven manner, the appropriate user behavior model that minimizes the MSE. Extensive synthetic and real-world experiments demonstrate that the empirical accuracy improvement can be significant, enabling effective OPE of ranking systems even under diverse user behavior.
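To make the contrast between the estimators concrete, the following is a minimal sketch (not the authors' implementation) of how the per-slot importance weights differ under the three behavior assumptions mentioned above. It assumes factored per-slot propensities and uses illustrative function names; the behavior labels `"standard"`, `"independent"`, and `"cascade"` correspond to vanilla IPS, independent-behavior IPS, and cascade IPS, and an AIPS-style estimate simply picks the behavior model per logged sample.

```python
import numpy as np

def slot_weights(ratio, behavior):
    """Per-slot importance weights for one logged ranking.

    ratio    : (K,) array of per-slot ratios pi_e(a_k|x) / pi_0(a_k|x)
    behavior : which slots influence the user's feedback at slot k
    """
    if behavior == "standard":      # user may examine every slot: full product
        return np.full(len(ratio), np.prod(ratio))
    if behavior == "independent":   # feedback at slot k depends only on slot k
        return np.asarray(ratio, dtype=float).copy()
    if behavior == "cascade":       # feedback at slot k depends on slots 1..k
        return np.cumprod(ratio)
    raise ValueError(f"unknown behavior: {behavior}")

def adaptive_ips_estimate(ratios, rewards, behaviors):
    """AIPS-style value estimate: each sample uses its own behavior model.

    ratios    : (n, K) per-slot importance ratios
    rewards   : (n, K) observed per-slot rewards
    behaviors : length-n list of behavior labels, one per logged sample
    """
    weights = np.stack([slot_weights(r, b) for r, b in zip(ratios, behaviors)])
    return float(np.mean(np.sum(weights * rewards, axis=1)))
```

Note how the full-product weight ("standard") blows up the variance with the ranking length K, while the independent and cascade weights shrink the product to only the slots the assumed behavior model deems relevant; choosing that model per user context is what the abstract's data-driven procedure targets.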

Paper: Off-Policy Evaluation of Ranking Policies under Diverse User Behavior (external site)