Duality-based Residual Estimation for Fully Offline Value-based Reinforcement Learning - LINE Yahoo Research and Development

Publications

Conference (International) Duality-based Residual Estimation for Fully Offline Value-based Reinforcement Learning

The 29th International Conference on Artificial Intelligence and Statistics (AISTATS 2026)

2026.5.2

Value-based reinforcement learning (RL) handles high-dimensional state spaces efficiently, but existing methods lack a principled way to tune hyperparameters without online interaction, which limits their use in safety-critical and data-scarce domains. We propose the Duality-based Residual Estimator (DRE), a simple offline validation metric for value-based offline RL. DRE is compatible with standard value-based off-policy evaluation (OPE) and enables automatic hyperparameter selection. Our results address a key theoretical bottleneck on the path to fully offline value-based RL, enabling deployment without online tuning.
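As a rough illustration of what offline hyperparameter selection with a validation metric might look like, the sketch below scores several candidate Q-functions on held-out logged transitions and keeps the one with the lowest score. The abstract does not define DRE, so the metric here is a stand-in: a plain mean squared temporal-difference (TD) residual, which is known to be a biased estimate of the Bellman residual in stochastic environments. All names (`mean_squared_td_residual`, `select_offline`, the toy Q-tables and the hyperparameter labels) are hypothetical and not taken from the paper.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# One logged transition: (state, action, reward, next_state, done).
Transition = Tuple[int, int, float, int, bool]
# Tabular Q-function indexed as q[state][action] (an assumption for this toy).
QTable = List[List[float]]


def mean_squared_td_residual(
    q: QTable, data: Sequence[Transition], gamma: float
) -> float:
    """Stand-in validation score (NOT the paper's DRE).

    Averages the squared one-step TD residual with a greedy backup
    over held-out transitions. Lower is better.
    """
    total = 0.0
    for s, a, r, s_next, done in data:
        target = r if done else r + gamma * max(q[s_next])
        total += (q[s][a] - target) ** 2
    return total / len(data)


def select_offline(
    candidates: Dict[str, QTable],
    data: Sequence[Transition],
    gamma: float,
    metric: Callable[[QTable, Sequence[Transition], float], float]
    = mean_squared_td_residual,
) -> str:
    """Return the candidate label with the lowest validation score.

    No environment interaction is needed: only logged data is used.
    """
    scores = {name: metric(q, data, gamma) for name, q in candidates.items()}
    return min(scores, key=scores.get)


# Toy held-out data and two hypothetical candidates trained with
# different learning rates (the Q-values are made up for illustration).
validation_data: List[Transition] = [
    (0, 0, 1.0, 1, False),
    (1, 0, 0.0, 1, True),
]
candidates: Dict[str, QTable] = {
    "lr=1e-3": [[1.0, 0.0], [0.0, 0.0]],
    "lr=1e-1": [[5.0, 0.0], [2.0, 0.0]],
}

best = select_offline(candidates, validation_data, gamma=0.9)
print(best)
```

The `metric` argument is kept pluggable so that a different offline validation score could be swapped in without changing the selection loop.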

Machine Learning / Data Science
Trustworthy AI

Paper: Duality-based Residual Estimation for Fully Offline Value-based Reinforcement Learning