Publications

CONFERENCE (INTERNATIONAL) No-regret Bandit Exploration based on Soft Tree Ensemble Model

Shogo Iwazaki, Shinya Suzumura

The Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS 2024)

December 19, 2024

We propose a new stochastic bandit algorithm that estimates rewards with a tree ensemble model. Specifically, we focus on the soft tree model, a variant of the standard decision tree that has been studied both empirically and theoretically in recent years. By deriving several non-trivial properties of soft trees, we extend the existing analysis technique for neural bandit algorithms to our soft-tree-based algorithm. We show that our algorithm attains smaller cumulative regret than the existing neural bandit algorithm, at the cost of a hypothesis space for the soft tree ensemble model that is more constrained than that of a ReLU-based neural network. Our numerical experiments show that our algorithm is competitive with other baseline algorithms.
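
For readers unfamiliar with soft trees, the following is a minimal illustrative sketch of a single soft decision tree's forward pass, not the authors' implementation. It assumes a common formulation in which each internal node routes an input to its left child with probability given by a sigmoid of a linear function, and the prediction is the routing-probability-weighted sum of leaf values. The function name `soft_tree_predict`, the heap-ordered node layout, and the parameter names `W`, `b`, and `leaf_values` are hypothetical choices made for this example; the paper's exact model and scaling may differ.

```python
import numpy as np


def sigmoid(z):
    """Logistic function used as the soft routing gate (an assumed choice)."""
    return 1.0 / (1.0 + np.exp(-z))


def soft_tree_predict(x, W, b, leaf_values):
    """Forward pass of one soft binary tree stored in heap order.

    x           : (d,) input vector
    W, b        : (n_internal, d) and (n_internal,) gate parameters
    leaf_values : (n_internal + 1,) value attached to each leaf

    Internal node i has children 2*i + 1 (left) and 2*i + 2 (right);
    nodes with index >= n_internal are leaves.
    Returns the prediction and the probability of reaching each leaf.
    """
    n_internal = W.shape[0]
    n_leaves = leaf_values.shape[0]
    assert n_leaves == n_internal + 1, "a full binary tree has n_internal + 1 leaves"

    # Probability of routing left at every internal node.
    go_left = sigmoid(W @ x + b)

    # reach[i] is the probability that the input arrives at node i.
    reach = np.zeros(n_internal + n_leaves)
    reach[0] = 1.0
    for i in range(n_internal):
        reach[2 * i + 1] = reach[i] * go_left[i]
        reach[2 * i + 2] = reach[i] * (1.0 - go_left[i])

    leaf_probs = reach[n_internal:]
    return float(leaf_probs @ leaf_values), leaf_probs
```

Unlike a standard decision tree, every leaf contributes to the output, so the prediction is differentiable in the gate parameters. An ensemble would combine several such trees, for example by summing or averaging their outputs; the specific combination rule is left open here.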
