Publications
OTHERS (INTERNATIONAL)
Inference-Aware Meta-Alignment of LLMs via Non-Linear GRPO
Shokichi Takakura, Akifumi Wachi, Rei Higuchi (The University of Tokyo/RIKEN AIP), Kohei Miyaguchi, Taiji Suzuki (The University of Tokyo/RIKEN AIP)
arXiv.org
February 03, 2026
Aligning large language models (LLMs) to diverse human preferences is fundamentally challenging because different criteria often conflict with one another. Inference-time alignment methods have recently gained popularity because they allow LLMs to be aligned to multiple criteria via different alignment algorithms at inference time. However, inference-time alignment is computationally expensive, since it often requires multiple forward passes of the base model. In this work, we propose inference-aware meta-alignment (IAMA), a novel approach that enables LLMs to be aligned to multiple criteria with a limited computational budget at inference time. IAMA trains a base model so that it can be effectively aligned to multiple tasks via different inference-time alignment algorithms. To solve the non-linear optimization problems involved in IAMA, we propose non-linear GRPO, which provably converges to the optimal solution in the space of probability measures.
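For readers unfamiliar with GRPO, the standard objective (Group Relative Policy Optimization, Shao et al., 2024) is sketched below as background. This is not the paper's formulation: the non-linear variant introduced here presumably modifies this standard form, and the exact construction is given in the paper. All symbols (group size G, rewards r_i, clipping range epsilon, KL weight beta, reference policy pi_ref) follow standard GRPO notation, not this paper's.

% Background: standard GRPO objective (Shao et al., 2024), not the paper's
% non-linear variant. For each prompt q, sample a group of G responses
% o_1, ..., o_G from \pi_{\mathrm{old}}, score them with scalar rewards
% r_1, ..., r_G, and form group-relative advantages:
\[
  \hat{A}_i = \frac{r_i - \operatorname{mean}(r_1,\dots,r_G)}{\operatorname{std}(r_1,\dots,r_G)} .
\]
% The policy is updated with a clipped importance-weighted objective plus a
% KL penalty toward a reference policy (written here as a single KL term;
% the original uses a per-token estimate):
\[
  \mathcal{J}(\theta) = \mathbb{E}\!\left[\frac{1}{G}\sum_{i=1}^{G}\frac{1}{|o_i|}\sum_{t=1}^{|o_i|}
  \min\!\big(\rho_{i,t}\hat{A}_i,\ \operatorname{clip}(\rho_{i,t},\,1-\epsilon,\,1+\epsilon)\,\hat{A}_i\big)\right]
  - \beta\, D_{\mathrm{KL}}\!\big(\pi_\theta \,\|\, \pi_{\mathrm{ref}}\big),
\]
% where the per-token importance ratio is
\[
  \rho_{i,t} = \frac{\pi_\theta(o_{i,t}\mid q,\, o_{i,<t})}{\pi_{\mathrm{old}}(o_{i,t}\mid q,\, o_{i,<t})} .
\]

Because the advantage enters this objective linearly, standard GRPO optimizes a linear functional of the reward; the "non-linear" in the paper's title suggests an objective that is non-linear in this sense, though the precise definition should be taken from the paper itself.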
Paper: Inference-Aware Meta-Alignment of LLMs via Non-Linear GRPO (external link)