Publications
Conference (International) RNSum: A Large-Scale Dataset for Automatic Release Note Generation via Commit Logs Summarization
Hisashi Kamezawa (U. Tokyo), Noriki Nishida (RIKEN AIP), Nobuyuki Shimizu, Takashi Miyazaki, Hideki Nakayama (U. Tokyo)
Association for Computational Linguistics (ACL)
2022.5.22
A release note is a technical document that describes the latest changes to a software product and is crucial in open source software development. However, automatically generating release notes remains challenging. In this paper, we present a new dataset called RNSum, which contains approximately 82,000 English release notes and the associated commit messages derived from online repositories on GitHub. We then propose class-wise extractive-then-abstractive and abstractive summarization approaches to this task, which can employ a modern transformer-based seq2seq network like BART and can be applied to various repositories without specific constraints. Experimental results on the RNSum dataset show that the proposed methods generate less noisy release notes at higher coverage than the baselines. We also observe a significant gap in the coverage of essential information when compared to human references. Our dataset and code are publicly available.
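The class-wise idea (sort commits into change categories, select content per category, then generate one section per category) can be illustrated with a toy pipeline. This is a minimal sketch, not the authors' method: the keyword classifier, the class names, and the `classify` and `generate_release_note` helpers are hypothetical. The extractive step simply keeps the first few messages per class, and plain bullet formatting stands in for the abstractive generator (BART in the paper).

```python
from collections import defaultdict
from typing import Optional

# Hypothetical keyword rules standing in for a learned commit classifier.
# Dict order also fixes the order of sections in the generated note.
CLASS_KEYWORDS = {
    "Features": ("add", "feat", "support"),
    "Bug Fixes": ("fix", "bug", "patch"),
    "Improvements": ("improve", "refactor", "perf", "speed"),
}


def classify(message: str) -> Optional[str]:
    """Return the first class whose keyword occurs in the message, else None."""
    lowered = message.lower()
    for label, keywords in CLASS_KEYWORDS.items():
        if any(k in lowered for k in keywords):
            return label
    return None  # unmatched commits are treated as noise and dropped


def generate_release_note(commits: list[str], max_per_class: int = 3) -> str:
    """Group commits by class, keep a few per class, and format each class as a section."""
    groups: dict[str, list[str]] = defaultdict(list)
    for msg in commits:
        label = classify(msg)
        if label is not None:
            # Use only the commit's summary line (first non-blank line).
            groups[label].append(msg.strip().splitlines()[0].strip())

    sections = []
    for label in CLASS_KEYWORDS:
        extracted = groups.get(label, [])[:max_per_class]  # toy extractive step
        if extracted:
            # Placeholder for the abstractive step: bullet the extracted lines.
            sections.append(label + ":\n" + "\n".join("- " + s for s in extracted))
    return "\n\n".join(sections)


if __name__ == "__main__":
    commits = [
        "Add support for YAML configs",
        "Fix crash when config file is missing",
        "Bump version to 1.2.0",
        "Improve startup time by caching plugins",
    ]
    print(generate_release_note(commits))
```

Here the version-bump commit matches no class and is dropped, which loosely mirrors how a class-wise approach can filter out noise before generation.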
Paper: RNSum: A Large-Scale Dataset for Automatic Release Note Generation via Commit Logs Summarization (external site)