Publications
Workshop (International): Lessons on Parameter Sharing across Layers in Transformers
Sho Takase, Shun Kiyono
The Fourth Workshop on Simple and Efficient Natural Language Processing (SustaiNLP 2023)
2023.7.13
We propose a novel parameter sharing method for Transformers (Vaswani et al., 2017). The proposed approach relaxes a widely used technique that shares the parameters of one layer across all layers, as in Universal Transformers (Dehghani et al., 2019), to improve efficiency. We propose three strategies for assigning parameters to each layer: SEQUENCE, CYCLE, and CYCLE (REV). Experimental results show that the proposed strategies are efficient in terms of parameter size and computational time on the machine translation task. We also demonstrate that the proposed strategies remain effective in settings with large amounts of training data, such as the recent WMT competition. Moreover, we show that the proposed strategies are more efficient than the previous approach (Dehghani et al., 2019) on automatic speech recognition and language modeling tasks.
Paper: Lessons on Parameter Sharing across Layers in Transformers (external site)
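
As a rough illustration of the three assignment strategies named in the abstract, the sketch below shows one plausible way to map a small number of shared parameter groups onto a deeper stack of layers. The exact assignment rules, function names, and the example sizes are assumptions made here for illustration only; consult the paper for the precise definitions.

```python
# Hypothetical sketch (not taken from the paper): one plausible layer-to-group
# assignment for M shared parameter groups over N Transformer layers.
# Only the strategy names come from the abstract; the rules below are assumptions.
from typing import List


def assign_groups(strategy: str, num_layers: int, num_groups: int) -> List[int]:
    """Return, for each layer, the index of the parameter group it reuses."""
    if strategy == "sequence":
        # Consecutive layers share the same group: 0,0,1,1,2,2,...
        return [min(i * num_groups // num_layers, num_groups - 1)
                for i in range(num_layers)]
    if strategy == "cycle":
        # Groups are reused in a repeating cycle: 0,1,2,0,1,2,...
        return [i % num_groups for i in range(num_layers)]
    if strategy == "cycle_rev":
        # Like "cycle", but the final repetition runs in reverse order (assumed).
        ids = [i % num_groups for i in range(num_layers)]
        tail = num_layers % num_groups or num_groups
        ids[-tail:] = list(reversed(range(num_groups)))[:tail]
        return ids
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    # Example: 6 layers sharing 3 parameter groups under each strategy.
    for s in ("sequence", "cycle", "cycle_rev"):
        print(s, assign_groups(s, num_layers=6, num_groups=3))
```

With 6 layers and 3 groups, this sketch yields [0, 0, 1, 1, 2, 2] for SEQUENCE, [0, 1, 2, 0, 1, 2] for CYCLE, and [0, 1, 2, 2, 1, 0] for CYCLE (REV); each group index would then point to a single set of layer parameters instantiated once and reused at every layer assigned to it.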