Publications
Workshop (International): Lessons on Parameter Sharing across Layers in Transformers
Sho Takase, Shun Kiyono
The Fourth Workshop on Simple and Efficient Natural Language Processing (SustaiNLP 2023)
2023.7.13
We propose a novel parameter sharing method for Transformers (Vaswani et al., 2017). The proposed approach relaxes a widely used technique that shares the parameters of one layer across all layers, as in Universal Transformers (Dehghani et al., 2019), to improve efficiency. We propose three strategies for assigning parameters to each layer: SEQUENCE, CYCLE, and CYCLE (REV). Experimental results show that the proposed strategies are efficient in terms of parameter size and computational time on the machine translation task. We also demonstrate that the proposed strategies remain effective in settings with large amounts of training data, such as the recent WMT competition. Moreover, we show that the proposed strategies are more efficient than the previous approach (Dehghani et al., 2019) on automatic speech recognition and language modeling tasks.
Paper: Lessons on Parameter Sharing across Layers in Transformers (external site)
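
As a rough illustration of the three assignment strategies named in the abstract, the sketch below shows one plausible way to map a small number of shared parameter groups onto a deeper stack of layers. The exact assignment rules, function names, and the example sizes are assumptions made here for illustration only; consult the paper for the precise definitions.

```python
# Hypothetical sketch (not taken from the paper): one plausible layer-to-group
# assignment for M shared parameter groups over N Transformer layers.
# Only the strategy names come from the abstract; the rules below are assumptions.
from typing import List


def assign_groups(strategy: str, num_layers: int, num_groups: int) -> List[int]:
    """Return, for each layer, the index of the parameter group it reuses."""
    if strategy == "sequence":
        # Consecutive layers share the same group: 0,0,1,1,2,2,...
        return [min(i * num_groups // num_layers, num_groups - 1)
                for i in range(num_layers)]
    if strategy == "cycle":
        # Groups are reused in a repeating cycle: 0,1,2,0,1,2,...
        return [i % num_groups for i in range(num_layers)]
    if strategy == "cycle_rev":
        # Like "cycle", but the final repetition runs in reverse order (assumed).
        ids = [i % num_groups for i in range(num_layers)]
        tail = num_layers % num_groups or num_groups
        ids[-tail:] = list(reversed(range(num_groups)))[:tail]
        return ids
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    # Example: 6 layers sharing 3 parameter groups under each strategy.
    for s in ("sequence", "cycle", "cycle_rev"):
        print(s, assign_groups(s, num_layers=6, num_groups=3))
```

With 6 layers and 3 groups, this sketch yields [0, 0, 1, 1, 2, 2] for SEQUENCE, [0, 1, 2, 0, 1, 2] for CYCLE, and [0, 1, 2, 2, 1, 0] for CYCLE (REV); each group index would then point to a single set of layer parameters instantiated once and reused at every layer assigned to it.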