Publications

WORKSHOP (INTERNATIONAL)
Lessons on Parameter Sharing across Layers in Transformers

Sho Takase, Shun Kiyono

The Fourth Workshop on Simple and Efficient Natural Language Processing (SustaiNLP 2023)

July 13, 2023

We propose a novel parameter sharing method for Transformers (Vaswani et al., 2017). The proposed approach relaxes a widely used technique that shares the parameters of a single layer across all layers, as in Universal Transformers (Dehghani et al., 2019), in order to improve efficiency. We propose three strategies, SEQUENCE, CYCLE, and CYCLE (REV), for assigning parameters to each layer. Experimental results show that the proposed strategies are efficient in terms of parameter size and computational time on the machine translation task. We also demonstrate that the proposed strategies remain effective in configurations with large amounts of training data, such as the recent WMT competition settings. Moreover, we show that the proposed strategies are also more efficient than the previous approach (Dehghani et al., 2019) on automatic speech recognition and language modeling tasks.
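For readers who want a concrete picture of the three strategies, the following is a minimal sketch (not the authors' implementation) of how N Transformer layers might be mapped onto M shared parameter groups under SEQUENCE, CYCLE, and CYCLE (REV). The function name, group indexing, and edge-case handling are illustrative assumptions; the paper's exact formulation may differ.

```python
# Hypothetical illustration of the layer-to-parameter-group assignment
# strategies described in the abstract above (not the authors' code).

def assign_groups(n_layers: int, n_groups: int, strategy: str) -> list[int]:
    """Return, for each layer index, the id of the parameter group it reuses."""
    if strategy == "sequence":
        # Consecutive layers share parameters, e.g. N=6, M=3 -> [0, 0, 1, 1, 2, 2]
        per_group = n_layers // n_groups
        return [min(i // per_group, n_groups - 1) for i in range(n_layers)]
    if strategy == "cycle":
        # Repeat the M groups cyclically, e.g. N=6, M=3 -> [0, 1, 2, 0, 1, 2]
        return [i % n_groups for i in range(n_layers)]
    if strategy == "cycle_rev":
        # Like CYCLE, but the last M layers reuse the groups in reverse order,
        # e.g. N=6, M=3 -> [0, 1, 2, 2, 1, 0]
        assignment = [i % n_groups for i in range(n_layers - n_groups)]
        assignment += list(reversed(range(n_groups)))
        return assignment
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    for strategy in ("sequence", "cycle", "cycle_rev"):
        print(strategy, assign_groups(6, 3, strategy))
```

In all three cases only M parameter sets are stored while N layers are executed, which is where the reported savings in parameter size come from relative to an unshared N-layer model.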

Paper: Lessons on Parameter Sharing across Layers in Transformers (external link)