Webinar ID: 935 2181 2801
Transformers have recently shown remarkable performance on NLP tasks and have become the new standard in the field. However, we believe they remain relatively understudied and deserve more attention. In our research, we focused on the learning capacity of Transformers and studied whether they can handle multitasking. Experimental results show that they benefit from parameter sharing and have a surprisingly high capacity to serve several tasks simultaneously without requiring additional parameters. Multitasking not only makes it possible to train a multi-functional network on the budget of a single one, but also allows each task to act as an implicit supervisor for the others. In this talk, we discuss scenarios in which complex tasks, such as 'language translation and error correction' or 'video transcript generation and annotation', are carried out at the same time by a single network.
Register for GERAD's Efficient Machine Learning email notifications.