GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding

6 minutes

A podcast discussion of GShard, a module for scaling giant neural networks with conditional computation and automatic sharding, with a focus on its application to multilingual machine translation.