
Training Large Neural Networks with Small Network Footprint


Abstract:

Distributed machine learning (ML) systems based on parameter servers are widely used in industry. With the rapid development of GPUs, training performance is often bottlenecked by the network communication needed to exchange gradients and parameters.

In this talk, I will share our work on alleviating this communication bottleneck and speeding up distributed ML training. I will first motivate the problem with measurements on GPU clusters in Azure and EC2. I will then present the design and implementation of our solution, a system called Stanza that separates the training of different layers in ML models by exploiting their distinct characteristics. A prototype of Stanza is implemented on PyTorch. Our evaluation on Azure and EC2 shows that Stanza provides 1.25x to 13x speedups over the parameter server architecture when training common CNNs on ImageNet with Nvidia V100 GPUs and a 10GbE network.
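To make the layer-separation idea concrete, below is a minimal PyTorch sketch (not Stanza's actual implementation; module names, sizes, and the placement strategy are illustrative assumptions) showing a CNN split into a convolutional part and a fully connected part, so that the two can in principle be trained on different groups of nodes with only the small boundary activations and their gradients crossing the network, rather than the large fully connected parameters.

```python
# A minimal sketch, assuming the core idea is to split a CNN at the
# conv/FC boundary; this is NOT the authors' code, and all names and
# shapes here are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvPart(nn.Module):
    """Convolutional layers: compute-heavy but relatively few parameters."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )

    def forward(self, x):
        # Flatten so the output can be handed to the FC part.
        return torch.flatten(self.features(x), 1)


class FCPart(nn.Module):
    """Fully connected layers: typically hold most of the parameters,
    so exchanging them every step dominates communication cost."""
    def __init__(self, in_features, num_classes):
        super().__init__()
        self.classifier = nn.Linear(in_features, num_classes)

    def forward(self, x):
        return self.classifier(x)


# Single-process illustration: in a distributed setting the two modules
# could live on different node groups, and only `activations` (and its
# gradient) would need to travel between them.
conv_part = ConvPart()
fc_part = FCPart(in_features=64 * 16 * 16, num_classes=10)

images = torch.randn(4, 3, 32, 32)
labels = torch.randint(0, 10, (4,))

activations = conv_part(images)   # produced by the "conv" workers
logits = fc_part(activations)     # consumed by the "FC" workers
loss = F.cross_entropy(logits, labels)
loss.backward()                   # gradients flow back across the boundary
```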

Bio:

Hong Xu is an assistant professor in the Department of Computer Science, City University of Hong Kong. His research area is computer networking, particularly data center networks and big data systems. He received the B.Eng. degree from The Chinese University of Hong Kong in 2007, and the M.A.Sc. and Ph.D. degrees from the University of Toronto in 2009 and 2013, respectively. He was the recipient of an Early Career Scheme Grant from the Hong Kong Research Grants Council in 2014. He has received several best paper awards, including at ACM TURC 2017 (SIGCOMM China) and IEEE ICNP 2015. He is a member of ACM and IEEE.

