Predicting the parameters of a neural network without training it
Reducing the barriers that deep learning practitioners face is one step towards democratizing deep learning, making the technology more accessible to smaller players in the field. This AI for Good Discovery explores Graham Taylor’s research on removing such barriers for practitioners who lack the background or resources to work with cutting-edge models that demand advanced forms of hardware parallelism. Collaborating with Facebook AI Research (now Meta), his team developed a technique to initialize diverse neural network architectures using a “meta-model”. This research challenges the long-held assumption that gradient-based optimizers are required to train deep neural networks.
Astonishingly, the meta-model can predict parameters for almost any neural network in a single forward pass, achieving roughly 60% accuracy on the popular CIFAR-10 dataset without any training of the target network. Moreover, during its own training the meta-model never observed any network close to the ResNet-50 whose roughly 25 million parameters it predicted. Like the team’s 2020 work on reducing the computational requirements of GANs, this approach lowers the cost of working with deep learning for smaller players such as startup companies and not-for-profits. The work appeared at NeurIPS 2021 and was covered by Anil Ananthaswamy for Quanta Magazine.
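To make the idea concrete, here is a minimal toy sketch of the core mechanism: a "meta-model" (hypernetwork) that, given a description of each layer in a target architecture, emits that layer's weights in one forward pass, so the target network itself is never trained. All names, shapes, and the encoding scheme below are illustrative assumptions, not the actual GHN meta-model from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

EMB = 8  # size of the per-layer description vector (assumption)

# Hypothetical target architecture: two dense layers, each described
# only by its (in_dim, out_dim) shape.
arch = [(4, 16), (16, 3)]

# Meta-model parameters: a single linear map from a layer embedding to a
# weight-generating seed, shared across all layers of the target network.
W_meta = rng.normal(0, 0.1, size=(EMB, 64))

def layer_embedding(in_dim, out_dim):
    """Encode a layer's shape as a fixed-size vector (toy encoding)."""
    v = np.zeros(EMB)
    v[0], v[1] = in_dim / 32.0, out_dim / 32.0
    return v

def predict_weights(in_dim, out_dim):
    """One forward pass of the meta-model -> that layer's weight matrix."""
    seed = np.tanh(layer_embedding(in_dim, out_dim) @ W_meta)  # shape (64,)
    # Tile/reshape the seed into the required shape (toy decoder).
    flat = np.resize(seed, in_dim * out_dim)
    return flat.reshape(in_dim, out_dim)

# Predict every layer's parameters with zero gradient steps on the target.
params = [predict_weights(i, o) for i, o in arch]

# Run the predicted network on a dummy input.
x = rng.normal(size=(1, 4))
h = np.tanh(x @ params[0])
logits = h @ params[1]
print(logits.shape)  # (1, 3)
```

In the real work the meta-model is itself trained (on a large dataset of architectures) so that the predicted parameters perform well; the sketch above only shows the inference-time mechanics of mapping an architecture description to a full set of weights.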