Neural Networks and the Lottery Ticket Hypothesis

2019-05-07 - 3 minutes read

Deep neural networks, computer systems loosely modeled after the human brain, are found in the majority of AI-based products. As you would expect, neural networks aren’t simple systems: they tend to be very large, require huge datasets and expensive hardware, and can take days to train. As a result, training a traditional neural network can be prohibitively expensive.

Michael Carbin, an MIT Assistant Professor, and Jonathan Frankle, a PhD student and IPRI team member, responded to this issue in a paper titled The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks. In this MIT CSAIL project, the researchers detail how these large neural nets contain smaller “subnetworks” that are up to 10 times smaller than the full network. These subnetworks can learn just as well, making equally accurate predictions and sometimes learning faster than the full network. This work may also have implications for transfer learning.

Diagram of a neural network. Image in the public domain via Wikimedia Commons.

Understanding the Lottery Ticket Hypothesis

To explain how this works, the researchers compare traditional deep learning methods to a lottery. In their analogy, training a large neural network is akin to buying every possible lottery ticket to guarantee a win, even though only a few tickets are actually winners. Training a subnetwork, by contrast, would be like buying only the winning tickets. As Carbin states, the “remaining science is to figure [out] how to identify the winning tickets without seeing the winning numbers first.”

Carbin and Frankle tested their lottery ticket hypothesis and the existence of subnetworks by performing a process called pruning, which involves eliminating unneeded connections from a trained network so that it fits on low-power devices. The connections that get pruned are those with the lowest-magnitude weights, i.e., the ones that contribute least to the network’s output.
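
To make the idea concrete, here is a minimal sketch of magnitude pruning in Python with NumPy. This is not the authors’ code; the function name, the 20% pruning rate, and the use of a binary mask are illustrative assumptions.

```python
import numpy as np

def magnitude_prune(weights, prune_fraction=0.2):
    """Zero out the lowest-magnitude fraction of weights (illustrative sketch).

    Returns the pruned weights and a binary mask marking surviving connections.
    """
    # Magnitude below which `prune_fraction` of the weights fall.
    threshold = np.quantile(np.abs(weights), prune_fraction)
    # Keep only connections whose magnitude is at or above that threshold.
    mask = (np.abs(weights) >= threshold).astype(weights.dtype)
    return weights * mask, mask

# Example: prune the weakest 20% of connections in a random weight matrix.
w = np.random.randn(4, 4)
pruned_w, mask = magnitude_prune(w, prune_fraction=0.2)
```

Deep learning frameworks ship their own pruning utilities; the point here is only that “pruning” means zeroing out the smallest weights.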

As Adam Conner-Simons states in MIT News, “[t]heir key innovation was the idea that connections that were pruned after the network was trained might never have been necessary at all.”
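
The procedure described in the paper builds on that idea: train the full network, prune the lowest-magnitude weights, rewind the surviving weights to their original initial values, and retrain, repeating for several rounds. The sketch below is a rough outline under assumptions rather than the authors’ implementation; it reuses the hypothetical magnitude_prune helper above and assumes a train function that keeps masked-out weights at zero.

```python
def find_winning_ticket(init_weights, train, rounds=3, prune_fraction=0.2):
    """Iterative magnitude pruning with rewinding to the original initialization
    (a sketch of the lottery ticket procedure, not the authors' code)."""
    mask = np.ones_like(init_weights)
    for r in range(1, rounds + 1):
        # Train the current subnetwork, starting from the original initial weights.
        trained = train(init_weights * mask, mask)
        # Prune a cumulatively larger fraction each round: 20%, 36%, 49%, ...
        cumulative_fraction = 1.0 - (1.0 - prune_fraction) ** r
        _, mask = magnitude_prune(trained, cumulative_fraction)
    # The "winning ticket": the surviving connections, rewound to their initial values.
    return init_weights * mask, mask
```

The step that matches the quote above is the final rewind: the subnetwork is reset to its original initialization rather than kept at its trained values, which is what lets it train successfully on its own.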

Researchers have not yet determined how to find these subnetworks without first building and pruning a full neural network, but figuring out how to jump directly to the subnetwork step would save time and money. It could also enable individual programmers, not just large companies, to create meaningful models. However, “[u]nderstanding the ‘lottery ticket hypothesis’ is likely to keep researchers busy for years to come,” notes Assistant Professor Daniel Roy in MIT News.

As for the researchers themselves, next on the agenda is determining why specific subnetworks are good at learning and how to efficiently find them.

Read more about this work from the following news stories: