EfficientBERT: Effectively trading off model size and accuracy during model compression
In this project I explore two questions: Can we better understand the effects of compression and architecture decisions on model performance? And do architectural decisions (including model size) or distillation properties dominate these trade-offs?
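To make "distillation properties" concrete, here is a minimal sketch of the standard knowledge-distillation objective (soft teacher targets blended with hard labels, in the style of Hinton et al.). This is a generic illustration, not the exact loss used in this project; the temperature and alpha values below are illustrative assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax: higher T softens the distribution,
    # exposing the teacher's "dark knowledge" about non-target classes.
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    # Soft loss: KL(teacher || student) on temperature-softened
    # distributions, scaled by T^2 to keep gradients comparable.
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    soft_loss = temperature ** 2 * sum(
        p * math.log(p / q) for p, q in zip(t, s)
    )
    # Hard loss: ordinary cross-entropy on the ground-truth label.
    hard_probs = softmax(student_logits)
    hard_loss = -math.log(hard_probs[true_label])
    # alpha trades off imitating the teacher vs. fitting the labels.
    return alpha * soft_loss + (1 - alpha) * hard_loss
```

Varying the temperature and the soft/hard mixing weight is one axis of the trade-off study; model size and architecture form the other.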
My work on BERT has been accepted at and presented to leading industry conferences, including RayConnect, RaySummit, MLConf, and the Bay Area NLP Meetup.
Watch the talk
Resources
- Get the trained model checkpoints here
- To play around with the SigOpt dashboard and analyze the results for yourself, take a look at the experiment
- EfficientBERT summary
- EfficientBERT complete paper on Nvidia's devblog
- Slides from MLConf