
Transfer Learning History

The path from convolutional networks dominating image tasks to seeking similar approaches for NLP, culminating in transformers and GPT models


Transfer learning evolved from AlexNet's dominance of image recognition in 2012 to the eventual rise of GPT models, a path significantly delayed by hardware constraints. While fields like radiology saw an earlier "explosion" driven by convolutional neural networks, the broader NLP revolution was bottlenecked by anemic GPU memory, which made training large transformers impossible until the early 2020s. The transition ultimately required a massive leap in compute and scale, along with a tipping point in the belief that large-scale models would pay off. Along the way, ML demand drove consumer GPU prices sharply higher in the late 2010s, to the frustration of gamers.

2 comments tagged with this topic

A much earlier major win for deep learning was AlexNet for image recognition in 2012. It dominated the competition, and within a couple of years it was effectively the only way to do image tasks. I think it was Jeremy Howard who wrote a paper around 2017 wondering when we’d get a transfer learning approach that worked as well for NLP as convnets did for images. The attention paper that year didn’t immediately dominate. The hardware wasn’t good enough, and there wasn’t consensus that scale would solve everything. It took roughly five more years before GPT-3 took off and started the current wave. I also think you might be discounting exactly how much compute is used to train these monsters. A single 1 GHz processor would take about 100,000,000 years to train something in this class. Even with on the order of 25k GPUs, training GPT-3-size models takes a couple of months. GPUs a decade ago had anemic RAM (I think we had K80 GPUs with 12GB, vs. hundreds of GBs on H100/H200 today), and it was actually completely impossible to train a large transformer model prior to the early 2020s. I’m also reminded how much gamers complained in the late 2010s about GPU prices skyrocketing because of ML use.
If you’re in the radiology field, it started “exploding” much earlier, with CNNs.