The path from convolutional networks dominating image tasks to seeking similar approaches for NLP, culminating in transformers and GPT models
Transfer learning took a long path from AlexNet's 2012 dominance in image recognition to the eventual rise of GPT models, a journey significantly delayed by hardware constraints. Fields like radiology saw an earlier "explosion" driven by convolutional neural networks, but the broader NLP revolution was bottlenecked by limited GPU memory, which made training very large transformers impractical until the early 2020s. The shift ultimately required a large jump in compute and scale. It disrupted the consumer market as GPU prices skyrocketed, and belief in large-scale models finally reached a tipping point.
2 comments tagged with this topic