Scaling towards AGI
Abstract: In this talk, I will take you on a tour of large language models, tracing their evolution from Recurrent Neural Networks (RNNs) to the Transformer architecture. We will explore how Transformers sidestep the vanishing and exploding gradient problems that plagued RNNs. I will then introduce neural scaling laws, empirical power-law relationships reminiscent of scaling behavior in physics, which predict how model performance improves as computational investment grows.
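
As a minimal sketch of the kind of relationship meant (the symbols below are illustrative, not results from the talk), such scaling laws are typically fit as a power law relating test loss $L$ to training compute $C$:

$$L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C},$$

where $C_c$ and $\alpha_C$ are empirically fitted constants. Because the exponent is small in practice, each constant-factor reduction in loss requires a multiplicative, not additive, increase in compute.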