Notes for "Making AI Accessible" by Andrej Karpathy - Sequoia AI Ascent
Introduction
Andrej Karpathy, a leading AI researcher, recently gave a talk titled "Making AI Accessible" at Sequoia's AI Ascent. He discussed the current trends and challenges in developing large language models (LLMs) and the opportunities they open up for new founders.
The Race to Build an LLM OS
Everyone, including OpenAI, is trying to build an LLM operating system (OS) with multimodal capabilities. The goal is to ship the default apps while fostering a vibrant ecosystem of third-party apps tuned to specific tools. Andrej emphasized that the future of AI lies in an LLM OS that can handle a wide range of tasks and modalities.
Challenges in Fine-tuning Models
Fine-tuning models is a significant challenge in AI. The goal is to add knowledge to a model, which requires a dataset and a training loop. However, as a model is fine-tuned it can start regressing in other areas: the more it specializes in specific tasks, the less effective it may become elsewhere.
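To make the mechanics concrete, here is a minimal sketch of such a fine-tuning loop, assuming a Hugging Face causal LM; the model name, toy dataset, and hyperparameters are illustrative placeholders, not anything from the talk.

```python
# Minimal fine-tuning loop sketch (illustrative; model, data, and
# hyperparameters are placeholder assumptions, not from the talk).
import torch
from torch.optim import AdamW
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in for any causal LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = AdamW(model.parameters(), lr=5e-5)

# Toy "new knowledge" dataset: a handful of domain-specific sentences.
texts = [
    "The LLM OS schedules tools the way a kernel schedules processes.",
    "Fine-tuning adds knowledge but can erode general capabilities.",
]

model.train()
for epoch in range(3):
    for text in texts:
        batch = tokenizer(text, return_tensors="pt")
        # For causal LM training, the labels are the input ids themselves.
        outputs = model(**batch, labels=batch["input_ids"])
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

In practice you would also evaluate on a held-out general benchmark after each epoch, precisely to catch the regressions described above.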
Scaling
Scaling is crucial for training models, but it is very difficult due to the scarcity of talent. Moreover, current GPU infrastructure was not designed with workloads of this kind in mind, and expertise in infrastructure, algorithms, and data is needed beyond raw scale. Karpathy emphasized that scaling is not just about increasing the size of the model, but also about improving the quality of the data and the algorithms used to train it.
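As a rough illustration of what even the simplest ingredient of scale involves, here is a minimal data-parallel training sketch using PyTorch's DistributedDataParallel; the model and loss are placeholders, and the launch command is an assumption about how you would run it.

```python
# Minimal data-parallel scaling sketch (illustrative placeholder model).
# Launch with e.g.: torchrun --nproc_per_node=4 train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")          # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda()   # stand-in for a real LLM
    model = DDP(model, device_ids=[local_rank])  # gradients all-reduced
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(100):
        x = torch.randn(32, 1024, device="cuda")
        loss = model(x).pow(2).mean()        # dummy objective
        loss.backward()                      # DDP syncs gradients here
        optimizer.step()
        optimizer.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()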
Modalities of AI Models
Andrej discussed the distinct split between diffusion models and autoregressive models, two different ways of representing probability distributions. Diffusion models generate data by iteratively denoising a sample drawn from random noise, while autoregressive models generate data by predicting the next value in a sequence. Karpathy suggested there is space to unify these modalities into a best-of-both-worlds solution.
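The contrast between the two sampling procedures can be shown with a toy sketch; the "models" below are random stand-ins rather than trained networks, so this illustrates only the shape of each loop.

```python
# Toy contrast of the two sampling styles (illustrative only; the
# "models" are random placeholders, not trained networks).
import numpy as np

rng = np.random.default_rng(0)

# Autoregressive: build a sequence one token at a time, each step
# sampling from p(next token | tokens so far).
def autoregressive_sample(seq_len=8, vocab=16):
    seq = []
    for _ in range(seq_len):
        logits = rng.normal(size=vocab)      # stand-in for model(seq)
        probs = np.exp(logits) / np.exp(logits).sum()
        seq.append(int(rng.choice(vocab, p=probs)))
    return seq

# Diffusion: start from pure noise and iteratively denoise the whole
# sample, refining all dimensions in parallel at every step.
def diffusion_sample(dim=8, steps=50):
    x = rng.normal(size=dim)                 # pure noise
    for t in range(steps, 0, -1):
        predicted_noise = 0.1 * x            # stand-in for model(x, t)
        x = x - predicted_noise              # one denoising step
    return x

print(autoregressive_sample())
print(diffusion_sample())
```

The structural difference is clear even in the toy: autoregression commits to one element at a time, while diffusion revisits the entire sample at every step, which is part of what makes a unification attractive.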
Energetic Efficiency
There's a massive gap in the energetic efficiency of running these models, and current computers are not ideal for these workloads. Karpathy suggested adapting computer architectures to the new data workflows, leveraging lower precision, exploiting sparsity (the brain is not fully activated at any one time), and rethinking the von Neumann architecture, whose cost of moving data within the computer is off by a factor of roughly a thousand to a million.
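Some back-of-the-envelope arithmetic conveys the scale of that data-movement gap, using commonly cited per-operation energy estimates for a ~45nm process; the figures are approximate and meant only to show orders of magnitude.

```python
# Rough illustration of the data-movement gap, using commonly cited
# ~45nm energy estimates (approximate, order-of-magnitude only).
FP32_ADD_PJ = 0.9     # one 32-bit floating-point add
SRAM_READ_PJ = 5.0    # read 32 bits from on-chip SRAM
DRAM_READ_PJ = 640.0  # read 32 bits from off-chip DRAM

print(f"DRAM read vs FP add: {DRAM_READ_PJ / FP32_ADD_PJ:.0f}x")  # ~700x
print(f"SRAM read vs FP add: {SRAM_READ_PJ / FP32_ADD_PJ:.0f}x")  # ~6x
```

Fetching an operand from DRAM costs hundreds of times more energy than computing with it, which is why keeping data close to compute matters so much for these workloads.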
Conclusion
In summary, Karpathy's talk highlighted the opportunities and challenges in the AI space, emphasizing the need for new founders to address issues like scaling, model fine-tuning, and energetic efficiency. He encouraged them to explore the potential of unifying diffusion and autoregressive models and to consider the energy footprint of their AI systems. By addressing these challenges, new founders can help make AI more accessible and useful to a wider range of people.