Fine-Tuning Models for Next-Level AI

In this talk, Roya Kandalan, a senior research scientist focused on AI who holds a PhD in electrical engineering, opens with the history and evolution of AI. Before generative AI, the focus was on predictive AI, which relied on manual feature engineering: practitioners built channels and feature maps by hand to identify entities in images. A significant breakthrough came in 2012, when convolutional neural networks automated the process of identifying objects like cats and dogs in images. Generative AI has since progressed to using large, complex models and vast data sets not only to identify objects, but also to generate new content.

Kandalan highlights the role of large language models (LLMs) in generative AI. These models, which have many parameters and are trained on extensive data, predict the next word in a sequence by assigning a probability to each candidate. For instance, given the phrase "it's raining cats and," the model might predict "dogs" with high confidence. LLMs have revolutionized tasks like sentiment analysis, translation, and classification, which were previously handled by separate, task-specific natural language processing models.
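
The next-word prediction described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular model is implemented: the candidate words and their scores in toy_logits are invented for the example, and a real LLM scores its entire vocabulary with a neural network rather than a hand-written table.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {token: math.exp(score - peak) for token, score in logits.items()}
    total = sum(exps.values())
    return {token: value / total for token, value in exps.items()}


# Hypothetical scores for continuations of "it's raining cats and".
# These numbers are made up for illustration, not taken from a real model.
toy_logits = {"dogs": 6.0, "birds": 2.0, "umbrellas": 1.0, "the": 0.5}

probabilities = softmax(toy_logits)

# Greedy decoding: pick the single most probable candidate.
next_word = max(probabilities, key=probabilities.get)
print(next_word, round(probabilities[next_word], 3))
```

Picking the single most probable word is only one decoding strategy; sampling from the distribution instead is what lets a generative model produce varied output.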

The ability of LLMs to handle multiple modalities is a key advancement. They can transform various types of data, such as images and videos, into numeric representations and integrate them into a latent space. This allows for a comprehensive understanding of concepts like the word "Paris," which encompasses its location, images, and text representation. Kandalan mentions Andrej Karpathy's work on utilizing recurrent neural networks to combine numeric values from images in a latent space for enhanced image captioning.
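
One way to picture a shared latent space is to compare vectors with cosine similarity. The sketch below assumes that some encoder has already turned a piece of text and two photos into vectors; the three-dimensional values are hypothetical and chosen by hand, whereas real embeddings have hundreds or thousands of dimensions and come from a trained model.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, invented for illustration only.
paris_text = [0.9, 0.1, 0.3]   # the word "Paris"
paris_photo = [0.8, 0.2, 0.4]  # a photo taken in Paris
cat_photo = [0.1, 0.9, 0.2]    # an unrelated photo of a cat

same_concept = cosine_similarity(paris_text, paris_photo)
different_concept = cosine_similarity(paris_text, cat_photo)

# Items describing the same concept should sit closer together.
print(same_concept > different_concept)
```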

Kandalan discusses the capabilities of the Gemini models, which offer different levels of quality based on their size. The latest Gemini models can handle larger contexts, allowing for the upload of extensive data, such as two hours of video or 60,000 lines of code. This capability is crucial for making informed decisions with large data sets.

She emphasizes the importance of the 2017 paper "Attention Is All You Need" by researchers at Google, which introduced the transformer architecture. The paper gives a mathematical formulation of the attention mechanism, enabling the efficient handling of large contexts with current computational resources.
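
The core formula from that paper is scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V. The sketch below implements it in plain Python for readability. It is a simplified illustration: the toy token vectors are made up, and the same vectors are used as queries, keys, and values, whereas a real transformer derives each of the three through learned projections and runs many attention heads in parallel on optimized hardware.

```python
import math


def softmax(row):
    """Normalize a list of scores into weights that sum to 1."""
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [value / total for value in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V on lists of vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        output = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(output)
        all_weights.append(weights)
    return outputs, all_weights


# Three toy token vectors, invented for illustration.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = scaled_dot_product_attention(tokens, tokens, tokens)

for row in weights:
    print([round(w, 3) for w in row])
```

Each row of weights shows how much one token attends to every other token, and because every token is compared against all others in a single pass, the computation parallelizes well.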

Kandalan also explains the concept of fine-tuning models to improve their performance on specific tasks. Fine-tuning involves training a pre-existing model with new examples to enhance its quality and robustness for a particular task. She compares this process to consulting a specialist doctor for specific medical issues. Approaches range from adjusting all of the model's parameters to simply providing a few examples in the prompt without changing the model itself, each with its own trade-offs in quality, cost, and computational requirements.
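
The lightest end of that range, supplying a few examples without changing the model, amounts to building a prompt. The sketch below shows one possible layout; the function name build_few_shot_prompt, the "Input:"/"Output:" labels, and the sample reviews are all assumptions made for illustration, not a format required by any particular model or API.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, worked examples, and a new query into one prompt."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Input: {text}")
        lines.append(f"Output: {label}")
        lines.append("")
    # Leave the final output blank so the model completes it.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


# Hypothetical labeled examples for a sentiment task.
examples = [
    ("The battery lasts all day.", "positive"),
    ("The screen cracked within a week.", "negative"),
]

prompt = build_few_shot_prompt(
    "Classify the sentiment of each review.",
    examples,
    "Setup took five minutes and everything worked.",
)
print(prompt)
```

No training happens here, which keeps cost low, but the examples must be sent with every request and compete for space in the context window. Fine-tuning with the same kind of input/output pairs moves that knowledge into the model's parameters instead.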

Near the end of the talk, Kandalan transitions to a live demo using AI Studio, a web interface for rapid prototyping with Gemini APIs. AI Studio allows users to experiment with models and generate code in various languages. She demonstrates how to fine-tune a model by importing a data set and training it to generate specific content. She also introduces Vertex AI, a more advanced platform for fine-tuning models with additional settings and larger data sets.

She concludes by discussing the practical applications of different Gemini models based on quality, speed, cost, and control. Kandalan encourages selecting a model that aligns with specific project needs and emphasizes the ongoing advancements in AI technology.

Kandalan's presentation underscores the transformative potential of generative AI, the importance of fine-tuning models for specialized tasks, and the accessibility of AI tools for rapid prototyping and development. Her insights provide a comprehensive understanding of the current state and future possibilities of AI technology.