# My Journey Through Cohere Labs ML Summer School: From Beginner to AI Enthusiast

![No alternative text description for this image](https://media.licdn.com/dms/image/v2/D5622AQH_V9eyrJKsTQ/feedshare-shrink_800/B56Zei2XV1HoAk-/0/1750783850439?e=1755734400&v=beta&t=g9IIhcLf5ObHfIBcF9TEUhkAjAotjxXxGYG7CpNDAuk align="left")

Ever wondered how Netflix knows exactly what show you'll binge next, or how self-driving cars "see" the road? That's the magic of machine learning (ML for short) — a type of AI where computers spot patterns in data all on their own. If that sounds intriguing but a bit mysterious, you're in the right place!

I recently joined the **Cohere Labs Open Science Community ML Summer School**, a highly accessible program designed for anyone curious about machine learning. As a complete beginner, I stumbled upon it, got excited immediately, and honestly, it turned out to be one of the best ways I could’ve spent my summer.

Every session was packed with **cool ideas, exciting research paper reviews, and mind-blowing concepts**. Sure, there were moments when I didn’t fully understand everything, but I learned to **embrace the confusion**, trust the process, and just keep exploring. And that made all the difference.

---

## **What is the Cohere Labs ML Summer School?**

The **Cohere Labs Open Science Community ML Summer School** is based on a simple but powerful idea: machine learning should be accessible to anyone, no matter their background, location, or experience level. It’s about learning together, staying curious, and building with others.

This summer, Cohere Labs launched an amazing learning initiative featuring speakers from **INRIA, Meta (FAIR), Google DeepMind, Cohere**, and more. These are some of the leading minds in the field, and they shared insights on topics like **foundation models, retrieval systems, multimodal learning**, and even how AI can be used for **social good**.

The best part? It was completely open and beginner-friendly. Whether you were just getting started or already experimenting with models, there was something for everyone. At the end of the program, every participant received a **digital certificate** recognizing their participation.

But what really made it special was the community. Being part of a global group of learners who were all equally excited to explore ML made the experience even more inspiring. It wasn’t just about learning concepts — it was about growing together.

---

## **A Quick Look at the Sessions**

The summer school was structured around a series of live sessions, each led by experts working at the cutting edge of machine learning. Every session brought something new, from core concepts to advanced techniques, and made it surprisingly approachable.

### **Session 1: ML Math Refresher**

**Speaker:** *Katrina Lawrence*  
Applied Mathematician  
**Topic:** *Foundational Math for Machine Learning*

We began the summer school with a back-to-basics session that felt essential -- especially for someone like me coming in without a strong math background.

Katrina walked us through the core mathematical concepts that underpin most machine learning algorithms:

* **Derivatives**
    
* **Vector Calculus**
    
* **Linear Algebra**
    

What I appreciated most was how clearly she explained things. She emphasized that understanding these basics isn’t about memorizing formulas, but about building **intuition**. Whether it’s calculating gradients for optimization or working with matrices in neural networks, this session helped demystify the math behind ML.
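
To make the "gradients for optimization" idea concrete for myself, I wrote a minimal sketch of gradient descent. This is my own toy example, not something shown in the session: the function `f`, its hand-derived derivative `df`, and the learning rate are all assumptions chosen to keep things simple.

```python
def f(x):
    """A simple bowl-shaped function with its minimum at x = 3."""
    return (x - 3.0) ** 2


def df(x):
    """Derivative of f, worked out by hand: d/dx (x - 3)^2 = 2 * (x - 3)."""
    return 2.0 * (x - 3.0)


def gradient_descent(start, learning_rate=0.1, steps=100):
    """Repeatedly step against the derivative to move downhill."""
    x = start
    for _ in range(steps):
        x -= learning_rate * df(x)
    return x


x_min = gradient_descent(start=0.0)
print(x_min)  # ends up very close to 3.0, the bottom of the bowl
```

The derivative tells us which direction is "uphill", so stepping the opposite way shrinks the error a little each time. Training a neural network follows the same pattern, just with many more parameters.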

If you're looking for an approachable way to refresh your math skills, check out her [YouTube channel, Math Unlocked](https://www.youtube.com/@MathUnlockedWithKatrina).

![No alternative text description for this image](https://media.licdn.com/dms/image/v2/D5622AQH9wT4oyNdS7A/feedshare-shrink_2048_1536/B56ZfM5Si.HoA0-/0/1751489268078?e=1755734400&v=beta&t=xDch_2Opk8QWZ2-2j9ayTJs80RtJ8sBuWygcUj1dXT4 align="left")

---

### 🔍 **Session 2: Introduction to Embeddings & Retrieval**

**Speaker:** *Nils Reimers*  
VP of AI Search at Cohere  
**Topic:** *How Embeddings Power Modern Search Systems*

In the second session, we dove into the world of **embeddings and neural retrieval** with Nils Reimers, who leads AI Search at Cohere.

This was a big shift from theory to real-world application. Nils explained how transformer-based models (like BERT) are used to **generate embeddings** — dense numerical representations that capture the meaning of text. These embeddings allow models to search and compare information in a far more nuanced way than traditional keyword methods.

Key concepts he covered:

* **Retriever + Ranker architecture**
    
* **Dense vs. Sparse Embeddings**
    
* **Challenges with limited labeled data**
    
* **Context Engineering for better search relevance**
    

What stood out most to me was how **context engineering** and smart architecture choices can make or break a search system. It was a powerful reminder that great models alone aren’t enough; how you use them really matters.
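
Here is a small sketch of the retrieval step as I understood it: compare a query embedding against document embeddings and keep the closest matches. The three-dimensional vectors and document titles below are made up for illustration. A real system would get its embeddings from a model, and a ranker would then re-score the shortlist.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional "embeddings"; real models produce far more dimensions.
documents = {
    "How to reset your password": [0.9, 0.1, 0.0],
    "Best hiking trails in the Alps": [0.0, 0.2, 0.9],
    "Recovering a locked account": [0.8, 0.3, 0.1],
}

# Pretend this vector encodes the query "I forgot my login".
query_embedding = [0.85, 0.2, 0.05]


def retrieve(query_vec, docs, top_k=2):
    """Retriever step: score every document and keep the top_k most similar."""
    scored = [(cosine_similarity(query_vec, vec), title) for title, vec in docs.items()]
    scored.sort(reverse=True)
    return scored[:top_k]


results = retrieve(query_embedding, documents)
for score, title in results:
    print(f"{score:.3f}  {title}")
```

Notice that neither account-related title shares a keyword with "I forgot my login", yet both land in the shortlist because their vectors point in a similar direction. That is the nuance keyword search misses.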

![No alternative text description for this image](https://media.licdn.com/dms/image/v2/D5622AQFpfQLUuG-b8A/feedshare-shrink_2048_1536/B56ZfM5SiVH8Ao-/0/1751489267026?e=1755734400&v=beta&t=wsmEpRpIsy9LJDb-pnw19yJopGvIjXwOGYTRa_11S9o align="left")

---

### **Session 3: Introduction to Transformers and the Evolution of Large Language Models**

**Speaker:** *Siddhant Gupta*  
NLP Community Lead, Cohere Labs  
**Topic:** *How Transformers Changed the Game in NLP*

On Day 2, we explored the architecture that powers modern AI, **Transformers**, in a session led by **Siddhant Gupta**, who leads the NLP community at Cohere Labs.

We started with a brief history of models that came before transformers:

* **RNNs (Recurrent Neural Networks):** Designed for sequential data, but struggled with long-term memory.
    
* **LSTMs (Long Short-Term Memory networks):** An improvement over RNNs with gated memory units for handling longer sequences better.
    

Then came the real highlight — understanding **Transformers**, which completely reshaped how language models work. Instead of relying on sequential processing, transformers introduced **attention mechanisms**, allowing models to process entire sequences in parallel while still maintaining context.
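
To see what "attention" means mechanically, I put together a stripped-down sketch of scaled dot-product self-attention. It is a simplification under stated assumptions: real transformers apply learned query, key, and value projections and use several heads, while here each token vector plays all three roles directly. The `tokens` values are invented.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with query = key = value = input vector."""
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        # How strongly does this token match every token in the sequence?
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # The output is a weighted blend of all token vectors.
        blended = [
            sum(w * value[i] for w, value in zip(weights, vectors))
            for i in range(dim)
        ]
        outputs.append(blended)
    return outputs


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 2-dimensional token vectors
attended = self_attention(tokens)
```

Each output row depends only on the inputs, not on the other output rows, which is why the whole sequence can be processed in parallel instead of one step at a time as in an RNN.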

#### Key Components of a Transformer:

* **Embedding Layer**: Converts tokens (words or subwords) into dense vectors.
    
* **Positional Encoding**: Adds information about token order to embeddings.
    
* **Self-Attention & Multi-Head Attention**: Enables the model to focus on relevant words throughout the sequence.
    
* **Feedforward Neural Networks**: Processes the attended information.
    
* **Stacked Layers**: Allow deeper understanding by layering multiple transformer blocks.
    

#### Concepts Covered:

* **Tokenization**: Splitting text into smaller units like words or subwords.
    
* **Word Embeddings**: Vector representations that help the model understand meaning and relationships.
    
* **Attention Mechanisms**:
    
    * *Self-Attention*: Each word attends to others in the same input.
        
    * *Cross-Attention*: Used in models like encoder-decoder architectures.
        

#### Transformer Variants:

We also explored different transformer-based models and how they work:

* **BERT** (Encoder-only): For tasks like classification, sentiment analysis, and question answering.
    
* **GPT** (Decoder-only): Ideal for text generation and conversational AI.
    
* **RAG** (Retrieval-Augmented Generation): Merges retrieval and generation, useful for providing accurate and up-to-date responses.
    

Siddhant did a great job breaking down what can be an overwhelming topic into something digestible. I particularly appreciated how he showed real-world applications like **Google Docs suggestions** and **Gmail’s Smart Compose** using BERT, or how GPT is behind models like ChatGPT.

If you're interested in going through the session slides, you can check them out [here](https://lnkd.in/gjzs_e3E).

![No alternative text description for this image](https://media.licdn.com/dms/image/v2/D5622AQF5B3bq8ELoDQ/feedshare-shrink_2048_1536/B56ZfRrYcwHQAs-/0/1751569509124?e=1755734400&v=beta&t=2amvi9aRoMq1I7vxvRBEmmvfyXTWSN83tj0xc8_FgO4 align="left")

---

### **Session 3: Scaling Self-Supervised Learning for Vision — An Introduction to DINOv2**

**Speaker:** *Timothée Darcet*  
PhD Researcher, Meta AI (FAIR) & Inria  
**Topic:** *Self-Supervised Learning in Computer Vision with DINOv2*

This session introduced us to the exciting world of **self-supervised learning (SSL)** in computer vision — a method where models learn to understand images without needing manually labeled data. Instead of relying on external annotations, these models generate their own pseudo-labels during training.

Timothée Darcet explained the motivation behind SSL and walked us through key techniques like **contrastive learning** and **masked image modeling**. The highlight was a deep dive into **DINOv2**, a cutting-edge SSL model used for learning high-quality visual representations.
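
For intuition on contrastive learning, here is a rough sketch of an InfoNCE-style loss. This is a generic textbook formulation that I am using for illustration, not DINOv2's actual training objective. The vectors and the `temperature` value are assumptions.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """Cross-entropy over similarities, where the positive is the 'correct' answer."""
    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, negative) / temperature for negative in negatives]
    highest = max(logits)
    log_sum_exp = highest + math.log(sum(math.exp(l - highest) for l in logits))
    return log_sum_exp - logits[0]


# Toy embeddings: imagine the anchor and positive are two crops of the same image.
anchor = [1.0, 0.0]
good_positive = [0.9, 0.1]   # embedded close to the anchor
bad_positive = [0.1, 0.9]    # embedded far from the anchor
negatives = [[0.0, 1.0], [-1.0, 0.0]]  # other images

loss_good = info_nce_loss(anchor, good_positive, negatives)
loss_bad = info_nce_loss(anchor, bad_positive, negatives)
print(loss_good < loss_bad)
```

The loss is small when two views of the same image sit close together and far from everything else, so minimizing it teaches the model useful features without a single human label.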

#### Key Takeaways:

* **DINOv2** is trained on a curated dataset of **142 million images** using a mix of loss functions (DINO, iBOT, KoLeo).
    
* It outperforms models like **CLIP** in tasks such as segmentation and feature extraction.
    
* Its general-purpose nature makes it suitable for specialized domains, including **medical imaging**.
    
* DINOv2 is particularly strong in feature map quality and interpretability, enabling precise image understanding without labels.
    

This session helped bridge the gap between complex CV models and practical applications, offering a fresh perspective on how vision models are evolving beyond supervised learning.

![No alternative text description for this image](https://media.licdn.com/dms/image/v2/D5622AQHrvOIQeiyIqQ/feedshare-shrink_2048_1536/B56ZfrcyIFHQAo-/0/1752001885378?e=1755734400&v=beta&t=f59pKwDt7ZDzLcaAzjj0AjJ7UGaY89z0e-r6ll364gc align="left")

---

### **Session 4: A Temperature Check on Web Agents**

**Speaker:** *Lawrence Jang*  
Researcher at Meta  
**Topic:** *Autonomous Web Interaction with Language Models*

With large language models gaining the ability to understand and generate text, the next frontier is getting them to **act**, especially on the web. In this session, Lawrence Jang explored the emerging field of **LLM-powered web agents**, which can autonomously navigate websites, click buttons, scroll pages, and even fill out forms using natural language instructions.

#### Highlights:

* **WebArena** was introduced as a benchmark, where **humans achieve 80% task success**, while LLM agents currently achieve only around **14%**, highlighting how early the field still is.
    
* Advanced benchmarks like **VisualWebArena** and **VideoWebArena** extend evaluation to visual and video-based tasks.
    
* **ICAL** is one approach that uses **human feedback** to fine-tune web agents for better task performance.
    
* The session addressed major challenges:
    
    * Following instructions accurately
        
    * Aligning text with visual content
        
    * Memory and long-term planning
        
    * Preventing hallucinations and ensuring safe behavior
        
* We also got a glimpse into practical tools like **LangChain**, and discussions on future directions such as **multi-agent systems**, **visual grounding**, and **ethical considerations**.
    

Together, these two sessions (3 & 4) gave us a look into the cutting edge of AI, from models that learn to see without supervision to agents that learn to act in the digital world. The possibilities, and the challenges, are both massive and inspiring.

![No alternative text description for this image](https://media.licdn.com/dms/image/v2/D5622AQHt6CDzddYcHg/feedshare-shrink_2048_1536/B56Zfrcv8VHEAs-/0/1752001876533?e=1755734400&v=beta&t=Uw2Bg6kPltqpje5PleokLhlnEuyLd4NgBFvPgYaR6Rc align="left")

---

### Session 5: Test-Time Scaling Small LMs to o1 Level

**Speaker:** Isha Puri  
AI PhD at MIT  
**Date:** July 10, 2025

As large language models reach diminishing returns from scale, Isha Puri presented a compelling direction: achieving high performance at **test-time** without retraining. Her method, rooted in **particle-based inference** and **process reward models**, emphasizes diversity, balancing exploration and exploitation during decoding.

The results are remarkable: small models (1.5B parameters) were shown to outperform GPT-4o in just four inference rollouts. For 7B models, scaling up to o1-level capabilities took only 32 rollouts.

What stood out was how this technique bypasses the early pruning limitations of greedy or beam search. By unlocking latent capabilities through smarter inference rather than brute-force training, this approach opens the door to **democratizing powerful LLM reasoning at lower cost, latency, and compute.**
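
To get a feel for the particle idea, I wrote a toy sketch. It is only an illustration under heavy assumptions and not the method from the talk: the "reasoning steps" are random integers, and `toy_process_reward` is a hypothetical stand-in for a real process reward model.

```python
import math
import random


def toy_process_reward(partial_steps, target=10):
    """Hypothetical reward: partial solutions whose sum is near the target score higher."""
    return -abs(target - sum(partial_steps))


def particle_search(num_particles=8, num_steps=6, seed=0):
    rng = random.Random(seed)
    particles = [[] for _ in range(num_particles)]
    for _ in range(num_steps):
        # 1. Extend every particle by one sampled "reasoning step".
        particles = [p + [rng.choice([0, 1, 2, 3])] for p in particles]
        # 2. Score the partial trajectories and convert scores to sampling weights.
        rewards = [toy_process_reward(p) for p in particles]
        best_reward = max(rewards)
        weights = [math.exp(r - best_reward) for r in rewards]
        # 3. Resample: promising particles tend to be duplicated and weak ones tend
        #    to drop out, but low scorers keep a nonzero chance of surviving.
        particles = rng.choices(particles, weights=weights, k=num_particles)
    return max(particles, key=toy_process_reward)


best = particle_search()
print(best, toy_process_reward(best))
```

The key contrast with beam search is step 3. Beam search keeps only the top few candidates at every step, so a path that looks weak early is gone for good. Weighted resampling is softer, which is how I understood the "exploration and exploitation" balance.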

![No alternative text description for this image](https://media.licdn.com/dms/image/v2/D5622AQFeh6vg1d-DbA/feedshare-shrink_2048_1536/B56ZgCP9zhHQAo-/0/1752384404498?e=1756339200&v=beta&t=Q3we6IZ_rUJn2W1NszdysezFXsIXZklda4FjzY5JxmI align="left")

---

### Session 6: Secret Life of Noise — Understanding Diffusion Models

**Speaker:** Gowthami Somepalli  
Research Scientist at Adobe Firefly  
**Date:** July 11, 2025

Gowthami walked us through the evolution of **diffusion models**, the engines behind modern generative art tools like Firefly. Beginning with **DDPMs (Denoising Diffusion Probabilistic Models)** and extending to **DDIMs (Denoising Diffusion Implicit Models, which allow deterministic sampling)**, the session was a deep dive into how structured noise can be harnessed to produce realistic, diverse outputs.

She clarified how **noise schedules** determine the quality and control of generated content, while also introducing **flow matching**, a deterministic framework offering more direct distribution transformation, potentially bridging the gap between variational autoencoders and diffusion models.
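
To see what a noise schedule actually does, here is a small sketch of the forward (noising) half of a DDPM. It assumes the linear schedule from the original DDPM paper (betas from 1e-4 to 0.02 over 1000 steps) and uses a three-number list as a stand-in for an image. The learned denoising network is not part of this sketch.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise added per step, growing linearly from beta_start to beta_end."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product of (1 - beta_s) for s <= t: how much signal is left."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [signal_scale * v + noise_scale * rng.gauss(0.0, 1.0) for v in x0]


alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
rng = random.Random(0)
x0 = [1.0, -1.0, 0.5]  # toy "image" with three pixels

slightly_noisy = add_noise(x0, 10, alpha_bars, rng)   # still mostly signal
mostly_noise = add_noise(x0, 999, alpha_bars, rng)    # almost pure noise
```

The schedule decides how quickly the signal fades. Generation runs this process in reverse, with a trained network removing a little noise at each step.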

Notably, these models offer:

* **Superior sample quality** over GANs
    
* **Stable training dynamics**
    
* **Mathematical rigor**
    
* **Inference-time flexibility**, making them ideal for creative applications.
    
![No alternative text description for this image](https://media.licdn.com/dms/image/v2/D5622AQGWP4jurAy_xg/feedshare-shrink_2048_1536/B56ZgCP9ztHYAo-/0/1752384402432?e=1756339200&v=beta&t=Tm-n5sjQBiLRyhfAFZMZsvkBJDfdNewIpUeRczjnJY0 align="left")

---

### Session 7: Understanding Transformers via N-gram Statistics

**Speaker:** Timothy Nguyen  
AI Researcher at Google DeepMind  
**Date:** July 11, 2025

This session reimagined the transformer’s inner workings not as a black box, but as a **statistical machine**. Timothy Nguyen revealed how **up to 79% of transformer predictions** on the TinyStories dataset could be explained using **optimal N-gram rules** derived from training data.

Key takeaways:

* Low-variance predictions align closely with N-gram patterns
    
* Transformers exhibit **curriculum-like learning**, progressing from simple to complex rules
    
* Introduced a novel, **training-intrinsic metric** to detect overfitting *without needing a validation set*
    

This reframing provides practical tools to better understand **when LLMs memorize, generalize, or hallucinate**, offering a statistically grounded perspective on model interpretability.
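
If "N-gram rule" sounds abstract, here is the simplest version I could write: a bigram model that predicts the next word purely from counts in the training text. The tiny corpus is invented, and the rules studied in the paper are far richer than this, so treat it as a sketch of the basic idea only.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, prev):
    """Predict the most frequent follower of prev, or None if prev was never seen."""
    if prev not in counts:
        return None
    return counts[prev].most_common(1)[0][0]


corpus = "the cat sat on the mat and the cat slept".split()
bigram_counts = train_bigram(corpus)

prediction = predict_next(bigram_counts, "the")
print(prediction)  # "cat": it follows "the" twice, "mat" only once
```

When a transformer's prediction matches what a rule like this would say, the prediction can be explained by simple training-data statistics. That is the kind of agreement the 79% figure measures.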

🔗 [Research Paper](https://lnkd.in/gXXXkS5m)

![No alternative text description for this image](https://media.licdn.com/dms/image/v2/D5622AQFegc27BY5aQQ/feedshare-shrink_2048_1536/B56ZgCP9z3GUAo-/0/1752384403131?e=1756339200&v=beta&t=ZzPFnvOrtDGYcEjJVfPi7dtNZeSTSuSZQ71eZv_Uc4g align="left")

---

### **Session 8: Distributed Training in Machine Learning**

**Speaker:** Arthur Douillard  
**Senior Researcher, Google DeepMind**  
**Topic:** Distributed Training Strategies for Large Language Models

In this session, Arthur Douillard took us behind the scenes of what it really takes to train large language models (LLMs). With their enormous size, these models can’t be trained on a single GPU, so distributed training is essential. Arthur unpacked the core strategies used in practice today, like Fully Sharded Data Parallelism (FSDP), Tensor and Pipeline Parallelism, and Expert Parallelism.
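
To ground the idea, here is a toy simulation of plain data parallelism, the simplest relative of the strategies above. It is not FSDP or any of the other named methods. Each simulated "worker" computes a gradient on its own shard of the batch, and the gradients are averaged before the shared weight is updated. The one-parameter model and the data are assumptions for illustration.

```python
def gradient(weight, batch):
    """Gradient of mean squared error for the one-parameter model y = weight * x."""
    return sum(2.0 * (weight * x - y) * x for x, y in batch) / len(batch)


def data_parallel_step(weight, batch, num_workers, learning_rate=0.01):
    """One training step with the batch split evenly across simulated workers."""
    # Assumes len(batch) is divisible by num_workers so all shards are equal in size.
    shard_size = len(batch) // num_workers
    shards = [batch[i * shard_size:(i + 1) * shard_size] for i in range(num_workers)]
    # Each worker computes a gradient on its own shard only.
    local_grads = [gradient(weight, shard) for shard in shards]
    # Stand-in for the all-reduce communication step: average the gradients.
    averaged = sum(local_grads) / num_workers
    return weight - learning_rate * averaged


# Toy data generated from y = 3x, so the ideal weight is 3.0.
data = [(float(x), 3.0 * x) for x in range(1, 9)]

weight = 0.0
for _ in range(200):
    weight = data_parallel_step(weight, data, num_workers=4)
print(weight)  # approaches 3.0
```

With equal-sized shards, the averaged gradient matches the gradient on the full batch, so the workers together behave like one big machine. The averaging step is where communication happens, which is why bandwidth becomes a bottleneck at scale.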

What stood out was his dive into experimental methods like DiLoCo, SWARM, PowerSGD, and DeMo. These techniques aim to scale LLMs across devices even when they’re not co-located, but often at the cost of some accuracy or performance. He also touched on the real-world challenges: GPU hardware failures, communication bottlenecks, and the inherent complexity of coordinating planetary-scale training. While we’re not fully there yet, we’re inching closer to a future where training across global clusters is a reality.

![No alternative text description for this image](https://media.licdn.com/dms/image/v2/D5622AQGByE-iXh_5ng/feedshare-shrink_2048_1536/B56ZgNDeUbHcAo-/0/1752565680453?e=1756339200&v=beta&t=BIlVE_C48HU8qf3XRwGV-eke0ap0NdGtlCuzREdTc_E align="left")

---

### **Session 9: Research Mentorship**

**Speaker:** Sara Hooker  
**Head of Cohere Labs**  
**Topic:** Finding Meaningful Directions in ML Research

Sara Hooker’s mentorship session felt like a compass for anyone early in their ML research journey. She began with a reflection on the evolution of AI research, urging us to think deeply about how and why we choose problems to work on. Instead of chasing incremental papers or buzzwords, she encouraged us to:

* Master a topic deeply and thoroughly
    
* Collaborate openly and generously
    
* Learn by teaching others
    
* Constantly ask: "Is this scientifically meaningful?"
    

Sara also introduced the idea of a “third path” between academia and industry, represented by Cohere Labs and other open science communities. These spaces provide an alternative for those who want to contribute to cutting-edge research without being bound by the formal structures of universities or corporate labs. Her session was as inspiring as it was practical, offering a vision of research that is both rigorous and radically accessible.

---

### **Session 10: ML Open Science Social**

**Speakers:** Madeline Smith & Brittawnya Prince  
**Team: Cohere Labs Operations**  
**Topic:** Building Community Through Open Science

To wrap up the summer school, Cohere Labs hosted a virtual social, an informal yet deeply meaningful session. It was a space for researchers from across the globe to connect, share stories, and brainstorm future ideas together. The event captured the spirit of open science: diverse voices, shared curiosity, and a collective drive to explore the unknown.

More than just a networking event, it felt like a celebration of everything we had learned, unlearned, and reimagined during the program. It was a fitting finale to a summer spent not just learning machine learning but living it as a collaborative, creative, and community-first endeavor.

---

### **Reflections: More Than Just a Summer School**

Looking back, the Cohere Labs ML Summer School wasn’t just a series of lectures; it was a turning point. Coming in with beginner-level knowledge, I walked away with a working grasp of complex topics like self-supervised learning, distributed training, and tokenization (not always completely, but it was great learning all the same), and I also felt part of a vibrant open science community.

What stood out most was the spirit of accessibility. The sessions weren’t about gatekeeping knowledge; they were about opening doors. Each speaker, from leading researchers at Meta and DeepMind to pioneers at Cohere Labs, made the content feel approachable without watering it down.

I also learned that doing machine learning research isn't about knowing everything from the start — it’s about being curious, collaborative, and resilient. Whether it's contributing to open-source projects, diving deeper into topics like explainability or fairness, or just asking better questions, I now feel equipped to take meaningful next steps in my ML journey.

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1753292885070/8b129176-89fa-4d39-b623-3701b5df34ef.png align="center")

---

### **What’s Next?**

This summer school planted the seed, and now it’s up to me (and all of us who joined) to keep it growing. I’m planning to build hands-on projects, explore open research challenges, and stay connected with the community I’ve found here.

If you’ve ever felt like machine learning was too vast or too complex to dive into, trust me, you’re not alone. But with communities like Cohere Labs and the right mindset, you can absolutely get started.

Let the exploration continue 🚀

### **Explore More & Stay Connected**

If you’re interested in watching the recorded sessions or learning more about the Cohere Labs Open Science Community, you can visit:  
🔗 [https://sites.google.com/cohere.com/coherelabs-community/community-programs/summer-school](https://sites.google.com/cohere.com/coherelabs-community/community-programs/summer-school)

🔗 [https://cohere.com/research](https://cohere.com/research)

They regularly host talks, reading groups, and other open learning initiatives - highly recommended for anyone passionate about ML and open science!

Feel free to connect with me if you’d like to discuss anything from the sessions, share ideas, or collaborate on projects.

📬 **Connect with me on** [**LinkedIn**](https://www.linkedin.com/in/kolasani-venkat/)