Posts

DeepSeek R1, A New Chapter in Inference-Time Scaling for Reasoning Models: Reviewing DeepSeek (Part 2)

Disclaimer: Despite efforts to remain fair and balanced, some residual bias may remain in these views. Deep learning has long been driven by scaling—making models larger, training on more data, and increasing computational heft. In recent years, however, researchers have shifted some focus from training-time scaling to inference-time scaling: the idea that allocating additional compute at test time can unlock improved model performance without necessarily enlarging the model itself. In this post, we explore this emerging paradigm, review how OpenAI’s o1-preview model has already influenced the field, and then dive into DeepSeek R1—a Chinese innovation that leverages these principles to enhance reasoning capabilities at a fraction of conventional costs. ...

Incremental Evolution Rather Than Radical Revolution: Reviewing DeepSeek (Part 1)

Introduction

DeepSeek has recently generated buzz across the AI community, especially its R1 model, which has stirred both excitement and concern over data transparency and security. In my own experiments with inference-time scaling, good reasoning models depend heavily on the capabilities of their foundation model. In this respect, DeepSeek‑V3, the heart of R1, deserves careful review. It is a solid example of how modern LLM research builds cumulatively on past work. Rather than a radical departure, DeepSeek‑V3 is the product of incremental progress: it integrates efficient attention mechanisms, advanced mixture‑of‑experts (MoE) designs, multi‑token prediction (MTP), and low‑precision training. ...

Beyond Copying: Understanding the OpenAI-DeepSeek AI Controversy

In recent weeks, the AI community has been abuzz with controversy—and healthy debate—over claims that Chinese competitors are “stealing” OpenAI’s work to rapidly advance their own models. As discussions swirl on intellectual property rights, model replication, and ethical data use, it’s worth taking a step back to assess both the technical and ethical sides of the issue. This post explores what’s really happening, why it matters for innovation, and what it means for the future of AI development. ...

Unraveling Knowledge Distillation in AI/ML Models

Imagine training a colossal neural network—a behemoth capable of diagnosing diseases, driving autonomous vehicles, or generating human-like text—only to find that deploying such an enormous model is like trying to run a marathon in a sports car with a tiny fuel tank. This is where the art and science of model distillation come into play. In this post, we explore how model distillation—originally introduced by Hinton and colleagues—transforms these giants into nimble, efficient models. We’ll discuss Hinton’s key findings, how distillation works for discriminative tasks (like prediction models), and extend our discussion to the realm of generative tasks with large language models (LLMs). We’ll also clarify the differences between distillation and standard supervised fine-tuning (SFT) when synthetic outputs are used. ...

Rethinking the Monty Hall Problem: How to Get Along With Cognitive Bias

Here’s a confession: As an AI researcher well-schooled in math and statistics, I found the Monty Hall Problem mathematically straightforward from day one. It’s a textbook case of conditional probability. But folks, was I in for a surprise when I tried explaining it to others.

The Classic Puzzle That Stumps Almost Everyone

Let’s start with the basics: You’re faced with three doors. Behind one is a car (that’s your prize), and behind the others are goats. You pick a door - let’s say Door 1. Monty Hall (who knows where everything is) opens one of the other doors, always revealing a goat. Now comes the tricky part: Monty offers you the chance to stick with your original choice or switch to the remaining unopened door. The mathematically correct answer? You should switch - doing so gives you a 2/3 chance of winning, rather than the 1/3 chance if you stick. But try telling that to most people, and you’ll likely get anything from skeptical looks to passionate arguments about why it “must” be 50-50. ...
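For readers who want to check the 2/3 figure without trusting intuition, here is a minimal sketch that enumerates every equally likely combination of car position and first pick. The function name `win_probabilities` and the door numbering are illustrative choices, not taken from the post, and the sketch assumes the standard rules described above: Monty always opens a goat door that is not your pick.

```python
from fractions import Fraction
from itertools import product

DOORS = (0, 1, 2)


def win_probabilities():
    """Return (P(win if you stick), P(win if you switch)) as exact fractions."""
    stick_wins = 0
    switch_wins = 0
    cases = list(product(DOORS, DOORS))  # every (car, pick) pair, equally likely
    for car, pick in cases:
        # Monty may only open a door that is neither the pick nor the car.
        openable = [d for d in DOORS if d != pick and d != car]
        # When two goat doors are available, either choice gives the same outcome.
        opened = openable[0]
        # Switching means taking the one door that is neither picked nor opened.
        switched = next(d for d in DOORS if d not in (pick, opened))
        stick_wins += pick == car
        switch_wins += switched == car
    total = len(cases)
    return Fraction(stick_wins, total), Fraction(switch_wins, total)


stick, switch = win_probabilities()
print(f"stick: {stick}, switch: {switch}")  # stick: 1/3, switch: 2/3
```

Exact enumeration is used here instead of random simulation so the result is the same on every run; sticking wins only in the 3 of 9 cases where the first pick was already the car.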

Implementing Interfaces for an LC-3 Assembler in C

In my previous post, we explored the theoretical foundations of the data structures needed for our LC-3 assembler. Today, we’ll dive into how these abstract concepts translate into actual C code. While many modern languages offer high-level abstractions and built-in data structures, implementing these in C requires us to get our hands dirty with manual memory management and careful pointer manipulation.

[Diagram: Header Files define the interfaces for the Data Structures, which branch into Memory Management (Allocation, Deallocation) and Implementation (Core Functions, Error Handling). Caption: The relationship between our interfaces, data structures, and their implementations in C] ...

Data Structures Deep Dive: Building an LC-3 Assembler

[Diagram: File Handler passes SourceLine to the Lexer, the Lexer passes Tokens to the Parser, the Parser passes InstructionRecord to the Encoder, and the Encoder writes Machine Code to the Object File. The Symbol Table serves label lookups to the Parser, and the Error Handler collects errors from the Lexer, Parser, and Encoder. Caption: Data flow diagram showing how our data structures interconnect pipeline stages and provide support throughout the process.]

Imagine we’re building a translation machine that needs to understand two very different languages: the human-readable assembly code that programmers write, and the binary machine code that computers execute. This is exactly what our LC-3 assembler does, and today we’re going to explore the data structures that make this translation possible. ...

Architectural Choices in Building LC-3 Assembler

[Diagram: two-pass assembler with a modular pipeline. First Pass: Source File, Lexer, Parser (Label Collection), Symbol Table. Second Pass: Symbol Table, Lexer, Parser (Build Instructions), Encoder, Writer, Machine Code Output. The first pass feeds the second. Caption: General Structure of Two-pass and Modular Assembler Design] ...
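To make the two-pass idea concrete, here is a small sketch of why the passes are separated: the first pass only records label addresses, so the second pass can resolve a label that is used before it is defined. This is a toy illustration in Python, not the series' C implementation. The function names, the `LABEL:` colon convention, and the whitespace-only tokenizing are assumptions made for brevity, and the sketch stops at resolved offsets instead of encoding machine words. The offset rule itself (target address minus the address of the following instruction) is how LC-3 PC-relative addressing works.

```python
def split_label(line):
    """Split a source line into (label or None, remaining tokens).

    Toy convention (an assumption): a label is a first token ending in ':'.
    """
    tokens = line.split()
    if tokens and tokens[0].endswith(":"):
        return tokens[0][:-1], tokens[1:]
    return None, tokens


def first_pass(lines, origin):
    """Pass 1: walk the source once and record the address of every label."""
    symbols = {}
    address = origin
    for line in lines:
        label, tokens = split_label(line)
        if label is not None:
            symbols[label] = address
        if tokens:
            address += 1  # each instruction occupies one 16-bit word
    return symbols


def second_pass(lines, origin, symbols):
    """Pass 2: replace label operands with PC-relative offsets."""
    resolved = []
    address = origin
    for line in lines:
        _, tokens = split_label(line)
        if not tokens:
            continue
        mnemonic, operands = tokens[0], tokens[1:]
        # The PC has already advanced by one when the offset is applied.
        operands = [
            symbols[op] - (address + 1) if op in symbols else op
            for op in operands
        ]
        resolved.append((address, mnemonic, operands))
        address += 1
    return resolved


program = [
    "LOOP: ADD R1,R1,#-1",
    "BRp LOOP",
    "BRnzp DONE",  # forward reference: DONE is defined on the next line
    "DONE: HALT",
]
symbols = first_pass(program, 0x3000)
for address, mnemonic, operands in second_pass(program, 0x3000, symbols):
    print(hex(address), mnemonic, operands)
```

A single-pass design would reach `BRnzp DONE` before knowing where `DONE` lives; collecting the symbol table first avoids backpatching at the cost of reading the source twice.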

Diving Deeper into LC-3: From Opcodes to Machine Code

Welcome back. Continuing from our last post, today we’re going to peek behind the curtain and see how the Little Computer 3 (LC-3) actually “understands” our instructions. Imagine you’re writing a letter to someone who only reads binary; you’d need a very specific format for them to understand your message. That’s exactly what we’re doing when we write LC-3 assembly code: we’re writing human-readable instructions that need to be translated into a language of 1s and 0s that the computer understands. ...

Why I'm Learning C and LC-3 (Despite Being an AI/ML Person)

Merry Christmas, everyone! Between juggling a full-time job as a data scientist and pursuing my computer science graduate degree part-time, I didn’t think I’d have much bandwidth for extracurricular projects. But when the holidays rolled around, I found myself itching to step away from data engineering and machine learning modeling and dive headfirst into the world of low-level programming. Surprising? Maybe. But it’s been a thrilling journey so far.

Stepping Out of the AI/ML Bubble

I love AI/ML. But one thing I’ve realized is that constantly working at a high level (think PyTorch, scikit-learn, XGBoost, etc.) can sometimes obscure what’s really happening under the hood. Sure, I can engineer data, develop models, and conduct data experiments all day long. But when something breaks at a deep level (like in a CUDA kernel, or even just the memory management within my Python environment), I’m reminded there’s a whole world of “lower-level” knowledge I’ve yet to fully explore. ...