When AI Hallucinates: Building a Verification Framework for AI-Generated Content

Caption: Medical hallucinations in LLMs organized into 5 main clusters (Kim et al., 2025)

Imagine relying on AI for crucial medical information, only to discover that nearly half of what it confidently tells you doesn’t exist at all. Welcome to the unsettling world of AI hallucinations. In the rapidly evolving landscape of AI-assisted information processing, we’re witnessing a curious paradox: the same tools that promise to revolutionize our workflows are simultaneously introducing new challenges to information integrity. This first post in a series introduces Project ACVS (Academic Citation Verification System), which represents a broader approach to verifying AI outputs across multiple domains. ...

March 21, 2025 · 9 min · Sae-Hwan Park

How Much VRAM Do You Need for LLMs? A Detailed Guide for Training/Fine-Tuning/Inference

Introduction

The emergence of Large Language Models (LLMs) has opened exciting possibilities for many industries and enthusiasts. However, these powerful AI systems require substantial computing resources, particularly GPU memory (VRAM). Whether you’re a software engineer, hobbyist, or data scientist looking to work with these models on your own hardware, understanding these requirements is essential.

What Are LLMs and Why Do They Need So Much Memory?

Before diving into the technical details, let’s clarify what LLMs are: they’re artificial intelligence systems trained on vast amounts of text data to understand and generate human-like language. Think of them as extremely sophisticated prediction engines that can complete sentences, answer questions, write essays, and even code. ...
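To make the scale concrete, here is a hypothetical back-of-the-envelope VRAM estimator, a minimal sketch of my own for illustration: it assumes fp16 weights (2 bytes per parameter) and a rough 20% overhead for activations and the KV cache; neither figure is from the post itself.

```python
def estimate_inference_vram_gb(params_b: float, bytes_per_param: int = 2,
                               overhead: float = 1.2) -> float:
    """Rough rule of thumb: weight memory is parameter count times bytes
    per parameter, plus ~20% for activations and KV cache (assumed)."""
    return params_b * bytes_per_param * overhead

# A 7B-parameter model in fp16 needs very roughly:
print(estimate_inference_vram_gb(7))  # about 16.8 GB
```

Training and fine-tuning multiply this further (optimizer states, gradients), which is exactly the territory the full post covers.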

March 14, 2025 · 8 min · Sae-Hwan Park

From Skeptic to Believer: How Diffusion Models Are Reshaping Language Generation

When I first encountered diffusion models back in 2020, I dismissed them as elegant solutions for continuous domains like images but fundamentally incompatible with the discrete nature of language. Like many in the field, I was convinced that autoregressive models (ARMs) were the only sensible architecture for text generation. After all, language is inherently sequential, and the causal attention mechanism in models like GPT seemed perfectly designed for this constraint. ...

March 7, 2025 · 7 min · Sae-Hwan Park

DeepSeek R1, A New Chapter in Inference-Time Scaling for Reasoning Models: Reviewing DeepSeek (Part 2)

Disclaimer: Despite efforts to remain fair and balanced, some residual bias may remain in these views.

Deep learning has long been driven by scaling—making models larger, training on more data, and increasing computational heft. In recent years, however, researchers have shifted some focus from training-time scaling to inference-time scaling: the idea that allocating additional compute at test time can unlock improved model performance without necessarily enlarging the model itself. In this post, we explore this emerging paradigm, review how OpenAI’s o1-preview model has already influenced the field, and then dive into DeepSeek R1—a Chinese innovation that leverages these principles to enhance reasoning capabilities at a fraction of conventional costs. ...

February 14, 2025 · 11 min · Sae-Hwan Park

Incremental Evolution Rather Than Radical Revolution: Reviewing DeepSeek (Part 1)

Introduction

DeepSeek has recently generated buzz across the AI community—especially its R1 model, which has stirred both excitement and concern over data transparency and security. My personal experiments with inference-time scaling suggest that a good reasoning model depends heavily on the capabilities of its foundation model. In this respect, DeepSeek‑V3, the heart of R1, deserves careful review. It indeed represents a solid example of how modern LLM research builds cumulatively on past work. Rather than a radical departure, DeepSeek‑V3 is the product of incremental progress—integrating efficient attention mechanisms, advanced mixture‑of‑experts (MoE) designs, multi‑token prediction (MTP), and low‑precision training. ...

February 8, 2025 · 9 min · Sae-Hwan Park

Beyond Copying: Understanding the OpenAI-DeepSeek AI Controversy

In recent weeks, the AI community has been abuzz with controversy—and healthy debate—over claims that Chinese competitors are “stealing” OpenAI’s work to rapidly advance their own models. As discussions swirl on intellectual property rights, model replication, and ethical data use, it’s worth taking a step back to assess both the technical and ethical sides of the issue. This post explores what’s really happening, why it matters for innovation, and what it means for the future of AI development. ...

February 2, 2025 · 5 min · Sae-Hwan Park

Unraveling Knowledge Distillation in AI/ML Models

Imagine training a colossal neural network—a behemoth capable of diagnosing diseases, driving autonomous vehicles, or generating human-like text—only to find that deploying such an enormous model is like trying to run a marathon in a sports car with a tiny fuel tank. This is where the art and science of model distillation come into play. In this post, we explore how model distillation—originally introduced by Hinton and colleagues—transforms these giants into nimble, efficient models. We’ll discuss Hinton’s key findings, how distillation works for discriminative tasks (like prediction models), and extend our discussion to the realm of generative tasks with large language models (LLMs). We’ll also clarify the differences between distillation and standard supervised fine-tuning (SFT) when synthetic outputs are used. ...
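Hinton-style distillation trains the small student to match the teacher’s temperature-softened output distribution rather than hard labels. A minimal sketch of that loss follows; this is the standard formulation written out for illustration, not code from the post, and the temperature `T = 2.0` is an arbitrary example value.

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax: higher T produces a softer distribution,
    # exposing the teacher's "dark knowledge" about non-target classes.
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, T=2.0):
    # KL divergence between temperature-softened teacher and student
    # distributions, scaled by T^2 as in Hinton et al. so that gradient
    # magnitudes stay comparable across temperatures.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return T * T * kl
```

The loss is zero when student and teacher agree exactly and grows as their softened distributions diverge; in practice it is combined with the ordinary cross-entropy on ground-truth labels.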

February 1, 2025 · 7 min · Sae-Hwan Park

Rethinking the Monty Hall Problem: How to Get Along With Cognitive Bias

Here’s a confession: As an AI researcher well-schooled in math and statistics, I found the Monty Hall Problem mathematically straightforward from day one. It’s a textbook case of conditional probability. But folks, was I in for a surprise when I tried explaining it to others.

The Classic Puzzle That Stumps Almost Everyone

Let’s start with the basics: You’re faced with three doors. Behind one is a car (that’s your prize), and behind the others are goats. You pick a door - let’s say Door 1. Monty Hall (who knows where everything is) opens one of the other doors, always revealing a goat. Now comes the tricky part: Monty offers you the chance to stick with your original choice or switch to the remaining unopened door. The mathematically correct answer? You should switch - doing so gives you a 2/3 chance of winning, rather than the 1/3 chance if you stick. But try telling that to most people, and you’ll likely get anything from skeptical looks to passionate arguments about why it “must” be 50-50. ...
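For the skeptics, the 2/3-vs-1/3 claim is easy to check empirically. Here is a minimal simulation of the game as described above (my own sketch, not code from the post), comparing the stick and switch strategies:

```python
import random

def monty_hall_trial(switch: bool) -> bool:
    doors = [0, 1, 2]
    car = random.choice(doors)   # car placed uniformly at random
    pick = random.choice(doors)  # contestant's initial pick
    # Monty opens a door that is neither the pick nor the car.
    opened = random.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Switch to the one remaining unopened door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

random.seed(0)
trials = 100_000
stay = sum(monty_hall_trial(False) for _ in range(trials)) / trials
switch = sum(monty_hall_trial(True) for _ in range(trials)) / trials
print(f"stay: {stay:.3f}, switch: {switch:.3f}")  # roughly 1/3 vs 2/3
```

The simulation converges on the conditional-probability answer: sticking wins about a third of the time, switching about two thirds.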

January 20, 2025 · 5 min · Sae-Hwan Park

Implementing Interfaces for an LC-3 Assembler in C

In my previous post, we explored the theoretical foundations of the data structures needed for our LC-3 assembler. Today, we’ll dive into how these abstract concepts translate into actual C code. While many modern languages offer high-level abstractions and built-in data structures, implementing these in C requires us to get our hands dirty with manual memory management and careful pointer manipulation.

```mermaid
graph TD
    A[Header Files] -->|Defines Interfaces| B[Data Structures]
    B --> C[Memory Management]
    B --> D[Implementation]
    C --> E[Allocation]
    C --> F[Deallocation]
    D --> G[Core Functions]
    D --> H[Error Handling]
```

(Caption: The relationship between our interfaces, data structures, and their implementations in C) ...

January 10, 2025 · 8 min · Sae-Hwan Park

Data Structures Deep Dive: Building an LC-3 Assembler

```mermaid
graph TD
    A[File Handler] -->|SourceLine| B[Lexer]
    B -->|Tokens| C[Parser]
    C -->|InstructionRecord| D[Encoder]
    D -->|Machine Code| E[Object File]
    F[Symbol Table] -->|Label Lookups| C
    G[Error Handler] -->|Error Collection| B & C & D
```

(Caption: Data flow diagram showing how our data structures interconnect pipeline stages and provide support throughout the process.)

Imagine we’re building a translation machine that needs to understand two very different languages: the human-readable assembly code that programmers write, and the binary machine code that computers execute. This is exactly what our LC-3 assembler does, and today we’re going to explore the data structures that make this translation possible. ...

December 30, 2024 · 9 min · Sae-Hwan Park