Posts

From Bayes to ChatGPT: Journey For Statisticians To Understand Gen-AI

“It’s genuinely amazing that… these sorts of things can be extracted from a statistical analysis of a large body of text,” science fiction author Ted Chiang remarked in a 2023 Financial Times interview. But, in his view, that doesn’t make the tools intelligent. Applied statistics is a far more precise descriptor, “but no one wants to use that term, because it’s not as sexy.” The writer’s observation cuts to the heart of modern AI development: beneath the marketing hype and sensationalized headlines lies a sobering truth. These systems, regardless of their impressive capabilities, remain fundamentally statistical models. And yes folks, that’s what we are going to explore in this article. ...

Beyond the Hype: Blockchain in 2025 - The Silent Revolution

“The most profound technologies are those that disappear. They weave themselves into the fabric of everyday life until they are indistinguishable from it.” - Mark Weiser
A Personal Journey: From Skepticism to Appreciation
I’ll admit it: when I first encountered blockchain years ago, it felt like nothing more than a buzzword. As an economist by training, I was particularly skeptical of cryptocurrency, which seemed misnamed: less a currency and more a speculative digital asset akin to gold or jewels. The hype cycle of 2017-2018 only reinforced my doubts as blockchain was touted as a solution for virtually everything. ...

The Uncanny Valley of AI-Generated Art: Technical Challenges Behind the Artificial Aesthetic

Background
In recent times, AI-generated images have captivated the public’s imagination, with platforms like OpenAI’s ChatGPT-4o enabling users to create visuals in distinctive styles, such as those reminiscent of Studio Ghibli. This phenomenon, often termed “Ghiblification,” has sparked both admiration and ethical debates regarding the use of AI in creative processes. Despite the impressive capabilities of these AI systems, many users have noticed that AI-generated images often possess certain “weird” or unnatural characteristics. But what causes these peculiarities? Some interesting points are rarely discussed in controversies over AI-generated art. Let me give you the latest example. ...

When AI Hallucinates: Building a Verification Framework for AI-Generated Content

Caption: Medical hallucinations in LLMs organized into 5 main clusters (Kim et al., 2025)
Imagine relying on AI for crucial medical information, only to discover that nearly half of what it confidently tells you doesn’t exist at all. Welcome to the unsettling world of AI hallucinations. In the rapidly evolving landscape of AI-assisted information processing, we’re witnessing a curious paradox: the same tools that promise to revolutionize our workflows are simultaneously introducing new challenges to information integrity. This first post in a series introduces Project ACVS (Academic Citation Verification System), which represents a broader approach to verifying AI outputs across multiple domains. ...

How Much VRAM Do You Need for LLMs? A Detailed Guide for Training/Fine-Tuning/Inference

Introduction
The emergence of Large Language Models (LLMs) has opened exciting possibilities for many industries and enthusiasts. However, these powerful AI systems require substantial computing resources, particularly GPU memory (VRAM). Whether you’re a software engineer, hobbyist, or data scientist looking to work with these models on your own hardware, understanding these requirements is essential.
What Are LLMs and Why Do They Need So Much Memory?
Before diving into the technical details, let’s clarify what LLMs are: they’re artificial intelligence systems trained on vast amounts of text data to understand and generate human-like language. Think of them as extremely sophisticated prediction engines that can complete sentences, answer questions, write essays, and even code. ...

From Skeptic to Believer: How Diffusion Models Are Reshaping Language Generation

When I first encountered diffusion models back in 2020, I dismissed them as elegant solutions for continuous domains like images but fundamentally incompatible with the discrete nature of language. Like many in the field, I was convinced that autoregressive models (ARMs) were the only sensible architecture for text generation. After all, language is inherently sequential, and the causal attention mechanism in models like GPT seemed perfectly designed for this constraint. ...

DeepSeek R1, A New Chapter in Inference-Time Scaling for Reasoning Models: Reviewing DeepSeek (Part 2)

Disclaimer: Despite efforts to remain fair and balanced, some residual bias may remain in these views. Deep learning has long been driven by scaling—making models larger, training on more data, and increasing computational heft. In recent years, however, researchers have shifted some focus from training-time scaling to inference-time scaling: the idea that allocating additional compute at test time can unlock improved model performance without necessarily enlarging the model itself. In this post, we explore this emerging paradigm, review how OpenAI’s o1-preview model has already influenced the field, and then dive into DeepSeek R1—a Chinese innovation that leverages these principles to enhance reasoning capabilities at a fraction of conventional costs. ...

Incremental Evolution Rather Than Radical Revolution: Reviewing DeepSeek (Part 1)

Introduction
DeepSeek has recently generated buzz across the AI community, especially its R1 model, which has stirred both excitement and concern over data transparency and security. In my own experiments with inference-time scaling, good reasoning models depend heavily on the capabilities of their foundation model. In this respect, DeepSeek‑V3, the heart of R1, deserves careful review. It is a solid example of how modern LLM research builds cumulatively on past work. Rather than a radical departure, DeepSeek‑V3 is the product of incremental progress: it integrates efficient attention mechanisms, advanced mixture‑of‑experts (MoE) designs, multi‑token prediction (MTP), and low‑precision training. ...

Beyond Copying: Understanding the OpenAI-DeepSeek AI Controversy

In recent weeks, the AI community has been abuzz with controversy—and healthy debate—over claims that Chinese competitors are “stealing” OpenAI’s work to rapidly advance their own models. As discussions swirl on intellectual property rights, model replication, and ethical data use, it’s worth taking a step back to assess both the technical and ethical sides of the issue. This post explores what’s really happening, why it matters for innovation, and what it means for the future of AI development. ...

Unraveling Knowledge Distillation in AI/ML Models

Imagine training a colossal neural network—a behemoth capable of diagnosing diseases, driving autonomous vehicles, or generating human-like text—only to find that deploying such an enormous model is like trying to run a marathon in a sports car with a tiny fuel tank. This is where the art and science of model distillation come into play. In this post, we explore how model distillation—originally introduced by Hinton and colleagues—transforms these giants into nimble, efficient models. We’ll discuss Hinton’s key findings, how distillation works for discriminative tasks (like prediction models), and extend our discussion to the realm of generative tasks with large language models (LLMs). We’ll also clarify the differences between distillation and standard supervised fine-tuning (SFT) when synthetic outputs are used. ...