How I Leveraged C Learning to Understand Rust Better

When I first encountered Rust after years of Python experience, I thought I understood it. “Ok, Rust’s way is interesting,” I told myself, nodding along to the Book’s explanations of ownership. The compiler errors were frustrating, but I got my code working eventually. I believed I had grasped the concepts. I was wrong. It wasn’t until I stepped away to learn C and systems programming that I realized how superficial my understanding had been. Only when I could visualize memory operations – seeing exactly what happened in the stack, heap, and global memory – did Rust’s ownership system transform from a set of arbitrary rules into a coherent mental model. ...
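To see the kind of memory-level picture the post is about, here is a minimal Rust sketch of my own (an illustration, not code from the post) showing why ownership rules stop looking arbitrary once you track where the bytes actually live:

```rust
fn main() {
    // A String owns a heap allocation: the (ptr, len, capacity) triple
    // sits on the stack; the character bytes live on the heap.
    let s1 = String::from("hello");

    // Assignment MOVES ownership to s2: only the stack triple is copied,
    // the heap bytes are not duplicated, and s1 is invalidated so the
    // allocation has exactly one owner responsible for freeing it.
    let s2 = s1;
    // println!("{s1}"); // compile error: borrow of moved value `s1`

    // An i32 lives entirely on the stack, so assignment is a plain copy
    // and both bindings remain valid (i32 implements Copy).
    let x = 42;
    let y = x;

    println!("{s2} {x} {y}"); // prints: hello 42 42
}
```

In C you must remember, by convention, which pointer is allowed to free an allocation; Rust encodes that single-owner convention in the type system, which is roughly the connection the post explores.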

May 17, 2025 · 14 min · Sae-Hwan Park

The Linux-Windows Bridge I Wish I'd Discovered Years Ago

I built my own Linux environment on Windows using WSL2 (and you should, too)

As someone deeply immersed in AI and ML development, I’ve often found myself juggling multiple computing environments. My workday typically involves switching between a Windows laptop at work, my personal Mac Studio (plus a Windows desktop) at home, and SSH connections to remote computing clusters for intensive training jobs or for data files that must stay in place under a data use agreement (DUA). This fragmentation created friction in my workflow that I was eager to resolve. ...

May 9, 2025 · 16 min · Sae-Hwan Park

The Quest for Heterogeneity: Understanding Conditional Average Treatment Effects (CATE)

We unmask heterogeneity, finding out how CATE learners help target interventions to those who will benefit most.

The Journey Beyond Average Effects

In the vast landscape of causal inference, we’ve long relied on a simple compass: the Average Treatment Effect (ATE). Like ancient mariners navigating by a single star, researchers across disciplines have used this average to guide important decisions. But what if I told you that this single metric, this lone star, only reveals a fraction of the story? ...
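To make the distinction concrete before you dive in: in standard potential-outcomes notation (my shorthand here, not necessarily the notation the post uses), the ATE averages over everyone, while the CATE conditions on covariates:

```latex
% Potential outcomes Y(1), Y(0); covariates X
\mathrm{ATE} = \mathbb{E}\left[\,Y(1) - Y(0)\,\right]
\qquad
\mathrm{CATE}(x) = \mathbb{E}\left[\,Y(1) - Y(0) \mid X = x\,\right]
```

CATE learners estimate the function CATE(x), which is what lets you target an intervention at the subgroups with the largest predicted benefit.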

May 3, 2025 · 23 min · Sae-Hwan Park

The Dimensional Odyssey: Navigating the Manifolds of t-SNE and UMAP

Prologue: The Curse of Dimensionality

Imagine yourself as an explorer in a vast, multidimensional wilderness. Each step you take propels you along one of hundreds, perhaps thousands, of different dimensions. The terrain stretches beyond what your mind can comprehend – a hyperdimensional landscape where traditional notions of distance and proximity lose their intuitive meaning. This is the world of high-dimensional data, a realm where our human perceptual limitations become painfully apparent. ...
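The claim that distance “loses its meaning” has a precise form. One classic statement (Beyer et al., 1999) is that, under fairly broad independence assumptions on the coordinates, the nearest and farthest neighbors of a query point become indistinguishable as the dimension d grows; roughly:

```latex
\lim_{d \to \infty} \; \mathbb{E}\!\left[ \frac{D^{\max}_{d} - D^{\min}_{d}}{D^{\min}_{d}} \right] = 0
```

where D^min_d and D^max_d are the distances from a query to its nearest and farthest neighbors. This concentration of distances is exactly the obstacle that t-SNE and UMAP are built to work around.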

April 18, 2025 · 25 min · Sae-Hwan Park

From Bayes to ChatGPT: Journey For Statisticians To Understand Gen-AI

“It’s genuinely amazing that… these sorts of things can be extracted from a statistical analysis of a large body of text,” science fiction author Ted Chiang remarked in a 2023 Financial Times interview. But, in his view, that doesn’t make the tools intelligent: “applied statistics” is a far more precise descriptor, “but no one wants to use that term, because it’s not as sexy.” The visionary writer’s observation cuts directly to the heart of modern AI development. Beneath the marketing hype and the sensationalized headlines lies a sobering truth: these systems, however impressive their capabilities, remain fundamentally statistical models. And yes, folks, that’s what we’re going to explore in this article. ...
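If “applied statistics” sounds dismissive, consider what a model like ChatGPT is actually trained to do. Written in standard textbook notation (mine, not necessarily the post’s), it fits the conditional distribution of the next token and chains those conditionals into a joint distribution over whole sequences:

```latex
p_\theta(w_1, \dots, w_T) \;=\; \prod_{t=1}^{T} p_\theta\!\left(w_t \mid w_1, \dots, w_{t-1}\right)
```

Everything from essay writing to code completion falls out of estimating those conditionals well over an enormous corpus, which is precisely Chiang’s point.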

April 11, 2025 · 16 min · Sae-Hwan Park

Beyond the Hype: Blockchain in 2025 - The Silent Revolution

“The most profound technologies are those that disappear. They weave themselves into the fabric of everyday life until they are indistinguishable from it.” - Mark Weiser

A Personal Journey: From Skepticism to Appreciation

I’ll admit it: when I first encountered blockchain years ago, it felt like nothing more than a buzzword. As an economist by training, I was particularly skeptical of cryptocurrency, which seemed misnamed: less a currency and more a speculative digital asset akin to gold or jewels. The hype cycle of 2017-2018 only reinforced my doubts as blockchain was touted as a solution for virtually everything. ...

April 5, 2025 · 16 min · Sae-Hwan Park

The Uncanny Valley of AI-Generated Art: Technical Challenges Behind the Artificial Aesthetic

Background

In recent times, AI-generated images have captivated the public’s imagination, with platforms like OpenAI’s ChatGPT (via the GPT-4o model) enabling users to create visuals in distinctive styles, such as those reminiscent of Studio Ghibli. This phenomenon, often termed “Ghiblification,” has sparked both admiration and ethical debate regarding the use of AI in creative processes. Despite the impressive capabilities of these AI systems, many users have noticed that AI-generated images often possess certain “weird” or unnatural characteristics. But what causes these peculiarities? There are interesting points that are not often discussed in the controversies around AI-generated art. Let me give you the latest example. ...

March 29, 2025 · 7 min · Sae-Hwan Park

When AI Hallucinates: Building a Verification Framework for AI-Generated Content

[Figure: medical hallucinations in LLMs, organized into five main clusters (Kim et al., 2025)]

Imagine relying on AI for crucial medical information, only to discover that nearly half of what it confidently tells you doesn’t exist at all. Welcome to the unsettling world of AI hallucinations. In the rapidly evolving landscape of AI-assisted information processing, we’re witnessing a curious paradox: the same tools that promise to revolutionize our workflows are simultaneously introducing new challenges to information integrity. This first post in a series introduces Project ACVS (Academic Citation Verification System), which represents a broader approach to verifying AI outputs across multiple domains. ...

March 21, 2025 · 9 min · Sae-Hwan Park

How Much VRAM Do You Need for LLMs? A Detailed Guide for Training/Fine-Tuning/Inference

Introduction

The emergence of Large Language Models (LLMs) has opened exciting possibilities for many industries and enthusiasts. However, these powerful AI systems require substantial computing resources, particularly GPU memory (VRAM). Whether you’re a software engineer, hobbyist, or data scientist looking to work with these models on your own hardware, understanding these requirements is essential.

What Are LLMs and Why Do They Need So Much Memory?

Before diving into the technical details, let’s clarify what LLMs are: artificial intelligence systems trained on vast amounts of text data to understand and generate human-like language. Think of them as extremely sophisticated prediction engines that can complete sentences, answer questions, write essays, and even code. ...
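For a taste of the arithmetic the guide walks through, here is a back-of-the-envelope sketch in Rust. The bytes-per-parameter values are rule-of-thumb assumptions of mine (2 bytes/parameter for fp16 inference; roughly 16 bytes/parameter for full fine-tuning with Adam, covering weights, gradients, and two optimizer moments), not figures taken from the post:

```rust
// Back-of-the-envelope VRAM estimates. The bytes-per-parameter values are
// rule-of-thumb assumptions (fp16 weights for inference; full Adam
// fine-tuning keeps weights, gradients, and two moment buffers), not
// exact figures for any particular framework.
fn vram_gb(params_billions: f64, bytes_per_param: f64) -> f64 {
    params_billions * 1e9 * bytes_per_param / 1e9 // bytes -> GB (decimal)
}

fn main() {
    let model = 7.0; // a hypothetical 7B-parameter model
    println!("fp16 inference (weights only):  ~{:.1} GB", vram_gb(model, 2.0));
    println!("  + ~20% KV cache / activations: ~{:.1} GB", vram_gb(model, 2.0) * 1.2);
    println!("full fine-tuning with Adam:     ~{:.1} GB", vram_gb(model, 16.0));
}
```

For a 7B model this gives roughly 14 GB for bare fp16 inference and over 100 GB for full Adam fine-tuning, which is why quantization and parameter-efficient methods like LoRA matter so much in practice.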

March 14, 2025 · 8 min · Sae-Hwan Park

From Skeptic to Believer: How Diffusion Models Are Reshaping Language Generation

When I first encountered diffusion models back in 2020, I dismissed them as elegant solutions for continuous domains like images but fundamentally incompatible with the discrete nature of language. Like many in the field, I was convinced that autoregressive models (ARMs) were the only sensible architecture for text generation. After all, language is inherently sequential, and the causal attention mechanism in models like GPT seemed perfectly designed for this constraint. ...
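The tension the post resolves is easy to state in formulas (standard notation for discrete diffusion, e.g. masked/absorbing-state models; the post itself may use different conventions). An ARM commits to a strict left-to-right factorization, while a diffusion LM learns to invert a K-step corruption process, denoising many positions in parallel:

```latex
p_{\mathrm{ARM}}(x) = \prod_{t=1}^{T} p_\theta\!\left(x_t \mid x_{<t}\right)
\qquad
p_{\mathrm{diff}}(x_0) = \sum_{x_1, \dots, x_K} p(x_K) \prod_{k=1}^{K} p_\theta\!\left(x_{k-1} \mid x_k\right)
```

Nothing in the right-hand factorization requires left-to-right order, which is what makes diffusion plausible for text despite its discreteness.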

March 7, 2025 · 7 min · Sae-Hwan Park