Where our thread pool server meets its match, and I discover that sometimes the best way to handle thousands of connections is to pretend threads don’t exist.
(Source: https://berb.github.io/diploma-thesis/community/042_serverarch.html)
The Tantalizing Promise
Fresh from the victory of my thread pool implementation, I felt invincible. My server could handle hundreds of concurrent connections with predictable resource usage. Thread creation was under control, context switching was manageable, and memory consumption was reasonable.
But then I stumbled across a claim that seemed almost too good to be true:
“Handle 10,000+ concurrent connections with a single thread using async/await.”
Ten thousand connections? With one thread? My engineering skepticism kicked in immediately. This sounded like the kind of marketing hyperbole that promises “unlimited scalability” while quietly ignoring the laws of physics.
Yet the numbers kept appearing. Blog posts showing async servers handling 100x more connections than threaded equivalents. Benchmarks demonstrating dramatic memory savings. Real production systems serving millions of users with just a handful of async tasks.
Disclaimer: Like previous episodes, I’ve dramatized certain moments of my learning journey for narrative effect. The async rabbit hole was indeed deep, but the actual implementation was more methodical than the emotional rollercoaster I describe below. Also note that I intentionally added artificial sleeps to the clients to emulate costly operations.
My curiosity was piqued, but more importantly – my thread pool had hit a wall I hadn’t anticipated.
The Thread Pool’s Achilles Heel
Before diving into async, I needed to understand my thread pool’s limitations. It was time for stress testing beyond my previous casual experiments, so I conducted a systematic load test using hand-crafted shell scripts.
The results were… educational.
The thread pool was hitting a scaling wall. Beyond 1,500 concurrent connections, performance degraded rapidly. Not because of the thread pool itself, but because of fundamental OS-level bottlenecks:
- Each thread still consumed stack space (2MB+ per worker)
- Context switching overhead increased with connection count
- File descriptor limits started constraining connections
- Network buffer memory scaled linearly with connection count
The Hospital Emergency Room Reality
Working with healthcare data, I recognized this pattern immediately. It’s like an emergency room during a crisis – even with optimized staffing (thread pool), there are physical limits:
- Examination rooms (file descriptors) are finite
- Medical equipment (memory buffers) can only be allocated so far
- Staff coordination (context switching) becomes chaotic beyond a certain patient load
The thread pool had solved the “unlimited thread creation” problem, but it hadn’t solved the fundamental resource scaling problem.
Enter the Event Loop: A Different Mental Model
This is where async/await promised something radical: what if we could handle thousands of connections without creating thousands of workers?
But first, I needed to understand what “event-driven” actually meant.
Threading vs Event-Driven: The Cognitive Shift
My mental model for threading was straightforward:
Each connection gets its own execution context. When a thread waits for network I/O, the entire thread blocks. The OS context-switches to other threads, but the blocked thread consumes resources while doing nothing productive.
Event-driven programming flips this model completely:
One thread, many tasks. When a task needs to wait for I/O, it doesn’t block the thread – it yields control back to the event loop. The loop can then work on other tasks that are ready to make progress.
The Restaurant Revelation
The difference became clear through a restaurant analogy:
Threading Model: Like a restaurant where each waiter is assigned to exactly one table for the entire meal. When customers are deciding what to order, the waiter stands idle, waiting. During busy periods, you need as many waiters as you have occupied tables.
Event-Driven Model: Like a restaurant where waiters handle multiple tables dynamically. When Table 5 is deciding what to order, the waiter takes orders from Table 8, delivers food to Table 3, and checks on Table 12. One waiter can efficiently serve many tables by never staying idle when there’s work to be done elsewhere.
The Key Insight: I/O is the Bottleneck
Network programming is fundamentally I/O bound. In my handshake protocol:
- Wait for client to send HELLO X ← I/O bound
- Process sequence number ← CPU bound (microseconds)
- Wait for network buffer to accept response ← I/O bound
- Wait for client to send HELLO Z ← I/O bound
- Process validation ← CPU bound (microseconds)
The actual CPU work was tiny – maybe 0.1% of the total time. The other 99.9% was waiting for network operations. Threads were spending their lives waiting.
Event-driven programming says: “Why waste thread resources on waiting? Let’s do useful work instead.”
First Contact with Tokio
Armed with my new mental model, I was ready to dive into Rust’s async ecosystem. But first, I had to choose a runtime.
Rust’s async/await syntax is built into the language, but it requires a runtime to execute async tasks. Think of it like this: Rust provides the grammar for writing async code, but you need an execution engine to run it.
The dominant choice is Tokio – Rust’s most mature async runtime. Adding it to my project felt like stepping into a parallel universe:
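The exact manifest isn’t reproduced here, but adding Tokio typically comes down to a single dependency line in Cargo.toml (the version number is illustrative):

```toml
[dependencies]
tokio = { version = "1", features = ["full"] }
```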
That "full" feature set was my first hint that async programming comes with complexity overhead. Tokio includes:
- Task scheduler
- Async I/O drivers (TCP, UDP, files)
- Timers and timeouts
- Multi-threaded work stealing
- Async-aware synchronization primitives
It’s essentially an entire operating system for async tasks.
The #[tokio::main] Magic
My first async program looked deceptively simple:
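A minimal sketch of that kind of first program (assuming the "full" feature set from above; the details are illustrative, not my exact code):

```rust
#[tokio::main]
async fn main() {
    // Sleep asynchronously instead of blocking the thread, then print.
    tokio::time::sleep(std::time::Duration::from_millis(100)).await;
    println!("Hello from the Tokio event loop!");
}
```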
But that #[tokio::main] attribute was doing heavy lifting behind the scenes. It’s roughly equivalent to:
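A sketch of that desugaring, not the literal macro expansion but the general shape of it:

```rust
fn main() {
    // Build a Tokio runtime explicitly...
    let runtime = tokio::runtime::Runtime::new().expect("failed to build Tokio runtime");

    // ...and drive the async body to completion on its event loop.
    runtime.block_on(async {
        tokio::time::sleep(std::time::Duration::from_millis(100)).await;
        println!("Hello from the Tokio event loop!");
    });
}
```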
The macro creates a Tokio runtime, starts the event loop, and runs my async main function to completion. I was no longer writing a traditional program – I was writing a collection of async tasks that would be orchestrated by the Tokio scheduler.
The Async Transformation Journey
Converting my threaded server to async wasn’t just a matter of adding .await everywhere. It required rethinking the entire control flow.
Step 1: The Basic Pattern Translation
My threaded main loop:
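Roughly this shape, sketched with illustrative names rather than the exact original: a thread-per-connection accept loop on the standard library types.

```rust
use std::net::{TcpListener, TcpStream};
use std::thread;

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:7878")?;

    for stream in listener.incoming() {
        let stream = stream?;
        // One OS thread per connection (or a submission to the pool in the pooled version).
        thread::spawn(move || handle_connection(stream));
    }
    Ok(())
}

fn handle_connection(_stream: TcpStream) {
    // Blocking handshake read/write logic lived here.
}
```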
Became:
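Again as a sketch rather than the exact code, the async equivalent swaps in Tokio’s listener and task spawning:

```rust
use tokio::net::{TcpListener, TcpStream};

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:7878").await?;

    loop {
        // accept() yields to the event loop instead of blocking the thread.
        let (stream, _addr) = listener.accept().await?;
        // spawn() creates a lightweight task, not an OS thread.
        tokio::spawn(async move {
            handle_connection(stream).await;
        });
    }
}

async fn handle_connection(_stream: TcpStream) {
    // Async handshake read/write logic lives here.
}
```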
The pattern was similar, but the semantics were completely different:
- listener.accept() was now non-blocking
- tokio::spawn created a lightweight task, not an OS thread
- The entire function needed to be async
Step 2: The Await Infection
This is where things got interesting. As soon as I made one function async, the change propagated through my entire codebase like a virus:
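A sketch of how the keyword spreads upward once the lowest-level I/O call becomes async (the function names are illustrative, not the project’s actual ones):

```rust
use tokio::io::AsyncReadExt;
use tokio::net::TcpStream;

// The lowest layer does async I/O...
async fn read_message(stream: &mut TcpStream) -> std::io::Result<String> {
    let mut buf = [0u8; 1024];
    let n = stream.read(&mut buf).await?;
    Ok(String::from_utf8_lossy(&buf[..n]).into_owned())
}

// ...so its caller must be async...
async fn handle_handshake(stream: &mut TcpStream) -> std::io::Result<()> {
    let _hello = read_message(stream).await?;
    Ok(())
}

// ...and so must the caller's caller, all the way up the stack.
async fn handle_connection(mut stream: TcpStream) -> std::io::Result<()> {
    handle_handshake(&mut stream).await
}
```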
Async is contagious. Once you start doing async I/O, every function in the call stack must be async and use .await. There’s no mixing async and sync I/O in the same execution path.
Step 3: The Borrowing Nightmare
But my real education in async Rust came when I hit the borrowing issues. Consider this seemingly innocent code:
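For illustration (this is a representative sketch, not the exact snippet), imagine a handler that hands a borrowed slice of its receive buffer to a spawned task:

```rust
use tokio::io::AsyncReadExt;
use tokio::net::TcpStream;

async fn handle_connection(mut stream: TcpStream) -> std::io::Result<()> {
    let mut buffer = [0u8; 1024];
    let n = stream.read(&mut buffer).await?;

    // `message` borrows from the stack-local `buffer`...
    let message = std::str::from_utf8(&buffer[..n]).unwrap_or("");

    // ...but the spawned task may outlive this function, so the compiler
    // rejects handing it a non-'static reference.
    tokio::spawn(async move {
        println!("processing {message}");
    });

    Ok(())
}
```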
The compiler rejected it with a lifetime error.
What? The error message was cryptic, but the issue was fundamental: async functions can be suspended and resumed. When an .await point is reached, the entire function state (including local variables) might be stored and restored later.
The borrow checker was protecting me from a subtle bug: what if message (which borrowed from buffer) outlived the function suspension point?
Step 4: Ownership Lessons in Async Context
The solution required thinking differently about data ownership in async contexts:
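Continuing the illustrative sketch from above, copying the bytes into an owned String before the task boundary satisfies the borrow checker:

```rust
use tokio::io::AsyncReadExt;
use tokio::net::TcpStream;

async fn handle_connection(mut stream: TcpStream) -> std::io::Result<()> {
    let mut buffer = [0u8; 1024];
    let n = stream.read(&mut buffer).await?;

    // .to_string() makes an owned copy, so the spawned task owns its data outright.
    let message = std::str::from_utf8(&buffer[..n]).unwrap_or("").to_string();

    tokio::spawn(async move {
        println!("processing {message}");
    });

    Ok(())
}
```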
The .to_string() call created an owned copy of the data, eliminating the borrowing dependency. This was my first lesson in async ownership patterns: when in doubt, prefer owned data over borrowed data across .await boundaries.
The Error Handling Evolution
Async programming introduced new complexity to error handling – not just because of syntax, but because of timeout management and error composition patterns.
The Timeout Reality
In threaded code, I could set socket timeouts and forget about them:
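Roughly like this, using the standard library’s socket timeouts (the durations are illustrative):

```rust
use std::net::TcpStream;
use std::time::Duration;

fn configure_timeouts(stream: &TcpStream) -> std::io::Result<()> {
    // Set once; every subsequent blocking read/write honors these deadlines.
    stream.set_read_timeout(Some(Duration::from_secs(30)))?;
    stream.set_write_timeout(Some(Duration::from_secs(30)))?;
    Ok(())
}
```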
In async code, timeouts required explicit orchestration:
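A sketch of the async equivalent, wrapping the awaited read in an explicit tokio::time::timeout:

```rust
use std::time::Duration;
use tokio::io::AsyncReadExt;
use tokio::net::TcpStream;

async fn read_with_timeout(stream: &mut TcpStream) -> Result<usize, Box<dyn std::error::Error>> {
    let mut buf = [0u8; 1024];

    // Outer Result: did we hit the deadline? Inner Result: did the read itself fail?
    let n = tokio::time::timeout(Duration::from_secs(30), stream.read(&mut buf)).await??;
    Ok(n)
}
```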
That double question mark (??) was my introduction to nested error handling:
- The first ? unwraps the timeout result (Result<Result<T, E1>, Elapsed>)
- The second ? unwraps the actual I/O result
Error Composition Patterns
This nested structure led me to discover several powerful error handling patterns in async contexts:
Pattern 1: Timeout Composition
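As an illustrative sketch of this pattern: a small helper that wraps any fallible async operation in a deadline and flattens the two error layers into one:

```rust
use std::future::Future;
use std::time::Duration;
use tokio::time::timeout;

// Wrap any fallible async operation in a deadline and flatten the nested Result.
async fn with_timeout<T, E, F>(dur: Duration, fut: F) -> Result<T, String>
where
    F: Future<Output = Result<T, E>>,
    E: std::fmt::Display,
{
    match timeout(dur, fut).await {
        Ok(Ok(value)) => Ok(value),
        Ok(Err(e)) => Err(format!("operation failed: {e}")),
        Err(_) => Err(format!("operation timed out after {dur:?}")),
    }
}
```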
Pattern 2: Hierarchical Timeouts
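Sketched as a whole-handshake deadline nested around per-step deadlines (the durations and step names are illustrative):

```rust
use std::time::Duration;
use tokio::net::TcpStream;
use tokio::time::timeout;

async fn read_hello(_stream: &mut TcpStream) -> std::io::Result<()> { Ok(()) }
async fn send_ack(_stream: &mut TcpStream) -> std::io::Result<()> { Ok(()) }

async fn handshake(stream: &mut TcpStream) -> Result<(), Box<dyn std::error::Error>> {
    // The outer timeout bounds the entire handshake...
    timeout(Duration::from_secs(10), async {
        // ...while the inner timeouts bound each individual step.
        timeout(Duration::from_secs(3), read_hello(stream)).await??;
        timeout(Duration::from_secs(3), send_ack(stream)).await??;
        Ok::<(), Box<dyn std::error::Error>>(())
    })
    .await??;
    Ok(())
}
```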
Pattern 3: Error Context Preservation
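And a sketch of the third pattern: mapping each error layer into a message that records what the server was doing at the time (the wording is illustrative):

```rust
use std::time::Duration;
use tokio::io::AsyncReadExt;
use tokio::net::TcpStream;
use tokio::time::timeout;

async fn read_hello(stream: &mut TcpStream) -> Result<String, String> {
    let mut buf = [0u8; 64];
    let n = timeout(Duration::from_secs(3), stream.read(&mut buf))
        .await
        .map_err(|_| "timed out waiting for HELLO".to_string())?
        .map_err(|e| format!("I/O error while reading HELLO: {e}"))?;
    Ok(String::from_utf8_lossy(&buf[..n]).into_owned())
}
```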
The Async Error Propagation Challenge
One subtle challenge I discovered was error propagation through spawned tasks:
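Illustrated with a minimal sketch: awaiting a spawned task’s JoinHandle adds its own Result layer (did the task panic or get cancelled?) on top of whatever the task itself returns:

```rust
use tokio::net::TcpStream;

async fn handle_connection(_stream: TcpStream) -> std::io::Result<()> {
    // Handshake logic stands in here.
    Ok(())
}

async fn accept_one(stream: TcpStream) -> Result<(), Box<dyn std::error::Error>> {
    let handle = tokio::spawn(handle_connection(stream));

    // First ? handles the JoinError (panic or cancellation),
    // second ? propagates the handler's own I/O error.
    handle.await??;
    Ok(())
}
```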
This composability was powerful – I could set timeouts at any granularity, transform errors to preserve context, and handle failures gracefully across complex async operations.
Performance Testing: The Async Advantage
With my async server implemented, the moment of truth arrived. Time to see if the hype was real.
Comparative Results
I ran systematic load tests against all three implementations:
| Implementation | Max Concurrent | Memory Usage | CPU Usage | Success Rate |
|---|---|---|---|---|
| Single-threaded | ~50 | 12MB | 100% (blocking) | 100% |
| Thread Pool (8) | 1,500 | 468MB | 55% | 85% |
| Async/Tokio | 5,000+ | 89MB | 25% | 99.8% |
The async numbers were stunning:
- 5x more concurrent connections than the thread pool
- 80% less memory usage than the thread pool
- Lower CPU utilization despite higher throughput
- Better success rate under high load
Understanding the Victory
The memory difference revealed the fundamental architectural advantage:
Thread Pool: 8 threads × 2MB stack + connection buffers = ~468MB at 1,500 connections
Async: Single thread + task overhead (~2KB per task) = ~89MB at 5,000 connections
Each async task consumed roughly 1,000x less memory than a thread. System call analysis revealed the CPU efficiency came from dramatically fewer context switches and more efficient epoll usage.
Scaling Beyond Expectations
Emboldened by the initial results, I pushed the async server to its limits:
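As an illustration of the kind of client-side driver involved, a Tokio-based load generator along these lines (address, message format, and client count are all illustrative) is enough to open thousands of concurrent handshakes:

```rust
use tokio::io::{AsyncReadExt, AsyncWriteExt};
use tokio::net::TcpStream;

#[tokio::main]
async fn main() {
    let clients = 5_000;
    let mut handles = Vec::with_capacity(clients);

    for i in 0..clients {
        handles.push(tokio::spawn(async move {
            let mut stream = TcpStream::connect("127.0.0.1:7878").await?;
            stream.write_all(format!("HELLO {i}\n").as_bytes()).await?;
            let mut buf = [0u8; 64];
            let _ = stream.read(&mut buf).await?;
            Ok::<(), std::io::Error>(())
        }));
    }

    let mut ok = 0usize;
    for handle in handles {
        if matches!(handle.await, Ok(Ok(()))) {
            ok += 1;
        }
    }
    println!("{ok}/{clients} handshakes succeeded");
}
```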
The results were eye-opening:
| Concurrent Clients | Memory | CPU | Avg Response Time | Success Rate |
|---|---|---|---|---|
| 1,000 | 45MB | 8% | 12ms | 100% |
| 2,000 | 58MB | 15% | 18ms | 100% |
| 5,000 | 89MB | 25% | 28ms | 99.8% |
| 10,000 | 156MB | 45% | 45ms | 98.5% |
| 15,000 | 234MB | 65% | 78ms | 95.2% |
The async server handled 15,000 concurrent connections on my laptop – 10x more than the thread pool’s practical limit. Even at that scale, it consumed less memory than the thread pool at 1,500 connections.
The Bottleneck Evolution
Eventually, I hit new bottlenecks, but they were different bottlenecks:
- File descriptor limits: Even async tasks need file descriptors for sockets
- Network bandwidth: The physical network interface became the constraint
- Memory bandwidth: Copying data between user/kernel space for thousands of connections
- OS scheduler overhead: Even Tokio’s scheduler has limits
But these were resource limits, not architectural limits. The async model had eliminated the artificial constraints of thread-based concurrency.
Production Considerations: The Hidden Complexity
As I basked in the performance victory, I started encountering the hidden complexity of production async programming.
The Blocking Function Problem
One seemingly innocent change broke everything:
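The change was along these lines: a plain synchronous file write dropped into the async handler (the log path is illustrative):

```rust
use tokio::io::AsyncReadExt;
use tokio::net::TcpStream;

async fn handle_connection(mut stream: TcpStream) -> std::io::Result<()> {
    let mut buf = [0u8; 1024];
    let n = stream.read(&mut buf).await?;

    // Synchronous file I/O inside an async task: this call blocks the
    // runtime worker it runs on, stalling every other task scheduled there.
    std::fs::write("handshake.log", &buf[..n])?;

    Ok(())
}
```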
Suddenly, my server’s performance dropped by 90%. What happened?
std::fs::write is a blocking operation. In async code, blocking operations block the entire event loop. While one task was writing to the file, no other tasks could make progress. My 5,000-connection server was reduced to essentially single-threaded performance.
The solution required async-aware alternatives:
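For example, Tokio’s own filesystem API (or tokio::task::spawn_blocking for libraries with no async equivalent):

```rust
use tokio::io::AsyncReadExt;
use tokio::net::TcpStream;

async fn handle_connection(mut stream: TcpStream) -> std::io::Result<()> {
    let mut buf = [0u8; 1024];
    let n = stream.read(&mut buf).await?;

    // tokio::fs::write performs the file I/O off the event loop and yields
    // until it completes, so other tasks keep making progress.
    tokio::fs::write("handshake.log", &buf[..n]).await?;

    Ok(())
}
```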
Every I/O operation in async code must be async-aware. This is both a strength (explicit asynchrony) and a complexity burden (can’t mix sync/async easily).
The Send + Sync Requirement
Another stumbling block came from error types:
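As a representative sketch rather than the exact code: spawning a task whose error type is a plain Box&lt;dyn Error&gt;, which is not Send, trips this requirement:

```rust
use tokio::net::TcpStream;

// Box<dyn Error> is not Send, so this future's output cannot move between threads.
async fn handle_connection(_stream: TcpStream) -> Result<(), Box<dyn std::error::Error>> {
    Ok(())
}

async fn accept_one(stream: TcpStream) {
    // Rejected by the compiler: tokio::spawn requires the future
    // and its output to be Send + 'static.
    tokio::spawn(handle_connection(stream));
}
```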
The error message was intimidating.
Tokio’s scheduler can move tasks between threads, so all data in async tasks must be Send + Sync. This forced me to learn about async-safe data structures:
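A sketch of the kind of adjustments this led to, assuming a shared stats map as the example state: a Send + Sync error alias plus Tokio’s async-aware Mutex behind an Arc:

```rust
use std::collections::HashMap;
use std::sync::Arc;
use tokio::net::TcpStream;
use tokio::sync::Mutex;

// An error type that is allowed to cross thread boundaries.
type BoxedError = Box<dyn std::error::Error + Send + Sync>;

async fn handle_connection(
    _stream: TcpStream,
    stats: Arc<Mutex<HashMap<String, u64>>>,
) -> Result<(), BoxedError> {
    // tokio::sync::Mutex may be held across .await points without blocking the runtime.
    let mut stats = stats.lock().await;
    *stats.entry("connections".to_string()).or_insert(0) += 1;
    Ok(())
}
```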
This was my introduction to async-aware concurrency primitives – a whole new layer of complexity beyond basic async/await.
When Async Isn’t the Answer
After all this async evangelism, I needed to acknowledge async programming’s limitations and trade-offs.
CPU-Bound Tasks: Async’s Achilles Heel
I tested my async server with a CPU-intensive variation:
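Something along these lines: a handler that grinds through a long CPU-bound loop with no await points (the workload itself is a stand-in):

```rust
use tokio::io::AsyncWriteExt;
use tokio::net::TcpStream;

async fn handle_connection(mut stream: TcpStream) -> std::io::Result<()> {
    // CPU-bound busy work with no .await inside: the task never yields,
    // so it hogs its runtime worker and starves other tasks scheduled there.
    let mut checksum: u64 = 0;
    for i in 0..500_000_000u64 {
        checksum = checksum.wrapping_mul(31).wrapping_add(i);
    }

    stream.write_all(format!("CHECKSUM {checksum}\n").as_bytes()).await?;
    Ok(())
}
```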
Performance collapsed immediately:
| Implementation | 100 Concurrent CPU Tasks | CPU Usage | Responsiveness |
|---|---|---|---|
| Thread Pool | 8.5 seconds | 100% (8 cores) | Good |
| Async | 42.3 seconds | 100% (1 core) | Terrible |
Async tasks share a single thread by default. When one task does CPU-intensive work, it starves all other tasks. The cooperative scheduling model assumes tasks yield frequently through .await points.
The Healthcare Data Processing Reality
In healthcare contexts, this distinction matters enormously:
- Async excels: Processing thousands of concurrent claim submissions with network I/O
- Threads excel: Parallel analysis of large datasets with CPU-intensive computations
- Hybrid approaches: Use async for I/O coordination, spawn blocking tasks for CPU work
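A sketch of that hybrid shape, using tokio::task::spawn_blocking to push the CPU-heavy part onto Tokio’s blocking thread pool (the analysis function is a stand-in):

```rust
use tokio::io::AsyncWriteExt;
use tokio::net::TcpStream;

// Stand-in for a CPU-intensive analysis step.
fn analyze_dataset(data: Vec<u8>) -> u64 {
    data.iter().map(|&b| b as u64).sum()
}

async fn handle_connection(mut stream: TcpStream, data: Vec<u8>) -> std::io::Result<()> {
    // Run the CPU-bound work on the blocking pool so the event loop
    // stays free for I/O-bound tasks.
    let result = tokio::task::spawn_blocking(move || analyze_dataset(data))
        .await
        .expect("analysis task panicked");

    stream.write_all(format!("RESULT {result}\n").as_bytes()).await?;
    Ok(())
}
```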
Complexity Tax
Async programming also comes with a complexity tax:
- Learning curve: Understanding futures, tasks, and runtimes
- Debugging difficulty: Stack traces through async boundaries are confusing
- Ecosystem fragmentation: Not all libraries have async variants
- Error handling complexity: Nested timeouts and error propagation
For simple applications or teams new to Rust, the threaded approach might be more appropriate despite lower theoretical performance.
The Mental Model Transformation
By the end of my async journey, my mental model had completely transformed.
Old Model: Threads as Workers
Each thread was a dedicated worker with exclusive resources.
New Model: Tasks as Work Units
The event loop was a work multiplexer, constantly switching between ready tasks.
The Cooperative Insight
The key insight was cooperation vs preemption:
- Threads: OS forcibly switches between threads (preemptive multitasking)
- Async: Tasks voluntarily yield control at .await points (cooperative multitasking)
This cooperation enabled much more efficient resource utilization but required disciplined programming – tasks must yield regularly and avoid blocking operations.
Async vs Threading: The Final Verdict
After implementing and testing all three approaches, clear patterns emerged:
Use Single-Threading When:
- Prototyping or educational projects
- Very low connection rates (< 10 concurrent)
- Simplicity is more important than performance
- Team is new to concurrency concepts
Use Thread Pools When:
- Mixed I/O and CPU-intensive workloads
- Moderate connection rates (100-1,000 concurrent)
- Team is comfortable with traditional threading
- Need to integrate with blocking libraries
- Debugging and profiling are critical
Use Async When:
- High connection rates (1,000+ concurrent)
- I/O-bound workloads dominate
- Memory efficiency is crucial
- Team is willing to invest in async expertise
- Modern Rust ecosystem compatibility is important
Healthcare Data Context
In healthcare systems, I would choose differently depending on the use case. My speculation – I have not yet tested all of these empirically – is:
- Claims processing pipelines: Async excels (thousands of concurrent network requests)
- Clinical decision support: Threading might be better (CPU-intensive analysis)
- Patient data synchronization: Async wins (high-concurrency, I/O-bound)
- Audit log analysis: Hybrid approach (async coordination + blocking computation)
The Road Ahead: Beyond Basic Async
My async handshake server was just the beginning. Real-world async applications involve additional complexities:
- Backpressure management: Preventing fast producers from overwhelming slow consumers
- Circuit breakers: Graceful degradation when downstream services fail
- Rate limiting: Controlling resource consumption per client
- Metrics and observability: Understanding async system behavior
- Testing async code: Ensuring correctness under concurrency
But those are adventures for future projects. For now, I had achieved something significant: I understood the fundamental trade-offs between threading and async models, and I could choose the right tool for the job.
Reflection: The Async Awakening
The journey from threads to async wasn’t just about learning new syntax – it was about fundamentally changing how I think about concurrency.
Threading taught me: How to safely share resources between parallel execution contexts.
Async taught me: How to efficiently multiplex work over limited execution resources.
Both are valuable mental models. Threading maps naturally to how we think about parallel work in the real world. Async requires embracing a more abstract model of cooperative multitasking.
The performance results spoke for themselves:
- Thread pool: 1,500 concurrent connections, 468MB memory
- Async: 5,000+ concurrent connections, 89MB memory
But the deeper insight was architectural: async didn’t just perform better – it eliminated entire categories of resource bottlenecks.
As I looked at my async server humming along with thousands of concurrent connections, I felt like I had unlocked a new level of systems programming. Not because async is inherently superior, but because I now had multiple tools in my concurrency toolkit and understood when to use each one.
The handshake protocol that started as a Friday night curiosity had become a vehicle for exploring the deepest concepts in concurrent systems design. Not bad for a simple three-message exchange.
The async awakening was complete. Time to wrap up: the upcoming final episode will be a comprehensive postmortem of my entire Rust Handshake project journey.
GitHub Repository
Please check out the Handshake project repository, which contains the full (refactored) source code.