Where our simple single-threaded server meets the harsh reality of concurrent clients – and how I learned to think in threads.
Source: https://commons.wikimedia.org/
One Server, But More Than One Client
Last week’s victory almost felt complete, as my handshake server seemed to work flawlessly – until concurrency kicked in. Unless there is only a single server and a single client (too strong an assumption!), we always have to take multiple clients into consideration.
To illustrate, let’s assume our 3-way handshake protocol requires expensive computation that takes some time (e.g., a few seconds or even minutes).
We may open two terminals, feeling confident:
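Something like this, with the client binary name being a stand-in for whatever your project actually uses:

```sh
# Terminal 1: start a client against the locally running server
cargo run --bin client
# ...handshake completes after the (deliberately slow) computation
```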
So far, so good. Then Terminal 2:
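Same command, same stand-in name, started while the first handshake is still in flight:

```sh
# Terminal 2: started while Terminal 1 is still mid-handshake
cargo run --bin client
# ...nothing happens; the client just sits here
```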
The second client will just be … hung. Not crashed, not rejected – frozen in digital limbo. This is because our “working” server can only handle one client at a time. In this second episode of the challenge, we need to address this concern and upgrade our server accordingly. We will also review a few basic concepts of concurrency and threading along the way.
Disclaimer: Like the last episode, I purposely dramatized my challenge experience with reasonable exaggeration for the sake of more engaging storytelling. Please take it with a grain of salt.
Diagnosing the Problem
Before jumping to solutions, we need to understand why this was happening. Time to trace through my server’s execution step by step.
My original main loop looked innocent enough:
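Here is a minimal sketch of it, with the two points the timeline below refers to marked. The bind address, the error handling, and the handle_handshake signature are my placeholders rather than the original code:

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};

// Stand-in for the real handshake from episode one (details omitted).
fn handle_handshake(stream: &mut TcpStream) -> std::io::Result<()> {
    let mut buf = [0u8; 64];
    let n = stream.read(&mut buf)?;   // e.g. wait for "HELLO X"
    stream.write_all(&buf[..n])?;     // reply (the real protocol differs)
    Ok(())
}

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:8080")?;

    for stream in listener.incoming() {     // Point A: block until a client connects
        let mut stream = stream?;
        handle_handshake(&mut stream)?;     // Point B: block until this handshake finishes
    }
    Ok(())
}
```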
Let me walk through this with multiple clients:
Timeline Analysis:
- T=0: Server starts, reaches Point A, waits for first connection
- T=1: Client 1 connects, server accepts at Point A
- T=1: Server moves to Point B, begins handshake with Client 1
- T=2: Client 2 tries to connect – but server is busy at Point B!
- T=3-5: Client 1 completes handshake, server finishes Point B
- T=5: Server loops back to Point A, finally accepts Client 2
The issue was crystal clear: my server was fundamentally sequential. Each handle_handshake() call blocked the entire accept loop.
Hospital Analogy
Anyone familiar with hospital operations would recognize this pattern immediately. Imagine a clinic with one receptionist processing patient check-ins:
- Patient A arrives, receptionist starts processing insurance verification
- Patient B arrives, waits in line
- Patient C arrives, waits behind Patient B
- Receptionist finishes Patient A (5 minutes later)
- Finally begins processing Patient B
Even if insurance verification could be done in parallel (calling different systems, waiting for responses), the single-receptionist model creates an artificial bottleneck. Each patient must wait for all previous patients to completely finish.
What If We Had Multiple Workers?
Staring at my blocked client, a fundamental question emerged: What if the server could work on multiple handshakes simultaneously?
This wasn’t just about speed – it was about resource utilization. During a handshake, my server spent most of its time waiting:
- Waiting for the client to send HELLO X
- Waiting for network I/O to complete
- Waiting for the client to send HELLO Z
While waiting for Client 1’s network I/O, why couldn’t the server accept and start processing Client 2?
Concurrency vs Parallelism
Before diving into solutions, I needed to clarify my thinking. Two related but distinct concepts kept getting tangled in my mind:
Concurrency: Multiple tasks making progress by interleaving execution. Like a juggler keeping multiple balls in the air by rapidly switching attention between them.
Parallelism: Multiple tasks executing simultaneously. Like multiple jugglers, each handling their own set of balls.
For my server, I needed concurrency – the ability to make progress on multiple client connections without one blocking the others.
Threading, The Operating System’s Gift
We may consider TWO distinct approaches to solving this problem. Threading is perhaps the easier of the two to come up with. But before writing any code, we need to understand what threads actually are and how they could solve my problem.
(Aside: the “harder” solution will be discussed and implemented in the next episode!)
Threads as Independent Workers
I started visualizing threads as independent workers within the same office building:
- Shared Resources: All workers share the same building (process memory), equipment (file handles), and utilities (network sockets)
- Independent Work: Each worker can focus on their own tasks without coordinating with others
- Communication: Workers can communicate when needed, but don’t have to
Applied to my server:
- Main Thread: Acts as a receptionist, accepting new connections
- Worker Threads: Each handles one client’s complete handshake
- Shared Infrastructure: All threads share the listening socket and server resources
The First Question: How Many Threads?
Before writing any threading code, I faced a fundamental design decision: How many threads should I create?
Several approaches came to mind:
Option 1: Thread-per-Connection
- Create a new thread for each client connection
- Simple to understand and implement
- But what about resource limits?
Option 2: Fixed Thread Pool
- Create a fixed number of worker threads
- Queue incoming connections for available workers
- More complex but potentially more efficient
Option 3: Dynamic Thread Pool
- Start with a base number of threads
- Create more as needed, destroy when idle
- Most complex, but potentially most adaptive
For my first threading attempt, I chose Option 1 – thread-per-connection. Start simple, understand the fundamentals, then optimize.
First Threading Implementation (Naive)
With my mental model established, I began the most basic threading implementation:
Step 1: The Core Insight
My original server had this structure:
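In essence (reusing the names from the earlier sketch):

```rust
for stream in listener.incoming() {
    let mut stream = stream?;
    handle_handshake(&mut stream)?;   // the whole loop stops until this returns
}
```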
The threading insight was simple: what if handle_handshake ran in a separate thread?
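A first, naive attempt simply wraps the call in std::thread::spawn. This is the version that does not compile – still a reconstruction, assuming handle_handshake borrows the stream:

```rust
use std::thread;

for stream in listener.incoming() {
    let mut stream = stream?;
    // Naive attempt: hand the handshake to a brand-new thread.
    thread::spawn(|| {
        handle_handshake(&mut stream)   // the closure only borrows `stream`...
    });
}
```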
But immediately, the Rust compiler screamed at me:
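The exact wording depends on the compiler version, but the complaint was roughly:

```text
error[E0373]: closure may outlive the current function, but it borrows `stream`,
              which is owned by the current function
  ...
help: to force the closure to take ownership of `stream`, use the `move` keyword
```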
Step 2: Understanding Ownership in Threading
This error forced me to think deeply about ownership across thread boundaries. In single-threaded code, ownership transfers were straightforward. But what happens when a value needs to move to a different thread?
The compiler was protecting me from a classic threading bug: what if the main thread moved on (and stream went out of scope) while the spawned thread was still using it?
The solution was the move keyword:
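One keyword added to the sketch above, and the borrow complaint disappears:

```rust
for stream in listener.incoming() {
    let mut stream = stream?;
    // `move` transfers ownership of `stream` into the closure, so the spawned
    // thread can keep using it after this loop iteration ends.
    thread::spawn(move || {
        handle_handshake(&mut stream)
    });
}
```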
This was my first “aha!” moment with Rust threading: the ownership system prevented data races by design. No shared mutable state, no race conditions – the thread owned its stream completely.
Step 3: Error Handling Across Threads
My next challenge: what happens when handle_handshake() returns an error? In single-threaded code, I could propagate errors with ?. But thread spawning returns a JoinHandle, not a Result.
I needed to handle errors within each thread:
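The pragmatic answer was to log and swallow the error inside the thread itself (same assumptions as before):

```rust
thread::spawn(move || {
    // `?` can't cross the thread boundary, so deal with failures right here.
    if let Err(e) = handle_handshake(&mut stream) {
        eprintln!("handshake failed: {e}");
    }
});
```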
But this felt clunky. Let me create a wrapper function to encapsulate this pattern:
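Something like the following, where the name handle_client is my choice rather than anything canonical:

```rust
use std::net::TcpStream;

// Owns the connection, runs the handshake, and reports failures,
// so the spawn site stays a one-liner.
fn handle_client(mut stream: TcpStream) {
    if let Err(e) = handle_handshake(&mut stream) {
        eprintln!("handshake with {:?} failed: {e}", stream.peer_addr());
    }
}
```

The accept loop then shrinks to a single thread::spawn(move || handle_client(stream)) call.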
Step 4: Testing the Basic Threading
With basic threading implemented, the moment of truth arrived:
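The test was nothing fancier than firing two clients at once (binary name assumed, as before):

```sh
cargo run --bin client &
cargo run --bin client &
wait
```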
It worked! Both clients completed simultaneously, and the server log showed the two handshakes interleaving beautifully rather than running back to back.
I had achieved true concurrency.
Resource Explosion
Euphoria lasted about five minutes. Time to stress-test my creation:
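The stress test was a crude shell loop; the counts and paths here are stand-ins for whatever I actually typed:

```sh
# Build once, then fire a batch of clients in parallel.
cargo build --release
for i in $(seq 1 100); do ./target/release/client & done; wait
```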
The server handled it. Let’s push further:
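Same loop, an order of magnitude more clients:

```sh
for i in $(seq 1 1000); do ./target/release/client & done; wait
```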
My almost-decade-old XPS13 laptop started groaning. Time to investigate what was happening under the hood.
Resource Consumption Analysis
Using system monitoring tools, the truth became apparent:
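A check along these lines tells the story – NLWP is the thread count, RSS the resident memory, and the server binary name is an assumption:

```sh
ps -o pid,nlwp,rss,comm -p "$(pgrep -f handshake-server)"
```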
2.1 gigabytes of RAM? For a simple handshake server? And 847 threads?
The Thread Stack Problem
Research revealed the culprit: each thread gets its own stack. On Linux, the default thread stack size is 8MB (Rust’s standard library reserves a smaller 2MiB stack per thread by default, but the arithmetic is still sobering). Simple math:
- 1000 threads × 8MB stack = 8GB of virtual memory
- Context switching between 1000 threads = performance nightmare
My “elegant” thread-per-connection approach was a resource catastrophe.
The Deeper Problem: Context Switching
Monitoring context switches revealed another issue:
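vmstat, sampling once per second, is an easy way to watch this:

```sh
# The "cs" column reports context switches per second.
vmstat 1
```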
The cs column showed nearly 100K context switches per second. The operating system was spending more time switching between threads than doing actual work.
Realization: Thread-per-connection doesn’t scale beyond a few hundred concurrent connections.
Rethinking the Approach: Thread Pools
Staring at my resource-hungry server, I needed a new strategy. The problem wasn’t threading itself – it was unbounded thread creation.
The Receptionist Insight, Revisited
Back to the clinic analogy, but with a crucial insight: a busy clinic doesn’t hire a new receptionist for every patient. It maintains a fixed front-desk staff of experienced receptionists who handle many patients throughout their shift.
What if I applied this principle to my server?
Thread Pool Concept:
- Create a fixed number of worker threads at startup
- When connections arrive, assign them to available workers
- When workers finish, they return to the pool for new assignments
This approach promised several advantages:
- Predictable Resource Usage: Fixed memory footprint
- Reduced Context Switching: Fewer threads means less switching overhead
- Reusable Workers: Threads don’t need creation/destruction overhead
Designing the Thread Pool Interface
Before implementation, I needed to design the interface. How should work be distributed to threads?
Option 1: Direct Assignment
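Something shaped like this – a purely hypothetical interface, sketched only to contrast with Option 2:

```rust
use std::net::TcpStream;

// Hypothetical direct-assignment interface (not a real crate API):
// the accept loop has to track which worker is idle and hand work to it.
trait Worker {
    fn is_idle(&self) -> bool;
    fn assign(&self, stream: TcpStream);
}
```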
Option 2: Work Queue
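With a work queue, the accept loop just submits a job and moves on. This is the shape offered by the threadpool crate’s execute method, with handle_client being the wrapper from earlier:

```rust
// The pool decides which worker picks this up.
pool.execute(move || handle_client(stream));
```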
Option 2 felt more natural – I could submit work without worrying about worker management details.
Thread Pool Sizing Strategy
A critical question remained: how many threads should the pool contain?
For CPU-bound tasks, the answer is usually “number of CPU cores.” But network I/O is different – threads spend time waiting for network responses, not using CPU.
My reasoning process:
- Minimum: At least 4 threads for reasonable concurrency
- CPU Cores: Query std::thread::available_parallelism()
- I/O Multiplier: Since threads wait for network I/O, use 2× CPU cores
- Fallback: If detection fails, default to 8
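Put together as a small helper – the function name is mine, while std::thread::available_parallelism is the real standard-library call:

```rust
use std::thread;

// 2x the detected cores for I/O-heavy work, floor of 4, fallback of 8.
fn pool_size() -> usize {
    thread::available_parallelism()
        .map(|n| (n.get() * 2).max(4))
        .unwrap_or(8)
}
```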
I was not sure if that was optimal, but it felt reasonable to me – a principled starting point for thread pool sizing.
The threadpool Crate
Rather than implementing a thread pool from scratch (which I had once done in C), I decided to use the threadpool crate. For this challenge, I wanted to focus on the core problem, not the infrastructure.
Step 1: Adding the Dependency
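One line in Cargo.toml; the version here is indicative, and any recent 1.x release should do:

```toml
[dependencies]
threadpool = "1.8"
```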
Step 2: Restructuring the Main Loop
The thread pool required a fundamental shift in thinking:
Old Pattern (Thread-per-Connection):
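Per the earlier sketch, roughly:

```rust
// A brand-new OS thread per client, created on demand.
for stream in listener.incoming() {
    let stream = stream?;
    thread::spawn(move || handle_client(stream));
}
```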
New Pattern (Thread Pool):
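And the pooled version, sketched on top of the threadpool crate and the pool_size helper above:

```rust
use threadpool::ThreadPool;

// A fixed set of workers, created once at startup.
let pool = ThreadPool::new(pool_size());

for stream in listener.incoming() {
    let stream = stream?;
    // Submit work; an idle worker (or the queue) takes it from here.
    pool.execute(move || handle_client(stream));
}
```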
The change was minimal, but the implications were profound. Instead of creating threads, I was submitting work to a pre-existing pool.
Step 3: Adding Connection Monitoring
With a thread pool, I gained the ability to monitor and log connections more effectively:
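For example, counting connections and logging each peer address as it is accepted – the counter and log format are my choices:

```rust
let mut accepted: u64 = 0;

for stream in listener.incoming() {
    let stream = stream?;
    accepted += 1;
    // peer_addr can fail if the client has already gone away.
    match stream.peer_addr() {
        Ok(addr) => println!("connection #{accepted} from {addr}"),
        Err(e) => println!("connection #{accepted} from unknown peer: {e}"),
    }
    pool.execute(move || handle_client(stream));
}
```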
This gave me visibility into connection patterns and helped with debugging.
Step 4: Enhanced Error Handling
The thread pool enabled more sophisticated error handling:
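Inside each pooled job, the handshake result can be paired with the client’s address – a sketch, assuming the accept loop binds the stream mutably:

```rust
pool.execute(move || {
    let peer = stream
        .peer_addr()
        .map(|a| a.to_string())
        .unwrap_or_else(|_| "unknown".to_string());
    match handle_handshake(&mut stream) {
        Ok(()) => println!("handshake with {peer} succeeded"),
        Err(e) => eprintln!("handshake with {peer} failed: {e}"),
    }
});
```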
Now I could track both successful and failed connections with their source addresses.
Performance Testing: Validating the Approach
Time to validate my thread pool implementation with systematic testing.
Test Methodology
I created a controlled test environment:
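In spirit, a timed batch of clients per server build; the paths and counts are stand-ins:

```sh
cargo build --release
# Time how long it takes 500 concurrent clients to all finish.
time ( for i in $(seq 1 500); do ./target/release/client & done; wait )
```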
Comparative Results
| Implementation | Total Time (500 clients) | Memory Usage | CPU Usage | Context Switches/sec |
|---|---|---|---|---|
| Single-threaded | 45.2s | 12MB | 2% | 50 |
| Thread-per-connection | 8.7s | 1.2GB | 65% | 89,000 |
| Thread Pool (8 workers) | 9.1s | 48MB | 25% | 1,200 |
Analysis and Insights
Single-threaded: Predictably slow but incredibly resource-efficient. The serialization bottleneck dominated performance.
Thread-per-connection: Fastest raw performance, but at catastrophic resource cost. The memory usage and context switching overhead made it unsustainable.
Thread Pool: Nearly as fast as thread-per-connection but with 96% less memory usage and 98.6% fewer context switches. This was the sweet spot.
The thread pool achieved what I was looking for: high performance with sustainable resource usage.
Production Considerations: Beyond Basic Threading
As I admired my thread pool implementation, several production concerns emerged.
Connection Limiting and Backpressure
What happens when connection requests exceed the thread pool’s capacity? The pool’s internal queue could grow unbounded, eventually causing memory exhaustion.
I needed backpressure – a mechanism to limit concurrent connections:
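One lightweight way to get it is an atomic count of in-flight handshakes, checked in the accept loop. A sketch, with the limit value an arbitrary assumption:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

const MAX_IN_FLIGHT: usize = 256; // assumption: tune to real capacity

let in_flight = Arc::new(AtomicUsize::new(0));

for stream in listener.incoming() {
    let mut stream = stream?;
    // Soft limit: a small race between the check and the increment is fine here.
    if in_flight.load(Ordering::Acquire) >= MAX_IN_FLIGHT {
        eprintln!("at capacity, rejecting {:?}", stream.peer_addr());
        continue; // dropping `stream` closes the socket
    }
    in_flight.fetch_add(1, Ordering::AcqRel);
    let in_flight = Arc::clone(&in_flight);
    pool.execute(move || {
        if let Err(e) = handle_handshake(&mut stream) {
            eprintln!("handshake failed: {e}");
        }
        in_flight.fetch_sub(1, Ordering::AcqRel);
    });
}
```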
This pattern provides graceful degradation – when capacity is reached, new connections are explicitly rejected rather than causing system failure.
Health Claims Processing
This backpressure mechanism reminded me of claims processing systems. When claim volume exceeds processing capacity, the system doesn’t crash – it queues claims and provides realistic processing time estimates.
Similarly, my server now had predictable behavior under load: accept up to N concurrent connections, reject additional requests gracefully.
Threading Lessons and Insights
The Rust Threading Advantage
Working with threads in Rust felt fundamentally different from other languages:
Ownership-Based Safety: The compiler prevented data races at compile time. No mysterious crashes from multiple threads accessing shared memory incorrectly.
Automatic Resource Management: RAII meant automatic cleanup. When threads ended, their resources were automatically released – no manual memory management required.
Clear Error Boundaries: Each thread’s errors were isolated. A parsing failure in one client connection couldn’t corrupt another client’s state.
The Evolution of My Mental Model
My understanding of threading evolved through this project:
Initial View: Threads as a performance optimization – a way to make things faster.
Mature View: Threads as a resource utilization strategy – a way to keep the CPU busy while individual operations wait for I/O.
The key insight: threading isn’t primarily about speed – it’s about not wasting resources while waiting.
When to Choose Each Approach
Through testing and analysis, clear patterns emerged:
Single-threaded: Perfect for development, testing, or very low connection rates (< 10 concurrent).
Thread-per-connection: Suitable for predictable, moderate connection counts where maximum per-connection performance matters.
Thread Pool: The production choice for most network servers. Excellent scalability with predictable resource usage.
The Async Question Mark
Wrapping up our thread pool success, one nagging question remained: What if there was a way to handle thousands of connections without creating thousands of threads?
Any quick research keeps pointing to asynchronous programming – a fundamentally different concurrency model that doesn’t rely on operating system threads for I/O operations.
The promise is tantalizing: handle thousands of concurrent connections with just a few threads, by cooperatively multitasking within each thread.
But that’s a story for our next episode, where we’ll explore Rust’s async/await system and rebuild our handshake server using tokio. We’ll discover why async programming has become the dominant pattern for high-performance network services – and whether it lives up to its promises.
Reflection: The Threading Journey
Threading transformed my simple handshake server from a sequential toy into something much closer in shape to a real server. More importantly, it taught me to think systematically about concurrency problems:
Resource Analysis: Understanding the true cost of different approaches before implementing them.
Performance Trade-offs: Recognizing that raw speed isn’t everything – sustainable resource usage often matters more.
Safety First: Appreciating how Rust’s ownership system makes concurrent programming dramatically safer than traditional approaches.
Production Mindset: Considering backpressure, connection limits, and graceful degradation from the beginning.
The server we built can now handle hundreds of concurrent connections efficiently and safely. From Friday night curiosity to legitimate solution – not bad for a weekend of fun.
But I’m not done yet. That async question mark keeps calling…
GitHub Repository
I published my Handshake project repository containing the full (refactored) source code. Please check it out as well.