Where our simple single-threaded server meets the harsh reality of concurrent clients – and how I learned to think in threads.
Source: https://commons.wikimedia.org/
One Server, But More Than One Client
Last week’s victory almost felt complete, as my handshake server seemed to work flawlessly – until concurrency kicked in. Unless there is only a single server and a single client (too strong an assumption!), we always have to take multiple clients into consideration.
To illustrate, let’s assume our 3-way handshake protocol requires expensive computation that takes some time (e.g., a few seconds or even minutes).
We may open two terminals, feeling confident:
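Something like this, with the client binary name being a stand-in for whatever your project actually uses:

```sh
# Terminal 1: start a client against the locally running server
cargo run --bin client
# ...handshake completes after the (deliberately slow) computation
```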
So far, so good. Then Terminal 2:
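Same command, same stand-in name, started while the first handshake is still in flight:

```sh
# Terminal 2: started while Terminal 1 is still mid-handshake
cargo run --bin client
# ...nothing happens; the client just sits here
```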
The second client will just be … hung. Not crashed, not rejected – frozen in digital limbo. This is because our “working” server can only handle one client at a time. In this second episode of the challenge, we need to address this concern and upgrade our server accordingly. We will also review a few basic concepts of concurrency and threading along the way.
Disclaimer: Like the last episode, I purposely dramatized my challenge experience with reasonable exaggeration for the sake of more engaging storytelling. Please take it with a grain of salt.
Diagnosing the Problem
Before jumping to solutions, we need to understand why this was happening. Time to trace through my server’s execution step by step.
My original main loop looked innocent enough:
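Here is a minimal sketch of it, with the two points the timeline below refers to marked. The bind address, the error handling, and the handle_handshake signature are my placeholders rather than the original code:

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};

// Stand-in for the real handshake from episode one (details omitted).
fn handle_handshake(stream: &mut TcpStream) -> std::io::Result<()> {
    let mut buf = [0u8; 64];
    let n = stream.read(&mut buf)?;   // e.g. wait for "HELLO X"
    stream.write_all(&buf[..n])?;     // reply (the real protocol differs)
    Ok(())
}

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:8080")?;

    for stream in listener.incoming() {     // Point A: block until a client connects
        let mut stream = stream?;
        handle_handshake(&mut stream)?;     // Point B: block until this handshake finishes
    }
    Ok(())
}
```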
Let me walk through this with multiple clients:
Timeline Analysis:
- T=0: Server starts, reaches Point A, waits for first connection
- T=1: Client 1 connects, server accepts at Point A
- T=1: Server moves to Point B, begins handshake with Client 1
- T=2: Client 2 tries to connect – but server is busy at Point B!
- T=3-5: Client 1 completes handshake, server finishes Point B
- T=5: Server loops back to Point A, finally accepts Client 2
The issue was crystal clear: my server was fundamentally sequential. Each handle_handshake() call blocked the entire accept loop.
Hospital Analogy
Anyone familiar with hospital operations would recognize this pattern immediately. Imagine a clinic with one receptionist processing patient check-ins:
- Patient A arrives, receptionist starts processing insurance verification
- Patient B arrives, waits in line
- Patient C arrives, waits behind Patient B
- Receptionist finishes Patient A (5 minutes later)
- Finally begins processing Patient B
Even if insurance verification could be done in parallel (calling different systems, waiting for responses), the single-receptionist model creates an artificial bottleneck. Each patient must wait for all previous patients to completely finish.
What If We Had Multiple Workers?
Staring at my blocked client, a fundamental question emerged: What if the server could work on multiple handshakes simultaneously?
This wasn’t just about speed – it was about resource utilization. During a handshake, my server spent most of its time waiting:
- Waiting for the client to send HELLO X
- Waiting for network I/O to complete
- Waiting for the client to send HELLO Z
While waiting for Client 1’s network I/O, why couldn’t the server accept and start processing Client 2?
Concurrency vs Parallelism
Before diving into solutions, I needed to clarify my thinking. Two related but distinct concepts kept getting tangled in my mind:
Concurrency: Multiple tasks making progress by interleaving execution. Like a juggler keeping multiple balls in the air by rapidly switching attention between them.
Parallelism: Multiple tasks executing simultaneously. Like multiple jugglers, each handling their own set of balls.
For my server, I needed concurrency – the ability to make progress on multiple client connections without one blocking the others.
Threading, The Operating System’s Gift
We may consider TWO distinct approaches to solving this problem. Threading is perhaps the easier of the two to come up with. But before writing any code, we need to understand what threads actually are and how they could solve my problem.
(Aside: the “harder” solution will be discussed and implemented in the next episode!)
Threads as Independent Workers
I started visualizing threads as independent workers within the same office building:
- Shared Resources: All workers share the same building (process memory), equipment (file handles), and utilities (network sockets)
- Independent Work: Each worker can focus on their own tasks without coordinating with others
- Communication: Workers can communicate when needed, but don’t have to
Applied to my server:
- Main Thread: Acts as a receptionist, accepting new connections
- Worker Threads: Each handles one client’s complete handshake
- Shared Infrastructure: All threads share the listening socket and server resources
The First Question: How Many Threads?
Before writing any threading code, I faced a fundamental design decision: How many threads should I create?
Several approaches came to mind:
Option 1: Thread-per-Connection
- Create a new thread for each client connection
- Simple to understand and implement
- But what about resource limits?
Option 2: Fixed Thread Pool
- Create a fixed number of worker threads
- Queue incoming connections for available workers
- More complex but potentially more efficient
Option 3: Dynamic Thread Pool
- Start with a base number of threads
- Create more as needed, destroy when idle
- Most complex, but potentially most adaptive
For my first threading attempt, I chose Option 1 – thread-per-connection. Start simple, understand the fundamentals, then optimize.
First Threading Implementation (Naive)
With my mental model established, I began the most basic threading implementation:
Step 1: The Core Insight
My original server had this structure:
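In essence (reusing the names from the earlier sketch):

```rust
for stream in listener.incoming() {
    let mut stream = stream?;
    handle_handshake(&mut stream)?;   // the whole loop stops until this returns
}
```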
The threading insight was simple: what if handle_handshake ran in a separate thread?
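A first, naive attempt simply wraps the call in std::thread::spawn. This is the version that does not compile – still a reconstruction, assuming handle_handshake borrows the stream:

```rust
use std::thread;

for stream in listener.incoming() {
    let mut stream = stream?;
    // Naive attempt: hand the handshake to a brand-new thread.
    thread::spawn(|| {
        handle_handshake(&mut stream)   // the closure only borrows `stream`...
    });
}
```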
But immediately, the Rust compiler screamed at me:
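The exact wording depends on the compiler version, but the complaint was roughly:

```text
error[E0373]: closure may outlive the current function, but it borrows `stream`,
              which is owned by the current function
  ...
help: to force the closure to take ownership of `stream`, use the `move` keyword
```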
Step 2: Understanding Ownership in Threading
This error forced me to think deeply about ownership across thread boundaries. In single-threaded code, ownership transfers were straightforward. But what happens when a value needs to move to a different thread?
The compiler was protecting me from a classic threading bug: what if the main thread moved on (and stream went out of scope) while the spawned thread was still using it?
The solution was the move keyword:
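One keyword added to the sketch above, and the borrow complaint disappears:

```rust
for stream in listener.incoming() {
    let mut stream = stream?;
    // `move` transfers ownership of `stream` into the closure, so the spawned
    // thread can keep using it after this loop iteration ends.
    thread::spawn(move || {
        handle_handshake(&mut stream)
    });
}
```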
This was my first “aha!” moment with Rust threading: the ownership system prevented data races by design. No shared mutable state, no race conditions – the thread owned its stream completely.
Step 3: Error Handling Across Threads
My next challenge: what happens when handle_handshake() returns an error? In single-threaded code, I could propagate errors with ?. But thread spawning returns a JoinHandle, not a Result.
I needed to handle errors within each thread:
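The pragmatic answer was to log and swallow the error inside the thread itself (same assumptions as before):

```rust
thread::spawn(move || {
    // `?` can't cross the thread boundary, so deal with failures right here.
    if let Err(e) = handle_handshake(&mut stream) {
        eprintln!("handshake failed: {e}");
    }
});
```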
But this felt clunky. Let me create a wrapper function to encapsulate this pattern:
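Something like the following, where the name handle_client is my choice rather than anything canonical:

```rust
use std::net::TcpStream;

// Owns the connection, runs the handshake, and reports failures,
// so the spawn site stays a one-liner.
fn handle_client(mut stream: TcpStream) {
    if let Err(e) = handle_handshake(&mut stream) {
        eprintln!("handshake with {:?} failed: {e}", stream.peer_addr());
    }
}
```

The accept loop then shrinks to a single thread::spawn(move || handle_client(stream)) call.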
Step 4: Testing the Basic Threading
With basic threading implemented, the moment of truth arrived:
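The test was nothing fancier than firing two clients at once (binary name assumed, as before):

```sh
cargo run --bin client &
cargo run --bin client &
wait
```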
It worked! Both clients completed simultaneously, and the server log showed the two handshakes interleaving beautifully rather than running back to back.
I had achieved true concurrency.
Resource Explosion
Euphoria lasted about five minutes. Time to stress-test my creation:
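The stress test was a crude shell loop; the counts and paths here are stand-ins for whatever I actually typed:

```sh
# Build once, then fire a batch of clients in parallel.
cargo build --release
for i in $(seq 1 100); do ./target/release/client & done; wait
```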
The server handled it. Let’s push further:
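Same loop, an order of magnitude more clients:

```sh
for i in $(seq 1 1000); do ./target/release/client & done; wait
```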
My almost-decade-old XPS13 laptop started groaning. Time to investigate what was happening under the hood.
Resource Consumption Analysis
Using system monitoring tools, the truth became apparent:
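A check along these lines tells the story – NLWP is the thread count, RSS the resident memory, and the server binary name is an assumption:

```sh
ps -o pid,nlwp,rss,comm -p "$(pgrep -f handshake-server)"
```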
2.1 gigabytes of RAM? For a simple handshake server? And 847 threads?
The Thread Stack Problem
Research revealed the culprit: each thread gets its own stack. On Linux, the default thread stack size is 8MB (Rust’s standard library reserves a smaller 2MiB stack per thread by default, but the arithmetic is still sobering). Simple math:
- 1000 threads × 8MB stack = 8GB of virtual memory
- Context switching between 1000 threads = performance nightmare
My “elegant” thread-per-connection approach was a resource catastrophe.
The Deeper Problem: Context Switching
Monitoring context switches revealed another issue:
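vmstat, sampling once per second, is an easy way to watch this:

```sh
# The "cs" column reports context switches per second.
vmstat 1
```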
The cs column showed nearly 100K context switches per second. The operating system was spending more time switching between threads than doing actual work.
Realization: Thread-per-connection doesn’t scale beyond a few hundred concurrent connections.
Rethinking the Approach: Thread Pools
Staring at my resource-hungry server, I needed a new strategy. The problem wasn’t threading itself – it was unbounded thread creation.
The Receptionist Insight, Revisited
Back to the clinic analogy, but with a crucial insight: a busy clinic doesn’t hire a new receptionist for every patient. It maintains a fixed front-desk staff of experienced receptionists who handle many patients throughout their shift.
What if I applied this principle to my server?
Thread Pool Concept:
- Create a fixed number of worker threads at startup
- When connections arrive, assign them to available workers
- When workers finish, they return to the pool for new assignments
This approach promised several advantages:
- Predictable Resource Usage: Fixed memory footprint
- Reduced Context Switching: Fewer threads means less switching overhead
- Reusable Workers: Threads don’t need creation/destruction overhead
Designing the Thread Pool Interface
Before implementation, I needed to design the interface. How should work be distributed to threads?
Option 1: Direct Assignment
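Something shaped like this – a purely hypothetical interface, sketched only to contrast with Option 2:

```rust
use std::net::TcpStream;

// Hypothetical direct-assignment interface (not a real crate API):
// the accept loop has to track which worker is idle and hand work to it.
trait Worker {
    fn is_idle(&self) -> bool;
    fn assign(&self, stream: TcpStream);
}
```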
Option 2: Work Queue
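With a work queue, the accept loop just submits a job and moves on. This is the shape offered by the threadpool crate’s execute method, with handle_client being the wrapper from earlier:

```rust
// The pool decides which worker picks this up.
pool.execute(move || handle_client(stream));
```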
Option 2 felt more natural – I could submit work without worrying about worker management details.
Thread Pool Sizing Strategy
A critical question remained: how many threads should the pool contain?
For CPU-bound tasks, the answer is usually “number of CPU cores.” But network I/O is different – threads spend time waiting for network responses, not using CPU.
My reasoning process:
- Minimum: At least 4 threads for reasonable concurrency
- CPU Cores: Query std::thread::available_parallelism()
- I/O Multiplier: Since threads wait for network I/O, use 2× CPU cores
- Fallback: If detection fails, default to 8
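Put together as a small helper – the function name is mine, while std::thread::available_parallelism is the real standard-library call:

```rust
use std::thread;

// 2x the detected cores for I/O-heavy work, floor of 4, fallback of 8.
fn pool_size() -> usize {
    thread::available_parallelism()
        .map(|n| (n.get() * 2).max(4))
        .unwrap_or(8)
}
```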
I was not sure if that was optimal, but it felt reasonable to me – a principled starting point for thread pool sizing.
The threadpool Crate
Rather than implementing a thread pool from scratch (which I had once done in C), I decided to use the threadpool crate. For this challenge, I wanted to focus on the core problem, not the infrastructure.
Step 1: Adding the Dependency
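One line in Cargo.toml; the version here is indicative, and any recent 1.x release should do:

```toml
[dependencies]
threadpool = "1.8"
```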
Step 2: Restructuring the Main Loop
The thread pool required a fundamental shift in thinking:
Old Pattern (Thread-per-Connection):
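Per the earlier sketch, roughly:

```rust
// A brand-new OS thread per client, created on demand.
for stream in listener.incoming() {
    let stream = stream?;
    thread::spawn(move || handle_client(stream));
}
```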
New Pattern (Thread Pool):
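And the pooled version, sketched on top of the threadpool crate and the pool_size helper above:

```rust
use threadpool::ThreadPool;

// A fixed set of workers, created once at startup.
let pool = ThreadPool::new(pool_size());

for stream in listener.incoming() {
    let stream = stream?;
    // Submit work; an idle worker (or the queue) takes it from here.
    pool.execute(move || handle_client(stream));
}
```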
The change was minimal, but the implications were profound. Instead of creating threads, I was submitting work to a pre-existing pool.
Step 3: Adding Connection Monitoring
With a thread pool, I gained the ability to monitor and log connections more effectively:
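For example, counting connections and logging each peer address as it is accepted – the counter and log format are my choices:

```rust
let mut accepted: u64 = 0;

for stream in listener.incoming() {
    let stream = stream?;
    accepted += 1;
    // peer_addr can fail if the client has already gone away.
    match stream.peer_addr() {
        Ok(addr) => println!("connection #{accepted} from {addr}"),
        Err(e) => println!("connection #{accepted} from unknown peer: {e}"),
    }
    pool.execute(move || handle_client(stream));
}
```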
This gave me visibility into connection patterns and helped with debugging.
Step 4: Enhanced Error Handling
The thread pool enabled more sophisticated error handling:
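Inside each pooled job, the handshake result can be paired with the client’s address – a sketch, assuming the accept loop binds the stream mutably:

```rust
pool.execute(move || {
    let peer = stream
        .peer_addr()
        .map(|a| a.to_string())
        .unwrap_or_else(|_| "unknown".to_string());
    match handle_handshake(&mut stream) {
        Ok(()) => println!("handshake with {peer} succeeded"),
        Err(e) => eprintln!("handshake with {peer} failed: {e}"),
    }
});
```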
Now I could track both successful and failed connections with their source addresses.
Performance Testing: Validating the Approach
Time to validate my thread pool implementation with systematic testing.
Test Methodology
I created a controlled test environment:
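In spirit, a timed batch of clients per server build; the paths and counts are stand-ins:

```sh
cargo build --release
# Time how long it takes 500 concurrent clients to all finish.
time ( for i in $(seq 1 500); do ./target/release/client & done; wait )
```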
Comparative Results
| Implementation | Total Time (500 clients) | Memory Usage | CPU Usage | Context Switches/sec |
|---|---|---|---|---|
| Single-threaded | 45.2s | 12MB | 2% | 50 |
| Thread-per-connection | 8.7s | 1.2GB | 65% | 89,000 |
| Thread Pool (8 workers) | 9.1s | 48MB | 25% | 1,200 |
Analysis and Insights
Single-threaded: Predictably slow but incredibly resource-efficient. The serialization bottleneck dominated performance.
Thread-per-connection: Fastest raw performance, but at catastrophic resource cost. The memory usage and context switching overhead made it unsustainable.
Thread Pool: Nearly as fast as thread-per-connection but with 96% less memory usage and 98.6% fewer context switches. This was the sweet spot.
The thread pool achieved what I was looking for: high performance with sustainable resource usage.
Production Considerations: Beyond Basic Threading
As I admired my thread pool implementation, several production concerns emerged.
Connection Limiting and Backpressure
What happens when connection requests exceed the thread pool’s capacity? The pool’s internal queue could grow unbounded, eventually causing memory exhaustion.
I needed backpressure – a mechanism to limit concurrent connections:
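One lightweight way to get it is an atomic count of in-flight handshakes, checked in the accept loop. A sketch, with the limit value an arbitrary assumption:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

const MAX_IN_FLIGHT: usize = 256; // assumption: tune to real capacity

let in_flight = Arc::new(AtomicUsize::new(0));

for stream in listener.incoming() {
    let mut stream = stream?;
    // Soft limit: a small race between the check and the increment is fine here.
    if in_flight.load(Ordering::Acquire) >= MAX_IN_FLIGHT {
        eprintln!("at capacity, rejecting {:?}", stream.peer_addr());
        continue; // dropping `stream` closes the socket
    }
    in_flight.fetch_add(1, Ordering::AcqRel);
    let in_flight = Arc::clone(&in_flight);
    pool.execute(move || {
        if let Err(e) = handle_handshake(&mut stream) {
            eprintln!("handshake failed: {e}");
        }
        in_flight.fetch_sub(1, Ordering::AcqRel);
    });
}
```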
This pattern provides graceful degradation – when capacity is reached, new connections are explicitly rejected rather than causing system failure.
Health Claims Processing
This backpressure mechanism reminded me of claims processing systems. When claim volume exceeds processing capacity, the system doesn’t crash – it queues claims and provides realistic processing time estimates.
Similarly, my server now had predictable behavior under load: accept up to N concurrent connections, reject additional requests gracefully.
Threading Lessons and Insights
The Rust Threading Advantage
Working with threads in Rust felt fundamentally different from other languages:
Ownership-Based Safety: The compiler prevented data races at compile time. No mysterious crashes from multiple threads accessing shared memory incorrectly.
Automatic Resource Management: RAII meant automatic cleanup. When threads ended, their resources were automatically released – no manual memory management required.
Clear Error Boundaries: Each thread’s errors were isolated. A parsing failure in one client connection couldn’t corrupt another client’s state.
The Evolution of My Mental Model
My understanding of threading evolved through this project:
Initial View: Threads as a performance optimization – a way to make things faster.
Mature View: Threads as a resource utilization strategy – a way to keep the CPU busy while individual operations wait for I/O.
The key insight: threading isn’t primarily about speed – it’s about not wasting resources while waiting.
When to Choose Each Approach
Through testing and analysis, clear patterns emerged:
Single-threaded: Perfect for development, testing, or very low connection rates (< 10 concurrent).
Thread-per-connection: Suitable for predictable, moderate connection counts where maximum per-connection performance matters.
Thread Pool: The production choice for most network servers. Excellent scalability with predictable resource usage.
The Async Question Mark
Wrapping up our thread pool success, one nagging question remained: What if there was a way to handle thousands of connections without creating thousands of threads?
Any quick research keeps pointing to asynchronous programming – a fundamentally different concurrency model that doesn’t rely on operating system threads for I/O operations.
The promise is tantalizing: handle thousands of concurrent connections with just a few threads, by cooperatively multitasking within each thread.
But that’s a story for our next episode, where we’ll explore Rust’s async/await system and rebuild our handshake server using tokio. We’ll discover why async programming has become the dominant pattern for high-performance network services – and whether it lives up to its promises.
Reflection: The Threading Journey
Threading transformed my simple handshake server from a sequential toy into something much closer in shape to a real server. More importantly, it taught me to think systematically about concurrency problems:
Resource Analysis: Understanding the true cost of different approaches before implementing them.
Performance Trade-offs: Recognizing that raw speed isn’t everything – sustainable resource usage often matters more.
Safety First: Appreciating how Rust’s ownership system makes concurrent programming dramatically safer than traditional approaches.
Production Mindset: Considering backpressure, connection limits, and graceful degradation from the beginning.
The server we built can now handle hundreds of concurrent connections efficiently and safely. From Friday night curiosity to legitimate solution – not bad for a weekend of fun.
But I’m not done yet. That async question mark keeps calling…
GitHub Repository
I published my Handshake project repository containing the full (refactored) source code. Please check it out as well.