Where our thread pool server meets its match, and I discover that sometimes the best way to handle thousands of connections is to pretend threads don’t exist.


Event-Driven Async Server Architecture (Source: https://berb.github.io/diploma-thesis/community/042_serverarch.html)

The Tantalizing Promise

Fresh from the victory of my thread pool implementation, I felt invincible. My server could handle hundreds of concurrent connections with predictable resource usage. Thread creation was under control, context switching was manageable, and memory consumption was reasonable.

But then I stumbled across a claim that seemed almost too good to be true:

“Handle 10,000+ concurrent connections with a single thread using async/await.”

Ten thousand connections? With one thread? My engineering skepticism kicked in immediately. This sounded like the kind of marketing hyperbole that promises “unlimited scalability” while quietly ignoring the laws of physics.

Yet the numbers kept appearing. Blog posts showing async servers handling 100x more connections than threaded equivalents. Benchmarks demonstrating dramatic memory savings. Real production systems serving millions of users with just a handful of async tasks.

Disclaimer: Like previous episodes, I’ve dramatized certain moments of my learning journey for narrative effect. The async rabbit hole was indeed deep, but the actual implementation was more methodical than the emotional rollercoaster I describe below. Also note that I intentionally added artificial sleeps to the clients to emulate costly operations.

My curiosity was piqued, but more importantly – my thread pool had hit a wall I hadn’t anticipated.


The Thread Pool’s Achilles Heel

Before diving into async, I needed to understand my thread pool’s limitations. It was time for stress testing beyond my previous casual experiments, so I ran a systematic load test driven by hand-crafted shell scripts.
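For a sense of what those scripts exercised, here is a rough sketch of a single scripted client, written in Rust for clarity (the address, port, and exact message values are placeholders; the assumed exchange is HELLO X → HELLO X+1 → HELLO X+2, as described later in this post):

use std::io::{Read, Write};
use std::net::TcpStream;
use std::time::Duration;

fn run_one_handshake(addr: &str) -> std::io::Result<()> {
  let mut stream = TcpStream::connect(addr)?;
  stream.set_read_timeout(Some(Duration::from_secs(5)))?;

  // Assumed protocol shape: send HELLO 1, expect HELLO 2, finish with HELLO 3
  stream.write_all(b"HELLO 1")?;
  let mut buf = [0u8; 64];
  let n = stream.read(&mut buf)?;
  println!("server replied: {}", String::from_utf8_lossy(&buf[..n]));
  stream.write_all(b"HELLO 3")?;
  Ok(())
}

fn main() {
  // Launch a batch of concurrent clients, roughly what the shell scripts did
  let handles: Vec<_> = (0..500)
    .map(|_| std::thread::spawn(|| run_one_handshake("127.0.0.1:8080")))
    .collect();
  for handle in handles {
    let _ = handle.join();
  }
}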

The results were… educational:

Clients: 500   | Memory: 156MB  | CPU: 15%  | Success Rate: 100%
Clients: 1000  | Memory: 312MB  | CPU: 35%  | Success Rate: 98%
Clients: 1500  | Memory: 468MB  | CPU: 55%  | Success Rate: 85%
Clients: 2000  | Memory: 624MB  | CPU: 75%  | Success Rate: 67%

The thread pool was hitting a scaling wall. Beyond 1,500 concurrent connections, performance degraded rapidly. Not because of the thread pool itself, but because of fundamental OS-level bottlenecks:

  • Each thread still consumed stack space (2MB+ per worker – see the sketch after this list)
  • Context switching overhead increased with connection count
  • File descriptor limits started constraining connections
  • Network buffer memory scaled linearly with connection count
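About that first bullet: Rust’s standard library reserves a 2 MiB stack per spawned thread by default, and std::thread::Builder lets you shrink (or grow) it explicitly. A minimal sketch, with a placeholder worker closure:

use std::thread;

fn main() {
  // Each spawned thread reserves a 2 MiB stack by default (committed lazily by the OS).
  // For many small workers, the stack size can be tuned explicitly:
  let handle = thread::Builder::new()
    .stack_size(64 * 1024) // 64 KiB instead of the 2 MiB default
    .spawn(|| {
      // placeholder for per-connection work
      println!("worker running on a smaller stack");
    })
    .expect("failed to spawn worker thread");

  handle.join().unwrap();
}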

The Hospital Emergency Room Reality

Working with healthcare data, I recognized this pattern immediately. It’s like an emergency room during a crisis – even with optimized staffing (thread pool), there are physical limits:

  • Examination rooms (file descriptors) are finite
  • Medical equipment (memory buffers) can only be allocated so far
  • Staff coordination (context switching) becomes chaotic beyond a certain patient load

The thread pool had solved the “unlimited thread creation” problem, but it hadn’t solved the fundamental resource scaling problem.


Enter the Event Loop: A Different Mental Model

This is where async/await promised something radical: what if we could handle thousands of connections without creating thousands of workers?

But first, I needed to understand what “event-driven” actually meant.

Threading vs Event-Driven: The Cognitive Shift

My mental model for threading was straightforward:

graph TD
  A[Main Thread: Accept Connections] --> B[Worker Thread 1: Handle Client A]
  A --> C[Worker Thread 2: Handle Client B]
  A --> D[Worker Thread 3: Handle Client C]
  A --> E[Worker Thread N: Handle Client N]
  B --> F[Waiting for Network I/O]
  C --> G[Waiting for Network I/O]
  D --> H[Waiting for Network I/O]
  E --> I[Waiting for Network I/O]

Each connection gets its own execution context. When a thread waits for network I/O, the entire thread blocks. The OS context-switches to other threads, but the blocked thread consumes resources while doing nothing productive.

Event-driven programming flips this model completely:

graph TD
  A[Event Loop: Single Thread] --> B[Task 1: Read from Client A]
  A --> C[Task 2: Write to Client B]
  A --> D[Task 3: Accept New Connection]
  A --> E[Task N: Process Client N]
  F[I/O Completion Events] --> A
  A --> G[Scheduler: Pick Next Ready Task]
  G --> A

One thread, many tasks. When a task needs to wait for I/O, it doesn’t block the thread – it yields control back to the event loop. The loop can then work on other tasks that are ready to make progress.

The Restaurant Revelation

The difference became clear through a restaurant analogy:

Threading Model: Like a restaurant where each waiter is assigned to exactly one table for the entire meal. When customers are deciding what to order, the waiter stands idle, waiting. During busy periods, you need as many waiters as you have occupied tables.

Event-Driven Model: Like a restaurant where waiters handle multiple tables dynamically. When Table 5 is deciding what to order, the waiter takes orders from Table 8, delivers food to Table 3, and checks on Table 12. One waiter can efficiently serve many tables by never staying idle when there’s work to be done elsewhere.

The Key Insight: I/O is the Bottleneck

Network programming is fundamentally I/O bound. In my handshake protocol:

  1. Wait for client to send HELLO X ← I/O bound
  2. Process sequence number ← CPU bound (microseconds)
  3. Wait for network buffer to accept response ← I/O bound
  4. Wait for client to send HELLO Z ← I/O bound
  5. Process validation ← CPU bound (microseconds)

The actual CPU work was tiny – maybe 0.1% of the total time. The other 99.9% was waiting for network operations. Threads were spending their lives waiting.

Event-driven programming says: “Why waste thread resources on waiting? Let’s do useful work instead.”


First Contact with Tokio

Armed with my new mental model, I was ready to dive into Rust’s async ecosystem. But first, I had to choose a runtime.

Rust’s async/await syntax is built into the language, but it requires a runtime to execute async tasks. Think of it like this: Rust provides the grammar for writing async code, but you need an execution engine to run it.

The dominant choice is Tokio – Rust’s most mature async runtime. Adding it to my project felt like stepping into a parallel universe:

[dependencies]
tokio = { version = "1", features = ["full"] }

That "full" feature set was my first hint that async programming comes with complexity overhead. Tokio includes:

  • Task scheduler
  • Async I/O drivers (TCP, UDP, files)
  • Timers and timeouts
  • Multi-threaded work stealing
  • Async-aware synchronization primitives

It’s essentially an entire operating system for async tasks.
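For comparison, a leaner dependency declaration could pull in only what this server actually uses – roughly something like this (feature names from Tokio 1.x; "full" simply turns them all on):

[dependencies]
tokio = { version = "1", features = ["rt-multi-thread", "macros", "net", "time", "io-util"] }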

The #[tokio::main] Magic

My first async program looked deceptively simple:

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
  println!("Hello, async world!");
  Ok(())
}

But that #[tokio::main] attribute was doing heavy lifting behind the scenes. It’s roughly equivalent to:

fn main() {
  let rt = tokio::runtime::Runtime::new().unwrap();
  rt.block_on(async {
    println!("Hello, async world!");
  })
}

The macro creates a Tokio runtime, starts the event loop, and runs my async main function to completion. I was no longer writing a traditional program – I was writing a collection of async tasks that would be orchestrated by the Tokio scheduler.


The Async Transformation Journey

Converting my threaded server to async wasn’t just a matter of adding .await everywhere. It required rethinking the entire control flow.

Step 1: The Basic Pattern Translation

My threaded main loop:

// Threaded version
loop {
  let (stream, addr) = listener.accept()?;
  pool.execute(move || {
    handle_client_wrapper(stream);
  });
}

Became:

// Async version
loop {
  let (stream, addr) = listener.accept().await?;
  tokio::spawn(async move {
    handle_client_wrapper(stream, addr).await;
  });
}

The pattern was similar, but the semantics were completely different:

  • listener.accept() was now non-blocking
  • tokio::spawn created a lightweight task, not an OS thread
  • The entire function needed to be async
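Put together, a minimal self-contained version of that accept loop looks roughly like this (the bind address and the echo-style handler body are placeholders, not the project’s actual handshake logic):

use std::net::SocketAddr;
use tokio::io::{AsyncReadExt, AsyncWriteExt};
use tokio::net::{TcpListener, TcpStream};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
  let listener = TcpListener::bind("0.0.0.0:8080").await?;
  loop {
    let (stream, addr) = listener.accept().await?;
    // Each connection becomes a lightweight task, not an OS thread
    tokio::spawn(async move {
      handle_client_wrapper(stream, addr).await;
    });
  }
}

async fn handle_client_wrapper(mut stream: TcpStream, addr: SocketAddr) {
  // Placeholder handler: echo the first message back
  let mut buffer = [0u8; 1024];
  match stream.read(&mut buffer).await {
    Ok(n) if n > 0 => {
      let _ = stream.write_all(&buffer[..n]).await;
    }
    Ok(_) => {}
    Err(e) => eprintln!("error from {}: {}", addr, e),
  }
}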

Step 2: The Await Infection

This is where things got interesting. As soon as I made one function async, the change propagated through my entire codebase like a virus:

// This function calls async I/O...
async fn handle_client(stream: TcpStream) -> Result<(), Box<dyn std::error::Error>> {
  let bytes_read = stream.read(&mut buffer).await?;  // ← .await required
  // ...
}

// So this function must be async too...
async fn handle_client_wrapper(stream: TcpStream, addr: SocketAddr) {
  if let Err(e) = handle_client(stream).await {  // ← .await required
    eprintln!("Error: {}", e);
  }
}

// And this function must be async...
async fn main() -> Result<(), Box<dyn std::error::Error>> {
  // ...
  tokio::spawn(async move {
    handle_client_wrapper(stream, addr).await;  // ← .await required
  });
  // ...
}

Async is contagious. Once you start doing async I/O, every function in the call stack must be async and use .await. There’s no mixing async and sync I/O in the same execution path.

Step 3: The Borrowing Nightmare

But my real education in async Rust came when I hit the borrowing issues. Consider this seemingly innocent code:

async fn handle_client(mut stream: TcpStream) -> Result<(), Box<dyn std::error::Error>> {
  let mut buffer = [0u8; MSG_SIZE];
  
  let bytes_read = stream.read(&mut buffer).await?;
  let message = String::from_utf8_lossy(&buffer[..bytes_read]);
  
  // This should work, right?
  let parsed_seq = parse_hello_message(&message)?;
  
  // Send response...
  let response = format!("HELLO {}", parsed_seq + 1);
  stream.write_all(response.as_bytes()).await?;
  
  Ok(())
}

The compiler hit me with this:

error: `buffer` does not live long enough
borrowed value does not live long enough
argument requires that `buffer` is borrowed for `'static`

What? The error message was cryptic, but the issue was fundamental: async functions can be suspended and resumed. When an .await point is reached, the entire function state (including local variables) might be stored and restored later.

The borrow checker was protecting me from a subtle bug: what if message (which borrowed from buffer) outlived the function suspension point?

Step 4: Ownership Lessons in Async Context

The solution required thinking differently about data ownership in async contexts:

async fn handle_client(mut stream: TcpStream) -> Result<(), Box<dyn std::error::Error>> {
  let mut buffer = [0u8; MSG_SIZE];
  
  let bytes_read = stream.read(&mut buffer).await?;
  
  // Instead of borrowing, create an owned String
  let message = String::from_utf8_lossy(&buffer[..bytes_read]).to_string();
  
  // Now we can safely use it across await points
  let parsed_seq = parse_hello_message(&message)?;
  
  let response = format!("HELLO {}", parsed_seq + 1);
  stream.write_all(response.as_bytes()).await?;
  
  Ok(())
}

The .to_string() call created an owned copy of the data, eliminating the borrowing dependency. This was my first lesson in async ownership patterns: when in doubt, prefer owned data over borrowed data across .await boundaries.


The Error Handling Evolution

Async programming introduced new complexity to error handling – not just because of syntax, but because of timeout management and error composition patterns.

The Timeout Reality

In threaded code, I could set socket timeouts and forget about them:

// Threaded version
stream.set_read_timeout(Some(Duration::from_secs(5)))?;
let bytes_read = stream.read(&mut buffer)?;  // Times out automatically

In async code, timeouts required explicit orchestration:

// Async version
use tokio::time::timeout;

let bytes_read = timeout(
  Duration::from_secs(5),
  stream.read(&mut buffer)
).await??;  // Note the double question mark!

That double question mark (??) was my introduction to nested error handling:

  • First ? unwraps the timeout result (Result<Result<T, E1>, Elapsed>)
  • Second ? unwraps the actual I/O result

Error Composition Patterns

This nested structure led me to discover several powerful error handling patterns in async contexts:

Pattern 1: Timeout Composition

async fn robust_handshake(
  stream: TcpStream,
) -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
  
  // Fine-grained timeouts for each phase
  let hello1 = timeout(Duration::from_secs(5), read_hello_message(&stream)).await??;
  let response = timeout(Duration::from_secs(2), send_response(&stream, hello1)).await??;
  let hello2 = timeout(Duration::from_secs(5), read_hello_message(&stream)).await??;
  
  validate_sequence(hello1, hello2)
}

Pattern 2: Hierarchical Timeouts

async fn handle_client_with_timeouts(
  stream: TcpStream,
  addr: SocketAddr,
) -> Result<(), ClientError> {
  
  // Overall connection timeout wraps fine-grained timeouts
  let result = timeout(Duration::from_secs(30), async {
    robust_handshake(stream).await
  }).await;
  
  match result {
    Ok(Ok(())) => Ok(()),
    Ok(Err(e)) => Err(ClientError::Handshake(e)),
    Err(_) => Err(ClientError::Timeout(addr)),
  }
}

Pattern 3: Error Context Preservation

#[derive(Debug)]
enum ClientError {
  Timeout(SocketAddr),
  Handshake(Box<dyn std::error::Error + Send + Sync>),
  Network(std::io::Error),
  Protocol(String),
}

// Transform errors to preserve context through async boundaries
async fn contextual_error_handling(
  mut stream: TcpStream,
  addr: SocketAddr,
) -> Result<(), ClientError> {
  let mut buffer = [0u8; MSG_SIZE];
  
  let bytes_read = timeout(Duration::from_secs(5), stream.read(&mut buffer))
    .await
    .map_err(|_| ClientError::Timeout(addr))?  // Transform timeout error
    .map_err(ClientError::Network)?;           // Transform I/O error
    
  if bytes_read == 0 {
    return Err(ClientError::Protocol("Empty message".to_string()));
  }
  
  Ok(())
}

The Async Error Propagation Challenge

One subtle challenge I discovered was error propagation through spawned tasks:

// Problematic: Errors get lost in spawned tasks
async fn naive_approach() {
  for i in 0..100 {
    tokio::spawn(async move {
      risky_operation(i).await?;  // ← Error can't propagate to parent!
    });
  }
}

// Better: Collect and handle errors explicitly
async fn robust_approach() -> Result<(), Vec<TaskError>> {
  let tasks: Vec<_> = (0..100)
    .map(|i| tokio::spawn(async move { risky_operation(i).await }))
    .collect();
    
  let mut errors = Vec::new();
  for task in tasks {
    if let Err(e) = task.await.unwrap() {
      errors.push(TaskError::Operation(e));
    }
  }
  
  if errors.is_empty() { Ok(()) } else { Err(errors) }
}

This composability was powerful – I could set timeouts at any granularity, transform errors to preserve context, and handle failures gracefully across complex async operations.


Performance Testing: The Async Advantage

With my async server implemented, the moment of truth arrived. Time to see if the hype was real.

Comparative Results

I ran systematic load tests against all three implementations:

Implementation   | Max Concurrent | Memory Usage | CPU Usage       | Success Rate
Single-threaded  | ~50            | 12MB         | 100% (blocking) | 100%
Thread Pool (8)  | 1,500          | 468MB        | 55%             | 85%
Async/Tokio      | 5,000+         | 89MB         | 25%             | 99.8%

The async numbers were stunning:

  • 5x more concurrent connections than the thread pool
  • 80% less memory usage than the thread pool
  • Lower CPU utilization despite higher throughput
  • Better success rate under high load

Understanding the Victory

The memory difference revealed the fundamental architectural advantage:

Thread Pool: 8 threads × 2MB stack + connection buffers = ~468MB at 1,500 connections

Async: Single thread + task overhead (~2KB per task) = ~89MB at 5,000 connections

Each async task consumed roughly 1,000x less memory than a thread. System call analysis revealed the CPU efficiency came from dramatically fewer context switches and more efficient epoll usage.



Scaling Beyond Expectations

Emboldened by the initial results, I pushed the async server to its limits:

# Let's see how far we can push this...
for clients in 1000 2000 5000 10000 15000; do
  echo "Testing with $clients concurrent clients..."
  ./stress_test_async.sh $clients
  sleep 5
done

The results were eye-opening:

Concurrent Clients | Memory | CPU | Avg Response Time | Success Rate
1,000              | 45MB   | 8%  | 12ms              | 100%
2,000              | 58MB   | 15% | 18ms              | 100%
5,000              | 89MB   | 25% | 28ms              | 99.8%
10,000             | 156MB  | 45% | 45ms              | 98.5%
15,000             | 234MB  | 65% | 78ms              | 95.2%

The async server handled 15,000 concurrent connections on my laptop – 10x more than the thread pool’s practical limit. Even at that scale, it consumed less memory than the thread pool at 1,500 connections.

The Bottleneck Evolution

Eventually, I hit new bottlenecks, but they were different bottlenecks:

  1. File descriptor limits: Even async tasks need file descriptors for sockets
  2. Network bandwidth: The physical network interface became the constraint
  3. Memory bandwidth: Copying data between user/kernel space for thousands of connections
  4. OS scheduler overhead: Even Tokio’s scheduler has limits

But these were resource limits, not architectural limits. The async model had eliminated the artificial constraints of thread-based concurrency.


Production Considerations: The Hidden Complexity

As I basked in the performance victory, I started encountering the hidden complexity of production async programming.

The Blocking Function Problem

One seemingly innocent change broke everything:

async fn handle_client(mut stream: TcpStream) -> Result<(), Box<dyn std::error::Error>> {
  // ... handshake logic ...
  
  // Let's add some logging to a file
  std::fs::write("connections.log", format!("Handled client at {}\n", 
                 SystemTime::now().duration_since(UNIX_EPOCH)?.as_secs()))?;
  
  Ok(())
}

Suddenly, my server’s performance dropped by 90%. What happened?

std::fs::write is a blocking operation. In async code, blocking operations block the entire event loop. While one task was writing to the file, no other tasks could make progress. My 5,000-connection server was reduced to essentially single-threaded performance.

The solution required async-aware alternatives:

use tokio::fs;

async fn handle_client(mut stream: TcpStream) -> Result<(), Box<dyn std::error::Error>> {
  // ... handshake logic ...
  
  // Use async file I/O instead
  fs::write("connections.log", format!("Handled client at {}\n", 
            SystemTime::now().duration_since(UNIX_EPOCH)?.as_secs())).await?;
  
  Ok(())
}

Every I/O operation in async code must be async-aware. This is both a strength (explicit asynchrony) and a complexity burden (can’t mix sync/async easily).
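When there simply is no async-aware alternative (say, a blocking library call), tokio::task::spawn_blocking offers an escape hatch: it runs the blocking code on a dedicated thread pool so the event loop keeps turning. A minimal sketch, reusing the logging example (the file name and entry are placeholders):

async fn log_connection(entry: String) -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
  // Move the blocking write onto Tokio's dedicated blocking thread pool,
  // leaving the async worker threads free to drive other tasks.
  tokio::task::spawn_blocking(move || {
    std::fs::write("connections.log", entry)
  })
  .await??; // outer ? unwraps the join result, inner ? the I/O result
  Ok(())
}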

The Send + Sync Requirement

Another stumbling block came from error types:

// This compiles in single-threaded code...
async fn problematic_function() -> Result<(), std::io::Error> {
  let local_data = Rc::new(RefCell::new(vec![1, 2, 3]));
  
  some_async_operation().await?;
  
  local_data.borrow_mut().push(4);
  Ok(())
}

// But fails when spawned as a task:
tokio::spawn(async {
  problematic_function().await?;  // ← Compiler error!
});

The error message was intimidating:

error: future cannot be sent between threads safely
the trait `Send` is not implemented for `Rc<RefCell<Vec<i32>>>`

Tokio’s scheduler can move tasks between threads, so all data in async tasks must be Send + Sync. This forced me to learn about async-safe data structures:

use std::sync::{Arc, Mutex};

async fn corrected_function() -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
  // Use Arc<Mutex<T>> instead of Rc<RefCell<T>>
  let shared_data = Arc::new(Mutex::new(vec![1, 2, 3]));
  
  some_async_operation().await?;
  
  shared_data.lock().unwrap().push(4);
  Ok(())
}

This was my introduction to async-aware concurrency primitives – a whole new layer of complexity beyond basic async/await.
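One of those primitives is Tokio’s own Mutex: unlike a std::sync::MutexGuard, its guard can be held across .await points, which matters as soon as a lock has to outlive an I/O call. A small sketch with a hypothetical shared log of sequence numbers:

use std::sync::Arc;
use std::time::Duration;
use tokio::sync::Mutex; // async-aware mutex

async fn record_sequence(shared: Arc<Mutex<Vec<u64>>>, seq: u64) {
  let mut log = shared.lock().await; // yields to the scheduler instead of blocking the thread
  log.push(seq);
  // Unlike a std::sync::MutexGuard, this guard may live across an await point...
  tokio::time::sleep(Duration::from_millis(1)).await;
  log.push(seq + 1);
} // ...though keeping lock scopes short is still good practice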


When Async Isn’t the Answer

After all this async evangelism, I needed to acknowledge async programming’s limitations and trade-offs.

CPU-Bound Tasks: Async’s Achilles Heel

I tested my async server with a CPU-intensive variation:

async fn cpu_intensive_handshake(mut stream: TcpStream) -> Result<(), Box<dyn std::error::Error>> {
  // ... receive HELLO X ...
  
  // Simulate expensive computation (e.g., cryptographic verification)
  let result = (0..1_000_000).map(|i| i * i).sum::<u64>();
  
  // ... send response ...
  Ok(())
}

Performance collapsed immediately:

Implementation | 100 Concurrent CPU Tasks | CPU Usage      | Responsiveness
Thread Pool    | 8.5 seconds              | 100% (8 cores) | Good
Async          | 42.3 seconds             | 100% (1 core)  | Terrible

Async tasks share a single thread by default. When one task does CPU-intensive work, it starves all other tasks. The cooperative scheduling model assumes tasks yield frequently through .await points.
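Short of moving the work to a separate thread, one mitigation is to yield voluntarily inside long computations so other tasks get a turn. A rough sketch (the yield interval is arbitrary):

// Periodically hand control back to the Tokio scheduler during CPU-heavy work
async fn cooperative_sum(limit: u64) -> u64 {
  let mut total = 0u64;
  for i in 0..limit {
    total += i * i;
    if i % 100_000 == 0 {
      // Let other tasks on this worker thread make progress
      tokio::task::yield_now().await;
    }
  }
  total
}

This keeps the event loop responsive, but for genuinely heavy computation the spawn_blocking pattern shown in the next section is usually the better tool.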

The Healthcare Data Processing Reality

In healthcare contexts, this distinction matters enormously:

  • Async excels: Processing thousands of concurrent claim submissions with network I/O
  • Threads excel: Parallel analysis of large datasets with CPU-intensive computations
  • Hybrid approaches: Use async for I/O coordination, spawn blocking tasks for CPU work
// Hybrid pattern for healthcare data processing
async fn process_claims_batch(claims: Vec<Claim>) -> Result<(), Box<dyn std::error::Error>> {
  for claim in claims {
    // Async: Fetch patient data from EHR system
    let patient_data = fetch_patient_data(&claim.patient_id).await?;
    
    // CPU-intensive: Spawn blocking task for fraud analysis
    // (clone the claim for the blocking task so it can still be used afterwards; assumes Claim: Clone)
    let claim_for_analysis = claim.clone();
    let fraud_result = tokio::task::spawn_blocking(move || {
      analyze_fraud_patterns(&claim_for_analysis, &patient_data)
    }).await?;
    
    // Async: Submit results to database
    submit_analysis_result(&claim.id, fraud_result).await?;
  }
  Ok(())
}

Complexity Tax

Async programming also comes with a complexity tax:

  • Learning curve: Understanding futures, tasks, and runtimes
  • Debugging difficulty: Stack traces through async boundaries are confusing
  • Ecosystem fragmentation: Not all libraries have async variants
  • Error handling complexity: Nested timeouts and error propagation

For simple applications or teams new to Rust, the threaded approach might be more appropriate despite lower theoretical performance.


The Mental Model Transformation

By the end of my async journey, my mental model had completely transformed.

Old Model: Threads as Workers

Thread 1: [Accept] → [Handle Client A] → [Wait for I/O] → [Blocked]
Thread 2: [Handle Client B] → [Wait for I/O] → [Blocked]
Thread 3: [Handle Client C] → [Wait for I/O] → [Blocked]

Each thread was a dedicated worker with exclusive resources.

New Model: Tasks as Work Units

Event Loop: [Task A: Read] → [Task B: Process] → [Task C: Write] → [Task A: Continue] → ...

The event loop was a work multiplexer, constantly switching between ready tasks.

The Cooperative Insight

The key insight was cooperation vs preemption:

  • Threads: OS forcibly switches between threads (preemptive multitasking)
  • Async: Tasks voluntarily yield control at .await points (cooperative multitasking)

This cooperation enabled much more efficient resource utilization but required disciplined programming – tasks must yield regularly and avoid blocking operations.


Async vs Threading: The Final Verdict

After implementing and testing all three approaches, clear patterns emerged:

Use Single-Threading When:

  • Prototyping or educational projects
  • Very low connection rates (< 10 concurrent)
  • Simplicity is more important than performance
  • Team is new to concurrency concepts

Use Thread Pools When:

  • Mixed I/O and CPU-intensive workloads
  • Moderate connection rates (100-1,000 concurrent)
  • Team is comfortable with traditional threading
  • Need to integrate with blocking libraries
  • Debugging and profiling are critical

Use Async When:

  • High connection rates (1,000+ concurrent)
  • I/O-bound workloads dominate
  • Memory efficiency is crucial
  • Team is willing to invest in async expertise
  • Modern Rust ecosystem compatibility is important

Healthcare Data Context

In healthcare systems, the right choice depends on the use case. My speculation – I have not tested all of these empirically yet – is:

  • Claims processing pipelines: Async excels (thousands of concurrent network requests)
  • Clinical decision support: Threading might be better (CPU-intensive analysis)
  • Patient data synchronization: Async wins (high-concurrency, I/O-bound)
  • Audit log analysis: Hybrid approach (async coordination + blocking computation)

The Road Ahead: Beyond Basic Async

My async handshake server was just the beginning. Real-world async applications involve additional complexities:

  • Backpressure management: Preventing fast producers from overwhelming slow consumers
  • Circuit breakers: Graceful degradation when downstream services fail
  • Rate limiting: Controlling resource consumption per client
  • Metrics and observability: Understanding async system behavior
  • Testing async code: Ensuring correctness under concurrency

But those are adventures for future projects. For now, I had achieved something significant: I understood the fundamental trade-offs between threading and async models, and I could choose the right tool for the job.


Reflection: The Async Awakening

The journey from threads to async wasn’t just about learning new syntax – it was about fundamentally changing how I think about concurrency.

Threading taught me: How to safely share resources between parallel execution contexts.

Async taught me: How to efficiently multiplex work over limited execution resources.

Both are valuable mental models. Threading maps naturally to how we think about parallel work in the real world. Async requires embracing a more abstract model of cooperative multitasking.

The performance results spoke for themselves:

  • Thread pool: 1,500 concurrent connections, 468MB memory
  • Async: 5,000+ concurrent connections, 89MB memory

But the deeper insight was architectural: async didn’t just perform better – it eliminated entire categories of resource bottlenecks.

As I looked at my async server humming along with thousands of concurrent connections, I felt like I had unlocked a new level of systems programming. Not because async is inherently superior, but because I now had multiple tools in my concurrency toolkit and understood when to use each one.

The handshake protocol that started as a Friday night curiosity had become a vehicle for exploring the deepest concepts in concurrent systems design. Not bad for a simple three-message exchange.

The async awakening was complete. Time to wrap up. The upcoming final episode will be a comprehensive postmortem of the entire Rust Handshake journey.


Github Repository

Please check out the Handshake project repository, which contains the full (refactored) source code.

https://github.com/SaehwanPark/rust-handshake