When I first built the HAProxy SPOA Remediation Component, I got ambitious.
I wanted something horizontally scalable, future-proof, and flexible enough to support features I had not even designed yet. I ended up with a parent and worker architecture, a custom IPC (Inter-Process Communication) mechanism, a bespoke TCP protocol, and an admin socket that could spawn new workers on demand.
On paper, it looked clever.
In reality, most users just wanted one fast and reliable HAProxy SPOA listener. They did not want to learn a mini distributed system to debug why a decision failed.
This is how I transitioned from a multi-process, IPC-heavy design to a single SPOA listener backed by goroutines and shared in-process state, and why I now prefer boring and obvious over clever and abstract.
The Original Design: Parent and Worker With Shared State
What I was trying to achieve
My original goals for the HAProxy SPOA component were:
- Use multiple CPU cores efficiently
- Avoid external dependencies such as Redis
- Keep global decisions and configuration in one place, including:
  - Blocked IPs and ranges
  - Country ISO codes
  - Per hostname behaviour, such as ban or captcha
From these goals, I designed a parent and worker model.
Parent process:
- Central authority for the shared state
- Knows about IP decisions, country codes, and per hostname configuration
- Coordinates multiple workers
Worker processes:
- Talk directly to HAProxy via SPOE over TCP or Unix sockets
- For each SPOE request, call the parent to fetch decisions and configuration
I also imagined future use cases:
- Multiple SPOA listeners per node
- Dynamic worker spawning under high load
- An admin socket that could:
  - Spawn new workers without restarting
  - Update per hostname configuration at runtime
In short, I had built a small SPOA platform instead of a straightforward remediation component.
How the IPC worked
To make this design work, I added:
- A Unix socket between the parent and workers
- A custom TCP protocol using gob to encode and decode Go structs
Each SPOE request followed roughly this path:
- HAProxy → worker via SPOE (TCP or Unix)
- Worker → parent via Unix socket with a gob encoded request
- Parent → worker with a gob encoded response including IP decision, ISO code, and hostname config
- Worker → HAProxy with the SPOE response
So every request involved an extra round-trip inside the server, using a custom protocol.
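To make that concrete, here is a rough sketch of that per-request hop. The struct and function names are mine for illustration rather than the actual production code, but the mechanism is the same: a long-lived Unix socket, a gob encoder and decoder per connection, and one extra encode/decode round trip for every SPOE request.

```go
// Illustrative sketch of the old worker-to-parent hop; names are made up,
// but the mechanism (gob over a long-lived Unix socket) matches the design.
package worker

import (
	"encoding/gob"
	"net"
)

// DecisionRequest is what a worker sent to the parent for every SPOE message.
type DecisionRequest struct {
	ClientIP string
	Host     string
}

// DecisionResponse carried everything the worker needed to answer HAProxy.
type DecisionResponse struct {
	Remediation string // e.g. "ban", "captcha", "allow"
	ISOCode     string // country code for the client IP
}

// parentClient wraps one long-lived connection from a worker to the parent.
type parentClient struct {
	path string
	conn net.Conn
	enc  *gob.Encoder
	dec  *gob.Decoder
}

func dialParent(socketPath string) (*parentClient, error) {
	conn, err := net.Dial("unix", socketPath)
	if err != nil {
		return nil, err
	}
	return &parentClient{
		path: socketPath,
		conn: conn,
		enc:  gob.NewEncoder(conn),
		dec:  gob.NewDecoder(conn),
	}, nil
}

// ask performs the extra round trip that every SPOE request paid for.
func (c *parentClient) ask(req DecisionRequest) (DecisionResponse, error) {
	var resp DecisionResponse
	if err := c.enc.Encode(req); err != nil {
		return resp, err // the stream may now be in an unknown state
	}
	if err := c.dec.Decode(&resp); err != nil {
		return resp, err
	}
	return resp, nil
}
```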
Where It Hurt: Complexity, Fragility, and Latency
The architecture worked on a good day, but the trade-offs were not worth it.
1. Code bloat and cognitive overload
Supporting the IPC architecture required:
- An API layer for the parent
- A server implementation to handle worker requests
- A worker client that:
  - Managed the Unix socket lifecycle
  - Spoke the custom TCP protocol
  - Serialised and deserialised structs using gob
Each layer had its own error handling, reconnect logic, and edge cases.
Over time, the codebase grew far beyond the size of the actual problem. New contributors and colleagues had to untangle:
- Which part of the stack owned which responsibility
- Whether an error came from HAProxy → SPOA or worker → parent
- Where the state was stored and how it was retrieved
The architecture itself became a barrier. People spent more time learning how the system worked than adding features.
2. Fragile error propagation
gob and a homemade protocol made error handling harder than it needed to be.
If the parent hit an error while handling a worker request:
- There was no simple, robust way to send that error back over the gob stream
- A partially written message could corrupt the stream
- The safest option was usually to close the connection
From the worker side:
- It was waiting for a response that would never arrive
- Eventually, it noticed the broken connection and reconnected
- This added extra overhead during failures:
  - Reconnect logic
  - Re-initialisation of state
  - Latency spikes
In the worst case:
- A worker could end up stuck waiting forever
- The parent had no clean way to recover that worker
- With a single worker configured, the SPOA could silently stop answering
So I had an architecture that was meant to handle failure, but in practice could fail in subtle and hard-to-diagnose ways.
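Continuing the illustrative sketch from the earlier section, the worker side ended up needing recovery logic roughly like the following. The names are still hypothetical; the point is that a deadline was required to avoid waiting forever, and that any gob error left only one safe option: close the connection and redial.

```go
// askWithRecovery extends the parentClient sketch above ("time" must be
// imported as well). Any encode or decode error forces a full reconnect,
// because a half-written gob message cannot be resynchronised.
func (c *parentClient) askWithRecovery(req DecisionRequest) (DecisionResponse, error) {
	// Without a deadline, a worker could block forever on a response
	// the parent was never going to send.
	_ = c.conn.SetDeadline(time.Now().Add(2 * time.Second))

	resp, err := c.ask(req)
	if err == nil {
		return resp, nil
	}

	// The stream state is unknown; closing and redialling is the only safe
	// recovery, which is exactly the reconnect overhead described above.
	_ = c.conn.Close()
	if nc, dialErr := dialParent(c.path); dialErr == nil {
		*c = *nc
	}
	return DecisionResponse{}, err
}
```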
3. Hidden performance costs
I originally chose the parent and worker model to avoid an external state store like Redis, assuming local IPC would be cheap.
In practice:
- gob encoding and decoding added overhead and pressure on the garbage collector
- Each request required:
  - A network round-trip between HAProxy and the worker
  - An IPC round-trip between the worker and the parent
- The system did more work per request than needed
The punchline is simple: hardly anyone used multiple SPOA listeners.
SPOE is already very efficient. Most users pointed HAProxy at a single listener and moved on. I had designed for multi-listener scaling that no one needed.
When AppSec WAF Exposed the Limitations
The real breaking point was adding AppSec support for the CrowdSec WAF.
To inspect HTTP requests properly, including uploads, I needed to pass much richer data through the pipeline:
- Request headers
- Method and URL
- Potentially large request bodies
With the parent and worker IPC in place, this meant:
- Serialising much larger structs using gob
- Pushing large payloads over the Unix socket
- Increasing memory usage for no real benefit
The architecture that already felt heavy for simple decision lookups was actively fighting AppSec support.
This was the inflexion point. The design that was meant to keep things flexible had become the main blocker for new features.
The New Design: One SPOA Listener, Many Goroutines
At that stage, I asked a simpler question.
What do users actually do with this HAProxy SPOA remediation component?
The answer was consistent:
- Run a single SPOA listener
- Expect it to be fast and reliable
- Expect predictable decisions and AppSec behaviour
So I rebuilt the architecture around those real-world expectations.
High-level flow
The new design is intentionally straightforward:
- HAProxy communicates with a single SPOA listener over TCP or Unix
- Inside the process:
  - A goroutine-based handler receives the request
  - It reads shared in-process state:
    - Blocked IPs and ranges
    - Country codes
    - Per hostname configuration, such as ban or captcha
- The handler responds directly to HAProxy with the SPOE decision
There is now:
- No parent process
- No worker processes
- No internal IPC
- No custom TCP protocol between internal components
Just one Go process that uses goroutines and shared memory.
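A stripped-down sketch of that shape looks like this. The names are illustrative, and the actual SPOE frame parsing, which the real component delegates to a library, is elided:

```go
// A simplified sketch of the new shape. Struct and function names are
// illustrative; real SPOE frame parsing (via a library) is elided.
package main

import (
	"log"
	"net"
	"sync"
)

// state is the single in-process source of truth the old parent used to own.
type state struct {
	mu        sync.RWMutex
	blockedIP map[string]struct{} // blocked IPs (ranges simplified to exact IPs here)
	hostConf  map[string]string   // hostname -> remediation ("ban", "captcha", ...)
}

// decisionFor checks whether the client IP has a decision and, if so, which
// remediation the hostname is configured to apply.
func (s *state) decisionFor(ip, host string) string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	if _, blocked := s.blockedIP[ip]; !blocked {
		return "allow"
	}
	if r, ok := s.hostConf[host]; ok {
		return r // per hostname behaviour, e.g. "captcha" instead of "ban"
	}
	return "ban"
}

func main() {
	st := &state{
		blockedIP: map[string]struct{}{},
		hostConf:  map[string]string{},
	}

	// One listener; HAProxy connects here over TCP (or a Unix socket).
	ln, err := net.Listen("tcp", "127.0.0.1:9000")
	if err != nil {
		log.Fatal(err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			continue
		}
		// One goroutine per connection, all sharing the same state.
		go handleSPOE(conn, st)
	}
}

// handleSPOE would parse SPOE frames and answer each message directly,
// using st.decisionFor for the lookup. No IPC, no second process.
func handleSPOE(conn net.Conn, st *state) {
	defer conn.Close()
	_ = st.decisionFor("192.0.2.1", "example.com") // placeholder lookup
}
```

The shared maps replace the parent process; the goroutines replace the workers.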
What changed internally
Concretely, I made three key changes:
- Workers became goroutines
  All request handling happens in a single process. Each connection or request runs in one or more goroutines that share the same memory.
- Single SPOA listener
  Instead of parent plus workers, there is a single listener capable of handling many concurrent requests.
- Admin socket and IPC removed
  There is no runtime spawning of processes and no Unix socket between parent and workers any more.
The refactor happened in three stages:
- Converting worker logic into goroutine-based handlers
- Consolidating everything around a single SPOA listener
- Removing the admin socket, IPC protocol, and related code entirely
Operational changes
One notable operational change came out of this refactor.
The SPOA process now runs as a low-privilege user, crowdsec-spoa. This means:
- Configuration files must be readable by that user
- Permissions need to be checked when upgrading
These changes are documented in the changelog and migration notes. Apart from that, deployment is simpler because there is no parent and worker orchestration to manage.
The Impact: Less Code, Better Behaviour
The results are quite clear.
Codebase size and complexity
- More than 3,000 lines of code were removed
- The new implementation provides the same behaviour in roughly 650 lines
That reduction is not just cosmetic. It gives:
- Fewer moving parts
- Fewer abstractions to understand
- Fewer places where bugs can hide
New contributors can now read and understand the whole SPOA path in a single sitting.
Performance and reliability
The new design improves both performance and reliability:
- Lower latency
  There is no internal IPC hop per request and no gob encoding or decoding on the hot path.
- Better throughput
  Goroutines and in-process state scale more naturally than juggling multiple external worker processes.
- Lower memory usage
  Large request structures are not serialised and shipped across Unix sockets, which matters a lot when handling request bodies for WAF rules.
- Simpler error handling
  Errors are handled using normal Go control flow. There is no need to encode failure states into a custom protocol.
What I Learned About Premature Complexity
Looking back, a few mistakes stand out.
1. Designing for hypothetical scale
I optimised for:
- Multiple SPOA listeners
- Dynamic worker spawning
- A parent process that managed shared state
In practice, most users ran a single listener and never asked for internal autoscaling.
I built a mini orchestrator to solve a scaling problem that did not exist.
2. Underestimating SPOE
A lot of the complexity came from not fully trusting how performant SPOE already is.
Because I was unsure a single listener would be enough, I added:
- Process-level parallelism
- IPC-based state sharing
- An admin socket for dynamic scaling
In reality, SPOE combined with a well-written Go process can handle a lot of traffic on its own. I should have started with that and waited for real users to hit limits before adding more.
3. Overcomplicating state sharing
I tried to avoid an external key-value store such as Redis by building a state-sharing layer out of the parent, the workers, and a custom protocol.
In hindsight, a better approach would have been:
- Start with a simple in-process state for one listener
- If multiple listeners were ever truly required, then:
  - Introduce an external store like Redis
  - Keep the SPOA processes themselves as simple as possible
I tried to solve the distributed-state problem before I actually had a distributed deployment.
How I Will Approach Scale Next Time
I am not against multi-listener deployments. There will be environments where that makes sense.
Next time, I will:
- Start with a single listener that is easy to reason about
- Keep the state in memory to begin with
- Only introduce an external store when:
  - There is a real need for multiple instances
  - I have real metrics and constraints to work with
If and when I need more scale, I will:
- Use Redis or another key-value store for decisions
- Avoid introducing custom in-house IPC layers
- Keep SPOA instances as stateless and replaceable as possible
The rule is simple: introduce complexity only when real usage demands it.
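If that day ever comes, the change should be confined to the decision lookup. Purely as a hypothetical sketch, not something the component does today, a Redis-backed lookup using github.com/redis/go-redis/v9 could look like this, with an assumed key layout:

```go
// Hypothetical sketch only: a Redis-backed decision lookup that would keep
// each SPOA instance stateless. Key names here are assumptions.
package store

import (
	"context"
	"errors"

	"github.com/redis/go-redis/v9"
)

type redisDecisions struct {
	rdb *redis.Client
}

func newRedisDecisions(addr string) *redisDecisions {
	return &redisDecisions{rdb: redis.NewClient(&redis.Options{Addr: addr})}
}

// decisionFor answers from the shared store, so any number of identical
// SPOA instances can serve the same decisions.
func (r *redisDecisions) decisionFor(ctx context.Context, ip string) (string, error) {
	val, err := r.rdb.Get(ctx, "decision:ip:"+ip).Result()
	if errors.Is(err, redis.Nil) {
		return "allow", nil // no decision recorded for this IP
	}
	if err != nil {
		return "", err
	}
	return val, nil // e.g. "ban" or "captcha"
}
```

The SPOA process itself stays as simple as possible; the only thing that moves is where decisions live.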
Takeaways for HAProxy and Go Engineers
If you work with HAProxy, SPOE, or Go, here are the key lessons I would highlight.
- Keep the first version boring
  One process, one listener, shared memory, goroutines. See how far that takes you before you reach for more.
- Let real users drive abstraction
  If nobody is asking for multiple listeners, admin sockets, or fancy IPC, you probably do not need them yet, or ever.
- Refactors that remove categories of failure are worth it
  This was not just a tidy-up. It removed entire classes of bugs and operational issues.
If You Use the HAProxy SPOA Remediation Component
If you are already using this component:
- Try the new single listener design in 0.2.0
- See how it behaves under your traffic patterns
- Let me know if you ever hit a point where one listener is not enough
That kind of real-world constraint is the right trigger to discuss scaling out, not a theoretical future that might never arrive.
Until then, I will keep choosing simple designs that solve real problems today rather than clever designs aimed at imaginary ones.