When I first built the HAProxy SPOA Remediation Component, I got ambitious.
I wanted something horizontally scalable, future-proof, and flexible enough to support features I had not even designed yet. I ended up with a parent and worker architecture, a custom IPC (Inter-Process Communication) mechanism, a bespoke TCP protocol, and an admin socket that could spawn new workers on demand.
On paper, it looked clever.
In reality, most users just wanted one fast and reliable HAProxy SPOA listener. They did not want to learn a mini distributed system to debug why a decision failed.
This is how I transitioned from a multi-process, IPC-heavy design to a single SPOA listener backed by goroutines and shared in-process state, and why I now prefer boring and obvious over clever and abstract.
The Original Design: Parent and Worker With Shared State
What I was trying to achieve
My original goals for the HAProxy SPOA component were:
- Use multiple CPU cores efficiently
- Avoid external dependencies such as Redis
- Keep global decisions and configuration in one place, including:
  - Blocked IPs and ranges
  - Country ISO codes
  - Per hostname behaviour, such as ban or captcha
From these goals, I designed a parent and worker model.
Parent process:
- Central authority for the shared state
- Knows about IP decisions, country codes, and per hostname configuration
- Coordinates multiple workers
Worker processes:
- Talk directly to HAProxy via SPOE over TCP or Unix sockets
- For each SPOE request, call the parent to fetch decisions and configuration
I also imagined future use cases:
- Multiple SPOA listeners per node
- Dynamic worker spawning under high load
- An admin socket that could:
  - Spawn new workers without restarting
  - Update per hostname configuration at runtime
In short, I had built a small SPOA platform instead of a straightforward remediation component.
How the IPC worked
To make this design work, I added:
- A Unix socket between the parent and workers
- A custom TCP protocol using gob to encode and decode Go structs
Each SPOE request followed roughly this path:
- HAProxy → worker via SPOE (TCP or Unix)
- Worker → parent via Unix socket with a gob encoded request
- Parent → worker with a gob encoded response including IP decision, ISO code, and hostname config
- Worker → HAProxy with the SPOE response
So every request involved an extra round-trip inside the server, using a custom protocol.
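To make that concrete, here is a rough sketch of that per-request hop. The struct and function names are mine for illustration rather than the actual production code, but the mechanism is the same: a long-lived Unix socket, a gob encoder and decoder per connection, and one extra encode/decode round trip for every SPOE request.

```go
// Illustrative sketch of the old worker-to-parent hop; names are made up,
// but the mechanism (gob over a long-lived Unix socket) matches the design.
package worker

import (
	"encoding/gob"
	"net"
)

// DecisionRequest is what a worker sent to the parent for every SPOE message.
type DecisionRequest struct {
	ClientIP string
	Host     string
}

// DecisionResponse carried everything the worker needed to answer HAProxy.
type DecisionResponse struct {
	Remediation string // e.g. "ban", "captcha", "allow"
	ISOCode     string // country code for the client IP
}

// parentClient wraps one long-lived connection from a worker to the parent.
type parentClient struct {
	path string
	conn net.Conn
	enc  *gob.Encoder
	dec  *gob.Decoder
}

func dialParent(socketPath string) (*parentClient, error) {
	conn, err := net.Dial("unix", socketPath)
	if err != nil {
		return nil, err
	}
	return &parentClient{
		path: socketPath,
		conn: conn,
		enc:  gob.NewEncoder(conn),
		dec:  gob.NewDecoder(conn),
	}, nil
}

// ask performs the extra round trip that every SPOE request paid for.
func (c *parentClient) ask(req DecisionRequest) (DecisionResponse, error) {
	var resp DecisionResponse
	if err := c.enc.Encode(req); err != nil {
		return resp, err // the stream may now be in an unknown state
	}
	if err := c.dec.Decode(&resp); err != nil {
		return resp, err
	}
	return resp, nil
}
```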
Where It Hurt: Complexity, Fragility, and Latency
The architecture worked on a good day, but the trade-offs were not worth it.
1. Code bloat and cognitive overload
Supporting the IPC architecture required:
- An API layer for the parent
- A server implementation to handle worker requests
- A worker client that:
  - Managed the Unix socket lifecycle
  - Spoke the custom TCP protocol
  - Serialised and deserialised structs using gob
Each layer had its own error handling, reconnect logic, and edge cases.
Over time, the codebase grew far beyond the size of the actual problem. New contributors and colleagues had to untangle:
- Which part of the stack owned which responsibility
- Whether an error came from HAProxy → SPOA or worker → parent
- Where the state was stored and how it was retrieved
The architecture itself became a barrier. People spent more time learning how the system worked than adding features.
2. Fragile error propagation
gob and a homemade protocol made error handling harder than it needed to be.
If the parent hit an error while handling a worker request:
- There was no simple, robust way to send that error back over the gob stream
- A partially written message could corrupt the stream
- The safest option was usually to close the connection
From the worker side:
- It was waiting for a response that would never arrive
- Eventually, it noticed the broken connection and reconnected
- This added extra overhead during failures:
  - Reconnect logic
  - Re-initialisation of state
  - Latency spikes
In the worst case:
- A worker could end up stuck waiting forever
- The parent had no clean way to recover that worker
- With a single worker configured, the SPOA could silently stop answering
So I had an architecture that was meant to handle failure, but in practice could fail in subtle and hard-to-diagnose ways.
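Continuing the illustrative sketch from the earlier section, the worker side ended up needing recovery logic roughly like the following. The names are still hypothetical; the point is that a deadline was required to avoid waiting forever, and that any gob error left only one safe option: close the connection and redial.

```go
// askWithRecovery extends the parentClient sketch above ("time" must be
// imported as well). Any encode or decode error forces a full reconnect,
// because a half-written gob message cannot be resynchronised.
func (c *parentClient) askWithRecovery(req DecisionRequest) (DecisionResponse, error) {
	// Without a deadline, a worker could block forever on a response
	// the parent was never going to send.
	_ = c.conn.SetDeadline(time.Now().Add(2 * time.Second))

	resp, err := c.ask(req)
	if err == nil {
		return resp, nil
	}

	// The stream state is unknown; closing and redialling is the only safe
	// recovery, which is exactly the reconnect overhead described above.
	_ = c.conn.Close()
	if nc, dialErr := dialParent(c.path); dialErr == nil {
		*c = *nc
	}
	return DecisionResponse{}, err
}
```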
3. Hidden performance costs
I originally chose the parent and worker model to avoid an external state store like Redis, assuming local IPC would be cheap.
In practice:
- gob encoding and decoding added overhead and pressure on the garbage collector
- Each request required:
  - A network round-trip between HAProxy and the worker
  - An IPC round-trip between the worker and the parent
- The system did more work per request than needed
The punchline is simple: hardly anyone used multiple SPOA listeners.
SPOE is already very efficient. Most users pointed HAProxy at a single listener and moved on. I had designed for multi-listener scaling that no one needed.
When AppSec WAF Exposed the Limitations
The real breaking point was adding AppSec support for the CrowdSec WAF.
To inspect HTTP requests properly, including uploads, I needed to pass much richer data through the pipeline:
- Request headers
- Method and URL
- Potentially large request bodies
With the parent and worker IPC in place, this meant:
- Serialising much larger structs using gob
- Pushing large payloads over the Unix socket
- Increasing memory usage for no real benefit
The architecture that already felt heavy for simple decision lookups was actively fighting AppSec support.
This was the inflexion point. The design that was meant to keep things flexible had become the main blocker for new features.
The New Design: One SPOA Listener, Many Goroutines
At that stage, I asked a simpler question.
What do users actually do with this HAProxy SPOA remediation component?
The answer was consistent:
- Run a single SPOA listener
- Expect it to be fast and reliable
- Expect predictable decisions and AppSec behaviour
So I rebuilt the architecture around those real-world expectations.
High-level flow
The new design is intentionally straightforward:
- HAProxy communicates with a single SPOA listener over TCP or Unix
- Inside the process:
  - A goroutine-based handler receives the request
  - It reads shared in-process state:
    - Blocked IPs and ranges
    - Country codes
    - Per hostname configuration, such as ban or captcha
- The handler responds directly to HAProxy with the SPOE decision
There is now:
- No parent process
- No worker processes
- No internal IPC
- No custom TCP protocol between internal components
Just one Go process that uses goroutines and shared memory.
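A stripped-down sketch of that shape looks like this. The names are illustrative, and the actual SPOE frame parsing, which the real component delegates to a library, is elided:

```go
// A simplified sketch of the new shape. Struct and function names are
// illustrative; real SPOE frame parsing (via a library) is elided.
package main

import (
	"log"
	"net"
	"sync"
)

// state is the single in-process source of truth the old parent used to own.
type state struct {
	mu        sync.RWMutex
	blockedIP map[string]struct{} // blocked IPs (ranges simplified to exact IPs here)
	hostConf  map[string]string   // hostname -> remediation ("ban", "captcha", ...)
}

// decisionFor checks whether the client IP has a decision and, if so, which
// remediation the hostname is configured to apply.
func (s *state) decisionFor(ip, host string) string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	if _, blocked := s.blockedIP[ip]; !blocked {
		return "allow"
	}
	if r, ok := s.hostConf[host]; ok {
		return r // per hostname behaviour, e.g. "captcha" instead of "ban"
	}
	return "ban"
}

func main() {
	st := &state{
		blockedIP: map[string]struct{}{},
		hostConf:  map[string]string{},
	}

	// One listener; HAProxy connects here over TCP (or a Unix socket).
	ln, err := net.Listen("tcp", "127.0.0.1:9000")
	if err != nil {
		log.Fatal(err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			continue
		}
		// One goroutine per connection, all sharing the same state.
		go handleSPOE(conn, st)
	}
}

// handleSPOE would parse SPOE frames and answer each message directly,
// using st.decisionFor for the lookup. No IPC, no second process.
func handleSPOE(conn net.Conn, st *state) {
	defer conn.Close()
	_ = st.decisionFor("192.0.2.1", "example.com") // placeholder lookup
}
```

The shared maps replace the parent process; the goroutines replace the workers.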
What changed internally
Concretely, I made three key changes:
- Workers became goroutines
  All request handling happens in a single process. Each connection or request runs in one or more goroutines that share the same memory.
- Single SPOA listener
  Instead of parent plus workers, there is a single listener capable of handling many concurrent requests.
- Admin socket and IPC removed
  There is no runtime spawning of processes and no Unix socket between parent and workers any more.
The refactor happened in three stages:
- Converting worker logic into goroutine-based handlers
- Consolidating everything around a single SPOA listener
- Removing the admin socket, IPC protocol, and related code entirely
Operational changes
One notable operational change came out of this refactor.
The SPOA process now runs as a low-privilege user, crowdsec-spoa. This means:
- Configuration files must be readable by that user
- Permissions need to be checked when upgrading
These changes are documented in the changelog and migration notes. Apart from that, deployment is simpler because there is no parent and worker orchestration to manage.
The Impact: Less Code, Better Behaviour
The results are quite clear.
Codebase size and complexity
- More than 3,000 lines of code were removed
- The new implementation provides the same behaviour in roughly 650 lines
That reduction is not just cosmetic. It gives:
- Fewer moving parts
- Fewer abstractions to understand
- Fewer places where bugs can hide
New contributors can now read and understand the whole SPOA path in a single sitting.
Performance and reliability
The new design improves both performance and reliability:
- Lower latency
  There is no internal IPC hop per request and no gob encoding or decoding on the hot path.
- Better throughput
  Goroutines and in-process state scale more naturally than juggling multiple external worker processes.
- Lower memory usage
  Large request structures are not serialised and shipped across Unix sockets, which matters a lot when handling request bodies for WAF rules.
- Simpler error handling
  Errors are handled using normal Go control flow. There is no need to encode failure states into a custom protocol.
What I Learned About Premature Complexity
Looking back, a few mistakes stand out.
1. Designing for hypothetical scale
I optimised for:
- Multiple SPOA listeners
- Dynamic worker spawning
- A parent process that managed shared state
In practice, most users ran a single listener and never asked for internal autoscaling.
I built a mini orchestrator to solve a scaling problem that did not exist.
2. Underestimating SPOE
A lot of the complexity came from not fully trusting how performant SPOE already is.
Because I was unsure a single listener would be enough, I added:
- Process-level parallelism
- IPC-based state sharing
- An admin socket for dynamic scaling
In reality, SPOE combined with a well-written Go process can handle a lot of traffic on its own. I should have started with that and waited for real users to hit limits before adding more.
3. Overcomplicating state sharing
I tried to avoid an external key-value store such as Redis by building a state-sharing layer out of the parent, the workers, and a custom protocol.
In hindsight, a better approach would have been:
- Start with a simple in-process state for one listener
- If multiple listeners were ever truly required, then:
  - Introduce an external store like Redis
  - Keep the SPOA processes themselves as simple as possible
I tried to solve the distributed-state problem before I actually had a distributed deployment.
How I Will Approach Scale Next Time
I am not against multi-listener deployments. There will be environments where that makes sense.
Next time, I will:
- Start with a single listener that is easy to reason about
- Keep the state in memory to begin with
- Only introduce an external store when:
  - There is a real need for multiple instances
  - I have real metrics and constraints to work with
If and when I need more scale, I will:
- Use Redis or another key-value store for decisions
- Avoid introducing custom in-house IPC layers
- Keep SPOA instances as stateless and replaceable as possible
The rule is simple: introduce complexity only when real usage demands it.
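If that day ever comes, the change should be confined to the decision lookup. Purely as a hypothetical sketch, not something the component does today, a Redis-backed lookup using github.com/redis/go-redis/v9 could look like this, with an assumed key layout:

```go
// Hypothetical sketch only: a Redis-backed decision lookup that would keep
// each SPOA instance stateless. Key names here are assumptions.
package store

import (
	"context"
	"errors"

	"github.com/redis/go-redis/v9"
)

type redisDecisions struct {
	rdb *redis.Client
}

func newRedisDecisions(addr string) *redisDecisions {
	return &redisDecisions{rdb: redis.NewClient(&redis.Options{Addr: addr})}
}

// decisionFor answers from the shared store, so any number of identical
// SPOA instances can serve the same decisions.
func (r *redisDecisions) decisionFor(ctx context.Context, ip string) (string, error) {
	val, err := r.rdb.Get(ctx, "decision:ip:"+ip).Result()
	if errors.Is(err, redis.Nil) {
		return "allow", nil // no decision recorded for this IP
	}
	if err != nil {
		return "", err
	}
	return val, nil // e.g. "ban" or "captcha"
}
```

The SPOA process itself stays as simple as possible; the only thing that moves is where decisions live.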
Takeaways for HAProxy and Go Engineers
If you work with HAProxy, SPOE, or Go, here are the key lessons I would highlight.
- Keep the first version boring
  One process, one listener, shared memory, goroutines. See how far that takes you before you reach for more.
- Let real users drive abstraction
  If nobody is asking for multiple listeners, admin sockets, or fancy IPC, you probably do not need them yet, or ever.
- Refactors that remove categories of failure are worth it
  This was not just a tidy-up. It removed entire classes of bugs and operational issues.
If You Use the HAProxy SPOA Remediation Component
If you are already using this component:
- Try the new single listener design in 0.2.0
- See how it behaves under your traffic patterns
- Let me know if you ever hit a point where one listener is not enough
That kind of real-world constraint is the right trigger to discuss scaling out, not a theoretical future that might never arrive.
Until then, I will keep choosing simple designs that solve real problems today rather than clever designs aimed at imaginary ones.