
From Over-Engineered to Obvious: Simplifying HAProxy SPOA Architecture

When I first built the HAProxy SPOA Remediation Component, I got ambitious.

I wanted something horizontally scalable, future-proof, and flexible enough to support features I had not even designed yet. I ended up with a parent and worker architecture, a custom IPC (Inter-Process Communication) mechanism, a bespoke TCP protocol, and an admin socket that could spawn new workers on demand.

On paper, it looked clever.

In reality, most users just wanted one fast and reliable HAProxy SPOA listener. They did not want to learn a mini distributed system to debug why a decision failed.

This is how I transitioned from a multi-process, IPC-heavy design to a single SPOA listener backed by goroutines and shared in-process state, and why I now prefer boring and obvious over clever and abstract.

The Original Design: Parent and Worker With Shared State

What I was trying to achieve

My original goals for the HAProxy SPOA component were:

  • Use multiple CPU cores efficiently
  • Avoid external dependencies such as Redis
  • Keep global decisions and configuration in one place, including:
    • Blocked IPs and ranges
    • Country ISO codes
    • Per-hostname behaviour, such as ban or captcha

From these goals, I designed a parent and worker model.

Parent process:

  • Central authority for the shared state
  • Knows about IP decisions, country codes, and per-hostname configuration
  • Coordinates multiple workers

Worker processes:

  • Talk directly to HAProxy via SPOE over TCP or Unix sockets
  • For each SPOE request, call the parent to fetch decisions and configuration

I also imagined future use cases:

  • Multiple SPOA listeners per node
  • Dynamic worker spawning under high load
  • An admin socket that could:
    • Spawn new workers without restarting
    • Update per-hostname configuration at runtime

In short, I had built a small SPOA platform instead of a straightforward remediation component.

How the IPC worked

To make this design work, I added:

  • A Unix socket between the parent and workers
  • A custom TCP protocol using gob to encode and decode Go structs

Each SPOE request followed roughly this path:

  1. HAProxy → worker via SPOE (TCP or Unix)
  2. Worker → parent via Unix socket with a gob encoded request
  3. Parent → worker with a gob encoded response including IP decision, ISO code, and hostname config
  4. Worker → HAProxy with the SPOE response

So every request involved an extra round-trip inside the server, using a custom protocol.
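To make that concrete, here is roughly what the worker side of that round-trip looked like. This is a minimal sketch: the struct fields, socket path, and function names are illustrative, not the component's actual API, and the real structs carried more data.

```go
import (
	"encoding/gob"
	"fmt"
	"net"
)

// Hypothetical shapes of the IPC messages; the real structs were richer.
type ipcRequest struct {
	IP       string
	Hostname string
}

type ipcResponse struct {
	Remediation string // e.g. "ban", "captcha", or "" for allow
	ISOCode     string
	HostConfig  map[string]string
}

// dialParent opens the long-lived Unix socket to the parent and wraps it in a
// gob encoder/decoder pair. The socket path is made up for this example.
func dialParent() (*gob.Encoder, *gob.Decoder, net.Conn, error) {
	conn, err := net.Dial("unix", "/run/spoa/parent.sock")
	if err != nil {
		return nil, nil, nil, err
	}
	return gob.NewEncoder(conn), gob.NewDecoder(conn), conn, nil
}

// queryParent is the extra in-server round-trip every SPOE request had to pay:
// encode the request, then block until the parent's response is decoded.
func queryParent(enc *gob.Encoder, dec *gob.Decoder, req ipcRequest) (ipcResponse, error) {
	var resp ipcResponse
	if err := enc.Encode(req); err != nil {
		return resp, fmt.Errorf("encode request: %w", err)
	}
	if err := dec.Decode(&resp); err != nil {
		return resp, fmt.Errorf("decode response: %w", err)
	}
	return resp, nil
}
```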

Where It Hurt: Complexity, Fragility, and Latency

The architecture worked on a good day, but the trade-offs were not worth it.

1. Code bloat and cognitive overload

Supporting the IPC architecture required:

  • An API layer for the parent
  • A server implementation to handle worker requests
  • A worker client that:
    • Managed the Unix socket lifecycle
    • Spoke the custom TCP protocol
    • Serialized and deserialized structs using gob

Each layer had its own error handling, reconnect logic, and edge cases.

Over time, the codebase grew far beyond the size of the actual problem. New contributors and colleagues had to untangle:

  • Which part of the stack owned which responsibility
  • Whether an error came from HAProxy → SPOA or worker → parent
  • Where the state was stored and how it was retrieved

The architecture itself became a barrier. People spent more time learning how the system worked than adding features.

2. Fragile error propagation

gob and a homemade protocol made error handling harder than it needed to be.

If the parent hit an error while handling a worker request:

  • There was no simple, robust way to send that error back over the gob stream
  • A partially written message could corrupt the stream
  • The safest option was usually to close the connection

From the worker side:

  • It was waiting for a response that would never arrive
  • Eventually, it noticed the broken connection and reconnected
  • This added extra overhead during failures:
    • Reconnect logic
    • Re-initialisation of state
    • Latency spikes

In the worst case:

  • A worker could end up stuck waiting forever
  • The parent had no clean way to recover that worker
  • With a single worker configured, the SPOA could silently stop answering

So I had an architecture that was meant to handle failure, but in practice could fail in subtle and hard-to-diagnose ways.
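To see why closing the connection was the least-bad option, here is a rough sketch of the parent-side loop for one worker connection, reusing the hypothetical ipcRequest and ipcResponse types from the earlier sketch; lookupDecision stands in for whatever the parent did against its shared state.

```go
import (
	"encoding/gob"
	"net"
)

// serveWorker handles one worker connection on the parent side (simplified).
// The home-made protocol had no dedicated error frame, so any failure while
// producing a response left only one safe move: hang up and let the worker
// notice the broken connection, reconnect, and re-initialise.
func serveWorker(conn net.Conn) {
	defer conn.Close()
	enc, dec := gob.NewEncoder(conn), gob.NewDecoder(conn)
	for {
		var req ipcRequest
		if err := dec.Decode(&req); err != nil {
			return // stream broken or worker gone
		}
		resp, err := lookupDecision(req) // hypothetical shared-state lookup
		if err != nil {
			// No clean way to report this over the gob stream without risking
			// a half-written message that corrupts it, so just close.
			return
		}
		if err := enc.Encode(resp); err != nil {
			return
		}
	}
}
```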

3. Hidden performance costs

I originally chose the parent and worker model to avoid an external state store like Redis, assuming local IPC would be cheap.

In practice:

  • gob encoding and decoding added overhead and pressure on the garbage collector
  • Each request required:
    • A network round-trip between HAProxy and the worker
    • An IPC round-trip between the worker and the parent
  • The system did more work per request than needed

The punchline is simple: hardly anyone used multiple SPOA listeners.

SPOE is already very efficient. Most users pointed HAProxy at a single listener and moved on. I had designed for multi-listener scaling that no one needed.

When AppSec WAF Exposed the Limitations

The real breaking point was adding AppSec support for the CrowdSec WAF.

To inspect HTTP requests properly, including uploads, I needed to pass much richer data through the pipeline:

  • Request headers
  • Method and URL
  • Potentially large request bodies

With the parent and worker IPC in place, this meant:

  • Serialising much larger structs using gob
  • Pushing large payloads over the Unix socket
  • Increasing memory usage for no real benefit

The architecture that already felt heavy for simple decision lookups was actively fighting AppSec support.

This was the inflexion point. The design that was meant to keep things flexible had become the main blocker for new features.

The New Design: One SPOA Listener, Many Goroutines

At that stage, I asked a simpler question.

What do users actually do with this HAProxy SPOA remediation component?

The answer was consistent:

  • Run a single SPOA listener
  • Expect it to be fast and reliable
  • Expect predictable decisions and AppSec behaviour

So I rebuilt the architecture around those real-world expectations.

High-level flow

The new design is intentionally straightforward:

  1. HAProxy communicates with a single SPOA listener over TCP or Unix
  2. Inside the process:
    • A goroutine-based handler receives the request
    • It reads shared in-process state:
      • Blocked IPs and ranges
      • Country codes
      • Per-hostname configuration, such as ban or captcha
  3. The handler responds directly to HAProxy with the SPOE decision

There is now:

  • No parent process
  • No worker processes
  • No internal IPC
  • No custom TCP protocol between internal components

Just one Go process that uses goroutines and shared memory.
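Sketched in code, the shared state is now just an ordinary Go struct guarded by a mutex. The names and fields below are illustrative rather than the component's actual types, but the shape is the point: a decision lookup is a read under an RWMutex, nothing more.

```go
import "sync"

// DecisionStore is an illustrative version of the shared in-process state.
type DecisionStore struct {
	mu        sync.RWMutex
	ips       map[string]string // IP -> remediation ("ban", "captcha", ...)
	countries map[string]string // ISO country code -> remediation
	hosts     map[string]string // hostname -> configured behaviour
}

// Lookup runs inside the goroutine handling one SPOE message.
// No IPC, no serialisation: a read lock and a few map lookups.
func (s *DecisionStore) Lookup(ip, iso, host string) string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	if r, ok := s.ips[ip]; ok {
		return r
	}
	if r, ok := s.countries[iso]; ok {
		return r
	}
	if r, ok := s.hosts[host]; ok {
		return r
	}
	return "" // no decision: allow
}
```

The real component also has to match IP ranges rather than exact keys, but the access pattern is the same: lock, read, return.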

What changed internally

Concretely, I made three key changes:

  1. Workers became goroutines
    All request handling happens in a single process. Each connection or request runs in one or more goroutines that share the same memory.
  2. Single SPOA listener
    Instead of parent plus workers, there is a single listener capable of handling many concurrent requests.
  3. Admin socket and IPC removed
    There is no runtime spawning of processes and no Unix socket between parent and workers any more.

The refactor happened in three stages:

  • Converting worker logic into goroutine-based handlers
  • Consolidating everything around a single SPOA listener
  • Removing the admin socket, IPC protocol, and related code entirely
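The listener follows the same boring pattern. In the sketch below, handleSPOE is a stand-in for the actual SPOE frame handling (which belongs to the SPOA library in the real component); the point is simply one listener, one goroutine per connection, one shared store.

```go
import (
	"log"
	"net"
)

// serve accepts SPOE connections from HAProxy and handles each one in its own
// goroutine, all sharing the in-process DecisionStore from the earlier sketch.
func serve(ln net.Listener, store *DecisionStore) error {
	for {
		conn, err := ln.Accept()
		if err != nil {
			return err
		}
		go func(c net.Conn) {
			defer c.Close()
			// handleSPOE (hypothetical) reads SPOE messages from the connection
			// and answers each one using store.Lookup directly.
			if err := handleSPOE(c, store); err != nil {
				log.Printf("spoe connection closed: %v", err)
			}
		}(conn)
	}
}
```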

Operational changes

One notable operational change came out of this refactor.

The SPOA process now runs as a low-privilege user, crowdsec-spoa. This means:

  • Configuration files must be readable by that user
  • Permissions need to be checked when upgrading

These changes are documented in the changelog and migration notes. Apart from that, deployment is simpler because there is no parent and worker orchestration to manage.

The Impact: Less Code, Better Behaviour

The results are quite clear.

Codebase size and complexity

  • More than 3,000 lines of code were removed
  • The new implementation provides the same behaviour in roughly 650 lines

That reduction is not just cosmetic. It gives:

  • Fewer moving parts
  • Fewer abstractions to understand
  • Fewer places where bugs can hide

New contributors can now read and understand the whole SPOA path in a single sitting.

Performance and reliability

The new design improves both performance and reliability:

  • Lower latency
    There is no internal IPC hop per request and no gob encoding or decoding on the hot path.
  • Better throughput
    Goroutines and in-process state scale more naturally than juggling multiple worker processes.
  • Lower memory usage
    Large request structures are not serialised and shipped across Unix sockets, which matters a lot when handling request bodies for WAF rules.
  • Simpler error handling
    Errors are handled using normal Go control flow. There is no need to encode failure states into a custom protocol.

What I Learned About Premature Complexity

Looking back, a few mistakes stand out.

1. Designing for hypothetical scale

I optimised for:

  • Multiple SPOA listeners
  • Dynamic worker spawning
  • A parent process that managed shared state

In practice, most users ran a single listener and never asked for internal autoscaling.

I built a mini orchestrator to solve a scaling problem that did not exist.

2. Underestimating SPOE

A lot of the complexity came from not fully trusting how performant SPOE already is.

Because I was unsure a single listener would be enough, I added:

  • Process-level parallelism
  • IPC-based state sharing
  • An admin socket for dynamic scaling

In reality, SPOE combined with a well-written Go process can handle a lot of traffic on its own. I should have started with that and waited for real users to hit limits before adding more.

3. Overcomplicating state sharing

I tried to avoid an external key-value store such as Redis by building a state-sharing layer out of the parent, the workers, and a custom protocol.

In hindsight, a better approach would have been:

  1. Start with a simple in-process state for one listener
  2. If multiple listeners were ever truly required, then:
    • Introduce an external store like Redis
    • Keep the SPOA processes themselves as simple as possible

I tried to solve the distributed state problem before I actually had a distributed deployment.

How I Will Approach Scale Next Time

I am not against multi-listener deployments. There will be environments where that makes sense.

Next time, I will:

  • Start with a single listener that is easy to reason about
  • Keep the state in memory to begin with
  • Only introduce an external store when:
    • There is a real need for multiple instances
    • I have real metrics and constraints to work with

If and when I need more scale, I will:

  • Use Redis or another key-value store for decisions
  • Avoid introducing custom in-house IPC layers
  • Keep SPOA instances as stateless and replaceable as possible
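If that point is ever reached, the decision lookup can move behind a key-value store without changing the shape of the SPOA itself. Here is a rough sketch of that direction, assuming the go-redis client and an invented key layout rather than any existing integration:

```go
import (
	"context"
	"errors"

	"github.com/redis/go-redis/v9"
)

// redisLookup asks Redis for a remediation for the given IP. The key scheme
// ("decision:<ip>" -> "ban"/"captcha") is made up for this example.
func redisLookup(ctx context.Context, rdb *redis.Client, ip string) (string, error) {
	val, err := rdb.Get(ctx, "decision:"+ip).Result()
	if errors.Is(err, redis.Nil) {
		return "", nil // no decision: allow
	}
	if err != nil {
		return "", err
	}
	return val, nil
}
```

With decisions in an external store, each SPOA instance stays stateless and replaceable: any instance can answer any request.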

The rule is simple: introduce complexity only when real usage demands it.

Takeaways for HAProxy and Go Engineers

If you work with HAProxy, SPOE, or Go, here are the key lessons I would highlight.

  1. Keep the first version boring
    One process, one listener, shared memory, goroutines. See how far that takes you before you reach for more.
  2. Let real users drive abstraction
    If nobody is asking for multiple listeners, admin sockets, or fancy IPC, you probably do not need them yet or ever.
  3. Refactors that remove categories of failure are worth it
    This was not just a tidy-up. It removed entire classes of bugs and operational issues.

If You Use the HAProxy SPOA Remediation Component

If you are already using this component:

  • Try the new single-listener design in 0.2.0
  • See how it behaves under your traffic patterns
  • Let me know if you ever hit a point where one listener is not enough

That kind of real-world constraint is the right trigger to discuss scaling out, not a theoretical future that might never arrive.

Until then, I will keep choosing simple designs that solve real problems today rather than clever designs aimed at imaginary ones.
