May 23, 2024

Upgrading the CrowdSec Infrastructure to Support IPv6-Only Users

In 2024, the imperative of IPv6 compatibility looms large. CrowdSec’s unwavering commitment to technical excellence has driven us to tackle this challenge head-on.

In case you didn’t know, the CrowdSec Security Engine has been IPv6 compatible since day one, and it can handle IPv6 threats and decisions. Nonetheless, our infrastructure needs to catch up to allow our IPv6-only users to set up CrowdSec without hiccups.

As we said, we’re in 2024, so let’s fix this! Let’s look at the services the Security Engine relies on that might get in the way of fully serving our IPv6-only users:

The CrowdSec Hub, a repository (hosted on GitHub) from which the Security Engine fetches scenarios and parsers and receives their updates.
An API that receives attack-related metadata, allowing us to build our curated blocklists and CTI information.

Time to dive in and explore the solutions we came up with to overcome our API and Hub obstacles.

The API

As stated, we expose a few API endpoints to receive attack-related metadata, allowing us to build our highly curated blocklists and CTI information. Those endpoints are exposed using the API Gateway service on AWS. As of today, the API Gateway is an IPv4-only service.

Architecture

This is what our very, very simplified logical architecture looks like. On purpose, we only show the simplified input stages that are relevant to this article.

We chose to set up our API Gateway in EDGE mode since we have customers all over the world. And we know that, in EDGE mode, AWS automatically creates a managed CloudFront distribution in front of our API Gateway. CloudFront is compatible with IPv6, so it should be easy to activate the option for an API Gateway in EDGE mode, right? Well, not really: there is no option to activate it here. The managed CloudFront distribution doesn’t expose any option, so we’ll have to deploy our own solution.

CloudFront

We’ll use CloudFront as it supports IPv6, and we can use it as a reverse proxy for our API Gateways with minimal changes to the architecture and source code. So this is what we need to do:

Create a CloudFront distribution with the IPv6 option activated. As we have one domain name for all API Gateways, we can only create one distribution.
Create origins and behaviors for each API Gateway.
Create A and AAAA records in the Route 53 Hosted Zone to route IPv4 and IPv6 traffic to the CloudFront distribution.

This is what the target architecture should look like:

Let’s take a look at a Terraform snippet:

For each API Gateway, we need to create an origin block that points to the execute-api URL of the API Gateway and gives it an origin ID. Then, we need an ordered_cache_behavior block that maps a path_pattern to a target origin.

URL rewrite

So, now, a GET on https://api.crowdsec.net/v2/decisions/stream will be routed to the same path on the API Gateway’s execute-api URL, i.e. https://<execute-api host>/v2/decisions/stream. This is fine as long as the API Gateway stage we target is called v2, because with the API Gateway the stage is part of the URL path. Since we don’t use multiple stages on our API Gateways, we were able to name the stages v2 and v3, so we weren’t required to rewrite the URLs, except for …
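To make the routing concrete, here is a minimal JavaScript sketch of how ordered path patterns map a request to an origin. It only simulates the matching logic and is not code that runs in CloudFront. The two execute-api hostnames are placeholders invented for illustration, and the /v2/* and /v3/* patterns are assumed from the stage names mentioned above.

```javascript
// Simulation of CloudFront's ordered cache behaviors: the first pattern
// that matches the request path selects the origin.
// The hostnames below are hypothetical placeholders, not real endpoints.
var behaviors = [
  { pathPattern: '/v2/*', originHost: 'example-v2.execute-api.eu-west-1.amazonaws.com' },
  { pathPattern: '/v3/*', originHost: 'example-v3.execute-api.eu-west-1.amazonaws.com' }
];

// Convert a path pattern ("*" = any run of characters, "?" = exactly one
// character) into an anchored regular expression.
function patternToRegExp(pattern) {
  var escaped = pattern.replace(/[.+^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*').replace(/\?/g, '.') + '$');
}

// Return the URL the request would be forwarded to, or null when no
// ordered behavior matches (the default behavior would apply instead).
function resolveOrigin(path) {
  for (var i = 0; i < behaviors.length; i++) {
    if (patternToRegExp(behaviors[i].pathPattern).test(path)) {
      // The path is forwarded unchanged, so its first segment ("v2")
      // has to match the API Gateway stage name.
      return 'https://' + behaviors[i].originHost + path;
    }
  }
  return null;
}
```

Because the path is forwarded as-is, naming the stages after the URL prefix is what makes a rewrite unnecessary in this sketch.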

The infamous dot

The .well-known path cannot be defined as a stage name because of the forbidden dot in the name. Fortunately, CloudFront offers two ways to rewrite URLs: Lambda@Edge (not so easy) or CloudFront Functions, which are easier but more limited. Lambda@Edge can act on both viewer and origin requests, while CloudFront Functions only act on the viewer side. The latter, however, is much easier to deploy, as we’ll see now.

Here’s a Terraform snippet of a CloudFront Function that will simply remove the /.well-known part from the URL. The CloudFront origin will need the stage name as its origin_path, and the behavior will need to reference this function.
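As a rough illustration of the function body itself, a viewer-request CloudFront Function that strips the prefix could look like the sketch below. It assumes the standard CloudFront Functions event shape (event.request.uri). The exact matching rules are our own assumption, not necessarily what the production function does.

```javascript
// Viewer-request CloudFront Function (sketch): remove the "/.well-known"
// prefix so that the origin_path (the stage name) can be prepended instead.
var PREFIX = '/.well-known';

function handler(event) {
  var request = event.request;
  var uri = request.uri;

  // Only rewrite when the prefix is a full path segment, so that a path
  // such as "/.well-knownfoo" is left untouched.
  if (uri === PREFIX || uri.indexOf(PREFIX + '/') === 0) {
    // Fall back to "/" when nothing remains after the prefix.
    request.uri = uri.slice(PREFIX.length) || '/';
  }
  return request;
}
```

Requests on any other path pass through unchanged, so the function is safe to attach to the behavior that serves the .well-known path only.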

So now you’re good to go; create the A and AAAA records pointing to your CloudFront distribution, and you’re done.

The client IP tale

Interesting plot twist: the client IP (or the viewer IP in CloudFront terminology) is important, and somehow, we lost it in the process! We used to get it from the Lambda event in the requestContext.identity.sourceIp field, but now we only get a CloudFront IP there. As we already saw, a managed CloudFront distribution exists in front of each API Gateway because of the EDGE mode. With another CloudFront distribution added in front of the managed one, API Gateway can’t do its magic anymore, and we get the IP of the new CloudFront as the client IP.

We see two options to get it back.

X-Forwarded-For header

This header will contain the true client IP. The logic is documented: CloudFront, like every hop after it, appends an IP address to the right of the header’s content. As we have two CloudFront distributions, there should be at least three addresses in the header, and we need to extract the third starting from the right. We chose not to use this solution because it will break if we change the infrastructure again (like, let’s be crazy, AWS decides to natively support IPv6 in API Gateway).
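For reference, the extraction described here could be sketched as follows. The function name and the trustedHops parameter are hypothetical, and the addresses in the usage note come from documentation ranges. Counting from the right matters because a client can put anything it wants on the left side of the header.

```javascript
// Sketch: pick the client IP out of an X-Forwarded-For value by counting
// from the right. trustedHops is the number of addresses appended after the
// client's own (2 here, hence "the third starting from the right").
function clientIpFromXff(headerValue, trustedHops) {
  var parts = headerValue
    .split(',')
    .map(function (s) { return s.trim(); })
    .filter(function (s) { return s.length > 0; });

  var index = parts.length - (trustedHops + 1);
  // Fewer entries than expected: the infrastructure assumption is broken.
  return index >= 0 ? parts[index] : null;
}
```

With a header value of "2001:db8::1, 203.0.113.10, 203.0.113.20" and trustedHops set to 2, the function returns "2001:db8::1". The hard-coded hop count is exactly the fragility mentioned above: removing one proxy layer silently shifts the result to the wrong address.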

CloudFront function, again

We’ll use a CloudFront function to add a new header containing the true client IP. Here’s the snippet.
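A minimal sketch of such a function is shown below. It relies on the viewer IP that CloudFront Functions expose as event.viewer.ip. The header name x-viewer-ip is a hypothetical choice for this example, not necessarily the one we use in production.

```javascript
// Viewer-request CloudFront Function (sketch): copy the viewer's IP into a
// dedicated request header so it survives the second CloudFront hop.
// "x-viewer-ip" is a hypothetical header name chosen for this example.
function handler(event) {
  var request = event.request;

  // CloudFront Functions represent each header as an object with a "value"
  // field, keyed by the lowercase header name. Overwriting the entry also
  // discards any value a client may have sent under the same name.
  request.headers['x-viewer-ip'] = { value: event.viewer.ip };

  return request;
}
```

The backend can then read this single header instead of counting entries in X-Forwarded-For, which keeps working if the number of proxies in front of it changes.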

As with the rewrite_url function, the ordered_cache_behavior will have to reference this function.

API Gateway and IPv6: Verdict

What was supposed to be an infrastructure formality proved to be more complicated than necessary. Especially since some cloud providers decide to make their customers pay for IPv4 addresses, we would have expected an easy solution that “just works.” A few days later and after a few round trips to solve some unexpected issues, it does, so we hope this might be useful for someone else.

The Hub

Having addressed the API concerns, our attention shifted toward optimizing the Hub’s infrastructure. We’ve streamlined the process by utilizing GitHub for repository management and CloudFront as a Content Delivery Network (CDN). This strategic choice prevents our users from encountering GitHub’s rate limits. However, an initial assessment revealed a significant challenge: essential data for parsers and scenarios, such as IP allowlists and the GeoIP MMDB database, was dispersed and not uniformly accessible over IPv6.

We’ve reorganized and centralized these resources to resolve this, establishing a dedicated S3 bucket paired with its own CloudFront CDN. This repository, accessible at https://hub-data.crowdsec.net, ensures that all necessary data for Hub components is readily available.

And to make sure that this data always stays up to date, we implemented new processes:

A Continuous Integration (CI) job is responsible for regularly uploading allowlists and other data to the hub-data S3 bucket.
A daily cron job updates the GeoIP MMDB database in the hub-data S3 bucket, ensuring that our data reflects the latest IP geography mappings.

These measures guarantee that our hub remains a reliable and up-to-date resource for our users.

To sum up

The journey to allow our IPv6 users to set up CrowdSec in the easiest way possible may not have been as straightforward and “easy” as we expected! Nonetheless, we successfully navigated the complexities of IPv6 integration through meticulous architectural adjustments, utilizing solutions like CloudFront, centralizing resources, and implementing processes such as CI and cron jobs.

Ensuring the availability and accuracy of critical data within our infrastructure is a top priority for us, and we hope we achieved that for our IPv6 users. If you have any feedback or questions, don’t hesitate to reach out to us on Discord or Discourse.