How to process 2.6 billion daily events with CrowdSec?
In this article, we will see how to achieve high performance in log processing and push limits with horizontal scaling.
At CrowdSec , we like challenges. And one of them is high logs volume processing, to protect web infrastructures with larger traffic.
Very often those websites have a centralized log infrastructure and therefore must use one CrowdSec instance otherwise they would lose some detection capabilities. This is because some scenarios leverage leaky buckets, meaning they have to accumulate log entries that have the same given criteria (IP, Range, Country, etc) to trigger properly and this is difficult to distribute to many workers.
This is where using only one CrowdSec agent (IDS) to parse heaps of log lines starts to be challenging. Even though one can always get a bigger machine, trees don’t grow to the sky and we will always find log volumes we can’t manage (not to mention high availability issues).
Our goal here is to optimize the configuration and propose a horizontally scalable architecture to be able to address even the biggest use cases.
We used Ubuntu 22.04 virtual machines, on AWS, with 2 sizes: 2vCPU and 8vCPU.
To set a baseline, we set up Nginx then we used a standard CrowdSec installation (1.4 pre-release), "out of the box" with crowdsecurity/base-http and crowdsecurity/nginx collections that were automatically installed along with the CrowdSec agent.
We also used a homemade log generator with rate control, simulating several IP addresses and both normal and suspicious behaviors in order to trigger scenarios. We deliberately chose a “worst-case” scenario as HTTP, making those logs resource-intensive to parse and are the ones where we wanted to apply multiple complex scenarios.
Temperature and air pressure
Nah, just kidding
With out-of-the-box installation and 75% of CPU usage as a target (remember in this scenario, those servers are dedicated to processing logs), we achieved an initial parsing rate of 250 Events/Seconds (EPS) per vCPU. (hence 500 EPS on our 2vCPU and 2000 EPS on the 8vCPU). This could project to a maximum of around 20 million log lines parsed, per day, per vCPU.
This configuration fits most users and doesn't particularly address performance even if we have a strong focus on the agent (IDS) performance optimizations. Keep in mind that the base collection tries - amongst other things - to act as a poor-man’s WAF, which comes along with a tag price: evaluating a *lot* of regular expressions. We are looking with much interest at the improvements on regexp performance that go 1.19 promises to deliver.
However, one can fine-tune the list of scenarios to achieve better performance. Removing the scenario crowdsecurity/http-bad-user-agent known to be costly, we can manage 650 EPS with 2vCPU and 2800 EPS with 8vCPU.
We consider those very high-traffic websites to have a particular setup: a subset of scenarios and/or specific scenarios and already a CDN with a log parsing solution in-house. We can build a specific CrowdSec configuration, optimized to deal with this environment.
Considered use case: high traffic website, using Fastly, with crawler detection scenario (crowdsecurity/http-crawl-non_statics). This scenario detects aggressive crawl of non-static pages, no gentle indexing bot here but rather someone’s attempt to copy your content. We use the Fastly Syslog connector to get already parsed logs in JSON format.
In this configuration, we can manage 5000 EPS with 2vCPU and 16000 EPS with 8vCPU
So, compared with baseline this is quite amazing, right? Is this magic or have we measured wrong? Well, it’s explained because we consume logs that are already parsed in JSON so we save a lot of work (remember regexp that are cheap but come in numbers…) and also we use only a subset of scenarios. Most of the work is actually related to bucket management.
We also found that, on the 8vCPU machine, increasing the number of parser routines and bucket routines to 10 allows CrowdSec to better use the CPU. Those parameters control the number of go routines responsible for parsing and buckets management and are located in the config.yaml file:
We measure memory usage to sit somewhere between 1.2GB and 1.5GB, on both machine sizes. Memory usage is mostly related to the number of leaky buckets in memory and here we had around 15 000 buckets at the same time. In a test with only 256 distinct IP addresses - so only 256 buckets - the memory consumption was as low as 150MB.
Memory consumption doesn’t seem to be a limiting factor even in high-volume use cases.
Alright, so we can manage 15k events per second with a big, yet reasonable, AWS machine. What if I need to consume even more like 30k or 100k events per second?
As much as we would like to just let many CrowdSec agents read logs in parallel, life is not that simple. Scenarios are stateful - they have the notion of leaky buckets so logs with given criteria (like IP, Range, or Country) have to stick to the same agent. Therefore we need a sharding solution to dispatch logs to the right CrowdSec Agent and make sure that logs with the same criteria will be managed by the same agent.
The proposed architecture uses Rsyslog as a sharding solution. Here's a configuration sample that enables Rsyslog to forward logs to another 2 destinations, according to the client's IP.
Logs are consumed by 2 (or more) workers, each with a Rsyslog instance configured to write logs to a file which is then consumed by a CrowdSec Agent. After parsing the agents send decisions to one or more CrowdSec LAPI(s).
Testing multi-agent configuration
We tested horizontally scaling architecture in the following configuration:
- Dispatcher 2 vCPU running data injector + Rsyslog
- 2 x 8 vCPU workers running Rsyslog + Crowdsec agent
- LAPI 2 vCPU
The architecture scaled linearly, and we were able to inject 30k EPS with a mean CPU usage on the workers of 75%. Dispatcher and LAPI weren’t overloaded.
High performance is possible with CrowdSec and most importantly limits can be pushed with horizontal scaling. However, in these particular use-cases, it is required to have a custom-tailored deployment to get the best out of CrowdSec. Yet parsing the equivalent of 2.6 billion daily log lines (or Events to be more accurate), is achievable while being resource conservative and consuming only 20 vCPU. (Keep in mind that 2.6B EPS means you probably run thousands of web servers). This narrows down to a processing speed of ~130 million daily EPS per vCPU.