Cache to save ca$h

Author: Phil, lead engineer @ vatsim

Introduction

A few of you might be familiar with the now-infamous 'Cloudflare: checking your connection is secure' pages:

Cloudflare Security Check

...what you may not be familiar with is why they're becoming more commonplace across the VATSIM service network.

Saturday 3rd June 2023

Saturday the 3rd of June, 2023, started like any other day on VATSIM, with a lot of eager pilots desperate to get out westbound across the ocean. There were a few European events lined up that day too, so it was going to be busy, pretty much like every Saturday morning (UTC) on the network.

Around 11:30 AM UTC we started getting a lot of emails from Cloudflare informing us that it had entered mitigation mode and was actively blocking traffic hitting our network edge. This traffic was distributed across the world and was targeting myVATSIM, making it unavailable to our users in the process. These emails normally wouldn't be a cause for concern, but the delayed reaction time of Cloudflare (due to some configured thresholds) meant that our backend services supporting myVATSIM had taken the brunt of the first part of the attack. They were struggling...but after a quick check on the Cloudflare dashboard, some tweaked rules, and some managed challenges here and there, the traffic was dropping off about 20 minutes later.

So here's the value in Cloudflare already...they have a network capable of sustaining several Tbps of DDoS attack, and all for a measly $20 a month? Seems insane, right?

Caching to save cash

Cloudflare not only acts as a security guard for the network, but as a perimeter caching device.

In a typical week we process 40+ million requests to our web services. The largest share of that comes from people scraping the VATSIM data feed (it accounts for approximately 40% of our traffic).

The data feed

The data feed serves approximately 18M requests a week, with about 10TB of bandwidth used. If you do the maths, that's over 70M requests and 40TB of bandwidth in a month.

So, as Nickelback didn't sing...look at this graaaaaph:

Data Feed Stats

DigitalOcean (our object storage provider) kindly gives us 1TB of traffic per month for free, but that leaves (in this example) 39TB of bandwidth to pay for. At a cost of $0.01/GB, that results in a bill of $390/month on bandwidth alone.
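For anyone who wants to check the sums, here's a minimal sketch of that calculation. The function name and parameters are made up for illustration, and it assumes a 4-week month and decimal terabytes (1TB = 1000GB), which is what the $390 figure implies.

```python
from decimal import Decimal

# Assumption: decimal units (1 TB = 1000 GB), which matches the $390 figure.
GB_PER_TB = 1000


def monthly_bandwidth_bill(
    weekly_tb,
    free_tb=1,                      # free monthly allowance quoted in the post
    price_per_gb=Decimal("0.01"),   # overage price quoted in the post
    weeks_per_month=4,              # assumption: a month is treated as 4 weeks
):
    """Estimate the monthly bandwidth bill in dollars with no caching in front."""
    monthly_tb = weekly_tb * weeks_per_month
    # Only traffic above the free allowance is billable.
    billable_tb = max(monthly_tb - free_tb, 0)
    return billable_tb * GB_PER_TB * price_per_gb


if __name__ == "__main__":
    # 10 TB a week -> 40 TB a month -> 39 TB billable
    print(f"${monthly_bandwidth_bill(weekly_tb=10)}")  # prints $390.00
```

Decimal is used rather than floats so the cents come out exact.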

Enter Cloudflare...

As you can see from the graph above, not only are we saving bandwidth, but 99% of requests are returning instantly from Cloudflare. That means better performance, which in turn lets your services that use the data feed get the data so much quicker. Unlike VATSIM, which houses most of its compute in Toronto, Canada, Cloudflare has many datacentres all over the planet. This means you'll be getting the data from a location nearby, resulting in better loading times and, for applications, faster responses to data queries.

Cloudflare doesn't charge for bandwidth; you're charged for the features you wish to use. So for our measly $20 a month, we're saving $370/month purely on bandwidth costs (the $390 bill avoided, less the $20 plan), and that's just for the data feed!

myVATSIM

myVATSIM is the central service all of our users make use of to learn about events, pre-file flight plans, get news from the VATSIM marketing team, complete exams, and get access to learning material. All of these hugely important features are critical to the normal operation of the network.

Back to the graphs...

myVATSIM Flood

It's quite obvious where the attack happened, right? But you can see from the orange line that, shortly after the attack started, Cloudflare not only applied mitigations to stop the traffic, it also started caching an enormous amount of assets to reduce the load on the backend.

myVATSIM isn't as bandwidth heavy: it uses about 125GB/month, but processes around 7M requests.

myVATSIM Cache Stats

Using Cloudflare to cache commonly accessed resources gives us a nice cache hit rate of 93% on a typical day, meaning less stress on the backend servers and better performance for our users.
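To put that hit rate into perspective, here's a rough back-of-the-envelope sketch of how many requests would still reach the backend. It assumes the 93% figure is measured per request and holds across the whole month, which is a simplification; the function name is hypothetical.

```python
def origin_requests(total_requests: int, hit_rate_percent: int) -> int:
    """Estimate how many requests miss the cache and reach the backend.

    Assumption: the hit rate is a whole-number percentage of requests
    (not of bytes) and applies uniformly to all traffic.
    """
    if not 0 <= hit_rate_percent <= 100:
        raise ValueError("hit_rate_percent must be between 0 and 100")
    # Integer arithmetic avoids floating-point rounding surprises.
    return total_requests * (100 - hit_rate_percent) // 100


if __name__ == "__main__":
    # ~7M myVATSIM requests a month at a 93% cache hit rate
    print(origin_requests(7_000_000, 93))  # prints 490000
```

In other words, under those assumptions only around half a million of the 7M monthly requests would need the backend to do any work.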

We're not stopping there though. Caching doesn't solve everything, and the team is constantly making improvements to myVATSIM to make the experience faster and more reliable.

Conclusion

As a tech team, we're constantly looking at new ways to leverage the power of cloud services to improve the experience for our users, implement exciting new features, and provide great tools for the VATSIM developer community.

Cloudflare is one such service; without it the network would be so much more expensive to run - we'd probably need 5 or 10 times as many servers! It has an incredible number of features for its low price point, from caching and security tools to simple one-click website deploys, and from GDPR cookie opt-in banners to running compute functions at the edge.

...at $20/month it's really a no-brainer for any high-traffic site.