How systematic infrastructure auditing, right-sizing, and lifecycle policies turned our AWS bill into something that made the CFO smile.
Clay Levering
Engineering Leader at Blu Digital Group
There's a particular flavor of satisfaction that comes from making infrastructure cheaper and faster at the same time. Most people assume cost optimization means degrading service — accepting slower responses, less redundancy, or reduced capacity. In practice, the biggest savings usually come from eliminating waste that was never providing value in the first place.
Here's how we identified $180K+ in annual AWS savings at Blu Digital Group.
The first step is unglamorous but essential: actually understanding your AWS bill. Not the summary — the line items. Cost Explorer is your friend, but you need to slice the data by service, by tag, by usage type.
What we found wasn't unusual: a handful of services dominated spend, and within those services, a handful of resources were the primary cost drivers. The Pareto principle applies aggressively to cloud bills.
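The "dominant few" pass is simple enough to sketch. Here's roughly what it looks like once you've exported line items from Cost Explorer — the service names and dollar amounts below are illustrative, not our actual numbers:

```python
# Hypothetical sketch: find the smallest set of line items that covers
# most of the bill. Figures are made up for illustration.

def top_cost_drivers(line_items, threshold=0.80):
    """Return the line items that together cover `threshold` of total spend."""
    ranked = sorted(line_items.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(line_items.values())
    drivers, running = [], 0.0
    for name, cost in ranked:
        drivers.append(name)
        running += cost
        if running / total >= threshold:
            break
    return drivers

monthly = {
    "RDS": 9200, "ECS": 6100, "S3": 3400,
    "CloudWatch": 800, "Lambda": 450, "Route53": 50,
}
print(top_cost_drivers(monthly))  # → ['RDS', 'ECS', 'S3']
```

Three of six services cover over 90% of this toy bill, which is the Pareto shape you'll almost certainly see in your own data.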
RDS Right-Sizing. Our production database instances were provisioned for peak load that happened maybe 2% of the time. By analyzing CloudWatch metrics over a 90-day window, we identified that our primary instance could drop two sizes without impacting the P99 response time. Combined with our query optimization work, the smaller instance actually performed better than the oversized one had.
ECS Task Definition Cleanup. We had task definitions reserving significantly more memory and CPU than the containers ever used. This wasn't just wasteful — it was limiting how many tasks could run per instance, which in turn was causing unnecessary auto-scaling. Tightening the reservations reduced our ECS spend and improved task scheduling density.
S3 Lifecycle Policies. Media processing generates a lot of intermediate artifacts — temporary transcodes, QC thumbnails, log files. We were retaining everything in Standard storage indefinitely. Implementing tiered lifecycle policies (Standard → Infrequent Access → Glacier → Delete) for different object types based on access patterns was straightforward and impactful.
Reserved Instance Strategy. For baseline compute that we knew we'd need for the foreseeable future, converting from On-Demand to Reserved Instances was essentially free money. The key is being conservative — only reserve what you're confident you'll use.
Not every optimization idea pans out. We explored Spot Instances for our transcoding workloads, but the interruption rate was too high for our SLA requirements. We also looked at Graviton-based instances, which showed promise but required more application testing than the savings justified at our scale.
Cloud cost optimization isn't a one-time project — it's a practice. Workloads change, pricing changes, new instance types become available. The most valuable thing we built wasn't any single optimization; it was the habit of reviewing spend monthly and the tooling to make that review meaningful.
Set up billing alerts. Tag your resources. Review Cost Explorer regularly. The savings compound.
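A billing alert takes minutes to set up via AWS Budgets. Here's the shape of the request body boto3's `create_budget` expects — the budget amount, threshold, account ID, and email address are all placeholders:

```python
# Sketch of a monthly cost budget that emails when actual spend passes 80%.
budget_request = {
    "Budget": {
        "BudgetName": "monthly-aws-spend",
        "BudgetLimit": {"Amount": "25000", "Unit": "USD"},  # placeholder limit
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    "NotificationsWithSubscribers": [
        {
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80.0,  # percent of the limit
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [
                {"SubscriptionType": "EMAIL", "Address": "eng-alerts@example.com"},
            ],
        }
    ],
}

# Applying it would look like this (needs credentials, so not run here):
# import boto3
# boto3.client("budgets").create_budget(
#     AccountId="123456789012", **budget_request)
print(budget_request["Budget"]["TimeUnit"])  # → MONTHLY
```

One budget with an 80% notification catches runaway spend weeks before the invoice does.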