How systematic infrastructure auditing, right-sizing, and lifecycle policies turned our AWS bill into something that made the CFO smile.
Clay Levering
Engineering Leader at Blu Digital Group
There's a particular flavor of satisfaction that comes from making infrastructure cheaper and faster at the same time. Most people assume cost optimization means degrading service — accepting slower responses, less redundancy, or reduced capacity. In practice, the biggest savings usually come from eliminating waste that was never providing value in the first place.
Here's how we identified $180K+ in annual AWS savings at Blu Digital Group.
The first step is unglamorous but essential: actually understanding your AWS bill. Not the summary — the line items. Cost Explorer is your friend, but you need to slice the data by service, by tag, by usage type.
What we found wasn't unusual: a handful of services dominated spend, and within those services, a handful of resources were the primary cost drivers. The Pareto principle applies aggressively to cloud bills.
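The "dominant few" pass is simple enough to sketch. Here's roughly what it looks like once you've exported line items from Cost Explorer — the service names and dollar amounts below are illustrative, not our actual numbers:

```python
# Hypothetical sketch: find the smallest set of line items that covers
# most of the bill. Figures are made up for illustration.

def top_cost_drivers(line_items, threshold=0.80):
    """Return the line items that together cover `threshold` of total spend."""
    ranked = sorted(line_items.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(line_items.values())
    drivers, running = [], 0.0
    for name, cost in ranked:
        drivers.append(name)
        running += cost
        if running / total >= threshold:
            break
    return drivers

monthly = {
    "RDS": 9200, "ECS": 6100, "S3": 3400,
    "CloudWatch": 800, "Lambda": 450, "Route53": 50,
}
print(top_cost_drivers(monthly))  # → ['RDS', 'ECS', 'S3']
```

Three of six services cover over 90% of this toy bill, which is the Pareto shape you'll almost certainly see in your own data.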
RDS Right-Sizing. Our production database instances were provisioned for peak load that happened maybe 2% of the time. By analyzing CloudWatch metrics over a 90-day window, we identified that our primary instance could drop two sizes without impacting the P99 response time. Combined with our query optimization work, the smaller instance actually performed better than the oversized one had.
ECS Task Definition Cleanup. We had task definitions reserving significantly more memory and CPU than the containers ever used. This wasn't just wasteful — it was limiting how many tasks could run per instance, which in turn was causing unnecessary auto-scaling. Tightening the reservations reduced our ECS spend and improved task scheduling density.
S3 Lifecycle Policies. Media processing generates a lot of intermediate artifacts — temporary transcodes, QC thumbnails, log files. We were retaining everything in Standard storage indefinitely. Implementing tiered lifecycle policies (Standard → Infrequent Access → Glacier → Delete) for different object types based on access patterns was straightforward and impactful.
Reserved Instance Strategy. For baseline compute that we knew we'd need for the foreseeable future, converting from On-Demand to Reserved Instances was essentially free money. The key is being conservative — only reserve what you're confident you'll use.
Not every optimization idea pans out. We explored Spot Instances for our transcoding workloads, but the interruption rate was too high for our SLA requirements. We also looked at Graviton-based instances, which showed promise but required more application testing than the savings justified at our scale.
Cloud cost optimization isn't a one-time project — it's a practice. Workloads change, pricing changes, new instance types become available. The most valuable thing we built wasn't any single optimization; it was the habit of reviewing spend monthly and the tooling to make that review meaningful.
Set up billing alerts. Tag your resources. Review Cost Explorer regularly. The savings compound.
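A billing alert takes minutes to set up via AWS Budgets. Here's the shape of the request body boto3's `create_budget` expects — the budget amount, threshold, account ID, and email address are all placeholders:

```python
# Sketch of a monthly cost budget that emails when actual spend passes 80%.
budget_request = {
    "Budget": {
        "BudgetName": "monthly-aws-spend",
        "BudgetLimit": {"Amount": "25000", "Unit": "USD"},  # placeholder limit
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    "NotificationsWithSubscribers": [
        {
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80.0,  # percent of the limit
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [
                {"SubscriptionType": "EMAIL", "Address": "eng-alerts@example.com"},
            ],
        }
    ],
}

# Applying it would look like this (needs credentials, so not run here):
# import boto3
# boto3.client("budgets").create_budget(
#     AccountId="123456789012", **budget_request)
print(budget_request["Budget"]["TimeUnit"])  # → MONTHLY
```

One budget with an 80% notification catches runaway spend weeks before the invoice does.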