{"id":2905,"date":"2026-06-03T09:30:00","date_gmt":"2026-06-03T09:30:00","guid":{"rendered":"https:\/\/mugnos-it.com\/?p=2905"},"modified":"2026-06-02T11:33:54","modified_gmt":"2026-06-02T11:33:54","slug":"is-cost-optimization-also-an-sre-responsibility","status":"publish","type":"post","link":"https:\/\/mugnos-it.com\/pt\/is-cost-optimization-also-an-sre-responsibility\/","title":{"rendered":"Is Cost Optimization Also an SRE Responsibility?"},"content":{"rendered":"<div data-elementor-type=\"wp-post\" data-elementor-id=\"2905\" class=\"elementor elementor-2905\" data-elementor-post-type=\"post\">\n\t\t\t\t<div class=\"elementor-element elementor-element-6d065e47 e-flex e-con-boxed e-con e-parent\" data-id=\"6d065e47\" data-element_type=\"container\">\n\t\t\t\t\t<div class=\"e-con-inner\">\n\t\t\t\t<div class=\"elementor-element elementor-element-39610555 elementor-widget elementor-widget-text-editor\" data-id=\"39610555\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p>The SRE has <em>reliability<\/em> right there in the name. So is that all it&#8217;s about?<\/p>\n\n\n\n<p>For anyone who&#8217;s been following this newsletter for a while, you already know my take: the SRE has always had a broader scope than the title suggests. It&#8217;s one of the roles with the most comprehensive, generalist vision in engineering \u2014 capable of adapting across very different contexts and problems. I jokingly call it the platypus of tech. A strange, hybrid creature that doesn&#8217;t fit neatly into any single category.<\/p>\n\n\n\n<p>And now people are asking: should the SRE also be responsible for managing costs? Well, &#8220;managing costs&#8221; sounds strong when you say it out loud. 
But optimizing costs sounds fair, doesn&#8217;t it?<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">The SRE Already Has the Vision<\/h3>\n\n\n\n<p>If you look at SRE principles, the role already covers observability, embracing risk, and a deep understanding of architecture. That comprehensive vision (understanding what&#8217;s running, how much it&#8217;s consuming, and where the risks are) is exactly what you need to spot cost optimization opportunities.<\/p>\n\n\n\n<p>But here&#8217;s where things get confusing: if you mention resilience in a meeting, someone will immediately picture bigger machines, more instances, more capacity, new datacenters&#8230; But that&#8217;s not how it works. Real resilience comes from automation, scalable applications, and smart design \u2014 not from throwing hardware at the problem. And the same effort that makes a system more resilient often makes it leaner. Reducing cost is, in many cases, a natural outcome of doing SRE work well.<\/p>\n\n\n\n<p>Think about it this way: what&#8217;s better \u2014 an active\/active cluster with two powerful servers, or several smaller servers behind an autoscaling group? With autoscaling, you get elasticity. The fleet of small servers grows when demand spikes and shrinks when traffic drops. At 3am on a Sunday, you&#8217;re not paying for capacity nobody is using. That&#8217;s resilience <em>and<\/em> cost efficiency coming from the same decision.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">What the SRE Can Actually Do<\/h3>\n\n\n\n<p>Let me get specific, because this is where it gets practical.<\/p>\n\n\n\n<p><strong>Capacity forecasting<\/strong> is already an SRE practice. Understanding how much compute a service needs today, and how that changes with new demand, is core to the role. 
That same analysis also tells you if you&#8217;re running 40% over-provisioned right now \u2014 and paying for it every hour.<\/p>\n\n\n\n<p>From there, it&#8217;s a natural step to ask: could we use a cheaper disk for this workload? Do we actually need an Intel instance, or would an AMD one give us the same performance at a lower price? Are we on AWS? Graviton (ARM-based) instances are significantly cheaper for the right workloads \u2014 and many services run perfectly on them.<\/p>\n\n\n\n<p>The SRE can also work with the team on <strong>capacity commitments<\/strong>. When you have visibility into capacity trends, you can confidently commit to a certain level of compute for 1-3 years and unlock discounts of 20-60%. That conversation starts with data the SRE already has.<\/p>\n\n\n\n<p>And then there&#8217;s <strong>backup\/DR\/contingency strategy<\/strong>. Based on your RTO and RPO, what&#8217;s the cheapest approach that still meets the requirement? Are you paying for hourly snapshots on a service where 24-hour recovery is perfectly acceptable? Are old backups sitting around with no retention policy? These are questions the SRE is already positioned to answer \u2014 they just need to be looked at through a cost lens as well.<\/p>\n\n\n\n<p>Of course, the SRE still needs business context to make the right calls. But the technical foundation is already there.<\/p>\n\n\n\n<p>Recently I was at the AWS User Group meetup in Campinas talking about cost reduction. And the part that stuck with me wasn&#8217;t what I said \u2014 it was the questions. Engineers weren&#8217;t asking &#8220;isn&#8217;t this a FinOps thing?&#8221; They were asking &#8220;how do I start this conversation at my company?&#8221; and &#8220;where do I even begin?&#8221; That told me everything. SREs already have the mindset and the technical tools to tackle this. 
What&#8217;s missing is the confidence to own it.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">This Is Only Getting More Relevant<\/h3>\n\n\n\n<p>Since the early days of cloud adoption, cost efficiency has been a growing priority. And now, with AI agents simplifying infrastructure orchestration and teams doing more with fewer people, the question of &#8220;how much are we optimizing?&#8221; is coming up more and more.<\/p>\n\n\n\n<p>The SRE who can connect reliability decisions to cost outcomes \u2014 who can say &#8220;this architecture choice improves resilience <em>and<\/em> reduces the bill&#8221; \u2014 is the one who gets listened to beyond the incident room.<\/p>\n\n\n\n<p>You don&#8217;t need a dedicated FinOps team to get started. Start with your own systems and ask yourself:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Are you using the right compute, or are you oversized?<\/li>\n\n\n\n<li>Are there resources provisioned for old projects that were never decommissioned?<\/li>\n\n\n\n<li>Do you have a retention policy, or is old backup data just accumulating?<\/li>\n\n\n\n<li>Have you evaluated other processor architectures (Intel vs. AMD vs. ARM) for your workloads?<\/li>\n\n\n\n<li>Do you know what your top cost drivers are right now?<\/li>\n\n\n\n<li>Are you paying for a proprietary database engine where an open-source alternative would do the job?<\/li>\n\n\n\n<li>Does your workload actually need VMs, or would containers or a serverless solution be a better fit?<\/li>\n\n\n\n<li>Are you applying efficient elasticity to your resources, or is your infrastructure static regardless of demand?<\/li>\n\n\n\n<li>Do you have cost alerts configured at different granularities \u2014 per service, per resource, usage trends, and budget forecasts?<\/li>\n<\/ul>\n\n\n\n<p>If you can&#8217;t answer those questions confidently, that&#8217;s your starting point.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 
class=\"wp-block-heading\">So, What&#8217;s My Take?<\/h3>\n\n\n\n<p>Cost and resilience aren&#8217;t competing priorities \u2014 they&#8217;re two sides of the same decision. The best infrastructure recovers fast, scales on demand, and doesn&#8217;t burn budget when no one is watching. You already have the technical foundation to think about both. All it takes is looking at your systems with that second lens open.<\/p>\n\n\n\n<p>According to the <a href=\"https:\/\/resources.flexera.com\/web\/pdf\/Flexera-State-of-the-Cloud-Report-2026.pdf\">Flexera State of the Cloud Report 2026<\/a>, <strong>29% of cloud spend is wasted<\/strong> \u2014 and after five years of decline, that number went back up, driven by growing cost complexity from AI and new IaaS and PaaS services. That&#8217;s not a FinOps problem. That&#8217;s an engineering problem. And it&#8217;s sitting inside the same systems you&#8217;re already responsible for.<\/p>\n\n\n\n<p>Basically&#8230; the waste is just sitting there, waiting for you to fix it.<\/p>\n\n\n\n<p>I&#8217;d love to hear your take on this. Do you think the SRE should avoid absorbing cost optimization and just hand it off to someone who has no idea how your application actually works? 
Or does it make more sense to keep that responsibility close to the people who truly understand the system?<\/p>\n\n\n\n<p>Reply and share your thoughts \u2014 I read every response.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p>Cheers,<\/p>\n\n\n\n<p>Douglas Mugnos<\/p>\n\n\n\n<p>MUGNOS-IT \ud83d\ude80<\/p>\n\n\n\n<p><\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t<div class=\"elementor-element elementor-element-1769c54 e-flex e-con-boxed e-con e-parent\" data-id=\"1769c54\" data-element_type=\"container\">\n\t\t\t\t\t<div class=\"e-con-inner\">\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<\/div>","protected":false},"excerpt":{"rendered":"<p>The SRE has reliability right there in the name. So is that all it&#8217;s about? For anyone who&#8217;s been following this newsletter for a while, you already know my take: the SRE has always had a broader scope than the title suggests. It&#8217;s one of the roles with the most comprehensive, generalist vision in engineering 
[&hellip;]<\/p>","protected":false},"author":3,"featured_media":2906,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-2905","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"aioseo_notices":[],"jetpack_featured_media_url":"https:\/\/mugnos-it.com\/wp-content\/uploads\/2026\/06\/ChatGPT-Image-2-de-jun.-de-2026-08_31_47.png","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/mugnos-it.com\/pt\/wp-json\/wp\/v2\/posts\/2905","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mugnos-it.com\/pt\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mugnos-it.com\/pt\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mugnos-it.com\/pt\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/mugnos-it.com\/pt\/wp-json\/wp\/v2\/comments?post=2905"}],"version-history":[{"count":4,"href":"https:\/\/mugnos-it.com\/pt\/wp-json\/wp\/v2\/posts\/2905\/revisions"}],"predecessor-version":[{"id":2910,"href":"https:\/\/mugnos-it.com\/pt\/wp-json\/wp\/v2\/posts\/2905\/revisions\/2910"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/mugnos-it.com\/pt\/wp-json\/wp\/v2\/media\/2906"}],"wp:attachment":[{"href":"https:\/\/mugnos-it.com\/pt\/wp-json\/wp\/v2\/media?parent=2905"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mugnos-it.com\/pt\/wp-json\/wp\/v2\/categories?post=2905"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mugnos-it.com\/pt\/wp-json\/wp\/v2\/tags?post=2905"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}