{"id":2686,"date":"2026-03-18T09:30:00","date_gmt":"2026-03-18T09:30:00","guid":{"rendered":"https:\/\/mugnos-it.com\/?p=2686"},"modified":"2026-03-10T12:27:10","modified_gmt":"2026-03-10T12:27:10","slug":"static-stability-is-your-infrastructure-truly-resilient","status":"publish","type":"post","link":"https:\/\/mugnos-it.com\/pt\/static-stability-is-your-infrastructure-truly-resilient\/","title":{"rendered":"Static Stability: Is Your Infrastructure Truly Resilient?"},"content":{"rendered":"<div data-elementor-type=\"wp-post\" data-elementor-id=\"2686\" class=\"elementor elementor-2686\" data-elementor-post-type=\"post\">\n\t\t\t\t<div class=\"elementor-element elementor-element-6e240664 e-flex e-con-boxed e-con e-parent\" data-id=\"6e240664\" data-element_type=\"container\">\n\t\t\t\t\t<div class=\"e-con-inner\">\n\t\t\t\t<div class=\"elementor-element elementor-element-119b323f elementor-widget elementor-widget-text-editor\" data-id=\"119b323f\" data-element_type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\n<p>Today I want to bring you a reflection that, if you haven\u2019t had it yet, <strong>you probably will soon<\/strong> \u2014 especially if you work (or want to work) with distributed systems, high availability, or cloud environments.<\/p>\n\n\n\n<p>After all\u2026 <strong>does using one, two, or even three data centers really make your application more resilient and therefore increase your SLA?<\/strong><\/p>\n\n\n\n<p>And what if I told you that, in practice, <strong>it can actually increase the risk of downtime<\/strong> if it\u2019s not well planned?<\/p>\n\n\n\n<p>Sounds strange, right? But that\u2019s exactly the core idea behind the concept of <strong>static stability<\/strong>. 
I talked about this in a YouTube video some time ago, and I saw that it was a new concept for a lot of people.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">What Is Static Stability?<\/h2>\n\n\n\n<p>It\u2019s the <strong>ability of your system to keep operating even when part of it fails<\/strong>. And more than that: to keep operating in a stable way, without collapsing the rest of the infrastructure.<\/p>\n\n\n\n<p>This concept comes from \u201ctraditional\u201d engineering \u2014 like civil or automotive engineering. Imagine a car designed to keep running with three wheels if one blows out. The system (the car) adapts and keeps moving, even with limitations.<\/p>\n\n\n\n<p>Now think with me:<\/p>\n\n\n\n<p>If I build a car with four wheels, I also create <strong>four failure points<\/strong>. And if one tire blows? The car may need to slow from 100 km\/h to 20 km\/h \u2014 or even stop completely until the tire is replaced.<\/p>\n\n\n\n<p><strong>In that case, wouldn\u2019t a tricycle actually be more resilient?<\/strong> It can go 100 km\/h like the car, but with 1\/4 less chance of failure, since it has only 3 failure points \ud83d\udc40<\/p>\n\n\n\n<p>Sounds crazy, but this is exactly the kind of logic we need to apply when designing distributed systems. <strong>More components don\u2019t always mean more safety.<\/strong><\/p>\n\n\n\n<p>Now think: <strong>can your software system actually do this?<\/strong><\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">The Classic \u201cFake Resilient\u201d Mistake<\/h2>\n\n\n\n<p>People love to draw that beautiful diagram:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>Users \u2192 Load Balancer \u2192 3 AZs\n<\/code><\/pre>\n\n\n\n<p>And proudly say: \u201cWe\u2019re running in three availability zones. 
We\u2019re resilient!\u201d<\/p>\n\n\n\n<p><strong>Is that really true?<\/strong> Let\u2019s bring it down to reality\u2026<\/p>\n\n\n\n<p>Imagine you have <strong>3 data centers<\/strong> (AZ-A, AZ-B, and AZ-C), and each one is operating at <strong>75% of its resource capacity<\/strong>.<\/p>\n\n\n\n<p>(By the way, a quick note: when we talk about an AZ (Availability Zone), we mean an isolated location within a cloud region. In practice, it may contain more than one physical data center, but architecturally you should treat it as a single failure point. After all, those data centers are geographically close and often share power, network, or even environmental dependencies. \ud83c\udf29\ufe0f)<\/p>\n\n\n\n<p>Going back to the car analogy: in this case, you have a tricycle with 3 possible failure points.<\/p>\n\n\n\n<p>Now what happens if <strong>AZ-C goes down?<\/strong> How do you fit the <strong>75% load<\/strong> that was running there into the other two AZs?<\/p>\n\n\n\n<p>Let\u2019s do the math:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>AZ-A and AZ-B were already at 75%.<\/li>\n\n\n\n<li>Now each must absorb 37.5 percentage points more (half of AZ-C\u2019s 75%).<\/li>\n\n\n\n<li>Result? 
<strong>AZ-A and AZ-B jump to 112.5% utilization.<\/strong><\/li>\n<\/ol>\n\n\n\n<p>And then what happens?<\/p>\n\n\n\n<p>\ud83d\udca5 <strong>Downtime.<\/strong><\/p>\n\n\n\n<p>\ud83d\udd25 <strong>Cascading failures.<\/strong><\/p>\n\n\n\n<p>And if you also implemented automatic retries \u201cto be resilient,\u201d that\u2019s when chaos really spreads \u2014 the system starts drowning in itself.<\/p>\n\n\n\n<p>(Spoiler: I have a nearly 1-hour class just about <em>retry<\/em> in one of my architecture trainings, because if it\u2019s not implemented properly, it can create massive problems.)<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">But Douglas, can\u2019t we just scale?<\/h2>\n\n\n\n<p>Yes\u2026 <strong>if there\u2019s time.<\/strong><\/p>\n\n\n\n<p>But what I usually see in real-world architectures is that scaling takes longer than rebalancing traffic from the failed zone to the remaining ones.<\/p>\n\n\n\n<p>If the traffic redirected to the other AZs is intense and your autoscaling takes time to react, overload hits first. CPU spikes to 100%, threads block, queues explode.<\/p>\n\n\n\n<p>You\u2019ve seen this movie before, right?<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">So what is a statically stable system?<\/h2>\n\n\n\n<p>It\u2019s a system where <strong>each part<\/strong> (for example, each AZ) <strong>has enough spare capacity to absorb its share of the load<\/strong> if another part fails.<\/p>\n\n\n\n<p>Meaning: if one zone goes down, the remaining ones can absorb the total load without exceeding 100%.<\/p>\n\n\n\n<p>So, if you have 3 data centers, make sure they operate at a maximum of 60%. 
That way, if one fails, the other two go up to 90% and can start scaling additional resources in the healthy AZs.<\/p>\n\n\n\n<p>That\u2019s static stability.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">To conclude<\/h2>\n\n\n\n<p>Static stability is just one of many critical resilience concepts \u2014 along with load shedding, backpressure, retry strategies, graceful degradation, capacity planning, and proper SLO design.<\/p>\n\n\n\n<p>As SREs, Staff, or senior engineers, we need to deeply understand these principles if we truly want to design and operate reliable systems \u2014 not just architectures that look good on diagrams.<\/p>\n\n\n\n<p>Resilience is not about adding more zones or more components. It\u2019s about understanding limits, failure modes, and system behavior under stress.<\/p>\n\n\n\n<p>If you want to go deeper into these topics and truly master modern reliability engineering, stay tuned.<\/p>\n\n\n\n<p>More content coming soon. \ud83d\ude80<\/p>\n\n\n\n<p><\/p>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t<div class=\"elementor-element elementor-element-4bd94cc e-flex e-con-boxed e-con e-parent\" data-id=\"4bd94cc\" data-element_type=\"container\">\n\t\t\t\t\t<div class=\"e-con-inner\">\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<\/div>","protected":false},"excerpt":{"rendered":"<p>Today I want to bring you a reflection that, if you haven\u2019t had it yet, you probably will soon \u2014 especially if you work (or want to work) with distributed systems, high availability, or cloud environments. 
After all\u2026 does using one, two, or even three data centers really make your application more resilient and therefore [&hellip;]<\/p>","protected":false},"author":3,"featured_media":2687,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-2686","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"aioseo_notices":[],"jetpack_featured_media_url":"https:\/\/mugnos-it.com\/wp-content\/uploads\/2026\/03\/ChatGPT-Image-10-de-mar.-de-2026-08_33_37.png","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/mugnos-it.com\/pt\/wp-json\/wp\/v2\/posts\/2686","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mugnos-it.com\/pt\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mugnos-it.com\/pt\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mugnos-it.com\/pt\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/mugnos-it.com\/pt\/wp-json\/wp\/v2\/comments?post=2686"}],"version-history":[{"count":4,"href":"https:\/\/mugnos-it.com\/pt\/wp-json\/wp\/v2\/posts\/2686\/revisions"}],"predecessor-version":[{"id":2691,"href":"https:\/\/mugnos-it.com\/pt\/wp-json\/wp\/v2\/posts\/2686\/revisions\/2691"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/mugnos-it.com\/pt\/wp-json\/wp\/v2\/media\/2687"}],"wp:attachment":[{"href":"https:\/\/mugnos-it.com\/pt\/wp-json\/wp\/v2\/media?parent=2686"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mugnos-it.com\/pt\/wp-json\/wp\/v2\/categories?post=2686"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mugnos-it.com\/pt\/wp-json\/wp\/v2\/tags?
post=2686"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}