<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Resilience on vnykmshr</title><link>https://blog.vnykmshr.com/writing/tags/resilience/</link><description>Recent content in Resilience on vnykmshr</description><generator>Hugo</generator><language>en</language><lastBuildDate>Sat, 15 Nov 2025 00:00:00 +0000</lastBuildDate><atom:link href="https://blog.vnykmshr.com/writing/tags/resilience/index.xml" rel="self" type="application/rss+xml"/><item><title>autobreaker: adaptive circuit breaking</title><link>https://blog.vnykmshr.com/writing/autobreaker/</link><pubDate>Sat, 15 Nov 2025 00:00:00 +0000</pubDate><guid>https://blog.vnykmshr.com/writing/autobreaker/</guid><description>&lt;p&gt;The &lt;a href="https://blog.vnykmshr.com/writing/circuit-breaking-go/"&gt;circuit breaker post&lt;/a&gt; from last year used a common trigger: trip after N consecutive failures. This works when traffic is predictable. It falls apart when it&amp;rsquo;s not.&lt;/p&gt;
&lt;p&gt;At 10,000 requests per second, 10 failures is noise &amp;ndash; a 0.1% error rate. A static threshold trips the circuit on what&amp;rsquo;s essentially a healthy service. At 10 requests per second, 10 failures is total collapse &amp;ndash; a 100% error rate over one interval. The same threshold that produces false positives under high traffic is too slow to protect under low traffic.&lt;/p&gt;</description></item><item><title>Circuit breaking in Go</title><link>https://blog.vnykmshr.com/writing/circuit-breaking-go/</link><pubDate>Sat, 28 Sep 2024 00:00:00 +0000</pubDate><guid>https://blog.vnykmshr.com/writing/circuit-breaking-go/</guid><description>&lt;p&gt;A service calls a dependency. The dependency is slow or down. The service waits, ties up a goroutine, maybe a connection. Multiply that by every request in flight, and the caller is now as broken as the dependency it called.&lt;/p&gt;
&lt;p&gt;Circuit breaking stops this. Instead of waiting on something that&amp;rsquo;s failing, stop calling it. Let it recover. Try again later.&lt;/p&gt;
&lt;h2 id="three-states"&gt;Three states&lt;/h2&gt;
&lt;p&gt;A circuit breaker wraps external calls and tracks their outcomes.&lt;/p&gt;</description></item><item><title>PostgreSQL HA</title><link>https://blog.vnykmshr.com/writing/postgres-ha/</link><pubDate>Mon, 15 Mar 2021 00:00:00 +0000</pubDate><guid>https://blog.vnykmshr.com/writing/postgres-ha/</guid><description>&lt;p&gt;PostgreSQL&amp;rsquo;s streaming replication is straightforward to set up. The documentation is clear, the configuration is well-understood, and base backups with &lt;code&gt;pg_basebackup&lt;/code&gt; work reliably.&lt;/p&gt;
&lt;p&gt;The operational problems are the hard part. They show up when the primary goes down and the automated failover does the wrong thing. Or when you promote a replica that&amp;rsquo;s silently been two hours behind. Or when you discover that backups you&amp;rsquo;ve been taking for months don&amp;rsquo;t actually restore.&lt;/p&gt;</description></item></channel></rss>