<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Ops on vnykmshr</title><link>https://blog.vnykmshr.com/writing/categories/ops/</link><description>Recent content in Ops on vnykmshr</description><generator>Hugo</generator><language>en</language><lastBuildDate>Fri, 27 Mar 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://blog.vnykmshr.com/writing/categories/ops/index.xml" rel="self" type="application/rss+xml"/><item><title>The dismissal</title><link>https://blog.vnykmshr.com/writing/the-dismissal/</link><pubDate>Fri, 27 Mar 2026 00:00:00 +0000</pubDate><guid>https://blog.vnykmshr.com/writing/the-dismissal/</guid><description>&lt;p&gt;A validation layer that checks 3 of 4 fields is worse than one that checks none.&lt;/p&gt;
&lt;p&gt;Zero checks, the developer tests everything. Three checks, they assume the fourth is covered. That gap &amp;ndash; between nothing and almost everything &amp;ndash; is where the actual damage hides.&lt;/p&gt;
&lt;p&gt;I keep running into this. Filed a security report recently &amp;ndash; clear bug, one-line fix, obvious PoC. Response: &amp;ldquo;not applicable.&amp;rdquo; The code did exactly what I said it did. But the team&amp;rsquo;s threat model said &amp;ldquo;caller is trusted,&amp;rdquo; and three other fields had validation, so the missing one looked intentional. It wasn&amp;rsquo;t. It was just the one nobody got to.&lt;/p&gt;</description></item><item><title>Trust boundaries</title><link>https://blog.vnykmshr.com/writing/trust-boundaries/</link><pubDate>Fri, 20 Mar 2026 00:00:00 +0000</pubDate><guid>https://blog.vnykmshr.com/writing/trust-boundaries/</guid><description>&lt;p&gt;I use coding agents on my own private repos every day. Security research, side projects, things I wouldn&amp;rsquo;t put on a public GitHub. Not something I&amp;rsquo;d do blindly with work source code though.&lt;/p&gt;
&lt;p&gt;So when someone turns off WiFi to prove the agent needs a network connection, I get it. But that&amp;rsquo;s the architecture. It&amp;rsquo;s on the pricing page. The agent works on your local files, the reasoning runs on a remote model. Both true, neither a secret.&lt;/p&gt;</description></item><item><title>The personal agent trap</title><link>https://blog.vnykmshr.com/writing/personal-agent-trap/</link><pubDate>Sat, 28 Feb 2026 00:00:00 +0000</pubDate><guid>https://blog.vnykmshr.com/writing/personal-agent-trap/</guid><description>&lt;p&gt;Spent a week going through the personal agent ecosystem &amp;ndash; OpenClaw, ZeroClaw, PicoClaw, the whole *Claw family. Channel testing, security audit, the works.&lt;/p&gt;
&lt;p&gt;If you want a personal assistant that messages you reminders, triages your inbox, schedules things, posts updates &amp;ndash; these frameworks are actually good at that. OpenClaw connects to 50+ channels out of the box, the setup is real, it works. For that, a $7 VPS and an afternoon gets you something useful.&lt;/p&gt;</description></item><item><title>Prescaling for a known spike</title><link>https://blog.vnykmshr.com/writing/prescaling-for-a-known-spike/</link><pubDate>Fri, 15 Mar 2019 00:00:00 +0000</pubDate><guid>https://blog.vnykmshr.com/writing/prescaling-for-a-known-spike/</guid><description>&lt;p&gt;Our biggest sale event of the year is on the calendar. The date is fixed, the hour is fixed, and when it starts, traffic hits a multiple of normal within minutes. The engineering challenge isn&amp;rsquo;t handling surprise. It&amp;rsquo;s handling certainty at a scale we&amp;rsquo;ve never seen before.&lt;/p&gt;
&lt;p&gt;We prepare for months. Six months out, teams start thinking about what their services need. Backend teams work with SRE and infra to define prescale configurations and autoscale rules. Terraform handles the provisioning. Every service team shares their estimates with infra, and the configurations get codified.&lt;/p&gt;</description></item><item><title>Hazard lights</title><link>https://blog.vnykmshr.com/writing/hazard-lights/</link><pubDate>Sat, 10 Jun 2017 00:00:00 +0000</pubDate><guid>https://blog.vnykmshr.com/writing/hazard-lights/</guid><description>&lt;p&gt;There are about fifteen of us in the enclosure. Backend engineers, SRE, devops, infra &amp;ndash; handpicked from across the floor. The rest of the team, about a hundred people, sit outside. They call us the fishes in the aquarium.&lt;/p&gt;
&lt;p&gt;The aquarium has hazard lights. Physical ones &amp;ndash; wired to fire on any 5xx in the system. When something breaks in production, the room goes red.&lt;/p&gt;
&lt;p&gt;It sounds like a gimmick. It isn&amp;rsquo;t.&lt;/p&gt;</description></item><item><title>Nginx load balancing decisions</title><link>https://blog.vnykmshr.com/writing/nginx-load-balancing/</link><pubDate>Thu, 18 May 2017 00:00:00 +0000</pubDate><guid>https://blog.vnykmshr.com/writing/nginx-load-balancing/</guid><description>&lt;p&gt;Nginx as a reverse proxy and load balancer is well-documented. The configuration syntax is not the hard part. The decisions are.&lt;/p&gt;
&lt;h2 id="algorithm-selection"&gt;Algorithm selection&lt;/h2&gt;
&lt;p&gt;Three algorithms cover most workloads.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Round-robin&lt;/strong&gt; (the default). Requests cycle through backends sequentially. Weights let you bias toward higher-capacity servers. Simple, predictable, works well when request processing times are uniform.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-nginx" data-lang="nginx"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="k"&gt;upstream&lt;/span&gt; &lt;span class="s"&gt;api&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="kn"&gt;server&lt;/span&gt; &lt;span class="n"&gt;api-01&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;8080&lt;/span&gt; &lt;span class="s"&gt;weight=3&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="kn"&gt;server&lt;/span&gt; &lt;span class="n"&gt;api-02&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;8080&lt;/span&gt; &lt;span class="s"&gt;weight=2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="kn"&gt;server&lt;/span&gt; &lt;span class="n"&gt;api-03&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;8080&lt;/span&gt; &lt;span class="s"&gt;weight=1&lt;/span&gt; &lt;span class="s"&gt;backup&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="kn"&gt;keepalive&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The &lt;code&gt;backup&lt;/code&gt; directive keeps a server in reserve &amp;ndash; it only receives traffic when all non-backup servers are down. Useful for a smaller instance that can keep the service alive during a partial outage but shouldn&amp;rsquo;t take production load normally.&lt;/p&gt;</description></item><item><title>Zero-downtime deploys</title><link>https://blog.vnykmshr.com/writing/zero-downtime-deploys/</link><pubDate>Mon, 15 Jul 2013 00:00:00 +0000</pubDate><guid>https://blog.vnykmshr.com/writing/zero-downtime-deploys/</guid><description>&lt;p&gt;I deploy everything from the terminal. No web interface, no CI service, no dashboard with green buttons. Just &lt;code&gt;deploy production&lt;/code&gt; from my laptop, and the code goes live.&lt;/p&gt;
&lt;p&gt;The setup is two pieces. A bash script that handles the remote work &amp;ndash; SSH in, pull the latest code, run hooks. And a Node.js process that watches a file on the server and reloads the app cluster when the file changes. Between them, they do zero-downtime deploys in under ten seconds.&lt;/p&gt;</description></item><item><title>Running Node.js in production</title><link>https://blog.vnykmshr.com/writing/nodejs-in-production/</link><pubDate>Wed, 29 May 2013 00:00:00 +0000</pubDate><guid>https://blog.vnykmshr.com/writing/nodejs-in-production/</guid><description>&lt;p&gt;We&amp;rsquo;ve been running Node.js in production since the 0.4 days. The runtime is easy to get started with. Keeping it running under real traffic is a different problem.&lt;/p&gt;
&lt;h2 id="process-management"&gt;Process management&lt;/h2&gt;
&lt;p&gt;The application needs to start at boot, restart on crash, and respond to system signals. Upstart handles this on Ubuntu without additional dependencies:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;description &lt;span class="s2"&gt;&amp;#34;myserver&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;env &lt;span class="nv"&gt;APP_HOME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/var/www/myserver/releases/current
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;env &lt;span class="nv"&gt;NODE_ENV&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;production
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;env &lt;span class="nv"&gt;RUN_AS_USER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;www-data
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;start on &lt;span class="o"&gt;(&lt;/span&gt;net-device-up and local-filesystems and runlevel &lt;span class="o"&gt;[&lt;/span&gt;2345&lt;span class="o"&gt;])&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;stop on runlevel &lt;span class="o"&gt;[&lt;/span&gt;016&lt;span class="o"&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;respawn
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;respawn limit &lt;span class="m"&gt;5&lt;/span&gt; &lt;span class="m"&gt;60&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;pre-start script
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nb"&gt;test&lt;/span&gt; -x /usr/local/bin/node &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt; stop&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nb"&gt;exit&lt;/span&gt; 0&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="o"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nb"&gt;test&lt;/span&gt; -e &lt;span class="nv"&gt;$APP_HOME&lt;/span&gt;/logs &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt; stop&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nb"&gt;exit&lt;/span&gt; 0&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="o"&gt;}&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;end script
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;script
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; chdir &lt;span class="nv"&gt;$APP_HOME&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; &lt;span class="nb"&gt;exec&lt;/span&gt; /usr/local/bin/node bin/cluster app.js &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; -u &lt;span class="nv"&gt;$RUN_AS_USER&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; -l logs/myserver.out &lt;span class="se"&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt; -e logs/myserver.err &amp;gt;&amp;gt; &lt;span class="nv"&gt;$APP_HOME&lt;/span&gt;/logs/upstart
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;end script
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;code&gt;respawn limit 5 60&lt;/code&gt; prevents a crash loop &amp;ndash; if the process dies 5 times within 60 seconds, Upstart stops trying. The &lt;code&gt;pre-start&lt;/code&gt; script verifies that Node and the log directory exist before attempting to start.&lt;/p&gt;</description></item><item><title>MySQL on XFS</title><link>https://blog.vnykmshr.com/writing/mysql-xfs/</link><pubDate>Thu, 11 Apr 2013 00:00:00 +0000</pubDate><guid>https://blog.vnykmshr.com/writing/mysql-xfs/</guid><description>&lt;p&gt;XFS handles database workloads better than ext4 &amp;ndash; better concurrent I/O, more efficient metadata operations for table-heavy schemas, and delayed allocation that improves write throughput. The obvious approach is to change MySQL&amp;rsquo;s &lt;code&gt;datadir&lt;/code&gt; in the config. The less obvious approach is bind mounts, which keep every path where the system expects it.&lt;/p&gt;
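&lt;p&gt;As a sketch of that layout &amp;ndash; the device and staging mount point here are hypothetical, adjust them to your volume &amp;ndash; the XFS volume mounts out of the way, and a bind mount places its data directory over &lt;code&gt;/var/lib/mysql&lt;/code&gt;, so &lt;code&gt;datadir&lt;/code&gt; and every tool that assumes the default path stay untouched. In &lt;code&gt;/etc/fstab&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;# hypothetical device and staging mount -- adjust to your volume
/dev/sdb1            /srv/mysql-xfs  xfs   noatime  0  2
/srv/mysql-xfs/data  /var/lib/mysql  none  bind     0  0
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;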
&lt;h2 id="setup"&gt;Setup&lt;/h2&gt;
&lt;p&gt;Install XFS utilities alongside MySQL:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;sudo apt-get install -y xfsprogs mysql-server
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Create the filesystem on the dedicated volume:&lt;/p&gt;</description></item></channel></rss>