By the morning of the event, there is nothing left to prepare. The capacity plan is done. The prescale is loaded. Infra has been briefed, service owners have been briefed, the runbook has been circulated.
The traffic comes in at several times what we had planned for.
Nothing about the plan anticipated this. The numbers we used were the best numbers we had. The headroom we carved was the headroom we could afford. Neither was enough.
The floor
The whole floor becomes the war room. This is not a metaphor. There is a designated room somewhere with a table and a door, and nobody is using it. People are at their desks, heads up, LED screens overhead for every service, the mission dashboard glowing green-yellow-red above the middle of the floor.
The dashboard does not speak in “CPU at 94%” and “P95 at 3.2 seconds.” It speaks in “checkout is yellow.” “Seller dashboard is red.” “Search is green.” The row of boxes maps to the user journey, not the services. It takes a half-second to read.
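The rollup behind those boxes can be sketched in a few lines. This is an illustrative assumption, not the actual dashboard logic: journey names, service names, and the worst-status rule are all hypothetical, chosen to show the mapping from services to user journeys.

```python
# Sketch: roll service-level health up into journey-level boxes.
# All names and the worst-status rule are illustrative assumptions.

SEVERITY = {"green": 0, "yellow": 1, "red": 2}

# Which backend services feed each user-journey box (hypothetical).
JOURNEYS = {
    "checkout": ["cart", "payments", "inventory"],
    "search": ["search-api", "ranking"],
    "seller dashboard": ["seller-api", "reporting"],
}

def journey_status(service_status: dict[str, str]) -> dict[str, str]:
    """Each journey box shows the worst status among its services."""
    result = {}
    for journey, services in JOURNEYS.items():
        worst = max(
            services,
            key=lambda s: SEVERITY[service_status.get(s, "green")],
        )
        result[journey] = service_status.get(worst, "green")
    return result

# A red payments backend turns the whole checkout box red,
# even if cart and inventory are fine.
print(journey_status({"payments": "red", "ranking": "yellow"}))
```

The point of the worst-of aggregation is exactly the half-second readability: nobody has to know which of three backends behind "checkout" is hurting before deciding where to look next.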
Every backend team has their own screens alongside. Queues, caches, database connection counts, response times per endpoint. Someone pulls up the gateway’s resolver view – which backend calls are timing out, which are falling back, which cached response is being served instead of a live one. There are a lot of screens. Nobody has to filter or dig. Whatever is hurting is visible.
The improvisation
A pattern emerges. The prep got us to the gate. What happens after is improvised in real time. The gateway is the place we patch – cached responses for non-essential fields, short-circuits on slow backends, partial data where the aggregation would otherwise block. Some of these are pre-staged options; some are written in the moment, deployed in minutes, rolled back if they make things worse. The buffer does what it was built to do.
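The shape of those gateway patches can be sketched as a resolver wrapper: time-box each backend call, fall back to the last cached value, and degrade to a partial response rather than block the whole page. Everything below is an assumption for illustration; the function names, the budget, and the in-process cache stand in for whatever the real gateway used, and the elapsed-time check after the call stands in for a real timeout.

```python
# Sketch of the gateway-side fallback pattern: cached responses for
# fields whose backend fails, partial data where aggregation would
# otherwise block. Names and timeouts are illustrative assumptions.
import time

CACHE = {}          # field -> last good value (stand-in for a real cache)
BUDGET_S = 0.2      # per-call budget; real code would enforce a hard timeout

def resolve_field(field, backend_call):
    """Try the live backend; on failure or overrun, serve stale or omit."""
    start = time.monotonic()
    try:
        value = backend_call()
        if time.monotonic() - start > BUDGET_S:
            # Post-hoc budget check; a real gateway would cancel the call.
            raise TimeoutError(f"{field} exceeded budget")
        CACHE[field] = value               # refresh cache on success
        return value, "live"
    except Exception:
        if field in CACHE:
            return CACHE[field], "stale"   # cached response instead of live
        return None, "omitted"             # partial data: render without it

def render_page(resolvers):
    """Aggregate whatever resolved; never let one slow backend block all."""
    page = {}
    for field, call in resolvers.items():
        value, source = resolve_field(field, call)
        if source != "omitted":
            page[field] = value
    return page
```

A wrapper like this is also why some fallbacks could be pre-staged and others written in the moment: swapping a field from live to stale to omitted is a one-line change at the gateway, not a redeploy of the backend that is struggling.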
The backends recover. The page renders slower, thinner, but it renders. The seller dashboard’s red box turns yellow, then green. The row of overhead boxes settles into a shape that is not normal but is also not in flames.
Everyone on
What I remember is not the patches. Patches are just work.
What I remember is the room. Everyone stayed back. Nobody left at six. Engineers paired spontaneously – someone from search pulled up next to someone from catalog because their services were calling each other in a way that was suddenly expensive. Product managers came over to ask what was happening and stayed to explain fallbacks to customer service. The CTO was on the floor, not in a conference room. People ordered food. Nobody watched a clock.
This is the part I go back to when I am trying to explain what engineering is. It is not the runbook. The runbook is the ceiling, and we hit it ninety minutes in. What happened after was people thinking out loud in the same room, trusting what was on each other’s screens, making calls that would be second-guessed the next week and defended later. Everyone owned it. Everyone was in it.
I have had heavier engineering weeks. I have shipped more code in a day. I have not had a better team moment.
What stays
The sale ends. Traffic drops. The runbook gets updated – new warnings in the margin, new pre-staged fallbacks, a different prescale target for next year. The prescaling gets measured against load tests instead of estimates. The dashboard gets a couple of new boxes. The gateway grows a few more resolver-level controls.
All of that is real improvement. None of it is the part I think about.
The war room is where the prep stops being enough and the rest of engineering begins. You build the tools in advance for exactly this – the dashboards you can read at a glance, the layer you can patch in real time, the team you can pair with without a meeting. The quality of what you built shows in how fast you can improvise when the plan runs out.
The other thing that shows is what the floor sounds like when everyone is on.
