The mask in the README

There is an attention mask drawn in ASCII in X’s algorithm release README. On first read, it looks normal. On second read, it shows what they gave up.

The mask blocks one thing. Each candidate post in the ranker can attend to the user embedding and the user’s history – that part is normal. What it cannot do is attend to other candidates in the same batch. Each candidate’s row of the mask reads the same way: the candidate sees the user, sees the history, sees itself, and sees nothing of its neighbors.
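To make the shape concrete, here is a minimal sketch of such a mask in Python. It is not the README's drawing and not X's code. The function names, the token layout (user, then history, then candidates), and the choice to let user and history positions attend to each other freely are all assumptions made for illustration.

```python
def candidate_isolation_mask(n_user: int, n_history: int, n_candidates: int) -> list[list[bool]]:
    """Return mask[q][k], True when query position q may attend to key position k.

    Hypothetical layout: [user tokens | history tokens | candidate tokens].
    """
    n_context = n_user + n_history
    total = n_context + n_candidates
    mask = [[False] * total for _ in range(total)]
    for q in range(total):
        for k in range(total):
            if k < n_context:
                mask[q][k] = True  # every position may read user + history
            elif q == k:
                mask[q][k] = True  # a candidate may read itself
            # otherwise k is some other candidate: stays blocked
    return mask


def render(mask: list[list[bool]]) -> str:
    """Draw the mask as text: 'x' = may attend, '.' = blocked."""
    return "\n".join("".join("x" if cell else "." for cell in row) for row in mask)


print(render(candidate_isolation_mask(n_user=1, n_history=2, n_candidates=3)))
# xxx...
# xxx...
# xxx...
# xxxx..
# xxx.x.
# xxx..x
```

The last three rows are the candidates: a solid block over the context, then a lone diagonal where each candidate sees only itself.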

The cost is real. Letting candidates attend to each other is how the model could learn diversity at attention time. Three sports posts in a row could pull on each other, surface a news post, balance the feed organically. That capability is given up at the mask – diversity gets done elsewhere now, by hand-coded scorers.

Why pay it? At the scale where you score a thousand candidates per request, coupling between them kills you. Without the mask, the score for (user, candidate-7) depends on whoever else rode along in the batch. The cache stops helping, because the same candidate scores differently in different company. The parallelism story gets complicated fast.

With the mask: the score for (user, candidate) is identical regardless of what else rides along. Order-independent. Cacheable.
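The invariance is easy to check on a toy. The sketch below is a single attention head with identity projections and a made-up readout (the sum of the output vector). None of it comes from the release; `attend`, `score`, and the vectors are hypothetical. It scores the same candidate in two different batches, once with candidates isolated and once with full attention.

```python
import math


def attend(query, keys, values, allowed):
    """Softmax attention for one query, restricted to allowed key positions."""
    idx = [i for i, ok in enumerate(allowed) if ok]
    logits = [sum(q * k for q, k in zip(query, keys[i])) for i in idx]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(logit - top) for logit in logits]
    norm = sum(weights)
    dim = len(values[0])
    return [sum(w * values[i][d] for w, i in zip(weights, idx)) / norm for d in range(dim)]


def score(context, candidates, target, isolate):
    """Toy score for candidates[target]; isolate=True applies the candidate mask."""
    tokens = context + candidates
    q = len(context) + target
    allowed = [
        k < len(context) or k == q or not isolate
        for k in range(len(tokens))
    ]
    out = attend(tokens[q], tokens, tokens, allowed)
    return sum(out)  # assumed readout, purely illustrative


context = [[1.0, 0.0], [0.5, 0.5]]  # stand-ins for user + history
target = [0.2, 0.9]                 # the candidate we care about
neighbor_a = [1.0, 1.0]
neighbor_b = [-1.0, 0.3]

# Same candidate, different company, different position in the batch.
masked_a = score(context, [target, neighbor_a], 0, isolate=True)
masked_b = score(context, [neighbor_b, target], 1, isolate=True)
open_a = score(context, [target, neighbor_a], 0, isolate=False)
open_b = score(context, [neighbor_b, target], 1, isolate=False)

print(masked_a == masked_b)  # isolated: the neighbor does not matter
print(open_a == open_b)      # full attention: it does
```

In a real multi-layer model the same property also requires that user and history positions never attend to candidates, otherwise batch contents leak back in through the context.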

Most attention work pushes for richer attention. This one constrains it, deliberately, to buy production properties.

The move is the trade: a model capability given up for a system property. The skill is knowing which direction the trade goes.

Idempotent endpoints work the same way. Give up responses that vary with prior state, get retry safety in return. It looks like a downgrade until you’ve shipped under retry storms, and then the rich version stops being something you want. The engineers who can name the trade tend to be the ones who’ve shipped both versions before.
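The same trade in miniature, as a hedged Python sketch. The `Ledger` class and its request-ID scheme are hypothetical, an in-memory stand-in for an endpoint rather than any particular framework's API.

```python
class Ledger:
    """Toy account with a stateful and an idempotent way to apply a credit."""

    def __init__(self):
        self.balance = 0
        self._responses = {}  # request_id -> response already returned

    def credit_stateful(self, amount):
        """Response depends on prior state: every retry credits again."""
        self.balance += amount
        return self.balance

    def credit_idempotent(self, request_id, amount):
        """A repeated request_id replays the stored response instead of re-applying."""
        if request_id not in self._responses:
            self.balance += amount
            self._responses[request_id] = self.balance
        return self._responses[request_id]


stateful = Ledger()
stateful_replies = [stateful.credit_stateful(10) for _ in range(3)]  # a retry storm

safe = Ledger()
safe_replies = [safe.credit_idempotent("req-1", 10) for _ in range(3)]

print(stateful_replies, stateful.balance)  # [10, 20, 30] 30
print(safe_replies, safe.balance)          # [10, 10, 10] 10
```

The idempotent version can no longer tell the caller anything new on a retry. That is the capability given up, and it is the reason three retries cost one credit instead of three.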

The mask is one place to read the release. The wider story is the gap between what people repeat about the algorithm and what the source actually says. The named subsystems that used to handle community signals and trust scoring – SimClusters, RealGraph – are gone. One transformer does retrieval and ranking now. Advice that still names them is a generation behind the code. The hand-tuned rules people repeat – boost links, favor verified, downrank certain content types – aren’t there either. The release declares them eliminated. Whatever bias the system has now is learned from engagement, not hand-coded.

The system gets rewritten faster than the advice does. The source is open and the disagreement is checkable.

The mask, drawn in ASCII in a README, is one of those engineers waving.