This is a fascinating take on latency optimization, but I have to ask: what’s the eval setup? I’ve started keeping a running list of "demo-only tricks" that break under load, like assuming consistent memory bandwidth.