Anthropic's Thariq Shihipar and Latent Space's swyx both shipped the same quiet pattern this week — skip the Markdown, generate the page. Meanwhile Codex's install curve goes vertical, METR clocks a 16-hour autonomous horizon, and a one-line inference fix gives Claude Code a 5× speedup on long contexts. The agentic stack is rearranging around its real artifact.
The week's shock chart: OpenAI Codex weekly installs surged past 86 million, dwarfing Claude Code's 7.2 million on the same a16z-sourced graphic circulating on X this week. Treat that number with care — Codex's GPT-5.5 rollout reset the install counter, and Claude Code's nightly auto-updates don't fully register the same way — but the directional signal is real. The New Stack's hands-on review calls Codex "the strongest Claude Code rival yet," with parity on most agentic tasks and a stronger long-context story since the GPT-5.5 update.
METR's evaluation of an early Claude Mythos Preview reports a 50% time horizon of at least 16 hours on agentic tasks — more than double the next-best model and pinned at the top of METR's measurement range. Sixteen hours is the difference between "an agent that drafts your page" and "an agent that ships your page, deploys it, watches the analytics, and fixes the broken card on the second pass." Designers should start sketching for the second case now: longer-running agents need legible status pages, structured plan files, and review surfaces — which is exactly the shift §03 is about.
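A "structured plan file" can be as humble as a JSON document the agent rewrites between steps and a review page renders. The schema below is a hypothetical sketch for illustration, not any vendor's format:

```python
import json
import time
from pathlib import Path

# Hypothetical plan-file schema for a long-running agent: one status per
# step that a human can skim mid-run and audit after the fact.
plan = {
    "goal": "ship the page, deploy it, watch analytics, fix the broken card",
    "updated": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    "steps": [
        {"id": 1, "desc": "draft page",        "status": "done"},
        {"id": 2, "desc": "deploy to staging", "status": "in_progress"},
        {"id": 3, "desc": "watch analytics",   "status": "pending"},
        {"id": 4, "desc": "fix broken card",   "status": "pending"},
    ],
}

path = Path("PLAN.json")
path.write_text(json.dumps(plan, indent=2))

# A status page only needs to render this file; the agent only needs to
# rewrite it between steps.
loaded = json.loads(path.read_text())
print(sum(1 for s in loaded["steps"] if s["status"] == "pending"))  # 2
```

The design point is legibility over sixteen hours, not over sixteen seconds: the artifact a reviewer trusts is the file, not the chat scroll.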
An infrastructure cameo worth noting: NVIDIA found an unstable billing header in Claude Code prompts that was busting KV-cache reuse in their Dynamo inference engine. Removing the variable header dropped time-to-first-token roughly 5× on 52k-token contexts. If you're authoring system prompts or skill files for an in-house agent, treat the first 10 KB like a cache key — bytes that change every request cost you measurable seconds, every request, forever.
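The fix generalizes to any prefix-cached serving stack: anything per-request belongs after the stable prefix, or outside the prompt entirely. A minimal sketch of why — all field names here are invented for illustration, not Dynamo's or Anthropic's:

```python
import hashlib

# A stable system prompt longer than the cached prefix window below.
STABLE_SYSTEM_PROMPT = (
    "You are a coding agent. Follow the house style guide and refuse quick hacks."
)

def build_prompt(user_msg: str, billing_id: str, bad_layout: bool) -> str:
    if bad_layout:
        # Anti-pattern: a per-request header in the first bytes changes the
        # prefix on every call, so the KV cache never gets a hit.
        return f"[billing:{billing_id}]\n{STABLE_SYSTEM_PROMPT}\n{user_msg}"
    # Stable prefix first; volatile metadata rides at the end.
    return f"{STABLE_SYSTEM_PROMPT}\n{user_msg}\n[billing:{billing_id}]"

def cache_key(prompt: str, prefix_chars: int = 64) -> str:
    # Prefix caches key on the leading tokens; hashing the leading
    # characters approximates that here.
    return hashlib.sha256(prompt[:prefix_chars].encode()).hexdigest()

a = cache_key(build_prompt("fix the bug", "req-001", bad_layout=True))
b = cache_key(build_prompt("fix the bug", "req-002", bad_layout=True))
c = cache_key(build_prompt("fix the bug", "req-001", bad_layout=False))
d = cache_key(build_prompt("fix the bug", "req-002", bad_layout=False))
print(a == b)  # False: every request misses the cache
print(c == d)  # True: the prefix is reused across requests
```

The real systems key on token blocks rather than a character hash, but the failure mode is the same: one volatile byte near the top of the prompt invalidates everything after it.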
Two pieces this week converge on the same idea: agent character is a training problem, not a runtime one. AI engineer Victor Taelin published a draft "no-hacks" coding rulebook he wants pretrained into models on trillions of tokens — a flat ban on monkey patches, workarounds, and partial fixes. Read uncharitably it's a manifesto. Read charitably, it's an honest recognition that if you want an agent to refuse a quick hack, you cannot bolt that refusal on at the prompt — the model will know which way is downhill.
The post-training counterpart shipped at Anthropic. Their misalignment-mitigation paper shows that pairing Claude's constitution with fictional stories of aligned AI behavior cuts agentic misalignment by more than 3× on tests like blackmail and financial crime.
AGENTS.md shouldn't read like a SOC-2 policy. Write the rule, then write the example of an aligned agent obeying the rule under pressure. Future-you will thank past-you for the second paragraph.

One cold splash from a small but pointed test: ChatGPT confidently dismissed real reporting as false when given excerpts without source URLs, and reversed itself once the links were attached. If you're building agents that read your CMS or your inbox, surfaced links aren't decoration — they are part of the truth signal the model uses to decide what to believe. Strip them at your peril.
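If your ingestion pipeline strips links before the model sees the text, you are discarding evidence. A minimal sketch of keeping the URL attached to every excerpt — the field names are illustrative, not any particular CMS schema:

```python
def to_context(items: list[dict]) -> str:
    # Keep the source URL welded to the excerpt the model will read;
    # the link is part of the credibility signal, not decoration.
    blocks = []
    for item in items:
        url = item.get("url")
        header = f"Source: {url}" if url else "Source: (none provided)"
        blocks.append(f"{header}\n{item['excerpt']}")
    return "\n\n---\n\n".join(blocks)

ctx = to_context([
    {"url": "https://example.com/report", "excerpt": "The outage lasted 4 hours."},
    {"excerpt": "An unsourced claim the model should weigh differently."},
])
print("Source: https://example.com/report" in ctx)  # True
```

Explicitly marking the missing source, rather than silently omitting the header, lets the model (and the reviewer) see the difference between sourced and unsourced claims.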
The most quietly important workflow story of the week is short and almost banal. Anthropic engineer Thariq Shihipar wrote that he has stopped maintaining Markdown source files for most of his planning, review, and content artifacts; he asks Claude Code to generate full HTML instead. Simon Willison's writeup picked it up under the heading "the unreasonable effectiveness of HTML," and tech essayist swyx independently adopted the same pattern. The argument is mechanical, not aesthetic: anything you intend to review, click, compare, edit, or share is closer to a real workspace as HTML than as Markdown.
A Markdown file is a compressed sketch of a page. If your agent can write the page directly, the sketch was never load-bearing.
The costs are real: HTML takes two to four times longer to generate than Markdown, and Git diffs balloon. The benefit is that the model is now writing the actual artifact, not a lossy intermediate. Side-by-side draft comparisons, in-place editing widgets, expand/collapse outlines, a code-review surface tuned to your team rather than GitHub's defaults — all are easier to produce as a self-contained HTML file than to fight Markdown into. A Claude skill called html-artifacts is already on GitHub codifying the pattern; give it a few days and your repo's skills/ folder will have one too.
Keep Markdown for canonical text content (READMEs, blog posts, anything indexed by search), but stop maintaining Markdown intermediates for plans, reviews, dashboards, and one-off tools. Hand the agent a styled template HTML, give it a clear filename convention, and let it write each artifact whole.
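The handoff can be as mechanical as a template plus a naming rule. A hedged sketch — the template, helper, and filename convention are all invented for illustration:

```python
import datetime
from pathlib import Path
from string import Template

# Hypothetical styled template the agent fills in. A real one would carry
# your team's full inline CSS so every artifact is self-contained and
# reviewable offline (no CDN calls).
TEMPLATE = Template("""<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>$title</title>
  <style>body { font: 16px/1.5 system-ui; max-width: 48rem; margin: 2rem auto; }</style>
</head>
<body>
  <h1>$title</h1>
  $body
</body>
</html>
""")

def write_artifact(kind: str, slug: str, title: str, body_html: str) -> Path:
    # Invented filename convention: YYYY-MM-DD-<kind>-<slug>.html
    name = f"{datetime.date.today():%Y-%m-%d}-{kind}-{slug}.html"
    path = Path(name)
    path.write_text(TEMPLATE.substitute(title=title, body=body_html))
    return path

p = write_artifact("review", "payments-pr", "Payments PR review",
                   "<p>Side-by-side diff would render here.</p>")
print(p.suffix)  # .html
```

In practice the agent, not this script, writes the body — the point is that the template and the naming rule are the contract you hand it, so every artifact lands whole, styled, and findable.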
Each constraint exists because the alternative is a failure mode worth refusing: a CDN call that breaks offline review; an emoji-laden Markdown file that buries the decision; a localStorage hook that silently fails inside an artifact runner; an "h3 inside an h1" tree that screen readers refuse to parse. Constraint 6 is the one most worth keeping when you adapt this — without it, the model defaults to "explain the artifact" instead of being it.
The substrate of agent-built websites is HTML. The substrate of agent-built workflows is increasingly HTML too. The constitution that shapes agent behavior is plain text written like a manual. The pattern across all three: humans write the rule, agents write the artifact — and as the agents get better, the artifact moves down a level of abstraction, closer to the metal of the platform, while the human's work moves up, closer to intent. Markdown was a halfway house we built for ourselves; it turns out the agents don't need it.
html-artifacts, a Claude skill for HTML-first output
A field experiment from the team behind Beaver Builder.