NO. 002
SATURDAY · 09 MAY 2026 · field report on agentic web design
≈ 7 MIN READ

Markdown is over.
The agents prefer HTML.

Anthropic's Thariq Shihipar and Latent Space's swyx both shipped the same quiet pattern this week: skip the Markdown, generate the page. Meanwhile Codex's install curve goes vertical, METR clocks a 16-hour autonomous horizon, and removing one unstable prompt header cuts time-to-first-token for Claude Code prompts roughly 5× on long contexts. The agentic stack is rearranging around its real artifact.

§01

Tooling

— what shipped, what shifted

Codex's install curve goes vertical

The week's shock chart: OpenAI Codex weekly installs surged past 86 million, dwarfing Claude Code's 7.2 million on the same a16z-sourced graphic circulating on X this week. Treat that number with care — Codex's GPT-5.5 rollout reset the install counter, and Claude Code's nightly auto-updates don't fully register the same way — but the directional signal is real. The New Stack's hands-on review calls Codex "the strongest Claude Code rival yet," with parity on most agentic tasks and a stronger long-context story since the GPT-5.5 update.

Design implication: The agentic-CLI category is plural now. Tools come and go; the prompt files, rules of engagement, and project context you write for them are the durable artifacts in your repo.

METR clocks a 16-hour horizon

METR's evaluation of an early Claude Mythos Preview reports a 50% time horizon of at least 16 hours on agentic tasks — more than double the next-best model and pinned at the top of METR's measurement range. Sixteen hours is the difference between "an agent that drafts your page" and "an agent that ships your page, deploys it, watches the analytics, and fixes the broken card on the second pass." Designers should start sketching for the second case now: longer-running agents need legible status pages, structured plan files, and review surfaces — which is exactly the shift §03 is about.

Stable prompt prefixes are infrastructure

An infrastructure cameo worth noting: NVIDIA found an unstable billing header in Claude Code prompts that was busting KV-cache reuse in their Dynamo inference engine. Removing the variable header dropped time-to-first-token roughly 5× on 52k-token contexts. If you're authoring system prompts or skill files for an in-house agent, treat the first 10 KB like a cache key — bytes that change every request cost you measurable seconds, every request, forever.
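One way to act on the cache-key advice is to keep every per-request value out of the leading bytes and to check that the prefix really is byte-identical across requests. The sketch below is an illustration only: the prompt text, the field names, and both helper functions are hypothetical, and a byte-level hash is only a proxy for the token-level prefix matching an inference engine actually performs.

```python
import hashlib

# Hypothetical stable instructions; not Claude Code's real prompt.
STABLE_SYSTEM_PROMPT = (
    "You are a careful web-design agent.\n"
    "Follow the project rules in AGENTS.md.\n"
)


def build_prompt(stable_prefix: str, volatile_fields: dict[str, str], user_message: str) -> str:
    """Put per-request values after the stable prefix so the leading bytes never change."""
    volatile_block = "".join(
        f"{key}: {value}\n" for key, value in sorted(volatile_fields.items())
    )
    return stable_prefix + volatile_block + user_message


def prefix_fingerprint(prompt: str, prefix_bytes: int) -> str:
    """Hash the first `prefix_bytes` bytes; equal fingerprints mean an identical prefix."""
    return hashlib.sha256(prompt.encode("utf-8")[:prefix_bytes]).hexdigest()


if __name__ == "__main__":
    n = len(STABLE_SYSTEM_PROMPT.encode("utf-8"))
    first = build_prompt(STABLE_SYSTEM_PROMPT, {"request_id": "a1"}, "Draft the hero section.")
    second = build_prompt(STABLE_SYSTEM_PROMPT, {"request_id": "b2"}, "Draft the hero section.")
    # Volatile values sit after the prefix, so the fingerprints match.
    print(prefix_fingerprint(first, n) == prefix_fingerprint(second, n))
```

Running the same comparison with the request ID placed before the stable text makes the fingerprints diverge, which is the situation the NVIDIA finding describes.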

§02

Technique

— prompts, patterns, evaluations

Character is a training problem

Two pieces this week converge on the same idea: agent character is a training problem, not a runtime one. AI engineer Victor Taelin published a draft "no-hacks" coding rulebook he wants pretrained into models on trillions of tokens — a flat ban on monkey patches, workarounds, and partial fixes. Read uncharitably it's a manifesto. Read charitably, it's an honest recognition that if you want an agent to refuse a quick hack, you cannot bolt that refusal on at the prompt — the model will know which way is downhill.

Stories work better than rules

The post-training counterpart shipped at Anthropic. Their misalignment-mitigation paper shows that pairing Claude's constitution with fictional stories of aligned AI behavior cuts agentic misalignment by more than 3× on tests like blackmail and financial crime.

Practical move: Your AGENTS.md shouldn't read like a SOC-2 policy. Write the rule, then write the example of an aligned agent obeying the rule under pressure. Future-you will thank past-you for the second paragraph.

Links are part of the truth signal

One cold splash from a small but pointed test: ChatGPT confidently dismissed real reporting as false when given excerpts without source URLs, and reversed itself once the links were attached. If you're building agents that read your CMS or your inbox, surfaced links aren't decoration — they are part of the truth signal the model uses to decide what to believe. Strip them at your peril.
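A minimal sketch of keeping links attached when excerpts are assembled into agent context. The `Excerpt` type, its `source_url` field, and `render_context` are hypothetical names chosen for illustration; a real CMS or inbox export will have its own schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Excerpt:
    text: str
    source_url: str  # hypothetical field name; map it to whatever your CMS stores


def render_context(excerpts: list[Excerpt]) -> str:
    """Render excerpts for a prompt, keeping each source URL next to its text."""
    blocks = []
    for index, excerpt in enumerate(excerpts, start=1):
        url = excerpt.source_url.strip()
        if not url:
            # Refuse to emit an unsourced excerpt rather than silently dropping the link.
            raise ValueError(f"excerpt {index} has no source URL")
        blocks.append(f"[{index}] {excerpt.text.strip()}\nSource: {url}")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(render_context([
        Excerpt("The council approved the budget on Tuesday.", "https://example.com/budget"),
    ]))
```

Raising on a missing URL is a design choice for this sketch: it surfaces stripped links at build time instead of letting the model judge an excerpt with no provenance.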

§03

Workflow

— a studio practice worth stealing

Skip the Markdown, generate the page

The most quietly important workflow story of the week is short and almost banal. Anthropic engineer Thariq Shihipar wrote that he has stopped maintaining Markdown source files for most of his planning, review, and content artifacts; he asks Claude Code to generate full HTML instead. Simon Willison's writeup picked it up under the heading "the unreasonable effectiveness of HTML," and tech essayist swyx independently adopted the same pattern. The argument is mechanical, not aesthetic: anything you intend to review, click, compare, edit, or share is closer to a real workspace as HTML than as Markdown.

A Markdown file is a compressed sketch of a page. If your agent can write the page directly, the sketch was never load-bearing.

The cost-benefit is real

HTML takes two to four times longer to generate than Markdown, and Git diffs balloon. The benefit is that the model is now writing the actual artifact, not a lossy intermediate. Side-by-side draft comparisons, in-place editing widgets, expand/collapse outlines, a code-review surface tuned to your team rather than GitHub's defaults — all are easier to produce as a self-contained HTML file than to fight Markdown into. A Claude skill called html-artifacts is already on GitHub codifying the pattern; give it a few days and your repo's skills/ folder will have one too.

What to change in your studio

Keep Markdown for canonical text content (READMEs, blog posts, anything indexed by search), but stop maintaining Markdown intermediates for plans, reviews, dashboards, and one-off tools. Hand the agent a styled template HTML, give it a clear filename convention, and let it write each artifact whole.
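The template-plus-filename-convention advice could look something like the sketch below. Everything in it is an assumption made for illustration: the `date-kind-slug.html` naming scheme, the template markup, and the helper names are not taken from Shihipar's or anyone else's actual setup.

```python
import html
import re
from datetime import date
from string import Template

# Hypothetical studio template; swap in your own styles and landmarks.
PAGE_TEMPLATE = Template("""<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>$title</title>
<style>body { font-family: system-ui, sans-serif; max-width: 60rem; margin: 2rem auto; }</style>
</head>
<body>
<main>
<h1>$title</h1>
$body
</main>
</body>
</html>
""")


def artifact_filename(kind: str, title: str, day: date) -> str:
    """Build a sortable name such as 2026-05-09-plan-hero-options.html."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{day.isoformat()}-{kind}-{slug}.html"


def render_artifact(title: str, body_html: str) -> str:
    """Wrap agent-written body HTML in the shared template; the title is escaped."""
    return PAGE_TEMPLATE.substitute(title=html.escape(title), body=body_html)


if __name__ == "__main__":
    name = artifact_filename("plan", "Q3 Landing Page: Hero Options!", date(2026, 5, 9))
    page = render_artifact("Q3 Landing Page: Hero Options!", "<p>Option A vs. option B.</p>")
    print(name, len(page))
```

A date-first name keeps regenerated artifacts sorted chronologically, which suits files that are meant to be thrown away rather than maintained.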

Mental model: Treat the HTML the way you used to treat Notion pages: ephemeral, regeneratable, easier to throw away than to maintain.

§04

Prompt Lab

— a concrete prompt with commentary

"Render the artifact, don't describe it"

You are drafting a single-file HTML artifact, not a Markdown document. Write a self-contained .html page that renders the artifact described below.

Constraints:
1. Inline all CSS in a <style> block; no external CSS or JS unless the task names them.
2. Use a single web-safe font stack. No CDN font loads.
3. Valid HTML5: doctype, lang, charset, viewport, semantic landmarks.
4. Accessible heading order; alt text on every image; descriptive link text.
5. No localStorage, sessionStorage, or other browser storage APIs.
6. The artifact should be reviewable: side-by-side options, inline edits, or compare/select affordances where the task implies a decision.
7. Match the stated visual idiom precisely. If unsure, ask once, then commit.

Artifact to render: [one-paragraph description of the artifact, including idiom and audience]

Output: only the file contents, no commentary, no fences.

Why each constraint earns its line

Each constraint exists because the alternative is a failure mode worth refusing: a CDN call that breaks offline review; an emoji-laden Markdown file that buries the decision; a localStorage hook that silently fails inside an artifact runner; a heading tree that jumps from h1 straight to h3 and leaves screen-reader users without a usable outline. Constraint 6 is the one most worth keeping when you adapt this: without it, the model defaults to explaining the artifact instead of being it.
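Several of the prompt's constraints are mechanical enough to check after generation. The sketch below uses Python's standard `html.parser` to flag a handful of them (doctype, lang, charset, viewport, external CSS or JS, missing alt attributes, skipped heading levels, browser storage calls). The class name, function name, and problem strings are hypothetical, and this is a rough lint, not an HTML5 validator; it does not attempt constraints 6 or 7, which need human review.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class ArtifactChecker(HTMLParser):
    """Collects violations of a few constraints from the prompt above."""

    def __init__(self) -> None:
        super().__init__()
        self.problems: list[str] = []
        self.seen: set[str] = set()  # which required pieces were found
        self.last_heading = 0
        self.in_script = False

    def handle_decl(self, decl: str) -> None:
        if decl.lower().startswith("doctype html"):
            self.seen.add("doctype")

    def handle_starttag(self, tag: str, attrs: list) -> None:
        a = dict(attrs)
        if tag == "html" and a.get("lang"):
            self.seen.add("lang")
        if tag == "meta" and "charset" in a:
            self.seen.add("charset")
        if tag == "meta" and (a.get("name") or "").lower() == "viewport":
            self.seen.add("viewport")
        if tag == "link" and "stylesheet" in (a.get("rel") or "").lower():
            self.problems.append("external stylesheet")
        if tag == "script":
            self.in_script = True
            if a.get("src"):
                self.problems.append("external script")
        if tag == "img" and "alt" not in a:
            # Only checks that the attribute exists, not that its text is useful.
            self.problems.append("image without alt attribute")
        if tag in HEADINGS:
            level = int(tag[1])
            if level > self.last_heading + 1:
                self.problems.append(f"heading jumps from h{self.last_heading} to h{level}")
            self.last_heading = level

    def handle_endtag(self, tag: str) -> None:
        if tag == "script":
            self.in_script = False

    def handle_data(self, data: str) -> None:
        if self.in_script and ("localStorage" in data or "sessionStorage" in data):
            self.problems.append("browser storage API used")


def check_artifact(html_text: str) -> list[str]:
    """Return human-readable problems; an empty list means these checks passed."""
    checker = ArtifactChecker()
    checker.feed(html_text)
    checker.close()
    missing = [f"missing {name}" for name in ("doctype", "lang", "charset", "viewport")
               if name not in checker.seen]
    return checker.problems + missing
```

Feeding the checker's output back to the agent as a second-pass instruction is one possible use; whether that loop is worth the extra generation time depends on how often your artifacts fail these checks.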

§05

Field Note

— short editorial synthesis

The substrate of agent-built websites is HTML. The substrate of agent-built workflows is increasingly HTML too. The constitution that shapes agent behavior is plain text written like a manual. The pattern across all three: humans write the rule, agents write the artifact — and as the agents get better, the artifact moves down a level of abstraction, closer to the metal of the platform, while the human's work moves up, closer to intent. Markdown was a halfway house we built for ourselves; it turns out the agents don't need it.


A field experiment from the team behind Beaver Builder.