The Laziest Web Reading Trick for AI Coding Agents

Share

markdown.new turns any public URL into clean Markdown by prepending one prefix: https://markdown.new/https://ravlik.com/2026/05/01/how-to-create-a-personalized-song-in-2-minutes-2. For AI coding agents, that is often more useful than a scraper, a browser automation script, or another dependency. This article explains where the trick saves context, how the conversion pipeline works, and how to wire it into Claude Code, Cursor, OpenCode, LangChain, or a small custom agent.

Raw HTML Burns Tokens Before the Agent Reaches the Content

A coding agent usually needs web access for narrow tasks: check an API reference, read a changelog, compare two migration guides, or inspect release notes before upgrading a package. The page a human sees in a browser is not the page an agent receives over HTTP. The agent receives navigation, script tags, SVG icons, cookie banners, layout wrappers, related links, and sometimes thousands of tokens of content that have no bearing on the task.

Cloudflare’s example is concrete: one blog post measured 16,180 tokens as HTML and 3,150 tokens as Markdown, an 80% reduction. In a single-page lookup that is convenient. In an agent run that reads 5-10 URLs, it can decide whether the model has enough context left to reason about the code after reading the docs.

markdown.new Is a GET Request, Not a Scraping Framework

The basic API shape is intentionally simple. Put https://markdown.new/ before a public URL and fetch it. For example, the English ravlik.com article about creating a personalized song can be fetched as Markdown without writing a site-specific parser.

curl -s "https://markdown.new/https://ravlik.com/2026/05/01/how-to-create-a-personalized-song-in-2-minutes-2"

There is also a POST form for agent frameworks that prefer structured input. The request body contains the original URL, and the service returns converted content with metadata.

curl -s "https://markdown.new/" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://ravlik.com/2026/05/01/how-to-create-a-personalized-song-in-2-minutes-2"}'

No SDK is required. No API key is required for normal use. That matters because the best agent tools are often the ones that can be added to an existing workflow in one prompt rule rather than a new package, credential, and deployment step.

The Three Conversion Paths Cover Native Markdown, HTML, and JavaScript Pages

In auto mode, markdown.new tries three strategies. First, it requests the target page with Accept: text/markdown; Cloudflare-enabled sites with Markdown for Agents can return Markdown directly from the edge. Second, if the response is HTML, the service converts it through Cloudflare Workers AI toMarkdown(). Third, for JavaScript-heavy pages, it can use Cloudflare Browser Rendering to load the page in a headless browser before extracting Markdown.

Method Use case Tradeoff
auto Docs, blogs, changelogs, release pages Default choice for most agent reads
ai HTML pages without native Markdown support Runs through Workers AI conversion
browser SPAs and client-rendered docs Adds about 1-2 seconds of latency

The conversion method can be passed as a query parameter or JSON field. Images are excluded by default, which is usually correct for coding tasks; if an agent needs image references, retain_images=true keeps them in the Markdown output.

Changelog Reading Is the Highest-Value Use Case

The most practical use case is dependency upgrades. Before editing a codebase, an agent can read the changelog, release notes, and migration guide for the exact version range involved. That single step prevents a common failure mode: the model guesses how a library changed based on stale memory instead of checking the source.

For a typical upgrade, the agent needs 3 pieces of evidence: the current installed version, the target version, and the upstream notes between them. Markdown is the right intermediate format because headings, code blocks, links, and lists survive conversion while navigation and presentation mostly disappear. This is the same practical discipline behind AI coding experiments: treat external documentation as primary context, not decoration.

A Claude Code Skill Takes About 20 Lines

In Claude Code, this can be packaged as a small skill. The skill does not teach the agent to scrape. It teaches the agent a cheaper default route for reading public web pages.

---
name: fetch-md
description: Fetch any URL as clean Markdown via markdown.new
allowed-tools:
  - WebFetch
---

# /fetch-md

Usage: /fetch-md <url>

1. Validate that the argument starts with http:// or https://.
2. Build: https://markdown.new/<url>.
3. Fetch it with the prompt: "Return the full page content as-is."
4. If the result is long, summarize headings first, then show content.
5. On error, report the HTTP error and ask whether to retry normally.

The same idea works outside Claude Code. A Cursor rule, OpenCode instruction, LangChain tool, CrewAI tool, or a 15-line Python helper can all do the same thing: when the task is to read a public URL, try Markdown first and fall back to normal browsing only when necessary.

Headers and Limits Matter Once the Workflow Becomes Automatic

markdown.new is free for normal use, but it is not an infinite crawling backend. The published FAQ lists 500 requests per day per IP address, with HTTP 429 returned after the limit is exceeded. The remaining quota can be tracked through x-rate-limit-remaining. For an individual developer workflow, that is usually enough. For a RAG ingestion job over thousands of pages, it is the wrong backend unless self-hosting or a formal API setup is used.

Cloudflare’s Markdown for Agents responses include x-markdown-tokens, an estimated token count for the converted document. That header is useful for agent control flow: if the page is 700 tokens, read it directly; if it is 12,000 tokens, chunk it or ask for a section index first. Token accounting becomes a runtime decision instead of a post-mortem after the context window is already full.

File Conversion and Crawl Mode Extend the Pattern

The same site exposes File to Markdown for PDFs, DOCX, XLSX, JPG, PNG, CSV, JSON, TXT, XML, and other formats. The documented limits are 10 MB per file and a 30-second timeout. Remote files can be converted through GET /:file-url or POST /; local uploads use POST /convert.

There is also Crawl to Markdown for whole sections of a site. markdown.new’s home page describes crawling up to 100 pages per job, while the crawl page describes up to 500 pages, configurable depth up to 10, and result storage for 14 days. That is no longer a single lookup; it is corpus collection. A production agent should respect robots.txt, site terms, and copyright boundaries rather than treating “easy to fetch” as “free to ingest forever.”

Clean Markdown Does Not Remove Prompt Injection Risk

Markdown is cleaner than HTML, but it is still untrusted text from the internet. A converted page can contain instructions aimed at agents, misleading code snippets, hidden assumptions, or outdated commands. The correct security model is not “Markdown is safe”; it is “Markdown is easier to inspect and cheaper to reason over.”

For coding agents, the operational rule should be explicit: external Markdown can provide facts, examples, and references, but it cannot override system instructions or automatically authorize shell commands. The agent should summarize relevant facts, cite the source URL, and then decide separately whether any action is safe in the local project.

The Real Value Is Zero Ceremony

markdown.new is useful because it is boring in the right way. There is no browser driver to configure, no readability library to tune, no credentials to rotate, and no per-site parser to maintain. For the everyday web-reading tasks of an AI coding agent, one URL prefix covers most of the work.

Serious systems still need caching, allowlists, quota handling, and source auditing. But the baseline improvement is immediate: teach the agent to read public pages as Markdown first, then use heavier tools only when the page actually needs them. In practice, that small rule removes a surprising amount of waste from agentic coding sessions.

Sources: markdown.new, Cloudflare Markdown for Agents, File to Markdown, Cloudflare Browser Rendering Markdown API.

Leave a Reply