---
title: "The Agent Readable Website"
source: "https://samcarlton.com/agent-readable-website/"
---

# The Agent Readable Website

## Codeable Skill Chat

A practical Codeable Skill Chat deck for auditing AI discovery, readability, and honest capability signals.

## This talk is not an SEO guide.

Things are in flux, so first a disclaimer: AI facts are very, very perishable. Model behavior, vendor terms, crawler behavior, and search surfaces can change faster than an Elementor layout after someone finds the custom CSS box.

These things will change, but I’ve tried to keep this as durable as possible. If you’re reading this further down the line, I highly recommend rechecking these resources for newer versions.

This talk is not an SEO guide, and it’s not SEO for LLMs either. There’s plenty of existing content for that: Search Engine Land, Yoast, take your pick. I am not an SEO expert; I’m a software engineer.

I highly recommend checking those out to fill the gaps this talk doesn’t cover. What we cover here is specifically the parts of agentic discovery you need to know once your SEO is already good enough.

Maybe you don’t get a 99 [Lighthouse](https://developer.chrome.com/docs/lighthouse/agentic-browsing/scoring) score, but you get an 80, so you’re at least well into the green. You’ve been doing the SEO; what’s the new stuff you have to do for agents? That’s the little slice we’re targeting in this talk today.

## Landscape of terms.

So, the landscape of terms. There’s a lot of verbiage, and it’s nebulous what to call this new AI SEO stuff. The leading term right now seems to be GEO, which is unfortunate because GEO already means something else. But that seems to be what Andreessen Horowitz and Bing are landing on, and it refers to generative answer grounding and citations.

Grounding means anchoring the model’s answer in real facts. Models have been taught to save tokens and to save on web search requests, because OpenAI does not want to be Google. They want to be something different, and they don’t want to be Perplexity either, always answering with web results. They want the model to figure things out and be a super guesser, so you have to ground it. GEO, generative engine optimization, is basically optimizing for that grounded-answering process.

AEO is used by HubSpot, Profound, and some marketing teams. It’s more about brand presence in answer engines: how you show up, with a bit more thought about the agentic side. Agent search visibility covers mentions, citations, share of voice, and sentiment. Agent readiness, which is closer to what this talk is about, is used by Cloudflare and checker tools, and asks whether agents can read, fetch, and use the site. Agent readability, a term and concept Vercel writes about, asks whether the page is easy for agents to parse. And AI features, the term Google Search Central uses, is Google-specific guidance for AI Overviews and AI Mode: the little AI summary that pops up at the top of your results to tell you that strawberries are, in fact, blue. That’s just to ground us in the terms the industry is using right now.

| Term | Who uses it | What it means |
| --- | --- | --- |
| GEO | Andreessen Horowitz and Bing | Generative answer grounding and citations |
| AEO | HubSpot, Profound, and some marketing teams | Brand presence in answer engines |
| Agent readiness | Cloudflare and checker tools | Can agents read, fetch, and use the site? |
| Agent readability | Vercel | Is the page easy for agents to parse? |
| AI features | Google Search Central | Google-specific AI Overviews and AI Mode guidance |

## Agent empathy or agentic empathy.

The definition we came up with for this talk: designing for what an agent can actually perceive, preserve, verify, and safely do on a human’s behalf. Another way to think about it: agent empathy is user experience, but for agents.

An agent like Codex or Claude Code perceives the world through a chat-box-style interface. It doesn’t have eyes to view pages or look through a camera; that’s a different tool. Almost everything it perceives and processes comes through text interfaces, as if it were experiencing the whole world through text messages. It’s really good at that: it can read JSON and XML really fast. But that’s how it perceives the world. So if some tooling shows it a screen, what it actually gets is a text description, a DOM structure, or some other text representation of that screen. Some models, for example GPT-5 and Gemini, can process images, but that’s still somewhat separate, and not all the tooling is hooked up to it. So it’s still very common that an agent interacting with a browser isn’t actually looking at screenshots, and it’s definitely not watching that browser in real time like a video.

So agentic empathy is a practical habit of respecting the agent’s entry path, context budget, fetch loss, source uncertainty, and action risk. Why it matters: if the truth does not survive the fetch, the parse, the citation, and the action, the agent guesses, skips you, cites someone else, or hits an unsafe affordance. Fundamentally, this is about getting into the agent’s perspective. For example, if you’ve ever opened ChatGPT, Codex, or another model, expanded the thinking area, and read through its thinking process, that’s agentic empathy: reading and understanding how it works through a problem. And like any skill, if you read and process enough of that, you develop an instinct for how agents think. Which is funny, because the agents were taught to think like us, and now we’re learning to think like them.

So what does agentic empathy look like? Structure is empathy. That can be JSON-LD. Who here has heard of JSON-LD, the structured data format Google has you use? Formats like that are very easy for agents to parse, because, again, the agent sees the whole world through something like Telegram: a back-and-forth of messages. Oh, I got this message from the user; okay, I’m going to send them this message back. Then there are harnesses, which I’m sure some of you have heard of. Harnesses take the prompts and the responses and do things with them, through what you can also call tools or actions. They wrap around that basic back-and-forth.
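Here is a minimal sketch of that kind of structure. It assumes a schema.org `Article` is the right type for the page. The Python below builds JSON-LD and wraps it in the `<script type="application/ld+json">` tag that crawlers read. The helper name `article_jsonld` and the sample values are hypothetical. On a WordPress site this markup usually comes from an SEO plugin or the theme, not hand-written code.

```python
import json


def article_jsonld(headline: str, author: str, url: str) -> str:
    """Return a <script> tag holding schema.org Article data as JSON-LD.

    Hypothetical helper for illustration; real sites usually get this
    markup from an SEO plugin or theme.
    """
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author},
        "url": url,
    }
    body = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so a value containing "</script>" cannot close the tag early;
    # "<\/" is still valid JSON and decodes back to "</".
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    print(article_jsonld(
        headline="The Agent Readable Website",
        author="Example Author",  # placeholder value
        url="https://example.com/agent-readable-website/",
    ))
```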

Structure is empathy: metadata, headings, and context make pages easier for agents. Use proper headings: only one H1 per page, the way it’s been forever, plus plenty of descriptive subheadings. Fetch loss is real: tabs, truncation, redirects, auth gates, and broken Markdown can all hide the answer. Entry path matters: search crawlers, user fetchers, and training crawlers are different policy lanes, which we’ll get to in a bit. And actions need boundaries: tools, permissions, approvals, logs, and honest capability signals.
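One way to check the "one H1, descriptive subheadings" habit is to pull the heading outline out of a page’s HTML. This is a minimal sketch using Python’s standard-library `html.parser`. It applies two rules: exactly one `<h1>`, and no skipped levels such as h2 straight to h4. Both rules are assumptions about what "proper headings" means here, not a formal spec, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional


class HeadingOutline(HTMLParser):
    """Collect (level, text) pairs for every <h1>..<h6> element."""

    def __init__(self) -> None:
        super().__init__()
        self.headings: list[tuple[int, str]] = []
        self._level: Optional[int] = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so "H2" arrives as "h2".
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            text = " ".join("".join(self._text).split())
            self.headings.append((self._level, text))
            self._level = None


def outline_problems(html: str) -> list[str]:
    """Return human-readable problems with a page's heading outline."""
    parser = HeadingOutline()
    parser.feed(html)
    problems = []
    h1_count = sum(1 for level, _ in parser.headings if level == 1)
    if h1_count != 1:
        problems.append(f"expected exactly one <h1>, found {h1_count}")
    previous = 0
    for level, text in parser.headings:
        # Going deeper by more than one level (h2 -> h4) skips a level.
        if previous and level > previous + 1:
            problems.append(f"heading level jumps from h{previous} to h{level} at {text!r}")
        previous = level
    return problems
```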

## There are three ways LLMs experience the internet.

There are three ways agents, or LLMs, experience the internet, at least for the purposes of this slide. The first is search indexing, which is what you alluded to: they have to discover what’s out there. Not everything can come from Bing or Google. These companies want their own understanding of the internet, crawled and searched their own way. So if you look in your server logs or HTTP traffic, you’ll see incoming requests from crawlers like OpenAI’s OAI-SearchBot, Anthropic’s Claude-SearchBot, and PerplexityBot, plus others. Google is the exception for Gemini: it just reuses its existing crawler.

These search crawlers work on behalf of the company’s index: their prefetched, cached version of the internet, so to speak. The second lane is user fetch: ChatGPT-User, Claude-User, Perplexity-User. This is a real user sitting in a chat, and these names are the companies’ promise about what type of traffic they’re sending to your site. It might be someone in ChatGPT, Codex, Claude Code, or maybe the Perplexity desktop app. They want to know something, and their agent is going out to your site and reading it. That’s what this request means. You can find it in the User-Agent header, and the names follow a similar pattern across vendors.
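To see which lane your traffic falls into, you can bucket User-Agent strings from your access logs. The sketch below is a hypothetical helper that matches only the bot tokens named in this talk. Vendors add and rename bots, so check their crawler docs. User-Agent headers are also self-declared and can be spoofed, so treat this as a log-reading aid, not verification.

```python
from collections import Counter

# Bot tokens from the three lanes discussed above. This table is an
# assumption to keep current against each vendor's crawler documentation.
LANES = {
    "search": ("OAI-SearchBot", "Claude-SearchBot", "PerplexityBot"),
    "user_fetch": ("ChatGPT-User", "Claude-User", "Perplexity-User"),
    "training": ("GPTBot", "ClaudeBot"),
}


def classify_user_agent(user_agent: str) -> str:
    """Return the lane for a User-Agent string, or "other" if no token matches."""
    ua = user_agent.lower()
    for lane, tokens in LANES.items():
        if any(token.lower() in ua for token in tokens):
            return lane
    return "other"


def lane_counts(user_agents) -> Counter:
    """Count requests per lane for an iterable of User-Agent strings."""
    return Counter(classify_user_agent(ua) for ua in user_agents)
```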

The third lane is training, or model development: GPTBot, ClaudeBot, and other names like that. The company is pulling in your site’s content so it can train an AI model with it. The reason they split traffic into three lanes is that not everyone wants all of it. Some people want to opt out: no, I do not want you to train your AI on my content; keep my content out of your training data. The vendors let you do that, and hopefully they honor it. So you can say: block GPTBot, but the search bot and ChatGPT-User are fine, because that’s new traffic and I want them to be able to access my site. Those are the three lanes.
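Here is a minimal sketch of that policy as a robots.txt file, checked with Python’s standard-library `urllib.robotparser`. The rules are an example for illustration, not a recommendation for every site. `robotparser` matches user agents by substring and takes the first matching group in file order. That is close to how vendor crawlers read the file, but it may not match each crawler’s behavior exactly.

```python
from urllib.robotparser import RobotFileParser

# Example policy: block the training crawlers, allow search and user fetches.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for bot in ("GPTBot", "ClaudeBot", "OAI-SearchBot", "ChatGPT-User", "Claude-User"):
        print(bot, rules.can_fetch(bot, "https://example.com/recipes/"))
```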

Again, for the most part, these bots use the initial HTML, so that’s the part you need to make sure is good. If Lighthouse says you’re good, or at least a little into the green, you’re probably fine for all of these. Lighthouse is still a very good tool to calibrate load speed against.
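A quick way to sanity-check the initial HTML is to look only at the text that exists before any JavaScript runs. This minimal sketch assumes you already have the raw HTML, for example from `curl` or an HTTP client. It uses `html.parser` to drop script, style, and template content. The helper names are hypothetical. Some agent tooling does render JavaScript, so treat a miss as a warning, not proof.

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collect text nodes, skipping content that is never shown as page text."""

    SKIP = {"script", "style", "template"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def initial_html_text(html: str) -> str:
    """Whitespace-normalized text present in the raw HTML, before any JavaScript runs."""
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def answer_in_initial_html(html: str, phrase: str) -> bool:
    """True if the phrase appears (case-insensitively) in the initial HTML text."""
    return phrase.lower() in initial_html_text(html).lower()
```

A server-rendered recipe page passes this check. A client-rendered shell that only fills in `<div id="root"></div>` from a script does not.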

## Long tail specific content.

Here’s how search worked before, versus how AI works now. Say you’re searching for a recipe and you open the first result. You know the drill. What’s there when you open the page? Is it the recipe? No, it’s an ad. So you scroll down. Is it the recipe? No, it’s the history of the recipe. Okay, well, that’s not the recipe; I searched for a recipe. You scroll down again. Is it the recipe? No, it’s the family history, and the history of the country the recipe comes from. This goes on for a long time, and finally, after scrolling ten feet, you get to the bottom and there’s a bulleted list with just the recipe you asked for. You repeat that one to five times through the blue links until you’re exhausted, and then you just order DoorDash. That’s our historical experience of human-driven search.

Agents, on the other hand, can read 100 or 1,000 times faster than a person, and can process and analyze much faster too. So an agent explores 50 to 500 links. If you’ve done Gemini searches, you’ve seen it actually read hundreds of links. In traditional search there’s the tippy top, the first 10 links that always get all the attention, and then a long tail where who knows what’s in there. When was the last time anybody went past the second or third page on Google?

Agents go really deep and do that work. They get into the long tail, read every little bit, and bring back summaries and the most unique material. So unique, special, specific content is going to get rewarded more, and the SEO tricks, the stuff that gamifies search and makes it worse for everybody, are going to go away. That doesn’t mean it will fix everything, and it doesn’t mean there won’t be new, worse things. But the existing tricks will go away.

You ask, hey ChatGPT, give me a recipe for Mediterranean chicken, and you get the recipe. You don’t have to scroll through all that stuff. So my theory is that long-tail, specific content that’s actually value-driven will float to the top faster, and you will get rewarded faster for writing content that’s actually useful and helpful.

There’s still the unsolved problem of how content creation will be rewarded. The thing is, new content can’t go away; AI can only come up with so much on its own. So my theory is that material only humans can write will become more and more valuable. That has to be rewarded somehow to get more people creating it, because it’s the thing AI needs most and lacks. That’s my theory, anyway. So that’s a little bit of how agentic search changes things, and it’s actually good: if you’re not a big publisher, this is pretty good news. The focus is the same as before, just even more important now: create content that’s valuable and helpful to people.

## Lighthouse command.

The command installs the latest version of Lighthouse, loads the site, and analyzes it. This is how Google scores agentic browsing. It’s not how everybody does it, but it’s Google’s way. There are two scored audits: the accessibility tree is well formed (again, related to the structure we talked about earlier), and cumulative layout shift is zero. That’s good. Then there are audits marked not applicable, for extra features you can add that we’ll get into in a bit, such as WebMCP and [llms.txt](https://llmstxt.org/). I think this site does have an llms.txt, but these aren’t scored because they’re bonus features. You should still pay attention to them if you want to optimize for agentic browsing and agent search.
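You can also read the results in a script instead of the HTML report, because Lighthouse can write its result as JSON (`--output=json`). The sketch below assumes the standard Lighthouse result shape, where `categories[<id>].auditRefs` points into `audits`. It uses the `agentic-browsing` category id from the command on this page and splits audits into scored, not applicable, and unscored. The function names are hypothetical. Audit ids vary between Lighthouse versions, so the code doesn’t hard-code any.

```python
import json


def summarize_category(report: dict, category_id: str = "agentic-browsing") -> dict:
    """Split one category's audits into scored, not-applicable, and unscored.

    Assumes the Lighthouse JSON result layout: categories[id]["auditRefs"]
    lists {"id", "weight"} entries whose ids are keys in report["audits"].
    """
    category = report["categories"][category_id]
    summary = {"score": category.get("score"), "scored": [], "not_applicable": [], "unscored": []}
    for ref in category["auditRefs"]:
        audit = report["audits"][ref["id"]]
        if audit.get("scoreDisplayMode") == "notApplicable":
            summary["not_applicable"].append(ref["id"])
        elif ref.get("weight", 0) > 0:
            # Weighted audits are the ones that feed the category score.
            summary["scored"].append((ref["id"], audit.get("score")))
        else:
            summary["unscored"].append((ref["id"], audit.get("score")))
    return summary


def load_report(path: str) -> dict:
    """Read a report written with --output=json."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

For example, rerun the command with `--output=json --output-path=./lighthouse-agentic.json`. Then call `summarize_category(load_report("lighthouse-agentic.json"))`.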

So this is not a huge bar to pass. Some sites may still need fixes, but they’re not crazy difficult. Beyond that, as on my site, there’s extra bonus stuff you can do for agent discovery. Every site should generally have an llms.txt, and maybe an llms-full.txt, which is designed to be a bigger, more complete file. Just don’t make it so big that it takes a super long time to download. Agents will give up if it takes five or 30 seconds, because they’ve got millions, probably billions, of sites to scrape. Not everybody needs WebMCP. It’s fun if you want to try it out and look into it more; it’s basically a harness that lets agents use your site. But it’s not a hard requirement.

## Cloudflare tool is okay to use.

This Cloudflare tool is okay to use, but it’s actually too harsh. It will punish you, downscoring you for things that aren’t really relevant to building the website agents actually need from you.

It needs to be dialed back and fine-tuned. For example, it penalizes you for not having Markdown negotiation, which is a little iffy. It gives you a prompt to copy and paste into an AI to fix it: here’s what you need to do. Markdown negotiation is actually a product Cloudflare sells. Thankfully, it doesn’t look like they’re promoting it here. But there are alternatives: you don’t have to serve Markdown at all, you can have your site auto-generate it, or I’m sure there are WordPress plugins that will pre-generate Markdown. So that check is a little bit not great.
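For context, Markdown negotiation is ordinary HTTP content negotiation. A client that prefers Markdown says so in its `Accept` header, and the server returns Markdown instead of HTML for the same URL. This is a minimal, hypothetical sketch of the decision step only, not Cloudflare’s implementation. A real handler would also send `Vary: Accept` so caches keep the two versions apart.

```python
from typing import Optional


def parse_accept(header: str) -> dict[str, float]:
    """Map each media range in an Accept header to its q-value (default 1.0)."""
    prefs: dict[str, float] = {}
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs[media_range] = q
    return prefs


def choose_representation(accept_header: Optional[str]) -> str:
    """Return "text/markdown" only if the client ranks it above HTML."""
    if not accept_header:
        return "text/html"
    prefs = parse_accept(accept_header)

    def quality(media_type: str) -> float:
        # Most specific match wins: exact type, then type/*, then */*.
        main_type = media_type.split("/")[0]
        for candidate in (media_type, f"{main_type}/*", "*/*"):
            if candidate in prefs:
                return prefs[candidate]
        return 0.0

    markdown, html = quality("text/markdown"), quality("text/html")
    # Ties go to HTML so ordinary browsers keep getting the normal page.
    return "text/markdown" if markdown > html else "text/html"
```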

Content signals in robots.txt are a challenging one. I followed this check and optimized for it, and then Lighthouse flagged my robots.txt. If I have to choose between Google and Cloudflare’s tool, I have to pick Google. So I just let that check fail in Cloudflare’s tool, because this tool likes a specific directive that Lighthouse does not. Google is still the number one search engine, so Lighthouse wins. There are other problems too, like the OAuth checks. What if your site is just a static site with no login? It shouldn’t score you on OAuth. There’s also a categorization issue: it downranks you on commerce.
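To see the conflict in your own file, you can list the robots.txt lines whose directive falls outside a small known set. Cloudflare’s content signals add a line such as `Content-Signal: search=yes, ai-train=no`. A validator that only knows the classic directives reports that line as unknown. The `KNOWN_DIRECTIVES` set below is an assumption, not Lighthouse’s actual rule list. It covers the RFC 9309 rules plus the widely used `Sitemap` and `Crawl-delay`. The function name is hypothetical.

```python
# Assumed "classic" directives; not Lighthouse's actual list.
KNOWN_DIRECTIVES = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def unknown_directives(robots_txt: str) -> list[tuple[int, str]]:
    """Return (line number, directive) for lines a strict validator may reject."""
    findings = []
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        directive, sep, _ = line.partition(":")
        if not sep or directive.strip().lower() not in KNOWN_DIRECTIVES:
            findings.append((number, directive.strip()))
    return findings
```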

Again, this is the tool for when you really want to go hardcore and do theoretical work that may or may not matter for agentic search in the future. If you do all of these checks, you’ll be ready, but you will also very likely have wasted work and effort. It may turn out that DNS-AID is being superseded by some other thing. That other thing might just be built into your site, happen automatically, and be way easier to implement, so you don’t need to worry about DNS-AID anymore. That’s the danger of this tool in particular. It’s still good to read through it to understand what seems important right now and what might change. I think there needs to be a different tool that’s nicer and clearer about how important each check is. But I encourage everyone to run their site through this once, just to see what’s there. Then decide for yourself what’s important and what you actually need to implement.

## Keep the WordPress tooling boring.

Keep the WordPress tooling boring. Start by confirming your WordPress core setup. You want a boring, predictable permalink pattern. If you already have an established pattern, changing it might be too much. But a predictable pattern makes it easier for agents to guess URLs. If you already have Yoast, dial in the settings, because Yoast does have some options for llms.txt. The critical parts are the llms.txt and the sitemap. This is a decent list, although I don’t see llms.txt in it. And robots.txt should give clear direction: yes, LLMs are allowed to read this; no, I don’t want you to train on it (or yes, training is fine too).
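To check that the permalink pattern really is predictable, you can pull the URLs out of a sitemap and flag any that don’t fit. WordPress core publishes a sitemap index at `/wp-sitemap.xml`, and SEO plugins such as Yoast use their own. The sketch below parses one standard `<urlset>` file with the standard library. The default regex assumes the "Post name" structure (`/%postname%/`), so swap in whatever pattern your site actually uses. The function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> list[str]:
    """Return every <loc> URL from a standard sitemap <urlset> document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]


def off_pattern(urls: list[str], pattern: str = r"^/[a-z0-9-]+/$") -> list[str]:
    """Return URLs whose path does not match the expected permalink regex."""
    rx = re.compile(pattern)
    return [url for url in urls if not rx.match(urlparse(url).path)]
```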

But yes, llms.txt is a pretty important one. Then handle agentic discovery in your theme or plugin: block theme, child theme, or custom plugins. Now more than ever, you’re running out of excuses not to build custom code. Just make sure it follows the best standards, such as Codeable’s own coding standards for agents and WordPress’s own coding standards. These are [agent skills specifically for WordPress coding](https://github.com/WordPress/agent-skills). They let you build the way WordPress wants you to, using the correct, latest APIs and SDKs instead of deprecated ones.

I use these all the time for WordPress projects. I’m doing a WordPress project right now, and I love it. It’s so nice to have code that’s ready for PHP 8, 9, whatever, and WordPress 7, all following the official standards. Always favor a company’s official skills over third-party ones. You can try third-party skills, but audit them very carefully. And a PSA: if you download a skill and have any doubts about it, search for it on [skills.sh](https://skills.sh). Every skill on skills.sh gets a security audit. Say something in a skill could leak your code, or break out and, oh, by the way, send all your environment variables and your whole WordPress database somewhere. skills.sh will tell you. These skills are actively scanned for security issues, which makes it a really critical resource.

Use safer claims when talking to clients. Don’t say this will make ChatGPT rank you; say this improves findability and readability. Don’t say every site needs an MCP; that’s not true, at least not yet. Protocols apply only when the capabilities are real. OpenAI and Anthropic are not going to demand that every site on the internet build an MCP server. It’s going to be useful for specific use cases. So research what’s available, and then research whether it’s relevant to the site you’re building. And don’t say llms.txt is AI SEO. That said, if there’s one thing you must do for this stuff, it’s llms.txt. Keep it easy to read, because context windows still matter. It has to fit into the LLM’s context window, and the smaller the better. Give just a brief overview with your links, maybe grouped. Then you have llms-full.txt for the deeper material. Probably don’t put full page content in there, because all the content from all your pages in one file is too much. Maybe include just an excerpt or description of each page, with some metadata. Then add a deeper explanation of what your site is and how to use it.
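A small generator can keep the file brief and grouped. It follows the structure in the [`llms.txt` proposal](https://llmstxt.org/): an H1 with the site name, a short blockquote summary, then H2 sections of Markdown link lists. An "Optional" section holds links agents can skip. This is a sketch with hypothetical names. In WordPress, the link list would come from your posts and pages or from an SEO plugin’s llms.txt feature.

```python
from dataclasses import dataclass


@dataclass
class Link:
    title: str
    url: str
    note: str = ""  # short description shown after the link


def build_llms_txt(site_name: str, summary: str, sections: dict[str, list[Link]]) -> str:
    """Render an llms.txt body: H1, blockquote summary, then H2 link sections."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for link in links:
            suffix = f": {link.note}" if link.note else ""
            lines.append(f"- [{link.title}]({link.url}){suffix}")
        lines.append("")
    # Exactly one trailing newline keeps the file tidy.
    return "\n".join(lines).rstrip() + "\n"
```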

## Need to be a conversation.

It’s going to need to be a conversation. Record it so you can transcribe it and hand it to AI. It starts with what the client cares about, what their goal is, and why they came to Codeable in the first place. From there, that’s literally your job: break down and distill what’s important from what the client has told you. Then make a list of all the optimizations that could be done for the site (I wouldn’t rely on this one tool alone for that), and figure out some kind of litmus test.

Ideally, you read through them all yourself. If you don’t have much time, you can give an agent guidance on what you’re thinking and have it draft. One thing I like to do with agents is give them the full list and have them go through every item. For example: go through all the audits the Cloudflare tool can run, and score each from zero to 10 on how important it is for the work we’re doing. That’s the agent’s score, but you still read through it and judge whether the agent is aligned with your thinking. Don’t just accept it. Accepting it blindly is called cognitive surrender, one of the failure modes of agentic coding. Make sure you’re using your professionalism and your skill. This is a Codeable Skill Chat, after all.

Agents just need enough surface area to do the job. One simple way to test that: go into ChatGPT or Claude.ai, have it work against your website, and see whether it can navigate and figure things out. Count the steps it takes to get there; fewer is better.

## You’re fighting a battle.

I don’t have the answer, but I do have a strategy for how I’d figure it out. The good news is that you’re fighting a battle OpenAI wants you to win. It’s like you and OpenAI are on the same side against the middlemen; OpenAI is just working at a much broader scale. And OpenAI wants to get rid of middlemen that don’t provide value. Here’s one way I like to prompt AI in this kind of situation: I’m having this issue, and instead of answering from memory, I want you to research it. But I don’t want you to research just any website. I want you to search Hacker News first, which is a source I trust for engineers talking about actually solving problems, not just talking heads. I want to hear from doers, not talkers.

So: search Hacker News, search this SEO source, maybe search Reddit or Lobsters (which is like Reddit but invite-only), plus other sources where you typically find authoritative discussion. Then, and this is another key thing, add to the prompt: "and any other sources you think are helpful."

Helpful is a very powerful word to LLMs. It’s a bit of a trigger word that pushes the model: no, this is very serious. These are links I know are helpful; find any other links you think are helpful. Here’s why that matters. The model typically starts doing web searches, and its search system is very similar to the ones we use on Google or DuckDuckGo. There’s lots of noise in there, stuff fighting for attention, but we want signal. So we give it the signal sources and teach it, like a little mini training session. It’s not true training of an LLM; it’s more of a priming or grounding session: this is what signal looks like for this problem, now find more signal.

I would break that problem down. It’s going to be tricky, but ultimately time is on your side, if that makes sense. You might use the same research method to find contacts at OpenAI. Also, are you analyzing your site with Bing’s search tools? (Not so much, but it’s something we’re moving on to now; we’ve been concentrating mainly on Chrome and Google.) I would set the site up in the Bing search tools. They’re quite different from Google Search Console and lean more into GEO and agentic discovery, so they might give you extra insight into how people are using your site. They’ll literally tell you how often your site shows up in AI answers. I believe OpenAI still has a partnership with Bing for some search results. So that’s a good measure too, and it gives you a little more surface area to understand the problem.

Generally, everybody wants to be attributed for their content. If the frontier model companies don’t attribute well enough, sites will start blocking them. Sites will only let through the search engines that attribute them best. So make sure you state your preferences, for example in llms.txt: please reference this page. Again, switch modes and think in terms of agentic empathy: what helps the agent, and what helps the user the agent is helping? In that mode, look for where those needs overlap with better attribution for you. That might mean three mentions of your site instead of one, or a link saying you really need to read this page right here. A big part of it is prompting a lot and seeing how each service likes to show the site. Do that across multiple services: Perplexity, Duck.ai, and generally all of them. We’re in the wild-west phase of this whole thing, so they won’t all work the same way, and some may attribute better. Products that attribute content better will get rewarded for it, because content sites will say: actually, I only want to show up on Duck. I don’t care about the other ones, because Duck sends people to me.

## Check out agentready.samcarlton.com.

Feel free to check out agentready.samcarlton.com for all the slides, resources, and everything.

## Slides, docs, and links

Use these as starter references for the Agent Ready examples site and the Codeable Skill Chat deck.

### Slides

-   [Live slide deck](https://agentready.samcarlton.com/slides/) - published version of this Codeable Skill Chat deck for follow-up review.

### Docs

-   [AgentReady resource site](https://agentready.samcarlton.com/resources/) - public handout page for the starter links, so attendees have one durable URL to revisit.
-   [Practical audit checklist](https://agentready.samcarlton.com/reference/practical-audit-priority/) - 80/20 checklist for fetchability, crawler lanes, extractability, `llms.txt`, Markdown, parity, and real capabilities.
-   [Agentic Empathy](https://agentready.samcarlton.com/reference/agentic-empathy/) - source-backed examples for reducing uncertainty for constrained, tool-using agents.
-   [WordPress Agent-Ready Tooling](https://agentready.samcarlton.com/reference/wordpress-tooling/) - practical WordPress admin, SEO plugin, theme, page-builder, and audit-tool guidance for keeping agent-readiness work boring.

### Strategy and positioning

-   [Agent Led Growth](https://www.youtube.com/watch?v=RyTwRCKeDo4) - Sequoia and Profound on agents becoming a discovery and purchase interface for brands.
-   [AEO is the New SEO](https://www.listennotes.com/podcasts/always-be-testing/saas-class-ep-1-aeo-is-the-7UyGvP1j8sf/) - HubSpot and Webflow operator framing around content, technical, authority, and measurement work.
-   [Visibility in AI Search](https://www.youtube.com/watch?v=ukpU-EfRtV4) - iPullRank session on query fan-out, extractable passages, omnimedia coverage, and AI-search measurement.

### Audit reference

-   [Microsoft Design Foundations for Agents](https://learn.microsoft.com/en-us/agents/design-guidelines/design-foundations) - Empathy here does not mean agents have emotions. It means we design the handoff between humans, agents, and websites so the agent can preserve context, verify sources, and stay inside safe boundaries.
-   [Vercel Agent Readability](https://vercel.com/kb/guide/agent-readability-spec) - practical site and docs readability checklist.
-   [Agent-Friendly Documentation Spec](https://www.agentdocsspec.com/spec/) - docs-specific agent-readability checks and CI guidance.
-   [Microsoft agentic risk](https://learn.microsoft.com/en-us/security/zero-trust/sfi/manage-agentic-risk) - Tools need permissions, approvals, logs, and honest capability signals.
-   [WP Agent Skills](https://github.com/WordPress/agent-skills) - roll exactly what you need using official WP Agent Skills and Codeable Agent Coding standards.
-   [Cloudflare Agent Readiness](https://blog.cloudflare.com/agent-readiness/) - reference implementation for the current scanner model and emerging protocol categories.
-   [Google Lighthouse agentic browsing scoring](https://developer.chrome.com/docs/lighthouse/agentic-browsing/scoring) - official Chrome reference for experimental agentic-browsing audit signals.

### Online checker tools

-   [Is It Agent Ready](https://isitagentready.com/) - broad public scanner for emerging agent-readiness protocol surfaces.
-   [Cloudflare URL Scanner](https://radar.cloudflare.com/scan) - HTTP, rendering, and security scan evidence to pair with protocol checks.
-   [Agent Ready scanner](https://agent-ready.dev/) - comparison scanner for content and readability-oriented checks.
-   [IsAgentReady methodology](https://isagentready.com/en/about) - companion methodology reference for agent-readiness scans.
-   [Agent-Friendly Documentation Spec](https://agentdocsspec.com/) and [AFDocs CI integration](https://www.afdocs.dev/ci-integration) - docs-specific agent-readability checks and CI guidance.
-   [AgentCheck AI bot posture leaderboard](https://www.agentcheck.com/leaderboard/ai-bots) - declared `robots.txt` and public interface-file posture; useful signal, not a full readiness audit.

### Implementation checklists

-   [Cloudflare AI consumability](https://developers.cloudflare.com/style-guide/how-we-docs/ai-consumability/) - practical docs guidance for making content easier for AI systems to consume.
-   [Cloudflare Markdown for Agents](https://developers.cloudflare.com/fundamentals/reference/markdown-for-agents/) - reference for Markdown endpoints, `llms.txt`, and `llms-full.txt` on a large documentation site.
-   [`llms.txt` proposal](https://llmstxt.org/) - original proposal for a curated Markdown entrypoint for LLM-friendly site context.
-   [Google AI features and your website](https://developers.google.com/search/docs/appearance/ai-features) - baseline Google guidance for AI Overviews and AI Mode; useful for avoiding claims that Google requires special AI-only files or schema.

### Crawler policy

-   [OpenAI crawlers](https://platform.openai.com/docs/bots) - official bot names and controls for search, user-triggered fetch, training, and related agents.
-   [Anthropic crawler controls](https://support.claude.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler) - official Claude crawler and user-fetch controls.
-   [Perplexity crawlers](https://docs.perplexity.ai/docs/resources/perplexity-crawlers) - official Perplexity crawler and user-agent reference.
-   [Bing grounding on the AI web](https://blogs.bing.com/search/February-2026/Elevating-the-Role-of-Grounding-on-the-AI-Web) - Microsoft/Bing framing for source grounding in AI-generated experiences.

### Capability protocols

Use these only when the site has real APIs, tools, auth flows, agent services, or commerce surfaces to expose.

-   [RFC 9727 API Catalog](https://www.rfc-editor.org/rfc/rfc9727.html) - well-known API discovery standard.
-   [RFC 9728 OAuth Protected Resource Metadata](https://www.rfc-editor.org/rfc/rfc9728.html) - metadata for OAuth-protected resources.
-   [Model Context Protocol](https://modelcontextprotocol.io/docs/getting-started/intro) - standard for connecting AI applications to tools, data, and workflows.
-   [A2A specification](https://github.com/a2aproject/A2A/blob/main/docs/specification.md) - agent-card-based discovery and inter-agent communication.
-   [Chrome WebMCP early preview](https://developer.chrome.com/blog/webmcp-epp) and [WebMCP explainer](https://github.com/webmachinelearning/webmcp) - browser-page tool discovery for human-in-the-loop agent interactions.

### Reality checks and safety

-   [Dries Buytaert on Markdown, `llms.txt`, and AI crawlers](https://dri.es/markdown-llms-txt-and-ai-crawlers) - crawler-log counterweight to treating Markdown or `llms.txt` as a ranking switch.
-   [Simon Willison on the lethal trifecta](https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/) - safety framing for agents that combine private data, untrusted content, and external communication.

### Operator playbooks

-   [HubSpot AEO playbook](https://unbound.hubspot.com/blog/how-to-show-up-in-ai-search-hubspots-winning-aeo-playbook) - operator-facing playbook for showing up in AI search.
-   [iPullRank AI Search Manual walkthrough](https://ipullrank.com/ai-search-manual-webinar) - technical walkthrough for AI search visibility, relevance engineering, and measurement.
-   [Similarweb agentic search optimization checklist](https://www.similarweb.com/blog/marketing/geo/agentic-search-optimization/) - practitioner checklist that separates citation visibility from agent task completion.
-   [Graphite AEO](https://graphite.io/aeo) - agency/operator framing for answer engine optimization; useful market language, but validate tactical claims against primary sources.

### Lighthouse command

Run the agentic browsing category against a public URL and open the generated HTML report:

```
npx lighthouse@latest https://agentready.samcarlton.com \
  --only-categories=agentic-browsing \
  --output=html \
  --output-path=./lighthouse-agentic.html \
  --view
```