# The State of the Video Agent Web

**Field guide · Vol. 1 · Updated June 2026 · By the AgentCDN team**

Canonical web edition: https://agentcdn.com/state-of-the-video-agent-web

More than 20 million people asked the video they were watching a question in December alone. YouTube is rolling out catalog-level answers. Google's answer surfaces passed a billion users and can now watch video. Even commerce learned to route people back to the owner's surface. The open web's video still has no answer layer. This guide maps what shipped, what it means, and what to do about it.

## At a glance

1. Asking video is now mainstream behavior. More than 20 million people used YouTube's in-player Ask in December alone, and catalog-level Ask YouTube is rolling out now. [1] [3]
2. The answer interfaces reached platform scale: ChatGPT at 900M weekly users, Google AI Mode past 1B monthly, AI Overviews at 2.5B. [6] [7]
3. The traffic math flipped. AI systems crawl up to tens of thousands of pages for each visitor they send, but the visitors they do send now convert 42% better than non-AI traffic. [11] [18]
4. The capability gap closed in the last six months — natively multimodal embeddings and timestamped video understanding shipped. The access gap did not. [30] [8] [31]
5. Commerce and publishing built their machine-access rails — protocols, licensing standards, marketplaces. Open video still has no equivalent. That gap is what this guide tracks. [17] [23] [25]

## The shift in one line

- **Before:** search the page.
- **Now:** ask the library.
- **Missing on the open web:** return the exact video moment with credit and playback.

## Market signals

| Signal | Value | Reading |
| --- | --- | --- |
| People who used YouTube's in-player Ask in December alone | 20M+ | Asking the video you are watching is already mainstream. [1] |
| Google AI Mode monthly users, May 2026 | 1B+ | Conversational search is a default surface, with queries doubling every quarter. [7] |
| AI Overviews monthly users, May 2026 | 2.5B | AI answers now sit on top of nearly every search. [7] |
| ChatGPT weekly users, Feb 2026 | 900M | The standalone answer interface keeps compounding. [6] |

## 1. The six months that changed the interface

- **2024 – mid 2025 — How we got here.** YouTube tests conversational AI in the player, Netflix ships generative AI search, Microsoft proposes NLWeb, and OpenAI and Google adopt MCP. Asking becomes a product direction everywhere at once. [14] [29]
- **Nov 2025 — In-player Ask goes broad.** YouTube rolls out the Gemini-powered Ask button on PC and mobile. Viewers ask the video they are watching, without leaving playback. [2]
- **Dec 2025 — Agent access becomes neutral infrastructure.** Anthropic donates MCP, by then at 97M monthly SDK downloads and 10,000+ active servers, to the Linux Foundation's new Agentic AI Foundation, backed by OpenAI, Google, Microsoft, AWS, and Cloudflare. RSL 1.0 is ratified as a licensing standard. [26] [23]
- **Jan 2026 — The browser and the catalog go agentic.** Chrome ships agentic Gemini features. Shopify debuts the Universal Commerce Protocol at NRF. YouTube publishes the number: 20M+ people used in-player Ask in December alone. [28] [20] [1]
- **Feb 2026 — Licensed access becomes a marketplace.** Microsoft unveils its Publisher Content Marketplace with AP, Vox Media, and Condé Nast. ChatGPT reaches 900M weekly active users. [25] [6]
- **Mar 2026 — Commerce reverses into the owner's surface.** OpenAI deprecates Instant Checkout: discover in chat, transact on the merchant's own site. Google ships Gemini Embedding 2 — video and audio enter the same searchable vector space as text. [16] [30]
- **Apr 2026 — The traffic math goes public.** Adobe: AI-referred retail traffic up 393% in Q1 and converting 42% better. Cloudflare Radar: thousands of pages crawled per visit referred. TikTok ships AI Overviews in search. Video-MME-v2 puts frontier models at roughly half of human performance on long video. [18] [11] [15] [31] [12]
- **May 2026 — I/O makes video askable at platform scale.** Ask YouTube launches for the full catalog. Gemini 3.5 Flash brings native, timestamped video understanding to AI Mode at a billion users. WebMCP enters Chrome origin trial. Spotify ships prompted podcasts and Personal Podcasts. [3] [8] [27] [13]
- **Jun 2026 — The rollout continues. The open web waits.** Ask YouTube expands to TVs and consoles with broad rollout promised for the summer. Gemini 3.5 Pro arrives. Outside the platforms, the open web's video still has no answer layer. [5]

## 2. The traffic story

Machines read everything and send almost no one. The visitors they do send are the best on record. Both measurements point the same way: value now flows to the source that gets cited, credited, and routed to.

**Exhibit 1 — When an answer appears, clicks disappear.** Share of Google search visits where users clicked a result link (March 2025, U.S. panel) [9]:

| Condition | Click rate |
| --- | --- |
| No AI summary on the page (clicked a result link) | 15% |
| AI summary on the page (clicked a result link) | 8% |
| AI summary on the page (clicked a source cited in the summary) | 1% |

**Exhibit 2 — Pages crawled per visit referred.** Cloudflare Radar data, Q1 2026; Google search-era ratio for comparison, June 2025 [10] [11]:

| Platform | Pages crawled per visit referred |
| --- | --- |
| Google, search era | ≈14 : 1 |
| OpenAI GPTBot | 1,276 : 1 |
| Anthropic ClaudeBot | 23,951 : 1 |
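
These ratios are easier to compare when inverted into a referral rate. The short Python sketch below does only that arithmetic on the three figures in Exhibit 2; the function name and the per-1,000-pages framing are illustrative choices, not part of Cloudflare's methodology.

```python
# Crawl-to-refer ratios from Exhibit 2 (pages crawled per visit referred).
RATIOS = {
    "Google, search era": 14,  # approximate, per the exhibit
    "OpenAI GPTBot": 1_276,
    "Anthropic ClaudeBot": 23_951,
}


def visits_per_thousand_pages(pages_per_visit: float) -> float:
    """Invert a crawl-to-refer ratio into visits referred per 1,000 pages crawled."""
    if pages_per_visit <= 0:
        raise ValueError("pages_per_visit must be positive")
    return 1_000 / pages_per_visit


if __name__ == "__main__":
    for platform, ratio in RATIOS.items():
        rate = visits_per_thousand_pages(ratio)
        print(f"{platform}: {rate:.2f} visits per 1,000 pages crawled")
```

On these figures, roughly 71 visits per 1,000 pages crawled in the search era compares with under one for GPTBot and about 0.04 for ClaudeBot.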

> Being the cited, routable source is the new distribution.

## 3. Actual examples in the wild

These are public launches, published numbers, documentation, and policy moves — almost all from the last six months — that point at the same behavior change.

### Ask: player → catalog (YouTube, Nov 2025 → May 2026)

Asking video went from a button to the whole catalog. The Gemini-powered Ask button rolled out broadly in November 2025; in December alone, more than 20 million people used it to ask about whatever they were watching. At I/O 2026 the same interface grew to the full catalog: Ask YouTube answers natural-language questions with long-form video, Shorts, relevant clips, timestamps, and follow-ups. It is available to US Premium first and is now expanding to TVs and consoles. **Why it matters for video:** more than twenty million people in a single month already treated a playing video as something you can question. The expectation shifts from "find me a video" to "answer me from this library and take me to the source moment," and it will not stay inside one platform. [2] [1] [3] [4] [5]

### AI Mode at 1B (Google Search, May 2026)

The answer interface reached a billion people — and it can watch. AI Mode passed one billion monthly users with queries doubling every quarter, and the new experience runs on Gemini 3.5 Flash — which understands video natively, with timestamped insights. **Why it matters for video:** the mainstream answer surface can now look inside video. What it can find, cite, and play depends entirely on how a library is published. [7] [8]

### Prompted podcasts (Spotify, May 2026)

Audio libraries became promptable first. Spotify expanded prompt-led podcast discovery and announced Personal Podcasts: private episodes, daily briefings, deep dives, and weekly roundups generated from user prompts over the catalog. **Why it matters for video:** audio and video are converging on the same expectation — ask the archive, get a packaged answer back, with the source attached. [13]

### Conversational catalog (Netflix, May 2025)

A media brand made its own library askable. Netflix shipped OpenAI-powered conversational search — mood, intent, and context instead of titles and keywords. It remains the clearest example of a media brand building the askable layer inside its own product. **Why it matters for video:** this is the media-brand playbook — the askable layer lives in the owner's surface, on the owner's terms, not only inside aggregator platforms. [14]

### The askable web (Linux Foundation, Google, Microsoft, Dec 2025 → now)

Every surface is getting a way to receive a question. MCP — 97M monthly SDK downloads, 10,000+ servers — became neutral infrastructure under the Linux Foundation. WebMCP entered Chrome origin trial at I/O 2026, letting sites expose structured tools directly to agents. NLWeb seeded the pattern in 2025. **Why it matters for video:** the plumbing for asking any site is arriving in the browser itself. Libraries that expose structured video answers will be askable. Libraries that expose only pages will be summarized. [26] [27] [29]

## 4. The pattern: askable libraries, not just better search

| Pattern | Inside the platforms | Missing on the open web |
| --- | --- | --- |
| Interface | Search boxes, players, and browsers all become conversational. | Every owned library needs a way to receive a question. |
| Unit of answer | A clip, an episode, a timestamp, a generated briefing. | The web needs a durable way to address and return exact moments. |
| Source return | The platform controls playback, attribution, and the next action. | Credit and playback must survive outside any one platform. |
| Owner control | Platform policy decides what appears. | Rights, permissions, and destinations must be declared by the owner. |

> The unit is not a SKU. It is a source moment.

## 5. The capability gap closed. The access gap didn't.

The building blocks arrived in the last six months. Gemini Embedding 2 put video, audio, and text into one searchable vector space — the first natively multimodal embedding model from a major provider. Gemini 3.5 Flash reads video natively, with timestamped insights, and now powers AI Mode for a billion people. Word-level transcripts are a commodity. A machine can genuinely find and understand the moment inside a video — when the library is prepared for it. [30] [8]

Prepared is the operative word. Hand a frontier model raw footage and it still struggles: on Video-MME-v2, the strongest model scores 49.4 against a 90.7 human-expert baseline on long-horizon video reasoning, and per-query economics rule out streaming an hour of pixels through a model for every question. Useful video answers come from structure computed once at ingest — word-level transcripts, chapters, shot boundaries, multimodal embeddings — so the agent can localize the exact moment at query time. [31] [32]
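
The split between ingest-time structure and query-time lookup can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: the `Word` type, the `localize` function, and the toy transcript are hypothetical, and plain keyword overlap stands in for the multimodal embedding similarity a real system would more likely use.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Word:
    text: str
    start: float  # seconds from the start of the video
    end: float


def _normalize(token: str) -> str:
    return token.strip(".,?!").lower()


def localize(query: str, transcript: List[Word], window: int = 8) -> Tuple[float, float]:
    """Return (start, end) seconds of the word window that best matches the query."""
    if not transcript:
        raise ValueError("transcript is empty")
    terms = {_normalize(t) for t in query.split()}
    best_score = -1
    best_span = (transcript[0].start, transcript[-1].end)
    for i in range(max(1, len(transcript) - window + 1)):
        chunk = transcript[i:i + window]
        score = len(terms & {_normalize(w.text) for w in chunk})
        if score > best_score:  # strict ">" keeps the earliest of tied windows
            best_score = score
            best_span = (chunk[0].start, chunk[-1].end)
    return best_span


# Structure computed once at ingest: here, a toy transcript at 0.5 s per word.
spoken = "welcome to the show today we reset the router by holding the button for ten seconds"
transcript = [Word(w, i * 0.5, i * 0.5 + 0.5) for i, w in enumerate(spoken.split())]

# Query time: only the cheap lookup runs, not a pass over the video itself.
start, end = localize("How do I reset the router?", transcript, window=4)
print(f"Answer moment: {start}s to {end}s")
```

The point of the sketch is the cost profile: the expensive work (transcription, segmentation, embedding) happens once per video, while each question touches only the precomputed index.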

**Exhibit 3 — Raw models still can't carry a long video on their own.** Video-MME-v2 consistency score for long-horizon video reasoning, April 2026 [31]:

| Evaluator | Score |
| --- | --- |
| Human experts | 90.7 |
| Gemini 3 Pro | 49.4 |
| Gemini 3 Flash | 42.5 |
| Best open-source model | 39.1 |

So the bottleneck moved. The question is no longer whether AI can understand video — it is whether a library is prepared and permissioned to be asked: discoverable sources, addressable moments, playable return paths, attribution that survives, rules the owner sets. W3C Media Fragments and Google's video markup solved pieces of the addressing years ago. What is missing is the layer that binds all of it for agents — on the open web, not just inside platforms. [33] [34]
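
For readers who have not used those two addressing pieces, the Python sketch below shows roughly what each looks like: a temporal fragment in the style of W3C Media Fragments (`#t=start,end`, in seconds) and a schema.org `Clip` of the kind Google's video markup nests under a `VideoObject`'s `hasPart`. The helper names and the example.com URLs are placeholders, the `?t=` deep link is an assumed page convention, and whether a given player honors the fragment depends on that player.

```python
import json
from typing import Optional


def _npt(seconds: float) -> str:
    """Format seconds as a plain decimal, e.g. 150 -> '150', 2.5 -> '2.5'."""
    return f"{seconds:.3f}".rstrip("0").rstrip(".")


def media_fragment_url(video_url: str, start: float, end: Optional[float] = None) -> str:
    """Append a temporal media fragment (#t=start or #t=start,end) to a URL."""
    if start < 0 or (end is not None and end <= start):
        raise ValueError("expected 0 <= start < end")
    base = video_url.split("#", 1)[0]  # drop any fragment already present
    span = _npt(start) if end is None else f"{_npt(start)},{_npt(end)}"
    return f"{base}#t={span}"


def clip_markup(name: str, clip_url: str, start: int, end: int) -> dict:
    """Build a schema.org Clip, the shape nested under a VideoObject's hasPart."""
    return {
        "@type": "Clip",
        "name": name,
        "startOffset": start,  # seconds from the start of the video
        "endOffset": end,
        "url": clip_url,       # page URL that opens playback at this moment
    }


print(media_fragment_url("https://example.com/media/router-reset.mp4", 150, 165))
print(json.dumps(
    clip_markup("Resetting the router",
                "https://example.com/videos/router-reset?t=150", 150, 165),
    indent=2,
))
```

Both pieces address a moment. Neither carries permissions, attribution, or an owner-approved destination, which is the binding layer the paragraph above describes as missing.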

> The question is no longer whether AI can understand video. It is whether your library is prepared to be asked.

## 6. What an askable video library needs

1. **Findable source pages** — pages, feeds, or manifests that a machine can discover without guessing.
2. **Moment-level timecodes** — word-level transcripts, chapters, and shot boundaries that point to the answer inside the video.
3. **Visual context** — evidence for what is shown when the answer depends on motion, screens, products, demos, or scenes.
4. **Playable return path** — a clip, embed, or deep link that brings the person back to the relevant source moment.
5. **Usage permissions** — owner-defined rules for summarization, display, clipping, transformation, and downstream use.
6. **Persistent attribution** — creator, brand, title, channel, and source links that travel with the answer.
7. **Owner destination** — a route back to the site, app, player, or approved viewing surface the owner controls.
8. **Demand visibility** — a record of what people asked, which moments answered, and which questions failed.
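
One way to make the eight requirements concrete is to picture them as fields on a single answer record. The Python sketch below is a hypothetical shape, not a published schema: every class, field, and URL in it is an assumption chosen for illustration, and a real format would need far more detail on rights and identity.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class Attribution:
    creator: str
    title: str
    source_url: str


@dataclass
class MomentAnswer:
    source_page: str               # 1. findable source page
    start_s: float                 # 2. moment-level timecodes, in seconds
    end_s: float
    visual_context: str            # 3. what is on screen during the moment
    playback_url: str              # 4. playable return path
    permissions: Dict[str, bool]   # 5. owner-defined usage rules
    attribution: Attribution       # 6. credit that travels with the answer
    owner_destination: str         # 7. surface the owner controls

    def allows(self, use: str) -> bool:
        """Default-deny: a use the owner did not declare is not permitted."""
        return self.permissions.get(use, False)


@dataclass
class DemandLog:
    """8. Demand visibility: what was asked, what answered, what failed."""
    answered: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)

    def record(self, question: str, answer: Optional[MomentAnswer]) -> None:
        (self.answered if answer else self.failed).append(question)


answer = MomentAnswer(
    source_page="https://example.com/videos/router-reset",
    start_s=150.0,
    end_s=165.0,
    visual_context="Close-up of the reset button on the back panel",
    playback_url="https://example.com/videos/router-reset#t=150,165",
    permissions={"summarize": True, "clip": False},
    attribution=Attribution("Example Creator", "Router reset walkthrough",
                            "https://example.com/videos/router-reset"),
    owner_destination="https://example.com/watch/router-reset",
)
log = DemandLog()
log.record("How do I reset the router?", answer)
log.record("What is the warranty period?", None)
print(asdict(answer)["attribution"], log.failed)
```

The default-deny `allows` check reflects one possible reading of item 5, in which any use the owner has not declared is treated as not granted.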

## 7. The rails next door: commerce and publishing already built theirs

Commerce moved first because products already had structure: catalogs, prices, availability, payment authority. Shopify's Universal Commerce Protocol now frames that structure for agents. Within six months the vertical also discovered where the relationship belongs: OpenAI launched Instant Checkout in September 2025, then deprecated it in March 2026 in favor of discover-in-chat, transact-on-the-merchant's-site. Adobe, meanwhile, measured AI-referred visitors converting 42% better than non-AI traffic. The most advanced agent vertical concluded that the answer belongs in the agent and the relationship belongs to the owner. [20] [16] [17] [18] [19]

Text publishing built the permission side: RSL 1.0 became a ratified licensing standard with 1,500+ endorsing organizations and CDN-level enforcement from Cloudflare, Akamai, and Fastly; Cloudflare added Content Signals and Pay Per Crawl; Microsoft opened its Publisher Content Marketplace. Machine access is becoming something owners declare, price, and audit — video needs the same declaration at the level of the moment, not the page. [23] [24] [22] [21] [25] [35]

**Exhibit 4 — The visitors AI sends became the most valuable on record.** AI-referred visit conversion vs. non-AI traffic, U.S. retail sites, Adobe Analytics [18]:

| Period | Conversion vs. non-AI traffic |
| --- | --- |
| Mar 2025 | −38% |
| Mar 2026 | +42% |

AI-referred traffic itself grew 393% year over year in Q1 2026.

- **Commerce:** product, price, inventory, payment authority — and the transaction back on the merchant's surface.
- **Text publishing:** crawler permission, licensing terms, access pricing, citation — enforced at the edge.
- **Video:** moment, transcript, visual context, credit, playback, owner destination. Still unbuilt on the open web.

## 8. What this means for you

### Media brands

Your catalog is the product. Netflix made its library conversational inside its own app — the same expectation is coming for every owned archive.

1. Inventory what your library can answer. The questions are being asked today; the answers go to whoever is citable.
2. Declare machine-access terms the way text publishers now do — per use, per surface, with pricing and rules.
3. Keep playback and the audience relationship on surfaces you control: your player, your app, your endpoints.

→ https://agentcdn.com/media-brands

### Creators

More than twenty million people asked videos questions on one platform in a single month. Across AI surfaces, your library either answers with your name attached, or stays invisible.

1. Check what AI can currently find, cite, and play from your channel.
2. Make moments addressable: chapters, transcripts, and structure turn an hour of video into a hundred answerable questions.
3. Keep credit attached: attribution and links that travel with every answer, back to surfaces you own.

→ https://agentcdn.com/visibility

### Builders

The protocols matured — MCP under a neutral foundation, WebMCP in the browser, commerce standards on their second iteration. Video grounding is the missing capability in most AI apps.

1. Treat the checklist above as your integration spec: timestamps, playback, rights, and attribution belong in the response, not on the roadmap.
2. Build against open, permissioned access rather than scraping — it is where publisher infrastructure is already heading.
3. Return source moments, not summaries. Cited, playable answers are measurably more valuable to users.
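
As a rough illustration of the first point, a builder could gate responses on the presence of the grounding fields before they reach a user. The Python sketch below assumes hypothetical field names (`start_s`, `playback_url`, and so on); they are not taken from any published standard.

```python
from typing import Any, Dict, List

# Hypothetical field names for a video-answer response; not a published schema.
REQUIRED_FIELDS = ("start_s", "end_s", "playback_url", "permissions", "attribution")


def missing_fields(response: Dict[str, Any]) -> List[str]:
    """Return the required fields that are absent or empty in a response."""
    empty = (None, "", {}, [])
    return [name for name in REQUIRED_FIELDS if response.get(name) in empty]


summary_only = {"summary": "Hold the button for ten seconds."}
grounded = {
    "summary": "Hold the button for ten seconds.",
    "start_s": 0,  # a moment at the very start of the video is still valid
    "end_s": 12.5,
    "playback_url": "https://example.com/videos/router-reset#t=0,12.5",
    "permissions": {"display": True},
    "attribution": {"creator": "Example Creator"},
}

print(missing_fields(summary_only))
print(missing_fields(grounded))
```

A summary without these fields can still be shown, but it cannot be credited or played back, which is the distinction the list above draws.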

→ https://agentcdn.com/open-standard

### Won't the platforms just solve this?

For their own catalogs, they already are — Ask YouTube, AI Mode, and Netflix's conversational search are exactly that. The open question is everything that lives outside a platform: media-brand archives, podcast networks, course libraries, owned sites. There is no Ask button for the open web's video. And the commerce precedent suggests the platforms will not own the whole loop either — even OpenAI moved transactions back to the merchant's surface. The owner's surface matters. If it is prepared.

## 9. A living field guide

The state of the video agent web is still forming. The useful question stays practical: when a person asks an AI system for an answer, can the right video library be discovered, credited, played, and controlled?

- Can AI discover the library at all?
- Can it identify the useful moments?
- Can it route a person to playback?
- Does it credit the right creator or brand?
- Does the owner keep the audience relationship?
- Which questions should the library answer, but currently cannot?

Vol. 2 will add original measurement: the Video Answer Gap — how often AI answers should include source video, and don't.

AgentCDN makes owned video libraries discoverable, playable, credited, and controlled across AI interfaces — for media brands, creators, and the builders connecting them. Start at https://agentcdn.com/visibility, https://agentcdn.com/media-brands, or https://agentcdn.com/open-standard.

## Sources

1. YouTube, Neal Mohan annual letter: 20M+ used Ask in December, Jan 21, 2026 — https://blog.youtube/inside-youtube/the-future-of-youtube-2026/
2. Business Standard, YouTube rolls out Gemini-powered Ask on PC and mobile, Nov 12, 2025 — https://www.business-standard.com/technology/tech-news/youtube-release-gemini-powered-ask-feature-pc-smartphone-how-it-works-125111200565_1.html
3. TechCrunch, Ask YouTube brings conversational search to the catalog, May 19, 2026 — https://techcrunch.com/2026/05/19/ask-youtube-brings-ai-powered-conversational-search-to-video-adds-gemini-omni-to-shorts/
4. YouTube Blog, YouTube news at Google I/O 2026, May 19, 2026 — https://blog.youtube/news-and-events/youtube-news-google-io-2026/
5. Tech-ish, Ask YouTube expands to smart TVs and consoles, May 30, 2026 — https://tech-ish.com/2026/05/30/gemini-powered-ask-youtube-search-smart-tv/
6. TechCrunch, ChatGPT reaches 900M weekly active users, Feb 27, 2026 — https://techcrunch.com/2026/02/27/chatgpt-reaches-900m-weekly-active-users/
7. Google, A new era for AI Search: AI Mode passes 1B monthly users, May 19, 2026 — https://blog.google/products-and-platforms/products/search/search-io-2026/
8. Google, Gemini 3.5: frontier intelligence with action, May 19, 2026 — https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-5/
9. Pew Research Center, Google AI summary link behavior, Jul 22, 2025 — https://www.pewresearch.org/short-reads/2025/07/22/google-users-are-less-likely-to-click-on-links-when-an-ai-summary-appears-in-the-results/
10. Cloudflare, The crawl before the fall of referrals, Jul 1, 2025 — https://blog.cloudflare.com/ai-search-crawl-refer-ratio-on-radar/
11. SEOmator, GEO Data Report: Cloudflare Radar crawl-to-refer ratios, Q1 2026 — https://seomator.com/blog/crawl-to-refer-ratio-ai-crawlers-llm-bots
12. Cloudflare, Moving past bots vs. humans, Apr 21, 2026 — https://blog.cloudflare.com/past-bots-and-humans/
13. Spotify Newsroom, prompted podcasts and Personal Podcasts, May 21, 2026 — https://newsroom.spotify.com/2026-05-21/investor-day-podcast-features-updates/
14. TechCrunch, Netflix debuts its generative AI-powered search tool, May 7, 2025 — https://techcrunch.com/2025/05/07/netflix-debuts-its-generative-ai-powered-search-tool/
15. TikTok Newsroom, TikTok World 2026: AI-powered discovery, May 2026 — https://newsroom.tiktok.com/tiktok-world-26-turning-discovery-into-business-growth-with-ai-powered-innovations-vertical-experiences-and-high-impact-brand-solutions
16. OpenAI, Instant Checkout and the Agentic Commerce Protocol, Sep 29, 2025 — https://openai.com/index/buy-it-in-chatgpt/
17. Forrester, The leader in agentic commerce just pulled back, Mar 2026 — https://www.forrester.com/blogs/what-it-means-that-the-leader-in-agentic-commerce-just-pulled-back/
18. TechCrunch, AI traffic to US retailers rose 393% in Q1, converting 42% better, Apr 16, 2026 — https://techcrunch.com/2026/04/16/ai-traffic-to-us-retailers-rose-393-in-q1-and-its-boosting-their-revenue-too/
19. Adobe, AI traffic surges across industries; media trails retail, Jan 2026 — https://business.adobe.com/blog/ai-driven-traffic-surges-across-industries
20. Shopify Engineering, Building the Universal Commerce Protocol, Jan 11, 2026 — https://shopify.engineering/ucp
21. Cloudflare, Introducing pay per crawl, Jul 1, 2025 — https://blog.cloudflare.com/introducing-pay-per-crawl/
22. Cloudflare, Giving users choice with the Content Signals Policy, Sep 24, 2025 — https://blog.cloudflare.com/content-signals-policy/
23. RSL, RSL 1.0 ratified as an industry licensing standard, Dec 2025 — https://rslstandard.org/press/rsl-1-specification-2025
24. WAN-IFRA, RSL Collective compensation plan for news publishers, Apr 2026 — https://wan-ifra.org/2026/04/rsls-ai-use-compensation-plan-for-news-we-think-this-is-a-100-billion-opportunity-for-publishers/
25. Dataconomy, Microsoft unveils Publisher Content Marketplace, Feb 4, 2026 — https://dataconomy.com/2026/02/04/microsoft-unveils-ai-licensing-hub-to-pay-publishers-for-model-training/
26. Anthropic, MCP donated to the Agentic AI Foundation under the Linux Foundation, Dec 9, 2025 — https://www.anthropic.com/news/donating-the-model-context-protocol-and-establishing-of-the-agentic-ai-foundation
27. Chrome for Developers, WebMCP origin trial in Chrome 149, May 18, 2026 — https://developer.chrome.com/docs/ai/webmcp
28. TechCrunch, Chrome adds agentic Gemini features for autonomous tasks, Jan 28, 2026 — https://techcrunch.com/2026/01/28/chrome-takes-on-ai-browsers-with-tighter-gemini-integration-agentic-features-for-autonomous-tasks/
29. Microsoft, Introducing NLWeb: conversational interfaces for the web, May 19, 2025 — https://news.microsoft.com/source/features/company-news/introducing-nlweb-bringing-conversational-interfaces-directly-to-the-web/
30. Google, Gemini Embedding 2: first natively multimodal embedding model, Mar 10, 2026 — https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-embedding-2/
31. arXiv, Video-MME-v2: next-stage benchmark for video understanding, Apr 2026 — https://arxiv.org/abs/2604.05015
32. Google AI for Developers, Gemini video understanding — https://ai.google.dev/gemini-api/docs/video-understanding
33. W3C, Media Fragments URI 1.0 — https://www.w3.org/TR/media-frags/
34. Google Search Central, VideoObject, Clip, and SeekToAction structured data — https://developers.google.com/search/docs/appearance/structured-data/video
35. Press Gazette, publisher AI deals and lawsuits tracker — https://pressgazette.co.uk/platforms/news-publisher-ai-deals-lawsuits-openai-google/