NOTE · 2026-06-13 · 6 min

Agent OS: Why Google's Open Knowledge Format Is the Missing Layer

Google's Open Knowledge Format is a plain-markdown spec for company knowledge that AI agents can actually read. Here's why that matters more than a better model.

95% of AI pilots fail, and it is rarely a model problem

MIT's The GenAI Divide: State of AI in Business 2025 looked at 52 executive interviews, 153 leader surveys, and 300 public AI deployments. The finding: 95% of enterprise generative-AI pilots delivered zero measurable P&L impact. Only about one in twenty produced something a CFO would call a result.

The instinct is to blame the model, but that's the wrong layer to look at.

The authors found the barriers are organizational: tools that work in a demo fail to integrate with actual enterprise workflows and data. The model is fine. The knowledge it needs to do anything useful is a mess.

That problem existed before AI. Poor documentation, siloed wikis, processes living in senior engineers' heads: this is a decades-old complaint. What AI does is make the cost of it visible and immediate. An agent that is asked "how do I compute weekly active users from our event stream?" and gives a generic response isn't stupid. It just has no idea how your company actually works.

What fragmented knowledge looks like in practice

Company knowledge is spread across multiple places at once: data catalogs, Confluence pages, shared drives, code comments. Each source was built for a different consumer and uses a different format. None of it connects. And then there's the long-term memory of whoever has been there longest, which doesn't live in any system at all.

An agent trying to answer a real business question has to stitch all of this together without a map. Sometimes it works. More often it hallucinates a plausible-sounding answer because the real one wasn't findable.

This is not a retrieval problem. Better vector search helps on the margin. The underlying issue is that the knowledge was never structured in a form agents can navigate reliably. That's the gap OKF is trying to fill.

If you've been building with agents and wondering why they keep producing generic answers, the context engineering post covers the technical side of this. The short version: data hygiene fixes most agent problems before any model tuning is needed.

What the Open Knowledge Format actually is

Google published OKF v0.1 as an open spec alongside the knowledge-catalog GitHub repo. The pitch: a portable, vendor-neutral standard for representing organizational knowledge in a form both humans and AI agents can consume.

The format is deliberately minimal. Three building blocks:

Each markdown file is one "concept" (a dataset, a metric, a playbook, a process). The file path is the concept's identity.
Markdown links between files form a knowledge graph. If a "weekly active users" concept file links to an "event stream schema" file, that relationship is explicit and navigable.
YAML frontmatter carries the structured metadata, with 'type' as the only required field. Optional fields include title, description, resource (a URL), tags, and timestamp. That's it.
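To make the three building blocks concrete, here is a minimal sketch of reading one concept file and checking the single required field. The field names come from the list above; the sample values, the function names, and the simplified `key: value` parsing are my own assumptions (a real tool would use a proper YAML parser), so treat it as an illustration rather than a reference implementation.

```python
# Hypothetical concept file. Field names follow the list above;
# the values are invented for illustration.
SAMPLE = """\
---
type: metric
title: Weekly Active Users
description: Distinct users with at least one event in a 7-day window.
tags: [engagement, product]
---
Computed from the [event stream schema](../datasets/event_stream.md).
"""


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a concept file into (frontmatter fields, markdown body).

    Simplified on purpose: handles flat `key: value` lines only and keeps
    every value as a raw string. It is not a full YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("file does not start with a frontmatter block")
    try:
        end = lines.index("---", 1)  # closing delimiter of the frontmatter
    except ValueError:
        raise ValueError("frontmatter block is never closed") from None
    fields: dict[str, str] = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'key: value' line: {line!r}")
        fields[key.strip()] = value.strip()
    return fields, "\n".join(lines[end + 1:])


def validate_concept(text: str) -> dict[str, str]:
    """Return the frontmatter, raising if the required 'type' is missing."""
    fields, _body = split_frontmatter(text)
    if not fields.get("type"):
        raise ValueError("missing required frontmatter field: type")
    return fields


if __name__ == "__main__":
    print(validate_concept(SAMPLE)["type"])
```

The point of the sketch is how little an agent needs in order to classify a file: one delimiter pair and one required key.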

A bundle is a directory of these files. Optional index.md provides hierarchy; optional log.md tracks history. No SDK needed. The whole thing lives in a folder.
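Because a bundle is just a folder, listing its concepts takes nothing beyond the standard library. The sketch below walks a directory and uses each file's relative path as its identity. Treating any file named index.md or log.md as a bundle-level file rather than a concept is my assumption, not something I have confirmed against the spec, and the demo file names are hypothetical.

```python
import tempfile
from pathlib import Path

# Assumption: files named index.md or log.md are bundle-level files,
# not concepts. The spec may define this differently.
BUNDLE_LEVEL_NAMES = {"index.md", "log.md"}


def scan_bundle(root: Path) -> dict[str, list[str]]:
    """Return concept identities and bundle-level files found under root.

    A concept's identity is its path relative to the bundle root.
    """
    relative_paths = sorted(
        path.relative_to(root).as_posix() for path in root.rglob("*.md")
    )
    result: dict[str, list[str]] = {"concepts": [], "bundle_files": []}
    for rel in relative_paths:
        name = rel.rsplit("/", 1)[-1]
        key = "bundle_files" if name in BUNDLE_LEVEL_NAMES else "concepts"
        result[key].append(rel)
    return result


def make_demo_bundle(root: Path) -> None:
    """Create a tiny hypothetical bundle for the demo below."""
    for rel in ("index.md", "metrics/weekly_active_users.md",
                "datasets/event_stream.md"):
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("---\ntype: example\n---\n", encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_demo_bundle(Path(tmp))
        print(scan_bundle(Path(tmp)))
```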

The OKF spec cites Andrej Karpathy: LLMs don't get bored, don't forget to update a cross-reference, and can touch 15 files in one pass. The format makes that actually useful by giving agents a consistent structure to navigate.

Why the "Agent OS" framing holds up

Compare it to an operating system: the layer every process boots from, shared, versioned, consistent. Right now, most company AI projects have no equivalent. Each agent starts from scratch, scraping whatever it can find.

OKF proposes a shared knowledge layer: a directory of concept files that sits in version control next to your code, gets updated like code, and is readable by humans and agents from the same source. When an agent needs to understand how your data pipeline works, it navigates the OKF bundle. When a new engineer joins, they read the same files.
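What "navigating the bundle" could look like is sketched below: extract the markdown links from each file, resolve them relative to the linking file, and walk the resulting graph breadth-first from a starting concept. The file contents, the link pattern, and the function names are assumptions of mine for illustration; the spec may treat links differently (anchors, absolute paths, and so on).

```python
import posixpath
import re
from collections import deque

# Matches inline markdown links that point at .md files, e.g. [text](a/b.md).
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s#]+\.md)(?:#[^)]*)?\)")


def build_graph(files: dict[str, str]) -> dict[str, set[str]]:
    """Map each concept path to the concept paths it links to.

    `files` maps bundle-relative paths to file contents. Links are resolved
    relative to the linking file; external URLs and links to files that are
    not in the bundle are ignored.
    """
    graph: dict[str, set[str]] = {path: set() for path in files}
    for path, text in files.items():
        base = posixpath.dirname(path)
        for target in LINK_RE.findall(text):
            if "://" in target:
                continue  # external URL, not a concept in this bundle
            resolved = posixpath.normpath(posixpath.join(base, target))
            if resolved in files:
                graph[path].add(resolved)
    return graph


def reachable(graph: dict[str, set[str]], start: str, max_hops: int) -> list[str]:
    """Breadth-first walk: concepts within max_hops links of start."""
    hops = {start: 0}
    queue = deque([start])
    order: list[str] = []
    while queue:
        node = queue.popleft()
        order.append(node)
        if hops[node] == max_hops:
            continue
        for nxt in sorted(graph.get(node, ())):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return order


# Hypothetical three-file bundle, held in memory to keep the demo short.
DEMO_FILES = {
    "metrics/weekly_active_users.md":
        "Built on the [event stream schema](../datasets/event_stream.md).",
    "datasets/event_stream.md": "Joined with [sessions](sessions.md).",
    "datasets/sessions.md": "No outgoing links.",
}

if __name__ == "__main__":
    demo_graph = build_graph(DEMO_FILES)
    print(reachable(demo_graph, "metrics/weekly_active_users.md", max_hops=2))
```

Capping the walk with `max_hops` is a design choice for the sketch: it lets an agent pull in a concept's immediate neighborhood without loading the whole bundle into context.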

That's the actual value: a single structured layer both audiences consume without translation, not a shinier data catalog.

This connects directly to the difference between an AI assistant and an AI agent. Assistants can muddle through with vague context. Agents running autonomous loops cannot. They need reliable structure or they drift. OKF is designed for the agent case.

For what it looks like when a structured knowledge layer gets wired into an actual pipeline, the Content Machine case shows how a five-agent system handles research, drafting, and formatting. A lot of the reliability comes from inputs being well-structured before any agent touches them.

What to adopt freely, and what to vet

The format itself is safe: markdown files with YAML frontmatter that live in your own repo. There is no vendor dependency in the spec. Google's static HTML visualizer explicitly keeps data local ("no data leaves the page"). Nothing about OKF requires Google.

The thing to think carefully about is Google's hosted tooling. The BigQuery Enrichment Agent walks your datasets, drafts OKF docs, and runs LLM passes to enrich them with schemas and joins, all inside Google Cloud. That's a different decision from adopting the file format.

Adopt the open format freely; vet what internal knowledge flows into any managed pipeline that generates or hosts it. This is the same evaluation you'd run for any vendor that processes your internal data. The format being open doesn't make the pipeline neutral.

This comes up in content work too. Vendor lock-in often enters through the data layer. The format of your knowledge matters as much as which model processes it.

The 5% who succeed probably didn't find a better model

Back to that MIT number. The 95% failure rate reflects organizations bolting capable agents onto knowledge that was never designed to be consumed by anything.

The companies doing well aren't running different LLMs. They've done the unglamorous work of making their internal knowledge findable and structured. OKF is a concrete proposal for what that looks like: plain files, explicit relationships, no proprietary format.

Whether OKF becomes a standard or stays a useful idea from Google, the underlying principle holds. AI projects that fail on knowledge quality will keep failing regardless of model generation. Fixing the knowledge layer is harder than upgrading the model, and it doesn't show up on a demo. My guess is that's why most teams skip it.

If you're thinking about where to start, the market research automation case shows what happens when the research layer gets structured before automation. Most of that output quality difference came from the quality of the inputs.

For the full OKF spec and sample bundles, the GoogleCloudPlatform/knowledge-catalog repo has everything. Three sample bundles cover GA4 e-commerce, Stack Overflow, and Bitcoin public datasets.

FAQ

What is the Open Knowledge Format (OKF)?

OKF is an open spec from Google for representing organizational knowledge in plain markdown files with YAML frontmatter. Each file describes one concept (a dataset, metric, or playbook) and markdown links between files form a knowledge graph. The only required field in the frontmatter is 'type'. No SDK, no vendor platform required.

How is OKF different from a company wiki?

A wiki is designed for humans browsing and editing pages. OKF is designed so both humans and AI agents can consume the same files. The file path is the concept's identity, links are explicit relationships, and the YAML frontmatter gives structured metadata agents can parse without guessing. It also lives in version control next to your code.

Is OKF safe to use? Does Google see our internal knowledge?

The format itself is vendor-neutral and safe: a directory of markdown files you control. Google's static HTML visualizer also keeps data local (no data leaves the page). The thing to evaluate carefully is Google's hosted tooling. The BigQuery Enrichment Agent runs LLM passes over your data inside Google Cloud. Adopt the format freely; vet what flows into any managed pipeline.

Why do 95% of enterprise AI pilots fail?

According to MIT's The GenAI Divide: State of AI in Business 2025, the barriers are organizational, not technological. Tools that work in a demo fail to integrate with enterprise workflows and data realities. The knowledge agents need is scattered across catalogs, wikis, drives, code comments, and senior engineers' heads, none of it in a form an agent can reliably navigate.

Where can I find the OKF spec and reference implementations?

The GoogleCloudPlatform/knowledge-catalog repo on GitHub has the v0.1 spec, a BigQuery Enrichment Agent that drafts OKF docs from datasets, a static HTML visualizer, and three sample bundles covering GA4 e-commerce, Stack Overflow, and Bitcoin public datasets.

Written by

Eric Hinzpeter, Senior B2B Content Strategist. He builds production AI agents and marketing automation, and documents the results here.
