Beyond HTML: Making the Web Truly Machine-Readable with Block Protocol

Introduction

Since the early days of the World Wide Web, content has been designed primarily for human consumption. HTML, the backbone of web pages, provides basic structural cues like paragraphs and emphasis, but it lacks the depth machines need to understand the meaning behind the text. For example, when you set a book title in bold, a computer sees only styling, not a book reference. This limitation has long frustrated those who dream of a web where intelligent agents can process data automatically rather than just display it.

Source: www.joelonsoftware.com

The Human-Readable Web: A Limitation in Disguise

Consider a typical web page that lists a book:

Goodnight Moon by Margaret Wise Brown
Illustrated by Clement Hurd
Harper & Brothers, 1947
ISBN 0-06-443017-0

To a human, that's a clear bibliographic entry. But a naive program scanning the page has no reliable way to know that this is a book, let alone identify the author, illustrator, or ISBN. The HTML provides only presentational markup—bold for the title, line breaks for each field—but no semantic clues. This is the core problem: the web is great at displaying information to people, but lousy at describing that information to machines.
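To make the problem concrete, here is a minimal sketch of what a naive scanner can recover from such a page. The HTML in PAGE is an assumption (the original markup is not shown), and NaiveScanner is a hypothetical helper built on Python's standard html.parser module.

```python
from html.parser import HTMLParser

# Assumed presentational markup for the entry above; the real page's
# HTML is not shown in the article, so this is illustrative only.
PAGE = (
    "<p><b>Goodnight Moon</b> by Margaret Wise Brown<br>"
    "Illustrated by Clement Hurd<br>"
    "Harper &amp; Brothers, 1947<br>"
    "ISBN 0-06-443017-0</p>"
)


class NaiveScanner(HTMLParser):
    """Collects text and records only whether it appeared inside <b>."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of currently open non-void tags
        self.bold_text = []   # text found inside <b>...</b>
        self.plain_text = []  # everything else, with no field labels

    def handle_starttag(self, tag, attrs):
        if tag != "br":  # <br> is a void element and never closes
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if "b" in self.open_tags:
            self.bold_text.append(text)
        else:
            self.plain_text.append(text)


scanner = NaiveScanner()
scanner.feed(PAGE)
scanner.close()

print("bold:", scanner.bold_text)
print("other:", scanner.plain_text)
```

All the scanner learns is that one string was bold and the rest was not. Nothing in the markup says which string is the author, the publisher, or the ISBN, so any further interpretation would have to rely on guesswork such as looking for the word "by".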

The Semantic Web Vision: A Dream from 1999

Back in 1999, Tim Berners-Lee painted a compelling picture of a Semantic Web where computers could analyze content, links, and transactions to handle everyday tasks autonomously. In his book Weaving the Web, he wrote:

“I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A 'Semantic Web', which makes this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines.”

To realize this, standards like RDF (Resource Description Framework) and JSON-LD were developed, often using vocabularies from schema.org. For instance, you could annotate the book example with structured data to tell a computer: “This is a Book, author is Margaret Wise Brown, ISBN is 0-06-443017-0.” The theory is sound—but the practice has been anything but.
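As a rough illustration of what that annotation could look like, the sketch below builds a JSON-LD description of the book using schema.org's Book type and wraps it in the script tag commonly used to embed JSON-LD in a page. The helper name to_script_tag is hypothetical, and the choice of properties is one reasonable mapping of the example entry, not the only one.

```python
import json

# One possible schema.org description of the example book.
book = {
    "@context": "https://schema.org",
    "@type": "Book",
    "name": "Goodnight Moon",
    "author": {"@type": "Person", "name": "Margaret Wise Brown"},
    "illustrator": {"@type": "Person", "name": "Clement Hurd"},
    "publisher": {"@type": "Organization", "name": "Harper & Brothers"},
    "datePublished": "1947",
    "isbn": "0-06-443017-0",
}


def to_script_tag(data):
    """Serialize data as JSON-LD inside a <script> element (hypothetical helper)."""
    payload = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{payload}\n</script>'


print(to_script_tag(book))
```

Even for a four-line bibliographic entry, the author has to know the vocabulary, the nesting rules, and where the result goes in the page.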

Why Semantic Markup Never Took Off

Despite the vision, adoption of semantic markup remains abysmally low. The main reason? It's hard and it's homework. After writing a beautiful, human-readable blog post, the last thing most authors want to do is dive into arcane syntax like RDF/XML or JSON-LD, look up the right schema.org classes, and meticulously embed them into their HTML. Unless there's an immediate, visible payoff—like a search engine using that data—authors usually skip it. The result is a web that remains, for all practical purposes, semantically barren.

This is where the Block Protocol enters the scene, aiming to flip the script.

Introducing the Block Protocol: A Simpler Path

The Block Protocol is an open standard that reimagines how structured data is added to the web. Instead of demanding that content creators master complex semantic formats, it provides a straightforward system for embedding machine-readable blocks directly into web pages. Think of it as a way to write content in a structure that is both human-friendly and computer-friendly, without the steep learning curve.

How It Works

At its core, the Block Protocol defines a set of well-known blocks—standardized components like a “Book” block, an “Event” block, or a “Person” block. Each block comes with a predefined schema that describes its properties. When you insert a block into your page (for example, via a simple code snippet), you automatically provide structured data that any compatible system can parse. The protocol uses a lightweight JSON format under the hood, but you never have to write raw JSON-LD or RDF by hand.
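The idea of a block carrying its own schema can be sketched in a few lines. Everything here is an assumption made for illustration: BOOK_SCHEMA, its property names, the make_block helper, and the output layout are hypothetical and are not taken from the Block Protocol specification.

```python
import json

# Hypothetical schema for an illustrative "Book" block. The layout and
# property names are assumptions, not the Block Protocol's own format.
BOOK_SCHEMA = {
    "type": "Book",
    "required": ["title", "author", "isbn"],
    "properties": {
        "title": str,
        "author": str,
        "illustrator": str,
        "publisher": str,
        "year": int,
        "isbn": str,
    },
}


def make_block(schema, data):
    """Check data against the schema and return a JSON-serializable block."""
    missing = [key for key in schema["required"] if key not in data]
    if missing:
        raise ValueError(f"missing required properties: {missing}")
    for key, value in data.items():
        expected = schema["properties"].get(key)
        if expected is None:
            raise ValueError(f"unknown property: {key}")
        if not isinstance(value, expected):
            raise TypeError(f"{key} should be of type {expected.__name__}")
    return {"blockType": schema["type"], "properties": dict(data)}


block = make_block(
    BOOK_SCHEMA,
    {
        "title": "Goodnight Moon",
        "author": "Margaret Wise Brown",
        "illustrator": "Clement Hurd",
        "publisher": "Harper & Brothers",
        "year": 1947,
        "isbn": "0-06-443017-0",
    },
)
print(json.dumps(block, indent=2))
```

The author only fills in the fields. The schema travels with the block, so the structured output comes for free and malformed entries are rejected before they are published.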

Benefits Over Traditional Approaches

Lower barrier to entry: Authors use pre-built blocks; no need to learn semantic web standards.
Consistency: Blocks ensure data follows a predictable schema, reducing errors.
Interoperability: Any application that understands the Block Protocol can read and reuse the data, from AI agents to traditional databases.
Progressive enhancement: You can start with simple blocks and add complexity over time.
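The consistency and interoperability points can be illustrated from the consumer's side. The sketch below assumes a hypothetical serialized form in which a page exposes a list of blocks, each with a blockType and a properties object. That layout, the sample data, and the index_books_by_isbn function are made up for illustration and do not reflect the Block Protocol's actual wire format.

```python
import json

# Hypothetical page data: a list of blocks in an assumed layout.
# The sample entries are illustrative only.
PAGE_BLOCKS = json.dumps([
    {"blockType": "Person", "properties": {"name": "Margaret Wise Brown"}},
    {
        "blockType": "Book",
        "properties": {
            "title": "Goodnight Moon",
            "author": "Margaret Wise Brown",
            "isbn": "0-06-443017-0",
        },
    },
    {"blockType": "Event", "properties": {"name": "Story time"}},
])


def index_books_by_isbn(serialized):
    """Return a dict mapping ISBN to properties for every Book block."""
    blocks = json.loads(serialized)
    return {
        block["properties"]["isbn"]: block["properties"]
        for block in blocks
        if block.get("blockType") == "Book"
        and "isbn" in block.get("properties", {})
    }


catalog = index_books_by_isbn(PAGE_BLOCKS)
print(catalog)
```

Because every Book block follows the same schema, the consumer needs no scraping heuristics: it filters by type and reads named fields. The same function would work on any page that publishes blocks in this assumed form.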

Conclusion: A Practical Step Toward the Semantic Web

The dream of a fully machine-readable web is still alive, but it needs practical tools. The Block Protocol addresses the biggest adoption hurdle: making semantic markup easy enough that people will actually use it. By wrapping complexity inside simple, reusable components, it lowers the effort required to publish structured data alongside human content. If widely adopted, this could finally turn Berners-Lee's 1999 dream into a reality—one block at a time.
