Reviving the Dream of a Machine-Readable Web: The Case for Simplified Structured Data

<h2 id="web-shortcoming">The Web's Original Shortcoming</h2><p>Since its inception in the 1990s, the World Wide Web has served mainly as a platform for publishing human-readable documents. These documents are written primarily in HTML, which offers basic structural cues (marking a paragraph, emphasizing a word) but little more. CSS then adds stylistic flair, such as rendering paragraphs in tiny gray sans-serif text, a design choice that may appeal to some but alienates older readers and those with visual impairments. This superficial level of “structure” barely scratches the surface of what’s needed for sophisticated machine interpretation.</p><figure style="margin:20px 0"><img src="https://www.joelonsoftware.com/wp-content/uploads/2022/12/IMG_0203-scaled.webp" alt="Reviving the Dream of a Machine-Readable Web: The Case for Simplified Structured Data" style="width:100%;height:auto;border-radius:8px" loading="lazy"><figcaption style="font-size:12px;color:#666;margin-top:5px">Source: www.joelonsoftware.com</figcaption></figure><p>Consider a simple example: you mention a book on a web page, say <em>Goodnight Moon</em> by Margaret Wise Brown, illustrated by Clement Hurd, published by Harper &amp; Brothers in 1947 with ISBN 0-06-443017-0. To a human reader, the formatting (a bold title, line breaks) conveys what each piece of information is. But a naive computer program scanning that page sees only a jumble of text; it has no way to recognize that this is a book, let alone extract its author, illustrator, publisher, or ISBN. 
The web lacks the semantic richness necessary for machines to understand content beyond its visual presentation.</p><h2 id="semantic-web-vision">A Glimpse at the Semantic Web Vision</h2><p>As early as 1999, Tim Berners-Lee, the inventor of the web, articulated a grand vision in his book <em>Weaving the Web</em>: “I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A ‘Semantic Web’, which makes this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines.”</p><p>To realize this dream, the Semantic Web community proposed adding explicit structure to web content using standards like RDF (Resource Description Framework) and JSON-LD (JavaScript Object Notation for Linked Data). The idea was to embed metadata, often using terms from vocabularies such as <a href="https://schema.org">schema.org</a>, directly into HTML pages. For instance, a book listing could expose properties like <code>author</code>, <code>illustrator</code>, and <code>isbn</code> in a machine-readable format. In theory, this would enable automated agents to collect, compare, and process data from across the web with ease.</p><h2 id="why-unfulfilled">Why the Vision Remains Unfulfilled</h2><p>Despite its promise, the Semantic Web has seen only limited adoption in practice. The primary barrier is complexity: adding structured data requires developers to learn specialized markup languages, understand intricate ontologies, and manually embed code snippets into their content. For most web publishers, especially bloggers, small businesses, and individual creators, this extra work feels like tedious homework. Once a beautiful, human-readable post is live, the motivation to invest additional time in machine-readable annotations evaporates. 
As a result, semantic markup remains rare in the wild, and the web continues to be a largely unstructured document repository.</p><h2 id="enter-block-protocol">Enter the Block Protocol</h2><p>We believe it’s time to bridge this gap. Human progress depends on making information readily accessible not only to people but also to AI systems and traditional computer programs. The challenge is to make adding structured data as effortless as writing content in the first place. That’s why we’re developing the <strong>Block Protocol</strong>, a new approach that reimagines how structured data is created and consumed on the web.</p><figure style="margin:20px 0"><img src="https://www.joelonsoftware.com/wp-content/uploads/2016/12/11969842-1.jpg" alt="Reviving the Dream of a Machine-Readable Web: The Case for Simplified Structured Data" style="width:100%;height:auto;border-radius:8px" loading="lazy"><figcaption style="font-size:12px;color:#666;margin-top:5px">Source: www.joelonsoftware.com</figcaption></figure><p>The core insight of the Block Protocol is simple: <em>people will only add semantic markup if doing so is easy and intuitive</em>. Instead of requiring manual annotation, the protocol lets content creators use interactive “blocks” that automatically generate and manage structured data in the background. These blocks are modular, reusable components that can be embedded into any web page, app, or CMS. For example, a Book Block would let you enter a title, author, ISBN, and other details through a friendly form, then output both a human-readable display and a machine-readable JSON-LD snippet, without you having to write any markup by hand.</p><h3 id="how-it-works">How the Protocol Works</h3><p>Under the hood, each Block Protocol block is a self-contained unit that communicates with the host application via a standardized API. When a block is instantiated, it registers its data schema and begins emitting structured data in a format that any compliant system can consume. 
The host can aggregate data from multiple blocks to create a rich, interlinked dataset. Because the protocol is <strong>open and extensible</strong>, any developer can create new types of blocks (for events, recipes, products, or virtually any structured entity) and share them with the community.</p><p>One of the key advantages of this approach is that it decouples data from presentation. A block can render its content in multiple ways (e.g., a table, a card, or a list) while still producing the same underlying structured data. This makes content easier for both people and machines to use, and it allows the same content to be reused across different contexts without modification.</p><h2 id="road-ahead">The Road Ahead</h2><p>We are actively designing and prototyping the Block Protocol, aiming to release a draft specification later this year. We envision a future where anyone can build a webpage that is simultaneously beautiful for humans and richly structured for machines, without needing to master RDF or JSON-LD. By lowering the barrier to entry, the Block Protocol can finally fulfill the promise of the Semantic Web and unlock a new era of automated data exchange, intelligent search, and seamless interoperability.</p><p>If you share our vision, we invite you to join the conversation. Visit our website, contribute to the spec, or build your own blocks. Together, we can make the web truly machine-readable, one block at a time.</p>
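<p>To make the Book Block idea described above a little more concrete, here is a minimal sketch in Python. It is not the Block Protocol's API, which is still being designed. The class name <code>BookBlock</code> and all of its method names are hypothetical, chosen only for illustration. The one outside assumption is the schema.org <code>Book</code> type with its <code>author</code>, <code>illustrator</code>, <code>publisher</code>, <code>datePublished</code>, and <code>isbn</code> properties. The sketch shows one set of form fields feeding two human-readable renderings and a single JSON-LD payload, which is the decoupling of data from presentation discussed earlier.</p>

```python
import html
import json
from dataclasses import dataclass


@dataclass
class BookBlock:
    """Hypothetical block: the fields a user would type into a book form."""

    title: str
    author: str
    illustrator: str
    publisher: str
    year: int
    isbn: str

    def to_json_ld(self) -> dict:
        """Return the book as a schema.org Book object (the structured data)."""
        return {
            "@context": "https://schema.org",
            "@type": "Book",
            "name": self.title,
            "author": {"@type": "Person", "name": self.author},
            "illustrator": {"@type": "Person", "name": self.illustrator},
            "publisher": {"@type": "Organization", "name": self.publisher},
            "datePublished": str(self.year),
            "isbn": self.isbn,
        }

    def render_card(self) -> str:
        """One possible presentation: a small HTML card."""
        return (
            '<div class="book-card">'
            f"<strong>{html.escape(self.title)}</strong><br>"
            f"by {html.escape(self.author)}, "
            f"illustrated by {html.escape(self.illustrator)}<br>"
            f"{html.escape(self.publisher)}, {self.year}. "
            f"ISBN {html.escape(self.isbn)}"
            "</div>"
        )

    def render_list_item(self) -> str:
        """Another presentation of the same data: a one-line list item."""
        return f"<li><em>{html.escape(self.title)}</em> ({self.year})</li>"

    def json_ld_script(self) -> str:
        """Wrap the structured data in a script tag for embedding in a page."""
        payload = json.dumps(self.to_json_ld(), indent=2)
        # Escape "</" so the JSON cannot close the script element early.
        payload = payload.replace("</", "<\\/")
        return f'<script type="application/ld+json">\n{payload}\n</script>'


if __name__ == "__main__":
    book = BookBlock(
        title="Goodnight Moon",
        author="Margaret Wise Brown",
        illustrator="Clement Hurd",
        publisher="Harper & Brothers",
        year=1947,
        isbn="0-06-443017-0",
    )
    print(book.render_card())      # what a human reader sees
    print(book.json_ld_script())   # what a program can parse
```

<p>Both <code>render_card</code> and <code>render_list_item</code> read from the same fields as <code>to_json_ld</code>, so changing how the book looks never changes what a machine extracts. A real block would presumably also register its schema with the host application, a step this sketch leaves out.</p>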