It is 30 years since the invention of the World-Wide Web.
Tim Berners-Lee stood on the shoulders of giants, but the Web wasn’t just an amalgamation of existing ideas. He and Robert Cailliau created:
- A HyperText Markup Language, human-readable but easy enough for machines to parse and generate. A web page is a complete HTML document.
- A means of referencing HTML pages across the Internet, called Uniform Resource Locators. But a URL can refer to any kind of resource, not just web pages but also plain text, pictures, other files, and even notions like “the last 7 blog posts by skierpage.” (And actually URLs are a specialization of Uniform Resource Identifiers, which let you refer to other protocols, e.g. mailto:skierpage@example.com?Subject=hello.)
- A specification of the protocol by which a client requests a URL from a server computer and the server responds with the document requested, called HyperText Transfer Protocol.
- Free open source software that implemented all this:
  - a software library implementing the protocol
  - software for an HTTP server, called httpd (hypertext transfer protocol “daemon”)
  - software to display and edit HTML pages; we now call the display part a “web browser.”
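To make the URL/URI point concrete, here is a minimal sketch using Python's standard urllib.parse module. It splits the mailto: example above and an ordinary web address into their parts. The describe helper and the www.example.com address are assumptions for illustration only.

```python
from urllib.parse import urlsplit, parse_qs


def describe(uri: str) -> dict:
    """Split a URI into the pieces a client needs to act on it."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,          # which protocol or handler to use
        "host": parts.netloc,            # empty for schemes like mailto:
        "path": parts.path,              # the resource within that scheme
        "query": parse_qs(parts.query),  # extra name=value parameters
    }


# The mailto: URI from the list above.
mail = describe("mailto:skierpage@example.com?Subject=hello")

# A hypothetical web page address, for comparison.
page = describe("http://www.example.com/blog/index.html")
```

The same splitting works for both, which is the point: one syntax names resources reachable in very different ways.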
Nothing new here?
As this great 30-year summary makes clear, there was a ton of prior art.
Markup languages weren’t new
HTML identifies blocks of text as <P> (paragraph), <H1> (heading level 1), etc. and spans of text as <B> (bold), etc. The idea of marking up blocks of text instead of inserting typesetter codes for a particular printer wasn’t new, and HTML was a simplification of the existing SGML.
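As a rough illustration of how easy this kind of markup is for a machine to read, the sketch below uses Python's standard html.parser module to list the tags and text in a tiny fragment. The TagLister class, the list_tags helper, and the sample markup are made up for this example.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Collect the opening tags and the non-blank text of a fragment."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lowercase.
        self.tags.append(tag)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


def list_tags(markup: str):
    parser = TagLister()
    parser.feed(markup)
    parser.close()
    return parser.tags, parser.text


tags, text = list_tags("<H1>Widget 9000</H1><P>Install it <B>carefully</B>.</P>")
```

Here tags comes back as the block and span markers in document order, with no knowledge of fonts or printers needed.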
Hypertext wasn’t new
In fact, in a related article John Allsopp says “Tim Berners-Lee proposed a paper about the WWW for Hypertext ’91, a conference for hypertext theory and research. It was rejected! It was considered very simple in comparison with what hypertext systems were supposed to do.” This is a fantastic historical footnote, and the conference organizers weren’t stupid!
The moment you put technical information on a screen, it is completely obvious that the reader should be able to jump to an explanation of a technical term, to jump from an entry in the table of contents or index to the section that’s referenced, and to jump from the phrase “See How to Install the Widget 9000” to… the How to Install chapter. In a former life writing technical documentation printed on paper, I looked at electronically publishing manuals using hypertext systems like Folio and OWL. The programs that displayed hypertext resembled web browsers; they even had conventions like underlines for links and, as I recall, a back button to go back after following a link.
Yet another protocol…
Protocols to access remote computers over the Internet weren’t new. There was File Transfer Protocol to transfer files, Simple Mail Transfer Protocol to send and relay email, and even a Gopher protocol to browse information on that computer. (Many people using the Internet at the time thought Gopher would be the glue linking computers together.)
At nearly the same time, “Wide Area Information Server (WAIS) is a client–server text searching system that uses the ANSI Standard Z39.50 ‘Information Retrieval Service Definition and Protocol Specifications for Library Applications’ (Z39.50:1988) to search index databases on remote computers.”
So what was new?
In a nutshell, linking within hypertext to another computer system, to possibly get more hypertext, blew people’s fragile little minds.
Those hypertext systems I mentioned operated within a local file. You opened Widget9000Setup.NFO in the viewer program and happily jumped around between sections, index, and paragraphs, but there was no “jump to manufacturer’s server on the Internet for latest service bulletins,” there was no “Here’s a hypertext list of other hypertext files on the Internet about Widget 9000 customizations.” The companies selling hypertext authoring software probably fantasized about getting thousands of people to buy their proprietary software to author their own parts of a federated set of hypertexts, but they didn’t have the vision, and a single commercial vendor would have really struggled to establish their file format as a network standard.
A server is hard but powerful
Because links in HTML can go to other computers, the Web requires a separate program running on that remote computer to respond to requests (although you can open local files on your computer in your browser without involving a server). The hypertext software makers must have laughed. “So in addition to our hypertext viewer program running on the user’s computer, you have to get the I.T. department to install and run an extra software daemon to respond to all these requests for bits of hypertext? That is the stupidest and most overkill approach imaginable! Just give people the file with all 70 pages and illustrations of our Widget 9000 instruction manual in it.” Remember, by definition there were no manufacturers’ web sites yet, and people and computers communicated over slow modems. Making requests to other computers was theoretically useful, but not just to get the next little chunk of hypertext.
Because everything Sir Tim developed at CERN was open source, and because the HTTP protocol was relatively simple, and because Unix was very common on servers, it turned out that having to run an HTTP server wasn’t a big barrier.
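To give a feel for how small the barrier is, here is a minimal sketch of an HTTP server and a client request using only Python's standard library. It is not the original CERN httpd. The PageHandler class, the PAGE content, the /manual.html path, and the fetch_once helper are all assumptions for illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# A hypothetical one-page "site".
PAGE = b"<html><body><h1>Widget 9000</h1></body></html>"


class PageHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Answer every GET with the same small HTML document.
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        # Keep the example quiet; the default logs each request to stderr.
        pass


def fetch_once():
    """Start the server on a free local port, request one page, shut down."""
    server = HTTPServer(("127.0.0.1", 0), PageHandler)  # port 0: pick any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", "/manual.html")
        response = conn.getresponse()
        body = response.read()
        conn.close()
        return response.status, body
    finally:
        server.shutdown()
        server.server_close()
```

Calling fetch_once() returns the status code and the page bytes. The whole exchange is one request line plus headers in one direction and a status line, headers, and a document in the other.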
Uniform/Universal/Ubiquitous Resource Locator
The URL itself is genius. There were computers on the Internet that you could contact, mostly run by computer companies and universities and labs like CERN. Many of them supported File Transfer Protocol so you might be able to get a list of public files to download. Some of them even supported the Gopher and WAIS protocols I mentioned above that presented a friendlier list of files. But mostly if you connected to a computer, it was to log in and type computer commands. You could imagine a hypertext page having a “check for Widget 9000 availability” link that would connect to the company’s server as a remote terminal and maybe even simulate pressing C(heck for inventory) then typing Widget [Tab] 9000 [Enter] – all the pecking away at keyboards that staff used to do when you asked for a book at the library or checked in to a flight. But the poor hypertext author would have to write a little script for every single computer server. A URL can encode the request as a single thing that fits into the HTML page; it’s like a hypertext system’s “Jump to the Troubleshooting section” link but infinitely richer.
It’s human visible
The Widget 9000 availability URL is probably quite complex, maybe http://acmecorp.com/coyote/stock check.asp?part=widget9000. But you can see it in your browser’s location field, it probably makes sense, and it is irresistibly tempting to fiddle with it, aka hack on it: what if I substitute “ferrari355” for “widget9000”?
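That kind of fiddling is easy to do in code as well as in the location field. The sketch below swaps one query parameter with Python's standard urllib.parse. The with_param helper and the acmecorp.example address are hypothetical stand-ins, not the URL quoted above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_param(url: str, name: str, value: str) -> str:
    """Return url with the query parameter `name` set to `value`.

    Parameters other than `name` are kept as they are; if `name` is not
    present, the query string is unchanged.
    """
    parts = urlsplit(url)
    pairs = [(k, value if k == name else v) for k, v in parse_qsl(parts.query)]
    return urlunsplit(parts._replace(query=urlencode(pairs)))


# Hypothetical stock-check address, assumed for this example.
hacked = with_param(
    "http://acmecorp.example/stock?part=widget9000", "part", "ferrari355"
)
```

Whether the server has any Ferraris in stock is another matter, but the request itself is just an edited string.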
Similarly, you can view the source “code” of an HTML page. I taught myself the rudiments of HTML just by guessing or recognizing what tags like TITLE, P, A HREF=, etc. did. You could write the markup for something simple by hand; the home page of my web site and some other sections are still prehistoric hand-written HTML. (Those golden days are gone now that most web pages are generated on-site by over-complex content management systems and each loads 10 JavaScript libraries and 7 ad networks and Like this on Facebook / Tweet this / Pin it buttons.)
The Web could subsume other systems
Because the client (usually a browser under the command of a person) makes requests to a server program, the Web can subsume or impersonate other systems. A simple computer program can output a Gopher category list or a directory listing as a basic HTML page with a bulleted list of links. As the Web gained mindshare among developers, people built the bridges to all the other protocols, and so a browser turned into a do-anything tool, and URLs became the lingua franca for any kind of request across the Internet. Thirty years of innovation built upon the simple clear underlying ideas of the Web.
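The “simple computer program” really can be simple. Here is a minimal sketch that turns a list of file names into a basic HTML page with a bulleted list of links. The listing_to_html function and the file names are invented for this example, and a real gateway would get its names from a directory or a Gopher server rather than a hard-coded list.

```python
from html import escape
from urllib.parse import quote


def listing_to_html(title: str, names: list[str]) -> str:
    """Render file names as an HTML page with one link per name."""
    items = "\n".join(
        # quote() makes the name safe inside a URL; escape() makes it safe as text.
        f'<li><a href="{quote(name)}">{escape(name)}</a></li>'
        for name in names
    )
    heading = escape(title)
    return (
        f"<html><head><title>{heading}</title></head>\n"
        f"<body><h1>{heading}</h1>\n<ul>\n{items}\n</ul></body></html>"
    )


# Hypothetical directory contents.
page_html = listing_to_html("Widget 9000 files", ["setup guide.txt", "a&b.txt"])
```

Any browser can display the result, so the person following the links never needs to know what kind of system produced the list.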