and other attributes in web pages and other contexts that support (X)HTML, such as RSS. This approach allows software agents to process information intended for end-users (such as address book contacts, geographic coordinates, calendar events, and the like) automatically.
Although the content of web pages is technically already capable of "automated processing", and has been since the inception of the web, such processing is difficult because the traditional markup tags used to display information on the web do not describe what the information means. Microformats can bridge this gap by attaching semantics, and thereby obviate other, more complicated, methods of automated processing, such as natural language processing or screen scraping. The use, adoption and processing of microformats enable data items to be indexed, searched for, saved or cross-referenced, so that information can be reused or combined.
Microformats allow the encoding and extraction of events, contact information, social relationships and so on. More are being developed.
Background
Microformats emerged as part of a grassroots movement to make recognizable data items (such as events, contact details or geographical locations) capable of automated processing by software, as well as directly readable by end-users. Link-based microformats emerged first. These include vote links that express opinions of the linked page, which search engines can tally into instant polls.
CommerceNet, a nonprofit organization that promotes electronic commerce on the Internet, helped sponsor and promote the technology and supported the microformats community in various ways. CommerceNet also co-founded the Microformats.org community site.
Neither CommerceNet nor Microformats.org operates as a standards body. The microformats community functions through an open wiki, a mailing list, and an Internet Relay Chat (IRC) channel. Most of the existing microformats were created at the Microformats.org wiki and the associated mailing list, by a process of gathering examples of web publishing behaviour and then codifying it. Some other microformats (such as rel=nofollow and unAPI) have been proposed, or developed, elsewhere.
Technical overview
XHTML and HTML standards allow for the embedding and encoding of semantics within the attributes of markup tags. Microformats take advantage of these standards by indicating the presence of metadata using the following attributes:
class
rel
rev (in one case, otherwise deprecated in microformats)
For example, the text "The birds roosted at 52.48, -1.89" contains a pair of numbers which may be understood, from their context, to be a set of geographic coordinates. By wrapping them in spans (or other HTML elements) with specific class names (in this case geo, latitude and longitude, all part of the geo microformat specification):
The birds roosted at <span class="geo"><span class="latitude">52.48</span>, <span class="longitude">-1.89</span></span>
software agents can recognize exactly what each value represents and can then perform a variety of tasks such as indexing, locating it on a map and exporting it to a GPS device.
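How a software agent might recognize the geo values can be sketched in Python using only the standard library. This is a minimal illustration, not a conforming microformats parser: the names GeoParser and extract_geo are hypothetical, and the sketch assumes each latitude or longitude element holds a plain decimal number as its only text.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag; skipped so the depth count stays correct.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class GeoParser(HTMLParser):
    """Collect latitude/longitude pairs found inside class="geo" elements."""

    def __init__(self):
        super().__init__()
        self.open_classes = []  # class list of every currently open element
        self.current = None     # coordinate dict being filled, if inside a geo
        self.geo_depth = 0      # nesting depth at which the geo element opened
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        self.open_classes.append(classes)
        if "geo" in classes and self.current is None:
            self.current = {}
            self.geo_depth = len(self.open_classes)

    def handle_data(self, data):
        if self.current is None or not self.open_classes:
            return
        # Only the innermost open element decides what this text means.
        for name in ("latitude", "longitude"):
            if name in self.open_classes[-1]:
                self.current[name] = float(data.strip())

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.open_classes:
            return
        if self.current is not None and len(self.open_classes) == self.geo_depth:
            # The geo element itself is closing: the pair is complete.
            self.results.append(self.current)
            self.current = None
        self.open_classes.pop()


def extract_geo(html):
    parser = GeoParser()
    parser.feed(html)
    parser.close()
    return parser.results


SAMPLE_GEO = ('The birds roosted at <span class="geo">'
              '<span class="latitude">52.48</span>, '
              '<span class="longitude">-1.89</span></span>')
print(extract_geo(SAMPLE_GEO))
```

The parser ignores the surrounding sentence entirely; only the class names tell it which numbers are coordinates, which is the point of the markup.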
Example
In this example, the contact information is presented as follows:
With hCard microformat markup, that becomes:
Here, the formatted name (fn), organisation (org), telephone number (tel) and web address (url) have been identified using specific class names, and the whole thing is wrapped in class="vcard", which indicates that the other classes form an hCard (short for "HTML vCard") and are not merely coincidentally named. Other, optional, hCard classes also exist. Software, such as browser plug-ins, can now extract the information and transfer it to other applications, such as an address book.
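The extraction step described above can be sketched as follows. The markup in SAMPLE_HCARD is a hypothetical contact invented for illustration (it is not the article's own example), and HCardParser and extract_hcards are assumed names. The sketch handles only the four classes named above and keeps the first text found for each property.

```python
from html.parser import HTMLParser

HCARD_PROPERTIES = ("fn", "org", "tel", "url")
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class HCardParser(HTMLParser):
    """Collect fn/org/tel/url values found inside class="vcard" elements."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # per open element: the hCard properties it carries
        self.card = None         # card being filled, if inside a vcard element
        self.card_depth = 0
        self.cards = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        props = [c for c in classes if c in HCARD_PROPERTIES]
        self.open_elements.append(props)
        if "vcard" in classes and self.card is None:
            self.card = {}
            self.card_depth = len(self.open_elements)
        elif self.card is not None and "url" in props and attrs.get("href"):
            # For links, the machine-readable value is the href, not the link text.
            self.card["url"] = attrs["href"]

    def handle_data(self, data):
        text = data.strip()
        if self.card is None or not text:
            return
        # Text counts towards every property element open inside the card.
        for props in self.open_elements[self.card_depth:]:
            for name in props:
                self.card.setdefault(name, text)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.open_elements:
            return
        if self.card is not None and len(self.open_elements) == self.card_depth:
            self.cards.append(self.card)
            self.card = None
        self.open_elements.pop()


def extract_hcards(html):
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    return parser.cards


# Hypothetical contact details, for illustration only.
SAMPLE_HCARD = """
<ul class="vcard">
  <li class="fn">Jane Example</li>
  <li class="org">Example Widgets Ltd</li>
  <li class="tel">555-0100</li>
  <li><a class="url" href="http://example.com/">http://example.com/</a></li>
</ul>
"""
print(extract_hcards(SAMPLE_HCARD))
```

Because properties are only collected while a class="vcard" element is open, an unrelated element that happens to use class="org" elsewhere on the page is ignored, which mirrors the "not merely coincidentally named" rule.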
Specific microformats
Several microformats have been developed to enable semantic markup of particular types of information.
hAtom – for blog entries and similar chronological content, modelled on the Atom syndication format
hCalendar – for events
hCard – for contact information
Geo – for geographical coordinates (latitude, longitude)
hNews – for news content
hProduct – for products
hRecipe – for recipes and foodstuffs
hResume – for résumés or curricula vitae
hReview – for reviews
rel-directory – for distributed directory creation and inclusion
rel-enclosure – for multimedia attachments to web pages
rel-nofollow – an attempt to discourage third-party content spam (e.g. spam in blogs)
rel-tag – for tagging (folksonomy)
XHTML Friends Network (XFN) – for social relationships
XOXO – for lists and outlines
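Several of the formats listed above are link-based: the meaning is carried by the rel attribute of an ordinary hyperlink. A minimal sketch of collecting such links follows; RelLinkParser and links_by_rel are hypothetical names, the sample URLs are invented, and "friend" and "met" are used as examples of XFN relationship keywords.

```python
from html.parser import HTMLParser


class RelLinkParser(HTMLParser):
    """Record the href and rel keywords of every <a> element that has both."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        # rel holds a space-separated list of keywords.
        rel_values = (attrs.get("rel") or "").lower().split()
        if rel_values and attrs.get("href"):
            self.links.append((attrs["href"], rel_values))


def links_by_rel(html):
    """Group link targets by rel keyword, e.g. {"tag": [...], "nofollow": [...]}."""
    parser = RelLinkParser()
    parser.feed(html)
    parser.close()
    grouped = {}
    for href, rel_values in parser.links:
        for value in rel_values:
            grouped.setdefault(value, []).append(href)
    return grouped


# Hypothetical links, for illustration only.
SAMPLE_LINKS = (
    '<a rel="tag" href="http://example.com/tags/birds">birds</a> '
    '<a rel="friend met" href="http://example.org/alice">Alice</a> '
    '<a rel="nofollow" href="http://example.net/ad">an advert</a> '
    '<a href="http://example.com/plain">no rel at all</a>'
)
print(links_by_rel(SAMPLE_LINKS))
```

A consumer would then act on each group differently: a search engine might discount the nofollow targets, while a social tool might read the friend and met links as relationships.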
Microformats under development
Among the many proposed microformats, the following are undergoing active development:
hAudio – for audio files and references to released recordings
citation – for citing references
currency – for amounts of money
figure – for associating captions with images
geo extensions – for places on Mars, the Moon, and other such bodies; for altitude; and for collections of waypoints marking routes or boundaries
species – for the names of living things (already used by Wikipedia and the BBC Wildlife Finder)
measure – for physical quantities, structured data-values
Uses of microformats
Using microformats within HTML code provides additional formatting and semantic data that applications can use. For example, applications such as web crawlers can collect data about on-line resources, and desktop applications such as e-mail clients or scheduling software can compile details. The use of microformats can also facilitate "mash ups", such as exporting all of the geographical locations on a web page into (for example) Google Maps.
Browser extensions, such as Operator for Mozilla Firefox, and add-ons for Windows Internet Explorer provide the ability to detect microformats within an HTML document. When hCard or hCalendar are involved, such browser extensions allow them to be exported into formats compatible with contact management and calendar utilities, such as Microsoft Outlook. When dealing with geographical coordinates, they allow the location to be sent to mapping applications such as Google Maps. Yahoo! Query Language can be used to extract microformats from web pages.
On 12 May 2009, Google announced that it would be parsing the hCard, hReview and hProduct microformats and using them to populate search result pages. It has since extended this to use hCalendar for events and hRecipe for cookery recipes. Similarly, microformats are also consumed by Bing and Yahoo!. Together, these are the world's top three search engines.
Microsoft expressed a desire to incorporate microformats into upcoming projects, as have other software companies.
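The export step mentioned above, from extracted hCard properties to a format that contact managers can import, might look like the sketch below. It is deliberately simplified: to_vcard and escape_text are hypothetical helpers, the input dictionary is invented, and the output omits properties that a complete vCard would normally carry (such as the structured name N), so it should be read as an outline of the idea rather than a compliant exporter.

```python
def escape_text(value):
    """Escape characters that are special in vCard text values."""
    return (value.replace("\\", "\\\\")
                 .replace("\n", "\\n")
                 .replace(",", "\\,")
                 .replace(";", "\\;"))


def to_vcard(card):
    """Serialize a dict of hCard properties as simplified vCard 3.0 text."""
    lines = ["BEGIN:VCARD", "VERSION:3.0"]
    if card.get("fn"):
        lines.append("FN:" + escape_text(card["fn"]))
    if card.get("org"):
        lines.append("ORG:" + escape_text(card["org"]))
    if card.get("tel"):
        lines.append("TEL:" + card["tel"])
    if card.get("url"):
        lines.append("URL:" + card["url"])
    lines.append("END:VCARD")
    # vCard content lines are terminated with CRLF.
    return "\r\n".join(lines) + "\r\n"


# Hypothetical contact, shaped like the output of an hCard extractor.
SAMPLE_CARD = {
    "fn": "Jane Example",
    "org": "Example Widgets, Ltd",
    "tel": "555-0100",
    "url": "http://example.com/",
}
print(to_vcard(SAMPLE_CARD))
```

Keeping extraction and export separate means the same extracted dictionary could equally feed a calendar entry, a map link or a search index.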
Alex Faaborg summarizes the arguments for putting the responsibility for microformat user interfaces in the web browser rather than making more complicated HTML:
Only the web browser knows what applications are accessible to the user and what the user's preferences are
It lowers the barrier to entry for web site developers if they only need to do the markup and not handle "appearance" or "action" issues
It retains backwards compatibility with web browsers that don't support microformats
The web browser presents a single point of entry from the web to the user's computer, which simplifies security issues
Evaluation of microformats
Various commentators have reviewed and discussed the design principles and practical aspects of microformats. Microformats have also been compared to other approaches that seek to serve the same or a similar purpose. Individual microformats, and microformats as a whole, have drawn criticism from time to time, and there are documented efforts to advocate their spread and use. Håkon Wium Lie, Chief Technology Officer of Opera Software and the proposer of Cascading Style Sheets, said in 2005: "We will also see a bunch of microformats being developed, and that's how the semantic web will be built, I believe." However, in August 2008 Toby Inkster, author of the "Swignition" (formerly "Cognition") microformat parsing service, pointed out that no new microformat specifications had been published for over three years.
Rohit Khare, a computer science entrepreneur and former Director of CommerceNet Labs, stated that "reduce, reuse, and recycle" is "shorthand for several design principles" that motivated the development and practices behind microformats. These aspects can be summarized as follows:
Reduce: favor the simplest solutions and focus attention on specific problems;
Reuse: work from experience and favor examples of current practice;
Recycle: encourage modularity and the ability to embed; valid XHTML can be reused in blog posts, RSS feeds, and anywhere else you can access the web.
Accessibility
Because some microformats use the title attribute of HTML's abbr element to conceal machine-readable data (particularly date-times and geographical coordinates) in the "abbr design pattern", the plain text content of the element is inaccessible to those screen readers that expand abbreviations. In June 2008, the BBC announced that it would be dropping use of microformats using the abbr design pattern because of accessibility concerns.
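The tension in the abbr design pattern is easier to see when the two values are placed side by side. In the sketch below the sample markup is hypothetical (dtstart is used as an example of a date-time class), and AbbrPatternParser and read_abbr_values are assumed names. The title attribute carries the machine-readable value, while the element content carries the human-readable one.

```python
from html.parser import HTMLParser


class AbbrPatternParser(HTMLParser):
    """Pair each classed <abbr title="..."> machine value with its visible text."""

    def __init__(self):
        super().__init__()
        self.pending = None  # (classes, title, text parts) for the open <abbr>
        self.values = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "abbr" and attrs.get("title") and attrs.get("class"):
            self.pending = (attrs["class"].split(), attrs["title"], [])

    def handle_data(self, data):
        if self.pending is not None:
            self.pending[2].append(data)

    def handle_endtag(self, tag):
        if tag == "abbr" and self.pending is not None:
            classes, title, parts = self.pending
            self.values.append({
                "classes": classes,
                "machine": title,                  # what a parser consumes
                "visible": "".join(parts).strip(),  # what a sighted reader sees
            })
            self.pending = None


def read_abbr_values(html):
    parser = AbbrPatternParser()
    parser.feed(html)
    parser.close()
    return parser.values


# Hypothetical event markup, for illustration only.
SAMPLE_ABBR = ('The talk starts on <abbr class="dtstart" '
               'title="2008-06-23T10:00">23 June at 10am</abbr>.')
print(read_abbr_values(SAMPLE_ABBR))
```

A screen reader that expands abbreviations announces the "machine" string in place of the "visible" one, so the listener hears a raw timestamp rather than the friendly date.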
Comparison with alternative approaches
Microformats are not the only solution for providing "more intelligent data" on the web. Alternative approaches exist and are under development as well. For example, the use of XML markup and the standards of the Semantic Web are cited as alternative approaches. Some contrast these with microformats in that they do not necessarily coincide with the design principles of "reduce, reuse, and recycle", at least not to the same extent.
Tantek Çelik, a computer scientist and advocate of microformats, characterized a problem with alternative approaches:
For some applications the use of other approaches may be valid. If one wishes to use microformat-style embedding but the type of data one wishes to embed does not map to an existing microformat, one can use RDFa to embed arbitrary vocabularies into HTML, for example: embedding domain-specific scientific data on the Web like zoological or chemical data where no microformat for such data exists. Furthermore, standards such as W3C's GRDDL allow microformats to be converted into data compatible with the Semantic Web.
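To give a feel for what "data compatible with the Semantic Web" can mean, the sketch below writes already-extracted geo values as RDF statements in N-Triples syntax, using the W3C Basic Geo (WGS84) vocabulary. This is not GRDDL, which defines how transformations are attached to documents; it is only an illustration of a possible end result, and geo_to_ntriples and the blank-node naming are assumptions.

```python
GEO_NS = "http://www.w3.org/2003/01/geo/wgs84_pos#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def geo_to_ntriples(points):
    """Render latitude/longitude dicts as N-Triples, one blank node per point."""
    lines = []
    for index, point in enumerate(points):
        node = f"_:place{index}"
        lines.append(f"{node} <{RDF_TYPE}> <{GEO_NS}Point> .")
        lines.append(f'{node} <{GEO_NS}lat> "{point["latitude"]}" .')
        lines.append(f'{node} <{GEO_NS}long> "{point["longitude"]}" .')
    return "\n".join(lines)


# Coordinates shaped like the output of a geo microformat extractor.
SAMPLE_POINTS = [{"latitude": 52.48, "longitude": -1.89}]
print(geo_to_ntriples(SAMPLE_POINTS))
```

Once the values are expressed as triples they can be merged with data from other sources that use the same vocabulary, which is the kind of reuse the Semantic Web approaches aim for.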
Another advocate of microformats, Ryan King, put the compatibility of microformats with other approaches this way: