Tag cloud
A tag cloud is a visual representation of text data, typically used to depict keyword metadata (tags) on websites or to visualize free-form text. 'Tags' are usually single words, and the importance of each tag is shown with font size or color. This format is useful for quickly perceiving the most prominent terms and for locating a term alphabetically to determine its relative prominence. When used as website navigation aids, the terms are hyperlinked to items associated with the tag.
History
In the language of visual design, a tag cloud (or word cloud) is one kind of "weighted list", as commonly used on geographic maps, where relative typeface size indicates the relative size of cities. An early printed example of a weighted list of English keywords was the "subconscious files" in Douglas Coupland's Microserfs (1995). A German-language example appeared in 1992.
The specific visual form and common use of the term "tag cloud" rose to prominence in the first decade of the 21st century as a widespread feature of early Web 2.0 websites and blogs, used primarily to visualize the frequency distribution of keyword metadata that describe website content, and as a navigation aid.
The first tag clouds on a high-profile website appeared on the photo-sharing site Flickr in 2004; they were created by Flickr co-founder and interaction designer Stewart Butterfield. That implementation was based on Jim Flanagan's Search Referral Zeitgeist, a visualization of website referrers. Tag clouds were also popularized around the same time by Del.icio.us and Technorati, among others.
Over-saturation of the tag cloud method and ambivalence about its utility as a web-navigation tool led to a noted decline in usage among these early adopters. (Flickr would later "apologize" to the web-development community in its five-word acceptance speech for the 2006 "Best Practices" Webby Award, simply stating "sorry about the tag clouds.")
A second generation of software found a wider range of uses for tag clouds as a basic visualization method for text data. Most notably, the method was adapted for visualizing word frequency in free-form natural language texts, first by TagCrowd, created by Stanford University researcher and designer Daniel Steinbock in 2006, and later, more widely, by Wordle, created by IBM researcher Jonathan Feinberg in 2008.
Types
There are three main types of tag cloud applications in social software, distinguished by their meaning rather than their appearance. In the first type, tag sizes reflect how often each tag has been applied to a single item, whereas the second type consists of global tag clouds in which frequencies are aggregated over all items and users. In the third type, the cloud contains categories, with size indicating the number of subcategories.
In the first type, size represents the number of times that tag has been applied to a single item. This is useful as a means of displaying metadata about an item that has been democratically 'voted' on and where precise results are not desired. Examples of such use include Last.fm (to indicate genres attributed to bands) and LibraryThing (to indicate tags attributed to a book).
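A minimal Python sketch of the first type, assuming each tagging event is stored as a (user, item, tag) triple. The data, item names and the function `item_tag_counts` are hypothetical; the weight of a tag is simply the number of times it was applied to the chosen item.

```python
from collections import Counter

# Hypothetical tagging events: (user, item, tag) triples.
assignments = [
    ("ann", "band:radiohead", "rock"),
    ("bob", "band:radiohead", "rock"),
    ("cat", "band:radiohead", "electronic"),
    ("ann", "band:bjork", "electronic"),
]

def item_tag_counts(assignments, item):
    # Type 1: count how many times each tag was applied to this one item.
    return Counter(tag for _user, it, tag in assignments if it == item)

print(dict(item_tag_counts(assignments, "band:radiohead")))  # {'rock': 2, 'electronic': 1}
```

The resulting counts would then be mapped to font sizes to draw the cloud for that single item.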
In the second, more commonly used type, size represents the number of items to which a tag has been applied, as a presentation of each tag's popularity. This type of tag cloud is used, for example, on the image-hosting service Flickr, on the blog aggregator Technorati, and on Google search results with DeeperWeb.
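For the second type, the weight of a tag is the number of distinct items it has been applied to, aggregated over all users. This Python sketch assumes hypothetical (user, item, tag) triples; the names and data are illustrative only.

```python
# Hypothetical tagging events: (user, item, tag) triples.
assignments = [
    ("ann", "band:radiohead", "rock"),
    ("bob", "band:radiohead", "rock"),
    ("cat", "band:radiohead", "electronic"),
    ("ann", "band:bjork", "electronic"),
]

def global_tag_counts(assignments):
    # Type 2: number of distinct items each tag has been applied to,
    # aggregated over all items and users.
    items_per_tag = {}
    for _user, item, tag in assignments:
        items_per_tag.setdefault(tag, set()).add(item)
    return {tag: len(items) for tag, items in items_per_tag.items()}

print(global_tag_counts(assignments))  # {'rock': 1, 'electronic': 2}
```

Here "rock" was applied twice but to only one item, so its global weight is 1, while "electronic" covers two items.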
In the third type, tags are used as a categorization method for content items. Tags are represented in a cloud where larger tags indicate more content items in that category.
Some approaches construct tag clusters instead of tag clouds, e.g. by using tag co-occurrences in documents.
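One simple way to turn co-occurrences into clusters, assumed here for illustration rather than taken from any specific published approach, is to link two tags whenever they appear together in at least `min_count` documents and then take the connected components of that graph. The function names and the threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations

def tag_cooccurrences(documents):
    """Count how many documents each unordered pair of tags appears in together."""
    pairs = Counter()
    for tags in documents:
        # sorted(set(...)) so each unordered pair is counted once per document
        for a, b in combinations(sorted(set(tags)), 2):
            pairs[(a, b)] += 1
    return pairs

def cluster_tags(documents, min_count=2):
    """Connected components of the co-occurrence graph, keeping only
    edges whose pair count is at least min_count."""
    parent = {tag: tag for tags in documents for tag in tags}

    def find(tag):
        # Follow parent links to the cluster representative (with path halving).
        while parent[tag] != tag:
            parent[tag] = parent[parent[tag]]
            tag = parent[tag]
        return tag

    for (a, b), n in tag_cooccurrences(documents).items():
        if n >= min_count:
            parent[find(a)] = find(b)

    clusters = {}
    for tag in list(parent):
        clusters.setdefault(find(tag), set()).add(tag)
    return sorted(sorted(c) for c in clusters.values())

docs = [["python", "code", "snake"], ["python", "code"], ["snake", "zoo"]]
print(cluster_tags(docs))  # [['code', 'python'], ['snake'], ['zoo']]
```

Lowering `min_count` merges more tags into fewer, larger clusters; with `min_count=1` all four tags above end up in one cluster.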
More generally, the same visual technique can be used to display non-tag data, as in a word cloud or a data cloud.
The term keyword cloud is sometimes used as a search engine marketing (SEM) term that refers to a group of keywords that are relevant to a specific website. In recent years tag clouds have gained popularity because of their role in search engine optimization of web pages. Used as navigation tools, tag clouds make a website appear more interlinked when it is crawled by a search engine spider, which may improve the site's search engine rank.
Visual appearance
Tag clouds are typically represented using inline HTML elements. The tags can appear in alphabetical order, in random order, sorted by weight, and so on. Sometimes further visual properties are manipulated in addition to font size, such as font color, intensity, or weight. The most popular arrangement is a rectangular layout with alphabetical sorting, filled sequentially line by line. The choice of an optimal layout should be driven by the expected user goals. Some prefer to cluster the tags semantically so that similar tags appear near each other. Heuristics can be used to reduce the size of the tag cloud, whether or not the purpose is to cluster the tags.
Data clouds
A data cloud or cloud data is a data display that uses font size and/or color to indicate numerical values. It is similar to a tag cloud, but instead of word counts it displays data such as population or stock market prices.
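Whether the weights are tag counts or, as in a data cloud, numeric values such as populations, the inline-HTML rendering described under Visual appearance can be sketched in Python. The function name, the 12 to 32 px range and the linear pixel scaling are assumptions chosen for illustration; only alphabetical and by-weight ordering are shown.

```python
import html

def render_cloud(weights, min_px=12, max_px=32, order="alpha"):
    """Render {term: weight} as inline HTML <span> elements with scaled font sizes."""
    lo, hi = min(weights.values()), max(weights.values())
    span = (hi - lo) or 1  # avoid division by zero when all weights are equal
    if order == "alpha":
        terms = sorted(weights)
    else:  # "weight": heaviest terms first
        terms = sorted(weights, key=weights.get, reverse=True)
    parts = []
    for term in terms:
        # Linearly map the weight onto the pixel range [min_px, max_px].
        px = round(min_px + (max_px - min_px) * (weights[term] - lo) / span)
        parts.append(f'<span style="font-size:{px}px">{html.escape(term)}</span>')
    return " ".join(parts)

print(render_cloud({"b": 1, "a": 3}))
# <span style="font-size:32px">a</span> <span style="font-size:12px">b</span>
```

Escaping the term with `html.escape` keeps user-supplied tags from injecting markup into the page.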
Text clouds
A text cloud or word cloud is a visualization of word frequency in a given text as a weighted list. The technique has recently become popular for visualizing the topical content of political speeches.
Collocate clouds
Extending the principles of a text cloud, a collocate cloud provides a more focused view of a document or corpus. Instead of summarising an entire document, the collocate cloud examines the usage of a particular word. The resulting cloud contains the words that are often used in conjunction with the search word. These collocates are formatted to show frequency (as size) as well as collocational strength (as brightness). This provides an interactive way to browse and explore language.
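The text does not say how collocational strength is measured. The Python sketch below assumes a simple window-based pointwise mutual information (PMI) score as the strength measure; the window size and the function name `collocates` are hypothetical. Frequency would drive font size and PMI brightness.

```python
import math
from collections import Counter

def collocates(tokens, node, window=2):
    """Return {collocate: (frequency, pmi)} for words within `window` tokens of `node`."""
    total = len(tokens)
    word_freq = Counter(tokens)
    co_freq = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        # Count every neighbour inside the window around this occurrence.
        for j in range(max(0, i - window), min(total, i + window + 1)):
            if j != i:
                co_freq[tokens[j]] += 1
    result = {}
    for word, f in co_freq.items():
        # PMI-style score: observed co-occurrence vs. what independence would predict.
        pmi = math.log2(f * total / (word_freq[node] * word_freq[word]))
        result[word] = (f, pmi)
    return result

tokens = "strong tea and strong coffee but weak tea".split()
print(collocates(tokens, "tea", window=1))
# {'strong': (1, 1.0), 'and': (1, 2.0), 'weak': (1, 2.0)}
```

Rarer words that still co-occur with the search word get a higher score, which is why "weak" outranks the more frequent "strong" in this tiny example.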
Perception of tag clouds
Tag clouds have been the subject of several usability studies. The following summary is based on an overview of research results given by Lohmann et al.:
- Tag size: Large tags attract more user attention than small tags (effect influenced by further properties, e.g., number of characters, position, neighboring tags).
- Scanning: Users scan rather than read tag clouds.
- Centering: Tags in the middle of the cloud attract more user attention than tags near the borders (effect influenced by layout).
- Position: The upper left quadrant receives more user attention than the others (Western reading habits).
- Exploration: Tag clouds provide suboptimal support when searching for specific tags (unless those tags have a very large font size).
Creation of a tag cloud
In principle, the font size of a tag in a tag cloud is determined by its incidence. For a cloud of weblog categories, for example, the frequency of use corresponds to the number of weblog entries assigned to a category. For small frequencies, it is sufficient to use the count directly as the font size, for any count from one up to a maximum font size. For larger values, the counts should be scaled. In a linear normalization, the weight $t_i$ of a descriptor is mapped to a size scale of 1 through $f_{\max}$, where $t_{\min}$ and $t_{\max}$ specify the range of available weights:

$$s_i = \begin{cases} \left\lceil \dfrac{f_{\max} \cdot (t_i - t_{\min})}{t_{\max} - t_{\min}} \right\rceil & \text{for } t_i > t_{\min} \\ 1 & \text{otherwise} \end{cases}$$

- $s_i$: display font size
- $f_{\max}$: max. font size
- $t_i$: count
- $t_{\min}$: min. count
- $t_{\max}$: max. count
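A direct Python transcription of the linear normalization described above, assuming counts are positive integers. The function names and sample counts are hypothetical.

```python
import math

def linear_font_size(count, t_min, t_max, f_max):
    # s_i = ceil(f_max * (t_i - t_min) / (t_max - t_min)) when t_i > t_min, else 1.
    # If count > t_min then t_max > t_min as well, so the division is safe.
    if count > t_min:
        return math.ceil(f_max * (count - t_min) / (t_max - t_min))
    return 1

def font_sizes(counts, f_max=6):
    # counts: dict mapping tag -> number of occurrences (t_i)
    t_min, t_max = min(counts.values()), max(counts.values())
    return {tag: linear_font_size(c, t_min, t_max, f_max) for tag, c in counts.items()}

counts = {"python": 50, "rust": 20, "go": 5, "cobol": 1}
print(font_sizes(counts))  # {'python': 6, 'rust': 3, 'go': 1, 'cobol': 1}
```

With a skewed distribution like this one, most tags collapse toward the smallest size, which motivates the logarithmic variant.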
Since the number of indexed items per descriptor usually follows a power law, a logarithmic representation makes sense for larger ranges of values.
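One common way to apply this, assumed here rather than prescribed by the text, is to run the same linear normalization on the logarithms of the counts. The Python sketch assumes every count is at least 1; the names and sample counts are hypothetical. A purely linear mapping of the same counts would give sizes 6, 3, 1 and 1.

```python
import math

def log_font_size(count, t_min, t_max, f_max):
    # Linear normalization applied to log(count) instead of count.
    # Assumes every count is >= 1 so that math.log() is defined.
    if count > t_min:
        ratio = (math.log(count) - math.log(t_min)) / (math.log(t_max) - math.log(t_min))
        return math.ceil(f_max * ratio)
    return 1

def log_font_sizes(counts, f_max=6):
    t_min, t_max = min(counts.values()), max(counts.values())
    return {tag: log_font_size(c, t_min, t_max, f_max) for tag, c in counts.items()}

counts = {"python": 50, "rust": 20, "go": 5, "cobol": 1}
print(log_font_sizes(counts))  # {'python': 6, 'rust': 5, 'go': 3, 'cobol': 1}
```

Compressing the large counts spreads the mid-range tags over more distinct sizes.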
Implementations of tag clouds also include text parsing and filtering out unhelpful tags such as common words, numbers, and punctuation.
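A minimal Python sketch of such parsing and filtering for a word cloud: it lowercases the text, keeps only runs of ASCII letters (which drops numbers and punctuation) and removes stop words. The tiny stop-word list is a hypothetical placeholder; real implementations use much larger lists and better tokenization.

```python
import re
from collections import Counter

# Illustrative stop-word list only.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}

def word_counts(text):
    # Keep only runs of letters, which drops numbers and punctuation.
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)

print(dict(word_counts("The cloud, the tags and 42 clouds of tags.")))
# {'cloud': 1, 'tags': 2, 'clouds': 1}
```

The resulting counts can then be scaled to font sizes as described above.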
Some websites also create artificially or randomly weighted tag clouds, for advertising or for humorous effect.
External links
- Understanding Tag Clouds - an information design analysis of tag clouds
- Tag Clouds Gallery: Examples and Good Practices - comparison of tag cloud visual designs
- Design tips for building tag clouds - software development guide from O'Reilly's ONLamp
- Wordle - web application for creating artistic word clouds from user-supplied text
- TagCrowd - the first web-based word cloud generator, specialized for analysis and visualization of user-supplied text