Thesaurus
A thesaurus is a reference work that lists words grouped together according to similarity of meaning (containing synonyms and sometimes antonyms), in contrast to a dictionary, which contains definitions and pronunciations. The largest thesaurus in the world is the Historical Thesaurus of the Oxford English Dictionary, which contains more than 920,000 entries.
History and use of term
In antiquity, Philo of Byblos authored the first text that could now be called a thesaurus. In Sanskrit, the Amarakosha is a thesaurus in verse form, written in the 4th century. The first example of the modern genre, Roget's Thesaurus, was compiled in 1805 by Peter Mark Roget, and published in 1852. Entries in Roget's Thesaurus are listed conceptually rather than alphabetically.
Although it includes synonyms, a thesaurus should not be taken as a complete list of all the synonyms for a particular word. The entries are also designed for drawing distinctions between similar words and assisting in choosing exactly the right word. Unlike a dictionary entry, a thesaurus entry does not define the words it lists.
The word "thesaurus" is derived from 16th-century New Latin, in turn from Latin thesaurus, which is the latinisation of the Greek θησαυρός (thēsauros), literally "treasure store", generally meaning a collection of things which are of great importance or value (and thus the medieval rank of thesaurer was a synonym for treasurer). This meaning has been largely supplanted by Roget's usage of the term.
Thesauri in IT
In information science, library science, and information technology, specialized thesauri are designed for information retrieval. They are a type of controlled vocabulary used for indexing or tagging. Such a thesaurus can be used as the basis of an index for online material. The Art and Architecture Thesaurus, for example, is used to index the Canadian
Information retrieval thesauri are formally organized so that existing relationships between concepts are made explicit. As a result, they are more complex than simpler controlled vocabularies such as authority lists and synonym rings. Each term is placed in context, allowing a user to distinguish between "bureau" the office and "bureau" the furniture. Following international standards, they are generally arranged hierarchically by themes, topics or facets. Unlike a literary thesaurus, these specialized thesauri typically focus on one discipline, subject or field of study.
In information technology, a thesaurus represents a database or list of semantically orthogonal topical search keys. In the field of artificial intelligence, a thesaurus may sometimes be referred to as an ontology.
Thesauri for information retrieval are typically constructed by information specialists, and have their own unique vocabulary defining different kinds of terms and relationships:
Terms are the basic semantic units for conveying concepts. They are usually single-word nouns, since nouns are the most concrete part of speech. Verbs can be converted to nouns – "cleans" to "cleaning", "reads" to "reading", and so on. Adjectives and adverbs, however, seldom convey any meaning useful for indexing. When a term is ambiguous, a "scope note" can be added to ensure consistency, and give direction on how to interpret the term. Not every term needs a scope note, but their presence is of considerable help in using a thesaurus correctly and reaching a correct understanding of the given field of knowledge.
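As a rough illustration of the points about terms, a term record might carry an optional scope note and, for homographs such as "bureau", a parenthetical qualifier. This is only a sketch: the class name, field names and scope-note wording below are hypothetical and are not taken from any standard or existing thesaurus.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Term:
    """One thesaurus term (hypothetical structure, for illustration only)."""
    label: str                        # the word or phrase itself
    qualifier: Optional[str] = None   # disambiguates homographs
    scope_note: Optional[str] = None  # guidance on how to apply the term

    def display(self) -> str:
        """Return the label, with the qualifier in parentheses if there is one."""
        if self.qualifier:
            return f"{self.label} ({self.qualifier})"
        return self.label


# Two senses of "bureau", kept apart by qualifiers; the scope notes are invented.
office = Term("Bureau", qualifier="office",
              scope_note="Use for administrative units and agencies.")
desk = Term("Bureau", qualifier="furniture",
            scope_note="Use for writing desks and chests of drawers.")
# The verb "cleans" entered in its noun form, with no scope note.
cleaning = Term("Cleaning")

for term in (office, desk, cleaning):
    print(term.display(), "-", term.scope_note or "no scope note")
```

Keeping the qualifier separate from the label lets the two senses of "bureau" be stored as distinct terms while still sorting together.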
"Term relationships" are links between terms. These relationships can be divided into three types: hierarchical, equivalency or associative.
- Hierarchical relationships are used to indicate terms which are narrower and broader in scope. A "Broader Term" (BT) or hyperonym is a more general term, e.g. “Apparatus” is a generalization of “Computers”. Reciprocally, a Narrower Term (NT) or hyponym is a more specific term, e.g. “Digital Computer” is a specialization of “Computer”. BT and NT are reciprocals; a broader term necessarily implies at least one other term which is narrower. BT and NT are used to indicate class relationships, as well as part-whole relationships (meronyms and holonyms).
- The equivalency relationship is used primarily to connect synonyms and near-synonyms. Use (USE) and Used For (UF) indicators are used when an authorized term is to be used for another, unauthorized, term; for example, the entry for the authorized term "Frequency" could have the indicator "UF Pitch". Reciprocally, the entry for the unauthorized term "Pitch" would have the indicator "USE Frequency". Unauthorized terms are often called "entry vocabulary", "entry points", "lead-in terms", or "non-preferred terms", pointing to the authorized term (also referred to as the Preferred Term or Descriptor) that has been chosen to stand for the concept. As such, their presence in text can be used by automated indexing software to suggest the Preferred Term being used as an Indexing Term.
- Associative relationships are used to connect two related terms whose relationship is neither hierarchical nor equivalent. This relationship is described by the indicator "Related Term" (RT). Associative relationships should be applied with caution, since excessive use of RTs will reduce specificity in searches. Consider the following: if the typical user is searching with term "A", would they also want resources tagged with term "B"? If the answer is no, then an associative relationship should not be established.
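The three relationship types can be sketched as a small in-memory structure. This is a minimal illustration, assuming a simple dictionary-of-sets representation; the class and method names are hypothetical, and real thesaurus software will differ. The example terms come from the text above, except the RT pair, which is made up.

```python
from collections import defaultdict


class Thesaurus:
    """Toy store for BT/NT, USE/UF and RT links (illustrative only)."""

    def __init__(self):
        self.broader = defaultdict(set)   # term -> its broader terms (BT)
        self.narrower = defaultdict(set)  # term -> its narrower terms (NT)
        self.related = defaultdict(set)   # term -> its related terms (RT)
        self.used_for = defaultdict(set)  # preferred -> non-preferred (UF)
        self.use = {}                     # non-preferred -> preferred (USE)

    def add_hierarchy(self, broader, narrower):
        # BT and NT are reciprocal, so both directions are recorded together.
        self.broader[narrower].add(broader)
        self.narrower[broader].add(narrower)

    def add_equivalence(self, preferred, non_preferred):
        # UF on the preferred term mirrors USE on the non-preferred term.
        self.used_for[preferred].add(non_preferred)
        self.use[non_preferred] = preferred

    def add_related(self, a, b):
        # RT links are stored symmetrically.
        self.related[a].add(b)
        self.related[b].add(a)

    def preferred(self, term):
        """Map a lead-in term to its preferred term; others pass through."""
        return self.use.get(term, term)

    def expand(self, term):
        """Return the preferred term plus all of its transitive narrower terms."""
        start = self.preferred(term)
        seen, stack = {start}, [start]
        while stack:
            current = stack.pop()
            for nt in self.narrower.get(current, ()):
                if nt not in seen:
                    seen.add(nt)
                    stack.append(nt)
        return seen


t = Thesaurus()
t.add_hierarchy("Apparatus", "Computers")
t.add_hierarchy("Computers", "Digital computers")
t.add_equivalence("Frequency", "Pitch")
t.add_related("Computers", "Software")  # made-up RT pair

print(t.preferred("Pitch"))           # Frequency
print(sorted(t.expand("Apparatus")))  # ['Apparatus', 'Computers', 'Digital computers']
```

Here `expand` follows only NT links. Leaving RT links out of query expansion reflects the caution above that excessive use of RTs reduces specificity in searches.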
Literary thesauri
- Thesaurus of English Words & Phrases (ed. P. Roget); ISBN 0-06-272037-6, see: Roget's Thesaurus
- World Thesaurus (ed. C. Laird); ISBN 0-671-51983-2. This edition has been used in successive editions since 1971 by Webster's.
- Oxford American Desk Thesaurus (ed. C. Lindberg); ISBN 0-19-512674-2
- Oxford Paperback Thesaurus: Third Edition; ISBN 978-0-19-861425-8
- Random House Word Menu by Stephen Glazier; ISBN 0-679-40030-3
- Historical Thesaurus of English (HTE), http://www.arts.gla.ac.uk/SESLL/EngLang/thesaur/toe1.htm
- WordNet
- OpenThesaurus
- The Well-Spoken Thesaurus by Tom Heehler; ISBN 978-1402243059
Specialized thesauri for information retrieval
- NAL Agricultural Thesaurus, (United States National Agricultural Library, United States Department of Agriculture)
- European Thesaurus on International Relations and Area Studies; ISBN 978-3-927674-11-0
- Evaluation Thesaurus (by M. Scriven); ISBN 0-8039-4364-4
- Thesaurus of Psychological Index Terms (APA); ISBN 1-55798-775-0
- Clinician's Thesaurus, (by E.Zuckerman); ISBN 1-57230-569-X
- Art and Architecture Thesaurus, (Getty Institute)
- Eurovoc Thesaurus, (Europa Publications Office)
- AGROVOC Thesaurus, (Food and Agriculture Organization of the United Nations)
- GEMET - GEneral Multilingual Environmental Thesaurus, (European Environment Agency)
- Medical Subject Headings, (United States National Library of Medicine)
- Global Legal Information Network Thesaurus, GLIN Subject Term Index
Standards and manuals
The ANSI/NISO Z39.19 Standard of 2005 defines guidelines and conventions for the format, construction, testing, maintenance, and management of monolingual controlled vocabularies including lists, synonym rings, taxonomies, and thesauruses. For multilingual vocabularies, the ISO 5964 Guidelines for the establishment and development of multilingual thesauri can be applied.
Thesaurus Construction and Use: a practical manual. Jean Aitchison, Allan Gilchrist and David Bawden. London and New York: Europa Publications (2000).
See also
- AGRIS
- Controlled vocabulary
- Dictionary
- Knowledge Organization Systems
- Ontology (computer science)
- Simple Knowledge Organisation System
External links
- Aiksaurus: open source and online thesaurus
- Asadz Online Thesaurus
- Macmillan Dictionary thesaurus
- Online thesaurus based on the OpenOffice.org spell checker Hunspell
- Snappy Words Free English Dictionary and Thesaurus
- Sinonimi: open source online thesaurus
- Synonym Finder
- TemaTres: open source thesaurus management
- Thesaurus Builder: full multilingual thesaurus management software
- Thesaurus.com
- Thesaurus.net
- How to say big online synonym finder
- United Dictionary User Submitted Thesaurus
- Yahoo!Education: Thesaurus
- voting-based thesaurus with extra semantic relations and word definitions