Concordancer
A concordancer is a computer program that automatically constructs a concordance. The output of a concordancer may serve as input to a translation memory system for computer-assisted translation, or as an early step in machine translation.
Concordancers are also used in corpus linguistics to retrieve alphabetically or otherwise sorted lists of linguistic data from the corpus in question, which the corpus linguist then analyzes. Some concordancers used in corpus linguistics are AntConc (freeware), ApSIC Xbench, WordSmith, MonoConc, GlossaNet/Unitex (open-source free software), AdTAT (free software developed by the University of Adelaide), CorpusEye, and Linguistic Toolbox (freeware). Linguistic Toolbox has an integrated part-of-speech tagger, which lets users create their own POS-annotated corpora and run the various types of searches used in corpus linguistics.
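As a rough illustration of what a concordancer produces, the following Python sketch builds a small keyword-in-context (KWIC) concordance from a string and prints the hits for one word. It is not taken from any of the tools listed above; the function names `build_concordance` and `kwic_lines`, the `width` parameter, and the simple regular-expression tokenizer are all assumptions made for this example.

```python
import re
from collections import defaultdict


def build_concordance(text, width=3):
    """Map each lowercased word to a list of (left, keyword, right) contexts.

    `width` is the number of context words kept on each side (an assumed
    parameter for this sketch, not a setting of any real tool).
    """
    # Naive tokenizer: runs of ASCII letters and apostrophes only.
    tokens = re.findall(r"[A-Za-z']+", text)
    concordance = defaultdict(list)
    for i, token in enumerate(tokens):
        left = " ".join(tokens[max(0, i - width):i])
        right = " ".join(tokens[i + 1:i + 1 + width])
        concordance[token.lower()].append((left, token, right))
    return dict(concordance)


def kwic_lines(concordance, keyword, sort_by_right=False):
    """Format the hits for `keyword` as aligned keyword-in-context lines."""
    hits = concordance.get(keyword.lower(), [])
    if sort_by_right:
        # Sort hits alphabetically by the context to the right of the keyword.
        hits = sorted(hits, key=lambda hit: hit[2].lower())
    return [f"{left:>25}  [{kw}]  {right}" for left, kw, right in hits]


sample = "The cat sat on the mat. The dog sat on the log."
conc = build_concordance(sample, width=2)

# Headwords in alphabetical order, as in a printed concordance.
print(sorted(conc))

# Every occurrence of "sat" with two words of context on each side.
for line in kwic_lines(conc, "sat"):
    print(line)
```

Sorting the headwords gives the alphabetical listing described above, while `sort_by_right=True` shows one of the "otherwise sorted" views a corpus linguist might want. A real concordancer would also need proper tokenization for the language in question and an index that scales to large corpora.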
See also
- Cross-reference
- Ctags
- KWIC
- Language industry
External links
- Glossanet/Unitex (LGPL/LGPLLR license)
- AdTAT
- MonoConc (Commercial license)
- ApSIC Xbench