Phonetic algorithm
A phonetic algorithm is an algorithm for indexing words by their pronunciation. Most phonetic algorithms were developed for use with the English language; consequently, applying the rules to words in other languages might not give a meaningful result.

They are necessarily complex algorithms with many rules and exceptions, because English spelling and pronunciation are complicated by historical changes in pronunciation and by words borrowed from many languages.

Among the best-known phonetic algorithms are:
  • Soundex, which was developed to encode surnames for use in censuses. Soundex codes are four-character strings composed of a single letter followed by three digits.
  • Daitch–Mokotoff Soundex, which is a refinement of Soundex designed to better match surnames of Slavic and Germanic origin. Daitch–Mokotoff Soundex codes are strings composed of six numeric digits.
  • Kölner Phonetik, which is similar to Soundex but more suitable for German words.
  • Metaphone and Double Metaphone, which are suitable for use with most English words, not just names. Metaphone algorithms are the basis for many popular spell checkers.
  • Miracode
  • New York State Identification and Intelligence System (NYSIIS), which maps similar phonemes to the same letter. The result is a string that can be pronounced by the reader without decoding.
  • Match Rating Approach, which was developed by Western Airlines in 1977 and has both an encoding technique and a range comparison technique.
  • Caverphone, which was created to assist in data matching between late 19th century and early 20th century electoral rolls and is optimized for accents present in parts of New Zealand.
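
As an illustration of the Soundex code format described in the list above (one letter followed by three digits), here is a minimal sketch in Python. It assumes the commonly described American Soundex rules: consonants are grouped into six digit classes, vowels are not encoded, adjacent letters with the same digit are encoded once, and H and W do not separate such letters. The names `CODES` and `soundex` are chosen for this example only, and other variants of Soundex differ in detail.

```python
# Digit classes of the commonly described American Soundex (an assumption
# of this sketch; other variants use slightly different rules).
CODES = {}
for digit, letters in enumerate(("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(name: str) -> str:
    """Return a four-character code: one letter followed by three digits."""
    # Keep only the letters A-Z, in upper case.
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")  # the first letter's digit also suppresses repeats
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W are skipped and do not separate equal digits
        code = CODES.get(ch, "")  # vowels and Y have no digit
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so a repeated digit after it is kept
    # Pad with zeros, then truncate to one letter plus three digits.
    return (first + "".join(digits) + "000")[:4]


if __name__ == "__main__":
    for example in ("Robert", "Rupert", "Tymczak"):
        print(example, soundex(example))
```

Under these rules, Robert and Rupert share a code, which is the kind of match the algorithm is meant to produce for similar-sounding surnames.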

Common uses

  • Spell checkers often contain phonetic algorithms. The Metaphone algorithm, for example, can take an incorrectly spelt word and create a code. The code is then looked up in a directory of words with the same or similar Metaphone code. Words that have the same or a similar code become possible alternative spellings.
  • Search functionality often uses phonetic algorithms to find results that don't exactly match the term(s) used in the search. Searching for names can be difficult because names often have multiple alternative spellings. An example is the name Claire. It has two alternative spellings, Clare and Clair, and all three are pronounced the same. Searching for one spelling wouldn't show results for the other two. With Soundex, all three variations produce the same code, C460, so searching names by their Soundex code returns all three variations.
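
The lookup that both uses above rely on can be sketched as an index keyed by phonetic code. The Python sketch below is illustrative only: `toy_key`, `build_index`, and `search` are hypothetical names, and `toy_key` is a deliberately crude stand-in encoder (first letter plus the later consonants), not Soundex or Metaphone. In practice an implementation of one of those algorithms would be passed as `encode`.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def toy_key(word: str) -> str:
    """Crude stand-in encoder (hypothetical, NOT Soundex or Metaphone):
    first letter plus later consonants, with immediate repeats collapsed."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0]
    for c in letters[1:]:
        # Skip vowels, Y, H and W; skip a letter equal to the last one kept.
        if c not in "AEIOUYHW" and c != key[-1]:
            key += c
    return key


def build_index(names: Iterable[str],
                encode: Callable[[str], str] = toy_key) -> Dict[str, List[str]]:
    """Group names under their phonetic code."""
    index: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        index[encode(name)].append(name)
    return dict(index)


def search(index: Dict[str, List[str]], query: str,
           encode: Callable[[str], str] = toy_key) -> List[str]:
    """Return every indexed name whose code equals the query's code."""
    return index.get(encode(query), [])


if __name__ == "__main__":
    index = build_index(["Claire", "Clair", "Clark", "Clare"])
    print(search(index, "Clare"))
```

Encoding the query with the same function that built the index is what lets one spelling retrieve the others; a spell checker would use the same structure with a dictionary of correctly spelt words as the indexed names.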

See also

  • Approximate string matching
  • Hamming distance
  • Levenshtein distance
  • Damerau–Levenshtein distance
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 