Edit distance
Encyclopedia
In information theory
Information theory
Information theory is a branch of applied mathematics and electrical engineering involving the quantification of information. Information theory was developed by Claude E. Shannon to find fundamental limits on signal processing operations such as compressing data and on reliably storing and...

 and computer science
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...

, the edit distance between two strings of characters
String (computer science)
In formal languages, which are used in mathematical logic and theoretical computer science, a string is a finite sequence of symbols that are chosen from a set or alphabet....

 generally refers to the Levenshtein distance
Levenshtein distance
In information theory and computer science, the Levenshtein distance is a string metric for measuring the amount of difference between two sequences...

. However, according to Nico Jacobs, “The term ‘edit distance’ is sometimes used to refer to the distance in which insertions and deletions have equal cost and replacements have twice the cost of an insertion”.

It may also refer to the whole class of string metric
String metric
In mathematics, string metrics are a class of metric that measure similarity or dissimilarity between two text strings for approximate string matching or comparison and in fuzzy string searching. For example the strings "Sam" and "Samuel" can be considered to be similar...

s that measure distance as the (weighted or unweighted) number of operations required to transform a string into another. There are several different ways to define an edit distance, depending on which edit operations are allowed: replace, delete, insert, transpose, and so on. There are algorithms to calculate its value under various definitions:
  • Hamming distance
    Hamming distance
    In information theory, the Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different...

  • Levenshtein distance
    Levenshtein distance
    In information theory and computer science, the Levenshtein distance is a string metric for measuring the amount of difference between two sequences...

     (the most common definition, calculated by Hirschberg's algorithm
    Hirschberg's algorithm
    Hirschberg's algorithm, named after its inventor, Dan Hirschberg, is a dynamic programming algorithm that finds the least cost sequence alignment between two strings, where cost is measured as Levenshtein distance, defined to be the sum of the costs of insertions, replacements, deletions, and null...

     or the Wagner–Fischer algorithm)
  • Damerau–Levenshtein distance
  • Jaro–Winkler distance

See also

  • Approximate string matching
    Approximate string matching
    In computing, approximate string matching is the technique of finding strings that match a pattern approximately...

  • Levenshtein distance
    Levenshtein distance
    In information theory and computer science, the Levenshtein distance is a string metric for measuring the amount of difference between two sequences...

  • Longest common subsequence problem
    Longest common subsequence problem
    The longest common subsequence problem is to find the longest subsequence common to all sequences in a set of sequences . Note that subsequence is different from a substring, see substring vs. subsequence...

  • String-to-string correction problem
    String-to-string correction problem
    The string-to-string correction problem refers to the minimum number of edit operations necessary to change one string into another. A single edit operation may be changing a single symbol of the string into another, deleting, or inserting a symbol...

  • String metric
    String metric
    In mathematics, string metrics are a class of metric that measure similarity or dissimilarity between two text strings for approximate string matching or comparison and in fuzzy string searching. For example the strings "Sam" and "Samuel" can be considered to be similar...


External links

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK