Caverphone
The Caverphone phonetic matching algorithm was created by David Hood in the Caversham Project at the University of Otago in New Zealand in 2002. It was designed to assist in data matching between late 19th-century and early 20th-century electoral rolls, where a name only needed to be in a "commonly recognisable form". The algorithm was intended to apply to names that could not easily be matched between electoral rolls, after the exact matches had been removed from the pool of potential matches. It is optimised for the accents present in the study area (the southern part of the city of Dunedin, New Zealand).

The rules of the algorithm are applied to a name in order, as a series of replacements; each rule operates on the result of the rules before it.
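Because each replacement operates on the output of the ones before it, the order of the rules matters. As a minimal Python sketch, the hypothetical helper `apply_rules` below applies three of the replacement rules (ci to si, ce to se, c to k) to the made-up input "cecil", first in the order the algorithm lists them and then with the general c to k rule moved to the front.

```python
import re


def apply_rules(name, rules):
    """Apply (pattern, replacement) pairs in order; each rule sees the
    output of the rule before it."""
    for pattern, replacement in rules:
        name = re.sub(pattern, replacement, name)
    return name


# Three rules from the replacement step, in the order the algorithm lists them...
listed_order = [("ci", "si"), ("ce", "se"), ("c", "k")]
# ...and the same rules with the general c -> k rule moved to the front.
c_first = [("c", "k"), ("ci", "si"), ("ce", "se")]

print(apply_rules("cecil", listed_order))  # sesil
print(apply_rules("cecil", c_first))       # kekil: ci and ce never match
```

In the listed order both soft c sounds become s; running the general rule first turns every c into k before the more specific rules can see it.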

The original (2002) algorithm is as follows:
  1. Convert to lowercase
  2. Remove anything not in the range a-z
  3. If the name starts with
    1. cough make it cou2f
    2. rough make it rou2f
    3. tough make it tou2f
    4. enough make it enou2f
    5. gn make it 2n
  4. If the name ends with
    1. mb make it m2
  5. Replace
    1. cq with 2q
    2. ci with si
    3. ce with se
    4. cy with sy
    5. tch with 2ch
    6. c with k
    7. q with k
    8. x with k
    9. v with f
    10. dg with 2g
    11. tio with sio
    12. tia with sia
    13. d with t
    14. ph with fh
    15. b with p
    16. sh with s2
    17. z with s
    18. any initial vowel with an A
    19. all other vowels with a 3
    20. 3gh3 with 3kh3
    21. gh with 22
    22. g with k
    23. groups of the letter s with an S
    24. groups of the letter t with a T
    25. groups of the letter p with a P
    26. groups of the letter k with a K
    27. groups of the letter f with an F
    28. groups of the letter m with an M
    29. groups of the letter n with an N
    30. w3 with W3
    31. wy with Wy
    32. wh3 with Wh3
    33. why with Why
    34. w with 2
    35. any initial h with an A
    36. all other occurrences of h with a 2
    37. r3 with R3
    38. ry with Ry
    39. r with 2
    40. l3 with L3
    41. ly with Ly
    42. l with 2
    43. j with y
    44. y3 with Y3
    45. y with 2
  6. Remove all
    1. 2s
    2. 3s
  7. Put six 1s on the end
  8. Take the first six characters as the code
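The steps above translate fairly directly into an ordered table of regular-expression substitutions. The following Python sketch is one possible encoding of the rules exactly as listed, not a reference implementation. It assumes that "vowel" means a, e, i, o or u (y has its own rules), and the names `RULES`, `caverphone` and the optional `trace` list are illustrative only.

```python
import re

# Ordered (pattern, replacement) pairs for steps 3 to 6. Order matters:
# each rule operates on the output of the previous one.
RULES = [
    # Step 3: special starts of the name
    (r"^cough", "cou2f"), (r"^rough", "rou2f"), (r"^tough", "tou2f"),
    (r"^enough", "enou2f"), (r"^gn", "2n"),
    # Step 4: special end of the name
    (r"mb$", "m2"),
    # Step 5: replacements (2 marks a dropped consonant, 3 a vowel)
    ("cq", "2q"), ("ci", "si"), ("ce", "se"), ("cy", "sy"),
    ("tch", "2ch"), ("c", "k"), ("q", "k"), ("x", "k"), ("v", "f"),
    ("dg", "2g"), ("tio", "sio"), ("tia", "sia"), ("d", "t"),
    ("ph", "fh"), ("b", "p"), ("sh", "s2"), ("z", "s"),
    (r"^[aeiou]", "A"), ("[aeiou]", "3"),
    ("3gh3", "3kh3"), ("gh", "22"), ("g", "k"),
    # Runs of a lowercase letter collapse to one uppercase letter
    ("s+", "S"), ("t+", "T"), ("p+", "P"), ("k+", "K"),
    ("f+", "F"), ("m+", "M"), ("n+", "N"),
    ("w3", "W3"), ("wy", "Wy"), ("wh3", "Wh3"), ("why", "Why"), ("w", "2"),
    (r"^h", "A"), ("h", "2"),
    ("r3", "R3"), ("ry", "Ry"), ("r", "2"),
    ("l3", "L3"), ("ly", "Ly"), ("l", "2"),
    ("j", "y"), ("y3", "Y3"), ("y", "2"),
    # Step 6: remove the placeholder digits
    ("2", ""), ("3", ""),
]


def caverphone(name, trace=None):
    """Return the six-character Caverphone code for name.

    If trace is a list, each intermediate value that differs from the
    previous one is appended to it.
    """
    def record(value):
        if trace is not None and (not trace or trace[-1] != value):
            trace.append(value)
        return value

    record(name)
    # Steps 1 and 2: lowercase, then keep only the letters a-z
    txt = record(name.lower())
    txt = record(re.sub(r"[^a-z]", "", txt))
    for pattern, replacement in RULES:
        txt = record(re.sub(pattern, replacement, txt))
    # Steps 7 and 8: pad with six 1s and keep the first six characters
    txt = record(txt + "111111")
    return record(txt[:6])


if __name__ == "__main__":
    steps = []
    print(caverphone("Thompson", steps))  # TMPSN1
    print(" -> ".join(steps))
```

Anchored patterns (`^` and `$`) express the "starts with" and "ends with" conditions, and patterns such as `s+` collapse a group of letters in one substitution. With tracing enabled, the recorded steps for Lee and Thompson should follow the worked examples below, apart from places where the examples combine several rules into one line (for instance, removing the 2s and the 3s at once).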

Examples


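Lee -> lee (convert to lowercase)
lee -> l33 (replace the vowels with 3)
l33 -> L33 (replace l3 with L3)
L33 -> L (remove all 3s)
L -> L111111 (put six 1s on the end)
L111111 -> L11111 (take the first six characters)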


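Thompson -> thompson (convert to lowercase)
thompson -> th3mps3n (replace the vowels with 3)
th3mps3n -> th3mpS3n (replace groups of s with S)
th3mpS3n -> Th3mpS3n (replace groups of t with T)
Th3mpS3n -> Th3mPS3n (replace groups of p with P)
Th3mPS3n -> Th3MPS3n (replace groups of m with M)
Th3MPS3n -> Th3MPS3N (replace groups of n with N)
Th3MPS3N -> T23MPS3N (replace the non-initial h with 2)
T23MPS3N -> TMPSN (remove all 2s and 3s)
TMPSN -> TMPSN111111 (put six 1s on the end)
TMPSN111111 -> TMPSN1 (take the first six characters)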

External links

  • Caversham Project http://caversham.otago.ac.nz/
  • Original (2002) Caverphone algorithm http://caversham.otago.ac.nz/files/working/ctp060902.pdf
  • Revised (2004) Caverphone algorithm http://caversham.otago.ac.nz/files/working/ctp150804.pdf
  • Implementation in the Apache Commons Codec project
  • C# implementation of the revised algorithm http://sounditout.codeplex.com/
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 