Lexical similarity
Encyclopedia
In linguistics
Linguistics
Linguistics is the scientific study of human language. Linguistics can be broadly broken into three categories or subfields of study: language form, language meaning, and language in context....

, lexical similarity is a measure of the degree to which the word sets of two given language
Language
Language may refer either to the specifically human capacity for acquiring and using complex systems of communication, or to a specific instance of such a system of complex communication...

s are similar. A lexical similarity of 1 (or 100%) would mean a total overlap between vocabularies, whereas 0 means there are no common words.

There are different ways to define the lexical similarity and the results vary accordingly. For example, Ethnologue
Ethnologue
Ethnologue: Languages of the World is a web and print publication of SIL International , a Christian linguistic service organization, which studies lesser-known languages, to provide the speakers with Bibles in their native language and support their efforts in language development.The Ethnologue...

's method of calculation consists in comparing a standardized set of wordlists and counting those forms that show similarity in both form and meaning. Using such a method, English
English language
English is a West Germanic language that arose in the Anglo-Saxon kingdoms of England and spread into what was to become south-east Scotland under the influence of the Anglian medieval kingdom of Northumbria...

 was evaluated to have a lexical similarity of 60% with German
German language
German is a West Germanic language, related to and classified alongside English and Dutch. With an estimated 90 – 98 million native speakers, German is one of the world's major languages and is the most widely-spoken first language in the European Union....

 and 27% with French
French language
French is a Romance language spoken as a first language in France, the Romandy region in Switzerland, Wallonia and Brussels in Belgium, Monaco, the regions of Quebec and Acadia in Canada, and by various communities elsewhere. Second-language speakers of French are distributed throughout many parts...

.

Lexical similarity can be used to evaluate the degree of genetic relationship
Language family
A language family is a group of languages related through descent from a common ancestor, called the proto-language of that family. The term 'family' comes from the tree model of language origination in historical linguistics, which makes use of a metaphor comparing languages to people in a...

 between two languages. Percentages higher than 85% usually indicate that the two languages being compared are likely to be related dialect
Dialect
The term dialect is used in two distinct ways, even by linguists. One usage refers to a variety of a language that is a characteristic of a particular group of the language's speakers. The term is applied most often to regional speech patterns, but a dialect may also be defined by other factors,...

s.

The lexical similarity is only one indication of the mutual intelligibility
Mutual intelligibility
In linguistics, mutual intelligibility is recognized as a relationship between languages or dialects in which speakers of different but related languages can readily understand each other without intentional study or extraordinary effort...

 of the two languages, since the latter also depends on the degree of phonetical, morphological, and syntactical similarity. It is worth noting that the variations due to differing wordlists weigh on this. For example, lexical similarity between French and English is considerable in lexical fields relating to culture, whereas their similarity is smaller as far as basic (function) words are concerned. Unlike mutual intelligibility, lexical similarity can only be symmetrical.

Indo-European languages

The table below shows some lexical similarity values for pairs of selected Romance, Germanic, and Slavic languages, as collected and published by Ethnologue
Ethnologue
Ethnologue: Languages of the World is a web and print publication of SIL International , a Christian linguistic service organization, which studies lesser-known languages, to provide the speakers with Bibles in their native language and support their efforts in language development.The Ethnologue...

.
Lang.
code
Language 1
| Lexical similarity coefficients
cat Catalan
Catalan language
Catalan is a Romance language, the national and only official language of Andorra and a co-official language in the Spanish autonomous communities of Catalonia, the Balearic Islands and Valencian Community, where it is known as Valencian , as well as in the city of Alghero, on the Italian island...

1 - - - 0.87 0.85 0.73 0.76 - 0.75 0.85
eng English
English language
English is a West Germanic language that arose in the Anglo-Saxon kingdoms of England and spread into what was to become south-east Scotland under the influence of the Anglian medieval kingdom of Northumbria...

- 1 0.27 0.60 - - - - 0.24 - -
fra French
French language
French is a Romance language spoken as a first language in France, the Romandy region in Switzerland, Wallonia and Brussels in Belgium, Monaco, the regions of Quebec and Acadia in Canada, and by various communities elsewhere. Second-language speakers of French are distributed throughout many parts...

- 0.27 1 0.29 0.89 0.75 0.75 0.78 - 0.80 0.75
deu German
German language
German is a West Germanic language, related to and classified alongside English and Dutch. With an estimated 90 – 98 million native speakers, German is one of the world's major languages and is the most widely-spoken first language in the European Union....

- 0.60 0.29 1 - - - - - - -
ita Italian
Italian language
Italian is a Romance language spoken mainly in Europe: Italy, Switzerland, San Marino, Vatican City, by minorities in Malta, Monaco, Croatia, Slovenia, France, Libya, Eritrea, and Somalia, and by immigrant communities in the Americas and Australia...

0.87 - 0.89 - 1 - 0.77 0.78 - 0.85 0.82
por Portuguese
Portuguese language
Portuguese is a Romance language that arose in the medieval Kingdom of Galicia, nowadays Galicia and Northern Portugal. The southern part of the Kingdom of Galicia became independent as the County of Portugal in 1095...

0.85 - 0.75 - - 1 0.72 0.74 - - 0.89
ron Romanian
Romanian language
Romanian Romanian Romanian (or Daco-Romanian; obsolete spellings Rumanian, Roumanian; self-designation: română, limba română ("the Romanian language") or românește (lit. "in Romanian") is a Romance language spoken by around 24 to 28 million people, primarily in Romania and Moldova...

0.73 - 0.75 - 0.77 0.72 1 0.72 - - 0.71
roh Romansh 0.76 - 0.78 - 0.78 0.74 0.72 1 - 0.74 0.74
rus Russian
Russian language
Russian is a Slavic language used primarily in Russia, Belarus, Uzbekistan, Kazakhstan, Tajikistan and Kyrgyzstan. It is an unofficial but widely spoken language in Ukraine, Moldova, Latvia, Turkmenistan and Estonia and, to a lesser extent, the other countries that were once constituent republics...

- 0.24 - - - - - - 1 - -
srd Sardinian
Sardinian language
Sardinian is a Romance language spoken and written on most of the island of Sardinia . It is considered the most conservative of the Romance languages in terms of phonology and is noted for its Paleosardinian substratum....

0.75 - 0.80 - 0.85 - - 0.74 - 1 0.76
spa Spanish
Spanish language
Spanish , also known as Castilian , is a Romance language in the Ibero-Romance group that evolved from several languages and dialects in central-northern Iberia around the 9th century and gradually spread with the expansion of the Kingdom of Castile into central and southern Iberia during the...

0.85 - 0.75 - 0.82 0.89 0.71 0.74 - 0.76 1
Language 2 → cat eng fra deu ita por ron roh rus srd spa


Notes:
  • Language codes are from standard ISO 639-3
    ISO 639-3
    ISO 639-3:2007, Codes for the representation of names of languages — Part 3: Alpha-3 code for comprehensive coverage of languages, is an international standard for language codes in the ISO 639 series. The standard describes three‐letter codes for identifying languages. It extends the ISO 639-2...

    .
  • Ethnologue does not specify for which Sardinian
    Sardinian language
    Sardinian is a Romance language spoken and written on most of the island of Sardinia . It is considered the most conservative of the Romance languages in terms of phonology and is noted for its Paleosardinian substratum....

    variety the lexical similarity was calculated.
  • "-" denotes that comparison data are not available.
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 
x
OK