Internationalized Resource Identifier
On the Internet, the Internationalized Resource Identifier (IRI) is a generalization of the Uniform Resource Identifier (URI). While URIs are limited to a subset of the ASCII character set, IRIs may contain characters from the Universal Character Set (Unicode/ISO 10646), including Chinese or Japanese kanji, Korean, Cyrillic characters, and so forth. It is defined by RFC 3987.
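Because URIs are restricted to ASCII, software that only understands URIs needs an IRI converted before use. The following is a minimal sketch of that conversion, assuming the usual approach of Punycode for the host name and percent-encoded UTF-8 for the remaining components. The helper name `iri_to_uri` and the `_SAFE` character set are illustrative assumptions, and the sketch ignores user-info and IPv6 hosts; it is not a complete implementation of the mapping in RFC 3987.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched: common URI delimiters, plus "%" so that
# percent-escapes already present in the input are not encoded twice.
_SAFE = "/?:@!$&'()*+,;=%~-._"


def iri_to_uri(iri: str) -> str:
    """Hypothetical helper: map an IRI to an ASCII-only URI (simplified)."""
    parts = urlsplit(iri)
    host = parts.hostname or ""
    # Non-ASCII host labels become Punycode ("xn--...") via Python's IDNA codec.
    netloc = host.encode("idna").decode("ascii") if host else ""
    if parts.port is not None:
        netloc = f"{netloc}:{parts.port}"
    # Other components: encode as UTF-8, then percent-encode non-ASCII bytes.
    path = quote(parts.path, safe=_SAFE)
    query = quote(parts.query, safe=_SAFE)
    fragment = quote(parts.fragment, safe=_SAFE)
    return urlunsplit((parts.scheme, netloc, path, query, fragment))


print(iri_to_uri("http://example.com/caf\u00e9"))  # http://example.com/caf%C3%A9
```

A host name written in a non-Latin script would come out of this sketch as one or more `xn--` labels, while the path keeps its meaning but becomes plain ASCII.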
Advantages
There are reasons to display URIs in different languages; chiefly, doing so makes them easier to use for people who are unfamiliar with the Latin (A-Z) alphabet. Provided that users can readily enter the required Unicode characters on their keyboards, this can make the URI system more worldly and accessible.
Disadvantages
Mixing IRIs and ASCII URIs can make it much easier to carry out phishing attacks that trick someone into believing they are on a site other than the one they are actually visiting. For example, one can replace the "a" in www.ebay.com or www.paypal.com with an internationalized look-alike "a" character, and point that IRI to a malicious site. This is known as an IDN homograph attack.
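The look-alike substitution described above can be illustrated with a short sketch. It swaps the Latin "a" in a host name for the Cyrillic "а" (U+0430) and then applies a rough mixed-script check. The function names are hypothetical, and using the first word of each character's Unicode name as a stand-in for its script is an assumption made for brevity; Python's standard library does not expose the Unicode script property, and real browsers apply more elaborate rules.

```python
import unicodedata


def letter_scripts(label):
    """Rough script guess per letter: first word of its Unicode name."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            name = unicodedata.name(ch, "UNKNOWN")
            scripts.add(name.split(" ")[0])  # e.g. "LATIN", "CYRILLIC"
    return scripts


def looks_mixed_script(hostname):
    """Flag host names in which a single label mixes letters of several scripts."""
    return any(len(letter_scripts(label)) > 1 for label in hostname.split("."))


genuine = "www.paypal.com"
spoofed = "www.p\u0430ypal.com"  # second letter is CYRILLIC SMALL LETTER A

print(genuine == spoofed)            # False: the strings differ despite looking alike
print(looks_mixed_script(genuine))   # False
print(looks_mixed_script(spoofed))   # True
print(spoofed.encode("idna"))        # ASCII form exposes the "xn--" label
```

A check of this kind only catches labels that mix scripts; a label written entirely in one look-alike script would pass it, which is one reason the attack is hard to rule out completely.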
While a URI does not provide people with a way to specify Web resources using their own alphabets, an IRI does not make clear how Web resources can be accessed with keyboards that are not capable of generating the requisite internationalized characters.
See also
- XRI (Extensible Resource Identifier)
- IDN (Internationalized Domain Name)
- Punycode