International Components for Unicode
International Components for Unicode (ICU) is an open source project of mature C/C++ and Java libraries for Unicode support, software internationalization and software globalization. ICU is widely portable to many operating systems and environments. It gives applications the same results on all platforms and between C, C++ and Java software. The ICU project is an open source development project that is sponsored, supported and used by IBM and many other companies.
Some of the services that it provides are the following.
- Text: Unicode text handling, full character properties and character set conversions
- Analysis: Unicode regular expressions; full Unicode sets; character, word and line boundaries
- Comparison: Language sensitive collation and searching
- Transformations: normalization, upper/lowercase, script transliterations
- Locales: Comprehensive locale data and resource bundle architecture, via the Common Locale Data Repository
- Complex Text Layout: Arabic, Hebrew, Indic and Thai
- Time: Multi-calendar and time zone
- Formatting and Parsing: dates, times, numbers, currencies, messages and rule based
ICU provides more extensive internationalization facilities than the standard libraries for C and C++.
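Three of the service categories listed above (comparison, boundary analysis and normalization) can be illustrated with a minimal Java sketch. As an assumption made so that the example runs without the ICU4J jar, it uses the JDK's own java.text classes rather than ICU itself; ICU4J offers similarly named classes such as Collator and BreakIterator in its com.ibm.icu.text package, with more extensive behavior. The class name I18nSketch and its helper methods are hypothetical and exist only for this illustration.

```java
import java.text.BreakIterator;
import java.text.Collator;
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class; uses JDK java.text, not ICU4J.
final class I18nSketch {

    // Comparison: sort with a language-sensitive collator instead of
    // raw code unit order (which would put "Cherry" before "apple").
    static List<String> sortForLocale(List<String> words, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(collator);
        return sorted;
    }

    // Analysis: split text at word boundaries, keeping only segments
    // that start with a letter or digit (drops spaces and punctuation).
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> result = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            if (Character.isLetterOrDigit(segment.codePointAt(0))) {
                result.add(segment);
            }
        }
        return result;
    }

    // Transformation: compose "e" + combining acute accent into one code point.
    static String toNfc(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        System.out.println(sortForLocale(Arrays.asList("banana", "apple", "Cherry"), Locale.ENGLISH));
        System.out.println(words("Hello, world!", Locale.ENGLISH));
        System.out.println(toNfc("e\u0301").length());
    }
}
```

Code written against these JDK classes ports to ICU4J with relatively small changes because the two APIs share a common origin, although details such as available locales and tailoring options differ.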
Origin and Development
ICU is descended from C++ frameworks produced by Taligent in the mid 1990s. Soon after Taligent became part of IBM in early 1996, Sun Microsystems decided that Java, then in its infancy, "was missing international support. Taligent had great international technology, talented engineers, and a location about 100 meters from Sun's JavaSoft division in Cupertino, California. IBM arranged for Taligent's Text and International group to contribute international classes to Sun's Java Development Kit."
Some of the code for text processing, date formatting, etc., was rewritten in Java and became the JDK 1.1 internationalization APIs. A large portion of this code still exists in the and packages. Further internationalization features were added with each later release of Java.
IBM programmers then rewrote the Java internationalization classes in C++ and later ported some classes to C functions. The C++/C version of ICU is known as ICU4C. The ICU project also provides ICU4J ("ICU for Java"), which adds features not present in the standard Java libraries. ICU4C and ICU4J are kept as similar as possible, though not identical. For example, ICU4C includes a Regular Expression API, while ICU4J does not. Both frameworks have been enhanced over time to support new facilities and new features of Unicode and CLDR. ICU was released as an open source project in 1999 under the name "IBM Classes for Unicode." It was later renamed to "International Components for Unicode."
See also
- Uniscribe
- OpenType
- Apple Type Services for Unicode Imaging
- Apple Advanced Typography
- Pango
- Graphite (SIL)
- GNU GetText