International Components for Unicode
International Components for Unicode (ICU) is an open source project of mature C/C++ and Java libraries for Unicode support, software internationalization and software globalization. ICU is widely portable to many operating systems and environments. It gives applications the same results on all platforms and between C, C++ and Java software. The ICU project is an open source development project that is sponsored, supported and used by IBM and many other companies.

Some of the services that it provides are the following.
  • Text: Unicode text handling, full character properties and character set conversions
  • Analysis: Unicode regular expressions; full Unicode sets; character, word and line boundaries
  • Comparison: Language sensitive collation and searching
  • Transformations: normalization, upper/lowercase, script transliterations
  • Locales: Comprehensive locale data and resource bundle architecture, via the Common Locale Data Repository
  • Complex Text Layout: Arabic, Hebrew, Indic and Thai
  • Time: Multi-calendar and time zone
  • Formatting and Parsing: dates, times, numbers, currencies, messages and rule based


ICU provides more extensive internationalization facilities than the standard libraries for C and C++.
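
As a rough illustration of the comparison, analysis and transformation services listed above, the following sketch uses the JDK's own java.text classes rather than ICU itself, so that it runs without extra dependencies. ICU4J ships counterparts such as com.ibm.icu.text.Collator and com.ibm.icu.text.BreakIterator with broader data and features. The class name I18nSketch and its helper methods are invented for this example and are not part of ICU or the JDK.

```java
import java.text.BreakIterator;
import java.text.Collator;
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class: JDK java.text classes standing in for ICU services.
public class I18nSketch {

    // Comparison: sort with a language-sensitive collator instead of raw code unit order.
    static List<String> sortForLocale(List<String> words, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(collator);
        return sorted;
    }

    // Analysis: split text at word boundaries, keeping only segments
    // that start with a letter or digit (drops spaces and punctuation).
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String piece = text.substring(start, end);
            if (Character.isLetterOrDigit(piece.codePointAt(0))) {
                out.add(piece);
            }
        }
        return out;
    }

    // Transformation: compose base letters and combining marks (Unicode NFC).
    static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        System.out.println(sortForLocale(Arrays.asList("Zebra", "apple", "Banana"), Locale.ENGLISH));
        System.out.println(words("Hello, world!", Locale.ENGLISH));
        System.out.println(nfc("e\u0301").length());
    }
}
```

A plain String sort would place "apple" after "Zebra" because uppercase letters have lower code points. The collator orders by letter first and treats case as a secondary difference, which is the behavior the "Comparison" service refers to.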

Origin and Development

ICU is descended from C++ frameworks produced by Taligent in the mid 1990s. Soon after Taligent became part of IBM in early 1996, Sun Microsystems decided that Java, then in its infancy, "was missing international support. Taligent had great international technology, talented engineers, and a location about 100 meters from Sun's JavaSoft division in Cupertino, California. IBM arranged for Taligent's Text and International group to contribute international classes to Sun's Java Development Kit."
Some of the code for text processing, date formatting, etc., was rewritten in Java and became the JDK 1.1 internationalization APIs. A large portion of this code still exists in the standard Java class library. Further internationalization features were added with each later release of Java.

IBM programmers then rewrote the Java internationalization classes in C++ and later ported some classes to C functions. The C++/C version of ICU is known as ICU4C. The ICU project also provides ICU4J ("ICU for Java"), which adds features not present in the standard Java libraries. ICU4C and ICU4J are kept as similar as possible, though not identical. For example, ICU4C includes a Regular Expression API. Both frameworks have been enhanced over time to support new facilities and new features of Unicode and CLDR. ICU was released as an open source project in 1999 under the name "IBM Classes for Unicode." It was later renamed to "International Components For Unicode."

See also

  • Uniscribe
  • OpenType
  • Apple Type Services for Unicode Imaging
  • Apple Advanced Typography
  • Pango
  • Graphite (SIL)
  • GNU gettext

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 