What is Unicode?

First, we need to understand what a character encoding is. Computers deal only with numbers, so to let a computer handle text, people assign a number to each character. The complete set of character-to-number mappings is called a character encoding. The most commonly used encoding is ASCII; for example, ASCII assigns the number 65 to the letter 'A'.
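The character-to-number mapping can be observed directly in most programming languages. The following is a minimal Python sketch; the helper names char_to_number and number_to_char are made up for illustration and are not part of any standard.

```python
def char_to_number(ch: str) -> int:
    """Return the number assigned to a single character."""
    return ord(ch)


def number_to_char(n: int) -> str:
    """Return the character that a number is assigned to."""
    return chr(n)


# Encoding text as ASCII yields one number (byte) per character.
ascii_bytes = "ABC".encode("ascii")

print(char_to_number("A"))   # 65
print(number_to_char(65))    # A
print(list(ascii_bytes))     # [65, 66, 67]
```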

Hundreds of character encoding standards have been invented. Unfortunately, no single one of them could cover all the characters used by people around the world. Another major problem is that these encodings conflict with one another: the same number may represent different characters in different encodings. Consequently, data may be corrupted when it is exchanged between computer systems.
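To make the conflict concrete, the sketch below decodes one and the same byte value (233, hexadecimal E9) with three legacy encodings. The choice of encodings is only an example; any set of incompatible code pages would show the same effect.

```python
# One byte, three interpretations.
raw = bytes([0xE9])

# Python codec names for three legacy single-byte encodings.
encodings = ("latin-1", "cp1251", "iso8859_7")

decoded = {name: raw.decode(name) for name in encodings}

for name, ch in decoded.items():
    # Show the character together with its Unicode code point.
    print(f"{name:10} -> {ch!r} (U+{ord(ch):04X})")
```

A file written on a system using one of these encodings and read on a system using another would silently show the wrong letter, which is the kind of corruption described above.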

To solve these problems, the Unicode Consortium was formed to develop a unified character encoding: Unicode. Unicode covers all commonly used languages around the world, including Chinese, Japanese and Korean. Unicode assigns a unique number (called a code point) to every character. The number assigned to a character remains the same as long as the platform and application conform to the Unicode Standard. Version 3.1 of the Unicode Standard covers over 70,000 ideographic characters.
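As a rough illustration of a code point, the Python sketch below looks up the number Unicode assigns to the Chinese character 中 and shows that the number survives a round trip through two different byte formats. The variable names are illustrative only, and UTF-8 and UTF-16 are used here simply as two common Unicode formats.

```python
ch = "\u4e2d"  # the Chinese character 中

# The code point is the unique number Unicode assigns to the character.
code_point = ord(ch)
print(f"U+{code_point:04X}")  # U+4E2D

# The same code point can be stored as bytes in different Unicode formats.
utf8_bytes = ch.encode("utf-8")
utf16_bytes = ch.encode("utf-16-be")
print(utf8_bytes.hex(), utf16_bytes.hex())

# Decoding either byte sequence gives back the same code point.
from_utf8 = ord(utf8_bytes.decode("utf-8"))
from_utf16 = ord(utf16_bytes.decode("utf-16-be"))
```

The stored bytes differ between formats, but the code point identifying the character does not.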

See Also
Unicode Format
Little-Endian vs. Big-Endian