You say UNICODE, I say UNI-what?

Posted by Adam Reilly on July 13, 2009

An old Apple computer walks into a Chinese bar…

The old Chinese bartender computer yells “我們這裡不提供水果!”
The old Apple computer is confused…

Ancient Symbols


As information flows over cultural boundaries, issues of text encoding have become increasingly important.  While I will not attempt to capture the full history of text encoding, here are the over-simplified facts:

When data needs to be shared between different machines, the shortcomings of legacy encodings become apparent. Because each encoding reuses the same numeric codes for its own characters, sensible data generated by one implementation maps to different characters on another. The old Apple computer was ‘confused’ because it received a set of numbers from the bartender which it literally reads as “§Ú­Ì³o¸Ì¤£´£¨Ñ¤ôªG!” The disconnect arises because the computers are communicating via incompatible legacy encodings (in this case, Big5 Traditional Chinese and ISO-8859-1 Western European).
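
The mix-up can be reproduced in a few lines. This is a minimal sketch in Python, assuming the standard library's `big5` and `iso-8859-1` codecs stand in for the two machines; the variable names are illustrative only.

```python
# The bartender's sentence, exactly as in the joke above.
message = "我們這裡不提供水果!"

# The bartender's machine turns the text into Big5 bytes before sending it.
wire_bytes = message.encode("big5")

# The old Apple wrongly assumes those bytes are ISO-8859-1.
# Every byte is a valid ISO-8859-1 character, so no error is raised;
# the result is simply the wrong text.
misread = wire_bytes.decode("iso-8859-1")

# Decoding with the encoding that was actually used recovers the sentence.
recovered = wire_bytes.decode("big5")

print(misread)
print(recovered == message)
```

Note that the bad decode never fails. Nothing in the bytes themselves says which encoding produced them, which is why the receiver cannot notice the mistake on its own.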

The Punchline

If both of the computers had been ‘speaking’ Unicode, the old Apple would have realized that the old Chinese computer had yelled:

“We don’t serve fruit here!” in Chinese.

To which he could have unambiguously replied:

“That’s ok, I just came in for a drink”

Unicode and its variants were conceived to solve this problem. Rather than worry about reuse of the same code space, Unicode’s designers decided to take a much larger space and divide it into chunks that could be assigned to different cultures. This way, Unicode character streams can move freely between systems without ambiguity.
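
To make the idea of one large, shared code space concrete, here is a small Python sketch. It assumes UTF-8 as the encoding on the wire; the post does not name a specific one, so that choice and the sample characters are assumptions for illustration.

```python
import unicodedata

# One sample character each from the Latin, Cyrillic, and CJK ranges.
samples = ["A", "Ж", "我"]

# Every character has a single code point, no matter which system reads it.
code_points = [f"U+{ord(ch):04X}" for ch in samples]
names = [unicodedata.name(ch) for ch in samples]

for cp, name in zip(code_points, names):
    print(cp, name)

# When both sides agree on a Unicode encoding, the bartender's sentence
# survives the trip unchanged.
message = "我們這裡不提供水果!"
utf8_bytes = message.encode("utf-8")
received = utf8_bytes.decode("utf-8")

print(received == message)
```

The three samples land in widely separated parts of the code space, so none of them can be mistaken for another.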

However contrived this scenario sounds, it does represent a potential problem in data interchange. Computers have spoken non-Unicode long enough to affect Internet architecture, file formats and legacy data. In later posts we’ll explore some of these issues, along with why being Unicode-aware doesn’t necessarily equate to foreign language processing.
