
UNICODE vs. MBCS in Windows

17 Mar

Everyone seems to have a take on this encoding thing. Here are my two cents.

MBCS had been in use long before UNICODE was born. As I see it, UNICODE was invented precisely because there were too many MBCS encodings out there. Really too many: different languages, different revisions within a language, and so on. Talk about pressure to standardize! UNICODE actually benefited from the maturity of MBCS, although it has itself evolved and still has several encoding schemes.
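To make the "too many MBCS" problem concrete, here is a minimal sketch in Python (my choice for illustration; nothing here is Windows-specific). It assumes Python's built-in `cp932`, `gbk` and `cp949` codecs are reasonable stand-ins for the Japanese, Simplified Chinese and Korean Windows code pages, and `decode_under` is a hypothetical helper name.

```python
# Illustrative sketch: the same bytes mean different things under different
# MBCS code pages. `decode_under` is a hypothetical helper, not a Windows API.

def decode_under(raw, code_pages):
    """Decode `raw` once per code page; undecodable bytes become U+FFFD."""
    return {cp: raw.decode(cp, errors="replace") for cp in code_pages}

# "日本" ("Japan") encoded with the Japanese code page 932 (Shift-JIS).
raw = "日本".encode("cp932")

# Reinterpret those very same bytes as Japanese, Simplified Chinese and Korean.
results = decode_under(raw, ["cp932", "gbk", "cp949"])
for code_page, text in results.items():
    print(code_page, repr(text))
```

Only the `cp932` line comes back as the original text. The bytes themselves carry no hint of which code page produced them, which is exactly the ambiguity a single standard removes.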

Microsoft has supported UNICODE since Windows NT. Windows 95/98/Me have what Microsoft calls the “Microsoft Layer for Unicode” (MSLU). What that really means is that everything is MBCS/SBCS unless your application deals with UNICODE through a special set of APIs. Then again, not many people still support Windows 9x in their new development nowadays anyway.

Microsoft’s UNICODE implementation is actually UTF-16, that is, every character takes 2 bytes or, much more rarely, 4 bytes. The reason is simple: at the time Microsoft had to decide which UNICODE scheme to support, there were only UCS-2 (2 bytes) and UCS-4 (4 bytes) to choose from. Spending 4 bytes on every character looks like a big waste of everything, even in today’s environment. So they went with UCS-2, well, with some M(odification)s: surrogate pairs, added later, are what turn it into UTF-16. Some may argue that UTF-8 would have been a better choice, especially on Windows CE and the like. The problem is that Microsoft did not know there would be UTF-8, just as it did not know about most of the other goodies that came later.
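The size trade-off is easy to check for yourself. Below is a small Python sketch (again just for illustration; `encoded_sizes` is a hypothetical helper name). It uses Python's `utf-32-le` codec as a stand-in for UCS-4, and the little-endian codec variants so that no byte order mark is counted.

```python
# Illustrative sketch: bytes needed per character in each encoding scheme.
# `encoded_sizes` is a hypothetical helper, not part of any Windows API.

def encoded_sizes(ch):
    """Return the encoded length in bytes of one character per scheme."""
    schemes = ("utf-8", "utf-16-le", "utf-32-le")
    return {name: len(ch.encode(name)) for name in schemes}

# An ASCII letter, a common CJK character, and a character outside the
# Basic Multilingual Plane (the rare 4-byte case in UTF-16).
for ch in ("A", "中", "😀"):
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))

# Outside the BMP, UTF-16 stores a pair of 16-bit surrogates.
pair = "😀".encode("utf-16-be").hex()
print(pair)  # d83dde00: high surrogate D83D, low surrogate DE00
```

UTF-32 costs 4 bytes for everything, while UTF-16 stays at 2 bytes for both "A" and "中" and only grows to 4 for the character outside the BMP. UTF-8 wins for ASCII but needs 3 bytes for the CJK character.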

With all that said, I have to give Microsoft credit for supporting UNICODE in the core. It really makes my life as a developer a lot easier.

 
 
