
Unicode and Windows


For computer systems built by different programmers in different regions of the world to exchange data, characters need to be encoded using a common standard that everyone can reference. Unicode is the most widely used system for assigning codes to the text characters and symbols used in English and thousands of other languages. Unicode is the product of the Unicode Consortium, which continues to update the standard with new scripts and formatting characters. There is a single set of codes for the Latin alphabet, which is shared by Spanish, English, German, and many other Western languages. Greek and Cyrillic each have their alphabets assigned unique codes, but characters used in common in the Chinese, Japanese and Korean languages share the same Unicode designations.

Most people working in North America, Europe and Australia will only need to refer to the Basic Latin set and the Latin-1 Supplement set. Note that the Basic Latin set (0000 to 007F in hexadecimal) is the same as the ASCII set. Unicode surpassed ASCII as the most widely used character encoding about 8 years ago. The Windows-1252 encoding of the Latin alphabet largely follows Unicode (the two differ only in the range 80 to 9F hexadecimal), but MS Office applications are also set up to reference the original decimal ASCII table.
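The overlap between ASCII, Windows-1252 and Unicode can be checked with a few lines of Python. This is only a minimal sketch, assuming Python 3 and its built-in "ascii" and "cp1252" codecs; the helper name cp1252_matches_unicode is made up for this illustration.

```python
def cp1252_matches_unicode(byte_value: int) -> bool:
    """Hypothetical helper: True if the Windows-1252 byte decodes to the
    Unicode code point that has the same number."""
    try:
        ch = bytes([byte_value]).decode("cp1252")
    except UnicodeDecodeError:
        # A handful of bytes are unassigned in Python's cp1252 codec.
        return False
    return ord(ch) == byte_value


# Basic Latin: every ASCII byte decodes to the code point with the same number.
ascii_agrees = all(bytes([b]).decode("ascii") == chr(b) for b in range(0x80))

# Collect the byte values where Windows-1252 and Unicode disagree.
mismatches = [b for b in range(256) if not cp1252_matches_unicode(b)]

print(ascii_agrees)
print(f"{min(mismatches):02X} to {max(mismatches):02X}")
```

Every mismatch falls between 80 and 9F, which is where Windows-1252 places characters such as the euro sign and curly quotes instead of the control codes Unicode assigns there.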

For example, in MS Word, if you go to the Insert tab and select 'Symbol' in order to add a section symbol, §, for a reference in a statutory code, you will see the Unicode value, 00A7, in the 'Character code' box. (See Fig. 1) Figure 2 shows where this symbol appears in the Latin-1 Supplement set on the Unicode site. [Note that § is 167 in decimal, 247 in octal (base 8), and A7 in hexadecimal. Strictly speaking, § falls outside the original 7-bit ASCII range of 0 to 127; the value Word labels as ASCII comes from the extended 8-bit set, as ASCII got more complex over time. The UTF-8 encoding of Unicode stores § as the two bytes 11000010 10100111 in binary. The ultimate goal is to have a code for every character from every language.] If you switch the Symbol dialog box to ASCII (decimal), you come up with 167. These values may be the ones most helpful to reference in Word and Excel. For example, in Word, you can enter a section symbol by holding down the ALT key and pressing 0167 on the numeric keypad (with NUM LOCK activated); § will appear when ALT is released. Alternatively, if you type a Unicode value such as 00A7 and then press ALT + X, the Unicode character will appear. In Excel, the CHAR function is set up to reference the decimal ASCII values, so, as shown in Figure 3, the formula =CHAR(167) brings up §.
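The different values for § can be checked directly. The following is a minimal Python sketch, assuming Python 3; the variable names are chosen only for this illustration.

```python
section = "\u00a7"  # the section symbol, §

code_point = ord(section)            # the Unicode code point as an integer
as_hex = f"{code_point:04X}"         # the form Word shows in 'Character code'
as_decimal = str(code_point)         # the value used by ALT codes and =CHAR()
as_octal = f"{code_point:o}"         # base 8

# UTF-8 stores § as two bytes; show each byte as eight binary digits.
utf8_bytes = section.encode("utf-8")
utf8_bits = " ".join(f"{b:08b}" for b in utf8_bytes)

# Windows-1252 stores § as a single byte with the same value as the code point.
cp1252_bytes = section.encode("cp1252")

print(as_hex, as_decimal, as_octal)  # 00A7 167 247
print(utf8_bits)                     # 11000010 10100111
print(cp1252_bytes.hex().upper())    # A7
```

Python's chr(167) plays roughly the same role here as Excel's =CHAR(167): both turn the decimal value back into §.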


Sean O'Shea has more than 20 years of experience in the litigation support field with major law firms in New York and San Francisco. He is an ACEDS Certified eDiscovery Specialist and a Relativity Certified Administrator.


The views expressed in this blog are those of the owner and do not reflect the views or opinions of the owner’s employer.




© 2015 by Sean O'Shea. Proudly created with Wix.com
