
Unicode and Windows

For computer systems built by different programmers in different regions of the world to exchange data, characters need to be encoded using a common standard that everyone can reference. Unicode is the most widely used system for assigning codes to the text characters and symbols used in English and thousands of other languages. Unicode is the product of the Unicode Consortium, which continues to extend the standard with new scripts, symbols and formatting characters. The Latin alphabet has a single set of codes that is shared by Spanish, English, German, and many other Western languages. Greek and Cyrillic each have their own blocks of codes, while many characters used in common in Chinese, Japanese and Korean share the same Unicode code points.

Most people working in North America, Europe and Australia will only need to refer to the Basic Latin set and the Latin-1 Supplement set. Note that the Basic Latin set (0000 to 007F) is identical to ASCII. Unicode, in its UTF-8 form, surpassed ASCII as the most widely used character encoding on the web around 2008. The Windows-1252 encoding of the Latin alphabet largely follows Unicode: its values from 160 to 255 match the Latin-1 Supplement, and only the values from 128 to 159 differ. MS Office applications, however, are often set up to reference the older decimal table, which they label 'ASCII (decimal)'.
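The overlap between ASCII, Windows-1252 and Unicode described above can be checked with a short Python sketch. It is only an illustration and assumes Python's built-in "ascii" and "cp1252" codecs are a fair stand-in for the tables that Windows uses; the variable names are made up for this example.

```python
# Basic Latin: every ASCII byte (0-127) decodes to the Unicode
# code point with the same number.
ascii_matches = all(
    ord(bytes([b]).decode("ascii")) == b for b in range(0x80)
)

# Windows-1252 bytes 160-255 line up with the Latin-1 Supplement block.
upper_matches = all(
    ord(bytes([b]).decode("cp1252")) == b for b in range(0xA0, 0x100)
)

# Bytes 128-159 are where Windows-1252 departs from Unicode numbering:
# byte 128 is the euro sign, which Unicode places at U+20AC.
euro = bytes([0x80]).decode("cp1252")

print(ascii_matches, upper_matches, euro, format(ord(euro), "04X"))
```

Both checks should come back True, and the euro sign shows why Windows-1252 only "largely" follows Unicode.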

For example, in MS Word, if you go to the Insert tab and select 'Symbol' to add a section symbol, §, for a reference to a statutory code, you will see its Unicode value, 00A7, in the 'Character code' box (see Fig. 1). Figure 2 shows where this symbol appears in the Latin-1 Supplement set on the Unicode site. [Note that § is 167 in decimal, 247 in octal (base eight), and A7 in hexadecimal. Strictly speaking, § is not part of the original 7-bit ASCII set, which stops at 127; it comes from the 8-bit extensions added later, which is why different 'extended ASCII' tables can disagree about it. In UTF-8, the most common Unicode encoding, § is stored as the two bytes 11000010 10100111. The ultimate goal is to have a binary encoding for all characters from all languages.] If you switch the 'from' list in the Symbol dialog box to ASCII (decimal), you get 167. These decimal values may be the most helpful ones to reference in Word and Excel. For example, in Word you can enter a section symbol by holding down the ALT key and typing 0167 on the numeric keypad (with NUM LOCK on); § appears when ALT is released. Alternatively, if you type a Unicode value such as 00A7 and then press ALT + X, Word replaces it with the corresponding character. In Excel, the CHAR function is set up to reference the same decimal values, so, as shown in Figure 3, the formula =CHAR(167) returns §.
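The different ways of writing the code for § can be reproduced in a few lines of Python. This is a rough sketch, not a description of how Word or Excel work internally: it assumes Python's "cp1252" codec matches the table behind ALT + 0167 and =CHAR(167) on a typical Western-language Windows setup, and the variable names are invented for the example.

```python
section = "\u00a7"  # the section symbol

code_point = ord(section)                # decimal value of the code point
hex_value = format(code_point, "04X")    # the form shown in Word's Symbol dialog
octal_value = format(code_point, "o")    # the same number in base eight

# UTF-8 stores this character as two bytes; show them as binary digits.
utf8_bytes = section.encode("utf-8")
utf8_bits = " ".join(format(b, "08b") for b in utf8_bytes)

# Windows-1252 stores it as the single byte 167 (assumed here to be
# the value that ALT + 0167 and =CHAR(167) refer to).
cp1252_byte = section.encode("cp1252")
from_decimal = bytes([167]).decode("cp1252")

print(code_point, hex_value, octal_value, utf8_bits, from_decimal)
```

Running it prints the decimal, hexadecimal, octal and UTF-8 binary forms side by side, followed by the character recovered from the decimal value 167.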
