Entering Any Key

Mon, 07/13/2009 - 7:43am
John R. Joyce, Ph.D.

Special character entry using Windows



Figure 1: Windows Character Map

When entering text using Microsoft Windows, there will come a time when the key you are looking for is just not there. This isn't Windows' fault; there's just a limited amount of room on the keyboard. This means that many characters, like the mythical Any Key, have to be left off. Fortunately, you are not reduced to a desperate search for other documents from which to copy and paste them. However, with many applications now using the Unicode standard to support a larger range of languages and characters, not all tools will work in all situations.

If you only need to enter the occasional odd character, such as a degree or cent symbol, it may be easiest to use Windows' built-in Character Map tool. This is usually located under Accessories >> System Tools. Assuming that you are using a relatively recent version of Windows, it is capable of displaying the DOS, Windows and Unicode character sets. While our interest is in using it to insert non-keyboard characters into a document, it also lets you see which characters are supported by a particular font.

The simplest way of using this tool is to
1. scroll through the characters displayed until you find the one you want
2. click on it
3. click on Select or simply double click on it.

This will add the character to the Characters to Copy field. You can add more than one character to this field; simply double click on the additional characters. Once all of the characters you want are selected, click on Copy to copy them to the Windows clipboard. You can then select the target position in your document and click Paste to insert the characters. With the Advanced view option checked, you can also search for Unicode characters by Name or Group. Some programs, such as WordPad, even allow you to click and drag characters from Character Map directly into the other application.
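The search-by-name idea can also be tried outside Character Map. The following is only a minimal Python sketch, not how Character Map itself works: it relies on the standard `unicodedata` module, the helper name `find_by_name` is made up for illustration, and by default it scans only the first 65,536 code points.

```python
import unicodedata

def find_by_name(fragment, limit=0x10000):
    """Return the characters whose Unicode name contains `fragment`.

    Only code points below `limit` are scanned; the default of 0x10000
    is an arbitrary choice for this sketch.
    """
    fragment = fragment.upper()
    matches = []
    for code_point in range(limit):
        ch = chr(code_point)
        # Code points without a name (controls, unassigned) return "".
        if fragment in unicodedata.name(ch, ""):
            matches.append(ch)
    return matches
```

Calling `find_by_name("degree sign")` returns a list that includes the degree sign, and `unicodedata.lookup("DEGREE SIGN")` fetches a character directly when you already know its exact name.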

Figure 2: Character combinations for degree symbol

If you don't want to resort to using a separate application, there are additional options built directly into the operating system. With current versions of Microsoft Windows, first toggle Num Lock on. Now, while holding down the Alt key, type the decimal value of the character, including any leading zero, using the number pad keys. This is obviously somewhat more restrictive, as most people are not inclined to memorize the four-digit codes for all of the possible characters. However, if you frequently use just a few symbols, you might be inclined to learn, for example, that 0176 is the decimal code for the Degree Sign (°).
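If you would rather look a code up than memorize it, a few lines of Python can do the arithmetic. This is a minimal sketch under one stated assumption: that four-digit codes with a leading zero are looked up in Windows code page 1252, the usual Western European default. The function name `alt_code` is hypothetical.

```python
def alt_code(ch, code_page="cp1252"):
    """Return the four-digit string to type on the keypad while holding Alt.

    Assumes the leading-zero codes are looked up in `code_page`.
    """
    if len(ch) != 1:
        raise ValueError("expected a single character")
    try:
        value = ch.encode(code_page)[0]
    except UnicodeEncodeError:
        raise ValueError(f"{ch!r} has no single-byte code in {code_page}") from None
    # Pad to four digits so the leading zero is always present.
    return f"{value:04d}"
```

For the degree sign this returns `0176`, matching the code quoted above. Characters outside the code page raise an error rather than returning a misleading number.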

The potential gotcha here is that Microsoft has changed their encoding scheme several times over the years. In the original version of their Disk Operating System (DOS), they supported just the 128-character ASCII character set. With Windows 3.1, this was expanded to a 224-character set, frequently referred to as the Windows ANSI character set, though it was never actually an official ANSI standard. This character set is also known as Code Page 1252 and Latin 1 Windows, to differentiate it from the original proposed standard ISO/IEC 8859-1, which is also informally known as Latin 1. It incorporates support for many of the special characters used in Western European languages, but still omits many used in a variety of Eastern European languages. Languages without European roots weren’t even considered. Support for the ANSI character set was later included in DOS with the addition of their ANSI.SYS driver. In Windows 95 through Windows 98, Microsoft expanded font support with what was called Windows Glyph List 4 (WGL4) to include 652 distinct characters. With Windows 2000 through Windows Vista, Microsoft has been including expanded support for what is known as Unicode. Supported by a variety of international standards bodies, this was originally conceived to support about 65,000 characters. This initial implementation was designated as the Basic Multilingual Plane, sometimes referred to as just Plane 0.

Figure 3: BabelMap

As this is still insufficient to represent all known characters, Unicode was expanded to include 16 additional planes. Note that, while around 100,000 characters have been defined in Unicode, no single font exists which can display all of them. More detail on Microsoft’s evolving font support can be found in Brian Livingston’s and Paul Thurrott’s book Windows Vista Secrets from John Wiley and Sons [ISBN: 0764577042, ©2007, 647 pp, $39.99]. Another excellent reference is Jukka "Yucca" Korpela’s extensive exploration of the topic in his article “A Tutorial on Character Code Issues.” Reading through Jukka’s article is strongly recommended for anyone intending to do more than just enter a character into a document. Specifically, it goes through a careful examination of characters and the frequently misused terminology associated with them, such as the differences between character repertoire, character code, character set and character encoding. This frequent misuse, even by experts in the field, has made a complicated topic even more confusing.
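To make the idea of planes concrete, here is a small Python sketch of the arithmetic. It assumes the layout defined by the Unicode standard, 17 planes of 65,536 code points each with U+10FFFF as the highest value; the function name `plane_of` is made up for this example.

```python
def plane_of(code_point):
    """Return the Unicode plane number (0 through 16) for a code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    # Each plane holds 0x10000 (65,536) code points, so the plane number
    # is whatever remains after dropping the low 16 bits.
    return code_point >> 16
```

The degree sign, U+00B0, falls in Plane 0, while a code point such as U+1F600 falls in Plane 1.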

If each of the above encoding schemes were just a superset of the previous one, the issue might be relatively minor but, as you know from experience, that’s not how Microsoft works. The original ASCII character set, which stands for American Standard Code for Information Interchange, reserved values 0-31 and 127 as control characters, such as Line Feed, Carriage Return and Bell, so only values 32-126 were actually available for encoding printable characters. The formal standard definition is ANSI X3.4-1986.
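The ASCII split described above is easy to check in code. This is just an illustrative Python sketch; `classify_ascii` is a hypothetical helper, not part of any standard library.

```python
def classify_ascii(value):
    """Label a 7-bit ASCII value as 'control' or 'printable'."""
    if not 0 <= value <= 127:
        raise ValueError("ASCII values run from 0 to 127")
    # Values 0-31 and 127 are control characters; 32-126 are printable
    # (32 is the space character).
    if value < 32 or value == 127:
        return "control"
    return "printable"

# Count how many of the 128 values fall in each group.
printable_count = sum(classify_ascii(v) == "printable" for v in range(128))
control_count = 128 - printable_count
```

This leaves 95 printable values and 33 control values, which is why so many characters had to wait for later character sets.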

When Microsoft transitioned to the Windows ANSI character set, they based it on a proposed ANSI character standard, but included their own ‘enhancements’. In particular, the actual standard reserved codes 128 through 159 for additional control characters, while Microsoft used them to display a variety of other characters and symbols. For example, code 148 (decimal) represented the o dieresis character (ö) in the DOS character set, but in the actual standard it represents the control character Cancel Character (CCH). In the Windows ANSI character set, this same code value represents the close double quote (”). If you look at other operating systems, this code can mean something else altogether. As you can imagine, this can result in a wide range of data conversion errors when passing files between operating systems and applications. Many sources recommend that you avoid using characters in this block. Where you must use them, they recommend explicitly entering them using their decimal or hex Unicode values.
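You can see the ambiguity of code 148 for yourself by decoding the same byte three ways. The sketch below is in Python and assumes that its built-in `cp437`, `cp1252` and `latin-1` codecs stand in for the DOS character set, Windows ANSI and ISO/IEC 8859-1 respectively; the function name `interpretations` is made up for this example.

```python
def interpretations(value, encodings=("cp437", "cp1252", "latin-1")):
    """Decode one byte value under several single-byte encodings."""
    raw = bytes([value])
    # The same byte maps to a different character in each encoding.
    return {name: raw.decode(name) for name in encodings}

result = interpretations(148)
for name, ch in result.items():
    # repr() keeps invisible control characters readable as escapes.
    print(f"{name:8} U+{ord(ch):04X} {ch!r}")
```

Under these assumptions the byte comes back as ö (U+00F6), as the close double quote (U+201D), and as the control character U+0094, which is exactly the kind of mismatch that corrupts text moved between systems.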

This can become even more confusing depending on the application in which you are working. A good example is Microsoft Word. If you have AutoCorrect activated in Word, it will automatically convert some words and symbols from their ASCII versions to distinct symbol glyphs. A commonly used example of this is converting the trademark abbreviation, typed as (tm), to the ™ symbol (Unicode U+2122).

This brings us back to the original topic of inserting a character using the Alt key and the numeric keypad. If you type the three-digit decimal code 176 while holding down the Alt key, Windows looks it up in the older DOS character set and you will get a shaded character block (░), which can be useful in generating some ANSI graphics displays, though it is seldom used these days. If you enter the four-digit value 0176, with its leading zero, while holding down the Alt key, Windows looks it up in the Windows ANSI character set and you will get the previously mentioned degree symbol (°), which has the same decimal value, 176, in Unicode.
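The difference the leading zero makes can be modeled in a few lines of Python. This is a rough sketch of the behavior, not Windows' own logic, and it assumes a US English system where codes without a leading zero are looked up in code page 437 and codes with a leading zero in code page 1252. The function name `alt_entry` is hypothetical.

```python
def alt_entry(digits, oem="cp437", ansi="cp1252"):
    """Return the character produced by typing `digits` while holding Alt.

    Assumption: a leading zero selects the Windows ANSI code page,
    otherwise the older DOS (OEM) code page is used.
    """
    value = int(digits)
    if not 0 < value <= 255:
        raise ValueError("this sketch only covers codes 1 through 255")
    code_page = ansi if digits.startswith("0") else oem
    # A few byte values are undefined in cp1252 and raise UnicodeDecodeError.
    return bytes([value]).decode(code_page)
```

With these assumptions, `alt_entry("176")` gives the shaded block and `alt_entry("0176")` gives the degree sign.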

At the risk of confusing things further, Windows XP and more recent versions of Windows also support another method for directly entering Unicode characters into at least some applications. To do this, enter the character U followed by the plus sign (+) and the four-digit hexadecimal value (‘U+xxxx’), or simply enter the four-digit hexadecimal value. With the cursor to the immediate right of the hexadecimal value, press Alt-X. In supported applications, this will convert the character sequence you entered into the Unicode character it represents. If the Unicode functionality is fully supported, pressing Alt-X again will convert the Unicode character back to its hexadecimal representation.

The advantage of using the ‘U+’ sequence is that it helps flag that the string you entered is intended to be converted to a Unicode character. If you make the mistake of entering the U character and the Unicode value without a plus sign, pressing Alt-X will convert the character code to the represented character, but the U will remain in the document (e.g. U00B0 -> U°). As not all applications support this, you need to carefully test this process with the applications you wish to use. In addition, not all Unicode characters support a keystroke equivalent.
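The Alt-X behavior described here can be imitated in Python to see why the plus sign matters. This is only a sketch of the observable behavior, not the code any application actually uses; the function name `alt_x` and the restriction to exactly four hex digits are assumptions made for the example.

```python
import re

# Optional "U+" prefix followed by exactly four hex digits at the end of the text.
_HEX_AT_END = re.compile(r"(?:U\+)?([0-9A-Fa-f]{4})$")

def alt_x(text):
    """Toggle the end of `text` between hex notation and the character itself."""
    match = _HEX_AT_END.search(text)
    if match:
        # Replace the hex digits (and "U+" prefix, if present) with the character.
        return text[:match.start()] + chr(int(match.group(1), 16))
    if text:
        # No hex code found: convert the last character back to four hex digits.
        return text[:-1] + f"{ord(text[-1]):04X}"
    return text
```

Here `alt_x("U+00B0")` yields the degree sign on its own, while `alt_x("U00B0")` leaves the stray U in front of it, just as in the example above. Applying `alt_x` to the degree sign returns `00B0`.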

Many applications also include their own character lookup function, or at least hooks to allow them to access Character Map from inside the application. Common examples of these are OpenOffice and Microsoft Office. The menu option to access this function may vary with the application, but a common sequence is Insert > Symbol.

Many other third-party applications are available that incorporate the same basic function as Microsoft’s Character Map utility. A popular free one for Unicode is BabelMap. This utility is capable of showing all 100,713 assigned characters from Unicode 5.1, as well as the current list of 137,468 private use characters. In addition to scrolling through the character map, you can search by character name or by either the decimal or hexadecimal code value. It comes packaged with a number of interesting and useful utilities. These include the ability to perform advanced searches for specific character criteria, to view information about the currently selected font, and to see in which version of Unicode a particular character was first introduced. It can also save selected characters in a variety of encoding formats, from UTF-8 through UTF-32, in either Big or Little Endian (i.e. the byte order).
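To see what those encoding formats and byte orders mean in practice, the Python sketch below encodes a single character several ways using Python's standard codec names. It illustrates the encodings themselves, not BabelMap's file output, and the function name `encodings_of` is made up for this example.

```python
def encodings_of(ch):
    """Return the bytes of `ch`, as hex, under several Unicode encodings."""
    names = ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le")
    result = {}
    for name in names:
        data = ch.encode(name)
        # Show each byte as two hex digits, separated by spaces.
        result[name] = " ".join(f"{b:02x}" for b in data)
    return result

degree = encodings_of("\u00b0")
```

For the degree sign, UTF-8 gives `c2 b0`, while the big- and little-endian UTF-16 forms are `00 b0` and `b0 00`: the same two bytes in opposite order.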

I think you'll find it very useful to be familiar with these tools. Oh, and if anyone is still scrolling through the list of symbols looking for one called Any Key, you've fallen victim to one of the classic examples of bad documentation writing. In reality, the system is just saying "Okay, I'm ready and waiting for you. When you're ready, press a key to tell me to proceed, and I don't care which key it is." The only real Any Keys you'll find are the specialty advertising key caps a few enterprising companies had made up, along with one reading Panic, for customers to stick on their keyboards!

John Joyce is the LIMS manager for Virginia's State Division of Consolidated Laboratory Services. He may be contacted at editor@ScientificComputing.com.

Related Resources
A Tutorial on Character Code Issues: www.cs.tut.fi/~jkorpela/chars.html 
BabelMap: www.babelstone.co.uk/Software/BabelMap.html 
Character Sets according to Microsoft: www.concordancesoftware.co.uk/manual/hs3260.htm 
Characters and encodings: www.cs.tut.fi/~jkorpela/chars/index.html
ISO/IEC 8859-1: en.wikipedia.org/wiki/ISO_8859-1 
John Wiley and Sons: www.wiley.com 
The Unicode Consortium: unicode.org 
