Information that is easy to digitize:
Digitizing Letters & Words

Letters can be digitized simply by assigning a binary code to each letter. The most common scheme for this mapping is ASCII (American Standard Code for Information Interchange). Many word processors can save documents in this simple format (sometimes called "plain text"). This encoding stores only the characters of the document, without any formatting (such as bold, centering, or footnotes).
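The mapping described above can be sketched in a few lines of Python. This is only an illustration, not part of the original pages: the function names text_to_bits and bits_to_text are made up for this example, and it assumes each character is stored as one 8-bit pattern.

```python
def text_to_bits(text):
    """Return the 8-bit ASCII code of each character, separated by spaces."""
    # str.encode("ascii") raises UnicodeEncodeError for characters outside ASCII.
    return " ".join(format(byte, "08b") for byte in text.encode("ascii"))


def bits_to_text(bits):
    """Turn space-separated 8-bit codes back into a string of characters."""
    return bytes(int(code, 2) for code in bits.split()).decode("ascii")


encoded = text_to_bits("Jk")
print(encoded)                # 01001010 01101011
print(bits_to_text(encoded))  # Jk
```

Only the characters survive the round trip. Nothing in these bit patterns records bold, centering, or any other formatting, which is why the format is called plain text.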

Here are some selected parts of the ASCII table:

Character   Code
#           00100011
$           00100100
%           00100101

2           00110010
3           00110011

J           01001010
K           01001011
L           01001100

i           01101001
j           01101010
k           01101011

Each letter is assigned a unique pattern of 1s and 0s. Notice that uppercase letters are assigned a different pattern than lowercase letters. The patterns are related, however: J (01001010) and j (01101010) differ only in the third bit from the left, as do K and k. Also notice that punctuation symbols and numerical digits are assigned codes as well. Thus the number two in binary is 10, but the character "2" in the ASCII table is assigned the code 00110010. Conversely, the pattern 00100011 is the character "#" if interpreted according to the ASCII table, but is the number 35 (base-10) if interpreted as a binary number. It is up to the software program to interpret the data appropriately.
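As a rough sketch of this point, the same pattern of bits can be handed to two different interpretations in Python. The helper names as_number, as_character, and case_difference are hypothetical, chosen only for this example.

```python
def as_number(bits):
    """Interpret a string of 1s and 0s as a binary number."""
    return int(bits, 2)


def as_character(bits):
    """Interpret the same string of 1s and 0s as an ASCII character code."""
    return chr(int(bits, 2))


def case_difference(upper, lower):
    """Show which bits differ between two characters (XOR of their codes)."""
    return format(ord(upper) ^ ord(lower), "08b")


pattern = "00100011"
print(as_number(pattern))        # 35
print(as_character(pattern))     # #

print(format(2, "b"))            # 10        (the number two in binary)
print(format(ord("2"), "08b"))   # 00110010  (the character "2" in ASCII)

print(case_difference("J", "j")) # 00100000  (only one bit differs)
```

The bits never change between the two calls. Only the function applied to them does, which is the sense in which the software decides what the data means.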

If you see a C, you might interpret it as an alphabetic letter; however, you could also interpret that same symbol as a musical note. If you heard this symbol pronounced, you might also interpret the sound as the word "see" or "sea". The meaning of the symbol depends on the interpretation. This is also true for the computer: the meaning of the bits (the 1s and 0s) depends on the interpretation invoked by the software. The programmer who wrote the software thus decides how the bits are interpreted, either according to a scheme that he or she created, or according to a scheme that has been agreed upon by a group of computer programmers (in which case it is called a standardized format).

The ASCII standard is an early scheme adopted for textual interpretations of bits. Rich Text Format (RTF) is a more recent standard that represents textual information, including formatting such as bold or centering. Sometimes there is an economic advantage for a software company that uses a proprietary scheme to encode information, because users may then feel compelled to use only products from that company (since no other software can read those files). Of course, this strategy can backfire if the company's private format doesn't take hold in the marketplace, because some users will avoid software that produces files that other software cannot read.

These pages were written by Steven H. VanderLeest and Jeffrey Nyhoff and edited by Nancy Zylstra
©2005 Calvin University (formerly Calvin College), All Rights Reserved

If you encounter technical errors, contact computing@calvin.edu.