Information that is difficult to digitize

Here are some kinds of information that are more difficult to digitize:

expertise (inference engines, expert rules)
models, simulations
abstract concepts: love, justice, hope, faith
creativity
common sense, context
very large data sets that are time-consuming to process, such as the human genome

Some information is difficult to digitize because it is difficult to identify its essential parts. For example, computer researchers who work on "expert systems" attempt to build an artificially intelligent system that duplicates the expertise of a professional, such as a doctor. The expert system is programmed to mimic the diagnostic skills of the physician by conducting intensive interviews with good doctors and by constructing a database of basic medical facts. The challenging part is determining what is important in making a diagnosis. Doctors themselves often do not know the explicit rules by which they operate when examining a patient and making a diagnosis. Quantifying the intuition and judgment of the physician by drawing out this information through interviews can be a long process.
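To make the idea of "explicit rules" concrete, here is a minimal sketch of the kind of rule-based inference an expert system might perform. The rules, fact names, and the infer function are invented for illustration only; they are not real medical knowledge and do not describe any particular expert system.

```python
# Toy forward-chaining inference engine.
# Each rule pairs a set of required facts with one conclusion.
# The rules below are hypothetical and are NOT real medical guidance.
RULES = [
    ({"fever", "cough"}, "possible_flu"),
    ({"possible_flu", "short_of_breath"}, "refer_to_doctor"),
]


def infer(facts, rules=RULES):
    """Apply the rules repeatedly until no new conclusions appear."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # A rule fires when all of its conditions are already known.
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


observed = {"fever", "cough", "short_of_breath"}
print(sorted(infer(observed) - observed))
# prints ['possible_flu', 'refer_to_doctor']
```

The engine itself is only a few lines long. The hard part, as described above, is filling in the rule list: each entry has to be drawn out of an expert who may never have stated it explicitly.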

Other kinds of information are difficult to digitize because the concepts are not well defined. Abstract ideas such as love, justice, hope, faith, or creativity are concepts that are tough to encapsulate in an objective definition that can be programmed into a machine.

It turns out that early programmers found it relatively easy to build computer programs that could perform certain tasks that humans often consider very difficult, such as calculus. On the other hand, common sense, which seems to come naturally to us, can be difficult to program into a computer. This is because common sense requires an understanding of context. Evaluating even simple problems within a frame of reference is easy for humans (we do it almost without thinking about it) but not so easy for a computer. For example, an "intelligent" robot system might be placed in a room and given the following problem: "There is a bomb with a timer in the room, set to explode in 5 minutes. Do what is necessary to survive." The robot system might solve the problem by wheeling itself out of the room. However, the computer program may not recognize enough of the context to consider whether the bomb is actually sitting on top of the robot itself; in that case, leaving the room would not help the robot survive.
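The robot example can be sketched in a few lines of code. Everything here is a hypothetical toy: the world dictionary, the action names, and the three functions are assumptions made for illustration, not a description of a real robot planner.

```python
# Toy model of the bomb-in-the-room problem.
# The "world" is a dictionary; "bomb_location" is a hypothetical key.

def naive_plan(world):
    """Ignore context: the goal is to survive, so leave the room."""
    return ["leave_room"]


def context_aware_plan(world):
    """Check one piece of context the programmer happened to anticipate."""
    if world.get("bomb_location") == "on_robot":
        return ["remove_bomb", "leave_room"]
    return ["leave_room"]


def survives(world, plan):
    """The robot survives if it leaves the room without carrying the bomb."""
    carrying_bomb = world.get("bomb_location") == "on_robot"
    if "remove_bomb" in plan:
        carrying_bomb = False
    return "leave_room" in plan and not carrying_bomb


world = {"bomb_location": "on_robot"}
print(survives(world, naive_plan(world)))          # prints False
print(survives(world, context_aware_plan(world)))  # prints True
```

The second planner succeeds only because its author thought of this one situation in advance. Common sense amounts to handling the countless situations nobody listed, which is what makes it hard to program.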

Some information is difficult to digitize because of the sheer volume of information. The Human Genome Project, for example, is an initiative that has mapped the entire structure of human DNA. It only became feasible because of recent advances in computer technology that could deal with the massive amount of data associated with something as complex as human DNA. Another example of a complex system in which we have a keen interest is the weather. Modeling every single air molecule in the atmosphere (position, velocity) is simply not feasible, so computer models must model some larger element of the weather, such as storm fronts, high pressure areas, and so forth.
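A rough back-of-the-envelope calculation shows the scale involved. The figures below are approximations assumed for illustration: about 3 billion base pairs in the human genome, about 5.1e18 kg for the mass of the atmosphere, and about 0.029 kg per mole of air. The function names and the storage layout (2 bits per base, six 8-byte numbers per molecule) are likewise assumptions, not how any real project stores its data.

```python
# Back-of-the-envelope storage estimates. All constants are approximate.
AVOGADRO = 6.022e23          # molecules per mole
ATMOSPHERE_MASS_KG = 5.1e18  # approximate total mass of the atmosphere
AIR_MOLAR_MASS_KG = 0.029    # approximate mass of one mole of air


def genome_bytes(base_pairs=3_000_000_000, bits_per_base=2):
    """Bytes needed for one finished sequence, with 4 bases packed into 2 bits each."""
    return base_pairs * bits_per_base // 8


def atmosphere_state_bytes(values_per_molecule=6, bytes_per_value=8):
    """Bytes for one snapshot of every air molecule: 3 position and 3 velocity values."""
    molecules = ATMOSPHERE_MASS_KG / AIR_MOLAR_MASS_KG * AVOGADRO
    return molecules * values_per_molecule * bytes_per_value


print(f"genome: about {genome_bytes() / 1e6:.0f} MB")
print(f"atmosphere snapshot: about {atmosphere_state_bytes():.1e} bytes")
```

Under these assumptions a single finished genome sequence packs into roughly 750 MB, although the raw data generated while sequencing and assembling it is far larger. One snapshot of the atmosphere, by contrast, would need on the order of 10^45 bytes, which is why weather models work with larger elements instead of individual molecules.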

Information that is not easily quantified is also not easily digitized. Because many students, scientists, and researchers now depend on the Web as their primary source of information for research papers, information that is difficult to digitize (and thus less likely to appear on the Web) will often be overlooked completely, even if that information is essential for the topic at hand. For example, many journals that have digitized their recent issues have not digitized back issues because of the cost and time involved. Because the more recent issues are so conveniently searched and browsed, one might not check older issues at all. Because of limited resources, digitization becomes a kind of implicit filter, cutting off certain kinds of information and favoring others.
