## Introduction

To introduce this subject, let us consider an example that may help you to understand more clearly the idea of representing one thing by another. Take the word cat. It refers to a class of animals, often kept as pets by humans, whose members have a certain set of common characteristics, such as that they have claws, fur, and make purring noises. It is unlikely that you would ever confuse the word cat with the species that it represents or with any particular member of that species.

Digression: At the risk of becoming pedantic, let us go one step farther. Consider that which appears, centered on the screen (or page), between here and the next paragraph.

cat

Is what appears immediately above the word cat itself, or is it just a representation of that word, formed by a pattern of black and white pixels on your computer screen (or ink stains on a sheet of paper, if you're reading a "hard copy" version of this document)? The point is that one could reasonably view each occurrence of the character sequence cat (or any similar sequence that spells some word) appearing on a page, or a computer screen, or a blackboard, etc., as simply a representation of the corresponding word. End of Digression.

Few people would confuse the word cat with the type of animal to which it refers, but many people routinely confuse numerals with the numbers that they represent. For example, consider

35024

This is a five-digit numeral that represents the same number as is represented by the phrase thirty-five thousand twenty-four (which can also be considered to be a numeral!). Just as words refer to (or represent) objects, actions, and various other concepts, numerals refer to (or represent) numbers. In our day-to-day lives, most of us rarely need to make such subtle distinctions. But because computers store representations of concepts, and manipulate those representations, a good understanding of computers requires that you appreciate the difference between a thing and a representation thereof.

Computers are capable of storing and processing data of many different kinds. Among the most common types of data are numeric, textual (composed of characters), logical (i.e., true and false values), visual (i.e., images), and audio (i.e., sound). Yet computers store all data in terms of only 0's and 1's! (Or at least that's the point of view taken by computer scientists. The physical manifestation of those 0's and 1's, i.e., the means by which they are represented on whatever physical medium stores them, is the concern of people who work at levels of abstraction closer to physical reality, such as electronics engineers and physicists.)

How can so many different kinds of data all be expressed in terms of 0's and 1's? The answer lies in encoding schemes!

## Numeric Data

### Unsigned Integers

We begin by considering unsigned (i.e., nonnegative) integers, or the so-called natural numbers. Most peoples of the world employ the decimal (or base ten) numeral system. In this system, the ten distinct symbols 0, 1, 2, ..., 9 (also called the decimal digits) represent the numbers zero through nine. To express larger numbers, we form sequences of digits and follow the convention that the "worth" of each digit in such a sequence depends not only upon which digit it is (i.e., 4 vs. 7) but also upon its position within the sequence. (Sometimes this is called positional notation.)

More specifically, the positions become increasingly significant as we go from right to left. We say that the rightmost digit is in the 1's column, its neighbor to the left is in the 10's column, the next digit to the left is in the 100's column, the next is in the 1000's column, and so on. That is, the weights, or place values, of the columns are the powers of 10 (i.e., $1 = 10^0$, $10 = 10^1$, $100 = 10^2$, $1000 = 10^3$, etc.). Here is an illustration for the numeral 7326:

| column weights | 1000 | 100 | 10 | 1 |
|---|---|---|---|---|
| sequence of (decimal) digits | 7 | 3 | 2 | 6 |

This numeral means the same thing as

(7 × 1000) + (3 × 100) + (2 × 10) + (6 × 1)

That is, the 7, being in the 1000's column, represents 7×1000; the 3, being in the 100's column, represents 3×100; the 2, being in the 10's column, represents 2×10; and the 6, being in the 1's column, represents 6×1.

This system works quite nicely because every nonnegative integer can be expressed as a sum of the form

$$(d_k \times 10^k) + (d_{k-1} \times 10^{k-1}) + \cdots + (d_1 \times 10^1) + (d_0 \times 10^0)$$

for some natural number $k$, where each $d_i$ is a decimal digit (i.e., one of 0, 1, 2, ..., 9). Hence, each such number can be represented by the corresponding numeral

$$d_k \, d_{k-1} \, \cdots \, d_1 \, d_0$$

Moreover, if we ignore numerals with leading 0's, each natural number has a unique representation of this form.

Why do we use ten as the base of our numeral system? Is there something inherent about ten that makes it better than any other choice? No! Rather, anthropologists point to evidence that many ancient civilizations adopted counting systems convenient for counting on the hands, which have ten fingers.

We could, for example, just as well use eight as the base (giving rise to the octal system) or 16 (giving rise to the hexadecimal system) or any other integer greater than 1. (There is such a thing as the base 1 (or unary) system, although it is not entirely analogous.)

As an example, consider the octal (i.e., base 8) system. In this system, numerals are formed from the (eight) digits 0 through 7 and the column weights are the powers of eight ($1 = 8^0$, $8 = 8^1$, $64 = 8^2$, $512 = 8^3$, etc.). Take, for example, the octal numeral 5207:

| column weights | 512 | 64 | 8 | 1 |
|---|---|---|---|---|
| sequence of (octal) digits | 5 | 2 | 0 | 7 |

Analogous to the decimal numeral example above, we calculate (using base 10 numerals!) that the number represented by the octal numeral 5207 is

(5 × 512) + (2 × 64) + (0 × 8) + (7 × 1)

which works out to 2695 (expressed in decimal). That is, we have

5207(8) = 2695(10)

Note that we append the base (written as a decimal numeral) to the right of a numeral in order to indicate its base explicitly. Conventionally it is set as a subscript; here it appears in parentheses.
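The positional rule used in the decimal and octal examples can be sketched in a few lines of Python. The function name `numeral_value` and the choice to pass digits as a list (most significant first) are assumptions made for illustration. The loop accumulates from left to right, which gives the same result as summing each digit times its column weight.

```python
def numeral_value(digits, base):
    """Return the number represented by a sequence of digits in the given base.

    digits is listed most significant first, e.g. [5, 2, 0, 7].
    """
    value = 0
    for d in digits:
        if not 0 <= d < base:
            raise ValueError(f"{d} is not a valid digit in base {base}")
        # Moving one column to the right multiplies everything so far by the base.
        value = value * base + d
    return value


print(numeral_value([7, 3, 2, 6], 10))  # 7326
print(numeral_value([5, 2, 0, 7], 8))   # 2695
```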

For reasons having to do with the concerns of engineering (such as reliability and cost), devices on which digital data are stored are built in such a way that each atomic unit of memory/storage is a switch, meaning that, at any moment in time, it is in one of two possible states. By convention, we refer to these states as 0 and 1, which, of course, correspond to the two digits that are available in the binary (or base 2) numeral system. One might call each of these a binary digit, from which we get the contraction bit. It would seem natural, then, for computers to employ the binary numeral system for representing numbers.

As an example, take the binary numeral 10100110(2):

| column weights | 128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 |
|---|---|---|---|---|---|---|---|---|
| sequence of (binary) digits | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 |

Notice that the column weights are the powers of two. Analogous to the examples above, we have that 10100110(2) represents the number corresponding to the sum (expressed in decimal numerals)

(1 × 128) + (0 × 64) + (1 × 32) + (0 × 16) + (0 × 8) + (1 × 4) + (1 × 2) + (0 × 1)

which comes out (in decimal) to 166.

In general, to translate a binary numeral into its decimal equivalent, do exactly as we did in arriving at 166 in the above example: simply add up the weights of the columns in which the binary numeral contains 1's.
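As a minimal sketch of this rule, the following Python function walks the numeral from right to left and adds up the weights of the columns that contain a 1. The name `binary_to_decimal` and the use of a string of `'0'`/`'1'` characters as input are illustrative assumptions.

```python
def binary_to_decimal(bits):
    """Sum the column weights under each 1 in a string of binary digits."""
    total = 0
    weight = 1  # weight of the rightmost column
    for bit in reversed(bits):
        if bit == "1":
            total += weight
        elif bit != "0":
            raise ValueError(f"not a binary digit: {bit!r}")
        weight *= 2  # each column to the left is worth twice as much
    return total


print(binary_to_decimal("10100110"))  # 166
```

Python's built-in `int("10100110", 2)` performs the same conversion.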

Translating from decimal to binary is only a little more difficult. Perhaps the most intuitively appealing approach is to find the powers of two that sum up to the desired number. We illustrate this with an example: Suppose that we want to express the number 75 (here expressed in decimal notation, as usual) in binary notation. First find the largest power of two that is less than or equal to 75. That would be 64 (or $2^6$), because the next higher power of two is 128, which is too big. As 75 − 64 = 11, it remains to find powers of two that sum to 11. Following the same technique as before, find the largest power of two no greater than 11. That would be 8 (or $2^3$). As 11 − 8 = 3, it remains to find powers of two summing to 3. The largest power of two no greater than 3 is 2 (or $2^1$). As 3 − 2 = 1, it remains to find powers of two summing to 1. The largest power of two no greater than 1 is 1 (or $2^0$). As 1 − 1 = 0, we are done. What we have determined is that 75 can be written as the sum of powers of two as follows:

75 = 64 + 8 + 2 + 1

which is to say that the binary representation of 75 has 1's in the 64's, 8's, 2's and 1's columns and 0's in every other column. Omitting leading 0's (in the columns with weights greater than 64), this yields

| column weights | 64 | 32 | 16 | 8 | 4 | 2 | 1 |
|---|---|---|---|---|---|---|---|
| sequence of (binary) digits | 1 | 0 | 0 | 1 | 0 | 1 | 1 |

That is, the binary numeral we seek is 1001011(2).
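The procedure just illustrated (repeatedly taking out the largest power of two that still fits) can be sketched as follows. The function name `decimal_to_binary` is an assumption for illustration; the logic follows the worked example for 75.

```python
def decimal_to_binary(n):
    """Express a nonnegative integer in binary by subtracting powers of two."""
    if n < 0:
        raise ValueError("only nonnegative integers are handled")
    if n == 0:
        return "0"
    # Find the largest power of two that is less than or equal to n.
    weight = 1
    while weight * 2 <= n:
        weight *= 2
    bits = []
    while weight >= 1:
        if weight <= n:
            bits.append("1")  # this power of two is part of the sum
            n -= weight
        else:
            bits.append("0")  # this column contributes nothing
        weight //= 2
    return "".join(bits)


print(decimal_to_binary(75))  # 1001011
```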

#### Arithmetic Operations

For a computer to be useful as a "number cruncher", it needs not only to be able to encode integer values, but also to be able to perform arithmetic operations upon them. How can addition, for example, be carried out upon numbers encoded using the binary numeral system? Well, it turns out that addition, as well as the other arithmetic operations, can be performed in binary (or any other base) similarly to how humans perform it in decimal.

Here is an example:

| column weights | 64 | 32 | 16 | 8 | 4 | 2 | 1 |
|---|---|---|---|---|---|---|---|
| carry | 1 | 1 |  | 1 | 1 |  |  |
| addend #1 digits |  | 1 | 1 | 0 | 1 | 1 | 0 |
| addend #2 digits |  | 0 | 1 | 0 | 1 | 1 | 0 |
| sum | 1 | 0 | 0 | 1 | 1 | 0 | 0 |

Just as in decimal addition, we work from least significant digit towards most significant, or right-to-left. In the 1's column, we have 0+0 = 0, so we record a 0 in that column in the result, and we carry zero to the 2's column.

In the 2's column, we have 0+1+1 = 2 (the zero corresponding to the incoming carry). But 2(10) = 10(2). Hence, we record the 0 in the result and carry a 1. (This is analogous to, in decimal, having a column with, say 8 and 6 in it, which yields 14, so we record the 4 and carry the 1.)

In the 4's column, we have 1+1+1=3, or 11(2). Hence, we record the 1 in the result and carry a 1.

We leave it to the reader to make sense of what happened in the 8's and 16's columns.

In the 32's column, we have 1+1+0, which yields 10(2), so we record 0 and carry 1 to the next column. As the 64's column does not exist in the two addends, implicitly the bits there are both 0. Hence, in the 64's column we have 1+0+0 = 1 = 01(2), so we record the 1 and carry a 0. Obviously, all remaining columns to the left will have 0's, so we are done.
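The column-by-column procedure walked through above can be sketched in Python. The function name `add_binary` and the representation of numerals as strings are assumptions for illustration.

```python
def add_binary(a, b):
    """Add two binary numerals given as strings, column by column."""
    width = max(len(a), len(b))
    a = a.zfill(width)  # missing columns on the left are implicitly 0
    b = b.zfill(width)
    carry = 0
    result = []
    # Work from the least significant column (rightmost) to the most significant.
    for bit_a, bit_b in zip(reversed(a), reversed(b)):
        column_sum = int(bit_a) + int(bit_b) + carry
        result.append(str(column_sum % 2))  # digit recorded in this column
        carry = column_sum // 2             # carry into the next column
    if carry:
        result.append("1")
    return "".join(reversed(result))


print(add_binary("110110", "010110"))  # 1001100
```

In decimal terms, this is 54 + 22 = 76.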

### Signed Integers

So far, our discussion has included only natural numbers, i.e., nonnegative integers. Obviously, we would like to be able to encode (and perform arithmetic upon) negative integers, too.

Our standard way of writing a decimal numeral representing a negative number is to place a minus sign in front of its digits. For example, we read −53 as "negative fifty-three". We typically write positive fifty-three as 53, with no sign, but if we want to emphasize that it is positive, we could write it as +53. The point is that every decimal numeral begins, either implicitly or explicitly, with a symbol indicating its sign, which is followed by a sequence of digits that represent its magnitude (i.e., a "distance" from zero). We could reasonably call this the sign-magnitude representation scheme.

As there are two signs, + and −, a very natural way to incorporate the notion of a sign in a binary numeral is to use a single bit to encode it. For example, we could encode + by 0 and − by 1. If we further decide to place the sign first (i.e., use the first bit to encode the sign), then, for example, the binary numeral 110110 would represent -22. (The 1 in the first bit indicates that the number is negative; the other three 1's are in the 16, 4, and 2 columns, and so yield a magnitude of 22.)
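A minimal sketch of decoding a sign-magnitude numeral, assuming (as in the example) that the first bit is the sign, with 0 for + and 1 for −. The function name `decode_sign_magnitude` is hypothetical.

```python
def decode_sign_magnitude(bits):
    """Decode a sign-magnitude bit string (at least two bits long).

    The first bit is the sign (0 means +, 1 means -); the remaining
    bits are the magnitude, read as an unsigned binary numeral.
    """
    sign = -1 if bits[0] == "1" else 1
    magnitude = int(bits[1:], 2)
    return sign * magnitude


print(decode_sign_magnitude("110110"))  # -22
print(decode_sign_magnitude("010110"))  # 22
```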

The sign-magnitude approach may be the most natural for humans, but it turns out that an alternative scheme, called two's complement, is what most computers use. Under this scheme, the weight (or place value) of the most significant bit is negative. For example, suppose we have an 8-bit numeral. Then the column weights are as usual, except that the weight associated with the leftmost column is $-(2^7)$ rather than $+(2^7)$. Hence, the binary numeral 11001001 represents

(1 × -128) + (1 × 64) + (1 × 8) + (1 × 1) = -55
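The same calculation can be sketched in Python by giving the leftmost column a negative weight. The function name `decode_twos_complement` is an assumption for illustration; the width of the numeral is taken to be the length of the bit string.

```python
def decode_twos_complement(bits):
    """Decode a two's complement bit string of any fixed width."""
    value = 0
    weight = 1  # weight of the rightmost column
    for position, bit in enumerate(reversed(bits)):
        is_leftmost = position == len(bits) - 1
        if bit == "1":
            # The most significant column counts negatively; all others positively.
            value += -weight if is_leftmost else weight
        weight *= 2
    return value


print(decode_twos_complement("11001001"))  # -55
```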

### Computer Storage of Integers

Generally speaking, computers store numerals in fixed-length chunks of memory. For integers, this is typically either two bytes (16 bits) or four bytes (32 bits). In order to make the numbers somewhat smaller (and hence easier to deal with), let's suppose that we're using only one byte (8 bits) to store an integer. Recall that the number of distinct bit strings of length $k$, for any natural number $k$, is $2^k$. Plugging in 8 for $k$, we get 256. Hence, by deciding to store integers in single bytes, we limit ourselves to a universe of at most 256 different integers that we can encode/represent.

In the case of our unsigned binary numeral encoding scheme (the first one discussed above), the range of integer values that can be represented goes from 0 (using the bit string 00000000 of eight zeros) up to 255 (using the bit string 11111111 of eight ones).

With the two's complement scheme, the range goes from −128 (using the bit string 10000000) to +127 (using 01111111).

Using the sign-magnitude approach, the range goes from −127 (using 11111111) to +127 (using 01111111). It's interesting that this range has only 255 distinct values in it, rather than 256. The reason? Because zero has two different representations, 00000000 (i.e., +0) and 10000000 (i.e., −0)!

The larger point being made here is that, regardless of how many bits are chosen as being the "standard size" for representing integers (or any other type of data), the set of values that is encodable inside any fixed-length chunk of storage is finite. Hence, if the (accurate) result of some particular computation is outside this set, the result that actually gets stored will be in error. For example, if we are working in the realm of 8-bit numerals represented using the 2's complement scheme and we try to add 95 (01011111(2)) and 67 (01000011(2)), we cannot get the correct result (162), simply because that value is outside the range (namely, -128 to +127) of values representable using 2's complement 8-bit numerals.
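The text says only that the stored result will be in error; exactly what gets stored depends on the hardware and programming language. The sketch below assumes one common behavior, in which only the low eight bits of the sum are kept and then reinterpreted as a two's complement value. The function name is hypothetical.

```python
def add_8bit_twos_complement(x, y):
    """Add two integers, keeping only the low 8 bits of the result.

    Assumes "wraparound" behavior: bits beyond the eighth are discarded.
    """
    raw = (x + y) & 0xFF  # discard any bits beyond the eighth
    # Reinterpret the 8-bit pattern as a two's complement value.
    return raw - 256 if raw >= 128 else raw


print(add_8bit_twos_complement(95, 67))  # -94, not the correct 162
print(add_8bit_twos_complement(95, 20))  # 115, which fits and is correct
```

Under this assumption the bit pattern stored for 95 + 67 is 10100010, which reads as 162 when treated as unsigned but as −94 under two's complement.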

### Real Numbers

A detailed discussion of how real numbers are encoded is omitted for now. But we note that, like integers, real numbers are typically stored in fixed-length chunks of memory, typically either 32 or 64 bits. As with integers, this limits the range of possible values that can be represented. In addition, however, it limits the precision or accuracy with which real numbers can be stored. For example, in the most common 32-bit representation scheme for real numbers (called single-precision floating point), we cannot accurately represent numbers with more than seven significant (decimal) digits. Hence, for example, the closest we could come (using 32 bits) to representing the number 53.000006372 (having eleven significant digits) might be something closer to 53.00001 (which has only seven digits and is rounded to the nearest one hundred thousandth). Indeed, if the computer were instructed to add 53.0 and 0.000006372, the result would likely be 53.00001.

The main point to remember is that the results produced by computations involving real numbers (stored in fixed-length chunks of memory) are (generally speaking) only approximations and should not be interpreted as providing exact answers.
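To see this loss of precision directly, the following sketch uses Python's standard `struct` module to round a value to a 32-bit float and read it back. Python's own `float` is a 64-bit type, so the helper `to_single_precision` (a name chosen here for illustration) simulates 32-bit storage. No exact printed value is claimed, since that depends on how the result is formatted.

```python
import struct


def to_single_precision(x):
    """Round a Python float (64-bit) to the nearest 32-bit float and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


original = 53.000006372
stored = to_single_precision(original)

print(stored)                  # close to, but not equal to, 53.000006372
print(stored == original)      # False
print(abs(stored - original))  # the approximation error, a few millionths at most
```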

## Text

A detailed discussion of text encoding is omitted for now, except to point out that, among the several standards that exist, the one most widely used is probably ASCII (American Standard Code for Information Interchange). The ASCII standard simply assigns to each of 128 distinct characters a distinct code in the form of a bit string of length seven. (Note that $2^7$ is 128, not accidentally.) Among the 128 characters found in ASCII are those you would expect: upper and lower case (Roman) letters (52 of them), the ten digits (i.e., 0, 1, 2, ..., 9), several punctuation characters (period, comma, semicolon, etc.), and several special characters (e.g., parentheses, ampersand, asterisk, dollar sign, etc.). Also included are about thirty "characters" that are not characters as most people would think of them; rather, they are intended to be used as codes for computers or other devices (e.g., printers) that deal with textual data. An example is the "carriage return" character, which is used to signal a printing device that it should move to the beginning of the line before continuing.

Extended ASCII extends regular ASCII by using an eighth bit, thereby resulting in a coding scheme for 256 ($2^8$) different characters.
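As a small illustration, Python's built-in `ord` returns a character's numeric code, which for the characters below coincides with its ASCII code. The helper name `ascii_bits` is an assumption; it formats that code as a seven-bit string.

```python
def ascii_bits(ch):
    """Return the 7-bit ASCII code of a single character as a bit string."""
    code = ord(ch)
    if code > 127:
        raise ValueError(f"{ch!r} is not an ASCII character")
    return format(code, "07b")


for ch in "Cat":
    print(ch, ord(ch), ascii_bits(ch))
# C 67 1000011
# a 97 1100001
# t 116 1110100

print(ord("\r"), ascii_bits("\r"))  # 13 0001101 (carriage return)
```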

In recent years, in an attempt to create a character encoding standard that acknowledges the existence of the non-English-speaking world by including characters found in the various alphabets that they use (e.g., Hebrew, Greek, Russian, etc.), the Unicode standard has been introduced. Due to the large number of characters it seeks to include, Unicode specifies a 16-bit code for each character. This gives it the capability of accommodating $2^{16}$ (65536) different characters! (This is actually an over-simplification, but one that suffices for our purposes.)

## Digital Images

A digital image can be viewed as a (typically, rectangular) grid of dots, or pixels. ("Pixel" is a contraction for "picture element".)

Resolution is a measure of how much detail an image holds, but exactly what it means depends upon context. Pixel resolution describes the size of an image in terms of its width (number of columns) and height (number of rows). For example, 1024 × 768 is a common resolution for computer monitors, which is to say that such monitors have 1024 columns and 768 rows of pixels. Spatial resolution describes how densely packed the pixels are, and is usually expressed in terms of pixels per inch (ppi) (or dots per inch (dpi)). (To use such a measure, rather than pixels per square inch, would seem to imply that the density is the same along the rows and along the columns.) It is this quality that, practically speaking, determines the clarity of an image. In 2010, computer monitors typically had a spatial resolution of between 72 and 100 ppi.

In a binary image (also called a black-and-white or bi-level image), each pixel is either black or white. Some devices, including fax machines and some laser printers, can handle only bi-level images. As each pixel's appearance can be characterized by one of only two possible values (black or white), the obvious way to represent a single pixel is with a single bit, where 0 represents black and 1 represents white (or vice versa). (Recall the image of the tiger shown in class.)

In a grayscale image, each pixel is of some shade of gray ranging from the darkest, black, to the lightest, white. Hence, a black-and-white image (as discussed immediately above) is just a special case of a grayscale image in which there are only two shades of gray, black and white. However, when one talks of a grayscale image, by implication one usually means an image in which there are more possible shades. Some early computer monitors were capable of displaying any of sixteen shades of gray, for example.

What are commonly referred to as black-and-white photographs are really grayscale images. In such photographs, it is typical for there to be any of 256 possible shades of gray. In some applications, including medical imaging (where it is important for the image to be very detailed and precise), the number of possible shades of gray exceeds one thousand (1024, say, or 4096).

It's no accident that the numbers of possible shades of gray in the examples above are powers of two! Note that $16 = 2^4$, $256 = 2^8$, and $1024 = 2^{10}$. Hence, in an image in which each pixel can be any of 16 shades of gray, the obvious way to represent each pixel is using 4 bits (i.e., a half-byte). Interpreted as an unsigned integer, a bit string of length four represents an integer value in the range 0..15. The standard approach is for 0 to represent black (the darkest shade) and for 15 to represent white (the lightest shade), with the numbers in between representing increasingly lighter shades, as we go from 1 to 14. In an analogous fashion, each pixel in an image allowing any of 256 shades would be represented by a bit string of length eight (i.e., a byte) representing an integer in the range 0..255.
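The relationship between the number of shades and the number of bits per pixel can be checked with a short sketch. The function name `bits_per_pixel` is hypothetical; it returns the smallest number of bits whose distinct patterns can cover the given number of shades.

```python
def bits_per_pixel(shades):
    """Smallest number of bits that can distinguish the given number of shades."""
    if shades < 1:
        raise ValueError("an image needs at least one shade")
    # The largest code needed is shades - 1; count the bits required to write it.
    return (shades - 1).bit_length()


for shades in (2, 16, 256, 1024, 4096):
    print(shades, bits_per_pixel(shades))
# 2 1
# 16 4
# 256 8
# 1024 10
# 4096 12
```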

In color images, each pixel has a color. Following the RGB color model, in which red, green, and blue are the primary colors, each pixel's appearance can be described by an RGB triple that describes the intensities of red, green, and blue, respectively, present in that pixel. One standard representation, called truecolor, uses 24 bits to store the RGB value of each pixel, eight bits for each of the three components (which, of course, are viewed as integers in the range 0..255). Each cell in the table below is labeled with the RGB value of its background color.

|  |  |  |  |  |  |
|---|---|---|---|---|---|
| 255,0,0 | 255,127,0 | 255,255,0 | 255,127,127 | 255,255,127 | 255,0,127 |
| 0,255,0 | 127,255,0 | 255,0,255 | 127,255,127 | 32,32,32 | 127,127,127 |
| 0,0,255 | 127,0,255 | 127,127,255 | 0,127,127 | 0,0,127 | 255,255,255 |
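A minimal sketch of how an RGB triple might be packed into 24 bits, eight bits per component. Placing red in the most significant byte is an assumption made here for illustration (actual file formats and hardware differ in byte order), and the function names are hypothetical.

```python
def pack_rgb(r, g, b):
    """Pack three 0..255 intensities into one 24-bit integer (red first)."""
    for component in (r, g, b):
        if not 0 <= component <= 255:
            raise ValueError("each component must be in the range 0..255")
    return (r << 16) | (g << 8) | b


def unpack_rgb(value):
    """Recover the (r, g, b) triple from a 24-bit integer."""
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF


packed = pack_rgb(255, 127, 0)
print(format(packed, "024b"))  # 111111110111111100000000
print(unpack_rgb(packed))      # (255, 127, 0)
```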


So far we've talked about how individual pixels are represented. What about an image as a whole? Remember, an image is just a two-dimensional grid of pixels, or rows and columns of pixels. To encode an image as a whole, we can "linearize" the two-dimensional grid into a sequence of pixels by, for example, starting with the first row of pixels, then moving to the second, and then to the third, etc. For example, consider the 5 × 5 table below, which is supposed to illustrate an image with five rows and five columns of pixels. (The image forms a somewhat crude upper case N.)

Then, using 0 for black and 1 for white and linearizing the image, we get

01110 00110 01010 01100 01110

(The spaces are shown only to mark the end of each row's five bits; the stored representation is the unbroken string 0111000110010100110001110.)
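The row-by-row linearization can be sketched as follows. The 5 × 5 grid is reconstructed from the bit string given above (0 for black, 1 for white), and the function names `linearize` and `delinearize` are assumptions for illustration. Note that the width must be known in order to recover the rows from the flat sequence.

```python
# 0 = black, 1 = white; each inner list is one row of the 5 x 5 image.
image = [
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
]


def linearize(rows):
    """Concatenate the rows, first row first, into one bit string."""
    return "".join(str(pixel) for row in rows for pixel in row)


def delinearize(bits, width):
    """Cut a bit string back into rows of the given width."""
    return [[int(b) for b in bits[i:i + width]]
            for i in range(0, len(bits), width)]


flat = linearize(image)
print(flat)                           # 0111000110010100110001110
print(delinearize(flat, 5) == image)  # True
```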

#### Images are Big!

According to the Wikipedia entry on digital cameras, the Canon 350D takes pictures with pixel resolution 3456 × 2304, for a total of about 8 million pixels! If we stored such an image in the straightforward way sketched above (with each pixel being represented by a bit string of 24 bits, or three bytes), it would take approximately 24 million bytes (i.e., 24 MB)! That's a lot of space, and transmitting such a representation over a network takes time. Hence, various ways of compressing (i.e., making smaller) such representations have been developed.
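The arithmetic behind these figures can be checked with a short sketch. The function name `uncompressed_size_bytes` is hypothetical; it assumes the straightforward encoding described above, with no header or padding.

```python
def uncompressed_size_bytes(width, height, bits_per_pixel=24):
    """Bytes needed to store every pixel with no compression or header."""
    total_bits = width * height * bits_per_pixel
    return total_bits // 8


print(3456 * 2304)                          # 7962624 pixels
print(uncompressed_size_bytes(3456, 2304))  # 23887872 bytes
```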

A compression technique is said to be lossless if it can be reversed, meaning that data compressed using that technique can be decompressed to recover the original representation. A compression technique is said to be lossy if, in general, it cannot be reversed, which is to say that decompression will yield something close to the original representation, but (probably) not matching it exactly. Because the human vision system has only a certain degree of sensitivity, and hence cannot distinguish two images that differ only in subtle ways, most compression techniques that are used for digital images are lossy. The same is true for representations of audio (e.g., music). In contrast, to use lossy compression on numeric or textual data could be disastrous, because, for most applications, it is imperative that that kind of data be recoverable in exact form.
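To make the idea of a lossless technique concrete, here is a sketch of run-length encoding, a simple scheme that the text does not discuss and that is used here purely as an illustration. It replaces each run of repeated symbols with the symbol and a count, and decoding recovers the original exactly. The function names are assumptions.

```python
def rle_encode(bits):
    """Run-length encode a string as a list of (symbol, run length) pairs."""
    runs = []
    for symbol in bits:
        if runs and runs[-1][0] == symbol:
            # Same symbol as the current run: extend it by one.
            runs[-1] = (symbol, runs[-1][1] + 1)
        else:
            runs.append((symbol, 1))
    return runs


def rle_decode(runs):
    """Rebuild the original string from its runs."""
    return "".join(symbol * count for symbol, count in runs)


original = "0" * 8 + "1" * 10 + "0" * 4
encoded = rle_encode(original)
print(encoded)                          # [('0', 8), ('1', 10), ('0', 4)]
print(rle_decode(encoded) == original)  # True: nothing was lost
```

A lossy technique, by contrast, would offer no such guarantee that decoding returns the original.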

Different compression techniques have led to the existence of several image file formats that are in common use, some of which you have probably heard of, including JPEG, TIFF, and GIF. Each one has its strengths and weaknesses. Digital images include photographs, cartoons, diagrams, and other varieties. Some image file formats are better for one kind of image than another.

Omitted for now.