To introduce this subject, let us consider an example that may help you understand more clearly the idea of representing one thing by another. Take the word cat. It refers to a class of animals, often kept as pets by humans, whose members share certain characteristics: they have claws and fur, and they make purring noises. It is unlikely that you would ever confuse the word cat with the species that it represents or with any particular member of that species.
Digression: At the risk of becoming pedantic, let us go one step farther. Consider that which appears, centered on the screen (or page), between here and the next paragraph.

cat
Is what appears immediately above the word cat itself, or is it just a representation of that word, formed by a pattern of black and white pixels on your computer screen (or ink stains on a sheet of paper, if you're reading a "hard copy" version of this document)? The point is that one could reasonably view each occurrence of the character sequence cat (or any similar sequence that spells some word) appearing on a page, or a computer screen, or a blackboard, etc., as simply a representation of the corresponding word. End of Digression.
Few people would confuse the word cat with the type of animal to which it refers, but many people routinely confuse numerals with the numbers that they represent. For example, consider

35024
This is a five-digit numeral that represents the same number as is represented by the phrase thirty-five thousand twenty-four (which can also be considered to be a numeral!). Just as words refer to (or represent) objects, actions, and various other concepts, numerals refer to (or represent) numbers. In our day-to-day lives, most of us rarely need to make such subtle distinctions. But because computers store representations of concepts, and manipulate those representations, a good understanding of computers requires that you appreciate the difference between a thing and a representation thereof.
Computers are capable of storing and processing data of many different kinds. Among the most common types of data are numeric, textual (composed of characters), logical (i.e., true and false values), visual (i.e., images), and audio (i.e., sound). Yet computers store all data in terms of only 0's and 1's! (Or at least that's the point of view taken by computer scientists. The physical manifestation of those 0's and 1's (i.e., by what means the 0's and 1's are represented on whatever physical medium they are stored) is the concern of people who work at levels of abstraction closer to physical reality, such as electronics engineers and physicists.)
How can so many different kinds of data all be expressed in terms of 0's and 1's? The answer lies in encoding schemes!
More specifically, the positions become increasingly significant as we go from right to left. We say that the rightmost digit is in the 1's column, its neighbor to the left is in the 10's column, the next digit to the left is in the 100's column, the next is in the 1000's column, etc. That is, the weights, or place values, of the columns are the powers of 10 (i.e., 1 (or $10^0$), 10 (or $10^1$), 100 (or $10^2$), 1000 (or $10^3$), etc.). Here is an illustration for the numeral 7326:
| sequence of (decimal) digits: | 7 | 3 | 2 | 6 |
| column weights: | 1000 | 100 | 10 | 1 |
This numeral means the same thing as

$$7 \times 1000 + 3 \times 100 + 2 \times 10 + 6 \times 1$$

That is, the 7, being in the 1000's column, represents 7×1000; the 3, being in the 100's column, represents 3×100; the 2, being in the 10's column, represents 2×10; and the 6, being in the 1's column, represents 6×1.
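This system works quite nicely because every nonnegative integer can be expressed as a sum of the form

$$d_k \times 10^k + d_{k-1} \times 10^{k-1} + \cdots + d_1 \times 10^1 + d_0 \times 10^0$$

where each $d_i$ is one of the digits 0 through 9.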
Moreover, if we ignore numerals with leading 0's, each natural number has a unique representation of this form.
Why do we use ten as the base of our numeral system? Is there something inherent about ten that makes it better than any other choice? No! Rather, anthropologists point to evidence that many ancient civilizations adopted counting systems convenient for counting on the hands, which have ten fingers.
We could, for example, just as well use eight as the base (giving rise to the octal system) or 16 (giving rise to the hexadecimal system) or any other integer greater than 1. (There is such a thing as the base 1 (or unary) system, although it is not entirely analogous.)
As an example, consider the octal (i.e., base 8) system. In this system, numerals are formed from the (eight) digits 0 through 7 and the column weights are the powers of eight ($1 = 8^0$, $8 = 8^1$, $64 = 8^2$, $512 = 8^3$, etc.). Take, for example, the octal numeral 5207:
| sequence of (octal) digits: | 5 | 2 | 0 | 7 |
| column weights: | 512 | 64 | 8 | 1 |
Analogous to the decimal numeral example above, we calculate (using base 10 numerals!) that the number represented by the octal numeral 5207 is

$$5207_8 = 5 \times 512 + 2 \times 64 + 0 \times 8 + 7 \times 1 = 2695_{10}$$
Note that we place a (decimal numeral) subscript to the right of a numeral in order to indicate its base explicitly.
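The column-weight calculation works the same way in any base, so it can be sketched as one small Python function. This is only an illustration: the name `numeral_value` is hypothetical, and the use of letters for digits above 9 is an assumption made so that bases such as 16 also work.

```python
def numeral_value(digits: str, base: int) -> int:
    """Return the number represented by `digits` read as a numeral in `base`."""
    value = 0
    weight = 1  # weight of the rightmost column, i.e. base**0
    for ch in reversed(digits):
        # Assumption: letters a..z stand for the digits 10..35.
        digit = int(ch, 36)
        if digit >= base:
            raise ValueError(f"digit {ch!r} is not valid in base {base}")
        value += digit * weight
        weight *= base  # each column to the left weighs `base` times more
    return value


print(numeral_value("7326", 10))  # 7326
print(numeral_value("5207", 8))   # 2695
```

Python's built-in `int("5207", 8)` performs the same conversion; the loop is spelled out here only to mirror the column-by-column reasoning.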
For reasons having to do with the concerns of engineering (such as reliability and cost), devices on which digital data are stored are built in such a way that each atomic unit of memory/storage is a switch, meaning that, at any moment in time, it is in one of two possible states. By convention, we refer to these states as 0 and 1, which, of course, correspond to the two digits that are available in the binary (or base 2) numeral system. One might call each of these a binary digit, from which we get the contraction bit. It would seem natural, then, for computers to employ the binary numeral system for representing numbers.
As an example, take the binary numeral 10100110(2):
| sequence of (binary) digits: | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 |
| column weights: | 128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 |
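Notice that the column weights are the powers of two. Analogous to the examples above, we have that 10100110(2) represents the number corresponding to the sum (expressed in decimal numerals)

$$1 \times 128 + 0 \times 64 + 1 \times 32 + 0 \times 16 + 0 \times 8 + 1 \times 4 + 1 \times 2 + 0 \times 1$$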
which comes out (in decimal) to 166.
In general, to translate a binary numeral into its decimal equivalent, do exactly as we did in arriving at 166 in the above example: simply add up the weights of the columns in which the binary numeral contains 1's.
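The rule of adding up the weights of the columns that contain 1's can be sketched in Python as follows. The function name `binary_to_decimal` is a hypothetical choice made for this illustration.

```python
def binary_to_decimal(bits: str) -> int:
    """Add up the weights of the columns in which `bits` contains a 1."""
    total = 0
    # Position 0 is the rightmost column, whose weight is 2**0 = 1.
    for position, bit in enumerate(reversed(bits)):
        if bit == "1":
            total += 2 ** position
    return total


print(binary_to_decimal("10100110"))  # 166
```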
Translating from decimal to binary is only a little more difficult. Perhaps the most intuitively appealing approach is to find the powers of two that sum up to the desired number. We illustrate this with an example: Suppose that we want to express the number 75 (here expressed in decimal notation, as usual) in binary notation. First find the largest power of two that is less than or equal to 75. That would be 64 (or $2^6$), because the next higher power of two is 128, which is too big. As 75 − 64 = 11, it remains to find powers of two that sum to 11. Following the same technique as before, find the largest power of two no greater than 11. That would be 8 (or $2^3$). As 11 − 8 = 3, it remains to find powers of two summing to 3. The largest power of two no greater than 3 is 2 (or $2^1$). As 3 − 2 = 1, it remains to find powers of two summing to 1. The largest power of two no greater than 1 is 1 (or $2^0$). As 1 − 1 = 0, we are done. What we have determined is that 75 can be written as the sum of powers of two as follows:

$$75 = 64 + 8 + 2 + 1 = 2^6 + 2^3 + 2^1 + 2^0$$
which is to say that the binary representation of 75 has 1's in the 64's, 8's, 2's and 1's columns and 0's in every other column. Omitting leading 0's (in the columns with weights greater than 64), this yields
| sequence of (binary) digits: | 1 | 0 | 0 | 1 | 0 | 1 | 1 |
| column weights: | 64 | 32 | 16 | 8 | 4 | 2 | 1 |
That is, the binary numeral we seek is 1001011(2).
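The "largest power of two that fits" procedure can be sketched in Python like this. It is a minimal illustration for nonnegative integers only, and the name `decimal_to_binary` is hypothetical.

```python
def decimal_to_binary(n: int) -> str:
    """Build the binary numeral for n by repeatedly removing the largest power of two that fits."""
    if n < 0:
        raise ValueError("this sketch handles nonnegative integers only")
    if n == 0:
        return "0"

    # Find the largest power of two that is less than or equal to n.
    weight = 1
    while weight * 2 <= n:
        weight *= 2

    bits = []
    while weight >= 1:
        if weight <= n:
            bits.append("1")  # this column's weight is part of the sum
            n -= weight
        else:
            bits.append("0")  # this column's weight is too big for what remains
        weight //= 2
    return "".join(bits)


print(decimal_to_binary(75))  # 1001011
```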
Here is an example:
  ¹ ¹   ¹ ¹
    1 1 0 1 1 0
+   0 1 0 1 1 0
---------------
  1 0 0 1 1 0 0
| addend #1 digits | 1 | 1 | 0 | 1 | 1 | 0 |
| addend #2 digits | 0 | 1 | 0 | 1 | 1 | 0 |
Just as in decimal addition, we work from least significant digit towards most significant, or right-to-left. In the 1's column, we have 0+0 = 0, so we record a 0 in that column in the result, and we carry zero to the 2's column.
In the 2's column, we have 0+1+1 = 2 (the zero corresponding to the incoming carry). But 2(10) = 10(2). Hence, we record the 0 in the result and carry a 1. (This is analogous to, in decimal, having a column with, say 8 and 6 in it, which yields 14, so we record the 4 and carry the 1.)
In the 4's column, we have 1+1+1=3, or 11(2). Hence, we record the 1 in the result and carry a 1.
We leave it to the reader to make sense of what happened in the 8's and 16's columns.
In the 32's column, we have 1+1+0, which yields 10(2), so we record 0 and carry 1 to the next column. As the 64's column does not exist in the two addends, implicitly the bits there are both 0. Hence, in the 64's column we have 1+0+0 = 1 = 01(2), so we record the 1 and carry a 0. Obviously, all remaining columns to the left will have 0's, so we are done.
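The column-by-column procedure, including the carries, can be sketched in Python as follows. The function name `add_binary` is hypothetical, and padding the shorter addend with leading 0's is a choice made for this illustration.

```python
def add_binary(a: str, b: str) -> str:
    """Add two unsigned binary numerals column by column, right to left."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)  # pad the shorter addend with leading 0's

    result = []
    carry = 0
    for bit_a, bit_b in zip(reversed(a), reversed(b)):
        column_sum = int(bit_a) + int(bit_b) + carry  # 0, 1, 2, or 3
        result.append(str(column_sum % 2))            # the bit recorded in this column
        carry = column_sum // 2                       # the carry into the next column
    if carry:
        result.append("1")  # a final carry opens a new leftmost column
    return "".join(reversed(result))


print(add_binary("110110", "010110"))  # 1001100
```

In decimal terms this is 54 + 22 = 76.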
So far, our discussion has included only natural numbers, i.e., nonnegative integers. Obviously, we would like to be able to encode (and perform arithmetic upon) negative integers, too.
Our standard way of writing a decimal numeral representing a negative number is to place a minus sign in front of its digits. For example, we read −53 as "negative fifty-three". We typically write positive fifty-three as 53, with no sign, but if we want to emphasize that it is positive, we could write it as +53. The point is that every decimal numeral begins, either implicitly or explicitly, with a symbol indicating its sign, which is followed by a sequence of digits that represent its magnitude (i.e., a "distance" from zero). We could reasonably call this the sign-magnitude representation scheme.
As there are two signs, + and −, a very natural way to incorporate the notion of a sign in a binary numeral is to use a single bit to encode it. For example, we could encode + by 0 and − by 1. If we further decide to place the sign first (i.e., use the first bit to encode the sign), then, for example, the binary numeral 110110 would represent -22. (The 1 in the first bit indicates that the number is negative; the other three 1's are in the 16, 4, and 2 columns, and so yield a magnitude of 22.)
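Under the convention just described (first bit is the sign, with 0 for + and 1 for −, and the remaining bits give the magnitude), decoding can be sketched in Python as follows. The function name `sign_magnitude_value` is hypothetical.

```python
def sign_magnitude_value(bits: str) -> int:
    """Decode a sign-magnitude numeral whose first bit is the sign (0 is +, 1 is -)."""
    sign_bit, magnitude_bits = bits[0], bits[1:]
    magnitude = int(magnitude_bits, 2) if magnitude_bits else 0
    return -magnitude if sign_bit == "1" else magnitude


print(sign_magnitude_value("110110"))  # -22
print(sign_magnitude_value("010110"))  # 22
```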
The sign-magnitude approach may be the most natural for humans, but it turns out that an alternative scheme, called two's complement, is what most computers use. Under this scheme, the weight (or place value) of the most significant bit is negative. For example, suppose we have an 8-bit numeral. Then the column weights are as usual, except that the weight associated to the leftmost column is $-(2^7)$ rather than $+(2^7)$. Hence, the binary numeral 11001001 represents

$$-(2^7) + 2^6 + 2^3 + 2^0 = -128 + 64 + 8 + 1 = -55$$
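A minimal Python sketch of two's complement decoding follows; it simply negates the weight of the leftmost column and otherwise sums column weights as usual. The function name `twos_complement_value` is hypothetical.

```python
def twos_complement_value(bits: str) -> int:
    """Decode a two's complement numeral: the leftmost column has a negative weight."""
    width = len(bits)
    value = 0
    for position, bit in enumerate(reversed(bits)):
        if bit == "1":
            weight = 2 ** position
            if position == width - 1:
                weight = -weight  # the most significant bit counts negatively
            value += weight
    return value


print(twos_complement_value("11001001"))  # -55
```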
In the case of our unsigned binary numeral encoding scheme (the first one discussed above), the range of integer values that can be represented in eight bits goes from 0 (using the bit string 00000000 of eight zeros) up to 255 (using the bit string 11111111 of eight ones).
With the two's complement scheme, the range goes from −128 (using the bit string 10000000) to +127 (using 01111111).
Using the sign-magnitude approach, the range goes from −127 (using 11111111) to +127 (using 01111111). It's interesting that this range has only 255 distinct values in it, rather than 256. The reason? Because zero has two different representations, 00000000 (i.e., +0) and 10000000 (i.e., −0)!
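The three ranges follow directly from the number of bits, which the following Python sketch makes explicit. The function name `representable_range` and the scheme labels are hypothetical choices for this illustration.

```python
def representable_range(scheme: str, width: int = 8) -> tuple:
    """Return (smallest, largest) integer encodable in `width` bits under `scheme`."""
    if scheme == "unsigned":
        return 0, 2 ** width - 1
    if scheme == "sign-magnitude":
        largest = 2 ** (width - 1) - 1  # width - 1 bits are left for the magnitude
        return -largest, largest
    if scheme == "two's complement":
        return -(2 ** (width - 1)), 2 ** (width - 1) - 1
    raise ValueError(f"unknown scheme: {scheme!r}")


for scheme in ("unsigned", "sign-magnitude", "two's complement"):
    low, high = representable_range(scheme)
    print(scheme, low, high, "distinct values:", high - low + 1)
```

For eight bits, the count of distinct values comes out to 256, 255, and 256 respectively; the sign-magnitude scheme loses one because zero is encoded twice.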
The larger point being made here is that, regardless of how many bits are chosen as being the "standard size" for representing integers (or any other type of data), the set of values that is encodable inside any fixed-length chunk of storage is finite. Hence, if the (accurate) result of some particular computation is outside this set, the result that actually gets stored will be in error. For example, if we are working in the realm of 8-bit numerals represented using the 2's complement scheme and we try to add 95 (01011111(2)) and 67 (01000011(2)), we cannot get the correct result (162), simply because that value is outside the range (namely, -128 to +127) of values representable using 2's complement 8-bit numerals.
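To see what might actually get stored in the 95 + 67 example, here is a small Python sketch. It assumes that the hardware keeps only the low eight bits of the sum (so-called wraparound), which is common but is an assumption not stated in the text; the function name is hypothetical.

```python
def wrap_to_8_bit_twos_complement(n: int) -> int:
    """Keep only the low 8 bits of n and read them as a two's complement numeral.

    Assumption: overflow silently discards the bits that do not fit.
    """
    low_bits = n & 0xFF  # the eight bits that fit in the storage cell
    return low_bits - 256 if low_bits >= 128 else low_bits


true_sum = 95 + 67
stored = wrap_to_8_bit_twos_complement(true_sum)
print(true_sum, format(true_sum & 0xFF, "08b"), stored)  # 162 10100010 -94
```

Under that assumption, the bit pattern 10100010 is stored, and it reads as −94 rather than 162.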
The main point to remember is that the results produced by computations involving real numbers (stored in fixed-length chunks of memory) are (generally speaking) only approximations and should not be interpreted as providing exact answers.
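A short Python illustration of this point follows. It assumes that Python's `float` is a 64-bit binary floating-point value, which is the case on essentially all current platforms.

```python
import math
from fractions import Fraction

# One tenth has no exact finite representation in binary, so the stored
# value is only the nearest representable approximation.
print(Fraction(0.1) == Fraction(1, 10))  # False

# Small representation errors can surface in arithmetic.
print(0.1 + 0.2 == 0.3)                  # False

# Comparing with a tolerance is one common way to cope.
print(math.isclose(0.1 + 0.2, 0.3))      # True
```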
A full discussion of text encoding is omitted for now, except to point out that, among the several standards that exist, the one most widely used is probably ASCII (American Standard Code for Information Interchange). The ASCII standard simply assigns to each of 128 distinct characters a distinct code in the form of a bit string of length seven. (Note that $2^7$ is 128, not accidentally.) Among the 128 characters found in ASCII are those you would expect: upper and lower case (Roman) letters (52 of them), the ten digits (i.e., 0, 1, 2, ..., 9), several punctuation characters (period, comma, semicolon, etc.), and several special characters (e.g., parentheses, ampersand, asterisk, dollar sign, etc.). Also included are about thirty "characters" that are not characters as most people would think of them; rather, they are intended to be used as codes for computers or other devices (e.g., printers) that deal with textual data. An example is the "carriage return" character, which is used to signal a printing device that it should move to the beginning of the line before continuing.
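Python's built-in `ord` and `chr` expose character codes directly, so the seven-bit ASCII codes can be inspected with a short sketch. The helper name `ascii_bits` is hypothetical.

```python
def ascii_bits(text: str) -> list:
    """Return the 7-bit ASCII code of each character as a bit string."""
    if not text.isascii():
        raise ValueError("text contains characters outside the 128 ASCII characters")
    return [format(ord(ch), "07b") for ch in text]


print(ascii_bits("Cat"))  # ['1000011', '1100001', '1110100']
print(ord("A"), ord("a"), ord("0"))  # 65 97 48
print(repr(chr(13)))  # '\r', the carriage return control character
```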
Extended ASCII extends regular ASCII by using an eighth bit, thereby resulting in a coding scheme for 256 ($2^8$) different characters.
In recent years, in an attempt to create a character encoding standard that acknowledges the existence of the non-English-speaking world by including characters found in the various alphabets that they use (e.g., Hebrew, Greek, Russian, etc.), the Unicode standard has been introduced. Due to the large number of characters it seeks to include, Unicode specifies a 16-bit code for each character. This gives it the capability of accommodating $2^{16}$ (65536) different characters! (This is actually an over-simplification, but one that suffices for our purposes.)
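As a small illustration, Python's `ord` returns a character's Unicode code, which can then be written as a 16-bit string in keeping with the simplified picture above. The sample characters (Greek alpha, Cyrillic Zhe, Hebrew Alef) and the helper name `code_as_16_bits` are choices made for this sketch.

```python
def code_as_16_bits(ch: str) -> str:
    """Write a character's Unicode code as a 16-bit string (simplified 16-bit view)."""
    code = ord(ch)
    if code >= 2 ** 16:
        raise ValueError("this character's code does not fit in 16 bits")
    return format(code, "016b")


# Escapes are used so the source stays pure ASCII.
for ch in ("\u03b1", "\u0416", "\u05d0"):
    print(ord(ch), code_as_16_bits(ch))

print(code_as_16_bits("A"))  # 0000000001000001
```

Note that the code for "A" is the same number as in ASCII (65), just written with more leading 0's.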
A digital image can be viewed as a (typically, rectangular) grid of dots, or pixels. ("Pixel" is a contraction for "picture element".)
Resolution is a measure of how much detail an image holds, but exactly what it means depends upon context. Pixel resolution describes the size of an image in terms of its width (number of columns) and height (number of rows). For example, 1024 × 768 is a common resolution for computer monitors, which is to say that such monitors have 1024 columns and 768 rows of pixels. Spatial resolution describes how densely packed the pixels are, and is usually expressed in terms of pixels per inch (ppi) (or dots per inch (dpi)). (To use such a measure, rather than pixels per square inch, would seem to imply that the density is the same along the rows and along the columns.) It is this quality that, practically speaking, determines the clarity of an image. In 2010, computer monitors typically had a spatial resolution of between 72 and 100 ppi.
In a binary image (also called a black-and-white or bi-level image), each pixel is either black or white. Some devices, including fax machines and some laser printers, can handle only bi-level images. As each pixel's appearance can be characterized by one of only two possible values (black or white), the obvious way to represent a single pixel is with a single bit, where 0 represents black and 1 represents white (or vice versa). (Recall the image of the tiger shown in class.)
In a grayscale image, each pixel is of some shade of gray ranging from the darkest, black, to the lightest, white. Hence, a black-and-white image (as discussed immediately above) is just a special case of a grayscale image in which there are only two shades of gray, black and white. However, when one talks of a grayscale image, by implication one usually means an image in which there are more possible shades. Some early computer monitors were capable of displaying any of sixteen shades of gray, for example.
What are commonly referred to as black-and-white photographs are really grayscale images. In such photographs, it is typical for there to be any of 256 possible shades of gray. In some applications, including medical imaging (where it is important for the image to be very detailed and precise), the number of possible shades of gray exceeds one thousand (1024, say, or 4096).
It's no accident that the numbers of possible shades of gray in the examples above are powers of two! Note that $16 = 2^4$, $256 = 2^8$, and $1024 = 2^{10}$. Hence, in an image in which each pixel can be any of 16 shades of gray, the obvious way to represent each pixel is using 4 bits (i.e., a half-byte). Interpreted as an unsigned integer, a bit string of length four represents an integer value in the range 0..15. The standard approach is for 0 to represent black (the darkest shade) and for 15 to represent white (the lightest shade), with the numbers in between representing increasingly lighter shades, as we go from 1 to 14. In an analogous fashion, each pixel in an image allowing any of 256 shades would be represented by a bit string of length eight (i.e., a byte) representing an integer in the range 0..255.
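The relationship between the number of shades and the number of bits per pixel can be sketched in Python as follows. The function names `bits_per_pixel` and `shade_to_bits` are hypothetical.

```python
def bits_per_pixel(shades: int) -> int:
    """Smallest number of bits that can distinguish `shades` different shades."""
    if shades < 2:
        raise ValueError("need at least two shades")
    return (shades - 1).bit_length()


def shade_to_bits(shade: int, shades: int) -> str:
    """Encode a shade (0 is black, shades - 1 is white) as an unsigned bit string."""
    if not 0 <= shade < shades:
        raise ValueError("shade out of range")
    return format(shade, f"0{bits_per_pixel(shades)}b")


print(bits_per_pixel(16), bits_per_pixel(256), bits_per_pixel(1024))  # 4 8 10
print(shade_to_bits(0, 16), shade_to_bits(15, 16))                    # 0000 1111
```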
In color images, each pixel has a color. Following the RGB color model, in which red, green, and blue are the primary colors, each pixel's appearance can be described by an RGB triple that describes the intensities of red, green, and blue, respectively, present in that pixel. One standard representation, called truecolor, uses 24 bits to store the RGB value of each pixel, eight bits for each of the three components (which, of course, are viewed as integers in the range 0..255). Each cell in the table below is labeled with the RGB value of its background color.
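One way to picture a 24-bit truecolor value is as three bytes packed into a single integer. The sketch below assumes the red component occupies the most significant byte, which is a common convention but is not specified in the text; the function names are hypothetical.

```python
def pack_rgb(red: int, green: int, blue: int) -> int:
    """Pack three 8-bit intensities into one 24-bit integer.

    Assumption: red is placed in the most significant byte.
    """
    for component in (red, green, blue):
        if not 0 <= component <= 255:
            raise ValueError("each component must be in the range 0..255")
    return (red << 16) | (green << 8) | blue


def unpack_rgb(value: int) -> tuple:
    """Recover the (red, green, blue) triple from a packed 24-bit integer."""
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF


pure_red = pack_rgb(255, 0, 0)
print(format(pure_red, "024b"))  # 111111110000000000000000
print(unpack_rgb(pure_red))      # (255, 0, 0)
```

With eight bits per component there are 256 × 256 × 256 = 16,777,216 distinct RGB values.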
So far we've talked about how individual pixels are represented. What about an image as a whole? Remember, an image is just a two-dimensional grid of pixels, or rows and columns of pixels. To encode an image as a whole, we can "linearize" the two-dimensional grid into a sequence of pixels by, for example, starting with the first row of pixels, then moving to the second, and then to the third, etc. For example, consider the 5 × 5 table below, which is supposed to illustrate an image with five rows and five columns of pixels. (The image forms a somewhat crude upper case N.)
0111000110010100110001110
    ^    ^    ^    ^    ^
(The caret symbols indicate the last bit of the representation of each row of pixels.)
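Row-by-row linearization can be sketched in Python as follows. The grid below is reconstructed from the bit string given above by cutting it into rows of five, reading 0 as black and 1 as white; the function names are hypothetical.

```python
# 5 x 5 bi-level image: 0 is black, 1 is white. The black pixels trace a crude "N".
image = [
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
]


def linearize(rows: list) -> str:
    """Concatenate the rows, first row first, into one bit string."""
    return "".join(str(pixel) for row in rows for pixel in row)


def delinearize(bits: str, width: int) -> list:
    """Cut the bit string back into rows of `width` pixels each."""
    return [[int(b) for b in bits[i:i + width]] for i in range(0, len(bits), width)]


encoded = linearize(image)
print(encoded)  # 0111000110010100110001110
print(delinearize(encoded, 5) == image)  # True
```

Note that the bit string alone does not say where one row ends and the next begins, so a decoder also needs to know the image's width.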
A compression technique is said to be lossless if it can be reversed, meaning that data compressed using that technique can be decompressed to recover the original representation. A compression technique is said to be lossy if, in general, it cannot be reversed, which is to say that decompression will yield something close to the original representation, but (probably) not matching it exactly. Because the human vision system has only a certain degree of sensitivity, and hence cannot distinguish two images that differ only in subtle ways, most compression techniques that are used for digital images are lossy. The same is true for representations of audio (e.g., music). In contrast, to use lossy compression on numeric or textual data could be disastrous, because, for most applications, it is imperative that that kind of data be recoverable in exact form.
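The difference between the two kinds of compression can be illustrated with a small Python sketch. Neither technique below is named in the text: run-length encoding and shade quantization are swapped in here only because they are about the simplest examples of a lossless and a lossy method, respectively, and all function names are hypothetical.

```python
def rle_encode(bits: str) -> list:
    """Lossless: replace each run of identical bits by a (bit, run length) pair."""
    runs = []
    for bit in bits:
        if runs and runs[-1][0] == bit:
            runs[-1] = (bit, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((bit, 1))              # start a new run
    return runs


def rle_decode(runs: list) -> str:
    """Reverse rle_encode exactly."""
    return "".join(bit * count for bit, count in runs)


def quantize(shade: int, step: int = 16) -> int:
    """Lossy: remember only which block of `step` consecutive shades a pixel is in."""
    return shade // step


def dequantize(level: int, step: int = 16) -> int:
    """Approximate the original shade by the middle of its block."""
    return level * step + step // 2


bits = "0000000011111100"
print(rle_encode(bits))                      # [('0', 8), ('1', 6), ('0', 2)]
print(rle_decode(rle_encode(bits)) == bits)  # True: the original is recovered exactly

print(dequantize(quantize(203)))             # 200: close to 203, but not equal
```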
Different compression techniques have led to the existence of several image file formats that are in common use, some of which you have probably heard of, including JPEG, TIFF, and GIF. Each one has its strengths and weaknesses. Digital images include photographs, cartoons, diagrams, and other varieties. Some image file formats are better for one kind of image than another.
Omitted for now.