To introduce this subject, let us consider an example that may help you
to understand more clearly the idea of representing one thing by another.
Take the word **cat**. It refers to a class of animals,
often kept as pets by humans, whose members have a certain set of common
characteristics, such as that they have claws, fur, and make purring
noises. It is unlikely that you would ever confuse the word **cat**
with the species that it represents or with any particular member of that
species.

**Digression:**
At the risk of becoming pedantic, let us go one step farther.
Consider that which appears, centered on the screen (or page), between
here and the next paragraph.

**cat**

Is what appears immediately above the word **cat** itself,
or is it just a *representation* of that word,
formed by a pattern of black and white pixels on your computer screen
(or ink stains on a sheet of paper, if you're reading a "hard copy"
version of this document)?
The point is that one could reasonably view each occurrence of the
character sequence `cat` (or any similar sequence that spells
some word) appearing on a page, or a computer screen, or a blackboard, etc.,
as simply a representation of the corresponding word.
**End of Digression.**

Few people would confuse the word **cat** with the type of animal to
which it refers, but many people routinely confuse **numerals**
with the **numbers** that they represent. For example, consider

**35024**

This is a five-digit numeral that represents the same number as is
represented by the phrase **thirty-five thousand twenty-four**
(which can also be considered to be a numeral!).
Just as words refer to (or represent) objects, actions, and various
other concepts, numerals refer to (or represent) numbers.
In our day-to-day lives, most of us rarely need to make such subtle
distinctions. But because computers store representations of concepts,
and manipulate those representations, a good understanding of computers
requires that you appreciate the difference between a thing and a
representation thereof.

Computers are capable of storing and processing data of many different kinds.
Among the most common types of data are **numeric**, **textual**
(composed of characters), **logical** (i.e., true and false values),
**visual** (i.e., images), and **audio** (i.e., sound).
Yet computers store all data in terms of only 0's and 1's!
(Or at least, that's the point of view taken by computer scientists.
The physical manifestation of those 0's and 1's (i.e., the means by which
the 0's and 1's are represented on whatever physical medium stores them)
is the concern of people who work at levels of abstraction closer to
physical reality, such as electronics engineers and physicists.)

How can so many different kinds of data all be expressed in terms of 0's
and 1's? The answer lies in **encoding schemes**!

Consider our familiar **decimal** (i.e., base 10) numeral system.
In a decimal numeral, the positions become increasingly **significant**
as we go from right to left.
We say that the rightmost digit is in the 1's column, its neighbor
to the left is in the 10's column, the next digit to the left is in
the 100's column, the next is in the 1000's column, etc., etc.
That is, the **weights**, or **place values**, of the columns are
the powers of 10 (i.e., 1 (or 10^{0}), 10 (or 10^{1}),
100 (or 10^{2}), 1000 (or 10^{3}), etc.).
Here is an illustration for the numeral 7326:

| column weights: | 1000 | 100 | 10 | 1 |
|---|---|---|---|---|
| sequence of (decimal) digits: | 7 | 3 | 2 | 6 |

This numeral means the same thing as

`7×1000 + 3×100 + 2×10 + 6×1`

That is, the 7, being in the 1000's column, represents
`7×1000`;
the 3, being in the 100's column, represents `3×100`;
the 2, being in the 10's column, represents `2×10`;
and the 6, being in the 1's column, represents `6×1`.

This system works quite nicely because *every* nonnegative integer
can be expressed as a sum of the form

`d_{k}×10^{k} + d_{k−1}×10^{k−1} + ... + d_{1}×10^{1} + d_{0}×10^{0}`

for some natural number k, where each d_{i} is one of the digits 0 through 9.

Moreover, if we ignore numerals with leading 0's, each natural number
has a *unique* representation of this form.

Why do we use ten as the base of our numeral system?
Is there something inherent about ten that makes it better than any
other choice? **No!**
Rather, anthropologists point to evidence that many ancient civilizations
adopted counting systems convenient for counting on the hands, which
have ten fingers.

We could, for example, just as well use eight as the base
(giving rise to the **octal** system) or 16 (giving rise to
the **hexadecimal** system) or any other integer greater than 1.
(There is such a thing as the base 1 (or unary) system, although
it is not entirely analogous.)

As an example, consider the octal (i.e., base 8) system.
In this system, numerals are formed from the (eight) digits
`0` through `7` and the column weights are the
powers of eight (1 = 8^{0}, 8 = 8^{1}, 64 = 8^{2},
512 = 8^{3}, etc.).
Take, for example, the octal numeral 5207:

| column weights: | 512 | 64 | 8 | 1 |
|---|---|---|---|---|
| sequence of (octal) digits: | 5 | 2 | 0 | 7 |

Analogous to the decimal numeral example above, we calculate (using
base 10 numerals!) that the number represented by the octal numeral 5207 is

`5×512 + 2×64 + 0×8 + 7×1`

which works out to 2695 (i.e., 5207_{(8)} = 2695_{(10)}).

Note that we place a (decimal numeral) subscript to the right of a numeral in order to indicate its base explicitly.
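
The column-weight arithmetic above is easy to mechanize. Here is a minimal Python sketch (the function name `numeral_to_int` is our own, not a standard one) that evaluates a digit string in any base up to ten:

```python
def numeral_to_int(digits: str, base: int) -> int:
    """Evaluate a numeral (given as a digit string) in the given base
    by summing digit × weight for each column, right to left."""
    value = 0
    weight = 1  # the rightmost column has weight base**0, i.e., 1
    for d in reversed(digits):
        value += int(d) * weight
        weight *= base  # each column to the left is `base` times heavier
    return value

print(numeral_to_int("5207", 8))   # the octal example from the text: 2695
print(numeral_to_int("7326", 10))  # the decimal example from the text: 7326
```

For what it's worth, Python's built-in `int("5207", 8)` performs the same conversion.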

For reasons having to do with the concerns of engineering (such as
reliability and cost), devices on which digital data are stored are built
in such a way that each atomic unit of memory/storage is a **switch**,
meaning that, at any moment in time, it is in one of two possible states.
By convention, we refer to these states as **0** and **1**, which,
of course, correspond to the two digits that are available in the
**binary** (or **base 2**) numeral system.
One might call each of these a **binary digit**,
from which we get the contraction **bit**.
As an example, take the binary numeral 10100110_{(2)}:

| column weights: | 128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 |
|---|---|---|---|---|---|---|---|---|
| sequence of (binary) digits: | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 |

Notice that the column weights are the powers of two. Analogous to the
examples above, we have that 10100110_{2} represents
the number corresponding to the sum (expressed in decimal numerals)

`1×128 + 0×64 + 1×32 + 0×16 + 0×8 + 1×4 + 1×2 + 0×1`

which comes out (in decimal) to 166.

In general, to translate a binary numeral into its decimal equivalent, do exactly as we did in arriving at 166 in the above example: simply add up the weights of the columns in which the binary numeral contains 1's.
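
That recipe can be written out as a short Python function (a sketch; the name `binary_to_decimal` is our own):

```python
def binary_to_decimal(bits: str) -> int:
    """Convert a binary numeral (a string of 0's and 1's) to an integer
    by adding up the weights of the columns that contain 1's."""
    total = 0
    weight = 1  # the rightmost column has weight 2**0, i.e., 1
    for b in reversed(bits):
        if b == "1":
            total += weight  # this column contributes its weight
        weight *= 2
    return total

print(binary_to_decimal("10100110"))  # the example from the text: 166
```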

Translating from decimal to binary is only a little more difficult.
Perhaps the most intuitively appealing approach is to
find the powers of two that sum up to the desired number.
We illustrate this with an example: Suppose that we want to express
the number 75 (here expressed in decimal notation, as usual)
in binary notation.
First find the largest power of two that is less than or equal to 75.
That would be 64 (or 2^{6}), because the next higher power of two
is 128, which is too big.
As 75 − 64 = 11, it remains to find powers of two that sum to 11.
Following the same technique as before, find the largest power of two
no greater than 11. That would be 8 (or 2^{3}).
As 11 − 8 = 3, it remains to find powers of two summing to 3.
The largest power of two no greater than 3 is 2 (or 2^{1}).
As 3 − 2 = 1, it remains to find powers of two summing to 1. The
largest power of two no greater than 1 is 1 (or 2^{0}).
As 1 − 1 = 0, we are done. What we have determined is that 75 can be
written as the sum of powers of two as follows:

`75 = 64 + 8 + 2 + 1`

which is to say that the binary representation of 75 has 1's in the 64's, 8's, 2's and 1's columns and 0's in every other column. Omitting leading 0's (in the columns with weights greater than 64), this yields

| column weights: | 64 | 32 | 16 | 8 | 4 | 2 | 1 |
|---|---|---|---|---|---|---|---|
| sequence of (binary) digits: | 1 | 0 | 0 | 1 | 0 | 1 | 1 |

That is, the binary numeral we seek is 1001011_{(2)}.
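
The repeated-subtraction method just illustrated can be sketched in Python (again, the function name is our own):

```python
def decimal_to_binary(n: int) -> str:
    """Express a positive integer in binary by repeatedly subtracting
    the largest power of two that still fits (the method in the text)."""
    # Find the largest power of two that is <= n.
    power = 1
    while power * 2 <= n:
        power *= 2
    bits = ""
    while power >= 1:
        if power <= n:
            bits += "1"
            n -= power  # this power of two is part of the sum
        else:
            bits += "0"
        power //= 2
    return bits

print(decimal_to_binary(75))  # the example from the text: 1001011
```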

Here is an example:

      ¹¹ ¹¹
       110110
     + 010110
     --------
      1001100

| column weights | 64 | 32 | 16 | 8 | 4 | 2 | 1 |
|---|---|---|---|---|---|---|---|
| carry | _{1} | _{1} | | _{1} | _{1} | | |
| addend #1 digits | | 1 | 1 | 0 | 1 | 1 | 0 |
| addend #2 digits | | 0 | 1 | 0 | 1 | 1 | 0 |
| sum | 1 | 0 | 0 | 1 | 1 | 0 | 0 |

Just as in decimal addition, we work from least significant digit towards most significant, or right-to-left. In the 1's column, we have 0+0 = 0, so we record a 0 in that column in the result, and we carry zero to the 2's column.

In the 2's column, we have 0+1+1 = 2 (the zero corresponding to the
incoming carry). But 2_{(10)} = 10_{(2)}. Hence, we
record the 0 in the result and carry a 1. (This is analogous to,
in decimal, having a column with, say, 8 and 6 in it, which yields 14,
so we record the 4 and carry the 1.)

In the 4's column, we have 1+1+1=3, or 11_{(2)}. Hence, we
record the 1 in the result and carry a 1.

We leave it to the reader to make sense of what happened in the 8's and 16's columns.

In the 32's column, we have 1+1+0, which yields 10_{(2)},
so we record 0 and carry 1 to the next column. As the 64's column does
not exist in the two addends, implicitly the bits there are both 0.
Hence, in the 64's column we have 1+0+0 = 1 = 01_{(2)},
so we record the 1 and carry a 0.
Obviously, all remaining columns to the left will have 0's, so we are done.
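
The column-by-column procedure described above can be sketched in Python (a toy illustration of the paper-and-pencil method, not of how hardware actually adds):

```python
def add_binary(a: str, b: str) -> str:
    """Add two binary numerals column by column, right to left,
    recording a sum bit and carrying into the next column."""
    width = max(len(a), len(b))
    a = a.zfill(width)  # pad with leading 0's so the columns line up
    b = b.zfill(width)
    carry = 0
    result = ""
    for i in range(width - 1, -1, -1):
        column_sum = int(a[i]) + int(b[i]) + carry
        result = str(column_sum % 2) + result  # bit recorded in this column
        carry = column_sum // 2                # carried into the next column
    if carry:
        result = "1" + result  # a final carry creates one more column
    return result

print(add_binary("110110", "010110"))  # the example from the text: 1001100
```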

So far, our discussion has included only natural numbers, i.e., nonnegative integers. Obviously, we would like to be able to encode (and perform arithmetic upon) negative integers, too.

Our standard way of writing a decimal numeral representing a negative number
is to place a minus sign in front of its digits. For example, we read
−53 as "negative fifty-three".
We typically write positive fifty-three as 53, with no sign,
but if we want to emphasize that it is positive, we could write it
as +53.
The point is that every decimal numeral begins,
either implicitly or explicitly, with a symbol indicating its sign,
which is followed by a sequence of digits that represent its magnitude
(i.e., a "distance" from zero). We could reasonably call this the
**sign-magnitude** representation scheme.

As there are two signs, + and −, a very natural way to
incorporate the notion of a sign in a binary numeral is to use a single bit
to encode it. For example, we could encode + by 0 and −
by 1. If we further decide to place the sign first (i.e., use the first bit
to encode the sign), then, for example, the binary numeral `110110`
would represent `-22`. (The 1 in the first bit indicates that
the number is negative; the other three 1's are in the 16, 4, and 2 columns,
and so yield a magnitude of 22.)

The sign-magnitude approach may be the most natural for humans, but it
turns out that an alternative scheme, called **two's complement**,
is what most computers use. Under this scheme, the weight (or place value)
of the most significant bit is negative. For example, suppose we have an
8-bit numeral. Then the column weights are as usual, except that the
weight associated to the leftmost column is `-(2^{7})`
rather than `2^{7}`. That is, the column weights are
−128, 64, 32, 16, 8, 4, 2, and 1.

In the case of our unsigned binary numeral encoding scheme (the first one
discussed above), the range of integer values that can be represented
goes from 0 (using the bit string `00000000` of eight zero's) up to
255 (using the bit string `11111111` of eight one's).

With the two's complement scheme, the range goes from
−128 (using the bit string `10000000`) to
+127 (using `01111111`).

Using the sign-magnitude approach, the range goes from
−127 (using `11111111`) to
+127 (using `01111111`).
It's interesting that this range has only 255 distinct values in it,
rather than 256. The reason? Because zero has two different
representations, `00000000` (i.e., +0) and `10000000`
(i.e., −0)!
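
To make the three encoding schemes concrete, here is a Python sketch (the function names are our own) that interprets a bit string under each scheme:

```python
def as_unsigned(bits):
    """Unsigned: each column's weight is a nonnegative power of two."""
    return sum(2**i for i, b in enumerate(reversed(bits)) if b == "1")

def as_sign_magnitude(bits):
    """Sign-magnitude: the first bit encodes the sign (1 means negative),
    and the remaining bits encode the magnitude."""
    magnitude = as_unsigned(bits[1:])
    return -magnitude if bits[0] == "1" else magnitude

def as_twos_complement(bits):
    """Two's complement: the leftmost column's weight is negative."""
    value = as_unsigned(bits[1:])
    if bits[0] == "1":
        value -= 2 ** (len(bits) - 1)  # e.g., -(2**7) for an 8-bit numeral
    return value

print(as_unsigned("11111111"))         # 255
print(as_sign_magnitude("11111111"))   # -127
print(as_twos_complement("10000000"))  # -128
print(as_twos_complement("01111111"))  # 127
```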

The larger point being made here is that, regardless of how many bits are
chosen as being the "standard size" for representing integers (or any other
type of data), the set of values that is encodable inside any
fixed-length chunk of storage is finite.
Hence, if the (accurate) result of some particular computation is outside
this set, the result that actually gets stored will be in error.
For example, if we are working in the realm of 8-bit numerals represented
using the 2's complement scheme and we try to add 95
(`01011111_{(2)}`) and 67 (`01000011_{(2)}`), we have a problem:
the true sum, 162, lies outside the representable range −128..127.
Indeed, adding the two bit strings yields `10100010`, which, under
the two's complement scheme, represents −94!

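As an illustration, here is a Python sketch (the function name is our own) of 8-bit two's complement addition, which keeps only the low eight bits of a sum and then reinterprets the resulting bit pattern:

```python
def add_8bit_twos_complement(x, y):
    """Add two integers, keep only 8 bits of the result, and reinterpret
    the 8-bit pattern under the two's complement scheme."""
    raw = (x + y) & 0xFF  # keep only the low 8 bits, as 8-bit storage would
    return raw - 256 if raw >= 128 else raw  # leftmost bit has weight -128

print(add_8bit_twos_complement(95, 67))  # overflows: prints -94, not 162
```
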
The main point to remember is that the results produced by computations involving real numbers (stored in fixed-length chunks of memory) are (generally speaking) only approximations and should not be interpreted as providing exact answers.
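
A tiny Python demonstration of the point (one tenth has no finite binary representation, so its stored value is only an approximation):

```python
# 1/10 cannot be represented exactly in binary, so the stored value of
# 0.1 is only an approximation; tiny errors show up in plain arithmetic.
total = 0.1 + 0.2
print(total == 0.3)  # False
print(total)         # 0.30000000000000004
```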

A discussion omitted for now, except to point out that, among several
standards that exist, the one most widely used is probably
ASCII
(American Standard Code for Information Interchange). The ASCII
standard simply assigns to each of 128 distinct characters a
distinct code in the form of a bit string of length seven.
(Note that 2^{7} is 128, not accidentally.)
Among the 128 characters found in ASCII are those you would expect:
upper and lower case (Roman) letters (52 of them), the ten digits
(i.e., 0,1,2,...9),
several punctuation characters (period, comma, semicolon, etc.),
and several special characters (e.g., parentheses, ampersand, asterisk,
dollar sign, etc.). Also included are about thirty "characters" that
are not characters as most people would think of them; rather, they
are intended to be used as codes for computers or other devices
(e.g., printers) that deal with textual data. An example is the
"carriage return" character, which is used to signal a printing
device that it should move to the beginning of the line before
continuing.
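
Most programming languages expose these codes directly. In Python, for instance, `ord` and `chr` map between a character and its numeric code:

```python
# ord() gives a character's code; chr() goes the other way.
print(ord("A"))                 # 65
print(chr(65))                  # A
print(format(ord("A"), "07b"))  # the 7-bit ASCII code: 1000001
```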

Extended ASCII
extends regular ASCII by using an eighth bit, thereby resulting in a
coding scheme for 256 (2^{8}) different characters.

In recent years, in an attempt to create a character encoding standard
that acknowledges the existence of the non-English-speaking world
by including characters from the many other alphabets in use around
the world (e.g., Hebrew, Greek, Russian, etc.),
the Unicode standard
has been introduced. Due to the large number of characters it seeks
to include, Unicode specifies a 16-bit code for each character.
This gives it the capability of accommodating 2^{16} (65536)
different characters!
(This is actually an over-simplification, but one that suffices for
our purposes.)

A digital image
can be viewed as a (typically, rectangular) grid of dots,
or **pixels**. ("Pixel" is a contraction for "picture element".)

Resolution
is a measure of how much detail an image holds, but exactly what it means
depends upon context.
**Pixel resolution** describes the size of an image in terms of
its width (number of columns) and height (number of rows).
For example, 1024 × 768 is a common resolution for computer
monitors, which is to say that such monitors have 1024 columns and
768 rows of pixels.
**Spatial resolution** describes how densely packed the pixels are,
and is usually expressed in terms of **pixels per inch** (ppi)
(or **dots per inch** (dpi)).
(To use such a measure, rather than pixels per square inch, would seem
to imply that the density is the same along the rows and along the columns.)
It is this quality that, practically speaking, determines the clarity
of an image. In 2010, computer monitors typically had a spatial
resolution of between 72 and 100 ppi.

In a binary image
(also called a **black-and-white** or **bi-level** image),
each pixel is either black or white. Some devices, including fax machines
and some laser printers, can handle only bi-level images.
As each pixel's appearance can be characterized by one of only two
possible values (black or white), the obvious way to represent a single
pixel is with a single bit, where 0 represents black and 1 represents
white (or vice versa). (Recall the image of the tiger shown in class.)

In a grayscale image, each pixel is of some shade of gray ranging from the darkest, black, to the lightest, white. Hence, a black-and-white image (as discussed immediately above) is just a special case of a grayscale image in which there are only two shades of gray, black and white. However, when one talks of a grayscale image, by implication one usually means an image in which there are more possible shades. Some early computer monitors were capable of displaying any of sixteen shades of gray, for example.

What are commonly referred to as black-and-white photographs are really grayscale images. In such photographs, it is typical for there to be any of 256 possible shades of gray. In some applications, including medical imaging (where it is important for the image to be very detailed and precise), the number of possible shades of gray exceeds one thousand (1024, say, or 4096).

It's no accident that the numbers of possible shades of gray in the
examples above are powers of two! Note that 16 = 2^{4},
256 = 2^{8}, and 1024 = 2^{10}. Hence, in an image
in which each pixel can be any of 16 shades of gray, the obvious
way to represent each pixel is using 4 bits (i.e., a half-byte).
Interpreted as an unsigned integer, a bit string of length four
represents an integer value in the range 0..15. The standard approach
is for 0 to represent black (the darkest shade) and for 15 to represent
white (the lightest shade), with the numbers in between representing
increasingly lighter shades, as we go from 1 to 14.
In an analogous fashion, each pixel in an image allowing any of 256
shades would be represented by a bit string of length eight (i.e., a
byte) representing an integer in the range 0..255.
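
The relationship between the number of shades and the number of bits per pixel is just a base-two logarithm; here is a small Python sketch (the helper name is our own):

```python
import math

def bits_per_pixel(num_shades):
    """Smallest number of bits that gives every shade a distinct code."""
    return math.ceil(math.log2(num_shades))

for shades in (2, 16, 256, 1024, 4096):
    print(shades, "shades ->", bits_per_pixel(shades), "bits per pixel")
```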

In color images, each pixel has a color. Following the RGB color model, in which red, green, and blue are the primary colors, each pixel's appearance can be described by an RGB triple that describes the intensities of red, green, and blue, respectively, present in that pixel. One standard representation, called truecolor, uses 24 bits to store the RGB value of each pixel, eight bits for each of the three components (which, of course, are viewed as integers in the range 0..255). Each cell in the table below is labeled with the RGB value of its background color.

| 255,0,0 | 255,127,0 | 255,255,0 | 255,127,127 | 255,255,127 | 255,0,127 |
|---|---|---|---|---|---|
| 0,255,0 | 127,255,0 | 255,0,255 | 127,255,127 | 32,32,32 | 127,127,127 |
| 0,0,255 | 127,0,255 | 127,127,255 | 0,127,127 | 0,0,127 | 255,255,255 |

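
A truecolor pixel's three components can be packed into a single 24-bit integer with bit shifts; here is a Python sketch (the function names are our own):

```python
def pack_rgb(r, g, b):
    """Pack an RGB triple (each component 0..255) into one 24-bit value:
    eight bits for red, then eight for green, then eight for blue."""
    return (r << 16) | (g << 8) | b

def unpack_rgb(value):
    """Recover the three 8-bit components from a 24-bit value."""
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF

packed = pack_rgb(255, 127, 0)
print(packed)              # 16744192
print(unpack_rgb(packed))  # (255, 127, 0)
```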

So far we've talked about how individual pixels are represented.
What about an image as a whole?
Remember, an image is just a two-dimensional grid of pixels, or
rows and columns of pixels. To encode an image as a whole, we
can "linearize" the two-dimensional grid into a sequence of pixels
by, for example, starting with the first row of pixels, then moving
to the second, and then to the third, etc.
For example, consider the 5 × 5 table below, which is supposed
to illustrate an image with five rows and five columns of pixels,
with 0 representing black and 1 representing white.
(The image forms a somewhat crude upper case **N**.)

| 0 | 1 | 1 | 1 | 0 |
|---|---|---|---|---|
| 0 | 0 | 1 | 1 | 0 |
| 0 | 1 | 0 | 1 | 0 |
| 0 | 1 | 1 | 0 | 0 |
| 0 | 1 | 1 | 1 | 0 |

Linearizing this image row by row yields

    0111000110010100110001110
        ^    ^    ^    ^    ^

(The caret symbols indicate the last bit of the representation of each row of pixels.)
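
The row-by-row linearization can be sketched in Python (the grid below is the crude N, under the assumption that 0 denotes a black pixel and 1 a white one):

```python
def linearize(grid):
    """Concatenate the rows of a grid of pixel bits, top to bottom,
    into one bit string."""
    return "".join("".join(str(bit) for bit in row) for row in grid)

# The crude upper-case N from the text (0 = black pixel, 1 = white pixel).
n_image = [
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
]
print(linearize(n_image))  # 0111000110010100110001110
```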

A compression technique is said to be **lossless** if it can be
reversed, meaning that data compressed using that technique can be
decompressed to recover the original representation.
A compression technique is said to be **lossy** if, in general,
it cannot be reversed, which is to say that decompression will
yield something close to the original representation, but (probably)
not matching it exactly.
Because the human vision system has only a certain degree
of sensitivity, and hence cannot distinguish two images that differ
only in subtle ways, most compression techniques that are used for
digital images are lossy. The same is true for representations of
audio (e.g., music). In contrast, to use lossy compression on
numeric or textual data could be disastrous, because, for most applications,
it is imperative that that kind of data be recoverable in exact form.
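
To make the idea of lossless compression concrete, here is a Python sketch of run-length encoding, a simple textbook technique that stores each maximal run of repeated bits as a (bit, length) pair. It is illustrative only; it is not the method used by JPEG, TIFF, or GIF:

```python
def rle_compress(bits):
    """Run-length encode a bit string as a list of (bit, run length)
    pairs. Nothing is thrown away, so the encoding is lossless."""
    runs = []
    i = 0
    while i < len(bits):
        j = i
        while j < len(bits) and bits[j] == bits[i]:
            j += 1  # extend the run while the bit repeats
        runs.append((bits[i], j - i))
        i = j
    return runs

def rle_decompress(runs):
    """Reverse the compression exactly, recovering the original string."""
    return "".join(bit * count for bit, count in runs)

original = "0111000110010100110001110"
compressed = rle_compress(original)
print(compressed)
print(rle_decompress(compressed) == original)  # True: lossless round trip
```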

Different compression techniques have led to the existence of several image file formats that are in common use, some of which you have probably heard of, including JPEG, TIFF, and GIF. Each one has its strengths and weaknesses. Digital images include photographs, cartoons, diagrams, and other varieties. Some image file formats are better for one kind of image than another.

Omitted for now.