CMPS 134: Notes supplementing Reges/Stepp Chapter 1

1.1 Basic Computing Concepts

What is computer science? The name would suggest that it is the study of computers, but that's not quite right. As Edsger Dijkstra (1972 ACM Turing Award winner) put it,

Computer science is no more about computers than astronomy is about telescopes.

The point is that, just as telescopes serve as a tool for studying the nature of the cosmos (which is the subject of astronomy), so computers serve as a tool for studying the nature of computation. So perhaps computation science would be a better term.

Various definitions of computer science have been offered, including Wikipedia's:

The study of theoretical foundations of information and computation and their implementation and application in computer systems.

As with other academic disciplines, computer science has a number of sub-disciplines, including graphics, artificial intelligence/machine learning, databases, and networking.

At the center of the discipline, according to Donald E. Knuth (1974 ACM Turing Award winner), is algorithmic thinking, which is the kind of thinking needed for developing algorithms. An algorithm is a precisely stated list of instructions for carrying out some task (typically for the purpose of solving some problem).

Reges and Stepp argue that a good reason for studying programming is that it is the best way to come to an understanding of algorithmic thinking.

Examples of algorithms include a recipe for making apple pie, instructions for assembling a piece of exercise equipment, for tying a Windsor Knot (for a necktie), for putting on a pair of pants, or for walking from the parking pavilion to McGuirrin Hall.

Algorithms such as these are written in a natural language (e.g., English) and can be supplemented with pictures, videos, or diagrams. They are intended to be read/viewed, interpreted, and executed by human beings.

In contrast, the kind of algorithm that we will be focused on in this course, which is referred to as a computer program, is intended to be interpreted and executed by a computer, which, being a machine, is extremely stupid. (An algorithm that can be carried out by a machine, with no human intervention, is said to be effective.) Indeed, unlike humans (who have common sense and intelligence, and thus are often able to figure out instructions that are ambiguous, unclear, misleading, or even contradictory), a computer follows instructions quite blindly/literally, just as does a kitchen appliance, an automobile, or a chain saw. We will find that, as a consequence of this, developing computer programs that work exactly as intended tends to be more difficult than writing instructions for making an apple pie.

A computer program, then, is an organized set of instructions that is intended to be interpreted and executed by a computer.

Knuth, recalling the cliche

You don't really understand how to do something until you've taught someone else how to do it.

noted that it would be more accurate to say

You don't really understand how to do something until you've "taught" a computer how to do it (meaning that you've written an algorithm by which to do it).

Having tried above to provide a sense of what computer science is, Reges and Stepp next try to address the question What is a computer?

Depending upon how broadly the term is defined, lots of devices can be classified as computers, including cell phones, calculators, GPS systems, and even some children's toys, because all of them carry out instructions that process data.

A universal computer is one that is programmable in one or more languages in which any effective algorithm can be expressed. In the 1930s, several mathematicians described different kinds of "theoretical" computing devices (including the Turing Machine), all of which turned out to be equally "powerful", meaning (essentially) that any algorithm expressible/executable on any one of them was also expressible/executable on any of the others. Because no one has yet described a machine with greater capability, any device with computing power equal to a Turing Machine is said to be universal. Today's PCs and laptop computers are universal, and these are the kinds of computers that we will use in this course.

Hardware and Software: A computer system includes various physical devices collectively referred to as hardware. These devices serve different purposes, but all contribute to the goal of executing computer programs, which are referred to collectively as software. In contrast to hardware, which is physically tangible, software is not. One can draw an analogy to, say, an apple pie recipe. The paper and ink used in rendering the recipe's instructions have physical existence, as does the person who carries out the recipe, but the essence of the recipe is the information it conveys, which is intangible, and not the medium (paper, ink) on which it is stored. Similarly, software is physically encoded on a storage device (e.g., hard disk or flash drive) and is executed by a processor, but its essence is found in the (intangible) instructions of which it is composed.

Among the hardware devices comprising a computer system are

central processing unit (CPU): it executes instructions
(primary) memory (RAM): where data and programs are stored while they are in use (short term)
(secondary) storage (e.g., magnetic disk (hard disk), optical disk (CD, DVD), flash storage, magnetic tape): where data and programs are stored (long term)
input devices: keyboard, mouse, microphone, game controller
output devices: monitor (screen), printer, speaker, actuator (e.g., for robot control)

Each "piece" of software is typically classified as being in one of two categories: system or application. The former refers to software that provides the foundation making it possible for the latter to run.

System software includes utilities, such as the File Explorer and the various programs accessible via the Control Panel (in Windows). It also includes the operating system (e.g., Microsoft Windows 10, Mac OS X, Linux), which is the program that manages the entire operation of the machine.

Application software performs tasks of direct interest to users (e.g., game playing, word processing, web browsing, producing payroll reports).

The Digital Realm: In a digital computer, all data is stored in the form of digits. In particular, modern digital computers, for reasons having to do with engineering concerns (e.g., cost and reliability), encode data using (physical manifestations of) binary digits, or bits. A binary digit can be either a 0 or a 1. (Analogously, a decimal digit is any of 0, 1, 2, ..., or 9.) That is, we can think of all data as being encoded by sequences of 0's and 1's. For example, the character A is encoded as 01000001 under the widely used ASCII character-encoding standard. And the integer 53 is encoded as 00110101 using standard base-2 representation.
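As a small illustration of these encodings, the following sketch uses the library method Integer.toBinaryString to display the two bit patterns mentioned above. The class name BitPatterns and the helper method toEightBits are made up for this example; they are not from the textbook.

```java
public class BitPatterns {

   // Returns the binary representation of a value in the range 0..255,
   // padded with leading zeros so that it is exactly eight bits long.
   static String toEightBits(int value) {
      String bits = Integer.toBinaryString(value);
      while (bits.length() < 8) {
         bits = "0" + bits;
      }
      return bits;
   }

   public static void main(String[] args) {
      System.out.println(toEightBits('A'));  // the character A has ASCII code 65
      System.out.println(toEightBits(53));
   }
}
```

When run, this should display 01000001 followed by 00110101, matching the encodings given above.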

Because RAM is typically organized as groups of bits of length eight, called bytes, we use the byte as a unit of measure of storage usage/capacity. To refer to quantities in the thousands, millions, billions, etc., of bytes, we use the prefixes kilo (2¹⁰, or 1024), mega (2²⁰, or about a million), giga (2³⁰, or about a billion), tera (2⁴⁰, or about a trillion), and peta (2⁵⁰, or about a quadrillion). (See Table 1.2, page 6.)
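To see how close these powers of two are to the round decimal numbers, here is a short sketch that computes each one. The class name ByteUnits and the method powerOfTwo are hypothetical names chosen for this illustration.

```java
public class ByteUnits {

   // Returns 2 raised to the given power by repeated doubling.
   // (A long can hold the result for exponents 0 through 62.)
   static long powerOfTwo(int exponent) {
      long result = 1;
      for (int i = 0; i < exponent; i++) {
         result = result * 2;
      }
      return result;
   }

   public static void main(String[] args) {
      System.out.println("kilo: " + powerOfTwo(10));  // 1024
      System.out.println("mega: " + powerOfTwo(20));  // 1048576
      System.out.println("giga: " + powerOfTwo(30));  // 1073741824
      System.out.println("tera: " + powerOfTwo(40));  // 1099511627776
      System.out.println("peta: " + powerOfTwo(50));  // 1125899906842624
   }
}
```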

Programs and Programming: A (computer) program is an organized set of instructions, written in a particular programming language, that can be executed by a digital computer.

What is meant by "programming language"?: A language has syntax and semantics.

A language's syntax is the set of rules that govern whether a given expression is valid (i.e., of correct form).
A language's semantics determines the meaning of each valid expression.

English's syntax, for example, dictates that none of the following are valid sentences:

Cat big fluffy. (lacks a verb)
Ran loose jump. (lacks a subject)
He own a cat. (singular subject, plural verb)
Mary hit he. (direct object is in nominative case rather than objective case)

As for semantics, consider the sentence

Mary threw the round ball.

The meaning of a sentence depends upon the words that appear in it, what those words refer to, the order in which those words occur, and, more generally, the structure of the sentence. Here, two entities are involved, referred to by "Mary" and "ball", and one action, referred to by "threw". Also, one quality/characteristic is referred to by "round".

Who did the throwing? Mary, not the ball.
What was thrown? The ball, not Mary.
When did it happen? In the past, according to the verb's tense.
What is round in shape? The ball, not Mary.

The good news is that the syntax of a typical programming language is much simpler than the syntax of English (or other natural languages). Hence, learning the syntax of Java (the programming language that we will be using in this course) is easier than learning the syntax of English. (On the other hand, people spend years learning English while you have only a couple of months to learn Java!)

The bad news is that a Java program will not work (indeed, the computer will "refuse" to execute it) if it includes even one syntax error! Computers are very intolerant of such errors. This is probably in contrast to the English teachers you had in high school, who may have penalized you for such errors but did not reject your work outright.

As for semantics, it would be reasonable to identify a program's "meaning" with the input-to-output mapping that all its possible executions produce.

1.2 Java

Why do we use Java in this course?: During the past twenty or so years, Java has become a very popular programming language, not only in the software industry but also in introductory programming courses. In particular, it is object-oriented, which is currently in favor. (Later in the course, that term will become meaningful.)

As a first example of a Java program, we give you the following:

public class HelloWorld {

   public static void main(String[] args) {
      System.out.println("Hello, World!");
   }

}

This program, when executed, simply displays the message Hello, World!, as directed by the System.out.println statement. Indeed, the lines that "introduce" the class HelloWorld and the method called main don't "do anything" except to provide structural information.

Hierarchical structure of a Java program: The units of a Java program can be described, loosely, as follows:

Every Java program is composed of one or more classes, each of which is stored in a file whose name matches the class's name (and has a .java extension). Hence, the class shown above would properly be stored in a file called HelloWorld.java.
Every class includes zero or more methods, each of which can be "called" (or "invoked"). Our HelloWorld program has one method, the name of which is main.
Every method includes zero or more statements, each of which directs the computer to take some particular action or to carry out some operation. In HelloWorld, the lone statement in the lone method is System.out.println(...), which causes a string of characters to be displayed.

One can make an obvious analogy with a book, which is composed of chapters, each of which is composed of paragraphs, each of which is composed of sentences, etc.

In addition to the rules above, at least one class in a Java application must include a method called main whose heading closely matches the one in the HelloWorld class. It is this method that serves as a program's "entry point", meaning the place where execution begins. (Even better, you can think of program execution as beginning with a call to the main method.)
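The following sketch illustrates the idea of an entry point. The class EntryPointDemo and its method sayGoodbye are invented for this example. Note that sayGoodbye appears first in the file, yet execution still begins with main.

```java
public class EntryPointDemo {

   // Appears first in the file, but runs only when it is called.
   static void sayGoodbye() {
      System.out.println("Goodbye!");
   }

   // Execution begins here, even though main is not the first method.
   public static void main(String[] args) {
      System.out.println("Hello!");
      sayGoodbye();
   }
}
```

Running this class should display Hello! on one line and Goodbye! on the next, which suggests that the order in which methods appear within a class does not determine the order in which they are executed.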

Syntactically, a class begins with a header, followed by its body, enclosed between curly braces. The same is true for methods.

For the moment, don't worry about the meaning of keywords such as public, static, or void. In due time, these will come to make sense to you.

Formatting: Notice how the code in HelloWorld is formatted. Our intent is to make a class's code as easy to read (for a human) as possible by providing visual cues (e.g., indentation, similar to a traditional outline) that suggest the structure of the class. Different programmers have different preferences regarding formatting. The HelloWorld class above is formatted in the style of Kernighan & Ritchie. Another popular style would have it look like this:

public class HelloWorld
{
   public static void main(String[] args)
   {
      System.out.println("Hello, World!");
   }
}

Here, the curly braces that begin and end the body of a class or method are lined up in the same column. Notice, however, that the heading of a method is still indented with respect to the class's heading, and each statement in a method is indented with respect to the method's heading. As in a traditional outline, indentation is used to show, for each component, within which other component it is nested.

Carefully formatting code is strictly for ease of reading (by humans). From the point of view of the Java compiler, the following version of HelloWorld is identical to those above:

   public class HelloWorld { public    static void main(String[]
 args) { System.out.println
 ("Hello, World!"); }
      }

Indeed, we could put the entire program on one line, if we wished.

Comments: Aside from formatting, another technique that programmers use to help a reader to understand code more easily is to include comments. Indeed, comments are included in code to provide such information as its author(s), the date(s) on which it was written/modified, and known flaws, as well as both to document its intended behavior and, in some cases, to explain how that behavior is achieved.

The Java compiler "ignores" comments, which is to say that a program's behavior does not change as a result of adding or removing comments.

In Java, comments come in two varieties. A "rest-of-line" comment begins with two slashes (//) and extends to the end of the line. The other kind begins with /* and extends to the next occurrence of */, which may be on the same line or any number of lines later. Here is an augmented version of HelloWorld with comments:

/* Author: J. Programmer
** Date Written: February 31, 1995
** Purpose: Displays the message "Hello, World!"
*/
public class HelloWorld {
   public static void main(String[] args) {
      System.out.println("Hello, World!");  // displays the message
   }
}

Identifiers and Keywords: In the Java language, certain words have specific meanings and are reserved to be used only to convey that meaning in particular contexts. These are called keywords (or, in some languages, reserved words). Examples include public, class, and void. (See the HelloWorld program for instances of these keywords in context.) (A list of (all?) 50 Java keywords appears on page 20 of the textbook.)

Other words appearing in a program are introduced by a programmer for the purpose of naming things, such as a class, a method, or a variable. Such words are called identifiers. (jGRASP, the IDE used by the instructor, shows keywords in purple and identifiers in black.)

By Java's syntax rules, an identifier must be composed of letters, underscores ( _ ), dollar signs ($), and decimal digits (i.e., any of 0, 1, ..., 9), but cannot begin with a digit. Unlike some programming languages, Java is case sensitive, meaning, for example, that it recognizes myDog and MydOg as being distinct identifiers because, even though the letters match, not all of them agree in case.
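A brief sketch of these rules follows. The class name IdentifierDemo, the method name, and the variable _count$2 are made up purely to exercise the rules; real programs would use more meaningful names.

```java
public class IdentifierDemo {

   // Declares variables whose names differ only in case, showing that
   // Java treats them as separate variables, and returns their sum.
   static int sumOfDistinctVariables() {
      int myDog = 3;
      int MydOg = 4;       // a different variable from myDog
      int _count$2 = 5;    // legal: underscore, letters, dollar sign, digit
      // int 2fast = 6;    // illegal: an identifier cannot begin with a digit
      return myDog + MydOg + _count$2;
   }

   public static void main(String[] args) {
      System.out.println(sumOfDistinctVariables());  // displays 12
   }
}
```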

Certain conventions (i.e., common practices that are intended to heighten uniformity and thereby to aid understanding) exist regarding the form of identifiers. Specifically, classes are usually named by a sequence of words, each of which has only its first letter capitalized. Identifiers for methods (and variables) look similar, except that the first word is not capitalized. This is sometimes called "camel case" (because the outline of an identifier looks like a camel, where the upper case letters are the humps).

Examples: HelloWorld is an identifier naming a class, while doSomethingStupid is an identifier naming a method.
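Here is how these naming conventions might look in a complete class. All of the names (DogWalker, countLegs, numberOfDogs, legsPerDog) are hypothetical and were chosen only to show the capitalization patterns.

```java
public class DogWalker {                       // class: every word capitalized

   static int countLegs(int numberOfDogs) {    // method, parameter: first word lower case
      int legsPerDog = 4;                      // variable: first word lower case
      return numberOfDogs * legsPerDog;
   }

   public static void main(String[] args) {
      System.out.println(countLegs(3));        // displays 12
   }
}
```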

Exploring the println and print statements: The reader is encouraged to take the HelloWorld program, add one or two new println statements (to print other silly messages), and run the program. (Note: Strictly speaking, println statements are calls/invocations of the println() method of the object named out, which belongs to the library class System.) Observe that each invocation displays a message on a different line.

Now change the first println statement to instead invoke print, and run the program. Notice that the two messages appear on the same line.
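One possible version of such an experiment is sketched below. The class name PrintVersusPrintln and the messages are arbitrary choices for this illustration.

```java
public class PrintVersusPrintln {

   public static void main(String[] args) {
      System.out.print("Hello, ");     // the next output continues on this line
      System.out.println("World!");    // completes the first line
      System.out.println("Goodbye!");  // appears on a second line
   }
}
```

The output should consist of two lines, Hello, World! and Goodbye!, because print does not advance to a new line after displaying its message, whereas println does.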

1.3 Program Errors

Computer programs are notorious for being erroneous. Errors come in a few varieties:

syntax: analogous to a grammatical error in English.
logic: the program doesn't perform as expected; gives incorrect results.
runtime: program execution aborts because some specified operation cannot be carried out. Potential causes are a logic error (e.g., one that leads to an attempt to divide by zero) or invalid input data.
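The last two kinds of errors can be illustrated with a short sketch. The class ErrorKinds and its methods are invented for this example. It uses features (such as parameters and arithmetic) that are covered later in the course, so treat it as a preview rather than something to master now.

```java
public class ErrorKinds {

   // Intended to compute the average of a and b, but contains a logic error:
   // division is performed before addition, so it computes a + (b / 2).
   static double buggyAverage(double a, double b) {
      return a + b / 2;
   }

   // The corrected version, with parentheses forcing the addition first.
   static double correctAverage(double a, double b) {
      return (a + b) / 2;
   }

   // Integer division by zero cannot be carried out, so Java reports a
   // runtime error (an ArithmeticException) when denominator is 0.
   static int divide(int numerator, int denominator) {
      return numerator / denominator;
   }

   public static void main(String[] args) {
      System.out.println(buggyAverage(4, 6));    // 7.0 rather than the intended 5.0
      System.out.println(correctAverage(4, 6));  // 5.0
      System.out.println(divide(10, 0));         // execution aborts here
   }
}
```

Notice that the logic error produces no complaint from either the compiler or the running program; the program simply gives a wrong answer.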

The textbook (pp. 25-27, 29) illustrates several different kinds of syntax errors (specifically, a file name that does not match the class name, a misspelled identifier (pruntln, system), a missing semicolon, an omitted keyword (e.g., class, void), and an unterminated comment). Note that, although the compiler supplies a message for each error that it finds in a program, the messages are not always particularly helpful (especially to a beginner) in figuring out exactly what is wrong. Also, correcting the first reported error and then recompiling is usually the best strategy, because subsequent error messages are often due to the compiler getting "confused" by the first error.
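To experiment with a few of these errors, one could start from a sketch such as the following, which is not taken from the textbook. The class name SyntaxErrorExamples is made up. The erroneous statements are commented out so that the program compiles as given; removing the // from any one of them should cause the compiler to report an error.

```java
public class SyntaxErrorExamples {

   public static void main(String[] args) {
      System.out.println("This statement compiles and runs.");

      // Each commented-out statement below contains exactly one error.
      // System.out.pruntln("misspelled method name");
      // system.out.println("misspelled class name");
      // System.out.println("missing semicolon")
   }
}
```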

To witness a runtime error, the reader is encouraged to download and execute the MeanProgram application and to supply invalid (i.e., non-numeric) input.
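The MeanProgram application itself is not reproduced here. As a stand-in, the following sketch (with the hypothetical names InvalidInputDemo and toNumber) shows the same kind of failure using text that is written directly into the program rather than typed by a user. It relies on the library method Integer.parseInt, which throws a NumberFormatException when given text that is not a valid integer.

```java
public class InvalidInputDemo {

   // Converts text to the integer that it represents.
   // Integer.parseInt fails at runtime if the text is not numeric.
   static int toNumber(String text) {
      return Integer.parseInt(text);
   }

   public static void main(String[] args) {
      System.out.println(toNumber("42"));     // displays 42
      System.out.println(toNumber("forty"));  // runtime error: execution aborts here
   }
}
```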

1.4 Procedural Decomposition

Decomposition refers to the act of separating something into discernible parts, each of which is simpler than the whole.

With respect to problem-solving, decomposition is a technique by which a problem is split into smaller sub-problems in such a way that, by solving each of those sub-problems, we obtain a solution to the problem itself. This technique is known as Divide-and-Conquer.

A particular form of this problem-solving strategy that is useful in programming is procedural decomposition, which splits a complex task into a set of simpler sub-tasks. That is, it expresses a complex action as a sequence of simpler actions.

As an example, consider the task of driving from location A to location B. Suppose that you can drive from A to B by first driving from A to C, then from C to D, then from D to E, and, finally, from E to B. Then you have described the original task as a sequence of four sub-tasks.
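In Java, this decomposition might be sketched as follows, with one method per leg of the trip. The class and method names are hypothetical, and each sub-task is reduced to displaying a message.

```java
public class DrivingDirections {

   static void driveFromAToC() {
      System.out.println("Drive from A to C");
   }

   static void driveFromCToD() {
      System.out.println("Drive from C to D");
   }

   static void driveFromDToE() {
      System.out.println("Drive from D to E");
   }

   static void driveFromEToB() {
      System.out.println("Drive from E to B");
   }

   // The original task, expressed as a sequence of four simpler sub-tasks.
   static void driveFromAToB() {
      driveFromAToC();
      driveFromCToD();
      driveFromDToE();
      driveFromEToB();
   }

   public static void main(String[] args) {
      driveFromAToB();
   }
}
```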

The example described by Reges & Stepp is the task of baking a cake. Unlike our "driving from A to B" example, the sub-tasks they identify are not all of the same kind.

In the context of programming, there are at least two reasons for employing procedural decomposition:

It can make a program more understandable in that the individual units of code become smaller and hence conceptually simpler. This is analogous to why textbooks are split into separate chapters, sections, and paragraphs, forming a hierarchical structure.

It reduces code redundancy. Often in programming, the same small task needs to be carried out in multiple contexts. Rather than repeating, in each of those contexts, the "block of code" needed to carry out that task, we can instead put that block of code into a separate "subprogram" (in Java it's called a method) that can be invoked wherever it is needed.
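As a minimal sketch of the second point (this is not the textbook's code, and the names RepeatedTaskDemo and drawBox are made up), the three statements that draw a small box are written once, inside a method, and that method is then invoked twice.

```java
public class RepeatedTaskDemo {

   // The block of code for drawing one box appears only here.
   static void drawBox() {
      System.out.println("+----+");
      System.out.println("|    |");
      System.out.println("+----+");
   }

   public static void main(String[] args) {
      drawBox();
      System.out.println();  // blank line between the two boxes
      drawBox();
   }
}
```

Without the drawBox method, main would need to contain the same three println statements twice, and any change to the box's appearance would have to be made in both places.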

Reges & Stepp address the issue of avoiding redundancy with their sequence of DrawBoxes programs and, in Section 1.5, the DrawFigures programs. The course web page has links to these programs.