CMPS 260
Notes on Chapter 1 of Webber

In this course, we study foundational concepts of theoretical computer science that serve as the basis for the study of computability theory and complexity theory.

Computability theory seeks to distinguish those computational problems having algorithmic solutions from those that do not. (You may be surprised to learn that the vast majority of computational problems are in the latter category, a fact that can be proven using a counting argument.)

Perhaps the most famous uncomputable problem is the Halting Problem, which goes like this:

Given a (say, Java) program P and a string x, determine whether or not, if x were provided as input to P, P would eventually halt (as opposed to going into an infinite loop).
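The classic proof that no such halt-checker can exist is by contradiction. The sketch below (in Python rather than Java, for brevity) shows the shape of the argument; the function halts is hypothetical, not real code that could ever be completed.

```python
def halts(program, x):
    # Hypothetical oracle: returns True iff program(x) eventually halts.
    # The argument below shows no total, always-correct version can exist.
    raise NotImplementedError("provably impossible in general")

def paradox(program):
    # Loops forever exactly when `program`, run on its own text, would halt.
    if halts(program, program):
        while True:
            pass
    return "halted"

# Now ask: does paradox(paradox) halt? If halts says yes, paradox loops
# forever; if halts says no, paradox halts. Either answer contradicts the
# assumed correctness of halts, so halts cannot exist.
```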

Complexity theory seeks to classify solvable computational problems according to how much time and/or space is needed by their algorithmic solutions (as a function of input size). For example, it is known that every (comparison-based, sequential) sorting algorithm requires Ω(N·log N) comparisons in the worst case, where N is the size of the collection of elements to be sorted; no such algorithm can run "faster" than that.

Perhaps the most well-known question in complexity theory asks whether P = NP, where P (respectively, NP) is the set of all computational problems solvable by a deterministic (respectively, non-deterministic) algorithm in polynomial time (i.e., in time O(N^c) for some constant c). Loosely speaking, a non-deterministic algorithm is one that can make "guesses" during execution, with all guesses it makes assumed to be "good" ones.
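One way to make the "guessing" idea concrete: for a problem in NP, once a guess (a certificate) is in hand, checking it takes only polynomial time. The sketch below uses Subset-Sum, a standard NP problem, as an illustration; the function name and examples are my own, not from the text.

```python
def verify_subset_sum(numbers, target, subset):
    """Polynomial-time check that `subset` is a valid certificate:
    each of its elements is drawn (without reuse) from `numbers`,
    and the elements sum to `target`."""
    remaining = list(numbers)
    for s in subset:
        if s in remaining:
            remaining.remove(s)   # each occurrence may be used once
        else:
            return False
    return sum(subset) == target

# A "good guess" is verified quickly:
print(verify_subset_sum([3, 9, 8, 4], 12, [3, 9]))   # True
print(verify_subset_sum([3, 9, 8, 4], 12, [8, 8]))   # False: 8 used twice
```

The hard part, of course, is finding the certificate; verifying it is easy. That gap between finding and checking is exactly what the P vs. NP question is about.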

Chapter 1 of Webber's book introduces the fundamental concepts: symbol, alphabet, string, and language.


Symbol

A symbol is an atomic unit, usually denoted by a character (e.g., 1, b, $). Often, a symbol is "uninterpreted", meaning that no particular meaning is ascribed to it.

Alphabet

An alphabet is a finite set of symbols. Examples: {0,1} (the binary alphabet), {a,b}, and the set of ASCII characters.

It is common to use the symbol Σ (upper case Greek letter sigma) as the name of an alphabet. Thus, we might say something like "Let Σ be {0,1}."


String

A string is a finite sequence of symbols (over some alphabet). Examples: abb and baaa are strings over {a,b}; 0110 is a string over {0,1}.

Some conventions: λ denotes the empty string (the unique string of length zero); |x| denotes the length of string x; and letters near the end of the Roman alphabet, such as x, y, and z, are typically used as names of strings.

Concatenation: a binary operation on strings (i.e., one that maps a pair of strings to a new string). Example: abb · baaa = abbbaaa. When one or both operands are names of strings (rather than literals), we often omit the operator symbol, writing xy, for example, rather than x·y.

Concatenation has λ as its identity element, because xλ = λx = x.

Concatenation is associative, meaning that, for all strings x, y, and z, (xy)z = x(yz). Hence, we usually omit the parentheses, as both interpretations yield the same result.
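Python's own string concatenation obeys exactly these laws, with the empty string "" playing the role of λ. A quick illustration (the names x, y, z are just sample strings):

```python
x, y, z = "abb", "baaa", "ab"

# Concatenation of two strings:
assert x + y == "abbbaaa"

# "" is the identity element, just as λ is: x·λ = λ·x = x.
assert x + "" == x and "" + x == x

# Associativity: grouping does not matter, so parentheses can be dropped.
assert (x + y) + z == x + (y + z)

print("identity and associativity hold")
```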

Repeated concatenation: another operation on strings. It maps a string and a natural number to a string. By convention, the number appears as an exponent.
For n ≥ 0, x^n denotes x·x·x·...·x (where x is repeated n times). Thus, we have x^0 = λ, x^1 = x, x^2 = xx, x^3 = xxx, etc.

A rigorous recursive definition is: x^0 = λ, and x^(n+1) = x^n · x for all n ≥ 0.
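The recursive definition translates directly into code. Python's built-in `*` operator on strings already computes repeated concatenation; the function below re-derives it from the two defining clauses:

```python
def power(x: str, n: int) -> str:
    """Repeated concatenation x^n, following the recursive definition."""
    if n == 0:
        return ""                    # x^0 = λ (the empty string)
    return power(x, n - 1) + x       # x^(n+1) = x^n · x

assert power("ab", 0) == ""
assert power("ab", 1) == "ab"
assert power("ab", 3) == "ababab" == "ab" * 3
print(power("ab", 3))   # ababab
```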


Language

A language is a set of strings (over some alphabet). Note that it is not restricted to be finite. In fact, most interesting languages are not finite.

To denote the set of all strings over the alphabet Σ, we use the notation Σ*. This is an example of the use of what is called Kleene Closure (or Kleene Star), named for Stephen Kleene, one of the pioneers in the study of formal languages.

Example: {0,1}* denotes the set of all bit strings.
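Since Σ* is infinite, it can only be enumerated lazily. A small sketch (the generator name is my own): yield all strings over a given alphabet in order of increasing length, and within each length in the alphabet's own order.

```python
from itertools import count, islice, product

def kleene_star(alphabet):
    """Lazily enumerate all strings over `alphabet`, shortest first."""
    for n in count(0):                           # lengths 0, 1, 2, ...
        for tup in product(alphabet, repeat=n):  # all strings of length n
            yield "".join(tup)

# The first few elements of {0,1}*:
print(list(islice(kleene_star("01"), 7)))
# ['', '0', '1', '00', '01', '10', '11']
```

Note that the enumeration begins with '' (i.e., λ), since λ is a string over every alphabet.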

Among the ways of describing a language are by enumeration and by set comprehension. Webber uses the (non-standard) term set former for the latter.

To describe a language via enumeration, we simply enumerate (i.e., list) all its members, as in { ab, bbaa, babbaa }.

This works only for finite languages, of course. When the intent is for the enumeration to be read by a human (as opposed to a computer), it works for only very small finite languages.

Set formers describe a language by specifying the characteristics/properties of the strings that are its members.

Examples: { x ∈ {0,1}* : x ends with 0 } (the bit strings whose last symbol is 0), and { a^n b^n : n ≥ 0 } (the strings consisting of some number of a's followed by an equal number of b's).
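A set former reads much like a Python set comprehension. Because most such languages are infinite, the sketch below (helper name and length bound are my own) restricts attention to strings of length at most 3 over {0,1}, and selects those ending with 0:

```python
from itertools import product

def strings_up_to(alphabet, max_len):
    """All strings over `alphabet` of length 0 through max_len."""
    return ["".join(t)
            for n in range(max_len + 1)
            for t in product(alphabet, repeat=n)]

# Analogue of the set former { x in {0,1}* : x ends with 0 },
# truncated to |x| <= 3:
lang = {x for x in strings_up_to("01", 3) if x.endswith("0")}
print(sorted(lang, key=lambda x: (len(x), x)))
# ['0', '00', '10', '000', '010', '100', '110']
```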