CMPS 260 Spring 2020
Prog. Assg. #1: Regular Expressions
Due Date: April 16

The Relevant Java Components

The purpose of this assignment is to solidify your understanding of regular expressions. You are provided with a Java interface and eleven Java classes. Three of these classes (RegExprUnion, RegExprConcat, and RegExprStar) are incomplete, and your task is to complete them. Instances of those three classes represent composite regular expressions whose main operators are union, concatenation, and Kleene star (closure), respectively.

The remaining Java classes are provided in full.


Review of Regular Expressions

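You will recall that regular expressions come in these varieties:

Atomic regular expressions:
  - ∅, which describes the empty language
  - λ, which describes the language {λ}, whose only member is the empty string
  - a, for each symbol a of the alphabet, which describes the language {a}

Composite regular expressions (where R and S are regular expressions):
  - R + S (union), which describes L(R) ∪ L(S)
  - R · S (concatenation), which describes {xy | x ∈ L(R) and y ∈ L(S)}
  - R* (star closure), which describes the set of all strings formed by concatenating zero or more members of L(R)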


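The following minimal Java sketch is not the provided code. It illustrates how these varieties can map onto a common interface with one class per variety, in the spirit of the provided interface and classes. The names Rx, EmptySet, Lambda, Word, Union, Concat, and Star are hypothetical stand-ins for the provided RegExpr* types, and Java 16 or later is assumed for records. To give each class something to do, the sketch implements reversal (the r command in the sample dialog). Each composite expression builds its reverse from the reverses of its parts, and concatenation also swaps its operands.

```java
// Hypothetical model of the six varieties; these names are illustrative
// and are NOT the names used by the provided RegExpr* classes.
interface Rx {
    // Returns an expression describing the reverse of this expression's language.
    Rx reverse();
}

// Atomic: the empty language (∅). Its reverse is itself.
record EmptySet() implements Rx {
    public Rx reverse() { return this; }
}

// Atomic: the language {λ}. Its reverse is itself.
record Lambda() implements Rx {
    public Rx reverse() { return this; }
}

// Atomic: a single symbol or, as in this assignment, a whole word such as "abbab".
record Word(String text) implements Rx {
    public Rx reverse() { return new Word(new StringBuilder(text).reverse().toString()); }
}

// Composite: union (R + S). Reverse each alternative.
record Union(Rx left, Rx right) implements Rx {
    public Rx reverse() { return new Union(left.reverse(), right.reverse()); }
}

// Composite: concatenation (R · S). Reverse each operand AND swap their order.
record Concat(Rx left, Rx right) implements Rx {
    public Rx reverse() { return new Concat(right.reverse(), left.reverse()); }
}

// Composite: Kleene star (R*). Reverse the body.
record Star(Rx body) implements Rx {
    public Rx reverse() { return new Star(body.reverse()); }
}
```

For example, reversing Star(Union(Word("aa"), Star(Word("ba")))) gives Star(Union(Word("aa"), Star(Word("ab")))). This corresponds to the dialog's output "Reverse is ((aa + (ab)*))*".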
Deciding Membership

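Deciding whether a given string is a member of the language described by an atomic regular expression is straightforward. As for the composite regular expressions:

  - w is a member of L(R + S) if and only if w is a member of L(R) or of L(S).
  - w is a member of L(R · S) if and only if w can be split as w = xy, where x is a member of L(R) and y is a member of L(S).
  - w is a member of L(R*) if and only if either w = λ or w can be split as w = xy, where x is a nonempty member of L(R) and y is a member of L(R*).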


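Membership for each kind of expression can be decided recursively from the membership of its sub-expressions. The sketch below shows one straightforward way to do this. It uses hypothetical record types (Rx, EmptySet, Lambda, Word, Union, Concat, Star) rather than the provided RegExpr* classes, a hypothetical method name isMember, and Java 16 or later. Concatenation tries every way of splitting the string in two. Star requires each peeled-off piece to be nonempty, which guarantees that the recursion terminates.

```java
// Hypothetical types; the provided RegExpr* classes may use other names.
interface Rx {
    boolean isMember(String s);
}

record EmptySet() implements Rx {
    public boolean isMember(String s) { return false; }        // ∅ has no members
}

record Lambda() implements Rx {
    public boolean isMember(String s) { return s.isEmpty(); }  // only the empty string
}

record Word(String text) implements Rx {
    public boolean isMember(String s) { return s.equals(text); }
}

record Union(Rx left, Rx right) implements Rx {
    public boolean isMember(String s) {
        return left.isMember(s) || right.isMember(s);
    }
}

record Concat(Rx left, Rx right) implements Rx {
    // Try every split s = x + y, including an empty x or an empty y.
    public boolean isMember(String s) {
        for (int i = 0; i <= s.length(); i++) {
            if (left.isMember(s.substring(0, i)) && right.isMember(s.substring(i))) {
                return true;
            }
        }
        return false;
    }
}

record Star(Rx body) implements Rx {
    // The empty string is always a member. Otherwise peel off a NONEMPTY
    // prefix in L(body) and recurse on the (strictly shorter) remainder.
    public boolean isMember(String s) {
        if (s.isEmpty()) return true;
        for (int i = 1; i <= s.length(); i++) {
            if (body.isMember(s.substring(0, i)) && isMember(s.substring(i))) {
                return true;
            }
        }
        return false;
    }
}
```

This brute-force search can take exponential time in the worst case. That is acceptable for strings as short as those in the sample dialog. Caching results for each (expression, substring) pair is one way to speed it up.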
Treating "words" as Atomic Regular Expressions

Technically, the regular expression abbab is an abbreviation for a·b·b·a·b, which is itself an abbreviation for (((a·b)·b)·a)·b. For the sake of convenience (and efficiency), we treat regular expressions such as abbab as atomic. Such expressions are modeled by the class RegExprWord.

From a user's point of view, the main consequence is that, for example, abbab* is equivalent to (abbab)* rather than to abba·b*. By convention, the star operator has higher precedence than concatenation (which in turn has higher precedence than union). The normal interpretation would therefore be the latter rather than the former. One way to look at it is that, in our system, implicit concatenation (juxtaposition of symbols within a word) has higher precedence than star. Explicit concatenation, written with ·, keeps its usual lower precedence, so b·a* means b·(a*). To avoid ambiguity, the image of a regular expression produced by the toString() method of the relevant class (i.e., RegExprStar) includes parentheses around any sub-expression of length two or more to which the star operator applies. For example, bb* is displayed as (bb)*.

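Here is a minimal sketch of the image conventions just described. It uses hypothetical record types in place of the provided classes and assumes Java 16 or later. Because a word is atomic, a starred word such as bb prints as (bb)*, and any starred sub-expression whose image has length two or more is parenthesized. Some details are inferred from the sample dialog rather than taken from the provided code: union is always parenthesized with spaces around +, concatenation is a period with no spaces, and N and L stand for ∅ and λ.

```java
// Hypothetical types; only the star rule comes from the text above, the
// other formats are inferred from the sample dialog.
interface Rx {
    String toString();  // every expression provides its ASCII image
}

record EmptySet() implements Rx { public String toString() { return "N"; } }
record Lambda()   implements Rx { public String toString() { return "L"; } }
record Word(String text) implements Rx { public String toString() { return text; } }

record Union(Rx left, Rx right) implements Rx {
    // The dialog always shows a union wrapped in parentheses.
    public String toString() { return "(" + left + " + " + right + ")"; }
}

record Concat(Rx left, Rx right) implements Rx {
    public String toString() { return left + "." + right; }
}

record Star(Rx body) implements Rx {
    // Parenthesize the operand when its image has length two or more,
    // so that a starred word like bb prints as (bb)* rather than bb*.
    public String toString() {
        String inner = body.toString();
        return inner.length() >= 2 ? "(" + inner + ")*" : inner + "*";
    }
}
```

With these rules, the expression entered as (ab + bb*) . bab + a* prints as ((ab + (bb)*).bab + a*), which is the image shown in the dialog.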
Dealing with ASCII Limitations

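Given that the ASCII alphabet does not support superscripts or symbols such as ∅, λ, or ·, we have to use substitutes when entering regular expressions at the keyboard and viewing them in a console window. The ones chosen are reflected by the symbolic constants defined in the RegExprSymbols class. In particular, as the sample dialog below illustrates:

  - N stands for ∅ (the empty language).
  - L stands for λ (the empty string).
  - . (period) stands for the concatenation operator ·.
  - * is written on the same line as, and immediately after, the expression to which it applies, rather than as a superscript.

Union is written with +, as usual.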


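The small sketch below shows how such constants might be used: it converts a console image back into conventional notation. The class name AsciiSymbols, its constant names, and the method toConventional are hypothetical and need not match RegExprSymbols. The mapping assumes, as the sample dialog suggests, that N stands for ∅, L for λ, and a period for ·. It also assumes that the letters N and L never appear inside words. Non-ASCII characters are written as Unicode escapes so that the source compiles regardless of the platform's default encoding.

```java
// Hypothetical stand-in for the substitutes described above; the names
// here are illustrative and need not match those in RegExprSymbols.
final class AsciiSymbols {
    static final char EMPTY_SET = 'N';  // stands for ∅
    static final char LAMBDA    = 'L';  // stands for λ
    static final char CONCAT    = '.';  // stands for ·
    static final char UNION     = '+';  // union is written as usual
    static final char STAR      = '*';  // written inline, not as a superscript

    private AsciiSymbols() {}

    // Converts an ASCII image such as "(a.(L + b))*.cca" into "(a·(λ + b))*·cca".
    // Assumes N and L never occur inside words.
    static String toConventional(String ascii) {
        StringBuilder out = new StringBuilder();
        for (char c : ascii.toCharArray()) {
            switch (c) {
                case EMPTY_SET: out.append('\u2205'); break;  // ∅
                case LAMBDA:    out.append('\u03BB'); break;  // λ
                case CONCAT:    out.append('\u00B7'); break;  // ·
                default:        out.append(c);                // +, *, parentheses, letters, spaces
            }
        }
        return out.toString();
    }
}
```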
Submission of Work

You should submit only the three .java files corresponding to the classes RegExprUnion, RegExprConcat, and RegExprStar. Use the file submission system, to which there is a link on the course web page. In comments, list the names of the people with whom you collaborated and acknowledge any defects that you have identified.

Sample Dialog with RegExprApp

What follows is a "transcript" of an interaction between a user and the RegExprApp application. The first thing that the program does is to print the "help page", which lists all the commands that the program can respond to. All user input appears to the right of the > prompt.

Commands:
---------
   q: to quit.
   h: for this list.
   n <regular expression>: to establish a new regular expression.
     Example: n aba.(ba)* + bba
   d: to display the current regular expression
   m <string>: to test string for membership
     Example: m bbabaab
   s: to display stats about the current regular expression.
   g [seed]: to display a random member of the language.
   r: to display reverse of current rexpr.

> n (ab + bb*) . bab + a*
New regular expression is ((ab + (bb)*).bab + a*)

> s
Image: ((ab + (bb)*).bab + a*)
Shortest member has length 0
Has infinitely many members.

> g
Random member: |aaa|

> g
Random member: |bbbbbbbbbab|

> m bbbbbab
The string |bbbbbab| is a member.

> m
The string || is a member.

> n (aa + ba*)*
New regular expression is ((aa + (ba)*))*

> g
Random member: |aabaaa|

> m aabaaaa 
The string |aabaaaa| is NOT a member.

> r
Reverse is ((aa + (ab)*))*

> n (aa + b.a*)*
New regular expression is ((aa + b.a*))*

> g
Random member: |aabaaaa|

> g
Random member: |aa|

> g
Random member: |baabaaaaaa|

> s
Image: ((aa + b.a*))*
Shortest member has length 0
Has infinitely many members.

> m aabaaaaaabaaabbbbbaa
The string |aabaaaaaabaaabbbbbaa| is a member.

> n (a.(L + b))* . cca
New regular expression is (a.(L + b))*.cca

> g 27
Random member: |aaabcca|

> s
Image: (a.(L + b))*.cca
Shortest member has length 3
Has infinitely many members.

> n (ab + N) . cc
New regular expression is (ab + N).cc

> s
Image: (ab + N).cc
Shortest member has length 4
Longest member has length 4

> n N
New regular expression is N

> s
Image: N
Has no members.

> h
Commands:
---------
   q: to quit.
   h: for this list.
   n <regular expression>: to establish a new regular expression.
     Example: n aba.(ba)* + bba
   d: to display the current regular expression
   m <string>: to test string for membership
     Example: m bbabaab
   s: to display stats about the current regular expression.
   g [seed]: to display a random member of the language.
   r: to display reverse of current rexpr.

> q
Goodbye.
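The statistics printed by the s command can be computed structurally, without enumerating members. The sketch below assumes hypothetical record types, hypothetical method names (shortest, longest, report), and Java 16 or later. It uses two sentinel values: NONE when the language is empty and INFINITE when it has infinitely many members. The key observation for star is that R* is infinite unless L(R) is ∅ or {λ}, in which case R* describes {λ}.

```java
// Hypothetical types and sentinels; not the provided RegExpr* API.
interface Rx {
    int NONE = -1;                     // the language has no members
    int INFINITE = Integer.MAX_VALUE;  // longest(): infinitely many members
    int shortest();
    int longest();
}

record EmptySet() implements Rx {
    public int shortest() { return NONE; }
    public int longest()  { return NONE; }
}

record Lambda() implements Rx {
    public int shortest() { return 0; }
    public int longest()  { return 0; }
}

record Word(String text) implements Rx {
    public int shortest() { return text.length(); }
    public int longest()  { return text.length(); }
}

record Union(Rx left, Rx right) implements Rx {
    public int shortest() {
        int a = left.shortest(), b = right.shortest();
        if (a == NONE) return b;
        if (b == NONE) return a;
        return Math.min(a, b);
    }
    // NONE (-1) is below every real length and INFINITE is the largest int,
    // so a plain max handles both sentinels correctly.
    public int longest() { return Math.max(left.longest(), right.longest()); }
}

record Concat(Rx left, Rx right) implements Rx {
    public int shortest() {
        int a = left.shortest(), b = right.shortest();
        return (a == NONE || b == NONE) ? NONE : a + b;  // concatenating with ∅ gives ∅
    }
    public int longest() {
        int a = left.longest(), b = right.longest();
        if (a == NONE || b == NONE) return NONE;
        if (a == INFINITE || b == INFINITE) return INFINITE;
        return a + b;
    }
}

record Star(Rx body) implements Rx {
    public int shortest() { return 0; }  // the empty string is always a member
    // If L(body) is ∅ or {λ}, then body* describes {λ}; otherwise it is infinite.
    public int longest() { return body.longest() <= 0 ? 0 : INFINITE; }
}

final class Stats {
    private Stats() {}

    // Mirrors the wording of the s command in the sample dialog.
    static String report(Rx e) {
        if (e.shortest() == Rx.NONE) return "Has no members.";
        String first = "Shortest member has length " + e.shortest();
        return e.longest() == Rx.INFINITE
                ? first + "\nHas infinitely many members."
                : first + "\nLongest member has length " + e.longest();
    }
}
```

For instance, applied to (ab + N).cc, report yields the two lines "Shortest member has length 4" and "Longest member has length 4", as in the dialog.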