CMPS 144 Spring 2020
Prog. Assg. #2: Arithmetic/Boolean Expression Scanner
Due: 11:59pm, Friday, March 6

For this assignment, we are interested in expressions that are composed of five kinds of elements, or tokens:

  1. left parenthesis: (
  2. right parenthesis: )
  3. operator:
  4. integer literal: A non-empty and maximal[1] sequence of the digit characters ('0'..'9').
  5. identifier: A non-empty and maximal[2] sequence of letters (in either 'A'..'Z' or 'a'..'z').
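
As a rough illustration of what "maximal" means in items 4 and 5, the sketch below uses java.util.regex to pull out every maximal run of digits and of letters from a string. The class name MaximalRuns and the method findAll are made up for this example and are not part of the provided code; a regular expression is just one convenient way to show the idea, not necessarily how ExpressionScanner is meant to work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class MaximalRuns {
    // Returns every match of the pattern in the text, going left to right.
    // Because '+' is greedy and each search resumes where the previous match
    // ended, every match of "[0-9]+" or "[A-Za-z]+" is a maximal run.
    static List<String> findAll(String regex, String text) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "2+dog52+Moose<= 2";
        System.out.println(findAll("[0-9]+", s));    // prints [2, 52, 2]
        System.out.println(findAll("[A-Za-z]+", s)); // prints [dog, Moose]
    }
}
```

Note that 5 alone, or Moo alone, never appears in the output: each is part of a longer run, so it is not maximal.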

The process of identifying the substrings that correspond to meaningful units within a given string is commonly referred to as scanning, and an agent with the ability to carry out this process is called a scanner. These meaningful units are often referred to as tokens.[3] For example, an instance of the java.util.Scanner class identifies as a token within a String any non-empty maximal substring that contains no whitespace characters and is preceded and followed by whitespace characters (or occurs at the beginning or end of the string).
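
The behavior of java.util.Scanner described above can be seen in a small sketch like the following. The class name WhitespaceTokens and the method tokens are invented for this example; the Scanner calls themselves (hasNext, next, close) are standard.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class, for illustration only.
class WhitespaceTokens {
    // Collects the tokens that java.util.Scanner finds in the text,
    // using its default delimiter (whitespace).
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        Scanner sc = new Scanner(text);
        while (sc.hasNext()) {
            result.add(sc.next());
        }
        sc.close();
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("3 + dog  <= 47")); // prints [3, +, dog, <=, 47]
        System.out.println(tokens("3+dog<=47"));      // prints [3+dog<=47]
    }
}
```

The second call shows the limitation: with no whitespace between the elements, the whole string comes back as a single token.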

If you were developing a program that analyzed strings purported to be expressions, it would be useful to have as a tool an object that identified the relevant elements within a given string. Such an object could reasonably be called an expression scanner. The Java class ExpressionScanner, whose design was influenced by the java.util.Scanner class, is intended to provide such objects. Your task is to complete it so that it works as intended (as described in the comments preceding each method). As given, several of its methods are only stubs[4] (and are marked as such by comments). Also included are a few comments offering suggestions, which you may ignore if you wish.

What makes this task non-trivial is that, unlike the FPAEs that you worked with in a recent lab, here we are not assuming that the tokens in an expression are separated from each other by spaces. That is, two consecutive tokens may occur with no spaces between them. Thus, given the (nonsense) string

234+7   Aardvark-*!=< ) 46(

a scanner of arithmetic expressions should be able to identify the tokens within it, namely: 234, +, 7, Aardvark, -, *, !=, <, ), 46, and (, in that order.
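
One way to picture left-to-right scanning without relying on spaces is the sketch below. It is a standalone illustration under stated assumptions, not the design of ExpressionScanner: the class TokenSketch, its methods, and in particular the OPERATORS array are hypothetical (the operator set here is a guess that happens to cover the example string, written with an ASCII hyphen for the minus sign). Throwing an exception on an unrecognized character is likewise just one possible choice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, for illustration only.
class TokenSketch {
    // ASSUMPTION: this operator set is invented for the example.
    // Two-character operators come first so that "!=" is matched
    // as one token rather than being split.
    private static final String[] OPERATORS = {
        "<=", ">=", "==", "!=", "+", "-", "*", "/", "<", ">"
    };

    static boolean isDigit(char c)  { return c >= '0' && c <= '9'; }
    static boolean isLetter(char c) { return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'); }

    // Returns the operator that begins at index i of s, or null if none does.
    static String operatorAt(String s, int i) {
        for (String op : OPERATORS) {
            if (s.startsWith(op, i)) { return op; }
        }
        return null;
    }

    // Splits s into tokens, going from left to right and skipping whitespace.
    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            int start = i;
            if (c == '(' || c == ')') {
                i++;
            } else if (isDigit(c)) {
                // Extend to the end of the maximal run of digits.
                while (i < s.length() && isDigit(s.charAt(i))) { i++; }
            } else if (isLetter(c)) {
                // Extend to the end of the maximal run of letters.
                while (i < s.length() && isLetter(s.charAt(i))) { i++; }
            } else {
                String op = operatorAt(s, i);
                if (op == null) {
                    throw new IllegalArgumentException("Unrecognized character at index " + i);
                }
                i += op.length();
            }
            tokens.add(s.substring(start, i));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("234+7   Aardvark-*!=< ) 46("));
        // prints [234, +, 7, Aardvark, -, *, !=, <, ), 46, (]
    }
}
```

The key idea is that the first character of each token determines its kind, and the scanner then advances until that kind of token can extend no further.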

Note that a scanner is not responsible for verifying the syntactic validity of an expression; rather, it simply identifies the tokens within it, going from left to right. Typically, a parser makes use of a scanner and is responsible for determining whether or not an expression is syntactically correct (and, in the case that it is, converting it to another form, often a tree).


Program Submission

Submit your completed ExpressionScanner.java file to the prog2_dir folder. Make sure to complete the comments indicated near the top of the source code that has been provided. In particular, you should give your name and list the names of anyone who aided or collaborated with you in doing the work. Also, you are to describe any behavioral defects that your class has (e.g., characterizations of test cases that it fails). If there are defects of which you are unaware, it probably means that you did a poor job of testing your work. Thus, of two submissions having similar defects, one in which those defects are acknowledged deserves a better grade than one in which they are not acknowledged.


Footnotes

[1] By a maximal sequence of digit characters we mean one that, in the context of the expression in which it lies, is neither immediately preceded nor immediately followed by another digit character. For example, in the string 473 - 2, the integer literals are 473 and 2, but none of 4, 47, 73, or 3 occurs as an integer literal.

[2] By a maximal sequence of letters we mean one that, in the context of the expression in which it lies, is neither immediately preceded nor immediately followed by another letter. For example, in the (nonsensical) string 2+dog52+Moose<= 2, the identifiers are dog and Moose, but none of do, g, oos, or Moo occurs as an identifier.

[3] Technically, the word token refers to a type (or category) of element that can occur in an expression (e.g., left parenthesis, relational operator, integer literal). An occurrence of a substring falling into a particular category (e.g., "573", "<=") is rightly called a lexeme.

[4] The term stub is often used to refer to a method that basically doesn't do anything but is intended to be completed later. In the case of a functional method (i.e., one that returns a value), it is necessary to put a return statement in its body in order to make it syntactically correct.
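
For instance, stubs might look like the following sketch. The class and method names are invented for illustration and do not come from the provided ExpressionScanner code.

```java
// Hypothetical class, for illustration only.
class StubExample {
    // STUB: a functional method needs a return statement to compile,
    // so it returns a placeholder value until it is completed.
    static int countDigits(String s) {
        return 0;
    }

    // STUB: a void method can simply have an empty body.
    static void printReport() {
    }
}
```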