CMPS 134   Fall 2022
Prog. Assg. #7: Lexicon Application
Due: 11:59pm, Monday, Dec. 12

For the purposes of this assignment, a lexicon is defined to be the collection of words appearing in some particular text file. We leave the term "word" undefined!

Given to you is the Java application LexiconApp, which builds a lexicon of the words appearing in the text file whose name is provided via the first "run argument" (i.e., args[0], where args is the formal parameter of the main() method). The second run argument (i.e., args[1]) specifies the fixed capacity of the lexicon (i.e., the maximum number of words that can be placed therein). The lexicon is represented by an array words[] whose elements are of type String.

During its first phase, the program scans the input file and builds the lexicon. It then reports

  1. how many word occurrences are in the file,
  2. how many words were placed into the lexicon, and
  3. how many words that occur in the file were not placed into the lexicon, due to its capacity already having been reached.
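The build-and-report phase described above can be sketched roughly as follows. This is not the code of the provided LexiconApp. The names BuildPhaseSketch, build, and indexOf are hypothetical, and the sketch assumes that the third reported figure counts distinct words (not occurrences) that could not be placed; it uses a HashSet purely to track those.

```java
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;

class BuildPhaseSketch {

    /**
     * Scans every token from input, placing each new word into words[]
     * until the array is full. Returns { occurrences, placed, notPlaced }.
     */
    static int[] build(Scanner input, String[] words) {
        int occurrences = 0;
        int placed = 0;
        // Assumption: "not placed" means distinct words rejected because
        // the lexicon was already at capacity.
        Set<String> overflow = new HashSet<>();
        while (input.hasNext()) {
            String word = input.next();
            occurrences++;
            if (indexOf(words, placed, word) >= 0) {
                continue;                    // already in the lexicon
            }
            if (placed < words.length) {
                words[placed++] = word;      // room left: store it
            } else {
                overflow.add(word);          // lexicon full: remember it
            }
        }
        return new int[] { occurrences, placed, overflow.size() };
    }

    /** Linear search of words[0..count-1]; returns -1 if target is absent. */
    static int indexOf(String[] words, int count, String target) {
        for (int i = 0; i < count; i++) {
            if (words[i].equals(target)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String text = "Among the moon, the stars, and the sun, I prefer the sun.";
        String[] words = new String[5];      // capacity, as args[1] would give
        int[] report = build(new Scanner(text), words);
        System.out.println("There are " + report[0] + " occurrences of words in the file.");
        System.out.println(report[1] + " words were placed into the lexicon.");
        System.out.println(report[2] + " other words occurred in the file but were not placed into the lexicon.");
    }
}
```

In the real application the Scanner would be constructed over the file named by args[0] rather than over a string.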

To clarify the distinction between a word and a word occurrence, in the sentence

    Among the moon, the stars, and the sun, I prefer the sun.

there are twelve word occurrences but only eight (distinct) words, because "the" occurs four times and "sun" occurs twice. A complete lexicon for this sentence would thus have eight entries, one for each of the eight distinct words that occur in it.
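To make the word/word-occurrence distinction concrete, here is a small sketch that counts both for the sentence above. Since the handout leaves "word" undefined, the sketch assumes a word is what remains of a whitespace-delimited token after every non-letter character is removed, and that comparison is case-sensitive. The class and method names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Scanner;
import java.util.Set;

class WordCountSketch {

    /** Assumption: a word is a token with all non-letters stripped out. */
    static String toWord(String token) {
        return token.replaceAll("[^A-Za-z]", "");
    }

    /** Counts every occurrence of a (non-empty) word in text. */
    static int countOccurrences(String text) {
        int count = 0;
        Scanner scanner = new Scanner(text);
        while (scanner.hasNext()) {
            if (!toWord(scanner.next()).isEmpty()) {
                count++;
            }
        }
        return count;
    }

    /** Collects the distinct words in text, in order of first appearance. */
    static Set<String> distinctWords(String text) {
        Set<String> words = new LinkedHashSet<>();
        Scanner scanner = new Scanner(text);
        while (scanner.hasNext()) {
            String word = toWord(scanner.next());
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String sentence = "Among the moon, the stars, and the sun, I prefer the sun.";
        System.out.println(countOccurrences(sentence));      // prints 12
        System.out.println(distinctWords(sentence).size());  // prints 8
    }
}
```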

Next, the program displays the words in the lexicon (by printing the values of the elements of words[] in which they are stored).

The program then enters a "query phase" during which (repeatedly) the user is prompted to enter a word, after which the program reports whether or not the entered word is in the lexicon.
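One way the query phase might be structured as a separate method is sketched below. It is an illustration only, not the provided program's code: the names QueryPhaseSketch, run, and contains are hypothetical, and the prompts simply imitate the sample dialog. Passing the Scanner and PrintStream as parameters is a design choice that lets the loop be exercised without a keyboard.

```java
import java.io.PrintStream;
import java.util.Scanner;

class QueryPhaseSketch {

    /** Repeatedly prompts for a word and reports YES/NO until an empty line is entered. */
    static void run(Scanner in, PrintStream out, String[] words, int count) {
        out.println("Query Phase: (empty string to quit)");
        while (true) {
            out.println();
            out.print("Enter word:>");
            // Treat end of input the same as an empty line.
            String query = in.hasNextLine() ? in.nextLine().trim() : "";
            if (query.isEmpty()) {
                break;
            }
            out.println(contains(words, count, query) ? "YES" : "NO");
        }
        out.println("Goodbye.");
    }

    /** Linear search of the occupied portion words[0..count-1]. */
    static boolean contains(String[] words, int count, String target) {
        for (int i = 0; i < count; i++) {
            if (words[i].equals(target)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] words = { "Once", "upon", "a", "dog", null };
        run(new Scanner(System.in), System.out, words, 4);
    }
}
```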

A sample user/program dialog appears at the end of this handout, under the heading Dialog.


The Assignment

The LexiconApp program can be improved in several ways, including those outlined below. You are to go as far as you can to make those improvements.

  1. Upon executing the given program with the provided sample data file, you will notice that some of the "words" placed into the lexicon include characters (such as commas, periods, and parentheses) that we normally do not consider to be part of a word. For example, if
    Among the moon, the stars, and the (hot) sun, I prefer the sun.
    were an excerpt from the file, among the words placed into the lexicon would be these:

    "moon,"     "stars,"     "(hot)"     "sun,"     "sun."

    That's because the program relies upon the next() method of a Scanner object to "retrieve" each occurrence of a word in the input file, but what that method actually retrieves is each maximal sequence of non-whitespace characters!

    Using an instance of the WordScanner class will not solve this problem entirely, but its next() method does a better job of filtering out non-word characters. (Making this change is very easy.)
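The behavior of Scanner's next() described in item 1 can be observed with the short sketch below. It also includes a crude helper, stripEdges, that trims non-letter characters from both ends of a token. That helper is purely hypothetical and is not how WordScanner works (this handout does not describe WordScanner's internals); it only illustrates the kind of filtering involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerTokenSketch {

    /** Returns each maximal run of non-whitespace characters, as Scanner.next() yields them. */
    static List<String> rawTokens(String text) {
        List<String> tokens = new ArrayList<>();
        Scanner scanner = new Scanner(text);
        while (scanner.hasNext()) {
            tokens.add(scanner.next());
        }
        return tokens;
    }

    /** Hypothetical filter: drops leading and trailing non-letter characters. */
    static String stripEdges(String token) {
        int start = 0;
        int end = token.length();
        while (start < end && !Character.isLetter(token.charAt(start))) {
            start++;
        }
        while (end > start && !Character.isLetter(token.charAt(end - 1))) {
            end--;
        }
        return token.substring(start, end);
    }

    public static void main(String[] args) {
        String excerpt = "Among the moon, the stars, and the (hot) sun, I prefer the sun.";
        for (String token : rawTokens(excerpt)) {
            // Shows the raw token next to its trimmed form, e.g. "(hot)" -> "hot"
            System.out.println("\"" + token + "\" -> \"" + stripEdges(token) + "\"");
        }
    }
}
```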

  2. Upon examining the source code of LexiconApp, you should have been horrified by the fact that it consists of only a main() method, suggesting that its author, irresponsibly, made no attempt to apply the concept of procedural decomposition/modularization in its design.

    You are encouraged to refactor the program to rectify this deficiency. This would certainly include introducing separate methods for carrying out various subtasks that the program performs, but it could also include introducing a new instance class, say Lexicon, an instance of which represents just that, a lexicon. All the details pertaining to how words are inserted into and retrieved from a lexicon would be housed there. (Of course, those details could involve the use of an array words[] in the same way as the application does now.) One can imagine that this class would have observer methods that answer questions such as "How many words are in the lexicon?" and "Is the word glorp in the lexicon?".
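As a starting point, a Lexicon class along the lines suggested in item 2 might look like the sketch below. Only the class name Lexicon and the use of an array words[] come from this handout; the method names (insert, contains, size, capacity, isFull, wordAt) and their exact behavior are assumptions, and other designs are equally reasonable.

```java
class Lexicon {
    private final String[] words;  // words[0..size-1] hold the lexicon's entries
    private int size;              // number of words currently stored

    /** Creates an empty lexicon that can hold at most capacity words. */
    Lexicon(int capacity) {
        words = new String[capacity];
        size = 0;
    }

    /**
     * Adds word if it is absent and there is room.
     * Returns true only if the word was newly added.
     */
    boolean insert(String word) {
        if (contains(word) || isFull()) {
            return false;
        }
        words[size++] = word;
        return true;
    }

    /** Observer: "Is the given word in the lexicon?" */
    boolean contains(String word) {
        for (int i = 0; i < size; i++) {
            if (words[i].equals(word)) {
                return true;
            }
        }
        return false;
    }

    /** Observer: "How many words are in the lexicon?" */
    int size() {
        return size;
    }

    int capacity() {
        return words.length;
    }

    boolean isFull() {
        return size == words.length;
    }

    /** Returns the i-th word placed into the lexicon (0 <= i < size()). */
    String wordAt(int i) {
        if (i < 0 || i >= size) {
            throw new IndexOutOfBoundsException("index " + i + ", size " + size);
        }
        return words[i];
    }
}
```

With such a class, main() would no longer touch the array directly: it would create a Lexicon with the capacity given by args[1], call insert for each scanned word, and call contains during the query phase.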

  3. Suppose that the customer for whom LexiconApp was originally developed asks for its behavior to be modified in two ways:


Work to Submit

You should submit a Java file LexiconApp.java that, when executed, behaves like the one that was provided, except for any of the suggested improvements you have implemented. In addition, submit any other relevant Java file that is needed by your application (e.g., Lexicon.java). However, you need not submit WordScanner.java.

Dialog

$ java LexiconApp fido.txt 20
Welcome to LexiconApp!
There are 64 occurrences of words in the file.
20 words were placed into the lexicon.
37 other words occurred in the file but were not placed into the lexicon.

Press ENTER to continue >:

  0: "Once"
  1: "upon"
  2: "a"
  3: "time,"
  4: "we"
  5: "had"
  6: "dog"
  7: "named"
  8: "Fido."
  9: "Fido"
 10: "was"
 11: "generally"
 12: "very"
 13: "nice"
 14: "dog,"
 15: "but"
 16: "he"
 17: "did"
 18: "not"
 19: "like"

Query Phase: (empty string to quit)

Enter word:>the
NO

Enter word:>dog
YES

Enter word:>
Goodbye.