For the purposes of this assignment, a lexicon is defined to be that collection of words appearing in some particular text file. We leave the term word undefined!
Given to you is the Java application LexiconApp, which builds a lexicon of the words appearing in the text file whose name is provided via the first "run argument" (i.e., args[0], where args is the formal parameter of the main() method). The second run argument (i.e., args[1]) specifies the fixed capacity of the lexicon (i.e., the maximum number of words that can be placed therein). The lexicon is represented by an array words[] whose elements are of type String.
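During its first phase, the program scans the input file and builds the lexicon. It then reports the number of word occurrences in the file, the number of words that were placed into the lexicon, and the number of other words that occurred in the file but were not placed into the lexicon.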
To clarify the distinction between a word and a word occurrence, in the sentence
    Among the moon, the stars, and the sun, I prefer the sun.
there are twelve word occurrences but only eight (distinct) words, because "the" occurs four times and "sun" occurs twice. A complete lexicon for this sentence would thus have eight entries, one for each of the eight distinct words that occur in it.
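The distinction can be checked with a small sketch. It is not part of LexiconApp: the class name WordCountDemo, the method count(), and the capacity of 100 are made up for illustration, and a "word" is assumed here to be a maximal run of letters, compared case-sensitively.

```java
class WordCountDemo {
    // Returns {number of word occurrences, number of distinct words}.
    // Assumption: a word is a maximal run of ASCII letters.
    static int[] count(String text) {
        String[] words = new String[100]; // hypothetical fixed capacity
        int numWords = 0;                 // distinct words stored so far
        int occurrences = 0;
        for (String token : text.split("[^A-Za-z]+")) {
            if (token.isEmpty()) continue; // possible if text starts with a non-letter
            occurrences++;
            // Linear search: is this word already stored?
            boolean found = false;
            for (int i = 0; i < numWords && !found; i++) {
                found = words[i].equals(token);
            }
            if (!found && numWords < words.length) {
                words[numWords++] = token;
            }
        }
        return new int[] { occurrences, numWords };
    }

    public static void main(String[] args) {
        int[] r = count("Among the moon, the stars, and the sun, I prefer the sun.");
        System.out.println(r[0] + " occurrences, " + r[1] + " distinct words");
        // prints: 12 occurrences, 8 distinct words
    }
}
```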
Next the program displays the words in the lexicon (by printing the values in the elements of words[] in which they are stored).
The program then enters a "query phase" during which (repeatedly) the user is prompted to enter a word, after which the program reports whether or not the entered word is in the lexicon.
A sample user/program dialog appears below.
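Notice that some entries in the lexicon include punctuation. For example, if
    Among the moon, the stars, and the (hot) sun, I prefer the sun.
were an excerpt from the file, among the words placed into the lexicon would be these: "moon,", "stars,", "(hot)", "sun,", and "sun.".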
That's because the program relies upon the next() method of a Scanner object to "retrieve" each occurrence of a word in the input file, but what that method actually retrieves is each maximal sequence of non-whitespace characters!
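The behavior of next() can be observed directly with a short sketch. It scans a string rather than a file, and the class name ScannerTokenDemo and the method tokens() are made up for illustration. It relies only on the fact that a java.util.Scanner uses whitespace as its default delimiter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerTokenDemo {
    // Collects every token that Scanner.next() yields for the given text.
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        Scanner in = new Scanner(text); // default delimiter: whitespace
        while (in.hasNext()) {
            result.add(in.next());
        }
        in.close();
        return result;
    }

    public static void main(String[] args) {
        String s = "Among the moon, the stars, and the (hot) sun, I prefer the sun.";
        for (String t : tokens(s)) {
            System.out.println("\"" + t + "\"");
        }
    }
}
```

Among the thirteen tokens printed are "(hot)", "sun," and "sun.", each with its punctuation attached, so the two occurrences of sun would be treated as two different words.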
Using an instance of the WordScanner class will not solve this problem entirely, but its next() method does a better job of filtering out non-word characters. (Making this change is very easy.)
You are encouraged to refactor the program to rectify this deficiency. This would certainly include introducing separate methods for carrying out various subtasks that the program performs, but it could also include introducing a new instance class, say Lexicon, an instance of which represents just that, a lexicon. All the details pertaining to how words are inserted into and retrieved from a lexicon would be housed there. (Of course, those details could involve the use of an array words[] in the same way as the application does now.) One can imagine that this class would have observer methods that answer questions such as "How many words are in the lexicon?" and "Is the word glorp in the lexicon?".
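As a starting point, such a class might look like the sketch below. It is only one possible design: the method names size(), isFull(), contains(), and insert() are hypothetical, and the linear search over words[] is an assumption about how the array would be used.

```java
class Lexicon {
    private final String[] words; // storage for the distinct words
    private int numWords;         // number of elements of words[] in use

    // Creates an empty lexicon with the given fixed capacity.
    Lexicon(int capacity) {
        words = new String[capacity];
        numWords = 0;
    }

    // Observer: how many words are in the lexicon?
    int size() { return numWords; }

    // Observer: has the lexicon reached its capacity?
    boolean isFull() { return numWords == words.length; }

    // Observer: is the given word in the lexicon? (linear search)
    boolean contains(String word) {
        for (int i = 0; i < numWords; i++) {
            if (words[i].equals(word)) return true;
        }
        return false;
    }

    // Inserts the word unless it is already present or the lexicon is full.
    // Returns true if and only if the word was actually inserted.
    boolean insert(String word) {
        if (contains(word) || isFull()) return false;
        words[numWords++] = word;
        return true;
    }
}

class LexiconDemo {
    public static void main(String[] args) {
        Lexicon lex = new Lexicon(2);
        lex.insert("dog");
        lex.insert("dog"); // duplicate, not inserted
        lex.insert("a");
        lex.insert("the"); // lexicon is full, not inserted
        System.out.println(lex.size() + " " + lex.contains("dog") + " " + lex.contains("the"));
        // prints: 2 true false
    }
}
```

With a class like this, the main() method of the application would only need to create a Lexicon, feed it words from the scanner, and ask it questions during the query phase.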
$ java LexiconApp fido.txt 20
Welcome to LexiconApp!
There are 64 occurrences of words in the file.
20 words were placed into the lexicon.
37 other words occurred in the file but were not placed into the lexicon.
Press ENTER to continue >:
0: "Once"
1: "upon"
2: "a"
3: "time,"
4: "we"
5: "had"
6: "dog"
7: "named"
8: "Fido."
9: "Fido"
10: "was"
11: "generally"
12: "very"
13: "nice"
14: "dog,"
15: "but"
16: "he"
17: "did"
18: "not"
19: "like"
Query Phase: (empty string to quit)
Enter word:>the
NO
Enter word:>dog
YES
Enter word:>
Goodbye.