CMPS 134
Java Programming Exercise: Comparing Strings for Equality

Here we consider the problem of determining whether or not two String objects are equal.

The java.lang.String class includes a method (namely, equals()) by which to test whether two instances of the String class are equal to each other, meaning that they represent the same sequence of characters.
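As a quick illustration of what equals() reports, here is a minimal sketch that applies it to a few pairs of Strings. The class name EqualsDemo and the helper sameChars are hypothetical names introduced only for this illustration.

```java
class EqualsDemo {

   // Hypothetical helper: simply delegates to the library method String.equals().
   static boolean sameChars(String s1, String s2) {
      return s1.equals(s2);
   }

   public static void main(String[] args) {
      System.out.println(sameChars("CATTLE", "CATTLE"));  // prints true  (same character sequence)
      System.out.println(sameChars("CATTLE", "CASTLE"));  // prints false (differ at position 2)
      System.out.println(sameChars("CAT", "CATTLE"));     // prints false (different lengths)
   }
}
```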

As an academic exercise, let's develop our own method to do the same job. Here is the specification:

Version 1
/* Returns true if the two specified Strings are equal, false otherwise
*/
public static boolean strEqual(String s1, String s2)

What, exactly, does it mean for two Strings to be equal? Well, in the first place it requires that their lengths be equal. If that condition is not met, we can immediately conclude that the strings are not equal. If, on the other hand, their lengths are equal, more work is necessary to arrive at an answer. This suggests a solution of the following form:

Version 2
/* Returns true if the two specified Strings are equal, false otherwise
*/
public static boolean strEqual(String s1, String s2) {

   boolean result;
   if (s1.length() != s2.length()) {
      result = false;
   }
   else {
      // s1 and s2 have same length, so do whatever is necessary to 
      // determine if they are equal and assign appropriate value to 'result'
   }
   return result;
}

Notice that the code above follows a template common in functional methods (i.e., methods that return a value): a local variable, often called result, is assigned the method's return value somewhere in the body, and the last line of the body returns that variable's value. Such a variable could reasonably be called the method's result-variable. (As written, Version 2 is only a skeleton: until the else-branch assigns a value to result, the Java compiler will reject the method on the grounds that result might not have been initialized.)

Using result as the name of a method's result-variable is appropriate, given that that name is suggestive of the variable's purpose, but often one can devise a name that is more descriptive of the variable's meaning, and hence is a better choice.

Now we focus on the case when the two Strings have the same length. As an example, consider the (unequal) Strings CATTLE and CASTLE. (Attach no significance to the fact that the examples used here make use of upper case letters; it is simply to make them stand out more.)

What makes them unequal?
Answer: They disagree in the 2nd position, where one has a T and the other has an S. (Recall that we number the positions in a String beginning with zero, not one.)

The Strings GARBAGE and BABBLES are unequal as well, as they disagree in several positions (0th, 2nd, 4th, 5th, and 6th).

The point is that if two Strings of equal length disagree in one or more positions, they are unequal. If they disagree nowhere, they are equal.
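The idea of "positions of disagreement" can be made concrete with a small sketch. It assumes the two Strings have equal length; the class DisagreementDemo and the helper disagreements are hypothetical names used only for illustration and are not part of the method being developed.

```java
import java.util.ArrayList;
import java.util.List;

class DisagreementDemo {

   // Hypothetical helper: returns the positions at which two Strings
   // of equal length hold different characters.
   static List<Integer> disagreements(String s1, String s2) {
      List<Integer> positions = new ArrayList<>();
      for (int i = 0; i != s1.length(); i = i+1) {
         if (s1.charAt(i) != s2.charAt(i)) {
            positions.add(i);
         }
      }
      return positions;
   }

   public static void main(String[] args) {
      System.out.println(disagreements("CATTLE", "CASTLE"));    // prints [2]
      System.out.println(disagreements("GARBAGE", "BABBLES"));  // prints [0, 2, 4, 5, 6]
      System.out.println(disagreements("LARK", "LARK"));        // prints []
   }
}
```

Two Strings of equal length are equal exactly when the list of disagreements is empty.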

Based upon this analysis, we design our method to have a loop that iterates through the positions of the two Strings, comparing the pair of characters occurring at each position. If and when a disagreement is observed, we "remember" it by setting the method's result-variable to false. If no disagreement is found, the loop will terminate having found the two Strings to be in agreement at every position.

Given this approach, it would seem reasonable to call the method's result-variable equalSoFar, suggesting that its value corresponds to the truth-value of the statement "no disagreement (between the two Strings) has been observed so far".

If no disagreement is ever found, false will never be assigned to equalSoFar, and so the value assigned to it originally will be returned. Ah, so that means that we need to initialize it to true. Here is our completed solution:

Version 3
/* Returns true if the two specified Strings are equal, false otherwise
*/
public static boolean strEqual(String s1, String s2) {

   boolean equalSoFar = true; 

   if (s1.length() != s2.length()) {
      equalSoFar = false;
   }
   else {
      for (int i=0; i != s1.length(); i = i+1) {
         if (s1.charAt(i) != s2.charAt(i)) {
            equalSoFar = false;
         }
      }
   }
   return equalSoFar;
}

Notice that the if statement inside the for-loop has no else part. A common mistake made by novice programmers when asked to develop this method is to include an else part and to put in it the assignment equalSoFar = true. Why is this a mistake? Answer: Because then, in the case that s1 and s2 have equal lengths, the final value of equalSoFar depends exclusively upon the last iteration of the loop, during which it will be set to either true or false according to whether or not the characters in the last position of s1 and s2 are the same. Hence, the result returned by the method will not answer the question of whether or not s1 and s2 are equal but rather whether or not their lengths are the same and their last characters are equal. (For example, the Strings LARK and BUNK would be reported to be equal, simply because they have the same length and end with the same character!)
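The mistake just described can be reproduced with the following sketch, which is Version 3 with the erroneous else part added. The names ElseMistakeDemo and strEqualBuggy are hypothetical and chosen only for this illustration.

```java
class ElseMistakeDemo {

   // Deliberately INCORRECT variant of Version 3: the else part resets
   // equalSoFar to true, "forgetting" any earlier disagreement.
   static boolean strEqualBuggy(String s1, String s2) {
      boolean equalSoFar = true;
      if (s1.length() != s2.length()) {
         equalSoFar = false;
      }
      else {
         for (int i = 0; i != s1.length(); i = i+1) {
            if (s1.charAt(i) != s2.charAt(i)) {
               equalSoFar = false;
            }
            else {
               equalSoFar = true;   // the mistake
            }
         }
      }
      return equalSoFar;
   }

   public static void main(String[] args) {
      System.out.println(strEqualBuggy("LARK", "BUNK"));  // prints true  (wrong answer)
      System.out.println("LARK".equals("BUNK"));          // prints false (right answer)
   }
}
```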

Ahh, but this bit of reasoning should make it clear to the reader, if it wasn't clear already, that equalSoFar should never have its value changed from false to true. Indeed, its value will become false only if some disagreement between s1 and s2 is observed, and changing its value back to true would be, in effect, "forgetting" that such an observation had been made, possibly leading to reporting a "false positive", as in the LARK vs. BUNK example just mentioned.

So if the value of equalSoFar is to remain false if ever it assumes that value (leading eventually to the method returning that value), then the method ought to cease its work as soon as such an event occurs! In other words, if a disagreement between s1 and s2 is observed in some particular position, there is no point in comparing the characters that occur in subsequent positions of the Strings and hence the loop should terminate without iterating again.

To effect this change, we augment the loop guard by adding equalSoFar as a new conjunct. (Note that the value of the expression equalSoFar is necessarily equal to the value of the expression equalSoFar == true, so there is no point in using the latter, which is unnecessarily verbose. Many novice programmers cannot seem to absorb this fact, however.) Here is the updated method, which differs from Version 3 only in the guard of the for-loop:

Version 4
/* Returns true if the two specified Strings are equal, false otherwise
*/
public static boolean strEqual(String s1, String s2) {

   boolean equalSoFar = true;

   if (s1.length() != s2.length()) { 
      equalSoFar = false;
   }
   else {
      for (int i=0; i != s1.length() && equalSoFar; i = i+1) {
         if (s1.charAt(i) != s2.charAt(i)) { 
            equalSoFar = false;
         }
      }
   }
   return equalSoFar;
}
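
To see what the stronger guard buys us, the sketch below instruments the loops of Versions 3 and 4 with a counter of character comparisons. It assumes the two Strings have the same length; the class and method names are hypothetical and introduced only for this measurement.

```java
class EarlyExitDemo {

   // Counts the character comparisons made by the Version 3 loop
   // (assumes s1 and s2 have the same length).
   static int comparisonsWithoutEarlyExit(String s1, String s2) {
      int count = 0;
      boolean equalSoFar = true;
      for (int i = 0; i != s1.length(); i = i+1) {
         count = count + 1;
         if (s1.charAt(i) != s2.charAt(i)) {
            equalSoFar = false;
         }
      }
      return count;
   }

   // Counts the character comparisons made by the Version 4 loop,
   // whose guard also requires equalSoFar to be true.
   static int comparisonsWithEarlyExit(String s1, String s2) {
      int count = 0;
      boolean equalSoFar = true;
      for (int i = 0; i != s1.length() && equalSoFar; i = i+1) {
         count = count + 1;
         if (s1.charAt(i) != s2.charAt(i)) {
            equalSoFar = false;
         }
      }
      return count;
   }

   public static void main(String[] args) {
      System.out.println(comparisonsWithoutEarlyExit("GARBAGE", "BABBLES"));  // prints 7
      System.out.println(comparisonsWithEarlyExit("GARBAGE", "BABBLES"));     // prints 1
      System.out.println(comparisonsWithEarlyExit("CATTLE", "CASTLE"));       // prints 3
   }
}
```

When the Strings are equal, both loops make the same number of comparisons, namely one per position.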

But what appears in Version 4 is an improper use of a for-loop, at least according to some "programming purists". Why? Because, they would say, a for-loop's guard should depend entirely upon the loop control variable's value in relation to the range of values over which it is intended to iterate. Here the guard depends not only upon that range but also upon the value of equalSoFar. Their view is that the more appropriate loop construct to use here is the while-loop. Making this change, we get

Version 5
/* Returns true if the two specified Strings are equal, false otherwise
*/
public static boolean strEqual(String s1, String s2) {

   boolean equalSoFar = true;

   if (s1.length() != s2.length()) { 
      equalSoFar = false;
   }
   else {
      int i = 0;
      while (i != s1.length() && equalSoFar) {
         if (s1.charAt(i) != s2.charAt(i)) { 
            equalSoFar = false;
         }
         i = i+1;
      }
   }
   return equalSoFar;
}

Observe that in Version 5 the loop will terminate either because equalSoFar has been set to false (because a disagreement has been found) or i's value has reached s1.length(), or both. By introducing an else-part and placing the increment of i into it, we make it impossible for both of these conditions to hold:

Version 6
/* Returns true if the two specified Strings are equal, false otherwise
*/
public static boolean strEqual(String s1, String s2) {

   boolean equalSoFar = true;

   if (s1.length() != s2.length()) { 
      equalSoFar = false;
   }
   else {
      int i = 0;
      while (i != s1.length() && equalSoFar) {
         if (s1.charAt(i) != s2.charAt(i)) 
            { equalSoFar = false; }
         else
            { i = i+1; }
      }
   }
   return equalSoFar;
}

The modification made to arrive at Version 6 is useful only in that it is going to help us transition more smoothly to a superior Version 7. Specifically, we observe that the loop in Version 6 will terminate with i == s1.length() if, and only if, s1 and s2 are equal. (After all, had a disagreement been found, the loop would have terminated with i indicating the position of that disagreement, somewhere in the range [0..s1.length()).)

This observation leads us to realize that we don't need to make use of equalSoFar inside the loop, nor do we need the if-statement there. Rather, all the loop needs to do is to keep incrementing i until either i "falls off" the ends of the Strings (by becoming equal to s1.length()) or a disagreement is found between the characters at position i of s1 and s2. When the loop terminates, we can determine which of those two events occurred by comparing i with s1.length() and assigning the result to equalSoFar. Based upon these observations, we refine the method to obtain Version 7:

Version 7
/* Returns true if the two specified Strings are equal, false otherwise
*/
public static boolean strEqual(String s1, String s2) {

   boolean equalSoFar = true;

   if (s1.length() != s2.length()) { 
      equalSoFar = false;
   }
   else {
      int i = 0;
      while (i != s1.length()  &&  s1.charAt(i) == s2.charAt(i)) {
         i = i+1;
      }
      //assertion: i == s1.length() iff s1 and s2 are equal
      equalSoFar = (i == s1.length());
   }
   return equalSoFar;
}

Novice programmers tend to be confused by an assignment such as equalSoFar = (i == s1.length()), but there is really nothing complicated about it. As with any assignment statement, the expression on the right-hand side is evaluated and the result is stored in the variable indicated on the left-hand side. Here the expression and the variable are of type boolean, that's all. Indeed, this statement is equivalent to the following (verbose in comparison) if-else statement:

if (i == s1.length())
   { equalSoFar = true; }
else
   { equalSoFar = false; }

Another point of concern in the code of Version 7 is the loop guard. Won't its evaluation in the case that i == s1.length() result in a StringIndexOutOfBoundsException being thrown as a result of trying to access the character at (nonexistent) position s1.length() of s1? Answer: No. Why not? Because in Java, conjunctions are evaluated in a "short circuit" fashion. That is, when a conjunction P && Q is evaluated, P is evaluated first and, if found to be false, Q is ignored. In other words, Q will be evaluated only after, and only if, P has been found to be true.
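
The following sketch shows short-circuit evaluation at work on the Version 7 loop guard, and also what happens if the two conjuncts are written in the opposite order. The class ShortCircuitDemo and the helper guard are hypothetical names used only for this demonstration.

```java
class ShortCircuitDemo {

   // Hypothetical helper: evaluates the Version 7 loop guard at position i.
   static boolean guard(String s1, String s2, int i) {
      return i != s1.length()  &&  s1.charAt(i) == s2.charAt(i);
   }

   public static void main(String[] args) {
      String s1 = "CAT";
      String s2 = "CAT";

      // Here i == s1.length(): the left conjunct is false,
      // so charAt(3) is never called and no exception occurs.
      System.out.println(guard(s1, s2, 3));   // prints false

      // With the conjuncts reversed, charAt(3) is evaluated first and throws.
      try {
         boolean b = s1.charAt(3) == s2.charAt(3)  &&  3 != s1.length();
         System.out.println(b);
      }
      catch (StringIndexOutOfBoundsException e) {
         System.out.println("exception: charAt(3) was evaluated");
      }
   }
}
```

The order of the conjuncts in the guard therefore matters: the range test on i must come first.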

Now, the convention of making use of a result-variable is just that, a convention and not a requirement. Some programmers would argue that the next version of our method, which is like Version 7 but with equalSoFar (the result-variable) removed, is an improvement.

Version 8
/* Returns true if the two specified Strings are equal, false otherwise
*/
public static boolean strEqual(String s1, String s2) {

   if (s1.length() != s2.length()) { 
      return false;
   }
   else {
      int i = 0;
      while (i != s1.length()  &&  s1.charAt(i) == s2.charAt(i)) {
         i = i+1;
      }
      //assertion: i == s1.length() iff s1 and s2 are equal
      return i == s1.length();
   }
}

There are some purists, however, who would criticize Version 8 for failing to adhere to the single entry – single exit doctrine of structured programming, which says that every "block" of code (which includes any loop-statement, if-statement, or method body) should have a single entry point (at the "top") and a single exit point (at the "bottom"). Version 8 of our method has two exit points corresponding to the pair of return statements.

An even more egregious violation of this doctrine can be observed in Version 9:

Version 9
/* Returns true if the two specified Strings are equal, false otherwise
*/
public static boolean strEqual(String s1, String s2) {

   if (s1.length() != s2.length()) { return false; }

   // If execution reaches here, the lengths of s1 and s2 are the same.
   for (int i=0; i != s1.length(); i = i+1)
   {
      if (s1.charAt(i) != s2.charAt(i)) { return false; }
   }
   // If execution reaches here, no disagreements were found.
   return true;
}

In this author's opinion, Version 8 is acceptable, but Version 9 goes too far in its failure to adhere to the structured programming doctrine.
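
Whichever version one prefers, it is easy to check it against the library method. The sketch below does so for Version 8, using the example Strings from this discussion plus the empty String. The class StrEqualCheck and the helper agreesWithEquals are hypothetical names, and a check on a handful of words is of course not a proof of correctness.

```java
class StrEqualCheck {

   // Version 8, copied from the text above.
   public static boolean strEqual(String s1, String s2) {
      if (s1.length() != s2.length()) {
         return false;
      }
      else {
         int i = 0;
         while (i != s1.length()  &&  s1.charAt(i) == s2.charAt(i)) {
            i = i+1;
         }
         return i == s1.length();
      }
   }

   // Hypothetical helper: true if strEqual agrees with String.equals()
   // on every ordered pair drawn from the given words.
   static boolean agreesWithEquals(String[] words) {
      boolean agreesSoFar = true;
      for (String a : words) {
         for (String b : words) {
            if (strEqual(a, b) != a.equals(b)) {
               agreesSoFar = false;
            }
         }
      }
      return agreesSoFar;
   }

   public static void main(String[] args) {
      String[] words = { "", "CAT", "CATTLE", "CASTLE", "LARK", "BUNK", "GARBAGE", "BABBLES" };
      System.out.println(agreesWithEquals(words));   // prints true
   }
}
```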


Appendix

Here we consider whether or not it is really necessary to treat the case in which the lengths of s1 and s2 are different separately from the case in which their lengths are equal. How would our Version 8 method behave if, say, we got rid of the if-statement, keeping only the while-loop nested in its else-branch? The method would be as follows:

Version 8'
WARNING: This solution is incorrect
/* Returns true if the two specified Strings are equal, false otherwise
*/
public static boolean strEqual(String s1, String s2) {

   int i = 0;
   while (i != s1.length()  &&  s1.charAt(i) == s2.charAt(i)) {
      i = i+1;
   }
   //assertion: i == s1.length() iff s1 and s2 are equal
   return i == s1.length();
}

Consider what would happen if the caller passed the Strings CAT and CATTLE, respectively, to the Version 8' method. After three loop iterations, the loop would terminate having found no disagreements, due to the fact that CAT (the value of s1) is a prefix of CATTLE (the value of s2). Then the method would return the value true, incorrectly informing the caller that CAT and CATTLE are equal! In effect, the method would yield a "false positive".

Now consider what would happen if the caller passed CATTLE and CAT to the method, respectively. After the first three iterations had found no disagreements in positions 0..2, the fourth evaluation of the loop guard would attempt to compare the characters at position 3 of the two Strings. But there is no position 3 in CAT, and so an exception would be thrown, causing the program to abort (unless the exception were caught). The general rule is this: if s is of type String, the method call s.charAt(E) will result in an IndexOutOfBoundsException being thrown if E's value is not in the range [0..s.length()). This makes perfect sense, as the only positions that exist within a String are the ones in that range.
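
Both failure modes of Version 8' can be observed with the sketch below. The method is renamed strEqualNoLengthCheck to stress that it is the incorrect variant, and the helper outcome, which reports either the returned value or the fact that an exception was thrown, is a hypothetical addition made only for this demonstration.

```java
class Version8PrimeDemo {

   // Version 8' from the text above (INCORRECT: no length check).
   static boolean strEqualNoLengthCheck(String s1, String s2) {
      int i = 0;
      while (i != s1.length()  &&  s1.charAt(i) == s2.charAt(i)) {
         i = i+1;
      }
      return i == s1.length();
   }

   // Hypothetical helper: describes the outcome of a call as text.
   static String outcome(String s1, String s2) {
      try {
         return String.valueOf(strEqualNoLengthCheck(s1, s2));
      }
      catch (IndexOutOfBoundsException e) {
         return "exception";
      }
   }

   public static void main(String[] args) {
      System.out.println(outcome("CAT", "CATTLE"));   // prints true (a false positive)
      System.out.println(outcome("CATTLE", "CAT"));   // prints exception
   }
}
```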

Our conclusion is that yes, the case in which the lengths of s1 and s2 are different needs to be handled separately from the other case.