CMPS 134
Java Programming Exercise: Comparing Strings

Here we consider the problem of determining the relationship between two String objects s1 and s2: exactly one among s1 < s2, s1 = s2, and s1 > s2 must hold.

We are adopting as our definition of less-than the usual one, as is used in ordering the words in a dictionary. (This is called lexicographic ordering, of which alphabetical ordering is, more or less, a special case.) Specifically, if s = x1x2...xm and t = y1y2...yn are strings (where each xi and yj is a single character), then s < t if and only if either

  1. s is a proper prefix of t or
  2. xk < yk, where k is the first position at which s and t disagree (which is to say that xi = yi for all i in the range [1..k)).

Of course, the above definition assumes that we already have < defined on individual characters. For this, we adopt Java's ordering on the char data type.
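To make Java's ordering on char concrete, here is a minimal sketch. The class name CharOrderDemo and the method name precedes are ours, chosen only for illustration. The relational operators compare char values by their numeric codes, so every uppercase English letter precedes every lowercase one. That is one reason lexicographic ordering is only "more or less" alphabetical ordering.

```java
// Hypothetical demo class; not part of the exercise itself.
class CharOrderDemo {

   // Returns true if c1 precedes c2 in Java's ordering on char.
   public static boolean precedes(char c1, char c2) {
      return c1 < c2;
   }

   public static void main(String[] args) {
      System.out.println(precedes('a', 'b'));   // true
      System.out.println(precedes('Z', 'a'));   // true: 'Z' is 90, 'a' is 97
      System.out.println(precedes('a', 'Z'));   // false
      System.out.println((int) 'Z' + " " + (int) 'a');  // 90 97
   }
}
```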

Our task is to develop a Java method that, given two String objects via parameters, returns a value to the caller indicating which one of the three relationships —less-than, equal-to, or greater-than— holds between them.

Version 1
/* Returns a value indicating which one of the following is true:
* s1 < s2, s1 = s2, or s1 > s2.
*/
public static ?? strCompare(String s1, String s2)

What should the return type of the method be? The boolean data type has only two values, so it isn't viable. A more appropriate data type would be one having exactly three members in its "universe of values", perhaps described by the literals LESS_THAN, EQUAL_TO, and GREATER_THAN. But there is nothing like this among Java's primitive types nor is there a class (as far as I know) in Java's standard library defining such a data type. To its credit, Java does have a feature (called enumeration) by which a programmer can create such a data type. But we will not pursue that possibility here.
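For the curious, an enumeration of the kind just mentioned might look something like the following. This is only a sketch: the names Ordering, OrderingDemo, and compareChars are hypothetical, and the rest of this exercise does not use them.

```java
// Hypothetical demo class; the enumeration's literals mirror those suggested in the text.
class OrderingDemo {

   // A data type having exactly three values.
   enum Ordering { LESS_THAN, EQUAL_TO, GREATER_THAN }

   // Classifies the relationship between two char values.
   public static Ordering compareChars(char c1, char c2) {
      if (c1 < c2) { return Ordering.LESS_THAN; }
      else if (c1 > c2) { return Ordering.GREATER_THAN; }
      else { return Ordering.EQUAL_TO; }
   }

   public static void main(String[] args) {
      System.out.println(compareChars('r', 's'));  // LESS_THAN
      System.out.println(compareChars('s', 's'));  // EQUAL_TO
      System.out.println(compareChars('s', 'r'));  // GREATER_THAN
   }
}
```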

What about the char data type? We could, for example, associate the char values 'L', 'E', and 'G', respectively, with the outcomes less-than, equal-to, and greater-than. That would be reasonable. Or we could use the Strings "Less Than", "Equal To", and "Greater Than".

But the convention that has been adopted by Java programmers is to use values of type int. Specifically, a negative integer value indicates that the relationship is less-than, zero indicates equal-to, and a positive value indicates greater-than. Following suit, we refine our method's specification to the following:
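The compareTo() method of Java's String class follows this same convention, so it can serve as a stand-in to show how a caller interprets such a result. In the sketch below, the class name SignConventionDemo and the helper describe are hypothetical. Note that the helper inspects only the sign of its argument.

```java
// Hypothetical demo class illustrating the negative/zero/positive convention.
class SignConventionDemo {

   // Translates a comparison result into words by inspecting only its sign.
   public static String describe(int cmp) {
      if (cmp < 0) { return "less-than"; }
      else if (cmp > 0) { return "greater-than"; }
      else { return "equal-to"; }
   }

   public static void main(String[] args) {
      // String.compareTo is used here only as a stand-in for our own method.
      System.out.println(describe("gas".compareTo("gaseous")));   // less-than
      System.out.println(describe("gas".compareTo("gas")));       // equal-to
      System.out.println(describe("gaseous".compareTo("gas")));   // greater-than
   }
}
```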

Version 2
/* Returns a negative value if s1 < s2, zero if s1 = s2, and
*  a positive value if s1 > s2.
*/
public static int strCompare(String s1, String s2)

So now we have to work out the logic and express it in the method's body. The first thing that we recognize is that, unlike in the strEqual() method, here it will not be very helpful to initially distinguish between the cases when the two given Strings have the same length and when they don't. That's because a difference in length tells us that the two Strings are not equal, but it doesn't tell us which one is the lesser of the two.

Hence, we recognize that the correct approach is to use a loop to scan the two Strings character by character (starting at position zero) until either we reach the end of the shorter String or we find a position at which the two Strings disagree in the characters that occur there.

It will be important to be able to ascertain, when the loop terminates, which condition caused the termination. If the loop terminated because a position was found at which the two Strings disagree, then one can compare the characters at that position to determine which one is less than the other. The String containing the lesser of the two characters at that position is the lesser String, of course. As an example, consider "gargantuan" vs. "gaseous". The first difference is at position two, where the characters are 'r' vs. 's'. Because 'r' < 's', we have that "gargantuan" < "gaseous".

If, on the other hand, the loop terminated because the end of the shorter String was reached without a disagreement having been found (as would be the case if "gaseous" were compared to "gas"), then it must be either that the two Strings are equal or that one is a proper prefix of the other.

Following this logic, we arrive at the following code:

Version 3
/* Returns a negative value if s1 < s2, zero if s1 = s2, and
* a positive value if s1 > s2.
*/
public static int strCompare(String s1, String s2) {

   int result;
   boolean equalSoFar = true;
   final int SHORTER_LEN = Math.min(s1.length(), s2.length());

   int i = 0;
   while (i != SHORTER_LEN  &&  equalSoFar) {
      if (s1.charAt(i) == s2.charAt(i)) {
         i = i+1;
      }
      else {
         equalSoFar = false;  // mismatching chars found at position i
      }
   }
   // At this point in execution (the loop having just terminated), 
   // exactly one among i == SHORTER_LEN and !equalSoFar must be true, 
   // which is to say that i == SHORTER_LEN and equalSoFar must be equal
   // to each other (i.e., either both true or both false).
   assert (i == SHORTER_LEN) == equalSoFar;

   if (equalSoFar) {  // no mismatches exist
      final int LENGTH_DIFF = s1.length() - s2.length();
      if (LENGTH_DIFF < 0) { result = -1; }     // s1 is shorter, so s1 < s2
      else if (LENGTH_DIFF > 0) { result = 1; } // s1 is longer, so s1 > s2
      else { result = 0; }                      // lengths are equal, so s1 = s2
   } 
   else {  // first mismatch is at position i 
      if (s1.charAt(i) < s2.charAt(i)) { result = -1; }
      else { result = 1; }
   }
   return result;
}

The above was written as though the method were obligated to return either -1, 0, or 1 to the caller. But that is not so. Any negative value suffices in place of -1 and any positive value in place of 1. With this freedom, we can simplify the if-else statement following the loop:

Version 4
/* Returns a negative value if s1 < s2, zero if s1 = s2, and
* a positive value if s1 > s2.
*/
public static int strCompare(String s1, String s2) {

   int result;
   boolean equalSoFar = true;
   final int SHORTER_LEN = Math.min(s1.length(), s2.length());
   int i = 0;
   while (i != SHORTER_LEN  &&  equalSoFar) {
      if (s1.charAt(i) == s2.charAt(i)) {
         i = i+1;
      }
      else {
         equalSoFar = false;  // mismatching chars found at position i
      }
   }
   // At this point in execution (the loop having just terminated), 
   // exactly one among i == SHORTER_LEN and !equalSoFar must be true, 
   // which is to say that i == SHORTER_LEN and equalSoFar must be equal
   // to each other (i.e., either both true or both false).
   assert (i == SHORTER_LEN) == equalSoFar;

   if (equalSoFar) {   // no mismatches exist
      result = s1.length() - s2.length();
   }
   else {  // first mismatch is at position i 
      result = s1.charAt(i) - s2.charAt(i);
   }
   return result;
}
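Because this version may return any negative or any positive value, a caller should test only the sign of the result rather than compare it with -1 or 1. The following sketch illustrates the point. The class name SignOnlyDemo is hypothetical, and the method is a condensed copy of Version 4, repeated only so that the example is self-contained.

```java
// Hypothetical demo class; strCompare is a condensed copy of Version 4.
class SignOnlyDemo {

   public static int strCompare(String s1, String s2) {
      boolean equalSoFar = true;
      final int SHORTER_LEN = Math.min(s1.length(), s2.length());
      int i = 0;
      while (i != SHORTER_LEN  &&  equalSoFar) {
         if (s1.charAt(i) == s2.charAt(i)) { i = i+1; }
         else { equalSoFar = false; }
      }
      if (equalSoFar) { return s1.length() - s2.length(); }
      else { return s1.charAt(i) - s2.charAt(i); }
   }

   public static void main(String[] args) {
      int r = strCompare("gas", "gaseous");
      System.out.println(r);                  // -4 (the difference in lengths)
      System.out.println(r == -1);            // false: a brittle test
      System.out.println(r < 0);              // true: the appropriate test
      System.out.println(Integer.signum(r));  // -1, if a normalized value is wanted
   }
}
```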

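In case you are confused by the expression s1.charAt(i) - s2.charAt(i), note that, in Java, certain arithmetic operations can be applied to values of type char. Among these is to compute the difference between two char values. If c1 and c2 are of type char, then each is promoted to type int before the subtraction is carried out, so that c1 - c2 is a value of type int. That value will be negative precisely if c1 < c2, zero precisely if c1 = c2, and positive precisely if c1 > c2, just as with numbers.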
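A small sketch of subtraction applied to char values follows. The class name CharDiffDemo and the method name diff are hypothetical, introduced only for this illustration.

```java
// Hypothetical demo class showing that subtracting char values yields an int.
class CharDiffDemo {

   // Returns c1 - c2; the result is of type int, which is why the return type is int.
   public static int diff(char c1, char c2) {
      return c1 - c2;
   }

   public static void main(String[] args) {
      System.out.println(diff('r', 's'));  // -1, because 'r' < 's'
      System.out.println(diff('s', 'r'));  // 1
      System.out.println(diff('x', 'x'));  // 0
      System.out.println(diff('a', 'A'));  // 32, because 'a' is 97 and 'A' is 65
   }
}
```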

Next we observe that we don't really need the equalSoFar variable. Why not? Well, regarding the code following the loop, we can replace its lone occurrence by the boolean expression i == SHORTER_LEN. After all, the value of this expression necessarily will be equal to that of equalSoFar at that point in time, as asserted in Version 4. As for the loop itself, now its only purpose is to advance the value of i until either i points to the position at which a mismatch occurs or, absent the existence of any mismatches, i reaches the length of the shorter String. As illustrated in Version 8 of the strEqual() method, this can be accomplished using a while-loop in which the body is nothing but an increment of i and the loop guard is a conjunction, the second conjunct of which is a comparison between the characters at position i of the two Strings. We get:

Version 5
/* Returns a negative value if s1 < s2, zero if s1 = s2, and
* a positive value if s1 > s2.
*/
public static int strCompare(String s1, String s2)
{
   int result;
   final int SHORTER_LEN = Math.min(s1.length(), s2.length());
   int i = 0;
   while (i != SHORTER_LEN  &&  s1.charAt(i) == s2.charAt(i)) {
      i = i+1;
   }
   // At this point in execution (the loop having just terminated), exactly one among
   // i == SHORTER_LEN and (i < SHORTER_LEN && s1.charAt(i) != s2.charAt(i)) is true.
   // If the former is true, either the two strings are equal or one is a proper prefix
   // of the other.  If the latter is true, i is the lowest-numbered position at which
   // the two strings fail to match.
   assert (i == SHORTER_LEN) != (i < SHORTER_LEN && s1.charAt(i) != s2.charAt(i));

   if (i == SHORTER_LEN) {  // no mismatches exist
      result = s1.length() - s2.length();
   }
   else {  // first mismatch is at position i 
      result = s1.charAt(i) - s2.charAt(i);
   }
   return result;
}

Although the loop is a bit more concise in Version 5, the post-loop reasoning (as expressed in the assertion) is a bit more complicated.
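One way to gain some confidence in Version 5 is to compare its results against those of the compareTo() method of Java's String class, which also orders strings lexicographically. The sketch below does so for a handful of sample strings, comparing only the signs of the two results. The class name StrCompareCheck, the helper agrees, and the choice of sample strings are ours. Agreement on a few examples is evidence of correctness, of course, not proof.

```java
// Hypothetical test harness; strCompare is Version 5, repeated so that
// the example is self-contained.
class StrCompareCheck {

   public static int strCompare(String s1, String s2) {
      final int SHORTER_LEN = Math.min(s1.length(), s2.length());
      int i = 0;
      while (i != SHORTER_LEN  &&  s1.charAt(i) == s2.charAt(i)) {
         i = i+1;
      }
      if (i == SHORTER_LEN) { return s1.length() - s2.length(); }
      else { return s1.charAt(i) - s2.charAt(i); }
   }

   // Returns true if strCompare and String.compareTo agree in sign on (s1, s2).
   public static boolean agrees(String s1, String s2) {
      return Integer.signum(strCompare(s1, s2)) == Integer.signum(s1.compareTo(s2));
   }

   public static void main(String[] args) {
      String[] words = { "", "gas", "gaseous", "gargantuan", "Gas" };
      boolean allAgree = true;
      for (String a : words) {
         for (String b : words) {
            if (!agrees(a, b)) {
               allAgree = false;
               System.out.println("Disagreement on \"" + a + "\" vs. \"" + b + "\"");
            }
         }
      }
      System.out.println(allAgree ? "All pairs agree." : "Some pairs disagree.");
   }
}
```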