Java Programming Exercise: Comparing Strings

Here we consider the problem of determining the relationship between
two `String` objects `s1` and `s2`: exactly
one among `s1 < s2`, `s1 = s2`, and `s1 > s2`
must hold.

We adopt the usual definition of *less-than*, the one used to order
the words in a dictionary.
(This is called **lexicographic** ordering, of which
**alphabetical ordering** is, more or less, a special case.)
Specifically, if $s = x_1 x_2 \ldots x_m$ and $t = y_1 y_2 \ldots y_n$, then $s < t$ if either

- `s` is a proper prefix of `t`, or
- $x_k < y_k$, where $k$ is the first position at which `s` and `t` disagree (which is to say that $x_i = y_i$ for all $i$ in the range $1..k-1$).
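To make the definition concrete, here is a minimal sketch that transcribes it directly into a recursive Java method. The class name `LexDefinition` and the method name `lessThan` are hypothetical, and the method is meant to mirror the definition rather than to be efficient (every recursive call builds new substrings). It relies on Java's ordering of `char` values for the character comparison.

```java
// Hypothetical illustration: a direct, recursive transcription of the
// lexicographic less-than definition. Not efficient (substring copies).
class LexDefinition {

    /** Returns true if and only if s < t lexicographically. */
    static boolean lessThan(String s, String t) {
        if (s.isEmpty()) {
            // The empty string is a proper prefix of every non-empty string.
            return !t.isEmpty();
        }
        if (t.isEmpty()) {
            // A non-empty s is not a proper prefix of the empty string, and
            // there is no position at which to compare characters.
            return false;
        }
        if (s.charAt(0) != t.charAt(0)) {
            // First disagreement found: compare the characters there.
            return s.charAt(0) < t.charAt(0);
        }
        // First characters agree, so the remainders of the strings decide.
        return lessThan(s.substring(1), t.substring(1));
    }

    public static void main(String[] args) {
        System.out.println(lessThan("car", "cart"));            // true (proper prefix)
        System.out.println(lessThan("gargantuan", "gaseous"));  // true ('r' < 's')
        System.out.println(lessThan("cat", "cat"));             // false (equal)
        System.out.println(lessThan("dog", "cat"));             // false ('d' > 'c')
    }
}
```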

Of course, the above definition assumes that we already have
`<` defined on individual characters. For this, we adopt
Java's ordering on the `char` data type.
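Java orders `char` values by their numeric (UTF-16 code unit) values. One consequence, shown in the small sketch below, is why lexicographic ordering is only "more or less" alphabetical: every uppercase ASCII letter precedes every lowercase one. The class name is illustrative, and the last line uses Java's built-in `String.compareTo` as a reference.

```java
class CharOrderDemo {
    public static void main(String[] args) {
        // Numeric codes of a few letters.
        System.out.println((int) 'A');  // 65
        System.out.println((int) 'Z');  // 90
        System.out.println((int) 'a');  // 97
        // So 'Z' (90) precedes 'a' (97)...
        System.out.println('Z' < 'a');  // true
        // ...and, lexicographically, "Zebra" comes before "apple".
        System.out.println("Zebra".compareTo("apple") < 0);  // true
    }
}
```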

Our task is to develop a Java method that, given two
`String` objects via parameters, returns a value to the caller
indicating which one of the three relationships (*less-than*,
*equal-to*, or *greater-than*) holds between them.

```java
/* Returns a value indicating which one of the following is true:
 * s1 < s2, s1 = s2, or s1 > s2.
 */
public static ?? strCompare(String s1, String s2)
```

What should the return type of the method be?
The `boolean` data type has only two values, so it isn't viable.
A more appropriate data type would be one having exactly three members
in its "universe of values", perhaps described by the literals
`LESS_THAN`, `EQUAL_TO`, and `GREATER_THAN`.
But there is nothing like this among Java's primitive types, nor is there
a class (as far as I know) in Java's standard library defining such a
data type.
To its credit, Java *does* have a feature (the **enum**, or enumerated,
type) by which a programmer can create
such a data type. But we will not pursue that possibility here.
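Although we won't use it, a minimal sketch of such a type might look like the following. The names `Relation` and `relate` are hypothetical, and `relate` simply delegates to Java's built-in `String.compareTo` to obtain its answer.

```java
// Hypothetical three-valued type; the names are illustrative only.
enum Relation { LESS_THAN, EQUAL_TO, GREATER_THAN }

class RelationDemo {
    // Classifies s1 relative to s2 using the sign of String.compareTo.
    static Relation relate(String s1, String s2) {
        int c = s1.compareTo(s2);
        if (c < 0) return Relation.LESS_THAN;
        if (c > 0) return Relation.GREATER_THAN;
        return Relation.EQUAL_TO;
    }

    public static void main(String[] args) {
        System.out.println(relate("apple", "banana"));  // LESS_THAN
        System.out.println(relate("pear", "pear"));     // EQUAL_TO
        System.out.println(relate("plum", "peach"));    // GREATER_THAN
    }
}
```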

What about the `char` data type? We could, for example,
associate the `char` value `'L'` with *less-than* and choose two other
characters to stand for *equal-to* and *greater-than*.

But the convention that has been adopted by Java programmers is
to use values of type `int`.
Specifically, a negative integer value indicates that the
relationship is *less-than*, zero indicates *equal-to*,
and a positive value indicates *greater-than*.
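Java's own `String.compareTo` follows this convention, as do library routines such as `Arrays.sort` that accept a comparator. The sketch below (class name illustrative) normalizes results with `Integer.signum` to emphasize that only the sign carries meaning.

```java
import java.util.Arrays;

class SignConventionDemo {
    public static void main(String[] args) {
        // Only the sign of compareTo's result matters; signum maps it to -1, 0, or 1.
        System.out.println(Integer.signum("apple".compareTo("banana")));  // -1
        System.out.println(Integer.signum("kiwi".compareTo("kiwi")));     // 0
        System.out.println(Integer.signum("plum".compareTo("peach")));    // 1

        // Arrays.sort interprets a comparator's int result the same way.
        String[] words = {"gaseous", "car", "gargantuan", "cart"};
        Arrays.sort(words, (a, b) -> a.compareTo(b));
        System.out.println(Arrays.toString(words));  // [car, cart, gargantuan, gaseous]
    }
}
```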
Following suit, we refine our method's specification to the following:

```java
/* Returns a negative value if s1 < s2, zero if s1 = s2, and
 * a positive value if s1 > s2.
 */
public static int strCompare(String s1, String s2)
```

So now we have to work out the logic and express it in the method's
body. The first thing that we recognize is that, unlike in the
`strEqual()` method,
here it will not be very helpful to initially distinguish between
the cases when the two given Strings have the same length and when
they don't. That's because a difference in length tells us
that the two Strings are not equal, but it doesn't tell us
which one is the lesser of the two.
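Two pairs make the point, using Java's built-in `String.compareTo` as a reference (the class name is illustrative). In each pair the first String is the shorter one, yet the outcomes differ.

```java
class LengthIsNotEnough {
    public static void main(String[] args) {
        // "ab" is a proper prefix of "abc", so the shorter String is lesser.
        System.out.println("ab".compareTo("abc") < 0);  // true
        // "b" vs "abc": 'b' > 'a' at position 0, so the shorter String is greater.
        System.out.println("b".compareTo("abc") > 0);   // true
    }
}
```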

Hence, we recognize that the correct approach is to use a loop to scan the two Strings character by character (starting at position zero) until either we reach the end of the shorter String or we find a position at which the two Strings disagree in the characters that occur there.

It will be important to be able to ascertain, when the loop terminates, which condition caused the termination. If the loop terminated because a position was found at which the two Strings disagree, then one can compare the characters at that position to determine which one is less than the other. The String containing the lesser of the two characters at that position is the lesser String, of course. As an example, consider "gargantuan" vs. "gaseous". The first difference is at position two, where the characters are 'r' vs. 's'. Because 'r' < 's', we have that "gargantuan" < "gaseous".

If, on the other hand, the loop terminated because the end of the shorter String was reached without a disagreement having been found, then it must be either that the two Strings are equal or that one is a proper prefix of the other.
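The scan described in the last two paragraphs can be sketched on its own as a hypothetical helper `firstMismatch`, which returns where the loop stops: at the first position of disagreement, or at the length of the shorter String if there is none. The name and class are illustrative only.

```java
class FirstMismatchDemo {
    // Hypothetical helper: first position at which s and t disagree, or the
    // length of the shorter String if they agree up to that point.
    static int firstMismatch(String s, String t) {
        int shorterLen = Math.min(s.length(), t.length());
        int i = 0;
        while (i < shorterLen && s.charAt(i) == t.charAt(i)) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        int k = firstMismatch("gargantuan", "gaseous");
        System.out.println(k);  // 2
        System.out.println("gargantuan".charAt(k) + " vs " + "gaseous".charAt(k));  // r vs s
        // No disagreement: the loop stops at the shorter length ("car" is a prefix).
        System.out.println(firstMismatch("car", "cart"));  // 3
    }
}
```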

Following this logic, we arrive at the following code:

```java
/* Returns a negative value if s1 < s2, zero if s1 = s2, and
 * a positive value if s1 > s2.
 */
public static int strCompare(String s1, String s2) {
    int result;
    boolean equalSoFar = true;
    final int SHORTER_LEN = Math.min(s1.length(), s2.length());
    int i = 0;
    while (i != SHORTER_LEN && equalSoFar) {
        if (s1.charAt(i) == s2.charAt(i)) {
            i = i+1;
        }
        else {
            equalSoFar = false;  // mismatching chars found at position i
        }
    }
    // At this point in execution (the loop having just terminated),
    // exactly one among i == SHORTER_LEN and !equalSoFar must be true,
    // which is to say that i == SHORTER_LEN and equalSoFar must be equal
    // to each other (i.e., either both true or both false).
    assert (i == SHORTER_LEN) == equalSoFar;

    if (equalSoFar) {  // no mismatches exist
        final int LENGTH_DIFF = s1.length() - s2.length();
        if (LENGTH_DIFF < 0)      { result = -1; }  // s1 is shorter, so s1 < s2
        else if (LENGTH_DIFF > 0) { result = 1; }   // s1 is longer, so s1 > s2
        else                      { result = 0; }   // lengths are equal, so s1 = s2
    }
    else {  // first mismatch is at position i
        if (s1.charAt(i) < s2.charAt(i)) { result = -1; }
        else                             { result = 1; }
    }
    return result;
}
```

The above was written as though the method were obligated to return one of
-1, 0, or 1 to the caller. But that is not so: any negative value
suffices in place of -1, and any positive value in place of 1.
With this freedom, we can simplify the `if-else` statement
following the loop:

```java
/* Returns a negative value if s1 < s2, zero if s1 = s2, and
 * a positive value if s1 > s2.
 */
public static int strCompare(String s1, String s2) {
    int result;
    boolean equalSoFar = true;
    final int SHORTER_LEN = Math.min(s1.length(), s2.length());
    int i = 0;
    while (i != SHORTER_LEN && equalSoFar) {
        if (s1.charAt(i) == s2.charAt(i)) {
            i = i+1;
        }
        else {
            equalSoFar = false;  // mismatching chars found at position i
        }
    }
    // At this point in execution (the loop having just terminated),
    // exactly one among i == SHORTER_LEN and !equalSoFar must be true,
    // which is to say that i == SHORTER_LEN and equalSoFar must be equal
    // to each other (i.e., either both true or both false).
    assert (i == SHORTER_LEN) == equalSoFar;

    if (equalSoFar) {  // no mismatches exist
        result = s1.length() - s2.length();
    }
    else {  // first mismatch is at position i
        result = s1.charAt(i) - s2.charAt(i);
    }
    return result;
}
```
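Java's `String.compareTo` is documented to return exactly `this.charAt(k) - anotherString.charAt(k)` at the first differing index `k`, or `this.length() - anotherString.length()` when no such index exists. So this version should agree with it value for value, not merely in sign. A small check, with the method copied into a hypothetical class `StrCompareCheck` so the file compiles on its own:

```java
class StrCompareCheck {
    // Copy of the version above: length difference or first char difference.
    public static int strCompare(String s1, String s2) {
        int result;
        boolean equalSoFar = true;
        final int SHORTER_LEN = Math.min(s1.length(), s2.length());
        int i = 0;
        while (i != SHORTER_LEN && equalSoFar) {
            if (s1.charAt(i) == s2.charAt(i)) { i = i + 1; }
            else { equalSoFar = false; }  // mismatching chars found at position i
        }
        assert (i == SHORTER_LEN) == equalSoFar;
        if (equalSoFar) { result = s1.length() - s2.length(); }  // no mismatches exist
        else { result = s1.charAt(i) - s2.charAt(i); }           // first mismatch at i
        return result;
    }

    public static void main(String[] args) {
        String[][] pairs = {
            {"gargantuan", "gaseous"}, {"car", "cart"}, {"cart", "car"},
            {"", ""}, {"", "a"}, {"Zebra", "apple"}, {"same", "same"}
        };
        for (String[] p : pairs) {
            int mine = strCompare(p[0], p[1]);
            int lib = p[0].compareTo(p[1]);
            System.out.println("\"" + p[0] + "\" vs \"" + p[1] + "\": " + mine
                    + (mine == lib ? " (same as compareTo)" : " (DIFFERS from compareTo)"));
        }
    }
}
```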

In case you are confused by the expression
`s1.charAt(i) - s2.charAt(i)`, note that, in Java,
arithmetic operators can be applied to values of type `char`:
both operands are first promoted to `int`, so the result is the
difference between the two characters' numeric codes. Hence, if `c1` and `c2`
are of type `char`, then `c1 - c2` is negative precisely if `c1 < c2`,
zero precisely if `c1 == c2`, and positive precisely if `c1 > c2`, just as with numbers.
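A quick sketch of this `char` arithmetic (class name illustrative). Because `char` values range from 0 to 65535, the difference of two of them always fits in an `int`, so the subtraction used in `strCompare` cannot overflow.

```java
class CharArithmeticDemo {
    public static void main(String[] args) {
        char c1 = 'r', c2 = 's';
        // Both operands are promoted to int before subtracting, so diff is an int.
        int diff = c1 - c2;
        System.out.println(diff);       // -1  ('r' < 's')
        System.out.println('s' - 'r');  // 1   ('s' > 'r')
        System.out.println('a' - 'A');  // 32  (97 - 65)
        System.out.println('g' - 'g');  // 0   (equal)
    }
}
```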

Next we observe that we don't really need the `equalSoFar` variable.
Why not? Well, regarding the code following the loop, we can replace its
lone occurrence by the boolean expression `i == SHORTER_LEN`.
After all, the value of this expression necessarily will be equal to that of
`equalSoFar` at that point in time, as asserted in the previous version.
As for the loop itself, now its only purpose is to advance the value of
`i` until either `i` points to the position at which a mismatch
occurs or, absent the existence of any mismatches, `i` reaches the
length of the shorter String. As illustrated in Version 8 of the
`strEqual()` method, this can
be accomplished using a while-loop in which the body is nothing but an
increment of `i` and the loop guard is a conjunction, the second
conjunct of which is a comparison between the characters at position `i`
of the two Strings. We get:

```java
/* Returns a negative value if s1 < s2, zero if s1 = s2, and
 * a positive value if s1 > s2.
 */
public static int strCompare(String s1, String s2) {
    int result;
    final int SHORTER_LEN = Math.min(s1.length(), s2.length());
    int i = 0;
    while (i != SHORTER_LEN && s1.charAt(i) == s2.charAt(i)) {
        i = i+1;
    }
    // At this point in execution (the loop having just terminated), exactly one among
    // i == SHORTER_LEN and (i < SHORTER_LEN && s1.charAt(i) != s2.charAt(i)) is true.
    // If the former is true, either the two strings are equal or one is a proper prefix
    // of the other. If the latter is true, i is the lowest-numbered position at which
    // the two strings fail to match.
    assert (i == SHORTER_LEN) != (i < SHORTER_LEN && s1.charAt(i) != s2.charAt(i));

    if (i == SHORTER_LEN) {  // no mismatches exist
        result = s1.length() - s2.length();
    }
    else {  // first mismatch is at position i
        result = s1.charAt(i) - s2.charAt(i);
    }
    return result;
}
```

Although the loop is a bit more concise in this version, the post-loop
reasoning (as expressed in the **assertion**) is a bit more
complicated.
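Both the loop guard and the assertion in this last version depend on Java's short-circuit `&&`: when `i == SHORTER_LEN`, the `charAt` calls on the right are never evaluated. The sketch below (class and method names hypothetical) contrasts the loop as written with a version whose conjuncts are swapped; on `"car"` vs. `"cart"` the swapped version calls `charAt(3)` on `"car"` and throws an `IndexOutOfBoundsException`.

```java
class ShortCircuitDemo {
    // The scanning loop from the last strCompare version: returns the first
    // mismatch position, or the shorter length if there is none.
    static int scan(String s1, String s2) {
        final int SHORTER_LEN = Math.min(s1.length(), s2.length());
        int i = 0;
        // When i == SHORTER_LEN, && stops before evaluating the charAt calls.
        while (i != SHORTER_LEN && s1.charAt(i) == s2.charAt(i)) {
            i = i + 1;
        }
        return i;
    }

    // Same loop with the conjuncts swapped: charAt runs before the bounds check.
    static int scanSwapped(String s1, String s2) {
        final int SHORTER_LEN = Math.min(s1.length(), s2.length());
        int i = 0;
        while (s1.charAt(i) == s2.charAt(i) && i != SHORTER_LEN) {
            i = i + 1;
        }
        return i;
    }

    public static void main(String[] args) {
        System.out.println(scan("car", "cart"));  // 3
        try {
            scanSwapped("car", "cart");
        } catch (IndexOutOfBoundsException e) {
            System.out.println("swapped loop failed: " + e.getClass().getSimpleName());
        }
    }
}
```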