COBOL

PROCEDURE DIVISION

Overview

The fourth division in a COBOL (sub)program is the PROCEDURE DIVISION, which contains the executable code. It is comprised of one or more paragraphs, each of which is identified by a programmer-supplied name. The rules governing whether a paragraph name is valid are the same as those governing data names.

Paragraphs and Flow of Control

Syntactically, each paragraph is of the form

<paragraph-name>
    <executable statement>
    <executable statement>
    ... 
    <executable statement>
    .  

As this suggests, each executable statement in a paragraph is indented relative to the paragraph's name. This would be a sensible convention to follow even if it were not required. However, it is required! Executable statements may not intersect with Area A, which includes the first four positions on each line.

When a COBOL program is executed, or when a COBOL subprogram is called, execution begins at (the first statement of) its first paragraph and continues (possibly crossing into the 2nd paragraph or beyond) either until the end of the (sub)program has been reached or until execution of the instruction STOP RUN, or, in the case of a subprogram, EXIT PROGRAM, which hands control back to the caller. (Execution of STOP RUN inside a subprogram terminates execution of the (entire) program; this should never be used to achieve "normal" program termination.)

Although COBOL allows it, this author considers it to be a poor programming practice to allow execution to flow out the end of one paragraph and into the next. Hence, it is recommended that any program that you develop have STOP RUN (or, in the case of a subprogram, EXIT PROGRAM) as the last statement in its first paragraph. This will prevent execution from flowing in the undesired manner. Furthermore, in order to comply with the tenets of "structured programming" (one of which says that every block of code should have exactly one entry point and exactly one exit point), this instance of STOP RUN (or EXIT PROGRAM) should be the only instance in the entire (sub)program (excepting any that are used expressly for causing "abnormal" termination of execution). In effect, we are mandating that the first paragraph in a (sub)program act as the "driver".

As for the paragraphs other than the first, each of them is analogous to a a no-argument void method in Java (or a no-parameter procedure subprogram in Ada or Pascal) having no local declarations. That is, any such paragraph may be invoked, and, when it is, its statements are executed, following which execution resumes with the statement following the one that made the invocation. (The first paragraph can be invoked, too, but, due to the remarks below regarding COBOL's non-support of recursion, it would rarely be a good idea to do so.)

Nested Programs

COBOL allows one program to be nested inside another. In effect, a nested program is a callable subprogram having no parameters, but possibly having local data. See Chapter 23 of Comprehensive COBOL for more.

COBOL Does Not Support Recursion

There is no prohibition against a paragraph invoking itself (either directly or indirectly), but, if this occurs, you should not expect behavior analogous to that of a program written in Java, C/C++, Ada, or Pascal.

Let us explore the reasons for this: Assuming that no paragraph will ever call itself (either directly or indirectly), there can be at most one "active instance" of any particular paragraph at any one time. Hence, in order to keep track of the return addresses of the current "chain of calls", there is no need for a stack. (Recall from CS2 how a so-called run-time stack is used for storing the local data and the return address for each active instance of a subprogram.) Rather, it suffices to associate with each paragraph a single location in which is stored the address to which to return when execution of (the active instance of, if any) that paragraph terminates. And that is what a typical implementation of COBOL will do. The effect is that, if an already-active paragraph is invoked, the return address corresponding to its original invocation is replaced by the return address corresponding to the "new" invocation. Thus, the original return address is "forgotten"!

For example, suppose that A and B are paragraphs and that A calls B, which then calls itself. Assuming that (the second instance of) B terminates, execution will resume at the statement in (the first instance of) B that immediately follows B's invocation of itself, as you would expect. Now suppose that, subsequently, the first instance of B terminates. You might expect execution to resume at the statement in A that immediately follows A's invocation of B. But it won't, because the fact that A called B will have been "forgotten", and, instead, execution will resume (as it did the last time B terminated) at the statement in B that immediately follows B's invocation of itself.

What has just been described is one possibility; exactly what happens when a COBOL program is "recursive" depends upon the compiler. End of note.

Statement Format

In COBOL, most statements are of the form
       <verb> <operand(s)>
(possibly with some reserved words among the operands) in which the verb is a reserved word that indicates the kind of action that is to be taken and the operand(s) are the data items on which the action is to be taken. There are also some verbs corresponding to flow-of-control constructs (i.e., iteration, alternative). With these we associate a code segment rather than a list of operands. These take on a form such as

        <verb>
            <sequence of statements>
        END-<verb> 

Classification of Verbs/Statements

One reasonably could clasify the verbs/statements of COBOL as follows:
  1. No-Op (i.e. do nothing): CONTINUE    
  2. Assignment/Data Transfer: MOVE, SET, INITIALIZE
  3. Arithmetic: ADD, SUBTRACT, MULTIPLY, DIVIDE, COMPUTE
  4. Input/Output: OPEN, CLOSE, READ, WRITE, REWRITE, DELETE, START, ACCEPT, DISPLAY
  5. Flow-of-control:
  6. Character Processing: STRING, UNSTRING, INSPECT, reference modification
  7. High-level Utilities: SEARCH, SEARCH ALL, SORT, MERGE
(These lists are not exhaustive.)

No-OP Verb

CONTINUE

Syntax: CONTINUE

Semantics: Execution has no effect.

Some programming languages, including COBOL, have syntactic constructs that call for the appearance of a (non-empty) sequence of statements. Good examples include the two branches of an IF statement and the statement within the AT END clause. If the programmer wants to specify that such a code segment is to have no effect, the most appropriate way of doing so is to use CONTINUE.

Example:

   IF Counter > 0
      SUBTRACT 1 FROM Counter
   ELSE
      CONTINUE
   END-IF 

Data Transfer Verbs

MOVE

Syntax:
    MOVE {<identifier-1> | <literal>} TO <identifier-2> [,<identifier-3>...]

Examples:

    MOVE Num-Children TO Num-Blessings
    MOVE 0 TO Counter-1, Counter-2 

Semantics: The contents of the "source" field is copied into each "destination" field.

On the surface, this would seem to be a simple instruction. However, its behavior differs according to whether it involves either elementary numeric data items or elementary data items whose picture clauses include special "editing" characters (such as $ and , (comma) for inserting dollar signs and commas into numbers).

If either the source or the destination is a group data item, or if both are non-numeric elementary data items, and if the destination is not an edited data field, the copying is done literally. That is, the bytes comprising the source field are copied, from "left to right", into the destination field. If the source field is longer than the destination field, the rightmost bytes of the former are simply not transferred. If the source field is shorter, the rightmost bytes of the destination field are filled with spaces.

Note: If the destination field is an elementary alphabetic or alphanumeric data item whose defining picture clause is accompanied by a JUSTIFIIED RIGHT clause, data will be transferred into that field from "right to left" rather than the ordinary "left to right". For more information, see page 487 in Comprehensive COBOL. End of note.

If both items are elementary and numeric, one could think of the data transfer as proceeding as follows: digits to the left of the decimal point are copied into the destination field from "right to left" (so that any truncation occurring will be of the most significant digits) and digits to the right of the decimal point are transferred into the destination from "left to right" (so that any truncation occurring will be of the least significant digits). Leading and trailing zeros are placed into the destination field (rather than spaces) if there are "left over" digits.

For more information about the MOVE verb, see pages 112-115 in Comprehensive COBOL.

SET
The SET verb has three forms and may be used for any of the following:

The syntax of the first form of SET is

SET <condition-name> TO TRUE

The effect is that the value of the field associated with the condition-name is changed so that it takes on the value associated with the condition-name. (If the condition-name is associated with more than one value, the smallest such value is given to the associated field.)

For example, suppose we have the declaration

  01 Assessment  PIC 9.
     88 Excellent  VALUE 8 THRU 9.
     88 Very-Good  VALUE 7 THRU 8.
     88 Good       VALUE 6 THRU 7.
     88 Fair       VALUE 4 THRU 5.
     88 Poor       VALUE 0 THRU 3. 
Then execution of  SET Good TO TRUE  results in Assessment taking on the value 6.

The syntax of the second form (which covers the second, third, and fourth cases mentioned above) is

SET { <index-name> | <data-name> } TO { <index-name> | <data-name> | <integer-literal> }

For example, suppose that I and J have been declared using a USAGE IS INDEX clause and that Num has been declared using a picture clause such as PIC 9(3). Then each of the following are proper uses of the SET verb:

SET I TO 15
SET I TO J
SET I TO Num
SET Num TO I 

The third form of the SET verb is used to increment or decrement an index data item by a nonnegative integer value specified by either a numeric literal or by a numeric data item that was not declared using the USAGE IS INDEX clause. The syntax is

SET <index-name> {UP | DOWN} BY { <data-name> | <integer-literal> }

Using the same data-names as in the previous examples, the following are valid uses of SET:

SET I UP BY 1
SET I DOWN BY Num 

Arithmetic Verbs: ADD, SUBTRACT, MULTIPLY, DIVIDE, COMPUTE

For each verb (except COMPUTE), there are two basic forms of statement: one in which a destination field is explicitly indicated (via a GIVING clause) and another in which one of the operands is implicitly understood as the destination field.

Syntax:

ADD {<data-name-1> | <numeric-literal-1>} TO {<data-name-2> | <numeric-literal-2>} GIVING <data-name3>

ADD {<data-name-1> | <numeric-literal>} TO <data-name-2> [, <data-name-3>, ...]

SUBTRACT {<data-name-1> | <numeric-literal-1>} FROM {<data-name-2> | <numeric-literal-2>} GIVING <data-name3>

SUBTRACT {<data-name-1> | <numeric-literal>} FROM <data-name-2> [, <data-name-3>, ...]

MULTIPLY {<data-name-1> | <numeric-literal-1>} BY {<data-name-2> | <numeric-literal-2} GIVING <data-name-3>

MULTIPLY <data-name-1> [, <data-name-2>, ...] BY {<data-name-3> | <numeric-literal>}

DIVIDE {<data-name-1> | <numeric-literal-1>} INTO {<data-name-2> | <numeric-literal-2} GIVING <data-name-3>

DIVIDE {<data-name-1> | <numeric-literal}>} INTO <data-name-2> [,<data-name-3>, ...]

Examples:
ADD 1 TO Counter, Num-Widgets
ADD Num-Children TO Num-Parents GIVING Family-Size

Semantics: obvious (except for where result goes in multiplications and divisions when no destination field is explicitly specified)

Remark: The inclusion of ADD, SUBTRACT, MULTIPLY, and DIVIDE in the language was for the purpose of making code more readable to non-technical people. (Indeed, this goal was influential in the design of much of COBOL's syntax, which, for this very reason, tends to make programs annoyingly verbose.) Luckily, we need never use these verbs, because the COMPUTE verb can be used instead:

Syntax:

COMPUTE <data-name> [ROUNDED] = <arithmetic-expression>
   [ ON SIZE ERROR <imperative-statement> ]
   [ NOT ON SIZE ERROR <imperative-statement> ]
[END-COMPUTE]

We omit the precise syntax for an arithmetic expression, because it is, essentially, the same as found in other programming languages. The operators are + (for addition), - (for subtraction), * (for multiplication) and / (for division). Make sure to include at least one space between a minus sign and a data-name, because otherwise the compiler might mistake the former as being part of the latter.

Examples:
COMPUTE Gross-Pay = (Hours-Worked * Hourly-Wage) + ((Hours-Worked - 40.0) * (1.5 * Hourly-Wage))
COMPUTE Average-Score ROUNDED = Sum-of-Scores / Num-Scores

Semantics: The arithmetic expression to the right of the = is evaluated, and the result is copied into the data item whose name is on the left.

Remark: In my opinion, the COMPUTE verb should be used in almost all circumstances in which an arithmetic operation (or possibly several) is to be carried out. The only exceptions I can think of are ones similar to ADD 1 to Counter (rather than COMPUTE Counter = Counter + 1 or MULTIPLY Val BY 2 (rather than COMPUTE Val = 2 * Val.

Even in these examples, one could argue that the version using the COMPUTE verb is no less readable.


Input/Output verbs

DISPLAY

Syntax:     DISPLAY { <identifier-1> | <literal-1> } [, <identifier-2> | <literal-2> ...]
[ WITH NO ADVANCING ]

Semantics: Displays upon "standard output" the values of the item(s) indicated. If the WITH NO ADVANCING clause is present, the cursor remains on the same line after displaying the data; otherwise it advances to the next line. Failing to advance to the next line is usually preferred when displaying a prompt to the user.

Example: DISPLAY 'The value of Junk is', Junk
Example: DISPLAY 'Enter Y or N:' WITH NO ADVANCING

ACCEPT

Syntax:     ACCEPT <identifier>

Semantics: "Reads" value entered by user at keyboard into the specified variable.

Example: ACCEPT Response

For a description of COBOL's remaining input/output verbs, as well as other issues surrounding the use of files in COBOL, click here.


Flow-of-control

STOP RUN

Syntax: STOP RUN

Semantics: Causes execution of program to terminate.

Note: Normal termination of a program should always occur as a result of executing STOP RUN as the last instruction in the first paragraph in the main program. Any other uses of STOP RUN should be for the express purpose of terminating execution due to some error condition.

EXIT PROGRAM

Syntax: EXIT PROGRAM

Semantics: When used inside a subprogram, causes execution of the subprogram to terminate, handing control back to its caller. When used inside a program, effect is ???


Conditional/Alternative/Decision Statments:

IF Statement

Syntax:

  IF <condition>
     <statement-seq-1>
[ ELSE
     <statement-seq-2> ]
  END-IF

Recall that the square brackets enclose an optional portion of the statement.

Semantics: The condition is evaluated. If it is true, statement-seq-1 is executed, but statement-seq-2 is not. If it is false, and if the optional ELSE clause appears, statement-seq-2 is executed, but statement-seq-1 is not.

There is no ELSIF in COBOL. Thus, to write an IF statement with three or more branches, you must nest IF's within ELSE clauses of enclosing IF's. Here is an example with four branches:

    IF Score > 90
       MOVE 'A' TO Grade
       MOVE 'Great job' to Out-Remarks
    ELSE
       IF Score > 80
          MOVE 'B' TO Grade
       ELSE
          IF Score > 70
             MOVE 'C' TO Grade
          ELSE
             MOVE 'D' TO Grade
             MOVE 'Deficiency' to Out-Remarks
          END-IF
       END-IF
    END-IF  

In order to make this a bit more compact, and to make it closer in appearance to a multi-branch IF in Ada, we could write it as follows:

    IF Score > 90
       MOVE 'A' TO Grade
       MOVE 'Great job' to Out-Remarks
    ELSE IF Score > 80
       MOVE 'B' TO Grade
    ELSE IF Score > 70
       MOVE 'C' TO Grade
    ELSE
       MOVE 'D' TO Grade
       MOVE 'Deficiency' to Out-Remarks
    END-IF
    END-IF
    END-IF  

Notice that each IF must be matched by a separate END-IF, which makes the above look a little goofy. An acceptable variation on the above format would be to place all the END-IF's on the same line.

Warning: In COBOL-74, there was no such thing as an END-IF. Instead, IF statements were terminated by a period. Moreover, a period terminated not only the "most recent" IF but also any in which it is nested. (One consequence of this is that it is impossible, in COBOL-74, to nest an ELSE-less IF statement within the first branch (of two branches) of another IF statement.) This is the main reason why, in COBOL-85, it is a good idea never to use a period at the end of a statement, except where it is required, namely after the last statement in each paragraph.

For more information, see Chapters 5 and 8 of Comprehensive COBOL.

EVALUATE Statement

Due to the fact that COBOL's IF statement, described elsewhere, has at most two branches (as opposed to more, such as can be achieved in Ada using the elsif), in order to achieve the effect of a multi-branch conditional statement, it is necessary to nest each IF inside the previous one's ELSE branch. This makes things rather messy; in particular, it requires the use of several END-IF's at the end. Even if multi-branch IF statements did exist in COBOL, such a statement with many branches can often look messy. For this reason, languages such as Algol, Ada, and Pascal have a case statement (invented by C.A.R. Hoare). (C, C++, and Java have the switch statement, which is a perverted cousin of case.) COBOL has a case construct that is known by a different name: EVALUATE. (But it smells as sweet.)

Syntax: The general syntax of this statement is rather complicated, but a simplified version is as follows:

   EVALUATE <expr-1>
      WHEN <expr-2> [ THRU <expr-2b> ]
         <imperative-statement>
      WHEN <expr-3> [ THRU <expr-3b> ]
         <imperative-statement>
      ...
      ...

    [ WHEN OTHER   
         <imperative-statement>  ]

      END-EVALUATE 
Semantics: The so-called evaluation subject, expr-1, is evaluated. Its value is compared to each of the so-called evaluation objects, expr-2, expr-3, etc., in turn, until a matching value is found. If a match is found, the imperative-statement associated with the (first) matching evaluation object is then executed. If no match is found, and if the optional WHEN OTHER clause is present, the imperative-statement associated to it is executed. If no match is found, but the optional WHEN OTHER clause is absent, nothing happens (aside from the evaluation of all the evaluation objects).

If an evaluation object is specified using the THRU clause, then a match occurs if the evaluation subject falls within the indicated range.

Example 1:

   EVALUATE Score

      WHEN 91 THRU 100
         MOVE 'A' TO Grade
         MOVE 'Great job' to Out-Remarks

      WHEN 81 THRU 90
         MOVE 'B' TO Grade

      WHEN 71 THRU 80
         MOVE 'C' TO Grade

      WHEN OTHER
         MOVE 'D' TO Grade
         MOVE 'Deficiency' to Out-Remarks

    END-EVALUATE 

Example 2: Suppose that we have declarations such as these:

   01 User-Response  PIC 9.
      88 Add-Response  VALUE 1.
      88 Chg-Response  VALUE 2.
      88 Del-Response  VALUE 3.
      88 Quit-Response VALUE 4. 
Now suppose that, depending upon the value of User-Response, one of several possible actions are to be taken. This can be expressed using the following:
    EVALUATE TRUE
       WHEN Add-Response   <imperative-statement>
       WHEN Chg-Response   <imperative-statement>
       WHEN Del-Response   <imperative-statement>
       WHEN Quit-Response  <imperative-statement>
       WHEN OTHER          <DISPLAY 'Invalid response ...'
    END-EVALUATE

At first glance, the above might seem confusing, due to the use of TRUE as the evaluation subject. However, it is really no different from the earlier example: The evaluation objects (Add-Response, Chg-Response, etc.) are evaluated, one by one, until one having a matching value (namely, true) is found. The corresponding imperative statement is then executed. Note: The statement associated with each evaluation object in an EVALUATE statement must be an imperative statement. This excludes statements such as IF's, EVALUATE's, and READ's (which qualify as statements but not as imperative statements). As described in the section covering the PERFORM verb, a non-imperative statement can be enclosed between PERFORM and END-PERFORM in order to obtain an imperative statement.

For more about the EVALUATE statement, see pages 241-246 of Comprehensive COBOL.


PERFORM

The PERFORM verb is "overloaded" (to use a modern term) in that its behavior depends upon the syntactic form of the instruction in which it is used.

Grouping Statements Together

Used in conjunction with END-PERFORM, the PERFORM verb groups together a sequence of statements, making them into a single (imperative) statement. (This is exactly what you do in Pascal using begin and end or in C/C++ and Java using { and }.) Such a use of PERFORM would appear as
        PERFORM
           <sequence of statements>
        END-PERFORM 
This form of PERFORM is useful in places where an imperative statement is allowed, but a conditional statement (such as an IF statement) is not. The statement occurring after the AT END (or NOT AT END) clause (within a READ statement) must be imperative, for example, as must the statement specified for each case in an EVALUATE structure (which is COBOL's version of a case statement).

Paragraph Invocation

Alternatively, PERFORM can be used to invoke a paragraph. Recall that a paragraph in the PROCEDURE division is analogous to a parameter-less procedure in Ada. When a paragraph is PERFORM-ed, execution is transferred to its first instruction. When execution reaches the end of that paragraph, control is returned to the calling paragraph, with execution resuming where it had left off (namely, at the instruction that follows the PERFORM). Such a use of PERFORM appears as
     PERFORM <paragraph-name> 

Repetition (Looping)

PERFORM UNTIL
By adding an UNTIL clause (the reserved word UNTIL followed by a boolean expression, or what in COBOL is called a condition) to either of the forms above, we get a loop structure! That is,
           PERFORM UNTIL <condition>
              <sequence of statements>
           END-PERFORM 
is analogous to a WHILE loop in Ada or Pascal (except that the loop terminates when the condition becomes true, rather than when it becomes false). This form is referred to as an "inline" PERFORM. (In COBOL-74, this form did not exist.) Similarly,
         PERFORM <paragraph-name> UNTIL <condition> 
has the effect of repeatedly PERFORM-ing the specified paragraph until the condition has become true. This form is superfluous, in that it can be rewritten as an inline PERFORM, as follows:
          PERFORM UNTIL <condition>
             PERFORM <paragraph-name>
          END-PERFORM 

Not infrequently, we wish to develop a loop that iterates at least once. (A good example is a loop whose purpose is to get user input: each iteration prompts the user and reads his response, and the loop terminates only after the user has entered a valid response.) To achieve this in COBOL, insert the clause WITH TEST AFTER immediately before the word UNTIL. For example, the paragraph-calling form of the PERFORM loop would look like this:

         PERFORM <par-name> WITH TEST AFTER UNTIL <condition>

Alternatively, we can insert the clause WITH TEST BEFORE, which is the default.

PERFORM VARYING
To arrive at a loop structure somewhat analogous to Ada's or Pascal's FOR loop, we use a VARYING ... FROM ... BY clause in addition to the UNTIL clause. The inline form looks like
     PERFORM VARYING <data-name>
             FROM <init-val> BY <incr-val> UNTIL <condition>
        <sequence-of-statements>
     END-PERFORM  
The other form is
     PERFORM <paragraph-name> 
         VARYING <data-name> FROM <init-val> BY <incr-val> UNTIL <condition> 

The data-name identifies what we sometimes call the loop control variable (LCV), the <init-val> gives the value to which the LCV is to be initialized, and <incr-val> gives the value to be added to the LCV at the end of each iteration. Both <init-val> and <incr-val> must be either numeric literals or data-names. (That is, neither can be an expression containing an operator, such as "X + 1".) In a typical use of PERFORM with the VARYING clause, the loop termination condition will involve the LCV. It need not, however. (Because of this, one might argue that this loop form bears a stronger resemblence to C's FOR loop than to Ada's or Pascal's.)

PERFORM ... TIMES

Often times, we simply want to repeat a chunk of code some specific number of times, where this number is either a constant (e.g., 10) or the value of some variable, and where there is no need for the chunk of code to make any reference to an LCV. In that case, we can write an inline PERFORM as follows

       PERFORM <expr> TIMES
          <sequence of statements>
       END-PERFORM 
or the other form as follows
      PERFORM <paragraph-name> <expr> TIMES 
The expression giving the number of iterations must be either a numeric literal or the name of a numeric data item.

Character Processing Verbs

STRING
Syntax:
STRING { <data-name> | <literal> }  DELIMITED BY {  SIZE | <data-name> | <literal> }
       { <data-name> | <literal> }  DELIMITED BY {  SIZE | <data-name> | <literal> }
       ...
       { <data-name> | <literal> }  DELIMITED BY {  SIZE | <data-name> | <literal> }

   INTO  <data-name>  [WITH POINTER <data-name>]

   [ON OVERFLOW <imperative-statement>]
   [NOT ON OVERFLOW <imperative-statement>]
END-STRING

Note on notation: Keep in mind that square brackets around an item indicates that it is optional, whereas a list of items between curly braces and separated by vertical bars indicates that exactly one of the items must appear.

The STRING verb facilitates the transfer of data from one or more source fields into a single destination field. In effect, the destination field ends up containing the concatenation of data copied from the source fields. For each source field, a DELIMITED BY clause is used to control "how much" data is transferred from that field into the destination.

Suppose, for example, that we have the following declarations:

   01 Person-Rec.
      02 Name.
         03 First-Name  PIC X(10).
         03 Middle-Init PIC X.
         03 Last-Name   PIC X(16).
      02 ... 
      02 ...

   01 Combo-Name  PIC X(28).  
Suppose that we want to transfer the person's name from Person-Rec into Combo-Name so that the latter contains the last name, followed by a comma, followed by a space, followed by the first name.

The following would accomplish it:

   STRING Last-Name  DELIMITED BY SIZE
          ', '       DELIMITED BY SIZE
          First-Name DELIMITED BY SIZE   
      INTO Combo-Name
   END-STRING  

However, this would have the (probably unintended) effect of placing all 20 characters from Last-Name into Combo-Name, including any trailing spaces, followed by a comma and a space, followed by all ten characters of First-Name, including any trailing spaces. For example, if we had
Last-Name = "Smith           "   and
First-Name = "Mary      ",

the result of the above statement would be

Combo-Name = "Smith           , Mary      "

What was probably intended was to achieve

Combo-Name = "Smith, Mary"

That is, we probably wanted to place the string "Smith, Mary" into (the first eleven bytes of) Combo-Name. To do this, we would modify the DELIMITED BY clauses for the two name fields so that they indicated that data transfer should stop when a space is encountered. The modified code is as follows:

   STRING Last-Name  DELIMITED BY ' '
          ', '       DELIMITED BY SIZE
          First-Name DELIMITED BY ' '
      INTO Combo-Name
   END-STRING  

Note that the DELIMITED BY clause can specify either that the entire source field be transferred (indicated by the word SIZE) or that data transfer end with the last character preceding the first occurrence of whatever string is indicated (by either a literal string, a figurative constant (such as SPACE), or the name of a data item). If the source field contains no occurrence of the indicated delimiter string, its contents will be transferred to the destination in their entirety.

The example above assumes that neither Last-Name nor First-Name includes any spaces preceding the last non-space. Hence, it would not work correctly for the last name Van Leuven, for example. By changing the delimeter from ' ' (a single space) to '  ' (two spaces), this problem is eliminated. However, it introduces a new problem! Suppose that the contents of Last-Name are "Van Leuven " (in which there is a single trailing space). As the delimiting string '  ' does not appear in Last-Name, the destination field Combo-Name will end up containing "Van Leuven , ..." (with the trailing space from the source field present). It is left to the reader to work out how to fix this.

Another useful feature of the STRING verb is the WITH POINTER clause, which makes it possible to

This makes it easy to transfer data into the destination field using two or more occurrences of the STRING verb, each time placing the "new" data immediately after the "old". This is especially useful when it is necessary to keep track of how many characters were transferred from each source field.

For example, if we needed to know the precise lengths of the names transferred into Combo-Name in the example above, we could have done things like this:

* in Working-Storage section:

01 Pntr        PIC 99.
01 Old-Pntr    PIC 99.
01 Last-Name-Len  PIC 99.
01 First-Name-Len  PIC 99.

...
...

* in Procedure Division:

    MOVE 1 TO Pntr
    MOVE 1 TO Old-Pntr
    STRING Last-Name  DELIMITED BY ' '
       INTO Combo-Name 
       WITH POINTER Pntr
    END-STRING

    COMPUTE Last-Name-Len = Pntr - Old-Pntr

    STRING ', ' DELIMITED BY SIZE
       INTO Combo-Name 
       WITH POINTER Pntr
    END-STRING

    MOVE Pntr TO Old-Pntr
    STRING First-Name  DELIMITED BY ' '
       INTO Combo-Name
       WITH POINTER Pntr
    END-STRING

    COMPUTE First-Name-Len = Pntr - Old-Pntr 

In the above, the data item Pntr is used for keeping track of where next to place data into Combo-Name. Execution of the STRING verb automatically updates the value of Pntr so that it "points to" the location following the last one already filled.

The ON OVERFLOW and NOT ON OVERFLOW clauses are optional, as indicated. In the context of the STRING verb, overflow occurs when data transfer must be terminated due to the destination field not being long enough to hold all the data that was "intended to be" transferred into it.

For another description of the STRING verb, see pages 208-210 of Comprehensive COBOL, which is on reserve at the Weinberg Library.


UNSTRING

The UNSTRING verb facilitates the transfer of data from a single source field into one or more destination fields. In effect, each destination field ends up containing a segment of the data from the source field. A DELIMITED BY clause is used to indicate at which points within the source data the transfer is to shift from one destination field to the next.

Syntax (some options omitted):

UNSTRING <data-name-1>  DELIMITED BY [ALL] {<data-name-2> | <literal> }
   [ OR [ALL]  {<data-name-3> | <literal> } ... ]
   INTO <data-name-4> [, <data-name-5> ...]
   [WITH POINTER <data-name-6>]
   [ON OVERFLOW <imperative-statement>]
   [NOT ON OVERFLOW <imperative-statement>]
END-UNSTRING

For example, suppose that, to save space, the records in a file are stored so that adjacent fields are separated by a comma, rather than being fixed in length. For example, a record representing a student might be

Rumplestiltskin,Chris,Kelly,1313 Mockingbird,Scranton,18510
After reading a record into, say, the field Stu-Rec-Free-Form with PICTURE clause PIC X(70), we wish to parse the data so as to place the appropriate substrings into Stu-Rec-Formatted, declared as
01 Stu-Rec-Formatted.
   02 Name.
      03 Last-Name  PIC X(16).
      03 First-Name PIC X(10).
      03 Mid-Name   PIC X(10).
   02 Address.
      03 Street-Addr PIC X(20).
      03 City        PIC X(15).
      03 Zip         PIC X(10).

We will get the desired effect using the statement

  UNSTRING Stu-Rec-Free-Form 
     DELIMITED BY ','
     INTO  Last-Name,
           First-Name,
           Mid-Name,
           Street-Addr,
           City,
           Zip
  END-UNSTRING

See pages 208-213 of Comprehensive COBOL for a complete description.


INSPECT
See pages 213-215 in Comprehensive COBOL.

Reference Modification

The term "reference modification" refers to neither a COBOL verb nor to a kind of COBOL statement, but rather to a kind of expression that allows the programmer to refer to any segment of a data item.

The syntax is

<data-name>(<start-pos> : [<length>]) 
where the <start-pos> and <length> are given by either numeric literals or names of numeric data items. For example, if Junk is a field declared using the picture clause PIC X(14), then Junk(5:3) refers to the segment of Junk of length three beginning at its 5th character. Such an expression can be used where any data-name is legal.

As indicated in the syntax description, the <length> item appearing after the colon is optional. If it is omitted, the segment referred to extends to the end of the data item. For example, Junk(5:) would refer to the segment of Junk beginning at its 5th character and extending through its last character.

Examples of use:

   01 Junk    PIC X(10) VALUE 'abcdefghij'.
   01 Garbage PIC X(5)  VALUE 'klmno'.
   ...
   ...

   MOVE Garbage TO Junk(3:5)   <--- changes Junk to 'abklmnohij'
   MOVE Junk(8:) TO Garbage    <--- changes Garbage to 'hij  '

See pages 471-474 of Comprehensive COBOL for more.