Relational Algebra Introduction

A relational database is composed of two-dimensional tables. (A table can also be called a relation, although relational "purists" would argue that there is a subtle distinction between the two.) Each "row" of a table is called a tuple. Each tuple is composed of fields, one for each attribute of the table. (The attributes are the concepts, or the names thereof, that we associate with the fields/columns.)

Here is an example of a table in which each tuple describes a student in a hypothetical high school or 4-year college.
Student    ID     Name      Sex   Class
        +------+----------+-----+-------+
        |   2  |  Mary    |  F  |   2   |
        +------+----------+-----+-------+
        |   6  |  John    |  M  |   4   |
        +------+----------+-----+-------+
        |   9  |  Carol   |  F  |   3   |
        +------+----------+-----+-------+
        |   7  |  Mary    |  F  |   4   |
        +------+----------+-----+-------+
        |   1  |  Ann     |  F  |   1   |
        +------+----------+-----+-------+
        |  13  |  Mike    |  M  |   2   |
        +------+----------+-----+-------+
        |   8  |  Helen   |  F  |   4   |
        +------+----------+-----+-------+
        |   5  |  Jim     |  M  |   1   |
        +------+----------+-----+-------+
        |  16  |  Mike    |  M  |   3   |
        +------+----------+-----+-------+ 

Following the usual convention, we have written the name of the table, Student, immediately to the left of the attributes, which serve as column headings. As the reader has probably surmised, we have chosen to encode the values corresponding to male and female, respectively, by the characters M and F. Also, the values 1, 2, 3, and 4 occurring in the Class attribute correspond to freshman, sophomore, junior, and senior, respectively.

The reader should keep in mind that the order in which we list the tuples occurring in a table is irrelevant. That is, a table is viewed as being a set of tuples, not a sequence of tuples.

One of the central functions of a DBMS is to answer questions (about the data in the database) posed to it by users. (Indeed, there would be no reason to store the data in the first place if there were no desire to retrieve bits and pieces of it later for the purpose of answering such questions.) Such a question is referred to as a query. For example, with respect to the table above, a user might ask What are the names of the female students? Or, to state it in the form of a command: List the names of the female students. More examples: List the names of female freshmen; List the name and sex of each senior. In the relational database model, the answers to queries such as these are themselves tables (although such tables do not become a "permanent" part of the database). For example, the answer to the last query mentioned would be:

    Name     Sex  
+----------+-----+
|  John    |  M  |              List the name and sex
+----------+-----+              of each senior.
|  Mary    |  F  | 
+----------+-----+
|  Helen   |  F  | 
+----------+-----+ 

Notice that this table is obtained from the Student table by first eliminating all but those tuples corresponding to seniors (i.e., tuples whose Class attribute has value 4) and then eliminating all but the Name and Sex attributes. (You will see shortly that the elimination of tuples and attributes correspond to two of the fundamental operations used for answering queries.)

Because computers' abilities to "understand" natural language (such as English) is very limited (at least to this point in time), a user must, in specifying a query, employ a more precise and formal notation. Most relational DBMS's support the use of some variant of SQL (Standard Query Language) for specifying queries. Here we shall use a slightly lower-level language, relational algebra, for that purpose. Conceptually, one can view the first phase of query processing —in which a DBMS translates an SQL query submitted to it into a program that produces the answer to that query— as having as its purpose to translate the SQL query into an equivalent relational algebra query.

We shall present two different forms of syntax for relational algebra (RA) queries. One is the traditional syntax, which has a mathematical flavor. The other has a flavor more similar to programming languages, or even SQL. The latter we will refer to as sugared relational algebra (SRA).

Queries in relational algebra are based upon the use of three elementary operations on tables: project, restrict, and join. (The "restrict" operation is usually called "select", but here we use the terminology of C.J. Date (prolific author on the subject of the relational model), in part because the SELECT verb in SQL has an entirely different meaning.)

Because the result of applying an operation is itself a table, we can compose operations in sequence. The obvious analogy is with functions mapping reals to reals, where the function (f o g), read "f of g", is defined by (f o g)(x) = f(g(x)). Indeed, the operators in relational algebra are functions, with tables as both domain and range.

The Project Operation

Applying project to a table yields a copy of that table, but (possibly) with some of its attributes (i.e., columns) excluded. (Presumably, the excluded attributes are not of interest.) SRA's syntax for an application of the project operation is as follows:

PROJECT <list of attributes> FROM <table>

In RA, this is written

Π<list of attributes>(<table>)

For example, the SRA expression

PROJECT Name, Class FROM Student

evaluates to the table obtained by making a copy of the Student table (shown above) and then removing all columns except those corresponding to the attributes Name and Class. In RA, we would have written this as ΠName,Class(Student).
The resulting table is

   Name   Class 
+--------+-----+
|  Mary  |  2  |
+--------+-----+
|  John  |  4  |            PROJECT Name, Class
+--------+-----+            FROM Student
|  Carol |  3  |
+--------+-----+
|  Mary  |  4  |
+--------+-----+
|  Ann   |  1  |
+--------+-----+
| Helen  |  4  |
+--------+-----+
|  Jim   |  1  |
+--------+-----+ 
|  Mike  |  3  |
+--------+-----+ 

The query PROJECT Name, Class FROM Student, then, is nothing but a more formal way of stating the (English) query

List the name and class of each student.

The Restrict Operation

(Note that Elmasri & Navathe (and almost everyone else, with the exception of C.J. Date) calls this select, but we refrain from doing so because the meaning of that term in SQL is quite different.) Applying restrict to a table yields a copy of that table, but (possibly) with some of its tuples (i.e., rows) excluded, namely those tuples that fail to satisfy a specified condition. (Presumably, any tuple failing to satisfy the condition is not of interest to the user.) In keeping with the syntactic style used for project, an application of the restrict operation will be written in SRA in the following way:

RESTRICT <table> WHERE <condition>
In RA, we would write this as
σ<condition>(<table>)

The condition is simply a boolean expression to be evaluated with respect to a tuple. (Hence, any sensible condition will mention one or more attributes of the table.) For example, the expression

RESTRICT Student WHERE Sex = 'M'

evaluates to the table that has the same attributes as Student and includes precisely those tuples of Student in which the Sex attribute has value M. It looks like this:

   ID     Name      Sex   Class
+------+----------+-----+-------+
|   6  |  John    |  M  |   4   |
+------+----------+-----+-------+
|  13  |  Mike    |  M  |   2   |       RESTRICT Student
+------+----------+-----+-------+       WHERE Sex = 'M'
|   5  |  Jim     |  M  |   1   |
+------+----------+-----+-------+
|  16  |  Mike    |  M  |   3   |
+------+----------+-----+-------+

By using boolean (or logical, if you prefer) operators (such as AND, OR, and NOT) in the condition, we can express more complicated queries. For example, if we wanted to form a table like Student, except listing only students who are either (a) sophomore and male or (b) female, we could write

RESTRICT Student WHERE (Sex = 'M' AND Class = 2) OR (Sex = 'F')

With only the project and restrict operations, we have the ability to describe many non-trivial queries. For example, suppose that we wanted a list containing the name and sex of every senior. From the examples above, it should be clear that the expression

RESTRICT Student WHERE Class = 4

refers to the table that looks like Student except that only the tuples corresponding to seniors are present. Once we have constructed that table, it remains only to project from it the Name and Sex attributes. The appropriate query, then, would be

PROJECT Name, Sex FROM (RESTRICT Student WHERE Class = 4)

In RA, this would be

ΠName,SexClass=4(Student))

This example illustrates that one query can be "nested" inside another. Indeed, as the RA syntax suggests, nesting one query inside another is nothing more than function composition, as alluded to earlier.

As another example, suppose we wanted a list containing the name and sex of every student who is either a male sophomore or else a senior. An appropriate SRA query is

PROJECT  Name, Sex 
FROM     (RESTRICT Student 
          WHERE  (Sex = 'M' AND Class = 2)  OR  (Class = 4)
         ) 
The query is split over multiple lines simply because it is too long to fit on a single line. In the more concise RA notation, this would be

ΠName,Sex(Sex='M' AND Class=2) OR (Class=4)(Student))

As queries get more complicated, requiring the use of two, three, or more nested applications of project and/or restrict, they become difficult to understand. Hence, we allow "temporary" tables to be given names. For example, the query above describing the list of names and sexes of students who are either male sophomores or seniors (of either sex) can be written as

(1) Temp ← RESTRICT Student WHERE (Sex = 'M' AND Class = 2)  OR  (Class = 4)
(2) Result ← PROJECT  Name, Sex  FROM  Temp

Here, in line (1) we specify that the table obtained from doing the restrict is to be named (rather unimaginatively) Temp, and then we apply project to that table in line (2) in order to obtain the desired result, to which we give the name Result!


The Join Operation

In any but the most trivial relational databases there will be two, three, or possibly many more tables. In such a setting, to answer most interesting queries will require the use of two or more tables. This is where the join operation becomes useful.

Suppose that, in addition to Student (as illustrated above), our database also includes a table whose purpose is to keep a record of which students are currently enrolled in which courses. (Such a table would arise from there being a many-to-many relationship type involving student and course entities.) Let us call this the Enrolled-In table; for the purposes of doing examples, suppose that its current contents are as follows:

Enrolled-In   StuID   CourseID
             +-----+------------+
             |  2  |   CIL 102  |
             +-----+------------+
             |  2  |   MATH 2   |
             +-----+------------+
             |  1  |   HIST 7   |
             +-----+------------+
             |  7  |   MATH 2   |
             +-----+------------+
             |  1  |   ENGL 4   |
             +-----+------------+
             |  1  |   MATH 2   |
             +-----+------------+
             |  13 |   HIST 7   |
             +-----+------------+
             |  5  |   HIST 7   |
             +-----+------------+
             |  5  |   CIL 102  |
             +-----+------------+ 

Note justifying the addition of such a table: (This can be skipped by the uninterested reader.) It is true that the information represented by the Enrolled-In table could be embedded within the Student table, assuming that we added an attribute for CourseID to that table. However, to do so would result in the duplication of much data, because, for each course a particular student was enrolled in, her ID, Name, Sex, and Class values would have to be repeated. For example, the fact that Mary was enrolled in both CIL 102 and MATH 2 would require that both of these rows/tuples appear in our expanded Student table:

   ID     Name      Sex   Class   Enrolled-In
+------+----------+-----+-------+-------------+
|   2  |  Mary    |  F  |   2   |   CIL 102   |
|   2  |  Mary    |  F  |   2   |   MATH 2    |
+------+----------+-----+-------+-------------+ 

Duplication of data is undesirable for at least two reasons. One is that it takes more storage space than necessary. Another is that it makes the problem of maintaining "data integrity/consistency" much more difficult. Suppose, for example, that Mary moves on to Class 3. Then it becomes necessary to modify every tuple storing data regarding Mary. If, for some reason, some tuples are modified but others are not, we have an inconsistency in the data.

To avoid the need to maintain multiple tuples representing Mary, you might suggest that the Enrolled-In field be of type set of string, rather than just string. That way, the value { "CIL 102", "MATH 2" } could be assigned to the Enrolled-In field of the tuple representing Mary, indicating that the two courses in which Mary is enrolled are CIL 102 and MATH 2

This is a reasonable suggestion; however, for technical reasons it has traditionally been a rule of the relational model that all attributes must be viewed as being atomic, meaning, for example, that if an attribute has a value that is a set, its individual members cannot be extracted/identified by any operation of the relational model. In the case of Mary's courses, this would not be acceptable, because it would prevent us, for example, from using relational operations that could help us answer a query such as "List the names of all students enrolled in the course with ID C/IL 102."
End of note.

Consider the following informal query:

Report which students are enrolled in which courses.

A somewhat more formal way of saying this is

Produce a list of all ordered pairs (x,y) satisfying the condition that x is (the name of) a student and y is (the course ID of) a course in which x is enrolled.

The correct result would be a table containing the tuples in Enrolled-In, except with each student ID replaced by the corresponding student's name. Clearly, such a table cannot be constructed from the Student and Enrolled-In tables using only the project and restrict operations. (Indeed, neither of those operations allows us to construct a table whose tuples are formed by combining tuples from two (or more) different tables. But that's exactly what we need here, because all information about student names is in the Student table whereas all information about course ID's is in the Enrolled-In table.)

Consider how you might go about constructing an answer to this query. Most likely, you would scan through the tuples of Enrolled-In; for each such tuple e you would note the value of e[StuID] (i.e., the value in its StuID field) and then scan through the tuples of Student in search of the tuple s satisfying s[ID] = e[StuID] (i.e., having a matching value in its ID field). Upon finding it, you would place into the table under construction the tuple <s[Name],e[CourseID]>.

The crucial concept here is that of combining a tuple in one table with a matching tuple in another table. (In our example, what made one tuple match another was having the same value in their ID and StuID attributes, respectively. In a different query, the matching criterion would be different.)

This capability of combining matching tuples from two different tables is precisely what the join operation provides. Specifically, the SRA expression

JOIN <table> WITH <table> WHERE <join-condition>

yields as its value the table obtained by combining every pair of tuples (from the two indicated tables) that satisfy the join condition. (Each atomic sub-expression within a join condition must compare an attribute in one table to an attribute in the other table.)

In RA, the join operator is a bowtie symbol, but HTML has no similar symbol. So we will use the × operator, and put the join condition as a subscript to its right. Hence, we will write a join in RA as

<table> ×<join-condition> <table>

To illustrate this idea, consider these two tables:

 Table A              Table B

  V   W              X   Y   Z
+---+---+          +---+---+---+
| r | 2 |          | 5 | g | p |
| t | 4 |          | 4 | d | e |
| p | 6 |          | 2 | m | q |
+---+---+          | 4 | t | f |
                   +---+---+---+ 

The result of the expression   JOIN A WITH B WHERE W = X   (or, in the RA notation, A ×W=X B) is the table

  V  W   X   Y   Z
+---+---+---+---+---+
| r | 2 | 2 | m | q |
| t | 4 | 4 | d | e |
| t | 4 | 4 | t | f |
+---+---+---+---+---+ 

Here we combined a tuple t in A with a tuple u in B iff t[W] = u[X]. This (i.e., comparing two attributes for equality) is the most common kind of join condition. However, it's not the only kind. Consider

JOIN A WITH B WHERE W < X

The resulting table consists of all those combinations of tuples t and u from A and B, respectively, in which t[W] < u[X]:

  V   W   X   Y   Z
+---+---+---+---+---+
| r | 2 | 5 | g | p |
| r | 2 | 4 | d | e |
| r | 2 | 4 | t | f |
| t | 4 | 5 | g | p |
+---+---+---+---+---+ 

In case it is necessary to qualify an attribute by specifying in which table it is found (such as would be the case if the join condition involved an attribute name that occurred in both tables being joined), we can qualify it by its table name, as in   JOIN A WITH B WHERE A.W = B.X   (or, in the RA notation, A ×A.W=B.X B).

Returning to our problem of devising a query that asks for the names of students and the ID's of courses in which they are enrolled, our first step is to employ the join operation as follows:

JOIN  Enrolled-In WITH Student  WHERE  StuID = ID
The resulting table would be

 StuID   CourseID     ID     Name      Sex   Class
+-----+------------+------+----------+-----+-------+
|  2  |   CIL 102  |   2  |  Mary    |  F  |   2   |
+-----+------------+------+----------+-----+-------+    
|  2  |   MATH 2   |   2  |  Mary    |  F  |   2   |
+-----+------------+------+----------+-----+-------+
|  1  |   HIST 7   |   1  |  Ann     |  F  |   1   |
+-----+------------+------+----------+-----+-------+
|  7  |   MATH 2   |   7  |  Mary    |  F  |   4   |
+-----+------------+------+----------+-----+-------+
|  1  |   ENGL 4   |   1  |  Ann     |  F  |   1   |
+-----+------------+------+----------+-----+-------+
|  1  |   MATH 2   |   1  |  Ann     |  F  |   1   |
+-----+------------+------+----------+-----+-------+
|  13 |   HIST 7   |  13  |  Mike    |  M  |   2   |
+-----+------------+------+----------+-----+-------+
|  5  |   HIST 7   |   5  |  Jim     |  M  |   1   |
+-----+------------+------+----------+-----+-------+
|  5  |   CIL 102  |   5  |  Jim     |  M  |   1   |
+-----+------------+------+----------+-----+-------+ 

Note that each tuple's first two columns come from a tuple in Enrolled-In and its last four columns come from a matching tuple in Student.

From the table immediately above (resulting from the application of join), we get the desired result simply by omitting every column except those corresponding to Name and CourseID. To do this, we apply project. Hence, the SRA query that we want is

PROJECT Name, CourseID 
FROM (JOIN  Enrolled-In WITH Student
      WHERE StuID = ID
     ) 

Natural Join and Resolving Naming Conflicts

Suppose that the ID attribute in the Student table had been named StuID instead, corresponding to the name of the attribute in Enrolled-In. Then the join operation performed above would have been written (in SRA)

JOIN  Enrolled-In WITH Student WHERE StuID = StuID

This raises a rather sticky issue: In the WHERE clause, one occurrence of StuID refers to the attribute in Enrolled-In whereas the other refers to the same-named attribute in Student. But how can we know that? One answer is to require that, in any comparison of attributes in a join condition, the first (respectively, second) one mentioned must be from the first (resp., second) table mentioned (i.e., the one preceding (resp., following) WITH). (Since every join condition should be composed of such comparisions, possibly combined using boolean operators such as AND and OR) this rule makes sense.)

Another way to resolve the issue is to require that attribute names be qualified where necessary by prefixing them with the name of the table. Using that approach, our join operation would have been written like this:

JOIN  Enrolled-In WITH Student 
WHERE Enrolled-In.StuID = Student.StuID

This resolves the naming issue as it pertains to forming/interpreting this particular query, but it raises a new naming issue with respect to the table that results from evaluating the query: that table will have two different attributes named StuID.

Ah, but this is easily remedied by omitting one of those attributes! After all, in each tuple, the values in the two same-named attributes will be the same. (The join condition guarantees this.)

Indeed, this situation arises so often in doing join operations that a special version of join has been defined, called the natural join, and the corresponding operator is the asterisk, *. So the result of evaluating

Enrolled-In * Student

would be as pictured above, except that, in effect, the two attributes named StuID would be merged into a single attribute of that name. (Well, the picture shows an attribute named ID, but in the context of the present discussion it has been renamed StuID.)

Does this completely resolve the issue of conflicting (or duplicate) attribute names? No!

Suppose that we wanted to join Student with itself, with the join condition stipulating that two tuples should be combined in the case that their StuID values were different but their Name attributes were the same. That is, we want to say something like

JOIN Student WITH Student
WHERE StuID != StuID  AND  Name = Name 

Even if we ignore any potential ambiguities with respect to attribute names in the query, the problem is that the resulting table must have two attributes named StuID because, in each tuple, their values will be different! (And hence we cannot simply merge the two columns into one.)

To resolve this, we introduce the notions of aliasing and (locally) renaming attributes.


More stuff to appear here soon.

Set-theoretic Operations

In addition to the project, restrict, and join operations, it is often convenient employ the set-theoretic operations of union, intersection, and difference. Each of these operations applies to two relations, which, in order to make sense, must be structurally alike. That is, the two relations must have the same number of attributes and corresponding attributes must be of the same domain. (Note: A relational model purist (such as C.J. Date) would probably argue that, to apply a set-theoretic operation to two tables, they must have precisely the same collection of attributes (in terms of not only domains but also names). Under this restriction, using the attribute-renaming features discussed above would often be necessary before applying a union, intersection, or difference operator. End of note.)

Returning to the problem of finding students who are either female or else both male and in the sophomore class, we could have written the solution as follows:

Females    ← RESTRICT Student WHERE Sex = 'F'
Males      ← RESTRICT Student WHERE Sex = 'M'
Sophomores ← RESTRICT Student WHERE Class = 2
Result     ← Females ∪ (Males ∩ Sophomores) 

For another example, suppose that we wanted to produce the list of students who are not enrolled in CIL 102. Your first inclination might be to produce a table containing the ID's of students enrolled in a course other than CIL 102 and then to join that with Student in order to obtain the corresponding names:

(1) Non_CIL102 ← PROJECT StuID FROM (RESTRICT EnrolledIn WHERE CourseID ≠ 'CIL 102')
(2) Result     ← PROJECT Name 
                 FROM (JOIN Student WITH Non_CIL102 WHERE Student.ID = Non_CIL102.StuID)
But this is incorrect, as it will yield the name of every student who is enrolled in some course other than CIL 102. In particular, students not enrolled in any courses will be (incorrectly) omitted and students enrolled in CIL 102 as well as at least one other course will be (incorrectly) included.

A correct approach would be to produce a table containing the ID's of all students and another table containing the ID's of students enrolled in CIL 102, and then to take the difference of the two, which will be a table including precisely the ID's of students not enrolled in CIL 102. This idea leads to the query

(1) All_ID         ← PROJECT ID FROM Student
(2) CIL102_ID      ← PROJECT StuID AS ID FROM (RESTRICT EnrolledIn WHERE CourseID = 'CIL 102')
(3) Non_CIL102     ← All_ID - CIL102_ID
(4) Result         ← PROJECT Name 
                     FROM (JOIN Student WITH Non_CIL102 WHERE Student.ID = Non_CIL102.ID)

Notice that in step (2) we projected on the StuID attribute but we specified (via the clause AS ID) that that attribute should be named ID in the resulting table. Having done that, we are free in step (3) to take the difference of the two specified tables because their attribute names (and domains, of course) coincide.

In order to avoid the JOIN at the end (which was simply for the purpose of recovering the names associated to the ID's in the Non_CIL102 table, we could have included the names in the intermediate results:

(1) All_ID         ← PROJECT ID, Name FROM Student
(2) CIL102_ID      ← PROJECT StuID AS ID, Name FROM (RESTRICT EnrolledIn WHERE CourseID = 'CIL 102')
(3) Non_CIL102     ← All_ID - CIL102_ID
(4) Result         ← PROJECT Name FROM Non_CIL102

Aggregate Operations

Even with all the capabilities provided by the operations discussed above, there are still some fundamental types of queries that cannot be answered. Suppose, for example, that you simply wanted to know the number of students in freshman class (i.e., class 1). You can use restrict to obtain all tuples from the Student relation having value 1 in the Class attribute. But that's more information than you want! You simply want to know how many such students there are! Or suppose that you wanted to compute the average of the values in the Class attribute? (A more likely scenario would be that you had a Salary, or Number_of_Dependents, attribute in some relation, and you wanted to compute an average.) To answer queries of the form How many ...? or What is the average of ...? and other similar queries that call for some kind of statistical summary of data, RA includes what are called aggregate functions. These include COUNT, SUM, AVG, MIN, and MAX. (Elmasri & Navathe spell out the whole word in the case of the last three, but we'll be content with using these abbreviations.)

As an example, suppose that our Student table included the attribute SAT_Score and that we wanted to find the average, minimum, and maximum values in that column. In SRA, we would write this query as

AGGREGATE AVG(SAT_Score), MIN(SAT_Score), MAX(SAT_Score)
FROM Student
The result will be a one-tuple table. Deriving its attribute names from the functions applied and the attributes to which they were applied, the table might look like this:
   AVG_SAT_Score   MIN_SAT_Score  MAX_SAT_Score
  +--------------+---------------+-------------+
  |   1040.75    |     913       |    1430     |
  +--------------+---------------+-------------+
If we wanted this data only for seniors, say, we would have nested an application of RESTRICT in the FROM clause, as follows:
AGGREGATE AVG(SAT_Score), MIN(SAT_Score), MAX(SAT_Score)
FROM (RESTRICT Student WHERE Class = 4)

Suppose that we want to know how many female seniors have SAT scores above 1100, and the average SAT score among those students. We could write this query like this:

AGGREGATE COUNT(*), AVG(SAT_Score)
FROM (RESTRICT Student
      WHERE Sex = 'F' AND SAT_Score > 1100)

The result might look something like this:

    COUNT*   AVG_SAT_Score
  +--------+---------------+
  |    57  |    1240.75    |
  +--------+---------------+

When, as above, an asterisk is specified as the argument of COUNT, it means that the resulting value should be a count of all the tuples in the "target" table (i.e., the table to which the operation is being applied). Had we used, say, COUNT(SAT_Score) instead, the resulting value would be the number of distinct values in the SAT_Score column (of the target table, which, in our example, includes only tuples corresponding to female seniors). If the target table has a key attribute, say K, then COUNT(*) and COUNT(K) are equivalent.

Suppose that we wanted to know how many male students there were, and the average of their SAT scores, and similarly for females. From the examples above, it should be obvious that we could devise two separate queries to ascertain this information, one for males and another for females. It would be nice if we could do it all with a single query. And indeed we can, because we can use a grouping feature to partition the tuples of the target table into separate groups (according to the values in one or more attributes), in which case the aggregate functions are applied to each group of tuples separately.

For example, the SRA query

AGGREGATE Sex, COUNT(*), AVG(SAT_Score)
FROM Student
GROUP BY Sex
would produce a table such as the following:
  Sex   COUNT*   AVG_SAT_Score
+-----+--------+---------------+
| 'F' |  145   |    1040.5     |
| 'M' |  137   |    1038.2     |
+-----+--------+---------------+

What the first tuple of this table says (literally) is that the Student table has 145 tuples having value 'F' in the Sex column, and the average of the SAT_Score values in these tuples is 1040.5. The second tuple's interpretation is analogous, of course.

Had we omitted the mention of the Sex attribute in the first line of the query, we would have obtained the same table, except without the Sex attribute. Hence, we would have been able to conclude that there were 145 students of one sex and 137 of the other (each having the reported SAT score averages), but we wouldn't know which was which. For this reason, we usually include the grouping attribute(s) in the result. On the other hand, it would not make sense to mention a non-grouping attribute on the first line of an AGGREGATE query, except as the argument of one of the aggregate functions. For example, the following query makes no sense, because there is no way to supply a meaningful value for Name in each tuple of the result.

AGGREGATE Name, Sex, COUNT(*), AVG(SAT_Score)
FROM Student
GROUP BY Sex

As suggested by the phrase "grouping attribute(s)" above, there can be more than one grouping attribute. Thus, for example, if we wanted to group students by sex and class, we could have said

AGGREGATE Sex, Class, COUNT(*), AVG(SAT_Score)
FROM Student
GROUP BY Sex, Class
The result would then have looked like this:
  Sex   Class   COUNT*   AVG_SAT_Score
+-----+-------+--------+---------------+
| 'F' |   1   |   37   |    1023.5     |
| 'F' |   2   |   40   |    1054.1     |
| 'F' |   3   |   29   |    1033.7     |
| 'F' |   4   |   35   |    1019.5     |
| 'M' |   1   |   40   |    1044.7     |
|  .  |   .   |    .   |      .        |
|  .  |   .   |    .   |      .        |
+-----+-------+--------+---------------+

As for the RA notation for aggregates, here are what the (valid) queries above (which were given in SRA notation) would look like in RA notation:

AVG(SAT_Score), MIN(SAT_Score), MAX(SAT_Score)(Student)

AVG(SAT_Score), MIN(SAT_Score), MAX(SAT_Score)Class=4(Student))

COUNT(*), AVG(SAT_Score)Sex = 'F' AND SAT_Score > 1100(Student))

SexSex, COUNT(*), AVG(SAT_Score)(Student)

Sex,ClassSex, Class, COUNT(*), AVG(SAT_Score)(Student)

Exercises

For each of the following English-language queries, devise an "equivalent" query in either or both of the RA or SRA notations. A few of the queries assume that Teaches is a third table in the database, with attributes Course_ID and Faculty_ID, with each tuple having the meaning that the specified course is taught by the specified faculty member.

1. List the sex and class of every student named "Chris".
2. List the course IDs of all courses in which a student named "Chris" is enrolled.
3. List the names of all students enrolled in the course CIL 102.
4. List the ID's of students enrolled in a course taught by a faculty member whose ID is "Sarek".
5. List the name and class of any student enrolled in a course taught by a faculty member whose ID is "Sarek".
6. List the names of all students enrolled in a course in which a student named "Chris" is enrolled.
7. List the ID's of all courses in which there are enrolled students from different classes. (In our example, CIL 102 would be such a course, because both Mary (class 2) and Jim (class 1) are enrolled in it.)
8. List the number of students enrolled in the course CIL 102.
9. For each student, list her/his name and the number of courses in which (s)he is enrolled.
10. For each course, list its ID and the number of students enrolled in it.
11. For each course, list its ID, the ID of the faculty member teaching it, and the number of students enrolled in it.
12. For each course taught by Knuth, list its ID together with the number of students who are enrolled in it.
13. Find the average # of students enrolled in each course.
14. For each faculty member, find the average # of students enrolled in the courses (s)he is teaching.