Lesson 4 | GROUP BY vs. sorting in SQL |
Objective | Create GROUP BY clause to sort |
GROUP BY Statement versus Sorting in SQL
Create a GROUP BY clause to sort and group results from a query.
When you use the GROUP BY clause, you need to indicate which columns you are pulling from the table. These columns will be used by the engine to determine where the breaks in your groups will be. If you are familiar with sorting, you might think of the GROUP BY clause as a sorting routine.
For example, suppose you have the following names to sort (i.e., alphabetize):
James
John
Jabernathy
Jinkies
Jobs
GROUP BY clause
Because all the names begin with the letter "J," you alphabetize by comparing the second letter, then the third, and so on.
Working with the GROUP BY clause is much the same. When the engine considers your information, it will initially look at the first column.
If the values are the same, the engine considers the second column, and so forth. Using this sorting routine, the SQL engine determines how to group the information. One thing to remember about the GROUP BY clause is that you must include all columns taken from your table(s) in the clause. For example, the following statement is incorrect and will generate either an error or no results at all,
depending on your engine:
SELECT * FROM Authors
GROUP BY au_Lname
The reason is that you are selecting several different columns from the table, but only listing the single column as part of the GROUP BY clause.
Since the engine may have to use the additional columns to resolve the sorting, it needs clear instructions about how to use those other columns.
Apply Aggregate Functions
In many cases we want to apply the aggregate functions to subgroups of tuples in a relation, where the subgroups are based on some attribute values. We may want to find the average salary of employees in each department or the number of employees who work on each project. In such a situation we need to partition the relation into non-overlapping subsets (or groups) of tuples. Each group (partition) will consist of the tuples that have the same value of some attribute(s), called the grouping attribute(s). We can then apply the function to each such group independently to produce summary information about each group. SQL has a GROUP BY clause for this purpose. The GROUP BY clause specifies the grouping attributes, which should also appear in the SELECT clause, so that the value resulting from applying each aggregate function to a group of tuples appears along with the value of the grouping attribute(s).
Query: For each department, retrieve the department number, the number of employees in the department, and their average salary.
SELECT DNo, COUNT (*), AVG (Salary)
FROM EMPLOYEE
GROUP BY DNo;
In the query above, the EMPLOYEE tuples are partitioned into groups, each group having the same value for the grouping attribute DepartmentNumber. Hence, each group contains the employees who work in the same department. The COUNT and AVG functions are applied to each such group of tuples. Notice that the SELECT clause includes only the grouping attribute and the aggregate functions to be applied on each group of tuples. Over the next several lessons the sub-SELECT statements will be examined.
Group By Clause Exercise
Ad SQL Guide