Lesson 2 | GROUP BY clause introduction |
Objective | Understand when to use the GROUP BY clause. |
SQL GROUP BY Clause
The SQL GROUP BY clause can be used in a SELECT statement to collect data across multiple records and group the results by one or more columns.
Syntax:The syntax for the SQL GROUP BY clause is:
SELECT expression1, expression2, ... expression_n,
aggregate_function (aggregate_expression)
FROM tables
WHERE conditions
GROUP BY expression1, expression2, ... expression_n;
Parameters or Arguments
expression1, expression2, ... expression_n |
Expressions that are not encapsulated within an aggregate function and must be included in the GROUP BY Clause at the end of the SQL statement. |
aggregate_function |
This is an aggregate function such as the SUM, COUNT, MIN, MAX, or AVG functions. |
aggregate_expression |
This is the column or expression that the aggregate_function will be used on. |
tables |
The tables that you wish to retrieve records from. There must be at least one table listed in the FROM clause. |
conditions |
These are conditions that must be met for the records to be selected. |
Summarize Data from the Database
One of the most common requests is to summarize data from the database so it appears in a much more usable form.
The SQL language is one that excels at
- selecting rows,
- returning them to the user, and
- allowing the user to make summaries and changes.
It also has some very strong clauses you can add to your SELECT statement that can do some of the work for you when it comes to aggregating, or pulling together, related rows in your results set. The GROUP BY clause lets you tell the SQL engine what rows should be condensed into one, and what elements should be considered when the grouping occurs. This sounds complicated, but it is not. Case Based Reasoning works better for those cases where it is harder to develop a linear model for.
SQL Aggregate Functions
Aggregate functions are used to summarize information from multiple tuples into a single-tuple summary. Grouping is used to create subgroups of tuples before summarization. Grouping and aggregation are required in many database applications, and we will introduce their use in SQL through examples. A number of built-in aggregate functions exist:
- COUNT,
- SUM,
- MAX,
- MIN, and
- AVG
The COUNT function returns the number of tuples or values as specified in a query. The functions SUM, MAX, MIN, and AVG can be applied to a set or multiset of numeric values and return the sum, maximum value, minimum value, and average (mean) of those values. These functions can be used in the SELECT clause or in a HAVING clause (which we introduce later). The functions MAX and MIN can also be used with attributes that have nonnumeric domains if the domain values have a total ordering among one another. We illustrate the use of these functions with sample queries.
Query: Find the sum of the salaries of all employees, the maximum salary, the minimum salary,
and the average salary.
SELECT SUM (Salary), MAX (Salary),
MIN (Salary), AVG (Salary)
FROM EMPLOYEE;
Obtain the SUM of a COUNT in SQL
In SQL, getting the sum of a count typically means aggregating data across multiple groups and then summing up the counts from each group. You can achieve this by using a subquery. The subquery calculates the count for each group, and then the outer query sums these counts. Here’s a general approach on how to do it:
- Create a Subquery to Count Rows: First, you create a subquery that groups the data as needed and counts the rows for each group.
- Sum the Counts in an Outer Query: Then, use an outer query to sum up all the counts obtained from the subquery.
Here’s an example using a hypothetical table called `orders` where we count the number of orders by `customer_id` and then sum these counts:
SELECT SUM(order_count) AS total_orders
FROM (
SELECT COUNT(*) AS order_count
FROM orders
GROUP BY customer_id
) AS counts;
Explanation
- Inner Query: The inner query `SELECT COUNT(*) AS order_count FROM orders GROUP BY customer_id` counts the number of orders for each `customer_id`.
- Outer Query: The outer query `SELECT SUM(order_count) AS total_orders` sums up all the individual counts that were returned by the subquery.
This approach works well if you need to sum counts of distinct groups. You can adjust the grouping criteria in the subquery based on what counts you need to sum.
The next lesson gives an example.