Lesson 7 | Use the subquery statement |
Objective | Create subquery statement using the IN keyword. |
Create Subquery Statements using IN Keyword
When you are building your subquery statement, a tip would be to first get the subquery running by working with it against
your table. When you have it returning the results you expect and need, then you can build the outer SELECT statement around it.
This will make it easier to troubleshoot your query, and will make sure you know exactly what the results are that you are expecting.
Subselects allow you to
- query within a result set,
- creatively limit your result set,
- or correlate your results with an otherwise unrelated query in a single call to the database.
Subquery in SQL
A subquery in SQL is a table expression,(for example tx), enclosed in parentheses.
If the table denoted by tx is t, the table denoted by the subquery is t also.
The expression tx cannot be an explicit JOIN expression.
Thus, for example,
( A NATURAL JOIN B )
is not a legal subquery.
By contrast, the following expression is a legal subquery:
SELECT * FROM A NATURAL JOIN B
Subqueries fall into three categories (though the syntax is the same in every case).
The details, partly repeated from earlier chapters, are as follows:
- A table subquery is a subquery that is neither a row subquery nor a scalar subquery.
- A row subquery is a subquery appearing in a position where a row expression is expected. Let rsq be such a subquery; then rsq must denote a table with just one row. Let the table in question be t, and let the single row in t be r; then rsq behaves as if it denoted that row r (in other words, t is coerced to r).
Note: If rsq does not denote a table with just one row, then
- if it denotes a table with n rows (n > 1), an error is raised;
- if it denotes a table with no rows at all, then that table is treated as if it contained just one row, where the row in question contains a null in every column position.
- A scalar subquery is a subquery appearing in a position where a scalar expression is expected. Let ssq be such a subquery; then ssq must denote a table with just one row and just one column. Let the table in question be t, let the single row in t be r, and let the single value in r be v; then ssq behaves as if it denoted that value v (in other words, t is coerced to r, and then r is coerced to v).
Note: If ssq does not denote a table with just one row and just one column, then
- if it denotes a table with m columns (m > 1), an error is raised probably at compile time);
- if it denotes a table with one column and n rows (n > 1), an error is raised (probably at run time);
- if it denotes a table with one column and no rows at all, then that table is treated as if it contained just one row, where the row in question contains a single null.
In the previous lesson, if there was more than one row in the Publishers table with the correct state ("CA"), the query would use commas, as follows:
Create a sub-SELECT statement using the IN keyword
SELECT Title FROM Titles
WHERE pub_id IN ('1389','0736','0877')
Any publisher in the Titles table with a pub_id matching any of the values in the list will appear here.
This is a good way to use one table as a controlling key to another. In this case, the Publishers table is controlling the output for the query against the Titles table.
One final note: be sure to enclose the subquery in parentheses. This is required to make the engine aware of the statement and its start and end point.
IN Operator
The IN operator allows you to specify that you want to match one item from any of those in a list of items.
For example, the following SQL finds all the members who were born in 1642, 1716, or 1777:
SELECT FirstName, LastName, YEAR(DateOfBirth)
FROM MemberDetails
WHERE YEAR(DateOfBirth) IN (1642, 1716, 1777);
The query provides the following results:
FirstName | LastName | YEAR(DateOfBirth) |
Isaac | Newton | 1642 |
Gottfried | Wilhelm | 1716 |
Karl | Gauss | 1777 |
you can also use the IN operator with subqueries.
Instead of providing a list of literal values, a SELECT query provides the list of values.
For example, if you want to know which members were born in the same year that a film in the Films table was released, you would use the following SQL query.
SELECT FirstName, LastName, YEAR(DateOfBirth)
FROM MemberDetails
WHERE YEAR(DateOfBirth)
IN (SELECT YearReleased FROM Films);
Executing this query gives the following results:
FirstName LastName YEAR(DateOfBirth)
Katie Smith 1977
Steve Gee 1967
Doris Night 1997
The subquery
(SELECT YearReleased FROM Films)
returns a list of years from the Films table.
If a member's year of birth matches one of the items in that list, then the WHERE clause is true and the record is included in the final results.
You may have noticed that this is not the only way to get the result. You could have used an INNER JOIN coupled with a GROUP BY statement instead, as shown in the following SQL:
SELECT FirstName, LastName, YEAR(DateOfBirth)
FROM MemberDetails JOIN Films ON YEAR(DateOfBirth) = YearReleased
GROUP BY FirstName, LastName, YEAR(DateOfBirth);
Running this query gives the same results as the previous query.
Question: So which query is best? Unfortunately, there is no definitive answer and this very much depends on the circumstances, the data involved, and the database system involved. Many SQL programmers prefer a join to a subquery and believe that to be the most efficient.
However, if you compare the speed of the two using MS SQL Server 2012, on that system, the subquery is faster by roughly 15 percent. Given how few rows there are in this table example, the difference was negligible in this example, but it might be significant with many records.
Question: What is the best query to use?
You should use the technique that you find easiest, and optimize your SQL code only if problems occur during testing. If you find on a test system with a million records that your SQL runs extremely slow, then you should go back and see whether you can improve your query. There is one area in which subqueries are pretty much essential: when you want to find something is not in a list, something which is very hard to achieve with joins.
For example, if you want a list of all members who were not born in the same year that any of the films in the Films table were released, you would simply change your previous subquery example from an IN operator to a NOT IN operator:
SELECT FirstName, LastName, YEAR(DateOfBirth)
FROM MemberDetails
WHERE YEAR(DateOfBirth) NOT IN (SELECT YearReleased FROM Films);
Subquery Statement Exercise