Lesson 6	Subquery statements and options
Objective	Examine two options for using subquery statements.

SQL Subquery Statement Options

When you use the SQL subquery approach, you have two options

You can use an IN clause, as shown in the example shown in the previous lesson, or
you can use the EQUAL qualifier.

Your choice will depend on a simple question:
Question: Are you expecting your subquery statement to return only one value, or several?
Some examples will help explain. First, with subquery statements, it is important to understand that the SELECT statement you put in the parentheses is executed first, then the results are used to work with the outer SELECT statement. Let us take another look at how the subquery statement from the previous lesson works. Examine the series of images below to see the evaluation process for a subquery statement.

1) This is the original SQL query with a sub-SELECT statement — 1) This is the original SQL query using a subquery statement

2) This is the sub-SELECT statement. The Publishers table is queried first to return results from this sub-SELECT statement. — 2) This is the subquery statement. The Publishers table is queried first to return results from this subquery statement, which returns Pub_ID column values that have a corresponding state of "CA". In this case, the Publishers query returns only a single row, and the Pub_ID is 1389.

3) If you substitute the value of 1389 in the outer query, the engine is actually using this query, the engine is actually using this query.

4) An analogous process is the method of completing mathematical calucations by completing calculations in parentheses — 4) An analogous process is the method of completing mathematical calculations by completing calculations in parentheses in parentheses first, working from the inside to the outside.

Subquery Options

The subselect obtains one row of the surrounding SELECT after the next as an parameter with the name 'p'. Within the subselect all columns of the row 'p' are known and may be used. Here the family name from the outside row is used in the subquery to find all persons within the family and the average weight of the family members.
Subselect queries are executed once per row of the surrounding query. Therefore they are much more costly than non-correlated subqueries. There might exist an equivalent solution using JOIN or GROUP BY which works with better performance. The query optimizer of the DBMS internally might rearrange the given subquery into one of the equivalent forms and this does not work in all cases. The distinction between correlated and non-correlated subqueries is universal. It applies also to the other subquery classes.

SELECT id, firstname, lastname, weight,
    (SELECT avg(weight)
     FROM   person sq                          
     WHERE  sq.lastname = p.lastname  
    ) family_average                  
FROM   person p                          
ORDER BY lastname, weight;

The concept of a subquery function is as follows: One select query is nested inside another query, creating a resource otherwise unavailable for searching in a single statement. The subquery allows query consolidation and the task of result set comparisons is placed on the database rather than application code. One way to use this functionality is by locating a value related by column with comparable data in two tables.

Subquery Statement

Using the standard query, you can nest the outer query known as a subquery, which is termed the inner query:

SELECT MemberId FROM MemberDetails
WHERE MemberId = (SELECT MAX(FilmId) FROM Films);

A WHERE clause is added to the outer query, and it specifies that MemberId must equal the value returned by the nested inner query, which is contained within brackets in the preceding example. It is also possible to nest a subquery inside the inner query. Consider the following example:

SELECT MemberId FROM MemberDetails
WHERE MemberId = (SELECT MAX(FilmId) FROM Films
WHERE FilmId IN (SELECT LocationId FROM Location));

In the preceding example, a subquery is added to the WHERE clause of the inner query.

SQL Data Analytics

Example of a Subquery being rewritten as a Join

I'll provide an example of a subquery being rewritten as a join, which is often more efficient in terms of database performance.
Let's consider a simple scenario with two tables: 1) Employees and 2) Departments.
Subquery Example:
Assume we want to retrieve all employees along with their department names, but only for departments located in 'New York'. Here's how you might write this with a subquery:

SELECT e.EmployeeID, e.FirstName, e.LastName, d.DepartmentName
FROM Employees e
WHERE e.DepartmentID IN (
    SELECT DepartmentID 
    FROM Departments 
    WHERE Location = 'New York'
);

Join Equivalent:
Now, let us rewrite this query using a join, which can be more efficient because it allows the database to use indexes more effectively and can often result in a single pass over the data:

SELECT e.EmployeeID, e.FirstName, e.LastName, d.DepartmentName
FROM Employees e
INNER JOIN Departments d ON e.DepartmentID = d.DepartmentID
WHERE d.Location = 'New York';

Explanation:

Subquery:- The subquery first scans the `Departments` table to find all `DepartmentID`s where `Location` is 'New York'.
Then, the main query uses this list to filter the `Employees` table.
1. Join: The `INNER JOIN` combines rows from `Employees` and `Departments` based on the `DepartmentID`, effectively achieving the same result but in a way that often allows for better optimization by the database engine.
2. The `WHERE` clause filters this combined result set for only those departments in 'New York'.

Efficiency:

Join Operations:Joins typically perform better because:
- They allow for index usage on join conditions, which can be very efficient.
- The database can often perform a single pass over the data instead of two separate operations.
- Joins can take advantage of various join algorithms depending on the size of data, indexes present, and other factors.
Subqueries:While subqueries can be useful for complex logic or nested conditions, they often result in:
- Multiple scans of tables (in this case, two separate operations).
- Less opportunity for the database to optimize the query as a whole since it might treat the subquery as somewhat independent.

This transformation from a subquery to a join not only simplifies the SQL in many cases but also leverages the database's capabilities to optimize data retrieval more efficiently. However, the actual performance gain can vary based on the database system, the size of the data, the presence of indexes, and how well the query optimizer can handle each type of query.