Lesson 2	Project review
Objective	Prepare for the course project.

Complex Queries for use against a SQL Engine

When you create more complex queries for use against a SQL engine, you will find yourself using the different clauses and techniques we have outlined throughout this course. Just pulling rows from a table, as the case is with a SELECT * FROM... approach, is not going to be enough as you need to start working from larger, more diverse databases and tables. That is where the different techniques come into play. You will need to consider the results to determine what the logical breaks are in the information. Look for ways to group things, ways to eliminate some information from the results set, and so on. Each of the sidebars below contains a quick review of the main topics we have covered in this course. Feel free to skip them if you remember the general concepts, and refer back to them if you need some guidance during the course project.

GROUP BY Statement

The GROUP BY clause will help you break up results based on column values. GROUP BY will let you use some of the functions against logical groupings of the information in your tables. You can also use it to form logical breaks in the information in the results set. For example, if you're working with a sales history table, you can use GROUP BY to help in breaking the information up by state. You can then subtotal the sales values for a state:

SELECT * FROM Sales 
GROUP BY State

The GROUP BY clause, which groups results according to the parameters set forth in the clause, finds out information about a particular record, whereas aggregation summarizes more than one record. This module examines

the COUNT() function, which counts records;
the SUM() function, which adds the value of records together;
the AVG() function, which finds averages; and finally, the MAX() and MIN() functions, which find the lowest and highest values in a set of records.

This section examines the GROUP BY clause, which is used in conjunction with the SELECT statement. It allows you to group identical data into one subset rather than listing each record. The GROUP BY clause is at its most powerful when used with SQL's summarizing and aggregating functions, which are covered in the next section. The aim of this module is to get a handle on how GROUP BY works and how to use it more effectively. Begin by looking at how GROUP BY can answer the question,

Which states do members of the film club live in?

The answer does not require a list of every member and the state they live in; you simply want a list of the specific different states. Use the GROUP BY clause to answer this question, even though strictly speaking SELECT DISTINCT would work just as well:

SELECT State
FROM MemberDetails
GROUP BY State;

The GROUP BY clause must go after any FROM or WHERE clauses in the SELECT statement. All the columns you want to be grouped must be listed in the GROUP BY column list. For example, the preceding code groups by the State column. If you want to include more than one column in the GROUP BY clause, then separate the columns with commas, in the same way that you would separate columns in a SELECT statement's column list. The preceding SQL produces the results shown in the following table. One of the values in the table is NULL, so you end up with one group that is NULL:

Use Views for Web Reporting

You can use views to create "stored windows" to your data. Views include all the general capabilities of the SELECT statement, and can be updateable or read-only. Remember, views are great tools for reporting. You can make your SELECT statement refined and polished, then create a view, and you will not have to reissue the SELECT statement again. You can simply reference the view.
Views are queries used to provide users with a specific view of the data in the database. In doing so, views serve a variety of purposes. They may tie related tables together and then pull selected fields out that specific users need, or they may pull selected interesting rows of data from a table that contains other rows of no interest to specific users. They may summarize large data sets, returning averages, counts, and the like.
Relational databases are sets of data in tables, when taken alone, are rarely of any use to anyone. Views allow the database administrator (DBA) to pull fields of interest from tables of interest and return a coherent data set useful to some specific user or application.

Normalized Database: A normalized database is created by designing tables for each object or event being modeled, with the fields of each table representing an attribute of the object or event. Once the developer creates the normalized structure, it becomes necessary to allow users to view specific pieces of data. The users rarely understand the structure, so the DBA is often tasked with building the views that various users require. A view is nothing more than a SQL statement that is stored in the database with an associated name. Views allow the DBA to do the following:
1. Structure data in a way that users or classes of users find natural or intuitive
2. Restrict access to the data such that a user can see and (sometimes) modify exactly what they need and no more
3. Simplify the DBA's job by building base data structures that the DBA can use to build other views in a natural progression
4. Summarize data for reports

Subquery Statement

A SQL subquery, also known as a nested query or inner query, is a query embedded within another SQL query. The subquery is used to retrieve data that will be used in the main query as a condition to further restrict the data that is being fetched. Subqueries can be used in various parts of a query, including the SELECT, FROM, WHERE, and HAVING clauses. In the context of filtering results, subqueries are commonly used in the WHERE clause. They allow you to filter the main query's results based on complex conditions that can't be defined using standard SQL operators alone. Consider a scenario where you have a "Sales" table that records the sales of various products in a store. Each row in the table represents a transaction, with columns for transaction_id, product_id, sales_date, and sales_amount. Suppose you want to find all transactions where the sales amount is above the average. The SQL statement with a subquery would look something like this:

SELECT transaction_id, product_id, sales_amount 
FROM Sales 
WHERE sales_amount > (SELECT AVG(sales_amount) FROM Sales);

In this case, the subquery (SELECT AVG(sales_amount) FROM Sales) calculates the average sales amount from the "Sales" table. This subquery is run first and the result is used by the main query as a filter. The main query then returns all transactions from the "Sales" table where the sales amount is greater than this average value. Here, the subquery acts as a dynamic filter that is evaluated at runtime. It allows for more flexible conditions, as the filter value doesn't need to be known in advance and can be calculated based on the data itself. In essence, SQL subqueries provide a powerful mechanism to enhance the precision of your data retrieval, permitting intricate relationships to be defined between tables and offering more flexibility in constructing complex queries.

What are Subqueries?
Subqueries are regular queries placed inside parenthesis. Subqueries can be used in different ways and at different locations inside a query: Here is an subquery with the IN operator
```
SELECT columnNames
  FROM tableName1
 WHERE value IN 
(SELECT columnName
   FROM tableName2 
   WHERE condition)
```
Subqueries can also assign column values for each record:
```
SELECT column1 = (SELECT columnName 
FROM tableName WHERE condition),
 columnNames
 FROM tableName
 WEHRE condition
```
When to use a Subquery: There will be times when you need to limit a SELECT statement to values that relate to another table, but a join is not necessarily the answer. In these cases, the subquery comes into play. You can use a SELECT statement as the filtering mechanism to limit the overall results from a query. For example, from the PUBS database, you can say, "Give me all the titles for publishers who are located in CA." The subquery in the query below limits the publisher IDs to only those who live in the state of California:
```
SELECT Title
FROM Titles
WHERE Pub_ID
IN (SELECT Pub_ID
FROM Publishers
WHERE State= 'CA')    
```
The result is a list of titles, but only those published by California publishers.

SQL DISTINCT Keyword and specifying Only Unique Records Be Retrieved

Many times you simply need to filter out duplicated information. For example, with a PUBS query that limits rows to California, you may want to only list each publisher once. You can do this using the DISTINCT keyword, which lets you limit rows returned to only those rows that do not have duplicate values for columns you indicate.

SELECT Title FROM Titles 
  WHERE Pub_ID 
    IN (SELECT DISTINCT Pub_ID 
   FROM Publishers 
   WHERE State= 'CA')

This would filter out any duplicate Publisher rows in California.
Specifying Only Unique Records Be Retrieved

SELECT DISTINCT city
FROM Student

Retrieve specific rows
The other thing that we want to do with a single table is to retrieve just some of the rows. For example, we may want to retrieve information about those students who are doing a science degree or just those students who first enrolled in 2011. Retrieving a subset of the rows is known as a select operation. We need to specify how we will determine which rows we want. We do this by specifying a test condition that, for each row, is either true or false. To find all the science students, we would specify the condition degree = 'Science', while to find all the students entering the university in 2011, the condition would be year = 2011. The condition is checked for each row in turn, and if it is true, then that row is included in the set being retrieved. We can build up more complicated conditions by using operators such as AND, OR, and NOT. For example, if we want just the science students enrolling in 2011, the condition would be

degree = 'Science' and year = 2011.

If we wanted a list of all commerce and arts students (but not any other degree), the condition would be

degree = 'Arts'

OR degree = 'Commerce'.
A select operation is specified in an SQL statement, by using the keyword WHERE followed by the appropriate condition as shown below. The * in the first line means retrieve all the columns or fields for the selected rows.

SELECT *
FROM Student
WHERE degree = 'Science' and year = 2011

One small but important point to bear in mind is that if a field (i.e., degree) has no value, then the truth of a statement such as degree = 'Science' is unknown. SQL queries only return those rows for which the condition statement is known to be true. If we retrieve rows for degree = 'Science' and then retrieve rows for degree < > 'Science' we will miss the rows that have no value in degree because we do not know the value (it might be Science and it might not).
To find those fields which are empty we can use the expression where degree is NULL. Most queries will require a combination of the select and project operations. In this case, the rows are first selected according to the condition, and then the specified columns are retrieved. Rather than seeing all the information about each of our selected students, we may just want to see their ID numbers and names.

Web Reporting Tool

We also talked about views and how you can create stored windows to your data. Views include all the general capabilities of the SELECT statement, and can be updateable or read-only. Remember, views are great tools for reporting. You can get your SELECT statement refined and polished, then create a view, and you will not have to reissue the SELECT statement again. You can simply reference the view.
Views are queries used to provide users with a specific view of the data in the database. In doing so, views serve a variety of purposes. They may tie related tables together and then pull selected fields out that specific users need, or they may pull selected interesting rows of data from a table that contains other rows of no interest to specific users. They may summarize large data sets, returning averages, counts, and the like.
Relational databases are sets of data in tables, when taken alone, are rarely of any use to anyone. Views allow the database administrator (DBA) to pull fields of interest from tables of interest and return a coherent data set useful to some specific user or application.

Normalized Database: A normalized database is created by designing tables for each object or event being modeled, with the fields of each table representing an attribute of the object or event. Once the developer creates the normalized structure, it becomes necessary to allow users to view specific pieces of data. The users rarely understand the structure, so the DBA is often tasked with building the views that various users require. A view is nothing more than a SQL statement that is stored in the database with an associated name. Views allow the DBA to do the following:
1. Structure data in a way that users or classes of users find natural or intuitive
2. Restrict access to the data such that a user can see and (sometimes) modify exactly what they need and no more
3. Simplify the DBA's job by building base data structures that the DBA can use to build other views in a natural progression
4. Summarize data for reports

SQL Query Functions Review

This module covered the functions that you can use in your queries to create new columns of information based on the results of computations. These functions break down into three broad categories:

String- (or character-) based functions that operate on string values. These contain functionalities like pulling the leftmost three characters, adding two strings together to create a new string, and so on.
Numeric functions that work with numbers and are used to summarize, average, and generally calculate values based on values in the table or values hard-coded into the SELECT statement.
Date functions that allow you to manipulate date information, including adding time increments to a known date, calculating differences between two dates, and so on.

Using these date functions against table information will give you natural break points to be used with the GROUP BY capabilities.
Listing 5.3 List the book prices discounted by 10 percent. The derived columns would have DBMSspecific default names if the
AS clauses were removed. See Figure 5.3 for the result.

SELECT title_id, price,
0.10 AS "Discount",
price * (1 - 0.10) AS "New price"
FROM titles;

Listing 5.3. shown above

title_id price Discount New price
-------- ----- -------- ---------
T01 21.99 0.10 19.79
T02 19.95 0.10 17.95
T03 39.95 0.10 35.96
T04 12.99 0.10 11.69
T05 6.95 0.10 6.25
T06 19.95 0.10 17.95
T07 23.95 0.10 21.56
T08 10.00 0.10 9.00
T09 13.95 0.10 12.56
T10 NULL 0.10 NULL
T11 7.99 0.10 7.19
T12 12.99 0.10 11.69
T13 29.99 0.10 26.99

From Clause using Dfferent SQL Dialects

Oracle FROM clause Oracle requires a FROM clause in a SELECT statement and so creates the dummy table DUAL automatically to be used for SELECTing a constant expression; search Oracle documentation for DUAL table. To run Listing 5.1, add a FROM clause that selects the constant value from DUAL:
```
SELECT 2 + 3
FROM DUAL;
```
DB2 FROM Clause: DB2 requires a FROM clause in a SELECT statement and so creates the dummy table SYSIBM.SYSDUMMY1 automatically to be used for SELECTing a constant expression; search DB2 documentation for SYSIBM.SYSDUMMY1. To run Listing 5.1, add a FROM clause that selects the constant value from SYSIBM.SYSDUMMY1:
```
SELECT 2 + 3
FROM SYSIBM.SYSDUMMY1;
```
PostgreSQL FROM clause In older PostgreSQL versions, convert the floating- point number in Listing 5.3 to DECIMAL;
To run Listing 5.3, change the New price calculation in the SELECT clause to:
```
price * CAST((1 - 0.10) AS DECIMAL)
```