Lesson 9	The DISTINCT keyword
Objective	Understand how to use the DISTINCT keyword

DISTINCT Keyword in SQL

With the GROUP BY statement, you saw that you can limit results and remove duplicates in grouping situations.
What do you do if you want to remove duplicates but do not need to group the results?
That is where the DISTINCT keyword comes in. DISTINCT is one of the easiest, most straightforward, most underused keywords in SQL.
It is so simple and powerful that you will probably use it often in your queries. Here is an example of the DISTINCT keyword:

SELECT DISTINCT Email 
FROM MailingList

This statement will pull all the unique email values from the MailingList table and return them in the results set. These resultswill be sorted based on email, and you will have no duplicates returned. You can see how this works in the image below:

Removing duplicate email addresses from a table using the distinct keyword

To use the DISTINCT keyword, you simply indicate the column or columns that you want to be unique. In this case, we indicated that the email value should be unique, but you can indicate columns from any table.

SQL Data Analytics

Select distinct values from a Database Table in SQL

To select distinct values from a database table in SQL, you use the DISTINCT keyword. Here's how you can do it:

SELECT DISTINCT column_name
FROM table_name;

Example: Let us say you have a table named employees with a column department. To get all unique department names, you would write:

SELECT DISTINCT department
FROM employees;

With Multiple Columns: If you want to select distinct combinations of more than one column, you can list those columns after DISTINCT:

SELECT DISTINCT column1, column2
FROM table_name;

For example, to get unique combinations of department and job_title from the employees table:

SELECT DISTINCT department, job_title
FROM employees;

Considerations:

Case Sensitivity: Depending on your database system (like MySQL vs. PostgreSQL), DISTINCT might treat 'A' and 'a' as different values. Check your database's documentation for case sensitivity in comparisons.
Performance: Using DISTINCT can be resource-intensive on large datasets because the database needs to sort through all the data to find unique values. It's generally more efficient to select only the columns you need if they're not already indexed.
NULL Values: DISTINCT treats all NULL values as identical. So if your column has multiple NULL entries, DISTINCT will return only one NULL.
Order: The order of the returned results isn't guaranteed unless you use ORDER BY. If you need the results in a specific order:

SELECT DISTINCT column_name
FROM table_name
ORDER BY column_name;

Remember, the exact SQL syntax might vary slightly between different database systems, but SELECT DISTINCT is widely supported across SQL dialects.

Theory behind the "distinct" keyword

If you want to know all the unique values in a record use the DISTINCT keyword. The DISTINCT keyword is added to the SELECT statement's column listing, directly after the SELECT keyword. For example, if someone asks you which cities members come from, you could try a query similar to the following:

SELECT City FROM MemberDetails;

Executing the query gives you the following results:

City
Townsville
Orange Town
New Town
Orange Town
Orange Town
Big City
Windy Village

In the above result set, Orange Town is listed three times because there are three members from that city. But if you simply want a list of the unique places that members live in, you could use the DISTINCT keyword:

SELECT DISTINCT City FROM MemberDetails;

Executing the modified query gives these results:

City
Big City
New Town
Orange Town
Townsville
Windy Village

After the second query is execued, Orange Town is mentioned just once. The DISTINCT keyword works on all columns in combination and all the columns listed in the SELECT statement must be unique. If you change the previous query to include MemberId (which is unique for every single row) and rerun the query, you end up with all the rows:

SELECT DISTINCT City, MemberId FROM MemberDetails;

The results are as follows:

City                 MemberId
Big City             8
New Town             4
Orange Town          5
Orange Town          6
Orange Town          7
Townsville           1
Windy Village        9

Orange Town appears three times because MemberId is unique on each row. Furthermore, using the DISTINCT keyword where one of the columns is always unique is pointless.