Lesson 2	Table joins
Objective	Explain how table joins are formed.

How SQL Table Joins Are Formed Using Primary and Foreign Keys

When you normalize a database, you deliberately spread data across multiple tables so that each table represents a single, well-defined entity. That design improves data quality, but it also means answers to real business questions are rarely found in just one table. You use SQL joins to re-assemble related rows at query time so the database stays normalized while your queries still return complete, business-friendly results.

This lesson introduces how table joins are formed in SQL using a concrete example from the Stories on CD project. The same techniques apply across modern relational database systems (Oracle Database, PostgreSQL, SQL Server, MySQL, and cloud platforms that implement ANSI SQL join syntax).

Table Joins Theory

In a typical application, different tables store different aspects of the domain:

CUSTOMER stores information about customers.
ORDER stores information about customer orders.
LINE ITEM stores which products appear on each order.
CD stores information about CDs being sold.
DISTRIBUTOR stores information about the companies that supply those CDs.

A single business question—such as “Which distributor supplied this CD on this customer’s order?”—requires data from several of these tables. SQL joins are the mechanism that combines rows from multiple tables based on related key values.

Modern SQL defines several join types, including:

INNER JOIN – returns only matching rows in both tables.
LEFT OUTER JOIN – returns all rows from the left table plus any matches from the right.
RIGHT OUTER JOIN – returns all rows from the right table plus any matches from the left.
FULL OUTER JOIN – returns all rows from both tables, with matches where they exist.

In this lesson, we focus on the fundamentals of forming joins, especially inner joins using primary and foreign keys.

How Are Table Joins Formed in SQL?

At its core, a join combines rows from two tables whenever the values in the join columns match. Typically:

One table contributes the primary key value.
Another table contributes a matching foreign key value.

The database engine uses this relationship to produce a result set that looks like a single, wider table containing columns from both sources.

A common, ANSI-compliant pattern for joining two tables is:

SELECT column_list
FROM   table1 AS t1
JOIN   table2 AS t2
  ON   t1.key_column = t2.key_column;

-- Example using the CD and DISTRIBUTOR tables
SELECT c.CDNo,
       c.Title,
       d.DistName
FROM   CD AS c
JOIN   DISTRIBUTOR AS d
  ON   c.DistID = d.DistID;

Let’s break down the join syntax:

SELECT lists the columns you want in the result (from one or more tables).
FROM identifies the first (or “left”) table in the join.
JOIN identifies the second table and the type of join (INNER, LEFT, RIGHT, FULL, etc.).
ON specifies the join condition, typically matching a foreign key in one table to a primary key in another.

Using explicit JOIN ... ON ... syntax is the modern standard. It makes queries easier to read, easier to maintain, and it keeps join conditions separate from filtering conditions in the WHERE clause.

When designing a relational database, a key rule is that each table should represent a single entity type. You still need to answer questions that span multiple entities—such as “Which CDs has this customer ordered, and from which distributor?” Rather than duplicate data across tables, you:

Assign each table a primary key.
Use foreign keys to reference related rows in other tables.
Use SQL joins at query time to combine the related rows.

This approach preserves normalization and reduces redundancy while still allowing you to retrieve rich, cross-table views of your data.

The following ER diagram shows the entities used in the course project and their key relationships.

📘 **Entities and Key Attributes**
Below is each table and its key attributes (including PK and FK indicators as shown in the diagram).

**CUSTOMER**

**Primary Key:** CustID

**Attributes:**

CustID PK

CustLast

CustFirst

CustStreet

CustCity

CustState

CustZip

CustPhone

**ORDER**

**Primary Key:** OrderNo

**Foreign Key:** CustID → CUSTOMER(CustID)

**Attributes:**

OrderNo PK

CustID FK

OrderDate

**LINE ITEM**

**Composite Primary Key:** (OrderNo, CDNo)

**Composite Foreign Keys:**

OrderNo → ORDER(OrderNo)

CDNo → CD(CDNo)

**Attributes:**

OrderNo **CPK / FK**

CDNo **CPK / FK**

Quantity

SellingPrice

LineCost

Shipped

CD

**Primary Key:** CDNo

**Foreign Key:** DistID → DISTRIBUTOR(DistID)

**Attributes:**

CDNo PK

CDTitle

DistID FK

RetailPrice

AgeGroup

Description

**DISTRIBUTOR**

**Primary Key:** DistID

**Attributes:**

DistID PK

DistName

DistStreet

DistCity

DistState

DistZip

DistPhone

ERD Diagram consisting of CUSTOMER, ORDER, LINE ITEM, DISTRIBUTOR, and CD table.

Entity Diagram

Every entity in the project ER diagram shares at least one attribute with another entity. These shared attributes (such as CustID, OrderNo, CDNo, and DistID) are exactly what you use to form joins between tables.

In this lesson we focus on the relationship between CD and DISTRIBUTOR. Both tables contain DistID:

In DISTRIBUTOR, DistID is the primary key.
In CD, DistID is a foreign key pointing back to DISTRIBUTOR.

Because every CD has exactly one distributor, joining these tables on DistID gives you a clean one-to-many relationship: one distributor, many CDs.

Attribute	Description
CDNo	Unique identifier for each CD (Primary Key)
Title	Name or title of the CD
DistID	Distributor ID (Foreign Key referencing DISTRIBUTOR.DistID)

Attribute	Description
DistID	Unique identifier for each distributor (Primary Key)
DistName	Name of the distributor

You can create a join between the CD and DISTRIBUTOR tables whenever the DistID values match. The result set returned by the query behaves like a new table that combines columns from both tables. We often call this a virtual table because it exists only for the duration of the query and is not stored permanently in the database.

Conceptually, the virtual table for this join looks like the illustration below.

**📀 CD Table – Attributes**
The CD table contains:

Attribute Description

CDNo Unique identifier for each CD (Primary Key)

Title Name or title of the CD

DistID Distributor ID (Foreign Key referencing DISTRIBUTOR.DistID)

**CD (Attributes):**

CDNo

Title

DistID

**🏢 DISTRIBUTOR Table – Attributes**
The DISTRIBUTOR table contains:

Attribute Description

DistID Unique identifier for each distributor (Primary Key)

DistName Name of the distributor

**DISTRIBUTOR (Attributes):**

DistID

DistName

**🟨 Virtual Table (Join Result)**
The virtual table produced by joining CD and DISTRIBUTOR on `DistID` contains:

CDNo

Title

DistID

DistName

This is not a physical table in the database; it is the logical result of the join query. Create a VirtualTable from the CD and DISTRIBUTOR table.

Attribute	Description
CDNo	Unique identifier for each CD (Primary Key)
Title	Name or title of the CD
DistID	Distributor ID (Foreign Key referencing DISTRIBUTOR.DistID)

Attribute	Description
DistID	Unique identifier for each distributor (Primary Key)
DistName	Name of the distributor

Primary Key Field

It is critical that the column you use on at least one side of the join be a primary key. In our example, DistID is the primary key of the DISTRIBUTOR table. That guarantees:

Each DistID value identifies exactly one distributor.
Any CD row that references a given DistID can join to at most one distributor row.

If the value in a supposed key column pointed to more than one distributor, a join would return multiple distributor rows for a single CD, violating the intended one-to-many relationship.

Modern relational databases enforce this consistency through primary key and foreign key constraints. When you design tables with clear keys and then join on those keys, your SQL joins remain predictable, performant, and easy to reason about. In the next lesson, you will explore the different join types and see how inner, left, right, and full outer joins behave in practice.