How SQL Table Joins Are Formed Using Primary and Foreign Keys
When you normalize a database, you deliberately spread data across multiple tables so that each table represents a single, well-defined entity. That design improves data quality, but it also means answers to real business questions are rarely found in just one table. You use SQL joins to re-assemble related rows at query time so the database stays normalized while your queries still return complete, business-friendly results.
This lesson introduces how table joins are formed in SQL using a concrete example from the Stories on CD project. The same techniques apply across modern relational database systems (Oracle Database, PostgreSQL, SQL Server, MySQL, and cloud platforms that implement ANSI SQL join syntax).
Table Joins Theory
In a typical application, different tables store different aspects of the domain:
CUSTOMER stores information about customers.
ORDER stores information about customer orders.
LINE ITEM stores which products appear on each order.
CD stores information about CDs being sold.
DISTRIBUTOR stores information about the companies that supply those CDs.
A single business question—such as “Which distributor supplied this CD on this customer’s order?”—requires data from several of these tables. SQL joins are the mechanism that combines rows from multiple tables based on related key values.
Modern SQL defines several join types, including:
INNER JOIN – returns only matching rows in both tables.
LEFT OUTER JOIN – returns all rows from the left table plus any matches from the right.
RIGHT OUTER JOIN – returns all rows from the right table plus any matches from the left.
FULL OUTER JOIN – returns all rows from both tables, with matches where they exist.
In this lesson, we focus on the fundamentals of forming joins, especially inner joins using primary and foreign keys.
How Are Table Joins Formed in SQL?
At its core, a join combines rows from two tables whenever the values in the join columns match. Typically:
One table contributes the primary key value.
Another table contributes a matching foreign key value.
The database engine uses this relationship to produce a result set that looks like a single, wider table containing columns from both sources.
A common, ANSI-compliant pattern for joining two tables is:
SELECT column_list
FROM table1 AS t1
JOIN table2 AS t2
ON t1.key_column = t2.key_column;
-- Example using the CD and DISTRIBUTOR tables
SELECT c.CDNo,
c.Title,
d.DistName
FROM CD AS c
JOIN DISTRIBUTOR AS d
ON c.DistID = d.DistID;
Let’s break down the join syntax:
SELECT lists the columns you want in the result (from one or more tables).
FROM identifies the first (or “left”) table in the join.
JOIN identifies the second table and the type of join (INNER, LEFT, RIGHT, FULL, etc.).
ON specifies the join condition, typically matching a foreign key in one table to a primary key in another.
Using explicit JOIN ... ON ... syntax is the modern standard. It makes queries easier to read, easier to maintain, and it keeps join conditions separate from filtering conditions in the WHERE clause.
When designing a relational database, a key rule is that each table should represent a single entity type. You still need to answer questions that span multiple entities—such as “Which CDs has this customer ordered, and from which distributor?” Rather than duplicate data across tables, you:
Assign each table a primary key.
Use foreign keys to reference related rows in other tables.
Use SQL joins at query time to combine the related rows.
This approach preserves normalization and reduces redundancy while still allowing you to retrieve rich, cross-table views of your data.
The following ER diagram shows the entities used in the course project and their key relationships.
📘 Entities and Key Attributes
Below is each table and its key attributes (including PK and FK indicators as shown in the diagram).
CUSTOMER
Primary Key: CustID
Attributes:
CustID PK
CustLast
CustFirst
CustStreet
CustCity
CustState
CustZip
CustPhone
ORDER
Primary Key: OrderNo
Foreign Key: CustID → CUSTOMER(CustID)
Attributes:
OrderNo PK
CustID FK
OrderDate
LINE ITEM
Composite Primary Key: (OrderNo, CDNo)
Composite Foreign Keys:
OrderNo → ORDER(OrderNo)
CDNo → CD(CDNo)
Attributes:
OrderNo CPK / FK
CDNo CPK / FK
Quantity
SellingPrice
LineCost
Shipped
CD
Primary Key: CDNo
Foreign Key: DistID → DISTRIBUTOR(DistID)
Attributes:
CDNo PK
CDTitle
DistID FK
RetailPrice
AgeGroup
Description
DISTRIBUTOR
Primary Key: DistID
Attributes:
DistID PK
DistName
DistStreet
DistCity
DistState
DistZip
DistPhone
ERD Diagram consisting of CUSTOMER, ORDER, LINE ITEM, DISTRIBUTOR, and CD table.
Entity Diagram
Every entity in the project ER diagram shares at least one attribute with another entity. These shared attributes (such as CustID, OrderNo, CDNo, and DistID) are exactly what you use to form joins between tables.
In this lesson we focus on the relationship between CD and DISTRIBUTOR. Both tables contain DistID:
In DISTRIBUTOR, DistID is the primary key.
In CD, DistID is a foreign key pointing back to DISTRIBUTOR.
Because every CD has exactly one distributor, joining these tables on DistID gives you a clean one-to-many relationship: one distributor, many CDs.
📀 CD Table – Attributes
The CD table has three key attributes:
Attribute
Description
CDNo
Unique identifier for each CD (Primary Key)
Title
Name or title of the CD
DistID
Distributor ID (Foreign Key referencing DISTRIBUTOR.DistID)
CD (Attributes):
CDNo
Title
DistID
🏢 DISTRIBUTOR Table – Attributes
The DISTRIBUTOR table has two attributes:
Attribute
Description
DistID
Unique identifier for each distributor (Primary Key)
DistName
Name of the distributor
DISTRIBUTOR (Attributes):
DistID
DistName
🔗 Relationship Summary
CD.DistID → DISTRIBUTOR.DistID
Each CD is supplied by exactly one distributor.
A distributor may supply many CDs (a one-to-many relationship).
Create a join between the CD and DISTRIBUTOR table.
You can create a join between the CD and DISTRIBUTOR tables whenever the DistID values match. The result set returned by the query behaves like a new table that combines columns from both tables. We often call this a virtual table because it exists only for the duration of the query and is not stored permanently in the database.
Conceptually, the virtual table for this join looks like the illustration below.
📀 CD Table – Attributes
The CD table contains:
Attribute
Description
CDNo
Unique identifier for each CD (Primary Key)
Title
Name or title of the CD
DistID
Distributor ID (Foreign Key referencing DISTRIBUTOR.DistID)
CD (Attributes):
CDNo
Title
DistID
🏢 DISTRIBUTOR Table – Attributes
The DISTRIBUTOR table contains:
Attribute
Description
DistID
Unique identifier for each distributor (Primary Key)
DistName
Name of the distributor
DISTRIBUTOR (Attributes):
DistID
DistName
🟨 Virtual Table (Join Result)
The virtual table produced by joining CD and DISTRIBUTOR on DistID contains:
CDNo
Title
DistID
DistName
This is not a physical table in the database; it is the logical result of the join query. Create a VirtualTable from the CD and DISTRIBUTOR table.
Primary Key Field
It is critical that the column you use on at least one side of the join be a primary key. In our example, DistID is the primary key of the DISTRIBUTOR table. That guarantees:
Each DistID value identifies exactly one distributor.
Any CD row that references a given DistID can join to at most one distributor row.
If the value in a supposed key column pointed to more than one distributor, a join would return multiple distributor rows for a single CD, violating the intended one-to-many relationship.
Modern relational databases enforce this consistency through primary key and foreign key constraints. When you design tables with clear keys and then join on those keys, your SQL joins remain predictable, performant, and easy to reason about. In the next lesson, you will explore the different join types and see how inner, left, right, and full outer joins behave in practice.