Database Analysis   «Prev  Next»

Lesson 2 Table joins
Objective Explain how table joins are formed.

How SQL Table Joins Are Formed Using Primary and Foreign Keys

When you normalize a database, you deliberately spread data across multiple tables so that each table represents a single, well-defined entity. That design improves data quality, but it also means answers to real business questions are rarely found in just one table. You use SQL joins to re-assemble related rows at query time so the database stays normalized while your queries still return complete, business-friendly results.

This lesson introduces how table joins are formed in SQL using a concrete example from the Stories on CD project. The same techniques apply across modern relational database systems (Oracle Database, PostgreSQL, SQL Server, MySQL, and cloud platforms that implement ANSI SQL join syntax).

Table Joins Theory

In a typical application, different tables store different aspects of the domain:
  • CUSTOMER stores information about customers.
  • ORDER stores information about customer orders.
  • LINE ITEM stores which products appear on each order.
  • CD stores information about CDs being sold.
  • DISTRIBUTOR stores information about the companies that supply those CDs.
A single business question—such as “Which distributor supplied this CD on this customer’s order?”—requires data from several of these tables. SQL joins are the mechanism that combines rows from multiple tables based on related key values.

Modern SQL defines several join types, including:
  1. INNER JOIN – returns only matching rows in both tables.
  2. LEFT OUTER JOIN – returns all rows from the left table plus any matches from the right.
  3. RIGHT OUTER JOIN – returns all rows from the right table plus any matches from the left.
  4. FULL OUTER JOIN – returns all rows from both tables, with matches where they exist.
In this lesson, we focus on the fundamentals of forming joins, especially inner joins using primary and foreign keys.

How Are Table Joins Formed in SQL?

At its core, a join combines rows from two tables whenever the values in the join columns match. Typically:
  • One table contributes the primary key value.
  • Another table contributes a matching foreign key value.
The database engine uses this relationship to produce a result set that looks like a single, wider table containing columns from both sources.

A common, ANSI-compliant pattern for joining two tables is:

SELECT column_list
FROM   table1 AS t1
JOIN   table2 AS t2
  ON   t1.key_column = t2.key_column;

-- Example using the CD and DISTRIBUTOR tables
SELECT c.CDNo,
       c.Title,
       d.DistName
FROM   CD AS c
JOIN   DISTRIBUTOR AS d
  ON   c.DistID = d.DistID;

Let’s break down the join syntax:
  1. SELECT lists the columns you want in the result (from one or more tables).
  2. FROM identifies the first (or “left”) table in the join.
  3. JOIN identifies the second table and the type of join (INNER, LEFT, RIGHT, FULL, etc.).
  4. ON specifies the join condition, typically matching a foreign key in one table to a primary key in another.
Using explicit JOIN ... ON ... syntax is the modern standard. It makes queries easier to read, easier to maintain, and it keeps join conditions separate from filtering conditions in the WHERE clause.
When designing a relational database, a key rule is that each table should represent a single entity type. You still need to answer questions that span multiple entities—such as “Which CDs has this customer ordered, and from which distributor?” Rather than duplicate data across tables, you:
  • Assign each table a primary key.
  • Use foreign keys to reference related rows in other tables.
  • Use SQL joins at query time to combine the related rows.
This approach preserves normalization and reduces redundancy while still allowing you to retrieve rich, cross-table views of your data.

The following ER diagram shows the entities used in the course project and their key relationships.

ERD Diagram consisting of CUSTOMER, ORDER, LINE ITEM, DISTRIBUTOR, and CD table
📘 Entities and Key Attributes
Below is each table and its key attributes (including PK and FK indicators as shown in the diagram).
  1. CUSTOMER
    • Primary Key: CustID
    • Attributes:
      • CustID PK
      • CustLast
      • CustFirst
      • CustStreet
      • CustCity
      • CustState
      • CustZip
      • CustPhone
  2. ORDER
    • Primary Key: OrderNo
    • Foreign Key: CustID → CUSTOMER(CustID)
    • Attributes:
      • OrderNo PK
      • CustID FK
      • OrderDate
  3. LINE ITEM
    • Composite Primary Key: (OrderNo, CDNo)
    • Composite Foreign Keys:
      • OrderNo → ORDER(OrderNo)
      • CDNo → CD(CDNo)
    • Attributes:
      • OrderNo CPK / FK
      • CDNo CPK / FK
      • Quantity
      • SellingPrice
      • LineCost
      • Shipped
  4. CD
    • Primary Key: CDNo
    • Foreign Key: DistID → DISTRIBUTOR(DistID)
    • Attributes:
      • CDNo PK
      • CDTitle
      • DistID FK
      • RetailPrice
      • AgeGroup
      • Description
  5. DISTRIBUTOR
    • Primary Key: DistID
    • Attributes:
      • DistID PK
      • DistName
      • DistStreet
      • DistCity
      • DistState
      • DistZip
      • DistPhone
ERD Diagram consisting of CUSTOMER, ORDER, LINE ITEM, DISTRIBUTOR, and CD table.

Entity Diagram

Every entity in the project ER diagram shares at least one attribute with another entity. These shared attributes (such as CustID, OrderNo, CDNo, and DistID) are exactly what you use to form joins between tables.

In this lesson we focus on the relationship between CD and DISTRIBUTOR. Both tables contain DistID:
  • In DISTRIBUTOR, DistID is the primary key.
  • In CD, DistID is a foreign key pointing back to DISTRIBUTOR.
Because every CD has exactly one distributor, joining these tables on DistID gives you a clean one-to-many relationship: one distributor, many CDs.


Create a join between the CD and DISTRIBUTOR table
📀 CD Table – Attributes
The CD table has three key attributes:

Attribute Description
CDNo Unique identifier for each CD (Primary Key)
Title Name or title of the CD
DistID Distributor ID (Foreign Key referencing DISTRIBUTOR.DistID)

CD (Attributes):
  • CDNo
  • Title
  • DistID

🏢 DISTRIBUTOR Table – Attributes
The DISTRIBUTOR table has two attributes:
Attribute Description
DistID Unique identifier for each distributor (Primary Key)
DistName Name of the distributor

DISTRIBUTOR (Attributes):
  • DistID
  • DistName

🔗 Relationship Summary
  • CD.DistID → DISTRIBUTOR.DistID
  • Each CD is supplied by exactly one distributor.
  • A distributor may supply many CDs (a one-to-many relationship).
Create a join between the CD and DISTRIBUTOR table.

You can create a join between the CD and DISTRIBUTOR tables whenever the DistID values match. The result set returned by the query behaves like a new table that combines columns from both tables. We often call this a virtual table because it exists only for the duration of the query and is not stored permanently in the database.

Conceptually, the virtual table for this join looks like the illustration below.

Create a VirtualTable from the CD and DISTRIBUTOR table
📀 CD Table – Attributes
The CD table contains:
Attribute Description
CDNo Unique identifier for each CD (Primary Key)
Title Name or title of the CD
DistID Distributor ID (Foreign Key referencing DISTRIBUTOR.DistID)

CD (Attributes):
  • CDNo
  • Title
  • DistID

🏢 DISTRIBUTOR Table – Attributes
The DISTRIBUTOR table contains:
Attribute Description
DistID Unique identifier for each distributor (Primary Key)
DistName Name of the distributor

DISTRIBUTOR (Attributes):
  • DistID
  • DistName

🟨 Virtual Table (Join Result)
The virtual table produced by joining CD and DISTRIBUTOR on DistID contains:
  • CDNo
  • Title
  • DistID
  • DistName
This is not a physical table in the database; it is the logical result of the join query. Create a VirtualTable from the CD and DISTRIBUTOR table.

Primary Key Field

It is critical that the column you use on at least one side of the join be a primary key. In our example, DistID is the primary key of the DISTRIBUTOR table. That guarantees:
  • Each DistID value identifies exactly one distributor.
  • Any CD row that references a given DistID can join to at most one distributor row.
If the value in a supposed key column pointed to more than one distributor, a join would return multiple distributor rows for a single CD, violating the intended one-to-many relationship.

Modern relational databases enforce this consistency through primary key and foreign key constraints. When you design tables with clear keys and then join on those keys, your SQL joins remain predictable, performant, and easy to reason about. In the next lesson, you will explore the different join types and see how inner, left, right, and full outer joins behave in practice.

SEMrush Software 2 SEMrush Banner 2