| Lesson 8 |
The first, second, and third normal forms |
| Objective |
Understand what the different normal forms are. |
First, Second, and Third Normal Forms (1NF, 2NF, 3NF Explained with Examples)
What Normalization Achieves — A Brief Connector
The Two Goals of Normalization
Normalization serves two fundamental goals that apply regardless of the domain or the size of the database. The first is eliminating redundant data — storing each fact exactly once so that changes require updating only one row in one table. The second is ensuring that data dependencies make sense — organizing columns so that each attribute belongs to the table whose primary key uniquely determines it, with no attributes stranded in a table where they do not logically belong.
Both goals are achieved progressively through the normal forms. Each normal form addresses a specific class of dependency problem. Reaching the Third Normal Form is the standard target for transactional database design; at that point, the schema is free of the redundancy patterns that cause real-world data corruption.
The Three Anomalies — Quick Reference
The motivation for each normal form is the class of anomaly it eliminates. These were covered in detail in lesson 5 and illustrated with the Customer Orders table in lessons 6 and 7. A brief reference here anchors the lesson 8 examples to the same framework.
An insertion anomaly prevents valid data from being recorded because the table structure requires unrelated data to accompany it. In a table that combines student and course data in a single flat structure, a new course cannot be added to the catalog until at least one student is enrolled in it — the table has no place for a course without a student row.
An update anomaly creates inconsistency when a single real-world fact must be changed in multiple rows simultaneously, and any partial failure leaves the database with conflicting values. If a student's name appears in ten rows and an UPDATE touches only nine, two different names exist for the same student.
A deletion anomaly destroys information unintentionally when deleting a record removes data about a different entity that was stored in the same rows. Deleting a student's enrollment record might be the only record of a course's existence — removing the enrollment silently erases the course from the database.
The Progressive Gate Structure — Why Order Matters
The three normal forms are not independent checks that can be applied in any order. They form a strict progression: a table must satisfy 1NF before it can be evaluated for 2NF, and it must satisfy 2NF before it can be evaluated for 3NF. This gate structure exists because each normal form's rules assume the previous form's guarantees are already in place.
First Normal Form establishes the baseline: every cell contains exactly one atomic value, and every row is uniquely identifiable. Without this baseline, the concepts of functional dependency and partial dependency — which 2NF and 3NF address — cannot be evaluated precisely. A column that contains a comma-separated list of values does not have a well-defined functional relationship with the primary key because it contains multiple values, each of which might have a different relationship.
Second Normal Form assumes 1NF and eliminates partial dependencies — situations where a non-key column depends on only part of a composite primary key. Third Normal Form assumes 2NF and eliminates transitive dependencies — situations where a non-key column depends on another non-key column rather than directly on the primary key. Each step is only meaningful after the previous step's guarantees are established.
First Normal Form (1NF) — Eliminate Repeating Groups
The 1NF Rule
A table is in First Normal Form when three conditions are all satisfied: every column contains atomic values — single, indivisible values of a consistent data type — there are no repeating groups of columns, and every row is uniquely identifiable by a primary key. Atomicity is the core requirement. A column value is atomic when it cannot be decomposed into smaller meaningful parts without losing information about the entity the row describes.
The most common 1NF violation is a multi-valued column — a column that stores a list, an array, or a comma-separated set of values in a single cell. Multi-valued columns are a natural temptation when modeling one-to-many relationships: it seems convenient to store all of a student's courses in one column rather than creating multiple rows. But that convenience comes at the cost of every operation that needs to work with individual course values — inserting a new course, removing one course, or finding all students enrolled in a specific course all require parsing the cell contents rather than using standard SQL predicates.
Before 1NF — The Student Courses Table
The following table stores each student's course enrollments as a comma-separated list in a single Courses column:
| StudentID | Name | Courses |
| 1001 | Alice | Math, Physics |
| 1002 | Bob | Chemistry, Biology |
The Courses column is a multi-valued attribute — each cell contains two course names rather than one atomic value. This single violation produces all three anomalies simultaneously. Adding a new course requires either finding a student to assign it to or modifying the string value in an existing cell. Removing Physics from Alice's enrollment requires string manipulation rather than a DELETE statement. Finding all students enrolled in Math requires a LIKE query against a concatenated string rather than a simple equality check.
After 1NF — Atomic Rows
The 1NF solution expands the multi-valued column into multiple rows — one row per student-course combination:
| StudentID | Name | Course |
| 1001 | Alice | Math |
| 1001 | Alice | Physics |
| 1002 | Bob | Chemistry |
| 1002 | Bob | Biology |
Every cell now contains one atomic value. Each row represents one enrollment — one student in one course. A primary key for this table would be the composite (StudentID, Course) — the combination of student identity and course name uniquely identifies each row.
SQL Consequences of a 1NF Violation
Against the unnormalized table, finding all students enrolled in Math requires a string search:
-- 1NF violation: requires LIKE against a concatenated string
SELECT StudentID, Name
FROM Students
WHERE Courses LIKE '%Math%';
The LIKE predicate with a leading wildcard cannot use an index — it forces a full table scan regardless of table size. It also produces false positives: a search for "%Math%" would match "Mathematics" or "Math Methods" if those strings appear in the Courses column. Against the 1NF table, the same query is a clean equality check:
-- 1NF compliant: standard equality predicate, index-eligible
SELECT StudentID, Name
FROM Students
WHERE Course = 'Math';
What 1NF Does Not Fix
After applying 1NF, the Student Courses table has a new problem: Alice's name appears in two rows. This is redundancy — the same fact (Alice's name) stored in multiple places. This redundancy is a 2NF violation, not a 1NF violation. 1NF eliminated the multi-valued column problem but introduced a partial dependency problem. The 2NF check catches it.
Second Normal Form (2NF) — Eliminate Partial Dependencies
The 2NF Rule and Functional Dependency Notation
A table is in Second Normal Form when it is in 1NF and every non-key attribute is fully functionally dependent on the entire primary key. The second condition only matters when the primary key is composite — made up of more than one column. A partial dependency exists when a non-key attribute depends on only a subset of the composite key rather than the full combination.
Using functional dependency notation: if the composite primary key is (StudentID, CourseID), and StudentName depends only on StudentID — written StudentID → StudentName — then StudentName is partially dependent on the key. It belongs in a table where StudentID alone is the primary key, not in a table where StudentID is only half of the key.
Before 2NF — Composite Key with Partial Dependency
The table below has a composite primary key of (StudentID, CourseID). StudentName depends only on StudentID, not on the full composite key:
| StudentID | CourseID | CourseName | StudentName |
| 1001 | C101 | Math | Alice |
| 1001 | C102 | Physics | Alice |
The functional dependencies present in this table are: (StudentID, CourseID) → CourseName and (StudentID, CourseID) → StudentName — but also StudentID → StudentName alone. That second dependency is the partial dependency. StudentName is determined by StudentID without needing CourseID at all. Because Alice is enrolled in two courses, Alice's name appears twice. An update anomaly is already present in this two-row example — changing Alice's name requires touching both rows simultaneously.
After 2NF — Three Decomposed Tables
The 2NF decomposition separates StudentName into its own table where StudentID is the sole primary key, and separates CourseName into a Course table where CourseID is the sole primary key. The Enrollment table retains only the composite key relationship:
Student Table:
| StudentID | StudentName |
| 1001 | Alice |
Enrollment Table:
| StudentID | CourseID |
| 1001 | C101 |
| 1001 | C102 |
Course Table:
| CourseID | CourseName |
| C101 | Math |
| C102 | Physics |
Alice's name now appears in exactly one row in the Student table. Changing Alice's name requires one UPDATE to one row. Adding a new course to the catalog requires one INSERT into the Course table — no student enrollment is required. The partial dependency has been eliminated.
SQL CREATE TABLE for the 2NF Schema
CREATE TABLE Students (
StudentID INT NOT NULL,
StudentName VARCHAR(100) NOT NULL,
CONSTRAINT pk_students PRIMARY KEY (StudentID)
);
CREATE TABLE Courses (
CourseID CHAR(4) NOT NULL,
CourseName VARCHAR(100) NOT NULL,
CONSTRAINT pk_courses PRIMARY KEY (CourseID)
);
CREATE TABLE Enrollments (
StudentID INT NOT NULL,
CourseID CHAR(4) NOT NULL,
CONSTRAINT pk_enrollments PRIMARY KEY (StudentID, CourseID),
CONSTRAINT fk_student FOREIGN KEY (StudentID) REFERENCES Students (StudentID),
CONSTRAINT fk_course FOREIGN KEY (CourseID) REFERENCES Courses (CourseID)
);
The Enrollments table has a composite primary key (StudentID, CourseID) — the combination of student and course uniquely identifies each enrollment. Both columns are also foreign keys, enforcing referential integrity: no enrollment can reference a student or course that does not exist in their respective tables.
SQL Consequences of a 2NF Violation
Against the pre-2NF table, retrieving a list of unique students requires deduplication:
-- 2NF violation: DISTINCT required to compensate for partial dependency redundancy
SELECT DISTINCT StudentID, StudentName
FROM Enrollments;
-- 2NF compliant: Student table contains one row per student by design
SELECT StudentID, StudentName
FROM Students;
The DISTINCT keyword in the first query is compensating for a schema problem — every developer who queries the pre-2NF table must know to use it, and every developer who forgets gets the wrong row count without an error message.
Third Normal Form (3NF) — Eliminate Transitive Dependencies
The 3NF Rule and Transitive Dependency Notation
A table is in Third Normal Form when it is in 2NF and no non-key attribute is transitively dependent on the primary key through another non-key attribute. A transitive dependency exists when A → B and B → C, making C depend on A only indirectly — through the intermediate attribute B. The dependency chain passes through a non-key attribute rather than running directly from the primary key to the dependent attribute.
Using functional dependency notation for the employee example: EmployeeID → Department and Department → DeptLocation. Therefore EmployeeID → DeptLocation transitively through Department. DeptLocation is a fact about Department, not a fact about the employee — it belongs in a Department table where Department is the primary key.
Before 3NF — Employee Table with Transitive Dependency
The following table stores department location alongside employee data:
| EmployeeID | Name | Department | DeptLocation |
| 2001 | John | HR | 1st Floor |
| 2002 | Alice | IT | 3rd Floor |
DeptLocation depends on Department, not directly on EmployeeID. If the HR department moves from the 1st Floor to the 2nd Floor, every row where Department = 'HR' must be updated. In a real employee table with hundreds of HR employees, that is hundreds of rows to update simultaneously. Any partial failure produces inconsistent location data — some HR employees show "1st Floor" while others show "2nd Floor" for the same department. A new department cannot be added to the system until at least one employee is assigned to it, because DeptLocation has no home without an employee row.
After 3NF — Employee and Department Tables
The 3NF decomposition extracts DeptLocation into a Department table where Department is the primary key:
Employee Table:
| EmployeeID | Name | Department |
| 2001 | John | HR |
| 2002 | Alice | IT |
Department Table:
| Department | DeptLocation |
| HR | 1st Floor |
| IT | 3rd Floor |
DeptLocation now appears in exactly one row per department. Moving HR from the 1st Floor to the 2nd Floor requires one UPDATE to one row in the Department table — regardless of how many employees are in HR. Adding a new department requires one INSERT into the Department table with no employee required. The transitive dependency is eliminated.
SQL CREATE TABLE for the 3NF Schema
CREATE TABLE Departments (
Department VARCHAR(50) NOT NULL,
DeptLocation VARCHAR(100) NOT NULL,
CONSTRAINT pk_departments PRIMARY KEY (Department)
);
CREATE TABLE Employees (
EmployeeID INT NOT NULL,
Name VARCHAR(100) NOT NULL,
Department VARCHAR(50) NOT NULL,
CONSTRAINT pk_employees PRIMARY KEY (EmployeeID),
CONSTRAINT fk_department FOREIGN KEY (Department) REFERENCES Departments (Department)
);
The Department column in the Employees table is now a foreign key referencing the Departments table. The database engine enforces that every employee references a valid department, and that no department can be deleted while employees reference it — unless the CASCADE option is specified.
SQL Consequences of a 3NF Violation
Against the pre-3NF table, retrieving department locations requires GROUP BY to avoid duplicate results when multiple employees share a department:
-- 3NF violation: GROUP BY needed to deduplicate department locations
SELECT Department, DeptLocation
FROM Employees
GROUP BY Department, DeptLocation;
-- 3NF compliant: Department table contains one row per department by design
SELECT Department, DeptLocation
FROM Departments;
The GROUP BY in the first query is again compensating for a schema design problem. The second query is simpler, faster, and immune to inconsistency because DeptLocation is stored in one place.
Summary — The Three Normal Forms at a Glance
| Normal Form |
Requirement |
Fixes |
SQL Impact |
| 1NF |
Atomic values only, no repeating groups |
Remove multi-valued fields |
Eliminates LIKE string-search workarounds; enables standard WHERE equality |
| 2NF |
Full dependency on entire primary key |
Eliminate partial dependencies |
Eliminates DISTINCT workarounds for deduplication; enables single-row updates |
| 3NF |
No transitive dependencies |
Break non-key dependencies |
Eliminates GROUP BY workarounds; reduces multi-table UPDATE risk |
Each normal form eliminates a specific SQL workaround that developers would otherwise need to apply to compensate for a schema design problem. A schema in 3NF requires no workarounds — every query can be written directly against the structure the schema provides.
Beyond 3NF — A Brief Note on BCNF
The Boyce-Codd Normal Form (BCNF), defined by Edgar F. Codd and Raymond F. Boyce in 1974, is a slightly stronger version of 3NF that addresses a narrow class of anomaly that can remain in 3NF tables when a table has multiple overlapping candidate keys. For the vast majority of table designs encountered in transactional database development, a table that satisfies 3NF also satisfies BCNF — the distinction only becomes relevant in complex multi-key schemas.
Fourth Normal Form (4NF) and Fifth Normal Form (5NF) address increasingly rare structural problems — multi-valued dependencies and join dependencies — that arise in specialized data modeling scenarios. These forms are rarely pursued in standard transactional database design because the anomalies they prevent are uncommon and the decompositions they require can increase query complexity without a proportionate benefit in data integrity.
For the SQL work covered in this course, Third Normal Form is the correct target. The three lessons that preceded this one — normalization overview, the unnormalized Customer Orders table, and the three-table decomposition — have now been formalized into the 1NF, 2NF, and 3NF rules this lesson covered. The next lessons in this module review the relational and flat-file database concepts covered so far, preparing you for the SQL querying work that follows.
