| Lesson 6 | Rules for Database Table Rows |
| Objective | List the rules for table rows. |
In the relational model, a table represents a set of entities - customers, orders, products, or any other subject about which the database stores information. A row (also called a record or tuple) represents exactly one instance of that entity. A row in the Customers table represents one customer. A row in the Orders table represents one order.
To keep data consistent and reliably queryable, the relational model applies a set of rules to rows. These rules are not arbitrary conventions - they derive from E.F. Codd's foundational work on the relational model and from the mathematical properties of sets. A relation in the mathematical sense is a set of tuples, and sets have well-defined properties: no duplicates, no ordering dependency, and elements that are fully defined by their attributes. The row rules translate those mathematical properties into practical database design requirements.
Every row in a relational table must satisfy three rules:
Consider a CustomerPhone column that contains the value "555-1111, 555-2222".
This violates the atomic values rule because the column holds two phone numbers in a single cell.
SQL has no standard way to extract, compare, or index individual values within a comma-separated
string. Queries that need to find all customers with a specific phone number become fragile and
unreliable. The correct approach is to move multiple values to a related table - a
CustomerPhones table with one phone number per row and a foreign key referencing the
parent Customers table. Each row in CustomerPhones then contains one
atomic phone number value that SQL can query, index, and join on reliably.
In practice, the no-duplicate-rows rule is enforced by the primary key and by unique constraints. The primary key guarantees that no two rows can share the same identifier. Unique constraints on other columns or column combinations (candidate keys) extend this guarantee to other attributes that should be distinct, such as email addresses or national identification numbers. The RDBMS rejects any INSERT or UPDATE that would create a duplicate primary key value or violate a unique constraint.
This rule is what drives the requirement for primary key design in every table. A table without a primary key is a table whose rows cannot be reliably addressed, and a database containing such tables cannot guarantee the consistency of its data.
The three row rules have direct and specific consequences for how primary keys, uniqueness constraints, and atomic value requirements are implemented in a relational database:
| Aspect | Row Rule | Direct Influence | Benefit |
|---|---|---|---|
| Primary Keys | Each row must be uniquely identifiable | Requires a PK on every table participating in relationships | Reliable row identification and safe joins |
| Uniqueness | No duplicate rows allowed | Enforced via PK and UNIQUE constraints on candidate keys | Prevents ambiguity and data errors |
| Atomic Values | Single value per cell | No lists or arrays in columns; 1NF compliance required | Queryability and normalization |
The importance of primary keys becomes most visible when tables are linked through relationships.
Consider a Customers table and an Orders table. Each order belongs to one customer, and the Orders
table references the Customers table through a foreign key on CustomerID.
If the Customers table lacks a primary key - or if the same CustomerID value appears in
more than one row - then an Order row referencing that identifier could match more than one customer
row. This breaks three critical database capabilities:
A primary key prevents all of these problems by guaranteeing that each customer row is uniquely
identifiable. Even when two customers share the same name, their CustomerID values differ,
and every order references exactly one customer record. The primary key is the anchor that makes
multi-table queries reliable.
Primary key characteristics:
Each row in a table represents one instance of the entity the table describes. A row in the Customers table represents one customer. A row in the Products table represents one product. A row in the Orders table represents one order. The columns in that row store the attribute values that describe that specific instance - the customer's name and address, the product's description and price, the order's date and status.
This concept has a surface resemblance to objects in object-oriented programming, but the analogy should not be pushed too far. The distinction is important:
Many applications bridge this gap using an ORM (object-relational mapper) - a software layer that maps database rows to objects in the application's programming language, and maps object field changes back to SQL UPDATE statements. The ORM makes rows behave like objects from the application's perspective, but the underlying database stores only the data. The row is the persisted record of the entity's state; the behavior lives in the application code.
At a high level, a relational database is a collection of tables, each containing rows and columns. This structure superficially resembles a spreadsheet workbook where each sheet is a table and each cell holds a value. The resemblance is real but the differences are significant.
In a spreadsheet, a cell will accept almost anything typed into it - a number, a string, a date, a formula, or nothing at all. The spreadsheet imposes no constraints on what a cell contains or whether the value is consistent with neighboring cells. This flexibility makes spreadsheets useful for ad hoc analysis but unreliable for storing data that multiple applications and users depend on.
In a relational table, each column is defined for a specific kind of value, and the RDBMS enforces
that definition. A DATE column will reject a string that does not represent a valid date.
A NOT NULL column will reject a blank entry. A foreign key column will reject a value
that does not exist in the referenced table. These constraints are what make a relational database
trustworthy as a shared data store.
The concept that governs what values are valid for a column goes by two related but distinct terms: data type and domain[1].
INTEGER, DECIMAL, VARCHAR, DATE,
TIMESTAMP. The data type is enforced by the RDBMS at the storage level.
Consider two columns, both defined as VARCHAR(100): one stores street addresses, the
other stores color names. Both have the same data type, but they belong to completely different
domains. Storing a color name in an address column or an address in a color column would not violate
the data type but would violate the domain - the value is technically storable but semantically
meaningless in that context.
Domain constraints are covered in detail in the next lesson. The distinction introduced here is that
a well-designed column is defined not just by its data type but by the specific domain of values it
is intended to hold. The domain is what turns a generic VARCHAR column into a
well-defined, meaningful attribute in a business model.
A row exists to keep related column values together. Every column in a row stores an attribute of the same entity instance - the same customer, the same order, the same product. This co-location of related attribute values within a single row is why row uniqueness and primary keys are foundational to relational design. Without unique rows, the values in a row cannot be reliably attributed to a specific entity instance, and the fundamental purpose of the table is undermined.
Before moving on, click the link below to review entities and attributes from the course ER diagram:
Entities Attributes Exercise