Lesson 7	Primary keys and unique constraints
Objective	Identify primary keys, unique constraints, and explain their purposes.

Primary Keys and Unique Constraints

We saw in the preceding lessons that data normalization requires one or more columns to uniquely identify a row in a table. Primary key columns cannot contain any null values and the combination of these columns is called the primary key. Only rarely would you have a table that does not contain a primary key. When a primary key is defined in the database, the SQL Server enforces entity integrity by preventing any data from being inserted or updated that violates the primary key. The primary key is violated when data is duplicated in such a way that the values for the primary-key columns appear in more than one row of data.

Unique Constraints: A unique constraint ensures that values entered into the database for the given column or sets of columns contain unique values. A unique constraint is similar to a primary key constraint, except that it does not have to uniquely identify a row of data. Instead, it does not allow duplicate values for a column in more than one row. In addition, a unique constraint can contain null values. Finally, you can create multiple unique constraints within a table, but you can only create a single primary key constraint. When you create a unique constraint, SQL Server 2012 automatically creates a unique index. This unique index is the method by which SQL Server enforces uniqueness within a column or sets of columns. An example of where a unique constraint might be used is in a column with the identity flag. An identity flag is an attribute that ensures that values in a column are automatically incremented and unique within the column. However, this column does not have to be part of the primary key. Below is an example of a unique constraint, on a column named SSN (for Social Security Number). In this example, the SSN is not required because it may not be known at the time of data entry. Therefore, SSN cannot be a primary key. However we can still enforce the uniqueness of the values in this column with a unique constraint, which creates a unique index.

SQL Server 2019

Primary Key Constraints

Before I define what a primary key^[1] actually is, I am going to digress slightly into a brief discussion of relational databases. Relational databases are constructed on the idea of being able to relate data. Therefore, it becomes critical in relational databases for most tables (there are exceptions, but they are very rare) to have a unique identifier for each row. A unique identifier allows you to accurately reference a record from another table in the database, thereby forming a relation between those two tables. This is a wildly different concept from the one used with old mainframe environments or the ISAM databases (dBase, FoxPro, and so on) of the 1980s and early 1990s. In those environments, you dealt with one record at a time and would generally open the entire table, and go one record at a time until you found what you were looking for. If you needed data from a second table, you had to open that table separately and fetch that table's data, and then mix the data programmatically yourself. They must contain unique values (and hence cannot be NULL). Because of their importance in relational databases, primary keys are the most fundamental of all keys and constraints.
Do not confuse the primary key, which uniquely identifies each row in a table, with a GUID, which is a more specific tool, typically used to idthisentify something across all space and time. Although a GUID can certainly be used as a primary key, it incurs some overhead, and is usually not called for when you are only dealing with the contents of a table. Indeed, the only common place that a GUID becomes particularly useful in a database environment is as a primary key when dealing with replicated or other distributed data.

Table consisting of 1) EmployeeID, 2) LastName and 3) SSN

Use of Candidate Keys in Data Modeling

"Candidate keys" are a crucial data modeling concept when designing a table in SQL Server (or any relational database). Here's a detailed explanation:
What are Candidate Keys?

Definition:
- A candidate key is a set of one or more columns in a table that can uniquely identify a row in that table.
- Each candidate key is a potential primary key. However, only one candidate key is chosen to be the primary key.
Uniqueness:
- Candidate keys must contain unique values, ensuring that no two rows in the table have the same values for these columns.
- This uniqueness constraint is essential for maintaining data integrity.
Non-Nullability:
- All columns that make up a candidate key must not allow NULL values.
- This is because NULL values would violate the uniqueness requirement.

Importance in Data Modeling

Identification of Records:
- Candidate keys are used to identify records uniquely within a table.
- They ensure that each record can be distinguished from others, which is fundamental for data retrieval and manipulation.
Data Integrity:
- Enforcing candidate keys helps maintain data integrity by preventing duplicate records.
- This is crucial for ensuring accurate and reliable data in the database.
Basis for Primary Key Selection:
- During the design phase, all candidate keys are identified, and then one is selected to be the primary key.
- The primary key is the main identifier for records and is used for indexing and relationships with other tables.

Example in SQL Server
Consider a table `Employees` with the following columns: `EmployeeID`, `SSN`, `Email`.

Possible candidate keys could be:
- `EmployeeID`: Each employee has a unique employee ID.
- `SSN`: Each employee has a unique social security number.
- `Email`: Each employee has a unique email address.

From these candidate keys, you would choose one to be the primary key. For example, you might select `EmployeeID` as the primary key, while `SSN` and `Email` remain as candidate keys.
SQL Example

CREATE TABLE Employees (
    EmployeeID INT NOT NULL,
    SSN VARCHAR(11) NOT NULL,
    Email VARCHAR(255) NOT NULL,
    Name VARCHAR(255),
    PRIMARY KEY (EmployeeID),
    UNIQUE (SSN),
    UNIQUE (Email)
);

In this example:

`EmployeeID` is chosen as the primary key.
`SSN` and `Email` are defined as unique constraints, representing the other candidate keys.

Summary
Candidate keys are essential in the design of SQL Server tables as they ensure each row can be uniquely identified. They play a critical role in maintaining data integrity and serve as the basis for selecting the primary key of the table.
Now that you have learned how to enforce entity integrity with primary keys, in the next lesson you will learn how to enforce relational integrity with foreign keys.

[1] Primary keys are the unique identifiers for each row.