In the first course in this series, Database Design, you learned how to:
identify business objects in your environment,
represent those objects as entities, and
create an Entity-Relationship diagram (ER diagram) that displays every entity in the database as well as the relationships among those entities.
By following that process, you can translate a sound ER diagram into efficient database tables for application developers to use. Even so, careful adherence to the ER diagram development procedure does not guarantee trouble-free tables. Some of the challenges you may face during data modeling[1] are:
Deleting data results in the inadvertent deletion of other data,
Updating table data takes too long or is done incompletely,
Tables may contain redundant data,
Searching for specific data takes too long.
Fortunately, there is a way to check your tables to ensure they are designed properly.
Normalization is a way to break large tables into smaller, more efficient tables without losing any information.
Because RDBMSs are built on a solid foundation of set theory, there is a well-defined set of rules you can follow to normalize database tables.
The rest of this module describes those rules.
Insertion, Update, and Deletion Anomalies
Insertion, update, and deletion anomalies are significant issues that can arise in database tables, particularly when the design is not fully normalized or lacks comprehensive integrity constraints. These anomalies lead to inconsistencies and make it harder to maintain the integrity of the data. They tend to surface in the initial release of a database, before the design has been rigorously tested and refined.
Insertion Anomalies: These occur when certain data cannot be inserted into the database because other data is missing. For example, in a poorly designed employee database, if an employee has not yet been assigned a department, it might be impossible to add their details to the database at all. This limitation arises when the table's structure makes department information mandatory, even though it does not apply to every entry.
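A minimal sketch of an insertion anomaly, using SQLite from Python; the single-table design and its names (EmployeeDept, dept_name) are invented for illustration, not taken from the course:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Denormalized design: employee and department facts share one table,
# and the department column is mandatory for every row.
conn.execute("""
    CREATE TABLE EmployeeDept (
        emp_id    INTEGER PRIMARY KEY,
        emp_name  TEXT NOT NULL,
        dept_name TEXT NOT NULL
    )
""")

# Insertion anomaly: a new hire with no assigned department cannot be stored.
try:
    conn.execute("INSERT INTO EmployeeDept VALUES (1, 'Ada Li', NULL)")
except sqlite3.IntegrityError as e:
    print("Insert rejected:", e)  # NOT NULL constraint failed: EmployeeDept.dept_name
```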
Update Anomalies: Update anomalies arise when changes in one piece of data require multiple rows of a database to be updated. If the update is not carried out uniformly, it can lead to data inconsistency. For instance, if a customer's address is stored in multiple tables and it changes, each occurrence of the address must be updated. Failure to do so can result in records with conflicting information.
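In the same sketch style (names again invented), a table that repeats the customer's address on every order row shows why an incomplete update leaves conflicting data behind:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Redundant design: the customer's address is repeated on every order row.
conn.execute("CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, "
             "customer TEXT, address TEXT)")
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)",
                 [(1, 'Acme', '12 Elm St'),
                  (2, 'Acme', '12 Elm St'),
                  (3, 'Acme', '12 Elm St')])

# Update anomaly: the update misses orders 2 and 3, leaving two
# different "truths" about one customer's address.
conn.execute("UPDATE Orders SET address = '99 Oak Ave' "
             "WHERE customer = 'Acme' AND order_id = 1")

print(conn.execute("SELECT DISTINCT address FROM Orders "
                   "WHERE customer = 'Acme' ORDER BY address").fetchall())
# [('12 Elm St',), ('99 Oak Ave',)]  the database now contradicts itself
```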
Deletion Anomalies: These anomalies occur when the deletion of data inadvertently leads to the loss of additional, unintended data. An example might be a database where employee information is stored alongside department information. If the last employee of a department is deleted from the database, this might unintentionally result in the loss of the department information itself, if there are no other employees linked to that department.
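One more sketch, for the deletion anomaly: when the last employee row for a department disappears, so does everything the database knew about the department itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Employee and department facts live in the same table.
conn.execute("CREATE TABLE EmployeeDept "
             "(emp_name TEXT, dept_name TEXT, dept_phone TEXT)")
conn.execute("INSERT INTO EmployeeDept VALUES ('Sam Roe', 'Research', 'x4321')")

# Deletion anomaly: removing the department's only employee also removes
# the department's phone number, data we never intended to lose.
conn.execute("DELETE FROM EmployeeDept WHERE emp_name = 'Sam Roe'")
print(conn.execute("SELECT * FROM EmployeeDept "
                   "WHERE dept_name = 'Research'").fetchall())  # [] gone
```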
These anomalies highlight the importance of thoughtful database design, especially in terms of normalization. Proper normalization, typically up to the third normal form, is often recommended to mitigate these issues. Moreover, establishing rigorous integrity constraints and relationships between tables can further help in preventing such anomalies. However, it is important to note that achieving a balance between normalization and practical usability is crucial, as overly normalized databases can become complex and inefficient for certain types of queries and operations.
The examples that follow illustrate how the normalization process breaks large tables into smaller, more efficient ones.
Normalization Creates Tables
Normalization is a process used in database design to organize data into tables and to minimize data redundancy. The process involves breaking down a large table into smaller, more manageable tables, which are linked together by common attributes. The result of normalization is a set of tables that are well-structured, with each table containing a specific set of related data. The normalization process involves several steps, which are typically referred to as normal forms. The most commonly used normal forms are the first normal form (1NF), second normal form (2NF), and third normal form (3NF). In general, higher normal forms represent more advanced levels of normalization, with each level building upon the previous level.
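As a rough sketch of how the levels build on one another, consider an invented OrderLines table keyed on (order_id, product_id) that also stores product_name. It satisfies 1NF (every value is atomic) but violates 2NF, because product_name depends on product_id alone, which is only part of the key. Moving products into their own table resolves the partial dependency:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 2NF violation: product_name depends on product_id alone, which is
# only part of the composite primary key (order_id, product_id).
conn.execute("""
    CREATE TABLE OrderLines (
        order_id     INTEGER,
        product_id   INTEGER,
        product_name TEXT,
        qty          INTEGER,
        PRIMARY KEY (order_id, product_id)
    )
""")

# The fix: give the partially dependent attribute its own table.
conn.executescript("""
    CREATE TABLE Products (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT NOT NULL);

    CREATE TABLE OrderItems (
        order_id   INTEGER,
        product_id INTEGER REFERENCES Products(product_id),
        qty        INTEGER,
        PRIMARY KEY (order_id, product_id));
""")
```

3NF applies the same idea to dependencies between non-key attributes: if dept_name depends on dept_id, which in turn depends on emp_id, the department attributes belong in a separate table.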
During the normalization process, each table is designed to represent a single entity or concept, such as a customer, order, or product. The table is then divided into multiple tables, with each table containing only information that is related to the specific entity or concept it represents. This results in tables that are more focused and easier to manage.
For example, consider a table that stores customer information, such as name, address, and phone number, alongside order information, such as order number and date. If each customer row packs multiple order numbers and dates into single columns, the table violates the first normal form, because a single attribute holds more than one value. To normalize this table, it would be divided into two tables: one for customer information and one for order information, with each order row referring back to its customer. Each table would contain only the relevant attributes, resulting in more focused and manageable tables.
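A minimal sketch of that decomposition, with invented column names: one table per entity, linked by a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only on request

conn.executescript("""
    CREATE TABLE Customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        address     TEXT,
        phone       TEXT);

    CREATE TABLE Orders (
        order_id    INTEGER PRIMARY KEY,
        order_date  TEXT NOT NULL,
        customer_id INTEGER NOT NULL REFERENCES Customers(customer_id));
""")

conn.execute("INSERT INTO Customers VALUES (1, 'Acme', '12 Elm St', '555-0100')")
conn.execute("INSERT INTO Orders VALUES (10, '2024-05-01', 1)")
conn.execute("INSERT INTO Orders VALUES (11, '2024-05-03', 1)")
# The customer's details now live in exactly one row, no matter
# how many orders that customer places.
```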
The process of normalization can be complex and time-consuming, particularly for large and complex databases. However, the benefits of normalization include reduced data redundancy, increased data integrity, and easier maintenance of the database over time.
In conclusion, normalization creates tables by breaking down a larger table into smaller, more focused tables that contain only information related to a specific entity or concept. The result is a well-structured database with reduced data redundancy and increased data integrity.
Database normalization is the process of organizing the columns and tables of a relational database to minimize data redundancy. It involves decomposing a table into smaller, less redundant tables without losing information, and defining foreign keys in the old table that reference the primary keys of the new ones. The objective is to isolate data so that additions, deletions, and modifications of an attribute can be made in just one table and then propagated through the rest of the database using the defined foreign keys. Edgar F. Codd introduced the concept of normalization and what we now know as the first normal form (1NF) in 1970. He went on to define the second normal form (2NF) and third normal form (3NF) in 1971, and Codd and Raymond F. Boyce defined the Boyce-Codd normal form (BCNF) in 1974. A relational database table is often described as "normalized" if it meets third normal form, and most third normal form tables are free of insertion, update, and deletion anomalies. The relational model separates the logical design from the physical design: normalization is a logical-design activity, while DBMS performance is a matter of physical design.
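Continuing the customer/order sketch from above (repeated here so the snippet is self-contained), the payoff of the decomposition is visible in two steps: a single UPDATE changes the attribute everywhere at once, and a join over the foreign key reassembles the original combined view, showing that no information was lost.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY,
                            name TEXT, address TEXT);
    CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, order_date TEXT,
                         customer_id INTEGER REFERENCES Customers(customer_id));
    INSERT INTO Customers VALUES (1, 'Acme', '12 Elm St');
    INSERT INTO Orders VALUES (10, '2024-05-01', 1), (11, '2024-05-03', 1);
""")

# The address lives in one row, so a single UPDATE corrects every order.
conn.execute("UPDATE Customers SET address = '99 Oak Ave' WHERE customer_id = 1")

# Joining over the foreign key reproduces the original combined view.
for row in conn.execute("""
        SELECT o.order_id, o.order_date, c.name, c.address
        FROM Orders AS o JOIN Customers AS c USING (customer_id)
        ORDER BY o.order_id"""):
    print(row)
# (10, '2024-05-01', 'Acme', '99 Oak Ave')
# (11, '2024-05-03', 'Acme', '99 Oak Ave')
```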
Normalization works hand in hand with the ERD development process you studied in the first course in this series.
If you concentrate on having every entity in your ERD represent a single business object, you will go a long way toward normalizing your database's tables. Remember, entities in the ERD become tables in the database. If every entity represents only a single business object, then every table will represent only a single business object. The next lesson explains the purpose of normalization.
[1] data modeling: The process of creating a data model, which is a conceptual representation of the data structures required by a database.