Lesson 1
First, Second, and Third Normal Forms
Relational databases work because they are based on logically sound elements of set theory. Fortunately, you do not have to become an expert in a theoretical branch of mathematics to design efficient databases. If you learn a few of the principles without delving into the underlying theory, you will be a better designer. This module introduces techniques you can use to improve your database designs without worrying about the math.
- Two Approaches to Database Design:
As with many design problems, database design may be performed using two approaches: 1) bottom-up or 2) top-down.
A bottom-up design methodology (also called design by synthesis) considers the basic relationships among individual attributes as the starting point and uses those to construct relation schemas. This approach is not very popular in practice because it suffers from the problem of having to collect a large number of binary relationships among attributes as the starting point. For practical situations, it is next to impossible to capture binary relationships among all such pairs of attributes. In contrast, a top-down design methodology (also called design by analysis) starts with a number of groupings of attributes into relations that exist together naturally (for example, on an invoice, a form, or a report). The relations are then analyzed individually and collectively, leading to further decomposition until all desirable properties are met. The theory described in this module is applicable to both the 1) top-down and 2) bottom-up design approaches, but is more appropriate when used with the top-down approach.
Relational database design ultimately produces a set of relations. The implicit goals of the design activity are a) information preservation and b) minimum redundancy. Information is very hard to quantify, and for this reason we consider information preservation in terms of maintaining all concepts, including attribute types, entity types, and relationship types as well as relationships, which are described using a model such as the ERD model. Thus, the relational design must preserve all of these concepts, which are originally captured in the conceptual design after the conceptual to logical design mapping has occurred. Minimizing redundancy implies minimizing redundant storage of the same information and reducing the need for multiple updates to maintain consistency across multiple copies of the same information in response to real-world events that require making an update.
Bottom-up Design Methodology also called 'Design by Synthesis'
In database design, the terms "bottom-up design methodology" and "design by synthesis" share a similar approach, but they emphasize different aspects of the process. Here's how they relate:
Bottom-up Design Methodology:
- Focuses on building the database from the ground up: This approach starts with the smallest elements, like individual data elements and attributes, and gradually builds them up into more complex structures like tables, relationships, and constraints.
- Emphasizes normalization: Normalization principles are applied at each step to ensure data integrity, reduce redundancy, and improve query efficiency.
- Benefits: Provides a strong foundation for the database, minimizes duplication of data, and allows for flexibility in accommodating future changes.
Design by Synthesis:
- Focuses on combining existing components into a coherent whole: This approach takes existing data sources, entities, or functions and combines them into a unified database structure. It emphasizes the overall functionality and how different parts work together.
- Think of it like building a machine from prefabricated parts: Instead of creating each part from scratch, you assemble existing components to form the desired system.
- Benefits: Reduces development time by leveraging existing resources, improves integration with existing systems, and can be faster to implement for complex or legacy systems.
Overlap and Relationship:
- Both approaches are iterative and involve refining and adjusting the design based on feedback and analysis.
- Bottom-up design can be seen as the foundation for design by synthesis, providing the individual building blocks.
- Design by synthesis utilizes bottom-up principles to ensure the combined components work together effectively and efficiently.
Therefore, "design by synthesis" is essentially an alternative way to describe the "bottom-up design methodology" with a focus on the integration and functionality of the combined components.
Here's a table summarizing the key differences:
Feature |
Bottom-Up Design Methodology |
Design by Synthesis |
Starting Point |
Individual data elements |
Existing data sources or entities |
Emphasis |
Building blocks and normalization |
Integration and functionality |
Benefits |
Strong foundation, flexibility, data integrity |
Reduced development time, faster implementation |
Ultimately, the choice between using a pure bottom-up approach or incorporating elements of
design by synthesis depends on the specific project requirements and the available resources. Both approaches can be effective when used appropriately.
Learning Objectives
After completing the lessons in this module, you should be able to:
- Define the concept of normalization
- Explain the requirements for first normal form (1NF)
- Identify repeating groups
- Explain the requirements for second normal form (2NF)
- Identify functional dependencies
- Describe the limitations of 2NF
Ad Database Design and Theory
Database for Enterprise
When we design a database for an enterprise, the main objective is to
- create an accurate representation of the data,
- relationships between the data, and
- place constraints on the data that is pertinent to the enterprise.
To help achieve this objective, we can use one or more database design techniques.
Entity Relationship or (ER) modeling is one of those tecniques. In this module I describe another database design technique called
normalization.
Normalization is a database design technique, which begins by examining the relationships
(called
functional dependencies) between attributes. Attributes describe some property of the data or of the relationships between the data that is important to the company. Normalization uses a series of tests (described as normal forms) to help identify the optimal grouping for these attributes to identify a set of suitable relations that supports the data requirements of the company.
The main purpose of this module is to introduce
- the concept of functional dependencies and
- describe normalization up to Third Normal Form (3NF).
Later we take a more formal look at functional dependencies and also consider later normal forms that go beyond 3NF.
Database normalization prevents
data redundancy and makes the best possible usage of storage.
Purpose of Normalization
The purpose of normalization is to identify a suitable set of relations that support the data requirements of a company's database.
The characteristics of a suitable set of relations include the following:
- The minimal number of attributes necessary to support the data requirements of the enterprise;
- attributes with a close logical relationship (described as functional dependency) are found in the same relation;
- minimal redundancy with each attribute represented only once with the important exception of attributes that form all or part of foreign keys, which are essential for the joining of related relations.
The benefits of using a database that has a suitable set of relations is that
- the database will be easier for the user to access and maintain the data, and
- take up minimal storage space on the computer.
The next lesson introduces normalization.