Boyce-Codd Normal Form (BCNF) is also known as 3.5 Normal Form. It's a stronger version of Third Normal Form (3NF), with additional restrictions to ensure a higher degree of data normalization.
Here's a summary of why it's called 3.5NF:
BCNF builds upon 3NF: It incorporates all the rules of 3NF, which address transitive dependencies.
BCNF goes a step further: It also addresses a specific type of dependency that can still exist in 3NF tables, known as a "dependency involving a non-prime attribute."
Positioning between 3NF and 4NF: In terms of normalization levels, BCNF sits between 3NF and Fourth Normal Form (4NF), addressing more anomalies than 3NF but not as strict as 4NF.
Therefore, the 3.5 designation signifies its position as a refinement of 3NF, but not quite reaching the level of 4NF.
For a table to satisfy the Boyce-Codd Normal Form, it should satisfy the following two conditions:
It should be in the Third Normal Form.
And, for any dependency A → B, A should be a super key.
The second point means, that for a dependency A → B, A cannot be a non-prime attribute, if B is a
prime attribute.
This will be explained using BCNF in terms of relations.
Relationship Example
Consider the following relationship: R(A,B,C,D) and following dependencies:
A -> BCD
BC -> AD
D -> B
The relationship shown above is already in 3rd normal form where the keys are A and BC.
Hence, in the functional dependency, A -> BCD, is the super key[1]. In the second relation BC->AD, BC is also a key. However, in the relation, D -> B, D is not a key. Therefore, we can only break our relationship R into two relationships R1 and R2.
Superkey is a set of Attributes in BCNF
In the context of Boyce-Codd Normal Form (BCNF), a superkey is a set of attributes that uniquely identifies each row in a table, but it may contain additional attributes beyond those strictly necessary for uniqueness. Here's a breakdown of key concepts:
Superkey:
It can uniquely identify all rows in a table.
It can be composed of one or more attributes.
It can potentially contain extraneous attributes that aren't needed for uniqueness.
Candidate Key:
It's a superkey that is minimal, meaning no subset of its attributes can uniquely identify the rows.
A table can have multiple candidate keys.
BCNF Rule:
A table is in BCNF if and only if every determinant (the left-hand side of a functional dependency) is a superkey.
This means that all attributes in a table must directly depend on the entire candidate key, and not on any partial or non-key attributes.
Example:
Consider a table `Bookings` with attributes `Court`, `Start time`, `End time`, and `Rate type`. The superkeys could be:
`{Court, Start time}`
`{Court, End time}`
`{Court, Rate type, Start time}`
`{Court, Rate type, End time}`
`{Court, Start time, End time}` (the entire set of attributes)
The candidate keys are `{Court, Start time}` and `{Court, End time}`, as they are minimal superkeys.
BCNF ensures that:
Tables are free from anomalies and data inconsistencies that can arise from transitive dependencies.
Data integrity is maintained, as it prevents unintended updates or deletions of data.
Database design is more efficient and flexible, as tables are more normalized and less prone to update anomalies.
Boyce-Codd Normal Form (BCNF)
A superkey is a set of fields that contain unique values. You can use a superkey to uniquely identify the records in a table and recall that a candidate key is a minimal superkey. In other words, if you remove any of the fields from the candidate key, it will not be a superkey anymore. A determinant is a field that at least partly determines the value in another field. Note that the definition of 3NF deals with fields that are dependent on another field that is not part of the primary key. Now we are talking about fields that might be dependent on fields that are part of the primary key (or any candidate key).
A table is in BCNF if:
It is in 3NF.
Every determinant is a candidate key.
Four important rules for Codd's Twelve Rule Criteria for a fully-relational DBMS
All information in a relational database is represented explicitly at the logical level[2] in exatly one way, by the values in tables.
Creating tables to hold data allows users to search the tables, link tables which have fields in common, and use those links to combine those records in one or more tables efficiently.
Each and every datum (atomic value) in a relational database is guaranteed to be logically accessible by resorting to a combination of table name, primary key value, and column name. Being able to find a datum is at the heart of 2NF. Relations in 1NF must have a primary key, though the relation can contain columns that are not functionally determined by the relation's primary key field(s). While it will be theoretically possible to find any value given the datum's table name, primary key value, and column name, it may not be practical to find the information you want if the datum is in a record which is completely unrelated to the datum's source.
Application programs and terminal activities remain logically unimpaired when information preserving changes of any kind that theoretically permit unimpairment are made to the base tables. Theoretically, deleting a record about an item should not also delete information about the company that makes it. If a table is not in 3NF, however, it is possible to wipe out information about a manufacturer by deleting the last record referring to a product made by that company.
Integrity constraints specific to a particular relational database must be definable in the relational data sub-language and storable in the catalog, not in the application programs. A minimum of the following two integrity constraints must be supported:
Entity integrity: No component of a primary key is allowed to be null.
Referential integrity: For each distinct non-null foreign key value in a relational database, there must exist a matching primary key value from the same domain
Maintaining referential integrity is an absolute requirement for a relational database. Putting all of a database's tables in 3NF, complete with links between tables based on common fields, allows the RDBMS to check each table's contents against related tables to ensure no important information is lost or changed.
[1]superkey: In the relational data model a superkey is a set of attributes that uniquely identifies each tuple of a relation. Because superkey values are unique, tuples with the same superkey value must also have the same non-key attribute values.
[2]logical level: All information in a relational database is represented explicitly by the values in the tables.