- Data Normalization is a tool or process used to validate and improve
a logical design so that unnecessary duplication of data is
avoided.
- The process of decomposing relations within anomalies to produce
smaller more well-structured relationships
Well Structured Relations
- A well structured relation is one that contains minimal data
redundancy that allows users to create and delete data without causing
inconsistencies
- A Well Structured Relationships is one that AVOIDS
- Insertion Anomalies: Adding new rows forces users
to create duplicate data
- Deletion Anomalies: Deleting rows may cause a loss
of unintended data that may need to be used in other places
- Modification Anomalies: Changing values in a row
changes data in other rows due to duplication
Reminder: Relations
- A named, two-dimensional table of data
- Has rows and columns
- Requirements for a table to be a relation:
- Must have unique name
- Attributes must have unique names
- Order of columns must be irrelevant
- Every row must be unique; ie, cannot have two rows that have the
exact same data for each field
- Every attribute must be atomic
Steps of Normalization
- Step 1 - First Normal Form (1NF)
- Ensure that there are no multi-valued attributes
- Step 2 - Second Normal Form (2NF)
- Essentially just 1NF but now all of the non-key attributes
are dependent on the primary key. No partial dependencies
- Move all partial dependencies to separate tables
- Determinant becomes primary key in new table
- Determinant stays in original table as foreign key
- Step 3 - Third Normal Form (3NF)
- Just 2NF plus NO transitive dependencies
- This just means that no dependent (non-key) attributes should fully
determine other dependent attributes
- How?
- Non-key determinant with transitive dependencies go into their own
new table
- Non-key determinant becomes primary key in new table and stays as
the foreign key in the old table
Functional Dependencies
- One attribute (determinant) determines the other values
(dependents)
- If you know the value of the determinant then you also know the
values of the dependents
- Use arrow notation to denote dependency
e.g. Computer Model -> Manufacturer
Candidate and Primary Keys
- Candidate Key:
- Unique identifier, one of the candidate keys will become the primary
key
- Each non-key field is functionally dependent on every candidate
key
- Primary Key:
- One candidate is chosen based upon business practice, rules and
other considerations within the institution
Partial Functional
Dependencies
- Model Example:
- Primary key is COMPOSITE(OrderID + ProductID)
- OrderID -> OrderDate, CustomerID, CustomerName, CustomerAddress,
OrderedQuantity
- ProductID -> ProductDescription, ProductFinish,
ProductStandarPrice, OrderedQuantity
- By this point each field past the arrows are partially functionally
dependent on its respective portion of the primary key
Full Functional Dependencies
- Model Example
- PrimaryKey is COMPOSITE(OrderID + ProductID)
- OrderID, ProductID -> OrderedQuantity
Removing Partial
Dependencies
- We are going to split all of the remaining fields that still have
partial dependencies
- Model Example:
- OrderID, ProductID -> OrderQuantity (3NF)
- ProductID -> ProductDescription, ProductFinish,
ProductStandardPrice (3NF)
- OrderID -> OrderDate, CustomerID, CustomerName, CustomerAddress
(2NF)
- We now have 3 tables. One of them “OrderID” still has Transitive
Dependencies, but the other two do not.
Transitive Functional
Dependencies
- Functional dependencies on non-primary keys
- Its called transitive because the primary key is a determinate for
another attribute, which is in turn also a determinate for a third
attribute
- Model Example:
- OrderID -> CustomerID -> CustomerName, CustomerAddress
Removing Transitive
Dependencies
- Simply remove the transitive dependencies by splitting into yet
another table
- Model Example:
- OrderID -> OrderDate, CustomerID(fk) (3NF)
- CustomerID -> CustomerName, CustomerAddress (3NF)