Data Normalization

Data Normalization is a tool or process used to validate and improve a logical design so that unnecessary duplication of data is avoided.
The process of decomposing relations within anomalies to produce smaller more well-structured relationships

A well structured relation is one that contains minimal data redundancy that allows users to create and delete data without causing inconsistencies
A Well Structured Relationships is one that AVOIDS
- Insertion Anomalies: Adding new rows forces users to create duplicate data
- Deletion Anomalies: Deleting rows may cause a loss of unintended data that may need to be used in other places
- Modification Anomalies: Changing values in a row changes data in other rows due to duplication

Step 1 - First Normal Form (1NF)
- Ensure that there are no multi-valued attributes
Step 2 - Second Normal Form (2NF)
- Essentially just 1NF but now all of the non-key attributes are dependent on the primary key. No partial dependencies
  - Move all partial dependencies to separate tables
  - Determinant becomes primary key in new table
  - Determinant stays in original table as foreign key
Step 3 - Third Normal Form (3NF)
- Just 2NF plus NO transitive dependencies
  - This just means that no dependent (non-key) attributes should fully determine other dependent attributes
- How?
  - Non-key determinant with transitive dependencies go into their own new table
  - Non-key determinant becomes primary key in new table and stays as the foreign key in the old table

One attribute (determinant) determines the other values (dependents)
If you know the value of the determinant then you also know the values of the dependents
Use arrow notation to denote dependency e.g. Computer Model -> Manufacturer

Candidate Key:
- Unique identifier, one of the candidate keys will become the primary key
- Each non-key field is functionally dependent on every candidate key
Primary Key:
- One candidate is chosen based upon business practice, rules and other considerations within the institution

Model Example:
- Primary key is COMPOSITE(OrderID + ProductID)
- OrderID -> OrderDate, CustomerID, CustomerName, CustomerAddress, OrderedQuantity
- ProductID -> ProductDescription, ProductFinish, ProductStandarPrice, OrderedQuantity
By this point each field past the arrows are partially functionally dependent on its respective portion of the primary key

Model Example
- PrimaryKey is COMPOSITE(OrderID + ProductID)
- OrderID, ProductID -> OrderedQuantity

We are going to split all of the remaining fields that still have partial dependencies
Model Example:
- OrderID, ProductID -> OrderQuantity (3NF)
- ProductID -> ProductDescription, ProductFinish, ProductStandardPrice (3NF)
- OrderID -> OrderDate, CustomerID, CustomerName, CustomerAddress (2NF)
We now have 3 tables. One of them “OrderID” still has Transitive Dependencies, but the other two do not.

Functional dependencies on non-primary keys
Its called transitive because the primary key is a determinate for another attribute, which is in turn also a determinate for a third attribute
Model Example:
- OrderID -> CustomerID -> CustomerName, CustomerAddress

Simply remove the transitive dependencies by splitting into yet another table
Model Example:
- OrderID -> OrderDate, CustomerID(fk) (3NF)
- CustomerID -> CustomerName, CustomerAddress (3NF)