Normalisation
- Normalisation is a process which divides larger datasets into smaller ones to minimise data redundancy
- It eliminates insert / update / delete anomalies
Insert anomaly example: If student records and course details are stored together, we cannot add a new student without a course, or new course details without a student, without leaving part of the row empty.
Update anomaly example: If a table contains thousands of students and the course details are embedded in each student row, a change to a course name means updating many student rows instead of one. There is also a chance of missing updates to some rows if the task fails midway and no rollback occurs.
Delete anomaly example: If we delete the course details there is a risk of losing the student records, and vice versa.
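A minimal sketch of these anomalies, using an invented student_course table (all table, column and student names are illustrative only):

```python
import sqlite3

# One combined table holding both student and course details.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE student_course (
        student_id   INTEGER,
        student_name TEXT,
        course_id    INTEGER,
        course_name  TEXT
    )
""")
con.executemany(
    "INSERT INTO student_course VALUES (?, ?, ?, ?)",
    [(1, "Asha", 10, "Coding"), (2, "Ben", 10, "Coding"), (3, "Chen", 20, "Testing")],
)

# Insert anomaly: a brand-new course with no student forces NULL student columns.
con.execute("INSERT INTO student_course VALUES (NULL, NULL, 30, 'Design')")

# Update anomaly: renaming course 10 has to touch every enrolled student's row.
con.execute("UPDATE student_course SET course_name = 'Python Coding' WHERE course_id = 10")

# Delete anomaly: removing Chen (the only Testing student) loses the Testing course too.
con.execute("DELETE FROM student_course WHERE student_id = 3")

print(con.execute("SELECT * FROM student_course").fetchall())
```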

Normal Forms
1NF – First Normal Form – All columns must be atomic (single-valued); however, fixing a multi-valued column introduces redundancy.

Example: The Course Name column holds values such as Coding, Testing, Design. If a student is enrolled in all of these, the column contains all three values separated by commas, which violates first normal form.
To resolve this, we split the single student record into three records, i.e. one row per course name.
Now the student details are repeated on every row, which introduces redundancy.
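A minimal sketch of the 1NF fix, splitting the comma-separated column into one row per course (the data and field names are invented for illustration):

```python
# Unnormalised: one row per student, courses packed into a single column.
unnormalised = [
    {"student_id": 1, "student_name": "Asha", "course_name": "Coding, Testing, Design"},
]

# 1NF: atomic values, one row per (student, course) pair.
first_normal_form = [
    {
        "student_id": row["student_id"],
        "student_name": row["student_name"],
        "course_name": course.strip(),
    }
    for row in unnormalised
    for course in row["course_name"].split(",")
]

# The values are atomic now, but the student's details repeat on every row.
print(first_normal_form)
```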
2NF – Second Normal Form – 1NF + no non-key column should depend on only part of a composite key. Break the table apart and create a bridge table to handle the many-to-many relationship between student details and course details, i.e. data that repeats across multiple rows is moved to a separate dataset.
Example – If Student_ID and Course_ID together form the unique key, split the student and course attributes into separate tables and create a bridge table that records only the relationship.
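A minimal sketch of the 2NF split, assuming invented student, course and enrolment tables where (student_id, course_id) is the composite key of the bridge table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE student (
        student_id   INTEGER PRIMARY KEY,
        student_name TEXT
    );
    CREATE TABLE course (
        course_id   INTEGER PRIMARY KEY,
        course_name TEXT
    );
    -- Bridge table: only the relationship lives here.
    CREATE TABLE enrolment (
        student_id INTEGER REFERENCES student(student_id),
        course_id  INTEGER REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );
""")
con.execute("INSERT INTO student VALUES (1, 'Asha')")
con.executemany("INSERT INTO course VALUES (?, ?)",
                [(10, 'Coding'), (20, 'Testing'), (30, 'Design')])
con.executemany("INSERT INTO enrolment VALUES (1, ?)", [(10,), (20,), (30,)])

# Each course name is now stored exactly once, however many enrolments exist.
print(con.execute("SELECT * FROM enrolment").fetchall())
```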

3NF – Third Normal Form – 2NF + columns that are not directly dependent on the primary key, i.e. non-key attributes that depend on another non-key attribute, are moved to a separate dataset.
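A minimal sketch of a 3NF fix, using an invented example where instructor_name depends on instructor_id (a non-key column of course) rather than on the primary key, so it moves to its own table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Before 3NF: course(course_id, course_name, instructor_id, instructor_name)
    -- has the transitive dependency course_id -> instructor_id -> instructor_name.
    -- After 3NF:
    CREATE TABLE instructor (
        instructor_id   INTEGER PRIMARY KEY,
        instructor_name TEXT
    );
    CREATE TABLE course (
        course_id     INTEGER PRIMARY KEY,
        course_name   TEXT,
        instructor_id INTEGER REFERENCES instructor(instructor_id)
    );
""")
con.execute("INSERT INTO instructor VALUES (7, 'Priya')")
con.executemany("INSERT INTO course VALUES (?, ?, 7)",
                [(10, 'Coding'), (20, 'Testing')])

# The instructor's name is stored once, even though two courses reference it.
print(con.execute("SELECT * FROM course").fetchall())
```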

Denormalisation
An optimisation technique applied after normalisation to speed up reads. Denormalisation is a process where data from multiple tables is combined into one table, so that data retrieval needs fewer joins and is faster.
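A minimal sketch of denormalisation, pre-joining the (invented) normalised tables from the earlier examples into one wide, read-optimised table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE student   (student_id INTEGER PRIMARY KEY, student_name TEXT);
    CREATE TABLE course    (course_id  INTEGER PRIMARY KEY, course_name  TEXT);
    CREATE TABLE enrolment (student_id INTEGER, course_id INTEGER);

    INSERT INTO student VALUES (1, 'Asha');
    INSERT INTO course VALUES (10, 'Coding'), (20, 'Testing');
    INSERT INTO enrolment VALUES (1, 10), (1, 20);

    -- One pre-joined table: reads need no joins, at the cost of repeated student data.
    CREATE TABLE student_course_wide AS
    SELECT s.student_id, s.student_name, c.course_id, c.course_name
    FROM enrolment e
    JOIN student s ON s.student_id = e.student_id
    JOIN course  c ON c.course_id  = e.course_id;
""")
print(con.execute("SELECT * FROM student_course_wide").fetchall())
```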

Normalisation vs Denormalisation
Normalisation
- Normalisation divides larger datasets into smaller ones and reduces redundancy
- Increases the number of tables and hence the number of joins needed for data retrieval
- Faster writes and updates
- Used in OLTP systems
DeNormalisation
- Denormalisation combines smaller datasets into a larger one
- Fewer joins
- Faster reads, because combining smaller datasets into a larger one through the ETL process helps users read and analyse the data faster
- Used in OLAP systems

Other Topics to be added
Big Data
3 Vs
Volume (larger volumes of data)
Variety (Structured, Semi-Structured and Unstructured)
Velocity (Batch, Near Real Time, Real Time – different frequencies of data arrival and updates)
Full Load vs Change data capture (CDC)
End of Day snapshot vs Intraday state changes to support analytical needs
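A minimal sketch of full load vs CDC, assuming a simple updated_at watermark column (the data, column names and cut-off times are invented for illustration):

```python
from datetime import datetime

# Hypothetical source rows; updated_at acts as the CDC watermark.
source = [
    {"id": 1, "name": "Asha", "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "name": "Ben",  "updated_at": datetime(2024, 1, 5)},
]

def full_load(source_rows):
    # Full load: rebuild the target from scratch on every run.
    return list(source_rows)

def cdc_load(target_rows, source_rows, last_run):
    # CDC: pick up only rows changed since the last run and merge them into the target.
    changed = [r for r in source_rows if r["updated_at"] > last_run]
    merged = {r["id"]: r for r in target_rows}
    merged.update({r["id"]: r for r in changed})
    return list(merged.values())

target = full_load(source)                                # initial load
target = cdc_load(target, source, datetime(2024, 1, 2))   # later incremental run
print(target)
```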
Why no separate documentation is needed when working in dbt Cloud: When working with dbt Cloud, you typically do not need a separate data dictionary, because dbt generates documentation about your data models, including details about columns, data types, and descriptions, and refreshes it with each run. It effectively acts as a built-in data dictionary, so you can leverage dbt's documentation features instead of maintaining a separate one.
It is, however, up to the project team to maintain the relevant details in the form of .md and .yaml files.
dbt helps
- develop
- test
- document
- deploy
Advantages of dbt
- Orchestration
- lineage graph
- git integration
- user authentication / SSO