1. Normalization
Normalization is the process of organizing data in a database to reduce redundancy and improve data integrity.
Objectives of Normalization:
- Minimize data duplication
- Ensure data consistency
- Simplify data maintenance
- Improve data integrity
Forms of Normalization:
- First Normal Form (1NF):
- Ensure each column contains atomic (indivisible) values.
- Each column must contain unique data of a specific type.
Example:
Before 1NF:
| StudentID | Name | Subjects |
| 1 | John Doe | Math, Science |
What We Did:
- Split the multi-valued “Subjects” column into separate rows.
After 1NF:
| StudentID | Name | Subject |
| 1 | John Doe | Math |
| 1 | John Doe | Science |
- Second Normal Form (2NF):
- Must meet 1NF requirements.
- Ensure all non-key attributes are fully functionally dependent on the primary key.
Example:
Before 2NF:
| OrderID | ProductID | ProductName |
| 1 | 101 | Laptop |
What We Did:
- Split into two tables to separate product information.
After 2NF:
Orders Table:
| OrderID | ProductID |
| 1 | 101 |
Products Table:
| ProductID | ProductName |
| 101 | Laptop |
- Third Normal Form (3NF):
- Must meet 2NF requirements.
- Ensure no transitive dependency exists (i.e., non-key columns should not depend on other non-key columns).
Example:
Before 3NF:
| EmployeeID | DepartmentID | DepartmentName |
| 1 | 101 | Sales |
What We Did:
- Moved “DepartmentName” to a new table to remove transitive dependency.
After 3NF:
Employees Table:
| EmployeeID | DepartmentID |
| 1 | 101 |
Departments Table:
| DepartmentID | DepartmentName |
| 101 | Sales |
- Boyce-Codd Normal Form (BCNF):
- A stricter version of 3NF.
- Every determinant must be a candidate key.
Example:
Before BCNF:
| ProfessorID | Department | Subject |
| 1 | CS | Database |
| 1 | CS | Algorithms |
What We Did:
- Decomposed into two tables to ensure every determinant is a candidate key.
After BCNF:
Professor_Department Table:
| ProfessorID | Department |
| 1 | CS |
Professor_Subject Table:
| ProfessorID | Subject |
| 1 | Database |
| 1 | Algorithms |
- Fourth Normal Form (4NF):
- Must meet BCNF requirements.
- Eliminate multi-valued dependencies.
Example:
Before 4NF:
| StudentID | Hobby | Skill |
| 1 | Painting | Java |
| 1 | Painting | Python |
What We Did:
- Split multi-valued attributes into separate tables.
After 4NF:
Student_Hobby Table:
| StudentID | Hobby |
| 1 | Painting |
Student_Skill Table:
| StudentID | Skill |
| 1 | Java |
| 1 | Python |
2. Denormalization
Denormalization is the process of combining normalized tables to improve read performance at the cost of increased redundancy.
Example of Denormalization:
Before Denormalization:
Orders Table:
| OrderID | CustomerID |
| 1 | 101 |
Customers Table:
| CustomerID | CustomerName |
| 101 | John Doe |
What We Did:
- Merged “Customers” data into the “Orders” table to reduce joins.
After Denormalization:
| OrderID | CustomerID | CustomerName |
| 1 | 101 | John Doe |
Normalization vs. Denormalization:
| Feature | Normalization | Denormalization |
| Purpose | Reduce redundancy, ensure integrity | Improve read performance, reduce joins |
| Data Integrity | High (fewer inconsistencies) | Lower (due to duplication) |
| Query Performance | Slower (requires joins) | Faster (fewer joins) |
| Storage Efficiency | Uses less storage | Uses more storage |
| Complexity | Higher (more tables, more joins) | Simpler (fewer tables, easier queries) |
| Maintenance | Easier (less redundancy) | Harder (due to data duplication) |