reading-notes

Reading notes for Code Fellows!


Project maintained by William-Moreno Hosted on GitHub Pages — Theme by mattgraham

Database Normalization


The following notes were distilled from Kris Wenzel’s article “Database Normalization (Explained in Simple English).

Introduction

Database Normalization is the process of organizing a database into tables and columns. This process is based on the concept that each table should pertain to a specific topic and should only include data that supports that topic. Limiting a table to a single purpose, reduces the amount of duplicated data within our database. In turn, this helps to eliminate some of the issues which arise from database modification.

There are established rules to accomplish the previously mentioned objectives. As we implement these rules, new labes will be formed. There are three successive forms of normalization that most databases tend to adhere to. As they progress through these tiers, databases become less prone to modification anomalies as they become more tightly focused on specific topics or purposes.

Reasoning

Databases should be normalized for tbree reasons:

  1. Minimization of duplicate data
  2. Avoid or reduce data modification issues
  3. Simplify queries

Consider the following table:

SalesStaff

EmployeeID SalesPerson SalesOffice OfficeNumber Customer1 Customer2 Customer3
1003 Mary Smith Chicago 312-555-1212 Ford GM -
1004 John Hunt New York 212-555-1212 Dell HP Apple
1005 Martin Hap Chicago 312-555-1212 Boeing - -

EmployeeID is the primary key column.

This table Serves many purposes as opposed to a single one:

  1. Identify salespeople
  2. List sales offices and phone numbers
  3. salesperson and sales office association
  4. Displaying a salesperson’s clients

Because this table serves many purposes, it is prone to data duplication, data update issues and it increses the effort needed to query it.

Data Duplication/Modification Anomalies

The table lists both the SalesOffice and the OfficeNumber for each SalesPerson. It is duplicaed information and presents multiple problems:

  1. increases storage
  2. decreases performance
  3. becomes harder to maintain data changes

If the location of an office were to change, all of the entries on this table would have to be updated. This is known as a modification anomaly. There are three types of modification anomalies which can occur:

EmployeeID SalesPerson SalesOffice OfficeNumber Customer1 Customer2 Customer3
1003 Mary Smith Chicago 312-555-1212 Ford GM -
1004 John Hunt New York 212-555-1212 Dell HP Apple
1005 Martin Hap Chicago 312-555-1212 Boeing - -

Searching and Sorting

The final issue with this table is that it makes queries to find data such as specific customers more difficule by requiring a longer or more convoluted query. If all of the customer data were in one column of the table it would make such a query simpler. This table would also make sorting by customer challenging, requiring three separate UNION queries to complete.

These issues can be mitigated by separating the information an putting it into multiple tables, each serving a more singular purpose.

Definition

There are 3 progressive forms of database normaliztion, each successive form building upon the previous form:

Back to Main