A Database Design Tutorial

By:
To read more DBA articles, visit http://dba.fyicenter.com/article/

Everything in this tutorial is going to be explained using an example database. This example database is a simple database that serves well to illustrate various concepts. It's most likely not a very useful database to use in practice, but using it was not the intention.

This database is a variant on a common theme, a database to keep track of customers, items in stock, the different orders of the mentioned customers, the items on each order and the number of items ordered. In addition we will keep track of one bank account for each customer from which we will get the money for each order.

Why a database?
A database is very useful for various reasons, but if one has to choose its crowning glory it will most likely be the fact that one can get its data in a nearly endless variety of ways. That means one can create reports to show just what one needs. In our example database we can get reports for such complicated questions like who bought two or more of item a, but not item b, and had a total order amount between x and y dollars between date p and date q. Think of going through your paper records to come up with the answer to that one.

Relational database foundations
Edgar Codd is the father of the relational database. It all started in 1969 when Codd was a researcher at IBM. The relational database is built on mathematical foundations. It has to be correctly designed to work faultlessly. The relational in relational database refers to the tables (relations) of which the database is composed. These tables are not standalone tables, no, they are related to each other in very specific ways. This brings us to the two kinds of database laws:

laws about table structure
laws about how tables are related to each other

What is a table?
A table is one object or entity about which we want to keep data. Try to make these objects indivisible. For instance, keep data about an employee in one table and data about his dependents in another table. We get compound data using the relationships that exist between tables. One can have a table or tables to handle these relationships. In our example database we have tables for:

customers
a customer's bank account details
orders
items
a relationship table to handle the different items on the different orders

Table structure

So, what does a table look like? Well, something like this:
.K Surname Names D.O.B
1 Brown John Jack 23 Mar 1962
2 Smith Peter George 16 Jun 1953
3 Williams Agnes Semolina 7 Jan 1991
4 Jackson Charlene Gail 5 Aug 1952

You will notice that the table has rows (horizontal) and columns (vertical). In this case the first row contains the column headings. The first column heading is the heading for the primary key. The rest are self explanatory. This brings us to the first law: in one table, column headings may not be repeated. Close on the heels of this is the second law: every row must have a primary key which uniquely identifies that row and is not repeated in that table. In the example the primary key is a single value. Primary keys can also be composite, i.e. consist of more than one value in more than one column. In this last case the total combination of values must be unique, e.g. one can have the primary key values (A,B), (A,C), (B,C) and even (B,A), all nice and legal.

All tables must be in at least third normal format for the database to work faultlessly. So, what are these normal forms?

I found the easiest to remember classification of the normal forms in a book on database design by Fidel Captain. It goes from the smallest data unit, one column in one row, to the biggest, the table.

First normal form

A table is in first normal form if all column values are indivisible. Say you have an order table and in this table you have a column called items. Now for one order you enter the following values in the items column: pen, ink, paper. This table is now not in first normal form and your design is flawed. The items should be in a second table related to the order table.

Furthermore, all possible values for any column must be of the right data type and in the domain of allowed values. That's to say if you decided only to allow integers between 0 and 1,000,000 in a certain column, each value in that column should be an integer between one and a million, both values inclusive.

A column must also be unique in a table. You cannot have more than one column named 'article' in one table.

Second normal form
The second normal form concerns a whole row in any table, i.e. an entity in that table.

A table is in second normal form only if it is in first normal form and every non-key column is fully dependent on the whole primary key. This is only an issue with composite primary keys. Say you have succumbed to the temptation of everything in one table and you have an order_item table with the order_id and the item_id together a composite primary key. Desirous to see everything in one table, you have also included the customer name in this table. Obviously, the customer name has nothing to do with the item_id, which leads to the items table. However, using the order_id to get to the orders table you will find a customer_id there which will take you to the relevant customer in the customers table. That means your table is not in second normal form as the customer name is only dependent on the customer_id part of the primary key. You'll have to remove the customer name from this table. See the example to see how it should be done.

Some people will class what follows as a requirement for the third normal form. As it concerns the integrity of just one row (an entity in a table), I put it here.

Besides the requirement that every column in a row must be dependent on the the whole foreign key, it also must not be dependent on any non-key column. The classical example of this is a derived value. Say the sale table has an item price column and a column indicating how many of the specific item went into that sale, then there should be no total amount column. The reason for this is obvious: if either the price or number of items is updated, the total amount has to be updated as well. Sooner or later someone is going to make a mistake and the data in that row will be corrupt. Such a built in weakness decreases reliability of the data. Rather, let your application accessing the database do the calculation and show the total amount where it has to be shown.

One can summarise the second normal form by saying that the value of any non-key field in any table should be dependent on only the key, the whole key and nothing but the key.

Full article...