Database
Design
Most of us are
capable of designing a robust database that serves most of our needs
by considering a few basic points.
-
How
will you design your computer database to hold your data?
-
How
will you use the data?
A
good place to start is ask two basic questions:
-
What
was the "thing" about which I recorded information? Was
it a bird, a sample of blood, an egg, a seabird colony, a person,
a habitat, a plot, or an event? "Things" are called "entities".
-
What
information did I record about each thing? Information about "entities"
are called "attributes".
Example
1.
Let's assume that
you are running a bird-banding station at which you trap birds, band
them, and measure their body mass. The "thing" or "entity"
about which you are recording information is each individual bird
you catch, and the "attributes" recorded for each thing
are band number, species, location, and body mass. Traditionally we
might record such information on paper in tabular format, with each
bird representing a row in the table, and each item of information
about each bird recorded in a column, as follows:
| Banding
Number |
Species |
Location |
Body
Mass |
| 110390788 |
COTE |
Gull Island
1 |
120 |
| 110390789 |
ARTE |
Gull Island
1 |
135 |
| 110390790 |
COTE |
Green Island |
122 |
| 110390791 |
COTE |
Long Island |
110 |
| 110390800 |
ARTE |
Gull Island
2 |
143 |
This
is called a rectangular table and could be stored as a single file in
a database management system. In a rectangular table each row is called
a "record" and each column is called a "field".
Records can be thought of as sampling units or replicants and fields
as variables.
Although the rectangular
format is usually the most intuitive, it might not be the most appropriate.
Example
2.
Imagine that you
have taken 100 water samples in each of two lakes A and B and measured
pH and water depth of each sample. You may record the data in your
field book as follows:
| pH
A |
Depth
A |
pH
B |
Depth
B |
| 7.2 |
4 |
5.4 |
3 |
| 8.3 |
6 |
6.2 |
9 |
| 6.9 |
8 |
5.2 |
10 |
| ... |
... |
... |
... |
| 8.0 |
9 |
5.8 |
23 |
Although
this arrangement might be suitable for a field note book, it is not
appropriate for a computer database file. First, the variables are not
unique (ie. pH is recorded in two fields), and second, the sampling
unit (ie. what the data in each record or row of the data set represents)
is unclear. Data arranged in this way would make analysis difficult
or impossible without reformatting. For example, you could not produce
a graph of depth on the x-axis and pH on the y-axis for both lakes unless
you somehow overlaid data from both lakes.
Instead,
use the following structure for storage of this data-set in a computer
database:
| Lake |
pH |
Depth |
| A |
7.2 |
4 |
| A |
8.3 |
6 |
| A |
6.9 |
5 |
| B |
5.4 |
3 |
| B |
6.2 |
9 |
| B |
5.2 |
10 |
In this format,
each variable is stored in a single field and it is clear what a row
in the data-set represents one water sample. Note that it is necessary
to add a variable to code for the lake from which the sample was taken.
This was implicit in the fields names in the first attempt above.
Consider
our bird banding example used previously (table shown below), but
now imagine that you want to store more information about each location
such as latitude and longitude, habitat, and nearest town.
| Banding
Number |
Species |
Location |
Body
Mass |
| 110390788 |
COTE |
Gull Island
1 |
120 |
| 110390789 |
ARTE |
Gull Island
1 |
135 |
| 110390790 |
COTE |
Green Island |
122 |
| 110390791 |
COTE |
Long Island |
110 |
| 110390800 |
ARTE |
Gull Island
2 |
143 |
You
could do this by adding these fields to the flat file, however, the
information would have to be repeated everytime a particular location
appears in the database. This would lead to a lot of duplication and
redundancy in the database and result in a much larger file. The solution
is to store the location information in a separate file and link the
two files together. Below is how the second table containing location
information would look:
| Location |
Latitude |
Longitude |
Nearest Town |
| Gull Island
1 |
44.54 |
60.02 |
Halifax |
| Gull Island
2 |
45.55 |
65.45 |
Liverpool |
| Green Island |
47.40 |
64.20 |
Miscou |
| Long Island |
47.63 |
53.01 |
Portugal
Cove |
Note how the
Location field is common to both tables. The two
tables are linked by this field (see image below). Whenever Island
A is mentioned in the banding table, all the attributes in the location
table for Island A are also available as if they resided in the
banding table. Even though Island A is mentioned more than once
in the banding table, it only needs to appear once in the Location
table.

Match
Fields
Match fields
must be carefully planned because they play an important role
in linking multi-table databases. In the example above, the values
used in the match field "Location" need to be unique,
so that references to the table are unambiguous. In the Banding
table, two different Gull Islands were referred to, so a number
was added to the end of the location name to make them unique
and to allow the correct information to be matched in the Location
table.
The
number of characters used in the match field should be as few as
possible because copies reside in each of the tables in your relational
database. Short match fields reduce the amount of disk space used,
and make the database run faster.
|