Lesson 4 | Columns |
Objective | Describe mistakes associated with columns. |
Database Column Mistakes
When you design tables for a relational database, you need to
- create columns to hold individual data items,
- create constraints to reflect what type of information is permissible in those columns, and
- define primary and foreign keys.
As with other elements of the design process, there are a number of common mistakes you should try to avoid. Let us look first at mistakes associated with columns. Common mistakes associated with columns include:
- Representing more than one attribute in a
- Duplicating column names in different tables
- Giving columns cryptic names
Representing Multiple Attributes in DB Column
Just as you should fight the temptation to store data about more than one business object in a single table, you should avoid trying to squeeze
more than one attribute into a single field to artificially reduce the number of columns in a table. For example, you might try to store a distributor's street address, city, state, and ZIP code in a single column. However, each element of a distributor's complete address is actually a separate attribute. It is much easier to discover which states you sell the most CDs to if that
information is stored in a separate column, not buried in an eighty-character text string.
Duplicating column names
A less subtle but potentially confusing mistake is to have columns of the same name in separate tables.
The Distributor table you designed for the course project, for example, contains fields such as DistStreet, DistCity, and DistState. Note that the Customer table from the first course and the Distributor table have several types of values in common: street addresses, cities, and states, to name three. Naming the fields CustAddress, CustCity, and CustState versus DistAddress, DistCity, and DistState clarifies from which table a column originates without having to look in the data dictionary.
Note: The exception to this guideline is when a column is a foreign key. In that case, you should always give the column the name it has in the table in which it is a primary key.
Giving Columns cryptic Names
In the interest of saving time and disk space and to guard against carpal-tunnel syndrome, you may be tempted to assign short, cryptic names to columns in tables. But this may cause trouble later on when its time to begin adding data to the tables. Users may not immediately understand in which column a given piece of data belongs. And besides, given the size of modern hard drives, conserving disk space isnt nearly the issue it used to be. As a general rule, then, assign descriptive names to columns that briefly convey the table and column name: DistStreet for the distributors street address, for example, and CustStreet for the customers street address.
Thinking Too Small
Too often developers design a perfectly reasonable database only to discover during the final stages of the project that it cannot handle the load that’s dumped on it. Make some calculations, estimate the database’s storage and transaction loads, calculate the likely network traffic, and then multiply by five. For some applications, such as online Web applications that can have enormous spikes in load over just a few hours, you might want to multiply by ten or more. Be sure you use a realistic model of the users’ computers and networks. It’s fairly common in software development to give the programmers building a system great big, shiny, powerful computers so they can be more productive. (It takes a lot of horsepower to play those interactive role-playing games quickly so you can get back to work.)
Unfortunately, customers often cannot afford to buy new computers for every user. (Five developers times $3,000 is $15,000. That’s not exactly pocket money, but it’s nothing compared to $2,000 times 200 users for a total of $400,000.) Make sure your calculations are based on the hardware that the users will really have, not on the dream machine that you are using.
If you don’t think your architecture can handle that load, you should probably rethink things a bit. You may be able to buy a more powerful server, buy more disk space, move to a faster network, or split the data across multiple servers. If those tricks don’t work, you might need to consider a three-tier architecture with different middle-tier objects running on separate computers. You might need to think about moving some of the more intense calculations out of database code and moving them into code running on separate servers. You might need to redesign the database to use turnkey records. You might even need to split the database into disjoint pieces that can run in different
computers.
Solving these problems may be difficult, but you should at least plan ahead and be prepared to face them.
A sure way to ruin customer goodwill is to get the customers all excited, release the database, and then
tell them they can’t use it for four months while you rethink its performance problems.
The next lesson examines mistakes associated with constraints and keys.