Describe Physical Database Design in terms of optimizing Database Performance
Physical Database Design and Data Storage
A computer's hard disk is more than a collection of 1s and 0s. To manage the data on a hard disk effectively, a computer's operating system (such as Windows or Unix) divides the disk into blocks[1]. Each block can hold a certain amount of information. If a file is too large to fit onto a single block, the computer finds the next empty block, continues writing the file there, and repeats the process until the file is completely written to the disk. The following illustration shows how a hard disk might be divided among several files.
Note the file titled word1.doc is written on five blocks; the file me.gif is written on five blocks; and the file resume.doc is written on seven blocks. Any given block can hold information from only a single file.
Adding and modifying Files
If you have created and modified a number of files, it is unlikely that all of the data for a particular file will be stored on contiguous blocks. Suppose you create a document using a word processor, save a graphic from a World Wide Web site, and then add a page to the word processor document. Because the operating system writes files to open blocks in the order that the files are created and modified, the physical layout of the files on the disk would be similar to the layout in the following graphic.
The next time you open the word processor document, the computer will consult its internal directory of stored files, read the data from the first set of blocks, move to the second area of the disk, and then read the second set of blocks. The more often a computer needs to move to another area of the disk to read part of a file, the longer it takes to read the file.
Improving performance using Physical Design
Most RDBMS's are able to rearrange files on a hard disk to store records from a table in contiguous blocks. You can go even farther and use other techniques to improve the performance of your database, including:
Indexing
Clustering
Partitioning
The following series of images offers a quick overview of these techniques.
The remaining lessons in this module examine each in greater depth and list their advantages and disadvantages.
Impact of RDBMS on performance issues
The techniques you will learn in this module are powerful, but there is no magic formula you can follow to maximize database performance. There are a number of reasons for this:
Every database is different. Your choices regarding which tables to combine often depend entirely on the tables themselves and your organization's needs.
Your organization's needs may change over time, requiring you to change your database's physical design.
Every RDBMS handles joins and searches differently. Because the best physical design for a database under one RDBMS is not necessarily the best physical design for that database under another RDBMS, you should evaluate different physical designs when creating the database. You should also revisit the physical design as the database grows to ensure the design you have chosen still offers the best performance. If possible, make a copy of the database and test it on a backup machine.
Comparison of Logical and Physical Database Design
In presenting a database design methodology we divide the design process into three main phases:
The phase prior to physical design, namely logical database design, is largely independent of implementation details, such as the specific functionality of the target DBMS and application programs, but is dependent on the target data model.
The output of this process is a logical data model consisting of an
ER/relation diagram,
relational schema, and
supporting documentation
that describes this model, such as a data dictionary. Together, these represent the sources of information for the physical design process, and they provide the physical database designer with a vehicle for making tradeoffs that are so important to an efficient database design.
Logical Database Design
Whereas logical database design is concerned with the what, physical database design is concerned with the how.
It requires different skills that are often found in different people.
In particular, the physical database designer must know how the computer system hosting the DBMS operates, and must be fully aware of the functionality of the target DBMS. As the functionality provided by current systems varies widely, physical design must be tailored to a specific DBMS. However, physical database design is not an isolated activity and there is often feedback between
physical,
logical, and
application design.
For example, decisions taken during physical design for improving performance, such as merging relations together, might affect the structure of the logical data model, which will have an associated effect on the application design.
Physical Design Overview
The process of producing a description of the implementation of the database database on secondary storage is known as physical design.
Physical Design describes the
base relations,
file design organizations, and
indexes
used to achieve efficient access to the data, and any associated integrity constraints and security measures.
The physical database design phase allows the designer to make decisions on how the database is to be implemented.
Therefore, physical design is tailored to a specific DBMS. There is feedback between physical and logical design, because decisions taken during physical design for improving performance may affect the logical data model.
The next lesson introduces indexing.
[1]block: The smallest area that can be addressed on a magnetic disk.