You have learned that I/O operations are one of the primary bottlenecks in the operation of a database. Indexes help to reduce the number of I/O operations, but you can further enhance the performance of data retrieval by separating indexes from their associated tables.
Disk contention
We discussed how disk I/O is the slowest operation in a computer, because of the need for the physical movement of the disk head. There is another factor that contributes to slow disk performance and disk contention[1].
The disk head is not only moving back and forth across the disk, but it is doing this in response to many requests for many types of information. Because the disk head can read only one piece of information at a time, these multiple requests all contend for the operations of the disk head, creating contention that forces disk requests to queue up for access. The more contention, the longer the queues, and the slower the disk access performance. When you access data through an index, you are guaranteed to request data from the index and from the database table. If these two structures are on separate disks, you can reduce disk head contention and improve I/O performance.
Separating indexes:
You can specify where an index goes by adding the TABLESPACE tablespace_name clause to the CREATE INDEX SQL command described in the previous lesson, where tablespace_name is the name of an existing tablespace. This clause follows the ASC/DESC keyword. The difference separating an index can make is shown in the following series of images
Separating Index Produces better Results in Oracle
Other Suitable Columns
When choosing an index, keep in mind this basic rule: the indexing strategy should be based on the columns you use when querying the table. You can create more than one index on a table and have an index contain multiple columns. You will make better decisions if you first consider what types of queries you execute on a table. If you have identified a poorly performing SQL query, also consider creating indexes for the following columns:
Create indexes on columns used often as predicates in the WHERE clause; when multiple columns from a table are used in the WHERE clause, consider using a concatenated (multi-column) index.
Create a covering index on columns used in the SELECT clause.
Consider creating indexes on columns used in the ORDER BY, GROUP BY, UNION, or DISTINCT clauses.
Oracle allows you to create an index that contains more than one column. Multicolumn indexes are known as concatenated indexes (sometimes referred to as composite indexes). These indexes are especially effective when you often use multiple columns in the WHERE clause when accessing a table.
Concatenated indexes are often times more efficient in this situation than creating separate single column indexes.
Columns included in the SELECT and WHERE clauses are also potential candidates for indexes. Recall that a covering index is one that includes all columns returned by the query. In this situation, Oracle can use the index structure itself (and not the table) to satisfy the results of the query. Also, if the column values are selective enough, Oracle can use an index on columns referenced in the WHERE clause to improve query performance.
Also consider creating indexes on columns used in the ORDER BY, GROUP BY, UNION, or DISTINCT clauses. This may result in more efficient queries that frequently use these SQL constructs.
There are
other ways that you can optimize your performance which are discussed in the section below.
Database Structures Optimize Disk Performance
There are three other database structures you can use to optimize disk performance:
Hash index:
In a hash index, the values for the index are run through a hashing algorithm, and the index stores the result. This type of index isdesigned to avoid clumping when there are many similar index values.
Hash cluster:
The hash cluster takes the results of the hash and creates a cluster on disk, so that Oracle can simply run the hash algorithm and thengo directly to the location of the value.
Index organized table:
This type of table includes all the table information within an index structure, so that no additional I/O is needed to retrieve thetable data after the index value is found.
The 1) hash and 2) hash cluster indexes cannot be used for sorting, because they do not use a real value, and because there can only be one hash cluster index per table.
You can also partition your indexes, which spreads the disk I/O out over multiple partitions. The Oracle query optimizer will ignore partitions that will not satisfy a selection condition, or will suppress imposing a selection condition if it knows that all the values in a partition will satisfy it. For instance, if you had partitions based on states, and a query requested all entries from OH, Oracle would ignore all of the partitions other than the one for Ohio, and not bother to test the values in that partition.
Hash Cluster Queries
The database, not the user, determines how to hash the key values input by the user. For example, assume that users frequently execute queries such as the following, entering different department ID numbers for p_id:
SELECT *
FROM employees
WHERE department_id = :p_id;
SELECT *
FROM departments
WHERE department_id = :p_id;
SELECT *
FROM employees e, departments d
WHERE e.department_id = d.department_id
AND d.department_id = :p_id;
If a user queries employees in department_id=20, then the database might hash this value to bucket 77. If a user queries employees in department_id=10, then the database might hash this value to bucket 15. The database uses the internally generated hash value to locate the block that contains the employee rows for the requested department.
Figure 5-5 depicts a hash cluster segment as a horizontal row of blocks.
A limitation of hash clusters is the unavailability of range scans on nonindexed cluster keys. The next lesson is about altering an index.
Adding Indexes - Exercise
Click the Exercise link below to practice creating additional indexes for the COIN database. Adding Indexes - Exercise
[1]Contention: A source of possible performance degradation caused by two or more users contending for the same resources.