Lesson 4	Downside to indexing
Objective	Describe the trade-offs involved in creating indexes

Index Tradeoffs in Physical Design

In the previous lesson you learned how indexes speed data access by giving the RDBMS a shortcut into large tables. In physical database design, however, every performance gain has a cost. Indexes improve read performance but consume storage, slow write operations, and add maintenance overhead.

This lesson focuses on tradeoffs: when an index is worth its cost, when it is not, and how to reason about indexing decisions as your workload and data volumes change.

Why Indexes Help Reads but Hurt Writes

Indexes are separate data structures (often B-trees or similar) that keep key values sorted and point to table rows. When a query filters or joins on an indexed column, the optimizer can:

Seek directly into the index to find matching key values.
Follow row pointers to retrieve only qualifying rows from the base table.

This avoids scanning every row, which is critical for large tables and interactive workloads.
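
The seek-then-fetch pattern above can be sketched in a few lines of Python. This is only a toy model, not how any particular RDBMS implements its indexes: the table is a list of tuples, the "index" is a sorted list of (key, row position) pairs standing in for a B-tree, and the names `rows`, `index`, `full_scan`, and `index_seek` are hypothetical.

```python
import bisect

# Hypothetical table: (order_no, cd_no) rows stored in arrival order.
rows = [(1000 + i, (i * 7) % 50) for i in range(1000)]

# Toy index on cd_no: sorted (key, row_position) pairs, standing in for a B-tree.
index = sorted((cd_no, pos) for pos, (_, cd_no) in enumerate(rows))

def full_scan(cd_no):
    """Examine every row; return the matches and the number of rows read."""
    matches = [row for row in rows if row[1] == cd_no]
    return matches, len(rows)

def index_seek(cd_no):
    """Binary-search the index, then follow row pointers into the table."""
    lo = bisect.bisect_left(index, (cd_no, -1))      # first entry with this key
    hi = bisect.bisect_left(index, (cd_no + 1, -1))  # first entry past this key
    matches = [rows[pos] for _, pos in index[lo:hi]]
    return matches, hi - lo                          # base-table rows fetched

scan_rows, scan_cost = full_scan(21)
seek_rows, seek_cost = index_seek(21)
print(scan_cost, seek_cost)  # 1000 rows read by the scan, 20 fetched by the seek
```

Both functions return the same rows; the difference is how many base-table rows each one has to touch to find them.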

The tradeoff is that every data change touching indexed columns must also update each affected index:

INSERT: add a new row and insert its key into every relevant index.
UPDATE: if an indexed column changes, remove the old key and insert the new key.
DELETE: remove the key from every index that references the row.

The more indexes a table has, the more work each write operation must do. Good physical design balances faster reads against this extra write cost.
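
To make the write cost concrete, here is a minimal sketch that counts index maintenance work. It assumes a toy table (a dict of rows plus one sorted list per indexed column); `ToyTable`, `run_workload`, and the counter `index_updates` are hypothetical names, and a real engine also pays for logging, locking, and page splits that this model ignores.

```python
import bisect

class ToyTable:
    """Toy model: a dict of rows plus one sorted key list per indexed column."""

    def __init__(self, indexed_columns):
        self.rows = {}                                   # row_id -> column values
        self.indexes = {col: [] for col in indexed_columns}
        self.index_updates = 0                           # index entries added or removed

    def _add_keys(self, row_id, row, columns):
        for col in columns:
            bisect.insort(self.indexes[col], (row[col], row_id))
            self.index_updates += 1

    def _remove_keys(self, row_id, row, columns):
        for col in columns:
            self.indexes[col].remove((row[col], row_id))
            self.index_updates += 1

    def insert(self, row_id, row):
        self.rows[row_id] = dict(row)
        self._add_keys(row_id, row, list(self.indexes))      # key goes into every index

    def update(self, row_id, changes):
        old = self.rows[row_id]
        touched = [c for c in changes if c in self.indexes and changes[c] != old[c]]
        self._remove_keys(row_id, old, touched)              # remove the old keys
        old.update(changes)
        self._add_keys(row_id, old, touched)                 # insert the new keys

    def delete(self, row_id):
        row = self.rows.pop(row_id)
        self._remove_keys(row_id, row, list(self.indexes))   # key leaves every index

def run_workload(indexed_columns):
    """Two inserts, one update, one delete; return the index maintenance count."""
    table = ToyTable(indexed_columns)
    table.insert(101, {"CDNo": 105, "OrderCost": 8.95, "Received": "Yes"})
    table.insert(103, {"CDNo": 108, "OrderCost": 13.95, "Received": "No"})
    table.update(103, {"Received": "Yes"})
    table.delete(101)
    return table.index_updates

print(run_workload([]))                                   # 0
print(run_workload(["CDNo"]))                             # 3
print(run_workload(["CDNo", "OrderCost", "Received"]))    # 11
```

The same four statements cost nothing extra with no indexes, three index operations with one index, and eleven with three indexes.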

Example: Orders for CDs

Indexing a column makes searches and joins on that column faster, but not every column deserves an index. Consider the following table of CD orders:

**CD_ORDERS**

| OrderNo | OrderDate | CDNo | OrderCost | Received |
| ------- | --------- | ---- | --------- | -------- |
| 101 | 01/11/00 | 105 | 8.95 | Yes |
| 102 | 01/11/00 | 127 | 12.95 | Yes |
| 103 | 01/12/00 | 108 | 13.95 | No |
| 104 | 01/13/00 | 101 | 12.95 | Yes |
| 302 | (blank) | 127 | 12.95 | No |

CD orders with OrderNo, OrderDate, CDNo, OrderCost, and Received.

Typical queries might ask:

“Show every order for a given CD number (CDNo).”
“Look up an order by its order number (OrderNo).”
“Find unreceived orders (Received = 'No').”

In this workload:

Indexes on OrderNo (usually a primary key) and CDNo make sense.
An index on OrderCost is rarely useful unless you frequently ask price-based questions (for example, auditing by price band).

Indexing every column here would increase write cost and maintenance with little additional benefit.
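
As a rough illustration, the sketch below loads the sample rows into an in-memory SQLite database through Python's `sqlite3` module and asks SQLite how it would run each of the three typical queries. SQLite is an assumption chosen for convenience, the index name `idx_cd_orders_cdno` and the helper `plan` are hypothetical, and other RDBMS products report their plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CD_ORDERS (
        OrderNo   INTEGER PRIMARY KEY,  -- SQLite stores the table keyed on this column
        OrderDate TEXT,
        CDNo      INTEGER,
        OrderCost REAL,
        Received  TEXT
    )
""")
conn.executemany(
    "INSERT INTO CD_ORDERS VALUES (?, ?, ?, ?, ?)",
    [
        (101, "01/11/00", 105, 8.95, "Yes"),
        (102, "01/11/00", 127, 12.95, "Yes"),
        (103, "01/12/00", 108, 13.95, "No"),
        (104, "01/13/00", 101, 12.95, "Yes"),
        (302, None, 127, 12.95, "No"),       # blank OrderDate stored as NULL
    ],
)
conn.execute("CREATE INDEX idx_cd_orders_cdno ON CD_ORDERS (CDNo)")

def plan(sql, params=()):
    """Return SQLite's plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)   # column 3 holds the detail text

by_cd = plan("SELECT * FROM CD_ORDERS WHERE CDNo = ?", (127,))
by_order = plan("SELECT * FROM CD_ORDERS WHERE OrderNo = ?", (104,))
unreceived = plan("SELECT * FROM CD_ORDERS WHERE Received = 'No'")

for label, text in [("CDNo", by_cd), ("OrderNo", by_order), ("Received", unreceived)]:
    print(label, "->", text)
```

The exact wording of each plan varies by SQLite version, but the first two queries should report a search (through the CDNo index and the primary key respectively), while the query on Received, which has no index, should report a scan of the whole table.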

Core Tradeoffs When Creating Indexes

Indexing a field increases read performance for some queries, but introduces several costs that accumulate over time:

Additional storage consumption
Each index is a separate on-disk structure. Modern storage is inexpensive, but:
- Indexes still occupy buffer cache and I/O bandwidth.
- Very large or redundant indexes can evict more useful pages from memory, reducing cache hit rates.

Slower inserts, updates, and deletes
Every data change must keep indexes consistent:
- High-volume transactional tables feel the cost most strongly.
- Wide composite indexes or many overlapping indexes amplify the impact.
On OLTP systems, careless indexing can turn fast write workloads into bottlenecks.

Maintenance and fragmentation
As rows change, index pages can become fragmented, spreading logically adjacent keys across multiple pages:
- Random I/O increases for range scans and index seeks.
- Periodic rebuilds or reorganization may be required, which in turn consumes CPU, I/O, and sometimes locks.

Design complexity
With many indexes per table:
- The optimizer has more choices; execution plan analysis becomes harder.
- It is easier to end up with nearly duplicate indexes that provide little incremental value.
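
The storage cost is easy to observe on a small scale. The following sketch assumes SQLite (through Python's `sqlite3` module) and a hypothetical `orders` table with synthetic data; it compares the number of database pages before and after two indexes are created. Page sizes and index layouts differ between products, so treat the numbers as illustrative only.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    conn = sqlite3.connect(os.path.join(tmp, "demo.db"))
    conn.execute(
        "CREATE TABLE orders (order_no INTEGER PRIMARY KEY, cd_no INTEGER, order_cost REAL)"
    )
    # Synthetic rows: 20,000 orders spread across 500 CD numbers.
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(i, i % 500, 5.0 + i % 20) for i in range(20000)],
    )
    conn.commit()

    def page_count():
        """Number of pages currently in the database file."""
        return conn.execute("PRAGMA page_count").fetchone()[0]

    pages_table_only = page_count()
    conn.execute("CREATE INDEX idx_orders_cd_no ON orders (cd_no)")
    conn.execute("CREATE INDEX idx_orders_cost ON orders (order_cost)")
    conn.commit()
    pages_with_indexes = page_count()
    conn.close()

print("pages with table only:", pages_table_only)
print("pages with two indexes:", pages_with_indexes)
```

Every page the indexes add is a page that competes with table data for cache space and has to be kept current on each write.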

The key design question is never “Should I index?” but “Does this index provide enough benefit on real queries to justify these costs?”

Modern Best Practices for Index Tradeoffs

When designing indexes in a production system, use these guidelines to manage tradeoffs:

Start with keys that matter
Most relational systems:
- Automatically index primary keys and unique constraints.
- Benefit from indexes on foreign keys used in joins and referential checks.
These indexes often deliver high value with predictable cost.

Align indexes with real workload
Focus on columns that appear repeatedly in:
- WHERE conditions.
- JOIN predicates.
- GROUP BY and ORDER BY clauses.
Avoid indexing columns that are rarely queried or used only in reports that can tolerate full scans.

Evaluate selectivity
Indexes work best when they significantly reduce the number of rows examined. Columns with low selectivity (for example, boolean flags with mostly one value) rarely benefit from simple B-tree indexes unless paired in composite keys.

Use composite indexes sparingly
Multi-column indexes should reflect common query patterns (for example, (CustID, OrderDate)). Extra columns that queries never filter or sort by add width and maintenance cost without improving plans.

Monitor write-heavy tables
On high-throughput OLTP tables, every additional index has an immediate performance impact on insert and update paths. Central transaction tables should have only a small, carefully chosen set of indexes.

Review index usage regularly
Most enterprise RDBMS platforms expose index usage statistics. Periodically:
- Identify indexes that are rarely or never used.
- Remove or consolidate redundant indexes.
- Adjust composite indexes to better match evolving queries.

Effective indexing is an iterative process: design, observe, measure, and refine.
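
One way to put the selectivity guideline into practice during the "measure" step is to compare the number of distinct values in a column with the total row count. The sketch below assumes that rough definition (distinct values divided by rows, where values near 1.0 suit an index and values near 0 usually do not) and uses a hypothetical `orders` table with synthetic data in SQLite. Real optimizers rely on richer statistics such as histograms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_no INTEGER PRIMARY KEY, cd_no INTEGER, received TEXT)"
)
# Synthetic data: 1,000 orders, 50 CD numbers, and a flag that is mostly 'Yes'.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, i % 50, "No" if i % 20 == 0 else "Yes") for i in range(1000)],
)

COLUMNS = ("order_no", "cd_no", "received")  # fixed list, so safe to interpolate

def selectivity(column):
    """Distinct values divided by total rows for one column of `orders`."""
    if column not in COLUMNS:
        raise ValueError(f"unknown column: {column}")
    distinct, total = conn.execute(
        f"SELECT COUNT(DISTINCT {column}), COUNT(*) FROM orders"
    ).fetchone()
    return distinct / total

ratios = {col: selectivity(col) for col in COLUMNS}
print(ratios)  # {'order_no': 1.0, 'cd_no': 0.05, 'received': 0.002}
```

Here the key column is fully selective, the CD number narrows a lookup to about 2% of the table, and the flag barely distinguishes rows at all, which matches the guidance above about boolean-style columns.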

Choosing Indexes for the CD Orders Example

Returning to the CD_ORDERS table:

OrderNo: Natural candidate for a primary key index; used for lookups and referential integrity.
CDNo: Strong candidate for an index when queries frequently list orders for a specific CD.
Received: May be useful as part of a composite index (for example, (Received, CDNo)) if you often query “unreceived orders per CD,” but on its own it has low selectivity.
OrderCost: Generally not a good standalone index; only consider it if many business questions filter or group by cost ranges.

This small example illustrates a general rule: index the columns that drive business questions, not every column that is convenient to access.
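
The composite index suggested for Received can be tried out with a short sketch. It assumes SQLite through Python's `sqlite3` module, uses the sample rows from the CD_ORDERS table, and introduces a hypothetical index name, `idx_cd_orders_received_cdno`. Other database products may choose a different plan for the same query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CD_ORDERS (
        OrderNo   INTEGER PRIMARY KEY,
        OrderDate TEXT,
        CDNo      INTEGER,
        OrderCost REAL,
        Received  TEXT
    )
""")
conn.executemany(
    "INSERT INTO CD_ORDERS VALUES (?, ?, ?, ?, ?)",
    [
        (101, "01/11/00", 105, 8.95, "Yes"),
        (102, "01/11/00", 127, 12.95, "Yes"),
        (103, "01/12/00", 108, 13.95, "No"),
        (104, "01/13/00", 101, 12.95, "Yes"),
        (302, None, 127, 12.95, "No"),       # blank OrderDate stored as NULL
    ],
)
# Composite index matching the "unreceived orders per CD" question.
conn.execute(
    "CREATE INDEX idx_cd_orders_received_cdno ON CD_ORDERS (Received, CDNo)"
)

query = "SELECT OrderNo FROM CD_ORDERS WHERE Received = 'No' AND CDNo = ?"
plan_rows = conn.execute("EXPLAIN QUERY PLAN " + query, (127,)).fetchall()
plan_text = " | ".join(row[3] for row in plan_rows)   # column 3 holds the detail text
result = conn.execute(query, (127,)).fetchall()

print(plan_text)
print(result)  # [(302,)]
```

The plan text should name the composite index, showing that one index on both columns serves this question without a separate low-selectivity index on Received alone.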
