Lesson 2	Physical Design
Objective	Describe Physical Database Design in terms of optimizing Database Performance

Physical Database Design and Performance

At the logical level, you think in terms of tables, columns, and relationships. At the physical level, the database must still respect that logical model, but it is ultimately stored as blocks^[1] on disk or SSD.

To manage data efficiently, the operating system and the RDBMS divide storage into fixed-size units (blocks or pages). Each block can hold part of a table, an index, or other internal structures. If a table or index is larger than a single block, the RDBMS spreads it across many blocks.

When related data is kept together in as few blocks as possible, the database engine can satisfy queries with fewer I/O operations. When data is scattered across many blocks, the engine must perform more random I/O, which slows performance—even on fast SSDs under heavy workloads.

Contiguous Blocks — A simple example of contiguous blocks used by multiple files. A block is the smallest addressable unit of storage the system reads or writes as a whole.

In this illustration, three files share the disk, but each file’s blocks are lined up contiguously. Sequential reads are fast because the storage layer can read adjacent blocks with minimal movement or overhead.

Fragmentation and Why Layout Matters

Over time, as files (or database segments) are created, expanded, and removed, it becomes harder to keep every object in one contiguous stretch of blocks. The result is fragmentation: a single file or table ends up spread across non-adjacent blocks.

The same thing happens inside a database. As rows are inserted, updated, and deleted, free space appears in different places and the physical layout of a table becomes less orderly. Queries that once needed just a handful of blocks may now require many more physical reads.

fragemented blocks — Fragmented layout: data for a single file is split across separate regions. To read the full file (or table), the system must “jump” between blocks, increasing I/O cost.

The more frequently the storage engine must jump between blocks to assemble a rowset, the higher the latency and the lower the throughput. Physical database design is about making deliberate choices—indexes, clustering, and partitioning—that minimize unnecessary I/O and keep related data logically and physically close together.

Physical Design as a Performance Toolkit

Modern RDBMSs provide many tools to improve performance without changing the logical schema:

Indexing – Adds access paths that allow the optimizer to find rows with fewer reads.
Clustering – Influences how related rows are physically co-located on storage.
Partitioning – Splits large tables or indexes into smaller pieces for better manageability and selective access.

Together, these techniques let you:

Reduce the number of blocks that must be read for common queries
Align physical layout with real-world access patterns
Scale to larger data volumes while maintaining acceptable response times

The following figures offer a high-level overview of these techniques. Later lessons in this module dive deeper into each one and show how to apply them in practice.

How the RDBMS Influences Performance

There is no single “perfect” physical design. Performance depends on:

The schema and workload. OLTP workloads, analytic workloads, and mixed workloads benefit from different indexing and partitioning strategies.
How the RDBMS implements joins and access paths. Two vendors may execute the same SQL in very different ways.
Data growth and change over time. A design that works well for thousands of rows may need to be revisited at tens or hundreds of millions of rows.

As a result, physical design is an iterative optimization process:

Start from a well-normalized logical model.
Apply indexing, clustering, and partitioning based on typical queries and SLAs.
Measure performance using the RDBMS’s execution plans and monitoring tools.
Refine the design as the workload and data volume evolve.

Whenever possible, evaluate alternative physical designs in a test or staging environment before applying them to production.

From Logical Design to Physical Design

A standard database design methodology distinguishes three main phases:

Conceptual design – Capture business concepts and relationships.
Logical design – Map concepts into a relational schema (tables, keys, constraints).
Physical design – Decide how to implement that schema for performance and reliability.

Logical design is largely independent of the specific DBMS, but physical design is platform-specific. The RDBMS and storage engine determine:

Available index types and partitioning strategies
How clustering is expressed (e.g., clustered indexes, index-organized tables, table/partition placement)
How statistics, caching, and compression affect query plans

The inputs to physical design include:

An ER or relational diagram
A relational schema definition
Supporting documentation such as a data dictionary and workload description

Physical design choices may also feed back into logical and application design—for example, denormalization or materialized views for heavy analytic workloads.

Logical vs. Physical Design Skills

Logical database design answers “what data must we represent and how should it relate?” Physical database design answers “how do we store and access this data efficiently on a specific platform?”

Effective physical design requires:

Understanding how the DBMS uses indexes, caches, and statistics
Knowing how storage behaves (latency, throughput, concurrency characteristics)
Reading execution plans and identifying I/O hot spots

Design activities are tightly connected:

Physical design may suggest minor changes to the logical schema (for example, adding surrogate keys or summary tables).
Application design may need to adapt to the chosen partitions, indexes, or clustering strategies.
Ongoing monitoring may reveal new access patterns that call for additional physical tuning.

Indexing: Making Lookups Faster

1) Indexing values in a field helps the RDBMS find values more quickly. — **INDEX (Conceptual Example)**
`| DistID | Locations | | ------ | ------------------------ | | 101 | 1, 5, 9, 47, 82, 108 | | 102 | 3, 6, 14, 23, 36, 83, 91 | | 103 | 2, 4, 8, 16, 32, 64, 128 | | 190 | 17, 39, 53 |`
This conceptual index shows how the RDBMS can map a key value (DistID) to the blocks where matching rows are stored. In a real system, indexes are implemented using structures such as B-trees or bitmap indexes, but the goal is the same: **reduce the number of blocks that must be read to find the required rows.**

Clustering: Keeping Related Rows Together

2) Clustering puts records which are often retrieved together on contiguous disk blocks — CD
`| CDNo | CDTitle | DistID | Price | AgeGroup | | ---- | -------------- | ------ | ----- | -------- | | 101 | Southern Tales | 102 | 14.95 | YA-13 | | 102 | Northern Tales | 101 | 13.95 | YA-9 | | 103 | Western Tales | 102 | 9.95 | YA-13 | | 104 | Eastern Tales | 102 | 12.95 | YA-11 | | 105 | Sports Stories | 102 | 13.95 | YA-13 | | 106 | Ghost Stories | 101 | 14.95 | YA-11 |`

**DISTRIBUTOR**
`| DistID | DistName | City | State | | ------ | ---------------------- | ------------ | ----- | | 101 | Gamby Distributing | Harrisonburg | VA | | 102 | Tales to Tell | Denver | CO | | 103 | Stories from the Heart | Richmond | VA |`
Clustering arranges rows so that CDs and their distributors that are frequently joined are stored close together on disk. When a query joins `CD` and `DISTRIBUTOR` on `DistID`, fewer blocks must be read, improving join performance.

Partitioning: Breaking Large Tables into Manageable Pieces

3) Partitioning separates a table into two(or more) — **CD (Partition 1)**
`| CDNo | CDTitle | DistID | Price | AgeGroup | | ---- | -------------- | ------ | ----- | -------- | | 101 | Southern Tales | 102 | 14.95 | YA-13 | | 102 | Northern Tales | 101 | 13.95 | YA-9 | | 103 | Western Tales | 102 | 9.95 | YA-13 |`

**CD (Partition 2)**
`| CDNo | CDTitle | DistID | Price | AgeGroup | | ---- | -------------- | ------ | ----- | -------- | | 104 | Eastern Tales | 102 | 12.95 | YA-11 | | 105 | Sports Stories | 102 | 13.95 | YA-13 | | 106 | Ghost Stories | 101 | 14.95 | YA-11 |`
Horizontal partitioning splits a table into multiple partitions—often by key range, list, or hash. Queries can then target only relevant partitions (for example, “current year” data), reducing I/O and improving maintenance tasks such as backups, index rebuilds, or data archiving.

4) or columns close together on the hard disk — **CD (Vertical Partition – Part 1)** – Identifying and descriptive columns
`| CDNo | CDTitle | DistID | | ---- | -------------- | ------ | | 101 | Southern Tales | 102 | | 102 | Northern Tales | 101 | | 103 | Western Tales | 102 |`

**CD (Vertical Partition – Part 2)** – Numeric/value columns
`| CDNo | Price | AgeGroup | | ---- | ----- | -------- | | 101 | 12.95 | YA-11 | | 102 | 13.95 | YA-13 | | 103 | 14.95 | YA-11 |`
Vertical partitioning groups columns that are often accessed together into narrower tables. Queries that only need identifying information avoid reading less frequently used or larger columns, reducing I/O and improving cache utilization.

Physical Design Overview

The process of producing a description of how the database is implemented on secondary storage is known as physical design. A physical design describes:

The base relations and how they are stored
The file organizations and access methods used for tables and indexes
The indexes, partitions, clustering strategies, and related constraints and security measures

During this phase, the designer chooses among the options offered by a specific RDBMS to meet performance, scalability, and reliability goals. Because different systems implement indexes, clustering, and partitioning differently, physical design must be tailored to the target DBMS. Decisions taken here may feed back into logical and application design when significant performance improvements are possible.

The next lesson focuses on indexing in more detail and shows how index design affects query performance.

[1]block: The smallest unit of storage that the operating system or RDBMS addresses as a whole when reading from or writing to disk.