Describe the characteristics of Horizontal Partitioning
Characteristics of Horizontal Partitioning in Database Design
Horizontal partitioning, also known as partitioning by row (and, when the partitions are spread across separate servers, as sharding), is a technique in database design that divides the rows of a table into multiple separate fragments based on a specific criterion or partition key. This approach aims to improve the performance, scalability, and manageability of databases, particularly in large-scale applications. The key characteristics of horizontal partitioning are as follows:
Data distribution: In horizontal partitioning, the rows of a table are divided into several partitions, each containing a subset of the rows and all of the table's columns. The partitioning is typically based on the value of a specific attribute or on a range of attribute values, so that each row belongs to exactly one partition.
Partition key: The partition key is a column or set of columns whose value determines which partition each row is assigned to. Choosing an appropriate partition key is essential for even data distribution and good performance. The partition key should have high cardinality (many distinct values) and should match the most common query patterns, so that typical queries can be answered from a small number of partitions.
Scalability: Horizontal partitioning is an effective way to address scalability issues in large-scale applications. By dividing data into smaller, more manageable partitions that can be placed on separate disks or servers, it spreads the storage and query load instead of concentrating it in one large table, which can enhance overall performance.
Query performance: Partitioning can significantly improve query performance, particularly for large datasets. A query whose conditions on the partition key restrict it to one or a few partitions can skip the remaining partitions entirely instead of scanning the whole table; this is often called partition pruning. Query optimizers can also use partitioning metadata to choose better execution plans.
Distributed systems: Horizontal partitioning is particularly beneficial in distributed database systems, where data can be spread across multiple servers or clusters. This can help to balance the workload, enhance fault tolerance, and provide better data locality for query processing.
Manageability: Partitioning can simplify database maintenance tasks, such as backups, indexing, and updates. By working with smaller partitions instead of the entire table, these tasks can be performed more efficiently, reducing the impact on overall system performance.
Flexibility: Horizontal partitioning allows for flexible partition management. Partitions can be added, removed, or modified independently without affecting other partitions or the rest of the table. This flexibility makes it easier to change the partitioning layout or data distribution (for example, adding a partition for a new year) as application requirements evolve.
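The distribution rules above can be made concrete with a small sketch. It assumes rows are plain Python dictionaries and partitions are identified by integer indexes; the boundary dates, sample orders, and the function names `range_partition` and `hash_partition` are hypothetical and do not belong to any particular database. Range partitioning compares a row's key with sorted boundary values, while hash partitioning spreads keys across a fixed number of partitions (for example, servers), which is where a high-cardinality key matters.

```python
import bisect
import zlib
from datetime import date

# Hypothetical range boundaries: partition i holds keys in
# [BOUNDARIES[i-1], BOUNDARIES[i]); the first and last partitions are open-ended.
BOUNDARIES = [date(2023, 1, 1), date(2024, 1, 1), date(2025, 1, 1)]


def range_partition(key, boundaries=BOUNDARIES):
    """Return the index of the range partition that holds `key`.

    bisect_right counts how many boundaries are <= key, so a key equal to a
    boundary lands in the partition that starts at that boundary.
    """
    return bisect.bisect_right(boundaries, key)


def hash_partition(key, num_partitions):
    """Return a partition index in [0, num_partitions) for `key`.

    zlib.crc32 gives the same value in every process; Python's built-in
    hash() is randomized per process for strings, so it would not route
    a key consistently across runs.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


# Hypothetical sample rows.
orders = [
    {"order_id": 1, "customer_id": "C17", "order_date": date(2022, 11, 3)},
    {"order_id": 2, "customer_id": "C42", "order_date": date(2023, 1, 1)},
    {"order_id": 3, "customer_id": "C17", "order_date": date(2025, 3, 9)},
]

# Range-partition by order date: partition index -> list of order ids.
by_date = {}
for order in orders:
    by_date.setdefault(range_partition(order["order_date"]), []).append(order["order_id"])

# Hash-partition by customer across 4 hypothetical servers.
by_customer = {o["order_id"]: hash_partition(o["customer_id"], 4) for o in orders}
```

Range partitioning keeps neighbouring key values (such as dates) together, which suits queries over a period; hash partitioning usually gives a more even spread, but a range query then has to visit every partition.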
Horizontal partitioning is a critical technique in database design that can enhance performance, scalability, and manageability. By dividing a table into smaller partitions based on a partition key, it enables efficient query execution, improved resource utilization, and easier maintenance in large-scale applications.
Horizontal Splitting
You should consider partitioning a table horizontally when the table grows so large that searches and joins using the table become unacceptably slow. For example, your database might contain a table listing every order placed by Stories on CD.
One technique for reducing the size of a table accessed by a query is to split it into two or more tables with the same columns and to allocate the rows to different tables according to some criterion. In effect we are defining and implementing subtypes.
For example, although it might make sense to include historical data in the same table as the corresponding current data, different queries are likely to access current and historical data[1]. Placing current and historical data in different tables with the same structure will usually improve the performance of queries on current data, because those queries no longer have to read the historical rows. You may prefer to include a copy of the current data in the historical data table so that queries on all data can be written without the UNION operator. This is duplication rather than splitting.
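The current/historical split can be sketched in a few lines. This is a minimal illustration, assuming in-memory lists of dictionaries stand in for two tables with identical columns; the cutoff date, sample rows, and the names `split_current_historical` and `all_orders` are hypothetical. Because no order appears in both tables, combining them is the equivalent of a UNION ALL, which returns the same rows as UNION without the duplicate-elimination step.

```python
from datetime import date

# Hypothetical order rows; both "tables" below share these columns.
orders = [
    {"order_id": 1, "order_date": date(2021, 5, 14), "total": 19.99},
    {"order_id": 2, "order_date": date(2023, 2, 2), "total": 42.50},
    {"order_id": 3, "order_date": date(2024, 8, 30), "total": 7.25},
]


def split_current_historical(rows, cutoff):
    """Split rows into (current, historical) tables with identical columns.

    Rows dated on or after `cutoff` go to the current table and older rows
    go to the historical table, so each row is stored exactly once
    (splitting, not duplication).
    """
    current = [r for r in rows if r["order_date"] >= cutoff]
    historical = [r for r in rows if r["order_date"] < cutoff]
    return current, historical


def all_orders(current, historical):
    """Combine both tables, like SELECT ... UNION ALL SELECT ...."""
    return historical + current


current_orders, historical_orders = split_current_historical(orders, date(2023, 1, 1))
```

Queries on recent orders read only `current_orders`; under the duplication approach described above, the historical table would also hold copies of the current rows, and a query on all data would read that table alone.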
Historical Data in the context of Horizontal Partitioning:
Separation: Historical data is physically separated from current data, often onto different storage devices or servers.
Specific partitions: It's typically placed into designated partitions based on a partitioning key, commonly a date or time column.
Independent management: Each partition can be managed independently, allowing for:
Optimization of storage and retrieval
Distinct backup and recovery strategies
Separate retention policies
Different access controls
Reduced impact on current operations: Partitioning historical data can improve performance and reduce resource consumption for current operations by:
Minimizing the amount of data that needs to be scanned for queries
Allowing for more efficient use of indexes
Reducing contention for resources
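Independent management, and in particular a separate retention policy, can be sketched by treating each year's rows as its own partition. In this hypothetical example, partitions are Python dictionaries keyed by order year, and the retention rule (the `keep_years` parameter, an assumption for illustration) removes whole partitions instead of examining individual rows, much as dropping a partition avoids a large row-by-row DELETE.

```python
from collections import defaultdict
from datetime import date


def partition_by_year(rows):
    """Group rows into per-year partitions keyed by order year."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row["order_date"].year].append(row)
    return dict(partitions)


def apply_retention(partitions, current_year, keep_years):
    """Keep only the partitions inside the retention window.

    Whole partitions are dropped as units; the rows of the partitions that
    remain are neither read nor rewritten.
    """
    oldest_kept = current_year - keep_years + 1
    return {year: rows for year, rows in partitions.items() if year >= oldest_kept}


# Hypothetical sample rows.
orders = [
    {"order_id": 1, "order_date": date(2019, 4, 2)},
    {"order_id": 2, "order_date": date(2021, 9, 17)},
    {"order_id": 3, "order_date": date(2023, 5, 30)},
    {"order_id": 4, "order_date": date(2024, 2, 11)},
]

partitions = partition_by_year(orders)
# Hypothetical policy: keep the current year and the two years before it.
kept = apply_retention(partitions, current_year=2024, keep_years=3)
```

The same per-partition structure would let each year carry its own backup schedule or access rules, since nothing in one partition depends on another.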
Common use cases:
Archiving older data: Moving older records to historical partitions frees up space and improves performance for current transactions.
Compliance and auditing: Partitioning can aid compliance with data retention regulations and facilitate audits.
Analytics and reporting: Historical data can be analyzed independently for insights into trends, patterns, and historical performance.
Example:
In a sales database, historical data might include orders from previous years, while current data would encompass recent orders.
The orders table could be horizontally partitioned by year, with each year's data residing in a separate partition.
Queries for current orders would only need to access the most recent partition, while historical analysis could target specific partitions as needed.
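The year-partitioned Orders example can be sketched with partition pruning. The sketch assumes each year's partition is a list of dictionaries in a Python dictionary keyed by year; the sample data and the name `query_orders` are hypothetical. The function reports which partitions it scanned, so it is easy to see that a query on recent orders never touches older years.

```python
from datetime import date

# Hypothetical per-year partitions of the orders table.
partitions = {
    2022: [{"order_id": 1, "order_date": date(2022, 3, 9)}],
    2023: [{"order_id": 2, "order_date": date(2023, 7, 21)}],
    2024: [
        {"order_id": 3, "order_date": date(2024, 1, 5)},
        {"order_id": 4, "order_date": date(2024, 6, 18)},
    ],
}


def query_orders(partitions, start, end):
    """Return (rows with start <= order_date <= end, years scanned).

    Partition pruning: only partitions whose year overlaps [start, end]
    are scanned; the others are skipped without reading their rows.
    """
    scanned = [year for year in sorted(partitions) if start.year <= year <= end.year]
    rows = [
        row
        for year in scanned
        for row in partitions[year]
        if start <= row["order_date"] <= end
    ]
    return rows, scanned


# Current orders: only the 2024 partition is read.
rows, scanned = query_orders(partitions, date(2024, 1, 1), date(2024, 12, 31))
```

A historical analysis spanning several years calls the same function with a wider range and scans only the years it names.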
Partition by Date
Since Stories on CD is a successful, growing business, the Orders table will most likely grow to an unmanageable size quite quickly. Even though all of the order data is important, you may find that Stories on CD employees most frequently access records for orders placed over the previous three months. Rather than store all orders in a single table, you could partition the table horizontally so that each table stores only the orders from a single three-month period (quarter). Partitioning the Orders table in this manner would reduce the size of the table the RDBMS needs to search while still meeting most of the users' needs. The next lesson discusses the disadvantages of horizontal partitioning.
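Routing orders to quarterly tables only requires mapping an order date to its quarter. This is a minimal sketch; the table-naming pattern `orders_<year>_q<quarter>` and the function names are hypothetical choices for illustration, not something the Stories on CD design prescribes. `tables_for_range` lists the quarterly tables a date-range query has to read, so a query over the most recent three months typically reads one or two tables instead of the whole order history.

```python
from datetime import date


def quarter_of(d: date) -> int:
    """Return the calendar quarter (1-4) containing `d`."""
    return (d.month - 1) // 3 + 1


def orders_table_for(d: date) -> str:
    """Name of the hypothetical quarterly Orders table that stores an order dated `d`."""
    return f"orders_{d.year}_q{quarter_of(d)}"


def tables_for_range(start: date, end: date) -> list[str]:
    """Quarterly tables a query over [start, end] must read, oldest first."""
    tables = []
    year, q = start.year, quarter_of(start)
    # Walk quarter by quarter until we pass the quarter containing `end`.
    while (year, q) <= (end.year, quarter_of(end)):
        tables.append(f"orders_{year}_q{q}")
        q += 1
        if q > 4:
            year, q = year + 1, 1
    return tables
```

For example, an order dated 15 August 2024 would be stored in `orders_2024_q3`, and a query from 10 November 2024 to 10 February 2025 would read `orders_2024_q4` and `orders_2025_q1`.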
When placing a table in first normal form, one ends up with one large table with one primary key. Likewise, with graphical normalization, we end up with one large graph or diagram after bringing the known attributes to first normal form.
[1] Historical data: Historical data refers to rows of data that are no longer actively used or accessed frequently, but still need to be preserved for reference, compliance, or analysis purposes. These rows are separated from more current, frequently accessed data into distinct partitions within a table.