Because concurrency is not as big an issue, denormalized data models are common. Data loading is handled by ETL (extract, transform, and load) processes that typically run on time intervals, and those processes frequently store data in a redundant fashion that allows fast reads and minimal joins. These denormalized models are often highly indexed for fast retrieval, and index maintenance can be handled at ETL time. Indexing on an OLAP system is done with very high fill factors, resulting in dense, compressed indexes and very fast reads.
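For instance, a denormalized reporting table might repeat customer and product attributes alongside each sale so that queries need no joins. The sketch below is purely illustrative (the dbo.SalesReport table, its columns, and the index name are invented for this example); it builds the index at a high fill factor because the table is read-mostly between loads.

```sql
-- Illustrative only: a denormalized sales reporting table that repeats
-- customer and product attributes so that reads need no joins.
CREATE TABLE dbo.SalesReport (
    SaleID       INT           NOT NULL,
    SaleDate     DATE          NOT NULL,
    CustomerName NVARCHAR(100) NOT NULL,  -- repeated from the customer source
    CustomerCity NVARCHAR(60)  NOT NULL,
    ProductName  NVARCHAR(100) NOT NULL,  -- repeated from the product source
    Quantity     INT           NOT NULL,
    SaleAmount   MONEY         NOT NULL
);

-- Built (or rebuilt) at ETL time with a high fill factor, since the data
-- is only modified by the scheduled loads.
CREATE CLUSTERED INDEX IX_SalesReport_SaleDate
    ON dbo.SalesReport (SaleDate)
    WITH (FILLFACTOR = 100);
```

Because users only read this table between loads, the nearly full pages cost little in page splits and keep reads dense.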
As a SQL Server system administrator, one of the tasks you will have to perform is loading and extracting data. SQL Server provides multiple methods to do this, each with its own benefits.
- Learning Objectives
By the time you complete this module, you should be able to:
- Load data with the insert statement
- Load data with the select into statement
- Use the bulk insert statement
- Use the bcp program
- Identify key features of Data Transformation Services
- Use DTS Designer to create a package
- Create, schedule, and run DTS packages
- Identify key features of the DTS Wizard
- Run the DTS Wizard
Denormalized models, often seen in data warehouses, leverage ETL processes to precompute and redundantly store data, minimizing the need for complex joins at query time. High fill-factor indexing and compression further optimize these systems for fast retrieval, which is critical for analytical workloads. As a SQL Server system administrator, you are tasked with managing data loading and extraction, and SQL Server offers several tools and methods to handle this efficiently. The key options include:
- SQL Server Integration Services (SSIS): This is a full-featured ETL tool built into SQL Server. You can design workflows to extract data from various sources, transform it (e.g., clean, aggregate, or denormalize), and load it into your target tables. SSIS is great for scheduled batch processes and can handle complex transformations and indexing as part of the pipeline.
- BULK INSERT / BCP (Bulk Copy Program): For high-speed loading of large datasets, these tools are ideal. BULK INSERT is a T-SQL command that quickly loads data from flat files into a table, while BCP is its command-line counterpart. Both support minimal logging under the right conditions (e.g., when the target table is empty or the database uses the simple or bulk-logged recovery model), making them efficient for ETL processes feeding denormalized tables; a short sketch appears after this list.
- T-SQL (INSERT, MERGE, etc.): For more controlled or incremental loads, you might write custom T-SQL scripts. The MERGE statement, for example, is handy for upsert operations (inserting new records and updating existing ones) during ETL, especially when maintaining denormalized structures; a sketch of this also appears after the list.
- Linked Servers or OPENROWSET: If you're extracting data from other databases or external sources, these methods allow you to pull data directly into SQL Server, which can then be transformed and loaded into your denormalized model.
- Partitioning and Indexing Maintenance: Since you mentioned high fill-factor indexes, SQL Server lets you manage these through index rebuilds or reorganizations (via ALTER INDEX). During ETL, you might disable indexes on the target table, load the data, and then rebuild them to ensure optimal compression and performance for OLAP queries.
Each method has trade-offs. SSIS is robust but can be overkill for simple loads. BULK INSERT is blazing fast but lacks transformation capabilities. T-SQL offers flexibility but requires more manual effort. As an admin, you’d choose based on the volume of data, transformation needs, and how frequently the ETL runs (e.g., hourly, daily).
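As a rough sketch of the bulk-load path described above, the example below assumes a hypothetical comma-delimited file at C:\loads\sales.csv and a staging table dbo.SalesStage whose columns match the file layout; neither name comes from this course.

```sql
-- Hypothetical file and table names; adjust the format options to match
-- the actual source file.
BULK INSERT dbo.SalesStage
FROM 'C:\loads\sales.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 2,  -- skip a header row
    TABLOCK               -- helps the load qualify for minimal logging
);

-- A roughly equivalent bcp command run from the command line
-- (database, server, and file names are again hypothetical):
-- bcp MyDatabase.dbo.SalesStage in C:\loads\sales.csv -S MyServer -T -c -t,
```

Minimal logging typically also requires the simple or bulk-logged recovery model and an empty or unindexed target, as noted in the list above.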
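For the incremental T-SQL path, a minimal MERGE upsert from a hypothetical dbo.SalesStage staging table into a denormalized dbo.SalesReport target might look like the following sketch (table and column names are assumptions for illustration):

```sql
-- Minimal upsert from staging into the denormalized target:
-- update rows that already exist, insert rows that do not.
MERGE dbo.SalesReport AS target
USING dbo.SalesStage  AS source
    ON target.SaleID = source.SaleID
WHEN MATCHED THEN
    UPDATE SET target.Quantity   = source.Quantity,
               target.SaleAmount = source.SaleAmount
WHEN NOT MATCHED BY TARGET THEN
    INSERT (SaleID, SaleDate, CustomerName, CustomerCity,
            ProductName, Quantity, SaleAmount)
    VALUES (source.SaleID, source.SaleDate, source.CustomerName,
            source.CustomerCity, source.ProductName,
            source.Quantity, source.SaleAmount);
```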
The first part of an ETL process involves extracting the data from the source systems.
Most data warehousing projects consolidate data from different source systems, and each separate system may use a different data organization or format. Common data source formats are relational databases and flat files, but sources may also include non-relational database structures such as Information Management System (IMS), other data structures such as Virtual Storage Access Method (VSAM) or Indexed Sequential Access Method (ISAM), or data fetched from outside sources through web spidering or screen scraping. Streaming the extracted data and loading it on the fly into the destination database is another way of performing ETL when no intermediate data storage is required. In general, the goal of the extraction phase is to convert the data into a single format appropriate for transformation processing. An intrinsic part of extraction is parsing the extracted data to check whether it meets an expected pattern or structure. If not, the data may be rejected entirely or in part.
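To make that parsing-and-rejection step concrete, here is a small sketch; the raw staging and reject tables are hypothetical, and TRY_CONVERT assumes a version of SQL Server that supports it. Rows whose date column does not parse are diverted to a reject table rather than loaded.

```sql
-- Illustrative structural check: extracted rows whose RawOrderDate value
-- does not parse as a date are moved to a reject table before loading.
INSERT INTO dbo.SalesStage_Rejects (SaleID, RawOrderDate, RejectReason)
SELECT SaleID, RawOrderDate, 'RawOrderDate does not match the expected date format'
FROM dbo.SalesStage_Raw
WHERE TRY_CONVERT(date, RawOrderDate) IS NULL;

DELETE FROM dbo.SalesStage_Raw
WHERE TRY_CONVERT(date, RawOrderDate) IS NULL;
```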
- Data Loading is handled by ETL
As noted at the start of this module, data loading in these environments is handled by ETL processes that run on time intervals, store data redundantly for fast reads and minimal joins, and handle index maintenance, including building indexes with very high fill factors, at load time. OLAP systems are often light on business rules, sometimes even ignoring such simple structures as basic relational integrity constraints. It is important to remember that your data warehouse may source data from many systems, each of which may have a completely different set of business rules, so the data warehouse itself should be as business-rule agnostic as you can manage. Data cleansing can be done by the ETLs, so the rules do not need to be enforced again here. The job of the OLAP design is to take as much time as needed (within limits) to store data in such a way that, at data retrieval time, the database returns information to the user quickly. OLAP data models need not even be strictly relational data models. ROLAP (relational OLAP) is where you will start here, and in practice it is likely where you would start anyway. Once you have created a ROLAP database (whether you do so from a Kimball or an Inmon perspective), you can use it to build a MOLAP system in SSAS.
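One common pattern for that load-time index maintenance is sketched below, with hypothetical index and table names: disable nonclustered indexes before a large load, then rebuild them afterwards with a high fill factor (and, on editions that support it, page compression).

```sql
-- Disable a nonclustered index before a large ETL load...
ALTER INDEX IX_SalesReport_Customer ON dbo.SalesReport DISABLE;

-- ...run the load here (BULK INSERT, MERGE, and so on)...

-- ...then rebuild it with a high fill factor (and optional page compression)
-- so that reads between loads are as dense and fast as possible.
ALTER INDEX IX_SalesReport_Customer ON dbo.SalesReport
    REBUILD WITH (FILLFACTOR = 100, DATA_COMPRESSION = PAGE);
```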
As you might expect, when you create a new database, it does not contain any data. In most cases, your application will require some data to be in the database before you can even begin to use it. One of the jobs of a SQL Server Database Administrator is to load data into a database and extract data from it. Often you will also need to perform periodic data loads to maintain the data a system needs to run. At other times, you will need to perform data extracts so you can archive old data out of the system and maintain the system's performance.
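As a sketch of that archiving task, assuming hypothetical dbo.SalesReport and dbo.SalesReportArchive tables with identical columns and an arbitrary 36-month retention cutoff:

```sql
-- Illustrative archive-and-purge: copy rows older than the cutoff into an
-- archive table, then remove them from the live table in one transaction.
DECLARE @cutoff date = DATEADD(MONTH, -36, GETDATE());

BEGIN TRANSACTION;

INSERT INTO dbo.SalesReportArchive
SELECT *
FROM dbo.SalesReport
WHERE SaleDate < @cutoff;

DELETE FROM dbo.SalesReport
WHERE SaleDate < @cutoff;

COMMIT TRANSACTION;
```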
The first module in this course will show you multiple ways of
- loading data into a newly created database, as well as
- extracting data from existing databases.
The next lesson will cover loading data with the insert statement.