Lesson 1
Importing and Exporting Data in SQL Server
Data Loading and Extracting using SQL Server
As a SQL Server system administrator, one of the tasks you will have to perform is loading and extracting data.
SQL Server provides multiple methods to do this, each with its own benefits.
Learning Objectives
- Load data with the insert statement
- Load data with the select into statement
- Use the bulk insert statement
- Use the bcp program
- Identify key features of Data Transformation Services
- Use DTS Designer to create a package
- Create, schedule, and run DTS packages
- Identify key features of DTS Wizard
- Run the DTS Wizard
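As a quick preview of the loading methods listed above, here is a minimal sketch, assuming a hypothetical dbo.Orders table and sample file paths (illustrative only; each method is covered in detail in later lessons):

-- insert: add individual rows to an existing table
INSERT INTO dbo.Orders (OrderID, CustomerName, OrderTotal)
VALUES (1, 'Adventure Works', 199.00);

-- select into: create and populate a new table from a query
SELECT OrderID, CustomerName, OrderTotal
INTO dbo.OrdersArchive
FROM dbo.Orders
WHERE OrderTotal > 100;

-- bulk insert: load a flat file into an existing table
BULK INSERT dbo.Orders
FROM 'C:\data\orders.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);

-- bcp (run from a command prompt rather than a query window): export a table to a flat file
-- bcp MyDatabase.dbo.Orders out C:\data\orders.csv -c -t, -T -S MyServer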
The next lesson will cover loading data with the insert statement.
ETL Process
The first part of an ETL process involves extracting the data from the source systems.
Most data warehousing projects consolidate data from different source systems, and each system may use a different data organization or format. Common data source formats are relational databases and flat files, but sources may also include non-relational database structures such as
- Information Management System (IMS) databases, or other data structures such as
- Virtual Storage Access Method (VSAM) or
- Indexed Sequential Access Method (ISAM) files,
or data fetched from outside sources through web spidering or screen scraping.
Streaming the extracted data from the source and loading it on the fly into the destination database is another way of performing ETL when no intermediate data store is required. In general, the goal of the extraction phase is to convert the data into a single format appropriate for transformation processing. An intrinsic part of extraction is parsing the extracted data to check whether it meets an expected pattern or structure; if it does not, the data may be rejected entirely or in part.
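For example, here is a minimal sketch of that parse-and-check step in Transact-SQL, assuming a hypothetical staging table, reject table, and source file path (all names are illustrative, not part of this lesson's examples):

-- Staging table that mirrors the raw extract file (illustrative).
CREATE TABLE dbo.StageCustomer (
    CustomerID varchar(20),
    Email      varchar(200)
);

-- Extract: pull the raw file into the staging table as-is.
BULK INSERT dbo.StageCustomer
FROM 'C:\etl\customer_extract.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);

-- Check the parsed rows against an expected pattern; rows that fail the check
-- are moved to a reject table instead of continuing into the warehouse.
SELECT *
INTO dbo.StageCustomer_Rejects
FROM dbo.StageCustomer
WHERE CustomerID NOT LIKE 'C[0-9][0-9][0-9][0-9]'
   OR Email NOT LIKE '%_@_%._%';

DELETE s
FROM dbo.StageCustomer AS s
WHERE EXISTS (SELECT 1 FROM dbo.StageCustomer_Rejects AS r
              WHERE r.CustomerID = s.CustomerID);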
Data Loading Is Handled by ETL
Because concurrency is not as big an issue, denormalized data models are common. Data loading is handled by ETL (extract, transform, and load) processes that typically run at set intervals, and those processes frequently store data in a redundant fashion that allows fast reads and minimal joins. These denormalized models are often highly indexed for fast retrieval, and index maintenance can be handled at ETL time. Indexing on an OLAP system is done with very high fill factors, producing densely packed indexes for very fast reads.
OLAP systems are often light on business rules, sometimes even ignoring such simple structures as basic relational integrity constraints. Remember that your data warehouse may source data from many systems, each of which may have a completely different set of business rules, so the data warehouse itself should be as business-rule agnostic as you can manage. Data cleansing can be done by the ETL processes, so the rules do not need to be enforced again here. The job of the OLAP design is to take as much time as needed (within limits) to store data in such a way that, at data retrieval time, the database returns information to the user quickly.
OLAP data models need not even be strictly relational. ROLAP (relational OLAP) is where you will start here, and in any case it is likely where you would start in practice. Once you have created a ROLAP database (whether you do so from a Kimball or an Inmon perspective), you can use it to build a MOLAP system in SSAS.
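As a concrete illustration of handling index maintenance at ETL time, here is a minimal sketch for SQL Server 2005 or later, assuming a hypothetical fact table dbo.FactSales and index IX_FactSales_CustomerKey (both names, and the fill factor value, are illustrative):

-- Disable the nonclustered index before the bulk load so the load is not
-- slowed by per-row index maintenance (illustrative names).
ALTER INDEX IX_FactSales_CustomerKey ON dbo.FactSales DISABLE;

-- ... the ETL load into dbo.FactSales runs here ...

-- Rebuild after the load with a very high fill factor; because the table is
-- read-mostly between loads, densely packed pages mean fewer page reads.
ALTER INDEX IX_FactSales_CustomerKey ON dbo.FactSales
REBUILD WITH (FILLFACTOR = 100);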