Lesson 6	Distributed databases
Objective	What are the features of a distributed database?

Distributed Database Features

There is an ongoing debate regarding a standard definition for distributed Oracle databases. To Oracle, a distributed database is a geographically distributed system composed entirely of Oracle products. To the GUI/tools vendors, a distributed database is a system that is distributed architecturally, having systems with different architectures and access methods. To the hardware vendors, a distributed database is a system composed of different databases, all running on the same hardware platforms. In fact, each of these descriptions fits the overall distributed model, but there are some distinguishing characteristics that differentiate a real distributed database from a loosely coupled system. The standard definition of distributed database has been developed by C. J. Date; he lists 12 specifications for an ideal distributed database:

Homogenous Distributed Database Systems
A homogenous distributed database system is a network of two or more Oracle databases that reside on one or more machines. Figure 3-6 illustrates a distributed system that connects three databases: hq, mfg, and sales. An application can simultaneously access or modify the data in several databases in a single distributed environment. For example, a single query from a Manufacturing client on local database mfg can retrieve joined data from the products table on the local database and the dept table on the remote hq database. For a client application, the location and platform of the databases are transparent. You can also create synonyms for remote objects in the distributed system so that users can access them with the same syntax as local objects. For example, if you are connected to database mfg but want to access data on database hq, creating a synonym on mfg for the remote dept table enables you to issue this query:
```
SELECT * FROM dept;
```
In this way, a distributed system gives the appearance of native data access. Users on mfg do not have to know that the data they access resides on remote databases.

Figure 3-6 Homogenous Distributed Database Systems

An Oracle distributed database system can incorporate Oracle databases of different versions. All supported releases of Oracle can participate in a distributed database system. Nevertheless, the applications that work with the distributed database must understand the functionality that is available at each node in the system. A distributed database application cannot expect an Oracle7 database to understand the SQL extensions that are only available with Oracle9i.

Continuous Operation of Remote Sites

Each site is capable of processing independently from the other remote sites, and the addition of a new site will not affect the overall system. While each site maintains its own unique identity and control, it functions as a part of a unified federation such that other remote sites can access information from it in a seamless fashion. Continuous operation also refers to the ability of each node to be available to the overall system 24 hours per day, seven days per week. To accomplish this goal, remote sites may have a "hot backup" tool such as the Oracle Enterprise Backup Utility (EBU) or the Oracle8 Recovery Manager (RMAN) to back up the database while it remains available for update. Other Oracle continuous-operation tools include Oracle Parallel Server and Oracle snapshots.

Provide Redundancy and Fault Tolerance: One of the most common reasons for developing a distributed database system is to provide redundancy and fault tolerance. By the same token, a distributed system should not require scheduled outages to perform maintenance such as adding and removing a site or upgrading software. Of course, in the ideal world we would have zero downtime, scheduled or not, however unplanned outages are difficult to avoid entirely.
Definition of Continuous Operation Distributed System: Continuous operation of a distributed system means that no maintenance tasks should require an outage of the entire system. Maintenance tasks may include upgrades to the operating system or RDBMS or the addition and deletion of participating sites. If the Oracle distributed system is built on database links and simple replication (i.e., read-only snapshots), then there are no maintenance activities that would require an outage of the entire distributed environment. Sites can be added or removed at any time, and upgrades can be executed without impacting participating sites. However, if you are using the advanced replication facilities, Oracle imposes certain limitations. Most significantly, if you wish to add a new master or snapshot site to a replicated environment, you must coordinate the addition so that the data at the new site includes data changes that may have occurred while the new site is being instantiated.

Replication Independence

Replication is the ability of a database to create copies of a master database at remote sites. These copies are sometimes called snapshots and may contain the whole database or any subcomponent of the database. For example, in a relational database, a CUSTOMER table may be snapped to many remote sites for read-only query. Subsets of the CUSTOMER table may be specified, requesting only specific rows and columns, and these replications are refreshed on a periodic basis.
The Publish/Subscribe Model for Oracle Replication is a more flexible method for replication than was the Oracle7 snapshot replication.Oracle advanced queuing has initiated a new replication model dubbed "publish/subscribe" that differ from the traditional replication model in a very important way. The sender of the replicated data (called the "publisher"), does not have to know anything about what's-up on the receiver-side (called the "subscriber"). With replication independence we no longer have to complex feed and verify procedures and the dbms_aq package allows some interesting replication functions: listen Procedure The new dbms_aq.listen procedure allows you to "listen" for incoming messages (rows), just as-if you were the registered subscriber. The listen procedure has a newer input parm method, agent_list) which is a Oracle abstract data type (ADT):

agent_list IN AQ$_AGENT_LIST_T,
wait IN BINARY_INTEGER DEFAULT DBMS_AQ.FOREVER, 
agent OUT SYS.AQ$_AGENT);

In order to meet the replication independence objective, a distributed system must provide a means of maintaining copies of the same data (i.e., tables) at multiple sites. The reasons to replicate include performance gains and failover capability, to name a few. The challenge with providing replication independence is that when data is changed, the change must propagate to all replicas, as soon as possible. Users and applications should not be concerned with how their changes to a replicated table are propagated or whether their version of the table is up to date. Technically, replication independence requires that changes be propagated to all sites and committed as a single transaction using the two-phase commit protocol. However, enforcement of this stipulation can defeat the purpose of replicating in the first place since the additional communication required impacts performance and since processing is halted if any site is unavailable.

Hardware Independence
This refers to the ability to query and update information regardless of the hardware platform on which the data resides. A single query from a PC might retrieve information from

an IBM 390 or RS6000,

a local database on the PC, and

an HP-9000

in a single transaction. Of course, an Oracle Network Services implementation does not care what type of processor is included in the network. Oracle Network Services also does not care what type of hardware is being used in your network.

Hardware Independence
Hardware independence refers to the ability to execute a select statement and update data regardless of the underlying platform on which the data resides. For example, a single query initiated on a Mac might retrieve information from an

IMS on a mainframe,

a Lotus database on a PC, and

an Oracle database on a midrange host.

The various participants in a distributed system should be able to run on whatever hardware platform suits their needs. In effect, this means that the RDBMS should run on all conceivable platforms and include the same functionality across all platforms.

Local Autonomy

Location Independence

Fragmentation Independence

Distributed Query Processing

Distributed Transaction Management (update processing)

Operating System Independence

Network Independence

Database Independence

Distributed Databases

Distributed Databases eliminates Central Database Reliance

Ideally, all sites are equally remote, and no one site has governing authority over another. Each site retains its own Oracle data dictionary and table security. A data dictionary is a set of metadata that contains definitions and representations of data elements. Within the context of a DBMS, a data dictionary is a read-only set of tables and views. Amongst other things, a data dictionary holds the following information:

Precise definition of data elements
Usernames, roles and privileges
Schema objects
Integrity constraints
Stored procedures and triggers
General database structure
Space allocations

Distributed database systems provide information on the networking, configuration, security, and design of systems. Any organization that uses the Oracle relational database management system (RDBMS) probably has multiple databases. There are a variety of reasons why you might use more than a single database in a distributed database system:

Different databases may be associated with particular business functions, such as manufacturing or human resources.
Databases may be aligned with geographic boundaries, such as a very large database at a headquarters site and smaller databases at regional offices.
Two different databases may be required to access the same data in different ways, such as an order entry database whose transactions are aggregated and analyzed in a data warehouse.
A busy internet commerce site may create multiple copies of the same database to attain horizontal scalability.
A copy of a production database may be created to serve as a development test bed.

In the next lesson, we will take a look at Oracle's topology solution to distributed databases.