The order of the table names in the FROM clause determines the
driving table for the rule-based query. It is very important to place the table that will return the smallest number of rows last
in our FROM clause.
The driving table
To review, rows from the driving table are retrieved first, and rows from the other table are then probed for corresponding values.
Therefore, it is essential that the second table return the least amount of rows based on the WHERE clause. The driving table is
the table that will return the smallest number of rows, based on the filters in the WHERE clause. This is not always the table
with the smallest number of rows.
An example
For example, lets assume that we have a distributed Oracle database with customer tables on several locations. One customer
table resides in London and another in New York, and the distribution of data is shown below.
The problem is that the data is skewed. We have the vast majority of customers in New York, so the London table should be last for
unqualified queries. For an unqualified query against these tables, we should specify the New York table first, since London has
the least amount of returned rows:
SELECT
Customer_name
FROM
customer@new_york,
customer@london
;
However, if we query for bad credit, we should put the New York table last, since New York has far fewer deadbeats than London.
SELECT
Customer_name
FROM
customer@london,
customer@new_york
WHERE
Credit_status = ‘BAD’
;
Changing the order of table names in the FROM clause can have a dramatic impact on the performance of nested loop joins[1] . It is common to double the performance of a rule-based query with this technique.
Now, let us look at how the order of Boolean operators can also affect the performance in the WHERE clause.