Here is a breakdown of when to favor EXISTS over IN for subqueries in Oracle, along with the key differences to consider:
When to use EXISTS
Checking for Existence: If you primarily care about whether the subquery returns "any" rows at all, EXISTS is generally the more efficient choice. It stops processing as soon as a single matching row is found.
Correlated Subqueries: EXISTS is often necessary in correlated subqueries (where the subquery references values from the outer query).
Potential NULL Values: EXISTS handles NULL values more gracefully than IN.
When to use IN
Retrieving Specific Values: If you need the actual values returned by the subquery for use in the outer query, IN is the way to go.
Small Result Sets: When the subquery is expected to return a very small and well-defined set of values, the performance difference between EXISTS and IN might be negligible.
Performance Considerations
In many cases, EXISTS will outperform IN, especially for larger subquery result sets. This is because IN may process the entire subquery result before evaluating the condition.
However, the Oracle optimizer is intelligent and might rewrite some queries internally. The best way to be certain for a specific scenario is to examine the execution plans.
Example: Find customers who have placed at least one order:
EXISTS:
SELECT * FROM Customers
WHERE EXISTS (SELECT 1 FROM Orders WHERE Orders.CustomerID = Customers.CustomerID);
IN:
SELECT * FROM Customers
WHERE CustomerID IN (SELECT CustomerID FROM Orders);
Correlated Subquery can produce more efficient Code
The correlated subquery seen in the previous lesson may seem like a clumsy substitute for a JOIN query. In some cases this is true.
In fact, Oracle's SQL parser resolves the correlated subquery into a JOIN query with no subquery whenever it can.
This lesson shows you how to use the correlated subquery in a way that really pays for itself in more efficient code.
The EXISTS clause combined with a correlated subquery executes faster (less CPU time, less I/O time, and less overall response time) than an equivalent subquery using the IN clause in a normal subquery.
EXISTS clause syntax
Here is the general syntax of a query that uses the EXISTS clause:
SELECT col1, col2, ...FROM table alias1
WHERE EXISTS (SELECT 'x'
FROM table alias2
WHERE alias1.col5 = alias2.col6)
The syntax is very similar to the syntax for a regular correlated subquery. The differences are:
Rather than comparing a column with a result FROM the subquery, you simply use EXISTS in the outer query
Rather than returning a data value FROM the subquery, the subquery's SELECT clause always selects one literal: 'X'
Your query will run efficiently when you follow this syntax.
How the EXISTS clause works
The literal in the subquery SELECT clause can be any literal you choose and it makes no difference.
You use a literal because this reduces the need for retrieving actual data FROM the database when executing the subquery.
The EXISTS clause works like a switch: as soon as one row is found that satisfies the subquery's WHERE clause, the EXISTS clause sets itself to TRUE and stops. If no row is found, the EXISTS clause sets itself to FALSE. You save time because the database does not go and find every single row that satisfies the subquery. It stops as soon as one row is found. In addition, since the subquery is not retrieving data FROM the table, the execution of the subquery stops after evaluating the WHERE clause and does not continue by retrieving the associated row data. Very often, Oracle has an index to use when evaluating the WHERE clause. An index stores the indexed data and the row ID of the associated table row. Accessing an indexed column is one of the fastest access methods available in an Oracle database.
Example 5-5 semi-join Using WHERE EXISTS
The following query uses a WHERE EXISTS clause to list only the departments that contain employees
SELECT department_id, department_name
FROM departments
WHERE EXISTS (SELECT 1
FROM employees
WHERE employees.department_id = departments.department_id)
The execution plan reveals a NESTED LOOPS SEMI operation in Step 1:
For each row in departments, which forms the outer loop, the database obtains the department ID, and then probes the employees.department_id index for matching entries.
Conceptually, the index looks as follows:
If the first entry in the departments table is department 30, then the database performs a range scan of the index until it finds the first 30 entry, at which point it stops reading the index and returns the matching row from departments. If the next row in the outer loop is department 20, then the database scans the index for a 20 entry, and not finding any matches, performs the next iteration of the outer loop. The database proceeds in this way until all matching rows are returned.
Using the EXISTS clause
The best way to use the EXISTS clause is to substitute it for an IN clause.
The link below converts a query with an IN clause to a query with an EXISTS clause.
Change FROM IN to EXISTS
Question: How do you change from 'IN' to 'EXISTS' in Oracle SQL?
In Oracle SQL, you can change from using "IN" to "EXISTS" by modifying your SQL query. The "IN" keyword is used to retrieve rows where the specified column value matches any value in a given list, while the "EXISTS" keyword is used to retrieve rows where a subquery returns at least one row. To convert a query from using "IN" to "EXISTS", you can replace the list of values with a subquery that returns the same values. For example, let's say you have a query like this:
SELECT *
FROM my_table
WHERE my_column IN (1, 2, 3);
You can change it to use "EXISTS" like this:
SELECT *
FROM my_table t
WHERE EXISTS (
SELECT 1
FROM my_values v
WHERE v.value = t.my_column
);
In this example, "my_values" is a table or subquery that returns the same values as the list in the original query. The "EXISTS" keyword checks if at least one row is returned by the subquery for each row in "my_table", effectively achieving the same result as the original "IN" query.
Converting Query FROM using the IN clause to using the EXISTS clause
This simulation lets you practice converting a query FROM using the IN clause to using the EXISTS clause. Here is the query you will convert. It has been executed so you can see the results. To begin the conversion, you will edit the query using a text editor. Start up the text editor by typing EDIT next to the SQL> prompt and pressing Enter.
SQL*Plus opens up a text editor and places the text of the query into the file so you can conveniently edit the query. Here you see the Microsoft Notepad is our editor. (You can define a different default editor if you prefer.) Now, begin converting this query by adding an alias for the table in the FROM clause. Type a space and the letter C immediately after the name of the table in the FROM clause.
The next step is to revise the WHERE clause. Replace "CUST_ID IN" with "EXISTS" on the third line of the query text.
Now you revise the SELECT clause in the sub-query. Go to the fourth line of the query and replace "CUST_ID" with the letter X in single quotes.
The next step is to add an alias on the table in the sub-query. Go to the fifth line of the query and add a blank space and the capital letter D.
The next step is to add another comparison, the one that relates the sub-query to the outer query. Open a new line for entering this information by placing your cursor just to the left of the closing parenthesis and pressing Enter.
The next step is to add another comparison -- the one that relates the sub-query to the outer query. Open a new line for entering this information by placing your cursor just to the left of the closing parenthesis and pressing Enter.
The query is complete! Now close the text editor. Click the X in the top right corner of the window.
Before closing the file, Notepad asks you to save your changes with this warning box. Click the Yes button to save your changes.
You are back in SQL*Plus with the modified query in your buffer and ready to execute. To execute the new query, type a forward slash and press Enter.
SQL*Plus executes the query and displays the results. As you can see, the results are the same as the original query. Click OK to exit the simulation.
EXISTS is a test for existence.
It is placed the way IN might be placed with a subquery, but it differs in that it is a logical test for the return of rows from a query, not for the rows themselves. How many authors have written more than one book on the bookshelf?
select AuthorName, COUNT(*)
from BOOKSHELF_AUTHOR
group by AuthorName
having COUNT(*) > 1;
Attempting to find both AuthorName and Title fails, however, because the group by made necessary by the COUNT(*) is on the primary key of the BOOKSHELF_AUTHOR table (AuthorName, Title).
Because each primary key, by definition, uniquely identifies only one row, the count of titles for that one row can never be greater than 1, so the having clause always tests false since it
does not find any rows:
select AuthorName, Title, COUNT(*)
from BOOKSHELF_AUTHOR
group by AuthorName, Title
having COUNT(*) > 1;
no rows selected.
EXISTS provides a solution. The following subquery asks, for each AuthorName selected in the outer query, whether an AuthorName exists in the BOOKSHELF_AUTHOR table with a count of Titles greater than one.
If the answer for a given name is yes, the EXISTS test is true, and the outer query selects an AuthorName and Title. The author names are correlated by the "BA" alias given to the first BOOKSHELF_AUTHOR table.
column AuthorName format a25
column Title format a30
select AuthorName, Title
from BOOKSHELF_AUTHOR BA
where EXISTS
(select 'x'
from BOOKSHELF_AUTHOR BA2
where BA.AuthorName = BA2.AuthorName
group by BA2.AuthorName
having COUNT(BA2.Title) > 1)
order by AuthorName, Title;
In the next lesson, you will learn the unusual case of embedding a subquery in the FROM clause of your query.