| Lesson 2 |
Function overview |
| Objective |
Understand Issues related to the Use of SQL Functions |
SQL Function Overview (Views, Column Aliases, and HAVING Explained)
Lesson 1 introduced the taxonomy of SQL functions — scalar, table-valued, aggregate, and user-defined — and showed how functions are integrated into SQL queries. This lesson focuses on the practical issues that arise when using functions in production systems: the performance and portability challenges that affect how functions should be applied, the rule that views containing functions become read-only, how to correctly name the columns that functions return, and the critical distinction between the WHERE clause and the HAVING clause when filtering aggregated results. These are the issues that SQL developers encounter most frequently when moving from function syntax to function design.
Challenges of Using SQL Functions
SQL functions are powerful tools that enable complex queries and data manipulation. However, several practical challenges arise in their use — challenges that require careful consideration in systems where performance, portability, and maintainability are critical.
Performance — Row-by-Row Execution and Index Bypass
The most significant performance challenge with SQL functions — particularly scalar functions — is row-by-row execution. A scalar function applied in a WHERE, JOIN, or SELECT clause executes once per row in the result set. For a table with one million rows, a scalar function in the WHERE clause executes one million times, regardless of how efficiently the function itself is implemented. This contrasts with set-based operations, which the database optimizer can process as a whole using index scans, hash joins, and parallelism.
Index bypass is a related problem: when a function is applied to an indexed column in a WHERE clause, the optimizer typically cannot use the index on that column. The index stores the raw column values, not the function results — so the database must evaluate the function for every row and compare the result, producing a full table scan even when an efficient index exists.
-- Index on last_name cannot be used here
-- because UPPER() is applied to the indexed column
SELECT customer_id, last_name
FROM customers
WHERE UPPER(last_name) = 'SMITH';
-- Better: store normalized data or use a function-based index
-- SQL Server: use a computed column with an index
-- Oracle / PostgreSQL: use a function-based index directly
CREATE INDEX idx_upper_last_name
ON customers (UPPER(last_name)); -- Oracle / PostgreSQL syntax
-- Then the optimizer can use the index
SELECT customer_id, last_name
FROM customers
WHERE UPPER(last_name) = 'SMITH';
SQL Server 2019 introduced scalar UDF inlining, which automatically rewrites eligible scalar UDFs as inline expressions, allowing the optimizer to reason about them and use indexes. However, not all scalar functions qualify for inlining — functions with side effects, multiple statements, or non-deterministic behavior are excluded.
Portability — Vendor-Specific Syntax
SQL functions are not fully standardized across database implementations. The SQL standard defines a core set of functions, but each vendor extends that set with proprietary functions — and even for equivalent operations, syntax often differs. The lesson 1 example of UCASE()/LCASE() versus UPPER()/LOWER() illustrates this: both sets of functions perform the same operation, but UCASE()/LCASE() are MySQL-specific while UPPER()/LOWER() are the SQL standard form supported by all major implementations.
String concatenation is a particularly common portability issue: SQL Server uses the + operator, Oracle and PostgreSQL use ||, and MySQL/SQL Server 2012+ support CONCAT(). Date functions diverge even more sharply — GETDATE() in SQL Server, SYSDATE in Oracle, NOW() in MySQL and PostgreSQL. Any SQL written with vendor-specific functions requires modification before it can run on a different database. Writing portable SQL means choosing the standard form when one exists, and documenting vendor-specific deviations explicitly when standard forms are unavailable.
Deterministic vs Non-Deterministic Functions
A deterministic function always returns the same result given the same inputs — UPPER('hello') always returns 'HELLO', ROUND(3.14159, 2) always returns 3.14. A non-deterministic function returns different results on different executions even with identical inputs — NOW() returns the current timestamp, RAND() returns a random number, NEWID() (SQL Server) / UUID() (MySQL/PostgreSQL) returns a unique identifier each time.
The deterministic/non-deterministic distinction has two practical consequences. First, indexes: SQL Server and Oracle cannot create indexes on computed columns or function-based expressions that use non-deterministic functions — because the index value would differ each time the row is accessed, making the index unreliable. Second, query plan caching: non-deterministic functions in queries prevent full execution plan reuse, since each execution may produce different results. Using NOW() in a WHERE clause typically forces the optimizer to re-evaluate the clause on each execution rather than caching a plan that assumes a fixed value.
Security and Data Type Considerations
User-defined functions that accept string input and embed it in dynamic SQL are vulnerable to SQL injection if the input is not parameterized or sanitized. A UDF that builds a query string from a parameter and executes it with EXEC or EXECUTE IMMEDIATE gives an attacker the same attack surface as a vulnerable stored procedure. The defense is identical: use parameterized queries, never concatenate user input directly into SQL strings, and apply the principle of least privilege — grant EXECUTE permission on specific functions rather than broad database-level permissions.
Data type compatibility issues arise when functions perform implicit conversions. CONCAT() with numeric arguments silently converts numbers to strings in some implementations but raises an error in others. Functions that return floating-point results can introduce precision loss when the result is stored in a DECIMAL column — for example, ROUND(1.005, 2) returns 1.00 in some implementations due to floating-point representation of 1.005. Always use explicit CAST or CONVERT when mixing data types in function arguments.
Functions in Views — Why Views Become Read-Only
The Read-Only Rule Explained
When you use functions in SQL queries, you are not simply adding values together — you are asking the engine to perform transformations that produce derived results. One important consequence of this is the effect on views: if you create a view that includes a function, the view will not be updatable. The value returned by a function is a formula result — a derived value computed from one or more underlying column values. Because the result is derived, the database engine cannot determine which underlying column or row to modify when an UPDATE is attempted against the view.
-- Create a view that includes a function (concatenation)
CREATE VIEW customer_display AS
SELECT
customer_id,
CONCAT(first_name, ' ', last_name) AS full_name,
UPPER(email) AS email_upper
FROM customers;
-- Querying the view works normally
SELECT * FROM customer_display WHERE customer_id = 42;
-- Attempting to UPDATE the derived column fails
UPDATE customer_display
SET full_name = 'Jane Doe'
WHERE customer_id = 42;
-- Error: cannot update the view because it contains a derived column
This forces the view to be read-only whenever it contains a function expression in the SELECT list. Views without functions — selecting raw column values from a single base table — are typically updatable, subject to engine-specific rules about key preservation and null constraints.
Relational Database Design Implementation
Working Around the Limitation
The standard workaround is to separate read-only display views (containing functions) from updatable views (containing only raw columns). Updates are directed to the underlying base table directly or through the updatable view; the display view is used only for SELECT queries. In SQL Server, an INSTEAD OF trigger on the view can intercept UPDATE statements and route them to the appropriate base table columns — but this adds complexity and is generally reserved for situations where the view interface must remain stable for backward compatibility reasons.
Column Aliases — Naming Function Return Values
The AS Keyword — Standard Syntax
When a function returns a value, there is by default no column header for that value in the result set — the column appears with an automatically generated or blank name depending on the database implementation. To name the column, use the AS keyword to assign a column alias:
-- Modern standard syntax — works in SQL Server, Oracle, PostgreSQL, MySQL
SELECT
CONCAT(au_fname, ' ', au_lname) AS full_name,
LEN(au_lname) AS last_name_length,
UPPER(city) AS city_upper
FROM authors;
-- The result set has named columns: full_name, last_name_length, city_upper
The legacy SQL Server syntax `SELECT 'ColumnName' = Function` — seen in older code and textbooks — places the alias before the expression using an equals sign. This syntax is SQL Server proprietary and is not supported by Oracle, PostgreSQL, or MySQL. The AS keyword is the SQL standard form and should be used in all new development.
String Concatenation — Cross-Database Syntax
The original code on this page used `SELECT 'Name' = au_fname + '' + au_lname` — two issues: the legacy alias syntax, and an empty string between the names that produces no space in the output. The corrected and cross-database form:
-- CONCAT() — SQL Server 2012+, MySQL, PostgreSQL, Oracle 11g+
SELECT CONCAT(au_fname, ' ', au_lname) AS author_name
FROM authors;
-- || operator — Oracle, PostgreSQL (ANSI SQL standard)
SELECT au_fname || ' ' || au_lname AS author_name
FROM authors;
-- + operator — SQL Server only (not portable)
SELECT au_fname + ' ' + au_lname AS author_name
FROM authors;
-- CONCAT_WS (concatenate with separator) — MySQL, PostgreSQL, SQL Server 2017+
-- Automatically handles NULLs — skips NULL segments
SELECT CONCAT_WS(' ', au_fname, au_lname) AS author_name
FROM authors;
CONCAT_WS() is particularly useful when any component might be NULL — it skips NULL arguments rather than returning a NULL result for the entire expression, which is the behavior of
CONCAT() in SQL Server when a NULL argument is present (MySQL's CONCAT() also returns NULL if any argument is NULL without explicit NULL handling).
Aggregate Functions and the HAVING Clause
COUNT(*) vs COUNT(column)
The aggregate function COUNT() has two distinct forms with different behavior.
COUNT(*) counts every row in the group, including rows where some columns are NULL — the asterisk means "count the row regardless of column values."
COUNT(column_name) counts only the rows where the specified column is not NULL — NULL values are excluded from the count. This distinction matters when the counted column contains NULLs:
-- COUNT(*) includes all rows; COUNT(column) excludes NULLs
SELECT
department_id,
COUNT(*) AS total_employees,
COUNT(manager_id) AS employees_with_manager
FROM employees
GROUP BY department_id;
-- If some employees have no manager (manager_id IS NULL),
-- COUNT(*) and COUNT(manager_id) will differ for those departments
SQL Clause Evaluation Order
Understanding why WHERE cannot filter on aggregate results requires understanding the order in which SQL clauses are evaluated. The logical processing order is:
-- SQL logical evaluation order (not the same as written order)
-- 1. FROM — identify the source tables and joins
-- 2. WHERE — filter individual rows before grouping
-- 3. GROUP BY — organize surviving rows into groups
-- 4. HAVING — filter groups after aggregation
-- 5. SELECT — compute output expressions and aliases
-- 6. ORDER BY — sort the final result set
-- 7. LIMIT/TOP — restrict the number of output rows
The WHERE clause executes at step 2 — before GROUP BY (step 3) and before aggregate functions are computed. This means that at the time WHERE is evaluated, the groups do not yet exist and aggregate values like COUNT(*) have not been calculated. Attempting to reference COUNT(*) in a WHERE clause asks the engine to filter on a value that has not yet been computed — which is why the engine raises an error.
Why WHERE Cannot Filter Aggregates
The following example demonstrates the error that occurs when a WHERE clause attempts to filter on an aggregate function — a mistake frequently made by developers who are new to GROUP BY queries:
-- INCORRECT: WHERE cannot reference aggregate functions
-- ERROR: Invalid use of group function
SELECT open_emp_id, COUNT(*) AS how_many
FROM account
WHERE COUNT(*) > 4
GROUP BY open_emp_id;
The error occurs because COUNT(*) has not been evaluated when WHERE is processed. The WHERE clause can only reference column values that exist in the source rows before grouping — raw column values, not computed group summaries.
HAVING — Filtering After GROUP BY
The HAVING clause was designed specifically to filter on grouped and aggregated results. It executes after GROUP BY (step 4 in the evaluation order), at which point COUNT(*) and all other aggregate values have been computed for each group. The corrected query using HAVING:
-- CORRECT: HAVING filters on aggregate results after grouping
SELECT open_emp_id, COUNT(*) AS how_many
FROM account
GROUP BY open_emp_id
HAVING COUNT(*) > 4;
-- Result:
-- open_emp_id | how_many
-- ------------|----------
-- 1 | 8
-- 10 | 7
-- 16 | 6
HAVING can reference any aggregate function that appears in the SELECT list or that is valid for the grouped result. It can also be combined with WHERE: WHERE filters individual rows before grouping, HAVING filters groups after aggregation — both clauses can appear in the same query.
-- WHERE and HAVING used together
SELECT open_emp_id, COUNT(*) AS how_many
FROM account
WHERE status = 'ACTIVE' -- filters rows before grouping
GROUP BY open_emp_id
HAVING COUNT(*) > 4; -- filters groups after aggregation
Other Aggregate Functions — SUM, AVG, MIN, MAX
COUNT() is the most commonly encountered aggregate function but four others complete the standard aggregate toolkit. SUM() adds all non-NULL values in the group — used for totaling sales amounts, quantities, or any numeric measure. AVG() returns the arithmetic mean of non-NULL values — equivalent to SUM(column) / COUNT(column) but more concise. MIN() and MAX() return the smallest and largest values in the group respectively — applicable to numeric, date, and string data types. All five standard aggregate functions follow the same rules: they operate on groups defined by GROUP BY, they ignore NULL values (except COUNT(*)), and their results can be filtered with HAVING.
-- All five standard aggregate functions in one query
SELECT
department_id,
COUNT(*) AS headcount,
SUM(salary) AS total_payroll,
AVG(salary) AS avg_salary,
MIN(salary) AS lowest_salary,
MAX(salary) AS highest_salary
FROM employees
GROUP BY department_id
HAVING AVG(salary) > 50000
ORDER BY total_payroll DESC;
In the next lesson, you will learn how to use character functions to manipulate and format string data in SQL queries.
