Lesson 10	Querying XML Database
Objective	Understand How Relational Databases Store and Query XML Data.

XML in SQL Databases (How Relational Databases Store and Query XML)

The lessons in this module have covered the foundational SQL operations against relational tables — CREATE TABLE, CREATE INDEX, INSERT, and SELECT. This lesson extends that foundation into a related area: how modern relational databases handle XML data. XML remains important in enterprise integration, document storage, configuration management, and data exchange. Rather than treating XML as a separate concern, major relational database engines have integrated XML storage and querying directly into SQL — allowing XML documents to coexist with traditional rows and columns in the same database.

XML and the Relational Model

Why XML Appears in Relational Databases

XML — eXtensible Markup Language — is a text-based format for representing hierarchical, self-describing data. It has been the dominant standard for data interchange between systems, configuration files, document storage, and web services since the late 1990s. Many enterprise applications receive, generate, or exchange XML documents as part of their normal operation: web service responses, EDI transactions, product catalogs, regulatory submissions, and configuration payloads all commonly arrive as XML.

When an application needs to store XML alongside relational data — keeping a customer's contact information in normalized columns while also storing their order history as an XML document — the options are: store the XML as plain text (losing structure and queryability), convert the XML into relational tables (losing the document structure), or use the database engine's native XML type (preserving structure, enabling querying, and maintaining the relational context). Major relational databases choose the third option.

Two Approaches — XML-Enabled vs Native XML Databases

Databases that handle XML fall into two broad categories. XML-enabled relational databases extend SQL with an XML data type, allowing XML documents to be stored in columns alongside traditional relational data. The database does the conversion and management itself — the developer works with SQL and XML-specific functions without needing middleware. These are hybrid systems where relational and XML data coexist.

Native XML databases treat XML as their core data model. The fundamental unit of storage is an XML document rather than a relational row. Native XML databases are designed for document-oriented workloads where the data is inherently hierarchical and XML-structured — content management systems, digital archives, publishing workflows. They support XPath and XQuery as their primary query languages rather than SQL.

JSON vs XML — The Modern Context

JSON has largely supplanted XML as the preferred format for new applications, APIs, and web services. JSON is more compact, easier to parse in JavaScript environments, and maps naturally to the object models used in modern programming languages. All major relational databases now support JSON column types alongside or in addition to their XML support.

However, XML remains essential in established enterprise environments where decades of systems, standards, and integrations depend on it. Industries including healthcare (HL7, CDA), finance (FIX, FIXML), government (XBRL, UBL), and publishing (DocBook, DITA) continue to use XML as their primary interchange format. Understanding how relational databases handle XML is a practical skill for anyone working with these domains.

The XML Data Type in Relational Databases

What a Native XML Type Provides

Storing XML in a dedicated xml column type — rather than as plain TEXT, CLOB, or VARCHAR — provides several advantages. The database engine automatically validates that any value stored in an xml column is well-formed XML, rejecting malformed documents at insert time rather than letting bad data accumulate silently. The engine can index XML content for fast XPath and XQuery lookups, just as it indexes values in integer or text columns for fast SQL queries. XML-specific functions for extraction, transformation, and modification become available as part of the SQL query language. And the XML data retains its hierarchical structure while living inside a relational table alongside conventional columns.

The ISO SQL/XML Standard (9075-14)

The formal specification governing XML in SQL databases is ISO/IEC 9075-14, commonly called SQL/XML. This part of the SQL standard defines the XML data type, the rules for mapping between SQL data types and XML Schema types, and the XML-specific functions and operators that conforming database systems must support. PostgreSQL, SQL Server, Oracle, and IBM Db2 all implement SQL/XML, though each also extends it with engine-specific features and syntax variations.

PostgreSQL — The xml Data Type

Declaring an xml Column

PostgreSQL provides a built-in xml data type that checks all stored values for well-formedness. PostgreSQL must be compiled with --with-libxml for the type to be available — most PostgreSQL distributions include this by default. Declaring an xml column in a CREATE TABLE statement:

CREATE TABLE ProductCatalog (
    ProductID    INT         NOT NULL,
    ProductName  VARCHAR(200) NOT NULL,
    Specs        xml,
    CONSTRAINT pk_product PRIMARY KEY (ProductID)
);

The Specs column stores an XML document for each product — technical specifications, attributes, or any hierarchically structured data that does not fit cleanly into relational columns.

Inserting XML Data

XML values are inserted as string literals using the XMLPARSE function or by casting a string to the xml type. PostgreSQL validates well-formedness at insert time:

INSERT INTO ProductCatalog (ProductID, ProductName, Specs)
VALUES (
    1,
    'Widget A',
    XMLPARSE(DOCUMENT '<specs>
        <weight unit="kg">0.5</weight>
        <color>blue</color>
        <voltage>12</voltage>
    </specs>')
);

Querying with xpath() and XMLTABLE

PostgreSQL provides the xpath() function for evaluating XPath expressions against XML column values, and XMLTABLE for shredding XML into relational result sets:

-- Extract the color element value from Specs using xpath()
SELECT ProductName,
       (xpath('//color/text()', Specs))[1]::text AS Color
FROM ProductCatalog
WHERE ProductID = 1;

-- Shred XML into rows and columns using XMLTABLE
SELECT p.ProductName, x.Weight, x.Color
FROM ProductCatalog p,
     XMLTABLE('//specs' PASSING p.Specs
              COLUMNS
                  Weight  text PATH 'weight',
                  Color   text PATH 'color'
     ) x;

Microsoft SQL Server — The xml Data Type

Typed vs Untyped XML

SQL Server introduced a native xml data type in SQL Server 2005. An xml column can be either untyped — accepting any well-formed XML without schema validation — or typed — associated with an XML Schema Collection that validates documents against a registered schema definition. Untyped XML is the most common choice for general-purpose storage; typed XML is used when strict document validation is required.

-- Untyped xml column
CREATE TABLE ProductCatalog (
    ProductID    INT          NOT NULL PRIMARY KEY,
    ProductName  VARCHAR(200) NOT NULL,
    Specs        xml
);

-- Typed xml column — requires a registered XML Schema Collection
CREATE TABLE ProductCatalog (
    ProductID    INT          NOT NULL PRIMARY KEY,
    ProductName  VARCHAR(200) NOT NULL,
    Specs        xml(SpecsSchemaCollection)
);

XQuery Methods — .query(), .value(), .exist()

SQL Server exposes XML content through XQuery methods called on the xml column. These methods allow SQL queries to extract values, test for element existence, and return XML fragments:

-- .value() extracts a scalar value from the XML
SELECT ProductName,
       Specs.value('(/specs/color)[1]', 'VARCHAR(50)') AS Color,
       Specs.value('(/specs/weight)[1]', 'DECIMAL(5,2)') AS Weight
FROM ProductCatalog
WHERE ProductID = 1;

-- .exist() tests whether an XPath expression matches any node
SELECT ProductName
FROM ProductCatalog
WHERE Specs.exist('/specs/voltage') = 1;

-- .query() returns an XML fragment matching the XPath expression
SELECT ProductName,
       Specs.query('/specs/color') AS ColorFragment
FROM ProductCatalog;

XML DML — Modifying XML Stored in Columns

SQL Server supports XML Data Manipulation Language (XML DML) through the .modify() method, which allows inserting, replacing, and deleting nodes within a stored XML document without replacing the entire document:

-- Update the color element value within the stored XML
UPDATE ProductCatalog
SET Specs.modify('replace value of (/specs/color/text())[1] with "red"')
WHERE ProductID = 1;

Oracle Database — XMLType

Creating XMLType Columns

Oracle introduced XMLType in Oracle 9i as a system-defined object type for native XML handling. An XMLType column stores complete XML documents and provides member functions for querying, transformation, and validation:

CREATE TABLE ProductCatalog (
    ProductID    NUMBER       NOT NULL,
    ProductName  VARCHAR2(200) NOT NULL,
    Specs        XMLType,
    CONSTRAINT pk_product PRIMARY KEY (ProductID)
);

Storage Models — Binary XML vs Object-Relational

Oracle supports multiple internal storage models for XMLType columns. Binary XML storage stores documents in a compact binary format optimized for XQuery and XPath access. Object-relational storage (also called structured storage) decomposes the XML into relational tables based on an XML schema, enabling efficient SQL-based access to individual elements. Unstructured storage saves the XML as a CLOB — the simplest option but without the query optimization benefits of the other models.

Querying with XMLIndex

Oracle's XMLIndex creates a specialized index on the content of XMLType columns, enabling fast XPath and XQuery evaluation without scanning every document:

-- Extract a value using XMLQuery
SELECT ProductName,
       XMLQuery('/specs/color/text()' PASSING Specs RETURNING CONTENT).getStringVal() AS Color
FROM ProductCatalog
WHERE ProductID = 1;

IBM Db2 — SQL/XML and pureXML

The XML Data Type in Db2

IBM Db2 supports native XML storage through its implementation of the ISO SQL/XML standard. Db2 uses an XML data type that stores documents in a parsed, hierarchical format — not as raw text — enabling efficient XQuery and XPath access. Db2's XML support was formerly marketed as "pureXML," a branding term for its native XML storage engine:

CREATE TABLE ProductCatalog (
    ProductID    INT          NOT NULL,
    ProductName  VARCHAR(200) NOT NULL,
    Specs        XML,
    PRIMARY KEY (ProductID)
);

Hybrid Relational and XML Workloads

Db2's implementation is designed for hybrid workloads — queries that combine relational predicates on traditional columns with XQuery predicates on XML columns. A single SQL/XML query can filter by a relational column (ProductID, ProductName) while simultaneously evaluating an XPath expression against the XML column (Specs), with the optimizer treating both access paths together for an efficient execution plan.

Native XML Databases

MarkLogic — Multi-Model with Native XML

MarkLogic is a commercial multi-model database with strong native XML support at its core. It stores XML documents natively, indexes all element and attribute content automatically, and supports XQuery as a first-class query and application programming language. MarkLogic excels at large-scale XML workloads, semantic data (RDF triple stores), and content management scenarios where documents are the primary data unit rather than rows.

BaseX and eXist-db — Open-Source XQuery Engines

BaseX is an open-source native XML database with excellent support for XQuery 3.1, XPath, and interactive visualization tools. It is widely used in digital humanities, academic research, and applications requiring complex XQuery transformations. eXist-db is an open-source native XML database focused on XQuery, document collections, and web application development — it has been used extensively in XML publishing workflows and digital library projects.

When to Choose a Native XML Database

A native XML database is the appropriate choice when the primary data unit is an XML document rather than a relational row; when queries are predominantly XQuery or XPath expressions rather than SQL; when the schema evolves frequently and XML's flexibility is needed; or when the workload involves complex XML transformations, full-text search over XML content, or large collections of heterogeneous XML documents. For most applications that mix relational and XML data, a relational database with a native XML column type is sufficient and simpler to manage.

XPath and XQuery — The XML Query Languages

XPath — Locating Nodes in an XML Document

XPath is a W3C-standardized language for navigating and selecting nodes within an XML document. An XPath expression describes a path through the document's element hierarchy — similar to a file system path but applied to XML structure. XPath is the foundation that both XQuery and XSLT build on, and it appears in the XML query functions of every relational database that supports XML:

-- XPath expressions (used inside SQL functions like xpath(), .value(), XMLQuery())

/specs/color           -- selects the color element directly under specs
/specs/color/text()    -- selects the text content of the color element
//weight               -- selects any weight element anywhere in the document
/specs/weight[@unit]   -- selects weight elements that have a unit attribute
/specs/*               -- selects all child elements of specs

XQuery — The W3C Standard for XML Data Retrieval

XQuery is a W3C-standardized language for querying XML data. It is to XML what SQL is to relational tables — a declarative language for specifying what data to retrieve from an XML source. XQuery operates on the XPath Data Model, making it capable of querying any source that can be represented in that model: XML files, XML columns in relational databases, RDF stores, and in-memory XML structures.

FLWOR Expressions — for-let-where-order by-return

The most important construct in XQuery is the FLWOR expression — named for its five clauses: for, let, where, order by, and return. FLWOR is the XQuery equivalent of SQL's SELECT/FROM/WHERE/ORDER BY structure:

(: XQuery FLWOR expression — conceptual structure :)
for $item in /catalog/product
let $color := $item/specs/color
where $item/specs/voltage = "12"
order by $item/name
return <result>{$item/name/text()}: {$color/text()}</result>

The for clause iterates over a sequence of nodes; let binds a variable to an expression; where filters; order by sorts; and return constructs the output. FLWOR expressions can be nested and composed, enabling complex transformations of XML data.

XQuery vs SQL — Complementary, Not Competing

XQuery and SQL address different data models and are not in competition. SQL is designed for set-oriented operations on flat relational tables with a fixed schema. XQuery is designed for hierarchical, document-oriented data with flexible and evolving structure. In relational databases with XML column support, SQL handles the relational access and XQuery handles the XML access — the two languages are used together, with SQL providing the outer query structure and XQuery or XPath providing the inner XML navigation. The SQL/XML standard defines how they interoperate.

Practical Guidance — Choosing an Approach

When to Use a Native xml Column

Use a native xml column when an application stores documents that are inherently hierarchical and variable in structure — product specifications, order payloads, configuration files, medical records — alongside relational data about the same entities. The XML column stores the document intact while relational columns store the attributes that need to be indexed, filtered, and joined with conventional SQL. This hybrid approach is the most common pattern in enterprise applications: the customer's name, ID, and status live in relational columns; their preferences or history XML lives in an XML column.

When to Shred XML into Relational Tables

Shredding means decomposing an XML document into relational tables — extracting element values into columns and rows. This approach is appropriate when the XML structure is stable and well-defined, when the data will be queried primarily with SQL, when strong referential integrity between XML elements is needed, or when joins between XML-sourced data and other relational tables are frequent. Shredded data performs better for SQL aggregations, group-by queries, and multi-table joins than XML column queries, at the cost of losing the original document structure.

When to Consider a Native XML Database

Consider a native XML database when the application is document-centric — when the primary operations are document retrieval, XQuery transformation, and full-text search over XML content rather than relational set operations. Content management systems, digital publishing platforms, document archives, and XML-heavy integration hubs are candidates for native XML databases. For most enterprise applications with mixed relational and XML data, a relational database with XML column support is simpler to operate, easier to integrate with existing SQL tooling, and sufficient for the workload.