| Lesson 10 | Querying XML Database |
| Objective | Understand How Relational Databases Store and Query XML Data. |
Databases that handle XML fall into two broad categories. XML-enabled relational databases extend SQL with an XML data type, allowing XML documents to be stored in columns alongside traditional relational data. The database does the conversion and management itself — the developer works with SQL and XML-specific functions without needing middleware. These are hybrid systems where relational and XML data coexist.
Native XML databases treat XML as their core data model. The fundamental unit of storage is an XML document rather than a relational row. Native XML databases are designed for document-oriented workloads where the data is inherently hierarchical and XML-structured — content management systems, digital archives, publishing workflows. They support XPath and XQuery as their primary query languages rather than SQL.
JSON has largely supplanted XML as the preferred format for new applications, APIs, and web services. JSON is more compact, easier to parse in JavaScript environments, and maps naturally to the object models used in modern programming languages. All major relational databases now support JSON column types alongside or in addition to their XML support.
However, XML remains essential in established enterprise environments where decades of systems, standards, and integrations depend on it. Industries including healthcare (HL7, CDA), finance (FIX, FIXML), government (XBRL, UBL), and publishing (DocBook, DITA) continue to use XML as their primary interchange format. Understanding how relational databases handle XML is a practical skill for anyone working with these domains.
xml column type — rather than as plain TEXT, CLOB, or VARCHAR — provides several advantages. The database engine automatically validates that any value stored in an xml column is well-formed XML, rejecting malformed documents at insert time rather than letting bad data accumulate silently. The engine can index XML content for fast XPath and XQuery lookups, just as it indexes values in integer or text columns for fast SQL queries. XML-specific functions for extraction, transformation, and modification become available as part of the SQL query language. And the XML data retains its hierarchical structure while living inside a relational table alongside conventional columns.
xml data type that checks all stored values for well-formedness. PostgreSQL must be compiled with --with-libxml for the type to be available — most PostgreSQL distributions include this by default. Declaring an xml column in a CREATE TABLE statement:
CREATE TABLE ProductCatalog (
ProductID INT NOT NULL,
ProductName VARCHAR(200) NOT NULL,
Specs xml,
CONSTRAINT pk_product PRIMARY KEY (ProductID)
);
The Specs column stores an XML document for each product — technical specifications, attributes, or any hierarchically structured data that does not fit cleanly into relational columns.
XML values are inserted as string literals using the XMLPARSE function or by casting a string to the xml type. PostgreSQL validates well-formedness at insert time:
INSERT INTO ProductCatalog (ProductID, ProductName, Specs)
VALUES (
1,
'Widget A',
XMLPARSE(DOCUMENT '<specs>
<weight unit="kg">0.5</weight>
<color>blue</color>
<voltage>12</voltage>
</specs>')
);
PostgreSQL provides the xpath() function for evaluating XPath expressions against XML column values, and XMLTABLE for shredding XML into relational result sets:
-- Extract the color element value from Specs using xpath()
SELECT ProductName,
(xpath('//color/text()', Specs))[1]::text AS Color
FROM ProductCatalog
WHERE ProductID = 1;
-- Shred XML into rows and columns using XMLTABLE
SELECT p.ProductName, x.Weight, x.Color
FROM ProductCatalog p,
XMLTABLE('//specs' PASSING p.Specs
COLUMNS
Weight text PATH 'weight',
Color text PATH 'color'
) x;
xml data type in SQL Server 2005. An xml column can be either untyped — accepting any well-formed XML without schema validation — or typed — associated with an XML Schema Collection that validates documents against a registered schema definition. Untyped XML is the most common choice for general-purpose storage; typed XML is used when strict document validation is required.
-- Untyped xml column
CREATE TABLE ProductCatalog (
ProductID INT NOT NULL PRIMARY KEY,
ProductName VARCHAR(200) NOT NULL,
Specs xml
);
-- Typed xml column — requires a registered XML Schema Collection
CREATE TABLE ProductCatalog (
ProductID INT NOT NULL PRIMARY KEY,
ProductName VARCHAR(200) NOT NULL,
Specs xml(SpecsSchemaCollection)
);
xml column. These methods allow SQL queries to extract values, test for element existence, and return XML fragments:
-- .value() extracts a scalar value from the XML
SELECT ProductName,
Specs.value('(/specs/color)[1]', 'VARCHAR(50)') AS Color,
Specs.value('(/specs/weight)[1]', 'DECIMAL(5,2)') AS Weight
FROM ProductCatalog
WHERE ProductID = 1;
-- .exist() tests whether an XPath expression matches any node
SELECT ProductName
FROM ProductCatalog
WHERE Specs.exist('/specs/voltage') = 1;
-- .query() returns an XML fragment matching the XPath expression
SELECT ProductName,
Specs.query('/specs/color') AS ColorFragment
FROM ProductCatalog;
.modify() method, which allows inserting, replacing, and deleting nodes within a stored XML document without replacing the entire document:
-- Update the color element value within the stored XML
UPDATE ProductCatalog
SET Specs.modify('replace value of (/specs/color/text())[1] with "red"')
WHERE ProductID = 1;
XMLType in Oracle 9i as a system-defined object type for native XML handling. An XMLType column stores complete XML documents and provides member functions for querying, transformation, and validation:
CREATE TABLE ProductCatalog (
ProductID NUMBER NOT NULL,
ProductName VARCHAR2(200) NOT NULL,
Specs XMLType,
CONSTRAINT pk_product PRIMARY KEY (ProductID)
);
XMLType columns. Binary XML storage stores documents in a compact binary format optimized for XQuery and XPath access. Object-relational storage (also called structured storage) decomposes the XML into relational tables based on an XML schema, enabling efficient SQL-based access to individual elements. Unstructured storage saves the XML as a CLOB — the simplest option but without the query optimization benefits of the other models.
XMLIndex creates a specialized index on the content of XMLType columns, enabling fast XPath and XQuery evaluation without scanning every document:
-- Extract a value using XMLQuery
SELECT ProductName,
XMLQuery('/specs/color/text()' PASSING Specs RETURNING CONTENT).getStringVal() AS Color
FROM ProductCatalog
WHERE ProductID = 1;
XML data type that stores documents in a parsed, hierarchical format — not as raw text — enabling efficient XQuery and XPath access. Db2's XML support was formerly marketed as "pureXML," a branding term for its native XML storage engine:
CREATE TABLE ProductCatalog (
ProductID INT NOT NULL,
ProductName VARCHAR(200) NOT NULL,
Specs XML,
PRIMARY KEY (ProductID)
);
-- XPath expressions (used inside SQL functions like xpath(), .value(), XMLQuery())
/specs/color -- selects the color element directly under specs
/specs/color/text() -- selects the text content of the color element
//weight -- selects any weight element anywhere in the document
/specs/weight[@unit] -- selects weight elements that have a unit attribute
/specs/* -- selects all child elements of specs
(: XQuery FLWOR expression — conceptual structure :)
for $item in /catalog/product
let $color := $item/specs/color
where $item/specs/voltage = "12"
order by $item/name
return <result>{$item/name/text()}: {$color/text()}</result>
The for clause iterates over a sequence of nodes; let binds a variable to an expression; where filters; order by sorts; and return constructs the output. FLWOR expressions can be nested and composed, enabling complex transformations of XML data.
xml column when an application stores documents that are inherently hierarchical and variable in structure — product specifications, order payloads, configuration files, medical records — alongside relational data about the same entities. The XML column stores the document intact while relational columns store the attributes that need to be indexed, filtered, and joined with conventional SQL. This hybrid approach is the most common pattern in enterprise applications: the customer's name, ID, and status live in relational columns; their preferences or history XML lives in an XML column.