Using the Expat XML Parser
Converting an XML schema to an internal format with the Expat XML parser,
so that the schema can be used to store XML data in relational tables.
Increasingly, Extensible Markup Language (XML) is considered the format
of choice for the representation and exchange of information among various
applications on the Internet. The popularity of XML can be mostly attributed
to its flexibility for representing many kinds of information. The use of
tags makes XML data self-describing, and the extensible nature of XML makes
it possible to define new kinds of documents for specialized purposes.
From the database perspective, this raises an exciting possibility. With
large amounts of data stored in XML documents, it should be possible to query
the contents of these documents. One should be able to issue queries over
sets of XML documents to extract, synthesize, and analyze their contents.
In fact, efficient storage of XML documents is now an active area of research
in the database community.
Cost-based strategies to derive relational configurations for XML applications
have been proposed and shown to provide substantially better configurations
than heuristic methods. The general methodology in these strategies is to
define a set of XML schema transformations that derive different relational
configurations. Given an XML query workload, the quality of the relational
configuration is evaluated by a costing function on the SQL equivalents of
the XML queries. Since the search space is large, greedy heuristics are used
to search through the associated space of relational configurations.
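The greedy search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the algorithm of any particular system: a configuration is modeled as the set of elements stored in their own table, the only transformation is moving one element in or out of that set, and the weights in JOIN_COST and WIDTH_COST are invented stand-ins for estimates that a real system would obtain from a relational optimizer.

```python
from typing import Callable, FrozenSet, Iterable, Tuple

Config = FrozenSet[str]

# Hypothetical toy schema: child elements of one parent element.
ELEMENTS = ("title", "author", "review")

# Invented weights (assumptions for illustration only).
JOIN_COST = {"title": 5, "author": 4, "review": 1}   # paid when stored in its own table
WIDTH_COST = {"title": 1, "author": 2, "review": 6}  # paid when inlined in the parent table


def cost(config: Config) -> int:
    """Toy costing function standing in for the cost of the SQL workload."""
    outlined = sum(JOIN_COST[e] for e in config)
    inlined = sum(WIDTH_COST[e] for e in ELEMENTS if e not in config)
    return outlined + inlined


def neighbors(config: Config) -> Iterable[Config]:
    """Apply each single transformation: toggle one element in or out of its own table."""
    for e in ELEMENTS:
        yield config ^ frozenset({e})


def greedy_search(
    start: Config,
    neighbors: Callable[[Config], Iterable[Config]],
    cost: Callable[[Config], int],
) -> Tuple[Config, int]:
    """Repeatedly move to the cheapest neighbor until no neighbor is cheaper."""
    current, current_cost = start, cost(start)
    while True:
        best, best_cost = None, current_cost
        for candidate in neighbors(current):
            c = cost(candidate)
            if c < best_cost:
                best, best_cost = candidate, c
        if best is None:
            return current, current_cost  # local minimum reached
        current, current_cost = best, best_cost


best_config, best_cost = greedy_search(frozenset(), neighbors, cost)
print(sorted(best_config), best_cost)
```

Because the search only ever accepts a strictly cheaper neighbor, it terminates, but it may stop at a local minimum rather than the globally cheapest configuration. That is the usual trade-off accepted when the search space is too large to enumerate.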
LegoDB is one such cost-based XML storage mapping engine. LegoDB leverages
current XML and relational technologies. It models the target application
with an XML Schema, XML data statistics, and an XQuery workload. The space
of configurations is generated through XML Schema rewritings, and the best
among the derived configurations is selected using cost estimates obtained
through a standard relational optimizer.
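Before a schema can be rewritten, it has to be read into some in-memory form. The following is a minimal sketch of that first step using Python's standard xml.parsers.expat module, which is a binding to Expat. The sample schema, the parse_schema function, and the dictionary layout are assumptions made for illustration and are not the internal format used by LegoDB or by this application. The sketch only handles element declarations that carry a name attribute, and it assumes element names are unique within the schema.

```python
import xml.parsers.expat

# Hypothetical sample schema, used only to exercise the sketch.
SCHEMA = """<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="show">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="year" type="xs:integer"/>
        <xs:element name="review" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def parse_schema(text):
    """Return {element name: {"type", "repeated", "children"}} for a schema document."""
    decls = {}
    stack = []  # names of the enclosing element declarations (None if unnamed)

    def start(tag, attrs):
        # Without namespace processing, Expat reports the prefixed tag name.
        if tag.split(":")[-1] != "element":
            return
        name = attrs.get("name")
        stack.append(name)
        if name is None:
            return  # e.g. an element that uses ref=...; not handled in this sketch
        decls[name] = {
            "type": attrs.get("type"),
            "repeated": attrs.get("maxOccurs", "1") != "1",
            "children": [],
        }
        parent = stack[-2] if len(stack) > 1 else None
        if parent is not None:
            decls[parent]["children"].append(name)

    def end(tag):
        if tag.split(":")[-1] == "element":
            stack.pop()

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.Parse(text, True)  # raises xml.parsers.expat.ExpatError on malformed input
    return decls


schema = parse_schema(SCHEMA)
for name, info in schema.items():
    print(name, info)
```

Expat is an event-driven parser: it calls the start and end handlers as it encounters tags, so the stack is what reconstructs the nesting of declarations. A repeated child such as review is flagged separately because, in a relational mapping, it typically cannot be stored inline as a single column of the parent's row.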
The code for this application can be found here.