Queries on tree-structured data have received much research attention
recently. This is due to the important role they play in
enabling technologies such as XML on one hand and complex application
areas such as data integration and computational linguistics on the
other.
In this talk, I present recent results on the efficient evaluation of
queries on tree-structured data, focusing on popular query
languages such as conjunctive and first-order queries, XPath, and XQuery.
Both (complexity-) theoretic and algorithmic results will be presented;
the latter have been shown to be of immediate usefulness in practical
systems.
I start with structural properties of tree-like data and present a
dichotomy theorem for conjunctive queries over trees. This
result characterizes the tractable classes of conjunctive queries in
terms of tree structure relations.
I then move on to structural properties of queries and their use in
obtaining efficient query processing algorithms. I present the
first polynomial-time algorithm for XPath processing and will then
discuss the syntactic properties of XPath this algorithm is based on. I
also discuss lower bounds and the parallel complexity of XPath queries,
and derive predictions about XPath processing algorithms from
these. Furthermore, I give a lower bound from communication complexity
for streaming XPath algorithms.
Finally, I study the substantial complexity of XQuery, the most popular
tree-transformation query language, and present a natural fragment,
composition-free XQuery, that can be evaluated with reasonable
efficiency while requiring no loss of power to formulate queries.
The talk concludes with a comparison of query languages on trees in
terms of their expressive power, complexity, and succinctness.
Christoph Koch did his doctoral research at CERN, a high energy physics
research laboratory near Geneva, Switzerland, and received his PhD in
2001. He was a postdoctoral researcher first at TU Vienna and later at
the University of Edinburgh. From 2003 to 2005 he was on the faculty of
TU Vienna and obtained his Habilitation in 2004. Since April 2005 he is
professor (W2) and Chair of Information Systems at Saarland University,
Saarbruecken, Germany. He has authored or co-authored about 50
publications in the areas of data management and Artificial
Intelligence, two of which won best paper awards at PODS 2002 and ICALP
2005. He co-chaired DBPL 2005 and is on the editorial board of ACM
Transactions on Internet Technology. His current research interests are
in database systems and database theory, in particular in queries on
tree-structured data, XML, data stream processing, managing incomplete
information, efficient query evaluation and query optimization, visual
query languages, and scientific databases.