XPath is a W3C standard that plays a crucial role in several
influential query, transformation, and schema standards for XML.
Motivated by the larger challenge of XML query optimization, we
investigate the problem of containment of XPath expressions under
integrity constraints that are in turn formulated with the help of
XPath expressions. Our core formalism consists of a fragment of XPath
that we call {\em simple} and a corresponding class of of integrity
constraints that we call {\em simple XPath integrity constraints\/}
(\sxic). \sxic's can express many database-style constraints,
including key and foreign key constraints specified in the XML Schema
standard proposal, as well as many constraints implied by DTDs. We
identify a subclass of {\em bounded}\ \sxic's under which containment of
simple XPath expressions is {\em decidable},
but we show that even modest use of unbounded \sxic's
makes the problem {\em undecidable}. In particular, the addition of
(unbounded) constraints implied by DTDs leads to undecidability.
We give tight $\Pi_2^p$ bounds for the simple XPath containment
problem and tight NP bounds for the disjunction-free subfragment,
while even identifying a PTIME subcase. We also show that decidability of
containment under \sxic's still holds if the expressions contain
certain additional features (e.g.., wildcard) although the complexity
jumps to $\Pi_2^p$ even for the disjunction-free subfragment.
We know that our results can be extended to some but not all of the
XPath features that depend on document order. The decidability of
containment of simple XPath expressions in the presence of DTDs only
remains open (although we can show that the problem is
PSPACE-hard) as well as the problem for full-fledged XPath
expressions, even in the absence of integrity constraints.
The talk is based on the one given at KRDB'01 in Rome, and is joint work
with Val Tannen.