Thesis Proposal Defense: Todd J. Green
Levine 612
Wednesday, Nov 12, 11am-1pm
Advisors: Zack Ives & Val Tannen
Committee: Susan Davidson (chair), Sanjeev Khanna, Benjamin Pierce, Jeffrey Naughton (external, University of Wisconsin)

Title: Foundations and Applications of Collaborative Data Sharing

Abstract:
This research will investigate the foundations and applications of collaborative data sharing systems. Collaborative data sharing systems (CDSS) support exchange of data and updates across loosely-coupled, heterogeneous collections of relational databases related by declarative schema mappings. Key requirements in a CDSS include tracking and recording of data provenance, enforcement of provenance-based trust policies, tolerance for data conflicts and inconsistencies, and efficient propagation of updates to data, schemas, and mappings. In the proposed thesis, we develop a notion of data provenance suitable for CDSS, based on semiring-annotated relations. The framework also neatly captures the semantics of positive relational queries on other forms of annotated relations, including SQL-style bag (multiset) relations, why-provenance relations, lineage-annotated relations, and incomplete databases. We describe the design and evaluation of a prototype CDSS called Orchestra that is based on these foundations and supports update exchange with semiring-based data provenance and trust policies. We study the fundamental problems of query containment and equivalence of positive relational queries on annotated relations, which turn out to be highly sensitive to the presence of provenance information, and obtain positive decidability results for several forms of provenance that can be captured using semiring annotations. As a corollary, we also resolve a longstanding open problem by showing that bag-equivalence of positive relational queries is decidable.
We consider the problem of answering queries using materialized views in the context of CDSS, and we develop a sound and complete query reformulation algorithm. This leads to a uniform approach to solving three different problems in CDSS: optimizing queries using materialized views, incremental view maintenance, and mapping evolution. Finally, we outline proposed work to implement and evaluate this reformulation algorithm in Orchestra in conjunction with a cost-based query optimizer.
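
As a rough illustration of the semiring-annotated relations mentioned in the abstract, the following Python sketch evaluates a small join-and-project query over annotated relations. It is not taken from Orchestra. The names Semiring, join, and project and the sample relations R and S are hypothetical, chosen only for this example. The sketch assumes the usual reading in which joint use of tuples multiplies their annotations and alternative derivations of the same tuple add them.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple

# Hypothetical helper type: a commutative semiring (K, plus, times, zero, one).
@dataclass(frozen=True)
class Semiring:
    zero: Any
    one: Any
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]

# Two example semirings: natural numbers (bag semantics) and Booleans (set semantics).
BAG = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)
BOOL = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)

Relation = Dict[Tuple, Any]  # maps each tuple to its annotation in K

def join(K: Semiring, r: Relation, s: Relation) -> Relation:
    """Join binary relations r(x, y) and s(y, z) on y; annotations are multiplied."""
    out: Relation = {}
    for (x, y), ann_r in r.items():
        for (y2, z), ann_s in s.items():
            if y == y2:
                key = (x, y, z)
                out[key] = K.plus(out.get(key, K.zero), K.times(ann_r, ann_s))
    return out

def project(K: Semiring, r: Relation, cols: Tuple[int, ...]) -> Relation:
    """Keep only the columns in cols; annotations of merged tuples are added."""
    out: Relation = {}
    for t, ann in r.items():
        key = tuple(t[i] for i in cols)
        out[key] = K.plus(out.get(key, K.zero), ann)
    return out

# Illustrative data (an assumption): annotations are tuple multiplicities.
R = {("a", "b"): 2, ("d", "b"): 1}
S = {("b", "c"): 3}

# q(z) :- R(x, y), S(y, z), evaluated under bag semantics.
bag_result = project(BAG, join(BAG, R, S), (2,))

# The same query under set semantics: every present tuple is annotated True.
R_set = {t: True for t in R}
S_set = {t: True for t in S}
set_result = project(BOOL, join(BOOL, R_set, S_set), (2,))
```

Here bag_result maps ("c",) to 9, since 2*3 + 1*3 = 9, while set_result maps ("c",) to True. Only the semiring passed to join and project changes between the two evaluations, which is the sense in which one framework covers several kinds of annotated relations.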