SHARQ: Sharing Heterogeneous Resources and Queries

SHARQ

Executive Summary

Over the past decade, biological research has been transformed from a science of the "small" to a science of the "large". Fueled by novel technologies capable of producing massive amounts of data in a single experiment, scientists face an explosion of information that must be rapidly analyzed and combined with other data to form hypotheses and create knowledge. Thus, a number of new research challenges have arisen in data modeling and data integration that must be solved to further biological and other scientific research.
A major challenge lies in the effective sharing of information among collaborating, yet autonomous, parties. These parties are characterized by a diversity of perspectives (and hence heterogeneous schemas), dynamic data, and possibly intermittent connectivity or participation. They are peers in the sense that they are fully autonomous, they contribute and use resources as they choose, and they may join or leave at any point.

SHARQ (Sharing Heterogeneous and Autonomous Resources and Queries) aims to develop generic tools and technologies for creating and maintaining confederations whose purpose is distributed data sharing, that is, data cooperatives. In response to the difficulties outlined above, our solution emphasizes

  • (1) decentralization for both scalability and flexibility,
  • (2) incremental development of resources such as schemas, mappings between different schemas, and queries,
  • (3) rapid discovery mechanisms for finding the resources relevant to a topic, and
  • (4) tolerance for intermittent participation of members and for approximate consistency of mappings.
SHARQ is a collaboration with two biological partners: the Computational Biology and Informatics Laboratory, led by Chris Stoeckert, and the Pew project group, led by Pete White of the Children's Hospital of Philadelphia. We propose to develop a specific data cooperative as a biological testbed for evaluating the proposed technologies.

More precisely, we briefly introduce two modules of SHARQ: Orchestra and SHARQ Guide.

The Orchestra system (Ives et al, 2005, Taylor et al, 2006, Green et al, 2006) is the core engine of SHARQ. Orchestra builds upon concepts from the Piazza peer data management system (PDMS) (Halevy et al, 2004 & 2005). Orchestra supports the exchange of data and updates among cooperating, heterogeneous databases, using policies to quickly and automatically manage disagreements among conflicting data.
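
The idea of using policies to manage disagreement among conflicting data can be illustrated with a minimal sketch. This is not Orchestra's actual algorithm or API: the `Update` record, the `reconcile` function, the peer names, and the integer trust priorities are all hypothetical and chosen only for illustration. The sketch assumes that a peer assigns a priority to each peer it trusts, accepts the value published by the most trusted peer when updates to the same key conflict, and defers the decision when the top priorities tie.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    """Hypothetical record of one peer's published value for a key."""
    peer: str
    key: str
    value: str


def reconcile(updates, trust):
    """Resolve conflicting updates with a simple trust policy (illustrative).

    `trust` maps peer name -> integer priority (higher wins). Updates from
    peers missing from `trust` are ignored. Returns (accepted, deferred):
    accepted maps key -> value, deferred maps key -> sorted tied values.
    """
    by_key = {}
    for u in updates:
        if u.peer in trust:
            by_key.setdefault(u.key, []).append(u)

    accepted, deferred = {}, {}
    for key, candidates in by_key.items():
        best = max(trust[u.peer] for u in candidates)
        values = {u.value for u in candidates if trust[u.peer] == best}
        if len(values) == 1:
            accepted[key] = values.pop()
        else:
            # Equally trusted peers disagree: leave the conflict unresolved.
            deferred[key] = sorted(values)
    return accepted, deferred


# Made-up updates from hypothetical peers.
updates = [
    Update("peer_a", "gene42/function", "kinase"),
    Update("peer_b", "gene42/function", "phosphatase"),
    Update("peer_b", "gene7/function", "transporter"),
    Update("peer_x", "gene7/function", "receptor"),
]
accepted, deferred = reconcile(updates, {"peer_a": 2, "peer_b": 1})
```

With these priorities, the conflict on `gene42/function` is settled in favor of `peer_a`, and the update from the untrusted `peer_x` is ignored. Giving `peer_a` and `peer_b` equal priority would instead place `gene42/function` in `deferred`, which mirrors the notion of tolerating disagreement rather than forcing a single answer.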

It may be difficult to determine what information is available in the peer network. SHARQ Guide is therefore being designed to enable biologists to find relevant information within a peer data management system. It assists not only users who pose queries, but also owners of peers who wish to register with the Guide.
Key ideas of the SHARQ Guide include:
  • (i) Representing biological entities and relationships as a graph, following the approach of BioGuide (Cohen-Boulakia et al, 2005) (http://bioguide-project.net). This graph can be extended in a collaborative way by the peer administrators.
  • (ii) Expressing queries (a) without having to know/cite the schemas to use for querying (transparent queries) and (b) using query schema templates.
  • (iii) Proposing new features to maximize the amount of data returned to the user, by allowing some fields in the query to be optional.
  • (iv) Helping administrators of peers to register their schema.
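
Ideas (i) to (iii) above can be sketched in a few lines. This is only an illustration under stated assumptions, not the SHARQ Guide implementation: `ENTITY_GRAPH`, the peer names, `find_paths`, and `select` are hypothetical. The sketch assumes that the entity graph records which peers hold each relationship, that a transparent query names only a start and a target entity, and that optional fields are returned as `None` when a record lacks them.

```python
# Hypothetical entity graph: (entity, entity) -> peers holding that relationship.
ENTITY_GRAPH = {
    ("Gene", "Protein"): ["peer_a", "peer_b"],
    ("Protein", "Disease"): ["peer_b"],
    ("Gene", "Disease"): ["peer_c"],
}


def neighbors(entity):
    """Yield (target entity, peers) for each edge leaving `entity`."""
    for (src, dst), peers in ENTITY_GRAPH.items():
        if src == entity:
            yield dst, peers


def find_paths(start, goal, path=None):
    """Enumerate acyclic entity paths from `start` to `goal` (depth-first)."""
    path = (path or []) + [start]
    if start == goal:
        return [path]
    found = []
    for nxt, _peers in neighbors(start):
        if nxt not in path:
            found.extend(find_paths(nxt, goal, path))
    return found


def select(records, required, optional=()):
    """Keep records that have every required field.

    Optional fields are included when present and set to None otherwise,
    so a missing optional field does not discard the record.
    """
    rows = []
    for rec in records:
        if all(rec.get(f) is not None for f in required):
            rows.append({f: rec.get(f) for f in (*required, *optional)})
    return rows


# A transparent query: the user names entities, not schemas.
paths = find_paths("Gene", "Disease")

# Made-up records showing the effect of an optional field.
records = [
    {"gene": "g1", "disease": "d1", "pathway": "p1"},
    {"gene": "g2", "disease": "d2"},
    {"gene": "g3"},
]
strict = select(records, required=("gene", "disease", "pathway"))
relaxed = select(records, required=("gene", "disease"), optional=("pathway",))
```

Here `strict` keeps only the one record that has all three fields, while `relaxed` also returns the second record with `pathway` set to `None`. This is the sense in which optional fields can increase the amount of data returned.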

Links

Some references

  • BioGuideSRS: Querying Multiple Sources with a user-centric perspective [.pdf] 
    Bioinformatics (2007)
    Sarah Cohen Boulakia   Olivier Biton   Susan Davidson   Christine Froidevaux   

  • Reconciling while tolerating disagreement in collaborative data sharing [.pdf] 
    Proceedings of ACM SIGMOD International Conference on Management of Data (SIGMOD) (2006)
    Nicholas Taylor   Zachary Ives   

  • Path-based systems to guide life scientists in the maze of biological data sources. [.pdf] 
    Journal of Bioinformatics and Computational Biology 4:5, pp. 1069-1095 (October 2006)
    Sarah Cohen Boulakia   Susan Davidson   Christine Froidevaux   Zoe Lacroix   Maria-Esther Vidal   

  • SHARQ Guide: Finding relevant biological data and queries in a peer data management system [.pdf] 
    International Workshop on Data Integration in the Life Sciences (DILS), poster proceedings, selected for oral presentation (2006)
    Sarah Cohen Boulakia   Olivier Biton   Shirley Cohen   Zachary Ives   Val Tannen   Susan Davidson   

  • Orchestra: Rapid, collaborative sharing of dynamic data [.pdf] 
    Biennial Conference on Innovative Data Systems Research (CIDR) (2005)
    Zachary Ives   Nitin Khandelwal   Aneesh Kapur   Murat Cakir   

  • Schema mediation for large-scale data sharing [.pdf] 
    International Conference on Very Large Databases (VLDB) (2005)
    Alon Halevy   Zachary Ives   Dan Suciu   Igor Tatarinov   

  • A User-centric Framework for Accessing Biological Sources and Tools. [.pdf] 
    International Workshop on Data Integration in the Life Sciences (DILS), Data Integration for the Life Sciences, Lecture Notes in Bioinformatics (LNBI), Num. 3615, pp. 3-18 (2005)
    Sarah Cohen Boulakia   Christine Froidevaux   Susan Davidson   

  • Schema Mediation in Peer Data Management Systems [.pdf] 
    International Conference on Data Engineering (ICDE) (2003)
    Alon Halevy   Zachary Ives   Dan Suciu   Igor Tatarinov   

Project Members

Zachary Ives   Val Tannen   Susan Davidson   Sarah Cohen Boulakia   Olivier Biton   Nicholas Taylor   Todd J. Green   Grigoris Karvounarakis   Shirley Cohen   

Funding

This material is based upon work supported by the National Science Foundation under Grants No. 0513778 and 0477972.

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.


Levine Hall
3330 Walnut Street
Philadelphia, PA 19104
 

Last update: 08/02/11