Data Sources

We often say "data source" rather than "database" when talking about K2, and for good reason. Everyone has their own idea of the best way to store and represent data. Some rely on commercial database systems such as Sybase and Oracle, some use spreadsheet programs such as Excel, and some simply dump their data into structured (or sometimes unstructured) flat files. There are also many ways to make these data available, from customized visualization tools and standalone applications, to loosely structured Web pages and ASCII text files, to a simple direct connection to a database. Finally, data sources may reside on remote servers, requiring a distributed approach to query processing. Part of the challenge of mediation is finding ways to access all of these sources and to transform their data into a useful representation.

To date, the primary application area for K2 has been bioinformatics. This area has a great many sources of data, but very few of them reside in traditional relational, or even object-oriented, databases. Most of the information is kept in "home-grown" systems with very limited query capabilities. In addition, a lot of information can be gained by running standalone data analysis programs and by browsing the Web. Here are some examples of the different types of data sources to which we have connected K2:

