Executive Summary

AKIRA aims to offer a unified interface for information retrieval (IR), information extraction (IE), and data manipulation.

Most Web services today provide only information retrieval. The user has to supply a boolean expression of keywords that must be explicitly mentioned in the documents he is interested in. The user then has to read the returned documents to extract the information. Moreover, information from different documents can only be combined by hand.

In AKIRA, the user expresses a query (today an OQL-like query, tomorrow a natural-language (NL) query). Evaluating the query consists of retrieving relevant documents from the Web, extracting information from them, and storing it in an object-oriented database (the smart-cache), which takes the place of the usual unstructured cache. The information-processing tasks (IR and IE) and data manipulation are thus performed automatically and transparently for the user.
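The retrieve, extract, store, and query steps described above can be sketched as follows. This is only a toy illustration, not AKIRA's implementation: the `SmartCache` class, the `retrieve`, `extract_years`, and `evaluate` functions, the in-memory corpus, and the "Year" class are all hypothetical names chosen for the example, and a keyword test and a regular expression stand in for real IR and IE components.

```python
import re
from dataclasses import dataclass, field


@dataclass
class SmartCache:
    """Hypothetical stand-in for the object-oriented cache:
    extracted objects are grouped by class name."""
    objects: dict = field(default_factory=dict)

    def store(self, class_name, obj):
        self.objects.setdefault(class_name, []).append(obj)

    def select(self, class_name, predicate=lambda obj: True):
        # Data-manipulation step: filter the stored objects of one class.
        return [o for o in self.objects.get(class_name, []) if predicate(o)]


def retrieve(corpus, keyword):
    """IR step (toy): keep the documents that mention the keyword."""
    return [doc for doc in corpus if keyword.lower() in doc.lower()]


YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def extract_years(doc):
    """IE step (toy): pull four-digit years out of the text."""
    return [int(match.group(0)) for match in YEAR.finditer(doc)]


def evaluate(corpus, keyword, cache):
    """Retrieve relevant documents, extract objects, and store them in the cache."""
    for doc in retrieve(corpus, keyword):
        for year in extract_years(doc):
            cache.store("Year", {"value": year, "source": doc})
    return cache
```

A query would then be answered from the cache rather than from the raw documents, for example `evaluate(corpus, "akira", SmartCache()).select("Year", lambda o: o["value"] < 2000)`.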

The organization of extracted data should not be seen as a constraint the system imposes on the user. AKIRA's smart-cache must be structured according to the user's understanding. The AKIRA system does not provide a rigid global schema, but small schema components that represent its extraction capabilities and can be combined to match the structure expressed by the user.
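One way to picture the combination of schema components is as a coverage problem: each component offers some attributes, and the structure requested by the user is satisfied when every requested attribute is offered by some selected component. The sketch below assumes this simplified view. `SchemaComponent`, `select_components`, and the attribute names are hypothetical and are not taken from AKIRA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaComponent:
    """Hypothetical schema component: a name plus the attributes it can extract."""
    name: str
    attributes: frozenset


def select_components(target, components):
    """Pick components, in the order given, until the target attributes are covered.

    Returns the names of the chosen components and the set of attributes
    that no component could provide.
    """
    remaining = set(target)
    chosen = []
    for component in components:
        covered = remaining & component.attributes
        if covered:
            chosen.append(component.name)
            remaining -= covered
    return chosen, remaining


components = [
    SchemaComponent("person", frozenset({"name", "affiliation"})),
    SchemaComponent("event", frozenset({"date", "location"})),
]
```

With these components, a structure asking for `{"name", "date"}` is covered by combining `person` and `event`, while a request that includes an attribute such as `"price"` leaves that attribute uncovered.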

When the user sends a query, the system first computes the target structure (i.e. the schema that the user expresses in his query) and uses it as the schema for the cache. Of course, there is no magic: the system can answer a query only when the structure expressed by the user matches its extraction capabilities. However, a fuzzy-matching module, in charge of mapping the user's vocabulary to the system's own, allows some flexibility.
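The source does not say how the fuzzy-matching module works. As a minimal sketch of the idea of mapping user terms onto system terms, the example below uses plain string similarity from Python's standard `difflib` module. The vocabulary, the `map_term` and `map_target_structure` helpers, and the 0.6 cutoff are assumptions made for illustration.

```python
from difflib import get_close_matches

# Hypothetical vocabulary describing what the system is able to extract.
SYSTEM_VOCABULARY = ["person", "name", "organization", "location", "date"]


def map_term(user_term, vocabulary=SYSTEM_VOCABULARY, cutoff=0.6):
    """Return the closest system term for a user term, or None if none is close enough."""
    matches = get_close_matches(user_term.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def map_target_structure(user_terms):
    """Map every term of the user's target structure and list those left unmatched."""
    mapping = {term: map_term(term) for term in user_terms}
    unmatched = [term for term, match in mapping.items() if match is None]
    return mapping, unmatched
```

Under this sketch, a query is answerable only when `unmatched` is empty. String similarity alone would not relate synonyms such as "firm" and "organization", so a real module would presumably need richer lexical knowledge.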

Lastly, the user can define the output format of the result. One can imagine using AKIRA to convert poorly structured HTML documents from the Web into XML pages. This feature has not been fully implemented yet.

AKIRA's Web site

Project Members

Arnaud Sahuguet   Raman Chandrasekar   Zoé Lacroix


