Data Integration/Web Integration
Combining Artificial Intelligence and Databases for Data Integration over the Internet/Web
This research theme is concerned with integrating multiple heterogeneous data sources over the Internet/Web.
The research lies at the intersection of AI, Databases and Web technology.
We deal with pre-existing, autonomous data sources that have been created independently. These data
sources can be (a) databases distributed over the Internet or corporate Intranets; (b)
hidden databases on the deep Web that are accessible via a Web interface only; (c)
semi-structured Web sources, accessible via a wrapper; (d) databases hosted by individual
peers in a peer-to-peer environment. The overall goal of this research is to provide easy,
unified access to the underlying data sources, enabling efficient and effective data
integration and sharing across these sources.
Our current research topics include:
- Novel algorithms for query reformulation in mediator-based data integration systems - We develop novel algorithms that address the problem of missing query rewritings in the presence of various integrity constraints. Our algorithms aim to solve this problem while retaining the main properties of the existing algorithms.
- Schema matching and mapping - To integrate and share heterogeneous data sources, we need to reconcile their semantic heterogeneity. This typically involves two tasks, schema matching and schema mapping: the former finds semantic correspondences between attributes of different schemas, and the latter establishes and represents relationships between relations of different schemas. Our research focuses on automatic and semi-automatic schema matching and mapping.
- Deep Web data integration - Databases on the Web are accessible via their front-end interfaces only. Our research aims to automatically discover relevant Web databases, extract their interface schemas, and semantically match these schemas, so that a unified interface can be provided that integrates data from the underlying databases.
- Peer data management - We aim to develop a new framework for peer-to-peer data management. In this context, we deal with a collection of peers that host their individual databases and agree to share some of their data with each other. Each peer has its own schema, which describes the data it provides and can also be used to pose queries. Our research involves neighbour discovery, schema matching and mapping between peers, peer introduction, and query answering and routing.
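As a rough illustration of what attribute-level schema matching involves, the sketch below pairs attributes of two schemas by the string similarity of their names. It is a minimal example under stated assumptions, not the group's algorithm: the function names (normalize, similarity, match_schemas), the 0.6 threshold and the sample attribute names are all hypothetical, and real matchers typically also draw on data types, instance values and structural context.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lower-case an attribute name and drop common separators."""
    return name.lower().replace("_", "").replace("-", "").replace(" ", "")


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] between two normalized attribute names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_schemas(source, target, threshold=0.6):
    """Map each source attribute to its most similar target attribute.

    Attributes whose best score falls below `threshold` (an assumed,
    illustrative cut-off) are left unmatched.
    """
    matches = {}
    if not target:
        return matches
    for s in source:
        best = max(target, key=lambda t: similarity(s, t))
        score = similarity(s, best)
        if score >= threshold:
            matches[s] = (best, score)
    return matches


if __name__ == "__main__":
    # Hypothetical attribute names from two independently designed schemas.
    source_schema = ["author_name", "title", "pub_year"]
    target_schema = ["AuthorName", "Title", "Year", "ISBN"]
    for attr, (candidate, score) in match_schemas(source_schema, target_schema).items():
        print(f"{attr} -> {candidate} ({score:.2f})")
```

A name-based matcher of this kind only proposes candidate correspondences; in a semi-automatic setting a user would confirm or reject them before any mappings are built on top.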
Our research has led to numerous publications, including a paper in the International Journal of Artificial Intelligence Tools, an invited paper in the International Journal of Web Intelligence and Agent Systems, and papers in Springer Lecture Notes and at leading ACM and IEEE conferences. Our research was partly funded by an EC grant in the MISSION project.