DB&IR: both sides now

G Weikum - Proceedings of the 2007 ACM SIGMOD international conference on Management of …, 2007 - dl.acm.org
Database systems (DB) and information retrieval (IR) are two separate fields of computer science by historical accident. Both study concepts, models, and computational methods for managing large amounts of complex information, but thirty or forty years ago they started with very different application areas as major motivations and technology drivers: accounting systems (online reservations, banking, etc.) for DB, and library systems (bibliographic catalogs, patent collections, etc.) for IR. Thus, the two directions and their research communities emphasized very different aspects of information management: data consistency, precise query processing, and efficiency on the DB side [53], and text understanding, statistical ranking models, and user satisfaction on the IR side [35, 47]. Decades later, there is now rapidly growing awareness of the need to integrate DB and IR technologies [3, 7, 14]. There were various attempts to address this integration as early as ten years ago (e.g., [20, 29, 49]), but only recently have important killer applications emerged that create a really strong desire for an integrated DB&IR platform. From an IR viewpoint, digital libraries of all kinds are becoming very rich information repositories, with documents augmented by metadata and annotations captured in semistructured data formats like XML [26]; enterprise search on intranet data can be seen as a specific variant of this theme. From a …