DOI: 10.5555/646102.681184
Article

Precision in Processing Data from Heterogeneous Resources (Invited Paper)

Published: 03 July 2000

Abstract

Much information is becoming available on the World Wide Web, on intranets, and in publicly accessible databases. The benefits of integrating related data from distinct sources are great, since integration allows the discovery or validation of relationships among events and trends in many areas of science and commerce. But most sources are established autonomously, and hence are heterogeneous in form and content. Resolution of heterogeneity of form has been an exciting research topic for many years now: we can access information from diverse computers, alternate data representations, varied operating systems, and multiple database models, and deal with a variety of transmission protocols.

But progress in these areas is raising a new problem: semantic heterogeneity. Semantic heterogeneity arises because the meaning of words depends on context, and autonomous sources are developed and maintained within their own contexts. Types of semantic heterogeneity include spelling variations, the use of synonyms, and the use of identically spelled words to refer to different objects. The effect of semantic heterogeneity is not only failure to find desired material, but also lack of precision in selection, aggregation, comparison, and similar operations when trying to integrate information. While browsing we may complain of 'information overload'. But when we try to automate these processes, an essential aspect of business-oriented operations, the imprecision due to semantic heterogeneity can become fatal.

Manual resolutions of the problem do work today, but they force businesses to limit the scope of their partnering. In expanding supply chains and globalized commerce we have to deal with many more contexts, but cannot afford manual, case-by-case resolution: in business we become efficient by rapidly carrying out processes on regular schedules. XML is touted as the new universal medium for electronic commerce, but the meaning of the tags identifying data fields remains context-dependent.
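To make the three kinds of semantic heterogeneity named in the abstract concrete (spelling variations, synonyms, and identically spelled words that denote different objects), here is a minimal sketch assuming a small, hand-maintained table of context-scoped term mappings. The source names, terms, concept identifiers, and the functions `resolve` and `same_concept` are all hypothetical, invented for illustration; this is not the paper's method. The one design choice worth noting is that an unmapped term resolves to nothing rather than to a guess, so a match is reported only when it is known to be correct.

```python
from typing import Optional

# Hypothetical context-scoped rules: (source context, local term) -> shared concept.
# All source names, terms, and concept identifiers are invented for illustration.
RULES: dict[tuple[str, str], str] = {
    ("hardware_store", "colour"): "paint.color",     # spelling variation...
    ("paint_supplier", "color"): "paint.color",      # ...of the same concept
    ("hardware_store", "vendor"): "party.supplier",  # synonyms...
    ("paint_supplier", "supplier"): "party.supplier",  # ...for the same role
    ("hardware_store", "nail"): "product.fastener",  # homonym: same spelling,
    ("nail_salon", "nail"): "anatomy.fingernail",    # different object per context
}


def resolve(source: str, term: str) -> Optional[str]:
    """Map a source-local term to a shared concept, or None if no rule exists.

    Unknown terms are deliberately not guessed, trading recall for precision.
    """
    return RULES.get((source, term.strip().lower()))


def same_concept(a: tuple[str, str], b: tuple[str, str]) -> bool:
    """True only when both (source, term) pairs resolve to the same known concept."""
    concept_a, concept_b = resolve(*a), resolve(*b)
    return concept_a is not None and concept_a == concept_b


if __name__ == "__main__":
    print(same_concept(("hardware_store", "colour"), ("paint_supplier", "color")))     # True
    print(same_concept(("hardware_store", "vendor"), ("paint_supplier", "supplier")))  # True
    print(same_concept(("hardware_store", "nail"), ("nail_salon", "nail")))            # False
    print(same_concept(("hardware_store", "hue"), ("paint_supplier", "color")))        # False
```

Because each rule concerns only one source context, a new partner can be accommodated by adding a few entries for that source, without renegotiating a global vocabulary.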
Attempting a global resolution of the semantic mismatch is futile. The number of participants is immense, growing, and dynamic. Terminology changes, and must be able to change as our knowledge grows. Using precise, finely differentiated terms and abbreviations is important for efficiency within a domain, but frustrating to outsiders. In this paper we indicate research directions for resolving inconsistencies incrementally, so that we may interoperate effectively in the presence of inter-domain inconsistencies. This work is at an early stage, and will provide research opportunities for a range of disciplines, including databases, artificial intelligence, and formal linguistics. We also sketch an information systems architecture suitable for such services and their infrastructure; research issues in managing the complexity of multiple services arise here as well. The conclusion of this paper can be summarized as follows: today, and even more in the future, precision and relevance will be more valuable than completeness and recall. Solutions are best composed from many small-scale efforts rather than from overbearing attempts at standardization. This observation will, in turn, affect research directions in the information sciences.
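For reference, the conclusion contrasts precision with recall (completeness). Below are the standard information-retrieval definitions, with a worked example whose numbers are purely illustrative and not taken from the paper. Here $R$ denotes the set of relevant items and $A$ the set of items actually retrieved.

```latex
\text{precision} = \frac{|R \cap A|}{|A|}, \qquad
\text{recall} = \frac{|R \cap A|}{|R|}

% Illustrative numbers: |A| = 50 items retrieved, |R \cap A| = 40 of them relevant,
% |R| = 200 relevant items exist in total.
\text{precision} = \frac{40}{50} = 0.8, \qquad
\text{recall} = \frac{40}{200} = 0.2
```

The two measures vary independently: a result set can be highly precise while covering only a small fraction of everything relevant, which is the trade-off the conclusion argues will increasingly favor precision.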

References

[1]
Neal Coulter et al.: ACM Computing Classification System; http://www.acm.org/class
[2]
Adobe Corporation: PDF and Printing; http://www.adobe.com/prodindex/postscript/pdf.html
[3]
Art Museum Image Consortium (AMICO); http://www.amico.net/docs/vra
[4]
C. Mic Bowman, Peter B. Danzig, Darren R. Hardy, Udi Manber and Michael F. Schwartz: "The HARVEST Information Discovery and Access System"; Proceedings of the Second International World Wide Web Conference, Chicago, Illinois, October 1994, pp. 763-771.
[5]
Chen-Chuan K. Chang, Hector Garcia-Molina, Andreas Paepcke: "Boolean Query Mapping Across Heterogeneous Information Sources"; IEEE Transactions on Knowledge and Data Engineering, Vol. 8 no., pp. 515-521, Aug. 1996.
[6]
Anthony Chavez and Pattie Maes: "Kasbah: An Agent Marketplace for Buying and Selling Goods"; First International Conference on the Practical Application of Intelligent Agents and Multi-Agent Technology, London, UK, April 1996.
[7]
S. Chawathe, H. Garcia-Molina, J. Hammer, K. Ireland, Y. Papakonstantinou, J. Ullman, J. Widom: "The TSIMMIS Project: Integration of Heterogeneous Information Sources"; IPSJ Conference, Tokyo, Japan, 1994.
[8]
Peter P.S. Chen: "The Entity-Relationship Model: Toward a Unified View of Data"; ACM Transactions on Database Systems, March 1976.
[9]
J.J. Cimino: "Review paper: coding systems in health care"; Methods of Information in Medicine, Schattauer Verlag, Stuttgart Germany, Vol. 35 Nos. 4-5, Dec. 1996, pp. 273-284.
[10]
C. Collet, M. Huhns, and W-M. Shen: "Resource Integration Using a Large Knowledge Base in CARNOT"; IEEE Computer, Vol. 24 No. 12, Dec. 1991.
[11]
Dan Connolly (ed.): XML: Principles, Tools, and Techniques; O'Reilly, 1997.
[12]
R. ElMasri and G. Wiederhold: "Data Model Integration Using the Structural Model"; ACM SIGMOD Conf. on the Management of Data, May 1979, pp. 191-202.
[13]
L. Gravano, H. Garcia-Molina, and A. Tomasic: "Precision and Recall of GlOSS Estimators for Database Discovery"; Parallel and Distributed Information Systems, 1994.
[14]
Stathes Hadjiefthymiades and Lazaros Merakos: "A Survey of Web Architectures for Wireless Communication Environments"; Computer Networks and ISDN Systems, Vol. 28, May 1996, p. 1139, http://www.imag.fr/Multimedia/www5cd/www139/overview.htm.
[15]
Scott Hamilton: "Taking Moore's Law into the Next Century"; IEEE Computer, Jan. 1999, pp. 43-48.
[16]
Marti Hearst: "Interfaces for Searching the Web"; in [36].
[17]
Michael Huhns and J. Singh: Readings in Agents; Morgan Kaufmann, October, 1997, pp. 185-196.
[18]
Betsy Humphreys and Don Lindberg: "The UMLS project: Making the conceptual connection between users and the information they need"; Bulletin of the Medical Library Association, 1993, see also http://www.lexical.com
[19]
Inktomi and NEC: Size of the Web; http://www.inktomi.com/webmap/ (17 Jan. 2000).
[20]
Jan Jannink, Pichai Srinivasan, Danladi Verheijen, and Gio Wiederhold: "Encapsulation and Composition of Ontologies"; Proc. AAAI Workshop on Information Integration, AAAI Summer Conference, Madison WI, July 1998.
[21]
Th. Jelassi, H.-S. Lai: "CitiusNet: The Emergence of a Global Electronic Market"; INSEAD, The European Institute of Business Administration, Fontainebleau, France; http://www.simnet.org/public/programs/capital/96paper/paper3/3.html; Society for Information Management, 1996.
[22]
Robert E. Kent: Ontology Markup Language; http://wave.eecs.wsu.edu/CKRMI/OML.html, Feb. 1999.
[23]
Steven P. Ketchpel, Hector Garcia-Molina, Andreas Paepcke: Shopping Models: A Flexible Architecture for Information Commerce; Digital Libraries '97, ACM 1997.
[24]
Y. Labrou and Tim Finin: "A Semantics Approach for KQML, a General Purpose Language for Software Agents"; Proc. CIKM 94, ACM, 1994.
[25]
Thomas Langer: "MeBro - A Framework for Metadata-Based Information Mediation"; First International Workshop on Practical Information Mediation and Brokering, and the Commerce of Information on the Internet, Tokyo Japan, September 1998, http://context.mit.edu/imediat98/paper2/
[26]
D. Lenat and R.V. Guha: Building Large Knowledge-Based Systems; Addison-Wesley (Reading MA), 372 pages.
[27]
Peter Lockemann et al.: "The Network as a Global Database: Challenges of Interoperability, Proactivity, Interactiveness, Legacy"; Proc. 23rd VLDB, Athens, Greece, Morgan Kaufmann, Aug. 1997.
[28]
Clifford Lynch: "Searching the Internet"; in [36].
[29]
David Mark et al.: "Geographic Information Science: Critical Issues in an Emerging Cross-Disciplinary Research Domain"; NCGIA, Feb. 1999, http://www.geog.buffalo.edu/ncgia/workshopreport.html.
[30]
H.E. McEwen (ed.): Management of Data Elements in Information Processing; NTIS, U.S. Dept. of Commerce pub. 74-10700, 1974.
[31]
Prasenjit Mitra, Gio Wiederhold, and Martin Kersten: "A Graph-oriented Model for Articulation of Ontology Interdependencies"; in Zaniolo, Lockemann, Scholl and Grust (eds.): Advances in Database Technology, EDBT 2000, Springer Verlag LNCS Vol. 1777, March 2000, pp. 86-100.
[32]
Moving Picture Experts Group: Proposed Standard for Video Metadata, MPEG-7; www.cselt.it/mpeg, 2000.
[33]
D. Ponceleon, S. Srinivasan, A. Amir, D. Petkovic, D. Diklic: "Key to Effective Video Retrieval: Effective Cataloguing and Browsing"; Proc. of ACM Multimedia '98 Conference, September 1998.
[34]
Paul Resnick: "Filtering Information on the Internet"; in [36].
[35]
N.F. Noy and C.D. Hafner: "The State of the Art in Ontology Design"; AI Magazine, 1997, Vol. 18 No. 3, pp. 53-74.
[36]
Scientific American Editors: The Internet: Fulfilling the Promise; Scientific American, March 1997.
[37]
C.E. Shannon and W. Weaver: The Mathematical Theory of Communication; 1948, reprinted by the University of Illinois Press, 1962.
[38]
Richard T. Snodgrass (editor): The TSQL2 Temporal Query Language; Kluwer Academic Publishers, 1995.
[39]
Gary Stix: "Finding Pictures"; in [36].
[40]
James Z. Wang, Gio Wiederhold, and Jia Li: "Wavelet-based Progressive Transmission and Security Filtering for Medical Image Distribution"; in Stephen Wong (ed.): Medical Image Databases; Kluwer publishers, 1998, pp. 303-324.
[41]
Gio Wiederhold: "Mediators in the Architecture of Future Information Systems"; IEEE Computer, March 1992, pp. 38-49.
[42]
Gio Wiederhold, Sushil Jajodia, and Witold Litwin: "Integrating Temporal Data in a Heterogeneous Environment"; in Tansel, Clifford, Gadia, Jajodia, Segev, Snodgrass (eds.): Temporal Databases: Theory, Design and Implementation; Benjamin/Cummings Publishing, 1993, pp. 563-579.
[43]
Gio Wiederhold: "Customer Models for Effective Presentation of Information"; Position Paper, Flanagan, Huang, Jones, Kerf (eds): Human-Centered Systems: Information, Interactivity, and Intelligence, National Science Foundation, July 1997, pp. 218-221.
[44]
Gio Wiederhold and Michael Genesereth: "The Conceptual Basis for Mediation Services"; IEEE Expert, Intelligent Systems and their Applications, Vol. 12 No. 5, Sep-Oct. 1997.
[45]
Gio Wiederhold: "Weaving Data into Information"; Database Programming and Design; Freeman pubs, Sept. 1998.
[46]
Gio Wiederhold: Trends in Information Technology; report to JETRO.MITI, currently available as http://www-db.stanford.edu/pub/gio/1999/miti.htm.

Cited By

  • (2001) Evolution support in large-scale interoperable systems; Proceedings of the 12th Australasian Database Conference, 10.5555/545538.545558, pp. 161-168. Online publication date: 29-Jan-2001.


Information

Published In

BNCOD 17: Proceedings of the 17th British National Conference on Databases: Advances in Databases
July 2000
224 pages
ISBN:3540677437

Publisher

Springer-Verlag

Berlin, Heidelberg


Qualifiers

  • Article
