An OODBMS-IRS Integration Based on a Statistical Corpus Extraction Method for Document Management

  • Conference paper
Database and Expert Systems Applications (DEXA 1999)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 1677))

Abstract

Maintenance cost is a critical issue in integrating database systems and information retrieval systems (IRSs). For a robust integration of search engines, a signature file filter can effectively eliminate this maintenance cost and offers a more natural fit between database and text retrieval systems. Building the merged database and signature-based text retrieval system on an object-oriented database management system (OODBMS) extends its usability and brings complementary advantages to both databases and IRSs. In this paper, we present a new approach to integrating OODBMSs and IRSs that preserves flexibility and avoids the overhead of a mapping process by encapsulating the documents and the signature-based IR methods in storable objects that are kept in the database. In addition, we develop a novel signature file approach based on a statistical corpus extraction technique, which can effectively reduce the false drop probability for text retrieval from the underlying document database.
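
As a rough illustration of the general signature file idea mentioned in the abstract, the sketch below uses textbook superimposed coding: each word sets a few bits in a fixed-width signature, a document signature is the bitwise OR of its word signatures, and a query word can only occur in a document whose signature covers all of the word's bits. This is not the statistical corpus extraction method of the paper. The names `word_signature`, `document_signature`, `may_contain`, and `StoredDocument`, as well as the parameters `SIG_BITS` and `BITS_PER_WORD`, are assumptions made for illustration only.

```python
import hashlib

SIG_BITS = 64       # assumed signature width in bits
BITS_PER_WORD = 3   # assumed number of hash positions per word


def word_signature(word, sig_bits=SIG_BITS, k=BITS_PER_WORD):
    """Set up to k hash-chosen bits for one word (case-insensitive)."""
    sig = 0
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{word.lower()}".encode("utf-8")).digest()
        pos = int.from_bytes(digest[:4], "big") % sig_bits
        sig |= 1 << pos
    return sig


def document_signature(text):
    """Superimpose (bitwise OR) the signatures of all words in the text."""
    sig = 0
    for word in text.split():
        sig |= word_signature(word)
    return sig


def may_contain(doc_sig, query_word):
    """Signature filter: False means the word is definitely absent.

    True means the word may be present; a True for an absent word
    is a false drop.
    """
    q = word_signature(query_word)
    return (doc_sig & q) == q


class StoredDocument:
    """Hypothetical storable object bundling a text with its signature."""

    def __init__(self, text):
        self.text = text
        self.signature = document_signature(text)

    def matches(self, word):
        # Cheap signature test first, then verify against the text
        # so that false drops are discarded.
        if not may_contain(self.signature, word):
            return False
        return word.lower() in (w.lower() for w in self.text.split())


doc = StoredDocument("signature files filter text retrieval")
print(doc.matches("retrieval"), doc.matches("database"))
```

Because the signature is computed when the object is created and travels with the document, the filter needs no separate index to keep in sync, which is one plausible reading of how encapsulation avoids a mapping step. The filter never misses a word that is present, but it can pass a document that lacks the word, so the verification step in `matches` stays necessary.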

Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lee, CH., Chien, LF. (1999). An OODBMS-IRS Integration Based on a Statistical Corpus Extraction Method for Document Management. In: Bench-Capon, T.J., Soda, G., Tjoa, A.M. (eds) Database and Expert Systems Applications. DEXA 1999. Lecture Notes in Computer Science, vol 1677. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48309-8_5

  • DOI: https://doi.org/10.1007/3-540-48309-8_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-66448-2

  • Online ISBN: 978-3-540-48309-0

  • eBook Packages: Springer Book Archive
