A Knowledge-Based Approach to Effective Document Retrieval

Journal of Systems Integration

Abstract

This paper presents a knowledge-based approach to effective document retrieval. The approach is based on a dual document model that consists of a document type hierarchy and a folder organization. A predicate-based document query language is proposed so that users can precisely specify both the search criteria and their knowledge about the documents to be retrieved. A guided search tool is developed as an intelligent, natural-language-oriented user interface that assists users in formulating queries. Supported by an intelligent question generator, an inference engine, a question base, and a predicate-based query composer, the guided search collects the most important information known to the user and uses it to retrieve the documents that satisfy the user's particular interests. A knowledge-based query processing and search engine is devised as the core component of this approach. Algorithms are developed for the search engine to retrieve the documents that match the query effectively and efficiently.
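
As a rough illustration of the ideas named in the abstract, the following Python sketch models a document type hierarchy, a folder organization, and a query built from composable predicates. It is only a sketch under stated assumptions: every name in it (DocType, Folder, of_type, attr_equals, all_of, search) and all sample data are hypothetical, and it does not reproduce the paper's query language or search algorithms, which are not shown in this preview.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Optional


@dataclass
class DocType:
    """A node in a (hypothetical) document type hierarchy."""
    name: str
    parent: Optional["DocType"] = None

    def is_a(self, type_name: str) -> bool:
        # Walk up the hierarchy so a subtype also matches its ancestors.
        node: Optional[DocType] = self
        while node is not None:
            if node.name == type_name:
                return True
            node = node.parent
        return False


@dataclass
class Document:
    doc_id: str
    doc_type: DocType
    attrs: Dict[str, str]


@dataclass
class Folder:
    """A node in a (hypothetical) folder organization."""
    name: str
    documents: List[Document] = field(default_factory=list)
    subfolders: List["Folder"] = field(default_factory=list)

    def walk(self) -> Iterator[Document]:
        # Yield this folder's documents first, then those of its subfolders.
        yield from self.documents
        for sub in self.subfolders:
            yield from sub.walk()


Predicate = Callable[[Document], bool]


def of_type(type_name: str) -> Predicate:
    return lambda d: d.doc_type.is_a(type_name)


def attr_equals(key: str, value: str) -> Predicate:
    return lambda d: d.attrs.get(key) == value


def all_of(*preds: Predicate) -> Predicate:
    # Conjunction: the document must satisfy every predicate.
    return lambda d: all(p(d) for p in preds)


def search(folder: Folder, predicate: Predicate) -> List[str]:
    """Return ids of documents under `folder` that satisfy `predicate`."""
    return [d.doc_id for d in folder.walk() if predicate(d)]


# Illustrative sample data (not from the paper).
document = DocType("Document")
memo = DocType("Memo", parent=document)
meeting_memo = DocType("MeetingMemo", parent=memo)
letter = DocType("Letter", parent=document)

projects = Folder("Projects", documents=[
    Document("d2", meeting_memo, {"sender": "Fan"}),
    Document("d3", memo, {"sender": "Liu"}),
])
root = Folder("Root",
              documents=[Document("d1", letter, {"sender": "Ng"})],
              subfolders=[projects])

memos_from_fan = search(root, all_of(of_type("Memo"), attr_equals("sender", "Fan")))
```

In this sketch the type predicate consults the hierarchy while the search itself traverses the folder tree, so the two halves of the dual model stay independent: a query can be narrowed by choosing a more specific type, a deeper folder, or both.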

Cite this article

Sheng, F., Fan, X., Thomas, G. et al. A Knowledge-Based Approach to Effective Document Retrieval. Journal of Systems Integration 10, 411–436 (2001). https://doi.org/10.1023/A:1011262119636

  • DOI: https://doi.org/10.1023/A:1011262119636
