Abstract
This paper presents a knowledge-based approach to effective document retrieval. The approach is based on a dual document model that consists of a document type hierarchy and a folder organization. A predicate-based document query language is proposed so that users can specify precisely both the search criteria and what they know about the documents to be retrieved. A guided search tool is developed as an intelligent, natural-language-oriented user interface that helps users formulate queries. Supported by an intelligent question generator, an inference engine, a question base, and a predicate-based query composer, the guided search collects the most important information known to the user in order to retrieve the documents that satisfy the user's particular interests. A knowledge-based query processing and search engine is devised as the core component of the approach. Algorithms are developed for the search engine to retrieve the documents that match the query effectively and efficiently.
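The abstract names a dual document model and a predicate-based query but does not give their concrete form. The following is only a minimal sketch of how such a model could be represented, not the paper's query language or search algorithms. Every name in it (`DocType`, `Folder`, `of_type`, `attr_equals`, `all_of`, `search`) is hypothetical, and the assumption that a folder admits documents through a membership predicate is made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class DocType:
    """Node in a hypothetical document type hierarchy."""
    name: str
    parent: Optional["DocType"] = None

    def is_a(self, type_name: str) -> bool:
        # Walk up the hierarchy: a subtype also counts as each of its ancestors.
        node = self
        while node is not None:
            if node.name == type_name:
                return True
            node = node.parent
        return False


@dataclass
class Document:
    doc_id: str
    doc_type: DocType
    attrs: Dict[str, str] = field(default_factory=dict)


# A predicate is any function that accepts or rejects a document.
Predicate = Callable[[Document], bool]


@dataclass
class Folder:
    """Hypothetical folder: holds the documents that satisfy its criterion."""
    name: str
    criterion: Predicate
    documents: List[Document] = field(default_factory=list)

    def file(self, doc: Document) -> bool:
        # Store the document only if it meets this folder's criterion.
        if self.criterion(doc):
            self.documents.append(doc)
            return True
        return False


def of_type(type_name: str) -> Predicate:
    return lambda doc: doc.doc_type.is_a(type_name)


def attr_equals(key: str, value: str) -> Predicate:
    return lambda doc: doc.attrs.get(key) == value


def all_of(*preds: Predicate) -> Predicate:
    # Conjunction of predicates.
    return lambda doc: all(p(doc) for p in preds)


def search(folders: List[Folder], query: Predicate) -> List[str]:
    """Return ids of documents in the given folders that satisfy the query."""
    seen = set()
    hits = []
    for folder in folders:
        for doc in folder.documents:
            if doc.doc_id not in seen and query(doc):
                seen.add(doc.doc_id)
                hits.append(doc.doc_id)
    return hits
```

In this sketch a query such as `all_of(of_type("Memo"), attr_equals("sender", "Smith"))` combines a condition on the type hierarchy with a condition on document content, and restricting `search` to particular folders narrows the set of candidates that must be examined.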
Cite this article
Sheng, F., Fan, X., Thomas, G. et al. A Knowledge-Based Approach to Effective Document Retrieval. Journal of Systems Integration 10, 411–436 (2001). https://doi.org/10.1023/A:1011262119636