Abstract
We describe the design and implementation of the NETBOOK prototype system for collecting, structuring, and efficiently creating semantic vectors for concepts, noun phrases, and documents from a corpus of free full-text ebooks available on the World Wide Web. Concept maps, generated automatically from correlated index terms and extracted noun phrases, are used to build a conceptual index of individual pages. To ensure the scalability of our system, dimension reduction is performed using Random Projection [13]. Furthermore, we present a complete evaluation of the effectiveness of the NETBOOK system relative to Google Desktop [8].
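The abstract does not give implementation details, so the following is only a minimal sketch of dimension reduction by random projection, not the NETBOOK code. It assumes a dense Gaussian projection matrix, and the names `random_projection_matrix`, `project`, and `cosine` are hypothetical helpers introduced for illustration. The idea is that multiplying high-dimensional term vectors by a fixed random matrix yields much shorter vectors whose cosine similarities stay approximately the same.

```python
import math
import random


def random_projection_matrix(n_dims, k, seed=0):
    """Build an n_dims x k matrix with entries drawn from N(0, 1/k).

    Assumption: a dense Gaussian matrix; sparse ternary matrices are a
    common alternative but are not shown here.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(k)]
            for _ in range(n_dims)]


def project(vector, matrix):
    """Project a term vector of length n_dims down to k dimensions."""
    k = len(matrix[0])
    reduced = [0.0] * k
    for weight, row in zip(vector, matrix):
        if weight:  # skip zero entries; term vectors are mostly zeros
            for j in range(k):
                reduced[j] += weight * row[j]
    return reduced


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Toy data: two related 500-dimensional vectors reduced to 64 dimensions.
    rng = random.Random(42)
    doc_a = [rng.random() for _ in range(500)]
    doc_b = [x + 0.5 * rng.random() for x in doc_a]
    matrix = random_projection_matrix(500, 64, seed=7)
    print("cosine before:", cosine(doc_a, doc_b))
    print("cosine after: ", cosine(project(doc_a, matrix), project(doc_b, matrix)))
```

The same matrix must be reused for every vector in the corpus and for queries, since similarities are only comparable within one projected space. A larger `k` reduces the distortion at the cost of longer vectors.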
References
Belkin, N.J., Croft, W.B.: Retrieval techniques. Annual Review of Information Science and Technology 22, 109–145 (1987)
Briggs, G., Shamma, D., Cañas, A.J., Carff, R., Scargle, J., Novak, J.D.: Concept Maps Applied to Mars Exploration Public Outreach. In: Cañas, A.J., Novak, J.D., González, F. (eds.) Concept Maps: Theory, Methodology, Technology, Proceedings of the First International Conference on Concept Mapping. Universidad Pública de Navarra, Pamplona (2004)
Cañas, A.J., Hill, G., Carff, R., Suri, N., Lott, J., Eskridge, T., Gómez, G., Arroyo, M., Carvajal, R.: CmapTools: A Knowledge Modeling and Sharing Environment. In: Cañas, A.J., Novak, J.D., González, F.M. (eds.) Concept Maps: Theory, Methodology, Technology, Proceedings of the First International Conference on Concept Mapping. Universidad Pública de Navarra, Pamplona (2004)
Brand, M.: Incremental singular value decomposition of uncertain data with missing values. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2350, pp. 707–720. Springer, Heidelberg (2002)
Broder, A.Z.: On the resemblance and containment of documents. Compression and Complexity of Sequences (1997)
Deerwester, S., Dumais, S., Furnas, G., Landauer, T., Harshman, R.: Indexing by latent semantic analysis. Journal of the American Society for Information Science 41(6), 391–407 (1990)
Fox, E.A.: Lexical relations: Enhancing effectiveness of IR systems. ACM SIGIR Forum 15(3), 5–36 (Winter 1980)
Leake, D., Maguitman, A., Reichherzer, T., Cañas, A., Carvalho, M., Arguedas, M., Brenes, S., Eskridge, T.: Aiding knowledge capture by searching for extensions of knowledge models. In: Proceedings of KCAP-2003. ACM Press, St. Augustine (2003)
Sahlgren, M.: The Word-Space Model: Using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces. PhD thesis, Department of Linguistics, Stockholm University (2006)
Salton, G.: The SMART system 1961-1976: Experiments in dynamic document processing. In: Encyclopedia of Library and Information Science, pp. 1–36 (1980)
Van Rijsbergen, C.J.: The Geometry of Information Retrieval. Cambridge University Press, Cambridge (2004)
Widdows, D., Ferraro, K.: Semantic Vectors: A Scalable Open Source Package and Online Technology Management Application, Google Code (2009)
Schütze, H.: Automatic word sense discrimination. Computational Linguistics 24(1), 97–124 (1998)
Spink, A., Wolfram, D., Jansen, B.J., Saracevic, T.: Searching the Web: The Public and their Queries. Journal of the American Society for Information Science and Technology 52(3), 226–234 (2001)
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Daoud, A.M. (2010). Effective Web and Desktop Retrieval with Enhanced Semantic Spaces. In: Kim, Th., Kim, HK., Khan, M.K., Kiumi, A., Fang, Wc., Ślęzak, D. (eds) Advances in Software Engineering. ASEA 2010. Communications in Computer and Information Science, vol 117. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17578-7_1
DOI: https://doi.org/10.1007/978-3-642-17578-7_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17577-0
Online ISBN: 978-3-642-17578-7
eBook Packages: Computer Science, Computer Science (R0)