Abstract
Information in a deep Web source can be accessed through queries submitted on its query interface. Many Web applications, such as deep Web crawling and comparison shopping, need to interact with the query interfaces of deep Web sources. Analyzing the querying capability of a query interface is critical to supporting such interactions automatically and effectively. In this paper, we propose a querying capability model based on the concept of an atomic query, which is a valid query with a minimal attribute set. We also provide an approach to construct the querying capability model automatically by identifying the atomic queries of any given query interface. Our experimental results show that our algorithm achieves good accuracy.
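To make the definition of an atomic query concrete, the following is a minimal sketch that enumerates the minimal valid attribute sets of a toy interface by brute force. It illustrates the definition only and is not the identification approach of the paper, which the abstract does not describe. The function `atomic_queries`, the validity oracle `is_valid`, and the book-search attributes are all hypothetical names introduced for this example.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List


def atomic_queries(
    attributes: Iterable[str],
    is_valid: Callable[[FrozenSet[str]], bool],
) -> List[FrozenSet[str]]:
    """Return every valid attribute set that has no valid proper subset.

    `is_valid` is an assumed oracle that says whether the interface
    accepts a query using exactly the given attributes.
    """
    attrs = sorted(attributes)
    found: List[FrozenSet[str]] = []
    # Visit candidates from small to large, so every smaller atomic
    # query is already known when a larger candidate is examined.
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            candidate = frozenset(combo)
            # A candidate that strictly contains a known atomic query
            # cannot be minimal, so skip it.
            if any(atom < candidate for atom in found):
                continue
            if is_valid(candidate):
                found.append(candidate)
    return found


# Toy interface (assumption): a query is accepted if it fills in the
# title, or the author, or both the publisher and the year.
def toy_is_valid(attrs: FrozenSet[str]) -> bool:
    return (
        "title" in attrs
        or "author" in attrs
        or {"publisher", "year"} <= attrs
    )


toy_atoms = atomic_queries(
    ["title", "author", "publisher", "year"], toy_is_valid
)
```

In this toy setting the atomic queries are {author}, {title}, and {publisher, year}. The enumeration is exponential in the number of attributes, so it is only suitable for illustrating the concept on small interfaces.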
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shu, L., Meng, W., He, H., Yu, C. (2007). Querying Capability Modeling and Construction of Deep Web Sources. In: Benatallah, B., Casati, F., Georgakopoulos, D., Bartolini, C., Sadiq, W., Godart, C. (eds) Web Information Systems Engineering – WISE 2007. WISE 2007. Lecture Notes in Computer Science, vol 4831. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76993-4_2
DOI: https://doi.org/10.1007/978-3-540-76993-4_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-76992-7
Online ISBN: 978-3-540-76993-4
eBook Packages: Computer Science, Computer Science (R0)