Abstract
Many databases have become Web-accessible through form-based search interfaces (i.e., search forms) that allow users to specify complex and precise queries to access the underlying databases. In general, such a Web search interface can be viewed as having an interface schema with multiple attributes and rich semantic/meta information; however, the schema is not formally defined on the search interface itself. Many Web applications, such as Web database integration and deep Web crawling, require these schemas to be constructed. In this paper, we introduce a schema model for complex search interfaces, and present a tool (WISE-iExtractor) for automatically extracting and deriving all the information needed to construct the schemas. Our experimental results on real search interfaces indicate that this tool is highly effective.
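As a rough illustration of the raw material an interface schema is built from, the sketch below collects the named controls of an HTML search form using Python's standard html.parser module. It is not the WISE-iExtractor method, which the abstract does not describe in detail. It only gathers control names, types, and select options, and does not attempt label association, attribute grouping, or any semantic/meta information. The names FormElement, FormFieldCollector, and collect_form_elements are hypothetical and introduced here for illustration only.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class FormElement:
    """One named control on a search form (hypothetical structure)."""
    name: str
    kind: str  # e.g. "text", "checkbox", "select"
    options: list = field(default_factory=list)  # option values, <select> only


class FormFieldCollector(HTMLParser):
    """Collects named <input> and <select> controls in document order."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self._current_select = None  # the <select> being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and attrs.get("name"):
            # Inputs without an explicit type default to "text" in HTML.
            self.elements.append(
                FormElement(attrs["name"], attrs.get("type", "text"))
            )
        elif tag == "select" and attrs.get("name"):
            self._current_select = FormElement(attrs["name"], "select")
            self.elements.append(self._current_select)
        elif tag == "option" and self._current_select is not None:
            self._current_select.options.append(attrs.get("value", ""))

    def handle_endtag(self, tag):
        if tag == "select":
            self._current_select = None


def collect_form_elements(html):
    """Return the named form controls found in an HTML string."""
    parser = FormFieldCollector()
    parser.feed(html)
    parser.close()
    return parser.elements
```

A schema extractor in the sense of the abstract would have to go well beyond this, for example by relating visible labels to controls and grouping several controls into one logical attribute; the sketch only shows the starting point.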
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
He, H., Meng, W., Yu, C., Wu, Z. (2005). Constructing Interface Schemas for Search Interfaces of Web Databases. In: Ngu, A.H.H., Kitsuregawa, M., Neuhold, E.J., Chung, JY., Sheng, Q.Z. (eds) Web Information Systems Engineering – WISE 2005. WISE 2005. Lecture Notes in Computer Science, vol 3806. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11581062_3
DOI: https://doi.org/10.1007/11581062_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30017-5
Online ISBN: 978-3-540-32286-3
eBook Packages: Computer Science, Computer Science (R0)