A Learning Classifier-Based Approach to Aligning Data Items and Labels

Neil Anderson & Jun Hong

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7968)

Included in the following conference series:

British National Conference on Databases

Abstract

Web databases are now pervasive. Query result pages are dynamically generated from these databases in response to user-submitted queries. A query result page contains a number of data records, each of which consists of data items and their labels. In this paper, we focus on the data alignment problem, in which individual data items and labels from different data records on a query result page are aligned into separate columns, each representing a group of semantically similar data items or labels from each of these data records. We present a new approach to the data alignment problem, in which learning classifiers are trained using supervised learning to align data items and labels. Previous approaches to this problem have relied on heuristics and manually crafted rules, which are difficult to adapt to new page layouts and designs. In contrast, we are motivated to develop learning classifiers which can be easily adapted. We have implemented the proposed learning classifier-based approach in a software prototype, rAligner, and our experimental results have shown that the approach is highly effective.
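
The data alignment problem described in the abstract can be illustrated with a small sketch. This is not rAligner's implementation: the abstract does not specify the features or the classifier used, so the three hand-picked features, the nearest-centroid classifier, the toy training set, and all function names below are assumptions made purely for illustration. The sketch also covers data items only, not labels. It trains on a few labelled items and then places the items of unseen records into columns of semantically similar values.

```python
from collections import defaultdict

# Hypothetical labelled training items: (text, column label).
TRAINING = [
    ("The Hobbit", "title"),
    ("A Brief History of Time", "title"),
    ("$12.99", "price"),
    ("£7.50", "price"),
    ("1937", "year"),
    ("1988", "year"),
]


def features(text):
    """Map a data item to a small feature vector (assumed features)."""
    n = max(len(text), 1)
    return (
        sum(c.isdigit() for c in text) / n,            # fraction of digits
        1.0 if any(c in "$£€" for c in text) else 0.0,  # currency symbol present
        min(len(text.split()), 10) / 10.0,             # scaled word count
    )


def train(labelled_items):
    """Supervised step: compute one feature centroid per column label."""
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    counts = defaultdict(int)
    for text, label in labelled_items:
        for i, value in enumerate(features(text)):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: tuple(v / counts[label] for v in vec)
        for label, vec in sums.items()
    }


def classify(centroids, text):
    """Assign an item to the label whose centroid is nearest."""
    f = features(text)
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(f, centroids[label])),
    )


def align(centroids, records):
    """Place each record's items into columns named by predicted label."""
    columns = sorted(centroids)
    table = []
    for record in records:
        row = dict.fromkeys(columns, "")  # empty cell if no item predicted
        for item in record:
            row[classify(centroids, item)] = item
        table.append(row)
    return columns, table


if __name__ == "__main__":
    model = train(TRAINING)
    # Items appear in a different order in each record.
    cols, rows = align(model, [["Dune", "$9.99", "1965"],
                               ["1984", "Neuromancer", "$8.50"]])
    print(cols)
    for row in rows:
        print([row[c] for c in cols])
```

Because the column assignment is learned from labelled examples rather than encoded as layout rules, adapting such a sketch to a new page design would mean supplying new training items, which is the kind of adaptability the abstract argues for.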

Author information

Authors and Affiliations

School of Electronics, Electrical Engineering and Computer Science, Queen’s University Belfast, UK
Neil Anderson & Jun Hong

Editor information

Editors and Affiliations

Department of Computer Science, University of Oxford, Wolfson Building, Parks Road, OX1 3QD, Oxford, UK
Georg Gottlob
Department of Computer Science, Oxford University, Wolfson Building, Parks Road, OX1 3QD, Oxford, UK
Giovanni Grasso & Christian Schallhart
University of Oxford, Wolfson Building, Parks Road, OX1 3QD, Oxford, UK
Dan Olteanu

About this paper

Cite this paper

Anderson, N., Hong, J. (2013). A Learning Classifier-Based Approach to Aligning Data Items and Labels. In: Gottlob, G., Grasso, G., Olteanu, D., Schallhart, C. (eds) Big Data. BNCOD 2013. Lecture Notes in Computer Science, vol 7968. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39467-6_25

DOI: https://doi.org/10.1007/978-3-642-39467-6_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39466-9
Online ISBN: 978-3-642-39467-6
eBook Packages: Computer Science, Computer Science (R0)
