2D Conditional Random Fields for Web information extraction

Authors:

Wei-Ying Ma

ICML '05: Proceedings of the 22nd international conference on Machine learning

Pages 1044 - 1051

https://doi.org/10.1145/1102351.1102483

Published: 07 August 2005

Abstract

The Web contains an abundance of useful semi-structured information about real-world objects, and our empirical study shows that Web information about objects of the same type exhibits strong sequence characteristics across different Web sites. Conditional Random Fields (CRFs) are the state-of-the-art approach for exploiting such sequence characteristics to improve labeling. However, because the information on a Web page is laid out in two dimensions, linear-chain CRFs are limited for Web information extraction. To better incorporate two-dimensional neighborhood interactions, this paper presents a two-dimensional CRF model that automatically extracts object information from the Web. We empirically compare the proposed model with existing linear-chain CRF models on product information extraction, and the results show the effectiveness of our model.
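The contrast the abstract draws between linear-chain and two-dimensional neighborhood interactions can be illustrated with a small sketch. This is not the paper's model or inference procedure, which are not given on this page; it only assumes a generic grid-structured CRF in which each cell's label interacts with its right and lower neighbors. The function names, the `node_score` and `edge_score` callbacks, and the brute-force normalization are all hypothetical choices made for illustration.

```python
from itertools import product
import math


def grid_score(labels, node_score, edge_score):
    """Unnormalized log-score of a labeling on a 2D grid.

    labels: list of rows, each a list of integer label ids.
    node_score(i, j, y): hypothetical per-cell score for label y at (i, j).
    edge_score(y1, y2, direction): hypothetical score for two neighboring
        labels; direction is "h" (right neighbor) or "v" (neighbor below).
    """
    rows, cols = len(labels), len(labels[0])
    total = 0.0
    for i in range(rows):
        for j in range(cols):
            total += node_score(i, j, labels[i][j])
            if j + 1 < cols:  # horizontal interaction (all a linear chain has)
                total += edge_score(labels[i][j], labels[i][j + 1], "h")
            if i + 1 < rows:  # vertical interaction (the added second dimension)
                total += edge_score(labels[i][j], labels[i + 1][j], "v")
    return total


def grid_probability(labels, num_labels, node_score, edge_score):
    """p(labels | x) = exp(score) / Z, with Z computed by brute force.

    Enumerating every labeling is only feasible for tiny grids; it is used
    here to keep the sketch self-contained, not as a practical method.
    """
    rows, cols = len(labels), len(labels[0])
    z = 0.0
    for flat in product(range(num_labels), repeat=rows * cols):
        candidate = [list(flat[r * cols:(r + 1) * cols]) for r in range(rows)]
        z += math.exp(grid_score(candidate, node_score, edge_score))
    return math.exp(grid_score(labels, node_score, edge_score)) / z
```

A grid with a single row reduces to a linear chain, since only the horizontal terms remain; with more than one row, the vertical terms let labels in adjacent rows influence each other.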

Published In

ICML '05: Proceedings of the 22nd international conference on Machine learning

August 2005

1113 pages

ISBN:1595931805

DOI:10.1145/1102351

General Chair: Saso Dzeroski (Jozef Stefan Institute, Slovenia)
Program Chairs: Luc De Raedt, Stefan Wrobel

Copyright © 2005 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States
