Article
DOI:10.1145/1102351.1102483

2D Conditional Random Fields for Web information extraction

Published: 07 August 2005

Abstract

The Web contains an abundance of useful semistructured information about real-world objects, and our empirical study shows that Web information about objects of the same type exhibits strong sequence characteristics across different Web sites. Conditional Random Fields (CRFs) are the state-of-the-art approach for exploiting such sequence characteristics to improve labeling. However, because the information on a Web page is laid out in two dimensions, existing linear-chain CRFs are limited for Web information extraction. To better incorporate two-dimensional neighborhood interactions, this paper presents a two-dimensional CRF model that automatically extracts object information from the Web. We empirically compare the proposed model with existing linear-chain CRF models on product information extraction, and the results show the effectiveness of our model.

Published In

ICML '05: Proceedings of the 22nd international conference on Machine learning
August 2005
1113 pages
ISBN:1595931805
DOI:10.1145/1102351

Publisher

Association for Computing Machinery

New York, NY, United States
