Purple SOX extraction management system

Published: 20 March 2009

Abstract

We describe the Purple SOX (PSOX) EMS, a prototype Extraction Management System currently being built at Yahoo!. The goal of the PSOX EMS is to manage a large number of sophisticated extraction pipelines across different application domains, at web scale and with minimal human involvement. Three key value propositions are described: extensibility, the ability to swap extraction operators in and out; explainability, the ability to track the provenance of extraction results; and social feedback support, the facility for gathering and reconciling feedback from multiple, potentially conflicting sources.
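
The abstract gives no implementation details, so the following is only a toy sketch of how the three properties could fit together, not the PSOX design. Every name in it (Fact, Pipeline, register, run, reconcile) is hypothetical, and the majority vote used to reconcile feedback is a stand-in technique chosen for brevity, not one the paper is known to use.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# All names below are hypothetical and for illustration only.

@dataclass(frozen=True)
class Fact:
    """An extracted value plus the names of the operators that produced it."""
    value: str
    provenance: Tuple[str, ...] = ()

Operator = Callable[[str], str]

@dataclass
class Pipeline:
    operators: Dict[str, Operator] = field(default_factory=dict)
    order: List[str] = field(default_factory=list)

    def register(self, name: str, op: Operator) -> None:
        """Extensibility: add an operator, or swap one in under an existing name."""
        if name not in self.operators:
            self.order.append(name)
        self.operators[name] = op  # a swap keeps the operator's position

    def run(self, text: str) -> Fact:
        """Explainability: record each operator name as it is applied."""
        fact = Fact(text)
        for name in self.order:
            fact = Fact(self.operators[name](fact.value), fact.provenance + (name,))
        return fact

def reconcile(votes: List[str]) -> str:
    """Social feedback: resolve conflicting values by simple majority vote
    (an assumed stand-in for whatever reconciliation a real system uses)."""
    return Counter(votes).most_common(1)[0][0]

pipeline = Pipeline()
pipeline.register("strip", str.strip)
pipeline.register("normalize", str.lower)
first = pipeline.run("  Yahoo! ")

pipeline.register("normalize", str.upper)  # swap the operator in place
second = pipeline.run("  Yahoo! ")

winner = reconcile(["Sunnyvale", "Sunnyvale", "Santa Clara"])
```

In this sketch, swapping the operator registered as "normalize" changes the extracted value while the provenance trail keeps the same operator names, which is the kind of separation between pipeline structure and operator implementation that the extensibility claim suggests.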



Published In

ACM SIGMOD Record, Volume 37, Issue 4
December 2008
116 pages
ISSN: 0163-5808
DOI: 10.1145/1519103

Publisher

Association for Computing Machinery

New York, NY, United States



Qualifiers

  • Research-article

