research-article

Data service generation framework from heterogeneous printed forms using semantic link discovery

Published: 01 February 2018

Abstract

Printed forms carry rich information in business processes and daily life. However, the large number of heterogeneous printed forms that contain the same categories of information are difficult to manage and share, so much of the data in printed forms goes unused. Automatically integrating and sharing these data can markedly improve enterprise efficiency; the key problem is how to extract heterogeneous data from printed forms and integrate them for quick use. To solve this problem, we propose a framework that discovers semantic links in printed forms and generates data services for easy data management and rapid data sharing in enterprise systems. First, a multiple-OCR-based form recognition approach is proposed to make forms computer-readable. Next, forms are modeled as semi-structured data using structure-based semantic link discovery and refinement with massive data. Then, a linked data model is built by table matching to align data. Finally, data services are generated based on the linked data model. A series of experiments on printed resumes shows that the framework performs well in recognition rate, link discovery accuracy, data compression ratio and data resource accuracy. A prototype system is presented to illustrate the feasibility of the proposed framework.

  • A complete and feasible framework from printed forms to data services is proposed.
  • An automatic approach to form data extraction and structuring is presented.
  • A usable prototype system integrating heterogeneous printed resumes is implemented.

Published In

Future Generation Computer Systems  Volume 79, Issue P2
February 2018
310 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands

Author Tags

  1. Data service generation
  2. Form recognition
  3. Heterogeneous data integration
  4. Semantic data model
  5. Table matching

Qualifiers

  • Research-article
