Streamlining Tax and Administrative Document Management with AI-Powered Intelligent Document Management System
Figure 1. Global workflow of the DMS architecture.
Figure 2. Ontology overview.
Figure 3. Workflow: document classification and information extraction.
Figure 4. Information extraction.
Figure 5. Architecture overview of the mapping process.
Figure 6. Overview of mapping steps.
Figure 7. Interaction between the Document Classification/Information Extraction module and the UUID generator.
Figure 8. Interaction between the information and extraction merger component and RabbitMQ/Kafka queues/topics to handle events and data.
Figure 9. Third experiment: confusion matrix computed on the dataset comprising real documents.
Figure 10. Extraction of RDF data compliance with SHACL rules.
Figure 11. Inferences regarding multi-labelling rules in TopBraid.
Figure 12. Inferences regarding direct rules in TopBraid.
Figure 13. Inferences regarding inverse rules in TopBraid.
Figure 14. Results of applying the profile classification and document labelling rules to the mapped data.
Figure 15. Individuals belonging to the same tax household.
Figure 16. Documents that taxpayer A who belongs to a tax household has to deliver.
Figure 17. Error message addressed to taxpayer for missed family allowances.
Figure 18. The taxpayer who has two profiles during a tax year.
Figure 19. Types of documents to be delivered based on the status held by the taxpayer.
Abstract
1. Introduction
2. Literature Review
3. Materials and Methods
3.1. Global Workflow and Architecture
- Within the component titled Doc. Categorisation/Information Extraction, documents (native PDFs or scanned documents) are processed by the module described in Section 3.2.2. The module (i) classifies the type of document and (ii) extracts the information.
- Continuing from right to left, the two boxes labelled JSON File (doc) and JSON File (Profiles) demonstrate the generation of JSON files for each document with extracted information regarding document keywords and profiles (Section 3.2.3).
- Arrows labelled RDF mapping indicate that the JSON files are processed through RDF mapping and stored into the RDF data triple store (Section 3.2.4).
- Arrows connecting the RDF triple store component with the Ontology and Rules components indicate that the RDF data are integrated into the ontology and semantic rules.
- The Ontology component covers the domain of fiscal and administrative document management, describing fiscal budget, individuals, documents, profiles, and tax categories. The ontology is constructed from Swiss tax return data based on actual legal documents and required documents for tax return completion, as shown through the arrow connecting the tax declaration and legal documents to the Ontology component (Section 3.2.1).
- The Rules component includes a series of rules, including those related to document validation, profile updates, the identification of missing documents, and document labelling. The rules are derived from the documents provided by Addmin concerning the legal tax rules in the Canton of Geneva and Switzerland, as indicated by the arrow pointing left with the label Rules modelling. These rules are applied by the reasoning engine to RDF data (Section 3.2.5).
- The reasoner updates JSON profiles based on new information (e.g., a new child added to the household) and identifies missing documents based on existing profiles (e.g., missing health fees for a household member), as shown by the arrows labelled update and identifies at the bottom right of the figure.
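
The JSON-to-RDF step of the workflow above can be pictured with a minimal Python sketch. It is not the RML mapping used by the system, only a plain-Python stand-in that shows the idea of turning an extracted-document record into triples. The namespace `http://example.org/addmin#`, the record layout (`type`, `id`, `fields`), and the helper `json_to_triples` are hypothetical and chosen purely for illustration.

```python
import json

# Hypothetical namespace; the actual ontology IRI is not given in this text.
NS = "http://example.org/addmin#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def json_to_triples(doc_json: str) -> list[str]:
    """Turn one extracted-document JSON record into N-Triples lines."""
    record = json.loads(doc_json)
    subject = f"<{NS}{record['type']}/{record['id']}>"
    # First triple: the document is an instance of its class.
    triples = [f"{subject} <{RDF_TYPE}> <{NS}{record['type']}> ."]
    for key, value in record["fields"].items():
        # json.dumps yields a double-quoted, escaped string usable as a plain literal.
        literal = json.dumps(str(value))
        triples.append(f"{subject} <{NS}{key}> {literal} .")
    return triples


# Illustrative record; the field name and value are made up.
example = '{"type": "SalaryStatement", "id": "doc-1", "fields": {"grossSalary": 85000}}'
for line in json_to_triples(example):
    print(line)
```

In the described architecture this transformation is declarative (mapping rules) rather than procedural, which keeps the mapping separate from the application code.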
3.2. Detailed Insight into DMS Architecture Modules
3.2.1. Ontology Development
- We defined the domain and purpose of the ontology, which allowed us to outline the context in which the ontology will be used and the main functions it must perform (the management of fiscal and administrative documents in our case). We clarified the main objectives of the ontology as well as its content (including document classification, user profile definition, and organisation of information regarding fiscal items and changes in marital status or domicile).
- Subsequently, we conducted an in-depth analysis of requirements and available data sources. We examined tax declaration forms, instructions for their completion issued by competent authorities, and other related documents, such as payroll statements and health insurance communications.
- We then established the classes and class hierarchy, which consisted of defining the main classes of the ontology including documents, user profiles, fiscal elements, and changes in marital status or domicile.
- We also defined class properties, which represent the attributes or characteristics of documents and user profiles. We defined keywords or attributes that documents can have in relation to the data we want to extract from them. These data are useful for classifying or labelling the document or for creating a profile of a person or a family. For example, in the case of tax returns, by defining attributes such as total income, deductions, and marital status, documents can be classified according to these attributes. The definition of attributes or keywords facilitates the structuring and organisation of data.
- Once the conceptual design of the ontology was completed, we proceeded with its technical implementation using Protégé (version 5.5.0-beta-9).
- The ontology was validated by tax experts to ensure its validity and consistency over time, and corrections or adjustments were made based on their feedback. The final validation was preceded by interim monitoring phases, during which the team reviewed all the documents used to define the classes and clarified any doubts about the interpretation of legal requirements that could affect the preparation of tax documents. This process ensures that the documents used in the system are adequate and comply with regulatory requirements. The final validation was carried out by the project partner together with experts in the field of tax return preparation.
amount some (HealthInsurancePremium or ExpensesSicknessNotReimbursed)
3.2.2. Document Classification and Information Extraction Module
3.2.3. Document Schema and Profiles Generation
3.2.4. Data Mapping to RDF
3.2.5. Reasoning Engine Development
3.3. An Overview of Component Implementation
- Document classification and information extraction: This module consists of sub-projects that perform (i) document classification on image documents, (ii) information extraction from image documents using templates, and (iii) generation of image documents for a document category/class (data augmentation).
- Mapping rules: This module defines custom mapping rules that are used to transform data from JSON to RDF. The rules are written in RML and processed by an RMLStreamer processor. The module requires a running Apache Kafka instance, because the input data sources used by the mapping rules are Kafka topics. The rules are available at https://gitlab.unige.ch/addmin/rml-mappings (accessed on 14 July 2024).
- Information and Extraction Merger: This is a Python 3.9.1 module that performs a merge on the data derived from the Information and Extraction module. It is released as a Docker image. It requires a connection to a RabbitMQ and an Apache Kafka instance to work.
- Information and Extraction Cleanser: This is a Flink application used to clean, and merge when needed, the data derived from the Information and Extraction module. The application is released as a JAR that can be downloaded at https://gitlab.unige.ch/addmin/ie-cleanser/-/releases (accessed on 14 July 2024). This component must be deployed on an Apache Flink cluster.
- Universally Unique Identifier (UUID) Generator: This provides a set of REST APIs that generate a UUID. The component provides two entry points that return v1 and v4 UUIDs, respectively. It is released as a Docker image.
- Profiling: This module contains the fiscal profiles and the minimum information to be extracted for each of them. The information is represented through the JSON schema, available at https://gitlab.unige.ch/addmin/profiles (accessed on 14 July 2024).
- Constraints, Rules, and Validation: This module contains the rules written using SHACL, which is an official W3C standard language for describing a set of conditions that data—specifically, data in the knowledge graphs—should satisfy. SHACL is supported by TopBraid, developed by TopQuadrant, Inc. We used TopBraid Composer Maestro Edition version 7.1.1 in our current research project. We also used SHACL to validate the graphs. A SHACL validation engine takes as input a data graph to be validated and a shape graph containing SHACL shape declarations and produces a validation report, also expressed as a graph. The result of SHACL validation describes whether the data graph matches the shape graph and, if it does not, describes any mismatch. In this way, SHACL can be used to validate that data conform to the desired requirements.
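
As an illustration of the two UUID versions returned by the UUID Generator component, the following minimal sketch uses Python's standard `uuid` module. It leaves out the REST layer entirely, and the function names `new_uuid_v1` and `new_uuid_v4` are hypothetical, not the component's actual entry points.

```python
import uuid


def new_uuid_v1() -> str:
    """Version 1 UUID: derived from a timestamp and a node identifier."""
    return str(uuid.uuid1())


def new_uuid_v4() -> str:
    """Version 4 UUID: derived from random bits."""
    return str(uuid.uuid4())


print(new_uuid_v1())
print(new_uuid_v4())
```

A version 1 identifier encodes its creation time, whereas a version 4 identifier carries no such information, which may matter when identifiers are attached to personal documents.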
4. Results
4.1. Module Results
4.1.1. Ontology Outcome
4.1.2. Document Classification and Information Extraction Outcomes
4.1.3. Document Schema and Profile Generation Outcomes
Listing 1. The JSON schema corresponding to the Third-Pillar Attestation.
Listing 2. An instantiation of the JSON schema of the Third-Pillar Attestation.
4.1.4. Data Mapping and Reasoning Engine Outcome
- (i) Multi-labelling rules: Figure 11 shows that the reasoner assigned tags to the document instances that were integrated. The instances of health insurance (such as AAssuranceMaladie/12.3.204) were classified under the categories tax (Impôt) and insurance (Assurance), while the salary certificates (such as CSalaire/13.3.234) were classified as tax (Impôt) and income (Revenu). The system generated a total of 14 such inferences.
- (ii) Direct rules: Figure 12 displays the inferences generated by the system regarding the direct relationships between documents and profiles. For example, the system knows that Mrs. Zola Giovanna has submitted a salary statement and, from the type of document submitted, infers that she is an employee. The system thus uses the documents she uploaded to infer her profile, rather than relying on explicit user declarations. This profile information is useful for preparing her tax return in several ways. As an employee, she may be entitled to tax deductions for work-related expenses incurred during the tax year, such as transportation costs to get to work or the purchase of materials necessary to perform her job, and the system may suggest that she include these expenses as deductions in her tax return. Being an employee may also mean that she received income from employment during the tax year. This income must be declared correctly on her tax return, and the system can help her calculate the exact amount to include. The reasoner generated a total of two such inferences.
- (iii) Inverse rules: Figure 13 shows the inferences related to the inverse relationships between documents and profiles. For example, Mr. Ladoumegue Jules has declared himself an employee. The system will therefore ask him to upload a salary certificate or a certificate of training, development, or conversion, depending on what is still missing from his profile. Furthermore, the DMS requests that he submit his health insurance document, not as an employee but as an individual, in compliance with the rules that require every person to have insurance coverage. The system generated a total of 12 such inferences.
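
The three rule families above can be sketched in plain Python. This is only a stand-in for the SHACL rules executed in TopBraid: the dictionaries restate the examples given in the text, and all identifiers (`DOCUMENT_TAGS`, `infer_profiles`, `missing_documents`, and the document and profile names) are hypothetical.

```python
# Multi-labelling rules: a document type carries several tags.
DOCUMENT_TAGS = {
    "HealthInsurance": {"Impôt", "Assurance"},
    "SalaryCertificate": {"Impôt", "Revenu"},
}

# Direct rules: a submitted document type implies a profile.
DOCUMENT_TO_PROFILE = {
    "HealthInsurance": "Person",
    "SalaryCertificate": "Employee",
    "TrainingCertificate": "Employee",
}

# Inverse rules: a declared profile implies documents to deliver.
PROFILE_TO_DOCUMENTS = {
    "Person": {"HealthInsurance"},
    "Employee": {"SalaryCertificate", "TrainingCertificate"},
}


def infer_profiles(submitted: set[str]) -> set[str]:
    """Direct rules: derive profiles from the submitted document types."""
    return {DOCUMENT_TO_PROFILE[d] for d in submitted if d in DOCUMENT_TO_PROFILE}


def missing_documents(profiles: set[str], submitted: set[str]) -> set[str]:
    """Inverse rules: documents required by the profiles but not yet submitted."""
    required: set[str] = set()
    for profile in profiles:
        required |= PROFILE_TO_DOCUMENTS.get(profile, set())
    return required - submitted


# A user who uploaded a salary certificate is inferred to be an employee.
print(infer_profiles({"SalaryCertificate"}))
# A declared employee (who is also a person) still owes the remaining documents.
print(missing_documents({"Employee", "Person"}, {"SalaryCertificate"}))
```

Keeping the direct and inverse mappings as separate tables mirrors the distinction made in the text: one direction infers a profile from evidence, the other derives obligations from a declared profile.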
4.2. Use Cases
4.2.1. Use Case No. 1—A Household with Working Parents and Children
Listing 3. JSON data extracted from the salary statement.
Listing 4. An example of the information extracted from a salary certificate represented using RDF.
4.2.2. Use Case No. 2—Detection of Missing Documents
Listing 5. The initial dataset for use case no. 2.
4.2.3. Use Case No. 3—Profile Update Following Status Change
Listing 6. The initial dataset for use case no. 3.
Listing 7. An example of a pension statement represented in RDF.
5. Conclusions and Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
AI | Artificial Intelligence |
AVS | Assurance-vieillesse et survivants |
CNN | Convolutional Neural Network |
CNNs | Convolutional Neural Networks |
DMS | Document Management System |
DMSs | Document Management Systems |
LLMs | Large Language Models |
ML | Machine Learning |
NLP | Natural Language Processing |
OCR | Optical Character Recognition |
OWL | Web Ontology Language |
RDF | Resource Description Framework |
RML | RDF Mapping Language |
SPARQL | SPARQL Protocol and RDF Query Language |
UUID | Universally Unique Identifier |
Name | Description | Outcome |
---|---|---|
Section 3.2.1 | Development of an ontology for fiduciary, insurance and user profiles | → Representation of concepts of the fiduciary and insurance domains → Representation of concepts related to tax profiles |
Section 3.2.2 | Classification of documents into their respective categories. Extraction of relevant information using appropriate template based on the document category | → File containing extracted information |
Section 3.2.3 | Defining documents schema and tax profiles | → JSON schema for document schema and tax profiles |
Section 3.2.4 | Map extracted information to RDF | → Convert JSON file to RDF triples by leveraging the ontology vocabulary |
Section 3.2.5 | Definition of SHACL shapes and SHACL rules | → Document classification and recognition → Rules for multi-label classification → User profile classification rules |
N | DOCUMENT | FEATURES | CLASSIFICATION | TAG |
---|---|---|---|---|
1 | Tax return and/or ID (taxpayer number and declaration code) | Tax household | Tax return previous year | Tax, Family |
2 | Employee’s salary statement | Employee | Salary statement | Income, Tax |
3 | Third-pillar A certificate and/or Second-pillar buy-back | Third-pillar | Third-pillar contributions | Insurance, Miscellaneous Expenses, Tax |
4 | Bank accounts, shares, bonds, participation, cryptocurrencies, lottery winnings, etc. | Securities | Bank statements | Securities, Finance, Tax |
5 | Bank account maintenance fees | Stocks | Bank account vouchers | Stocks, Financial, Tax |
PROFILE | NAME |
---|---|
P1 | Single with property |
P2 | Single with property with children |
P3 | Single with a property with children and dependants |
P28 | Cohabitants owning a property without children |
P29 | Cohabiting with dependants |
P31 | Cohabitation with a self-employed person |
P32 | Cohabitation without property |
P33 | Cohabiting with children without property |
P34 | Living together without own property with children and other dependants |
P36 | Living together without a property with dependants |
P38 | Cohabiting with no children and no property |
P39 | Divorced with property |
P40 | Divorced with property with children |
P41 | Divorced with property with children and dependants |
P43 | Divorced with property and dependants |
P72 | Married with property and children |
P74 | Married with property with children and dependants |
P135 | Unmarried without property with children |
P136 | Unmarried with children and dependants |
P138 | Unmarried without property and with dependants |
TAG ID | TAG LABEL |
---|---|
1 | Tax |
2 | Expenses/Other Costs |
3 | Pension |
4 | Children |
5 | Family |
6 | Finances |
7 | Income |
8 | Formation |
9 | Real Estate |
10 | Medical Expenses |
11 | Insurance |
12 | Job |
13 | Securities |

DOCUMENT → USER PROFILE

Subject | Predicate | Object |
---|---|---|
User providing health insurance | is | Person |
User providing salary statement | is | Employee |
User delivering Training, Education, Retraining Document | is | Employee |
User providing Third-pillar pension | is | Pensioner |

USER PROFILE → DOCUMENT

Subject | Predicate | Object |
---|---|---|
Person | delivers | Health insurance |
Employee | delivers | Salary statement |
Employee | delivers | Education, Training, Retraining Document |
Retiree | delivers | Pension 3 Pillar |
Class Index | Class | Number of Documents (in the Test Split) | F1 Score | Accuracy | Precision | Recall |
---|---|---|---|---|---|---|
0 | InsuranceBenefitStatement_dataset | 514 | 100% | 100% | 100% | 100% |
1 | InsurancePremiums_dataset | 548 | 100% | 100% | 100% | 100% |
2 | 3thPillarContributionDeclaration_dataset | 248 | 100% | 100% | 100% | 100% |
3 | AVSIncome Certification_dataset | 17 | 53.96% | 99.04% | 36.95% | 100% |
4 | SalaryStatement_dataset | 29 | 0% | 99.04% | 0% | 0% |
5 | TaxDeclaration_dataset | 63 | 100% | 100% | 100% | 100% |
6 | BankClosingDocument_dataset | 70 | 100% | 100% | 100% | 100% |
7 | HealthInsurance_dataset | 70 | 100% | 100% | 100% | 100% |
8 | 2ndPillarPensionDocument_dataset | 49 | 100% | 100% | 100% | 100% |
9 | OTHER_CLASSES | 1415 | 100% | 100% | 100% | 100% |
Class Index | Class | Number of Documents (in the Test Split) | F1 Score | Accuracy | Precision | Recall |
---|---|---|---|---|---|---|
0 | InsuranceBenefitStatement_dataset | 4 | 100% | 100% | 100% | 100% |
1 | InsurancePremiums_dataset | 1 | 100% | 100% | 100% | 100% |
2 | 3thPillarContributionDeclaration_dataset | 3 | 0% | 88% | 0% | 0% |
3 | SalaryStatOrAVSIncomeCert_dataset | 10 | 0% | 60% | 0% | 0% |
4 | TaxDeclaration_dataset | 5 | 100% | 100% | 100% | 100% |
5 | BankClosingDocument_dataset | 2 | 18.18% | 64% | 11.11% | 50% |
6 | AHealthInsurance | 0 | - | - | - | - |
7 | 2ndPillarPensionDocument_dataset | 0 | - | - | - | - |
8 | OTHER_CLASSES | 0 | - | - | - | - |
Class Index | Class | Number of Documents (in the Test Split) | F1 Score | Accuracy | Precision | Recall |
---|---|---|---|---|---|---|
0 | InsuranceBenefitStatement_dataset | 4 | 100% | 100% | 100% | 100% |
1 | InsurancePremiums_dataset | 1 | 100% | 100% | 100% | 100% |
2 | 3thPillarContributionDeclaration_dataset | 3 | 50% | 92% | 100% | 33.33% |
3 | SalaryStatOrAVSIncomeCert_dataset | 10 | 94.73% | 96% | 100% | 90% |
4 | TaxDeclaration_dataset | 5 | 100% | 100% | 100% | 100% |
5 | BankClosingDocument_dataset | 2 | 66.66% | 92% | 50% | 100% |
6 | HealtInsurance_dataset | 0 | - | - | - | - |
7 | 2ndPillarPensionDocument_dataset | 0 | - | - | - | - |
8 | UNIGE_cards | 0 | - | - | - | - |
9 | NIST_TAX 1040_1 | 0 | - | - | - | - |
... | ... | ... | ... | ... | ... | ... |
28 | NIST_TAX_se_2 | 0 | - | - | - | - |
Experiment | F1 Score per Class | Accuracy per Class | Precision per Class | Recall per Class | Macro-F1 Score | Average F1 Score | Micro-F1 Score | Average Accuracy |
---|---|---|---|---|---|---|---|---|
First Experiment | ✓ | ✓ | ✓ | ✓ | 86.73% | 85.39% | 99.04% | 99.80% |
Second Experiment | ✓ | ✓ | ✓ | ✓ | 92.17% | 92.55% | 96.00% | 96.15% |
Third Experiment | ✓ | ✓ | ✓ | ✓ | 92.17% | 92.55% | 96.00% | 96.15% |
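
To make the per-class metrics in the tables above easier to check, the following sketch recomputes an F1 score from precision and recall (their harmonic mean) and takes the unweighted mean of the per-class F1 scores. The input numbers are copied from the first-experiment table; the function names `f1_score` and `unweighted_mean` are hypothetical helpers, not part of the authors' evaluation code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def unweighted_mean(values: list[float]) -> float:
    """Plain average, giving every class the same weight."""
    return sum(values) / len(values)


# AVS income certification class, first experiment: precision 36.95%, recall 100%.
avs_f1 = f1_score(0.3695, 1.0)

# Per-class F1 scores (in %) reported for the first experiment, classes 0 to 9.
first_experiment_f1 = [100, 100, 100, 53.96, 0, 100, 100, 100, 100, 100]
mean_f1 = unweighted_mean(first_experiment_f1)

print(f"{avs_f1:.4f}")
print(f"{mean_f1:.3f}")
```

The first value comes out at about 0.5396, in line with the 53.96% reported for that class, and the mean is 85.396, consistent with the 85.39% listed as the Average F1 Score of the first experiment.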
Class | No. Instances |
---|---|
AVS Contributions and Others | 2 |
Gross salary | 2 |
Individuals | 8 |
Insurance | 6 |
Salary Certificate | 2 |
TaxHousehold | 3 |
Evaluation Criteria | Description | Methods and Tools Used | Result |
---|---|---|---|
Data Compliance with Semantic Rules | Validation of RDF data against SHACL to ensure that the data conform to the desired requirements. | SHACL validation engine, SHACL shape statements | 33 non-conformities, such as missing employee entry/exit dates and incorrect AVS format. |
Effectiveness of Inference Rules | Evaluated rules to generate relevant and accurate inferences from data, including multi-labelling, direct and inverse rules. | Semantic reasoning engine in TopBraid | Inference including document classification, user profiling and document presentation requirements. |
Manual Evaluation of Reasoning Engine Results | Manual examination of reported errors and the correctness of the inferences to verify the accuracy and legitimacy of the semantic rules and RDF data. | Manual inspection and verification | Confirmed accuracy of compliance errors, correct classification of individuals and document requirements. |
Multi-Labelling Rules | Evaluate the system’s ability to assign multiple categories to document instances based on the embedded data. | Multi-labelling rules in TopBraid | 14 inferences, classifying documents such as health insurance and salary certificates. |
Direct Rules | Evaluate direct relationships between documents and user profiles, inferring profile information based on submitted documents. | Direct Rules in TopBraid | 2 inferences, such as classifying Mrs. Zola Giovanna as an employee. |
Inverse Rules | Evaluate inverse relations where the user’s profile information determines the requested documents. | Inverse Rules in TopBraid | 12 inferences, requesting documents based on the user’s declared status. |
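
The kind of non-conformity reported in the first row of the table above (missing employee entry/exit dates, incorrect AVS format) can be illustrated with a plain-Python check. This is not SHACL and not the authors' shapes, only a stand-in that returns a list of violations in the spirit of a validation report. The property names `entryDate`, `exitDate`, and `avsNumber` are hypothetical, and the pattern assumes AVS numbers are written as `756.XXXX.XXXX.XX` without verifying the check digit.

```python
import re

# Assumed AVS number layout: 756.XXXX.XXXX.XX (check digit not verified here).
AVS_PATTERN = re.compile(r"756\.\d{4}\.\d{4}\.\d{2}")
REQUIRED_FIELDS = ("entryDate", "exitDate")  # hypothetical property names


def validate(record: dict) -> list[str]:
    """Return violation messages; an empty list means the record conforms."""
    violations = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            violations.append(f"missing {field}")
    avs = record.get("avsNumber", "")
    if not AVS_PATTERN.fullmatch(avs):
        violations.append(f"invalid AVS format: {avs!r}")
    return violations


# Illustrative record with a missing exit date and an unformatted AVS number.
print(validate({"entryDate": "2023-01-01", "avsNumber": "7561234567897"}))
```

As with a SHACL validation report, the result describes each mismatch rather than returning only a pass/fail flag, so the user can be told exactly what to correct.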
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Di Marzo Serugendo, G.; Cappelli, M.A.; Falquet, G.; Métral, C.; Wade, A.; Ghadfi, S.; Cutting-Decelle, A.-F.; Caselli, A.; Cutting, G. Streamlining Tax and Administrative Document Management with AI-Powered Intelligent Document Management System. Information 2024, 15, 461. https://doi.org/10.3390/info15080461