Tarride et al., 2023 - Google Patents
Large-scale genealogical information extraction from handwritten Quebec parish recordsTarride et al., 2023
View PDF- Document ID
- 3087314192841217172
- Author
- Tarride S
- Maarand M
- Boillet M
- McGrath J
- Capel E
- Vézina H
- Kermorvant C
- Publication year
- Publication venue
- International Journal on Document Analysis and Recognition (IJDAR)
External Links
Snippet
This paper presents a complete workflow designed for extracting information from Quebec handwritten parish registers. The acts in these documents contain individual and family information highly valuable for genetic, demographic and social studies of the Quebec …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
- G06F17/30657—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F17/30289—Database design, administration or maintenance
- G06F17/30303—Improving data quality; Data cleansing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2809—Data driven translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/24—Editing, e.g. insert/delete
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30613—Indexing
- G06F17/30619—Indexing indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30908—Information retrieval; Database structures therefor; File system structures therefor of semistructured data, the undelying structure being taken into account, e.g. mark-up language structure data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30861—Retrieval from the Internet, e.g. browsers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/72—Methods or arrangements for recognition using electronic means using context analysis based on the provisionally recognized identity of a number of successive patterns, e.g. a word
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00442—Document analysis and understanding; Document recognition
- G06K9/00469—Document understanding by extracting the logical structure, e.g. chapters, sections, columns, titles, paragraphs, captions, page number, and identifying its elements, e.g. author, keywords, ZIP code, money amount
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06Q—DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Baviskar et al. | Efficient automated processing of the unstructured documents using artificial intelligence: A systematic literature review and future directions | |
Jung | Semantic vector learning for natural language understanding | |
Barman et al. | Combining visual and textual features for semantic segmentation of historical newspapers | |
CN113806563B (en) | Architect knowledge graph construction method for multi-source heterogeneous building humanistic historical material | |
CN109460552B (en) | Method and equipment for automatically detecting Chinese language diseases based on rules and corpus | |
Bowern | Chirila: Contemporary and historical resources for the indigenous languages of Australia | |
Tarride et al. | Large-scale genealogical information extraction from handwritten Quebec parish records | |
Fu et al. | Automatic record linkage of individuals and households in historical census data | |
Wang et al. | A probabilistic address parser using conditional random fields and stochastic regular grammar | |
Toselli et al. | Transcribing a 17th-century botanical manuscript: Longitudinal evaluation of document layout detection and interactive transcription | |
CN113377916A (en) | Extraction method of main relations in multiple relations facing legal text | |
Sarkhel et al. | Improving information extraction from visually rich documents using visual span representations | |
Monroc et al. | A comprehensive study of open-source libraries for named entity recognition on handwritten historical documents | |
Thorvaldsen et al. | A tale of two transcriptions. Machine-assisted transcription of historical sources | |
Cristea et al. | Data Structure and Acquisition in DeLORo–a Technology for Deciphering Old Cyrillic-Romanian Documents | |
Heidorn et al. | Automatic metadata extraction from museum specimen labels | |
Besagni et al. | Citation recognition for scientific publications in digital libraries | |
Tual et al. | A benchmark of nested named entity recognition approaches in historical structured documents | |
Tarride et al. | SIMARA: A Database for Key-Value Information Extraction from Full-Page Handwritten Documents | |
Hirayama et al. | Development of template-free form recognition system | |
Klampfl et al. | Reconstructing the logical structure of a scientific publication using machine learning | |
Sarkar et al. | A memory-based learning approach for named entity recognition in Hindi | |
Boillet et al. | The Socface Project: Large-Scale Collection, Processing, and Analysis of a Century of French Censuses | |
CN114881015A (en) | Event extraction method and system for multi-modal financial document | |
Mahadevkar et al. | Exploring AI-driven approaches for unstructured document analysis and future horizons |