Banasiak et al., 2018 - Google Patents
Extended N-Gram Model for Analysis of Polish TextsBanasiak et al., 2018
- Document ID
- 5800075568816895911
- Author
- Banasiak D
- Mierzwa J
- Sterna A
- Publication year
- Publication venue
- Man-Machine Interactions 5: 5th International Conference on Man-Machine Interactions, ICMMI 2017 Held at Kraków, Poland, October 3-6, 2017
External Links
Snippet
The paper presents extended N-gram model designed for analysis of texts in Polish language. One of possible applications of the model is automatic detection and correction of errors that occur during computerized text edition. N-grams belong to the group of statistical …
- 238000004458 analytical method 0 title abstract description 30
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
- G06F17/277—Lexical analysis, e.g. tokenisation, collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/22—Manipulating or registering by use of codes, e.g. in sequence of text characters
- G06F17/2217—Character encodings
- G06F17/2223—Handling non-latin characters, e.g. kana-to-kanji conversion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2705—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2765—Recognition
- G06F17/2775—Phrasal analysis, e.g. finite state techniques, chunking
- G06F17/278—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
- G06F17/30657—Query processing
- G06F17/30675—Query execution
- G06F17/30684—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2809—Data driven translation
- G06F17/2827—Example based machine translation; Alignment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2785—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2872—Rule based translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/24—Editing, e.g. insert/delete
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/2863—Processing of non-latin text
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/27—Automatic analysis, e.g. parsing
- G06F17/2795—Thesaurus; Synonyms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/21—Text processing
- G06F17/211—Formatting, i.e. changing of presentation of document
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
- G06F17/28—Processing or translating of natural language
- G06F17/289—Use of machine translation, e.g. multi-lingual retrieval, server side translation for client devices, real-time translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30861—Retrieval from the Internet, e.g. browsers
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Amjad et al. | “Bend the truth”: Benchmark dataset for fake news detection in Urdu language and its evaluation | |
Şeker et al. | Initial explorations on using CRFs for Turkish named entity recognition | |
Sawalha et al. | SALMA: standard Arabic language morphological analysis | |
Kruczek et al. | Are n-gram categories helpful in text classification? | |
Jabbar et al. | An improved Urdu stemming algorithm for text mining based on multi-step hybrid approach | |
Cing et al. | Improving accuracy of part-of-speech (POS) tagging using hidden markov model and morphological analysis for Myanmar Language | |
Parupalli et al. | Bcsat: A benchmark corpus for sentiment analysis in telugu using word-level annotations | |
Zhang et al. | PKU paraphrase bank: A sentence-level paraphrase corpus for Chinese | |
Wong et al. | iSentenizer‐μ: Multilingual Sentence Boundary Detection Model | |
Chizhikova et al. | Multilingual case-insensitive named entity recognition | |
Onyenwe et al. | Toward an effective igbo part-of-speech tagger | |
Alian et al. | Syntactic-semantic similarity based on dependency tree kernel | |
Mokanarangan et al. | Tamil morphological analyzer using support vector machines | |
Rezai et al. | FarsiTag: A part-of-speech tagging system for Persian | |
Tukeyev et al. | Universal programs for stemming, segmentation, morphological analysis of Turkic words | |
Kapočiūtė-Dzikienė et al. | A comparison of Lithuanian morphological analyzers | |
Al_Janabi et al. | Pragmatic text mining method to find the topics of citation network | |
Nguyen-Son et al. | Detecting machine-translated paragraphs by matching similar words | |
Aejas et al. | Named entity recognition for cultural heritage preservation | |
Myint et al. | Disambiguation using joint entropy in part of speech of written Myanmar text | |
Banasiak et al. | Extended N-Gram Model for Analysis of Polish Texts | |
Reentovich et al. | The first one-million corpus for the Belarusian NooJ module | |
Almujaiwel | Grammatical construction of function words between old and modern written Arabic: A corpus-based analysis | |
Turki Khemakhem et al. | POS tagging without a tagger: using aligned corpora for transferring knowledge to under-resourced languages | |
Worku et al. | Amharic fake news detection on social media using feature fusion |