|
TOPICS: Browse articles of the conference sorted by topic
C |
Cognitive methods |
Pedagogical stances and their multimodal signals.
Word Sense Inventories by Non-Experts.
Pursuing power in Arabic on-line discussion forums
Reclassifying subcategorization frames for experimental analysis and stimulus generation
Assigning Connotation Values to Events
A Repository of Rules and Lexical Resources for Discourse Structure Analysis: the Case of Explanation Structures
From keystrokes to annotated process data: Enriching the output of Inputlog with linguistic information
LIE: Leadership, Influence and Expertise
A large scale annotated child language construction database
Sense Meets Nonsense - a dual-layer Danish speech corpus for perception studies
German and English Treebanks and Lexica for Tree-Adjoining Grammars
Is it Useful to Support Users with Lexical Resources? A User Study.
Evaluating Hebbian Self-Organizing Memories for Lexical Representation and Access
Corpus Annotation as a Scientific Task
Translog-II: a Program for Recording User Activity Data for Empirical Reading and Writing Research
Polish Multimodal Corpus ― a collection of referential gestures
|
Controlled languages |
Risk Analysis and Prevention: LELIE, a Tool dedicated to Procedure and Requirement Authoring
Conventional Orthography for Dialectal Arabic
English to Indonesian Transliteration to Support English Pronunciation Practice
CLCM - A Linguistic Resource for Effective Simplification of Instructions in the Crisis Management Domain and its Evaluations
|
Corpus (creation, annotation, etc.) |
Alignment-based reordering for SMT
Annotations for Power Relations on Email Threads
Foundations of a Multilayer Annotation Framework for Twitter Communications During Crisis Events
An audiovisual political speech analysis incorporating eye-tracking and perception data
Word Sense Inventories by Non-Experts.
PAMOCAT: Automatic retrieval of specified postures
Constructing a Question Corpus for Textual Semantic Relations
Matching Cultural Heritage items to Wikipedia
ATLIS: Identifying Locational Information in Text Automatically
A Speech and Gesture Spatial Corpus in Assisted Living
3rd party observer gaze as a continuous measure of dialogue flow
Project FLY: a multidisciplinary project within Linguistics
Pursuing power in Arabic on-line discussion forums
The Dependency-Parsed FrameNet Corpus
Incorporating an Error Corpus into a Spellchecker for Maltese
Building a 70 billion word corpus of English from ClueWeb
Semantic Annotations in Japanese FrameNet: Comparing Frames in Japanese and English
Building a Resource of Patterns Using Semantic Types
A data and analysis resource for an experiment in text mining a collection of micro-blogs on a political topic.
AWATIF: A Multi-Genre Corpus for Modern Standard Arabic Subjectivity and Sentiment Analysis
Causal analysis of task completion errors in spoken music retrieval interactions
Investigating Engagement - intercultural and technological aspects of the collection, analysis, and use of the Estonian Multiparty Conversational video data
Corpus+WordNet thesaurus generation for ontology enriching
LDC Forced Aligner
Assessing the Comparability of News Texts
Boosting statistical tagger accuracy with simple rule-based grammars
A New Twitter Verb Lexicon for Natural Language Processing
A Corpus for Research on Deliberation and Debate
Annotating progressive aspect constructions in the spoken section of the British National Corpus
Annotating Spatial Containment Relations Between Events
Annotating Agreement and Disagreement in Threaded Discussion
NeoTag: a POS Tagger for Grammatical Neologism Detection
Using DiAML and ANVIL for multimodal dialogue annotations
Annotated Corpora for Word Alignment between Japanese and English and its Evaluation with MAP-based Word Aligner
An Annotated Corpus of Film Dialogue for Learning and Characterizing Character Style
SPPAS: a tool for the phonetic segmentation of speech
Twenty Years of Language Resource Development and Distribution: A Progress Report on LDC Activities
A Phonemic Corpus of Polish Child-Directed Speech
The KIT Lecture Corpus for Speech Translation
Orthographic Transcription: which enrichment is required for phonetization?
The Role of Model Testing in Standards Development: The Case of ISO-Space
Automatic Speech Recognition on a Firefighter TETRA Broadcast Channel
Ubiquitous Usage of a Broad Coverage French Corpus: Processing the Est Republicain corpus
A High-Quality Web Corpus of Czech
QurAna: Corpus of the Quran annotated with Pronominal Anaphora
MLSA ― A Multi-layered Reference Corpus for German Sentiment Analysis
Versatile Speech Databases for High Quality Synthesis for Basque
A Gold Standard for Relation Extraction in the Food Domain
WebAnnotator, an Annotation Tool for Web Pages
SUMAT: Data Collection and Parallel Corpus Compilation for Machine Translation of Subtitles
Automatic Annotation and Manual Evaluation of the Diachronic German Corpus TüBa-D/DC
Grammatical Error Annotation for Korean Learners of Spoken English
The Use of Parallel and Comparable Data for Analysis of Abstract Anaphora in German and English
Light Verb Constructions in the SzegedParalellFX English--Hungarian Parallel Corpus
CoALT: A Software for Comparing Automatic Labelling Tools
Balanced data repository of spontaneous spoken Czech
The coding and annotation of multimodal dialogue acts
DutchSemCor: Targeting the ideal sense-tagged corpus
Extracting Directional and Comparable Corpora from a Multilingual Corpus for Translation Studies
QurSim: A corpus for evaluation of relatedness in short texts
Automatic annotation of head velocity and acceleration in Anvil
Building a multilingual parallel corpus for human users
EmpaTweet: Annotating and Detecting Emotions on Twitter
The BladeMistress Corpus: From Talk to Action in Virtual Worlds
Event Nominals: Annotation Guidelines and a Manually Annotated Corpus in French
The Parallel-TUT: a multilingual and multiformat treebank
AnIta: a powerful morphological analyser for Italian
CAT: the CELCT Annotation Tool
ROMBAC: The Romanian Balanced Annotated Corpus
A French Fairy Tale Corpus syntactically and semantically annotated
ConanDoyle-neg: Annotation of negation cues and their scope in Conan Doyle stories
GerNED: A German Corpus for Named Entity Disambiguation
A voting scheme to detect semantic underspecification
Interplay of Coreference and Discourse Relations: Discourse Connectives with a Referential Component
Robust clause boundary identification for corpus annotation
NKI-CCRT Corpus - Speech Intelligibility Before and After Advanced Head and Neck Cancer Treated with Concomitant Chemoradiotherapy
PaCo2: A Fully Automated tool for gathering Parallel Corpora from the Web
Making Ellipses Explicit in Dependency Conversion for a German Treebank
Open-Source Boundary-Annotated Corpus for Arabic Speech and Language Processing
Identifying equivalents of specialized verbs in a bilingual comparable corpus of judgments: A frame-based methodology
TimeBankPT: A TimeML Annotated Corpus of Portuguese
Korp ― the corpus infrastructure of Språkbanken
Further Developments in Treebank Error Detection Using Derivation Trees
MULTIPHONIA: a MULTImodal database of PHONetics teaching methods in classroom InterActions.
Discourse-level Annotation over Europarl for Machine Translation: Connectives and Pronouns
Logical metonymies and qualia structures: an annotated database of logical metonymies for German
HunOr: A Hungarian―Russian Parallel Corpus
Introducing the Swedish Kelly-list, a new lexical e-resource for Swedish
A Cross-Lingual Dictionary for English Wikipedia Concepts
Language Richness of the Web
Feature Discovery for Diachronic Register Analysis: a Semi-Automatic Approach
DSim, a Danish Parallel Corpus for Text Simplification
Propbank-Br: a Brazilian Treebank annotated with semantic role labels
A Universal Part-of-Speech Tagset
A large scale annotated child language construction database
Parallel Aligned Treebanks at LDC: New Challenges Interfacing Existing Infrastructures
Linguistic Resources for Entity Linking Evaluation: from Monolingual to Cross-lingual
Multimodal Corpus of Multi-party Conversations in Second Language
A Curated Database for Linguistic Research: The Test Case of Cimbrian Varieties
Experiences in Resource Generation for Machine Translation through Crowdsourcing
Building a synchronous corpus of acoustic and 3D facial marker data for adaptive audio-visual speech synthesis
AVATecH ― automated annotation through audio and video analysis
An Empirical Study of the Occurrence and Co-Occurrence of Named Entities in Natural Language Corpora
Iula2Standoff: a tool for creating standoff documents for the IULACT
Temporal Annotation: A Proposal for Guidelines and an Experiment with Inter-annotator Agreement
Introducing the Reference Corpus of Contemporary Portuguese Online
Rule-Based Detection of Clausal Coordinate Ellipsis
Evolution of Event Designation in Media: Preliminary Study
Building Large Monolingual Dictionaries at the Leipzig Corpora Collection: From 100 to 200 Languages
Sense Meets Nonsense - a dual-layer Danish speech corpus for perception studies
The acquisition and dialog act labeling of the EDECAN-SPORTS corpus
A Parameterized and Annotated Spoken Dialog Corpus of the CMU Let's Go Bus Information System
An Analysis (and an Annotated Corpus) of User Responses to Machine Translation Output
A Basic Language Resource Kit for Persian
Re-ordering Source Sentences for SMT
Extended Named Entities Annotation on OCRed Documents: From Corpus Constitution to Evaluation Campaign
Joint Grammar and Treebank Development for Mandarin Chinese with HPSG
A tree is a Baum is an árbol is a sach'a: Creating a trilingual treebank
Investigating Verbal Intelligence Using the TF-IDF Approach
Diachronic Changes in Text Complexity in 20th Century English Language: An NLP Approach
An English-Portuguese parallel corpus of questions: translation guidelines and application in SMT
SMALLWorlds -- Multilingual Content-Controlled Monologues
A database of semantic clusters of verb usages
Annotating dropped pronouns in Chinese newswire text
Building a Corpus of Indefinite Uses Annotated with Fine-grained Semantic Functions
Mining Hindi-English Transliteration Pairs from Online Hindi Lyrics
The FAUST Corpus of Adequacy Assessments for Real-World Machine Translation Output
Annotating Story Timelines as Temporal Dependency Structures
A PropBank for Portuguese: the CINTIL-PropBank
DeCour: a corpus of DEceptive statements in Italian COURts
Irish Treebanking and Parsing: A Preliminary Evaluation
Comparing performance of different set-covering strategies for linguistic content optimization in speech corpora
Dysarthric Speech Database for Development of QoLT Software Technology
CLTC: A Chinese-English Cross-lingual Topic Corpus
Improving corpus annotation productivity: a method and experiment with interactive tagging
Semantic Relations Established by Specialized Processes Expressed by Nouns and Verbs: Identification in a Corpus by means of Syntactico-semantic Annotation
The Language Library: supporting community effort for collective resource production
The Australian National Corpus: National Infrastructure for Language Resources
A Grammar-informed Corpus-based Sentence Database for Linguistic and Computational Studies
ELAN development, keeping pace with communities' needs
Revealing Contentious Concepts Across Social Groups
Cost and Benefit of Using WordNet Senses for Sentiment Analysis
Rembrandt - a named-entity recognition framework
Texto4Science: a Quebec French Database of Annotated Short Text Messages
Collecting and Analysing Chats and Tweets in SoNaR
Prediction of Non-Linguistic Information of Spontaneous Speech from the Prosodic Annotation: Evaluation of the X-JToBI system
Temporal Tagging on Different Domains: Challenges, Strategies, and Gold Standards
HamleDT: To Parse or Not to Parse?
The Icelandic Parsed Historical Corpus (IcePaHC)
Empty Argument Insertion in the Hindi PropBank
A Richly Annotated, Multilingual Parallel Corpus for Hybrid Machine Translation
The goo300k corpus of historical Slovene
Inforex -- a web-based tool for text corpus management and semantic annotation
A new semantically annotated corpus with syntactic-semantic and cross-lingual senses
Massively Increasing TIMEX3 Resources: A Transduction Approach
Prague Dependency Style Treebank for Tamil
A Distributed Resource Repository for Cloud-Based Machine Translation
Treebanking by Sentence and Tree Transformation: Building a Treebank to support Question Answering in Portuguese
Parallel Data, Tools and Interfaces in OPUS
SemSim: Resources for Normalized Semantic Similarity Computation Using Lexical Networks
Annotating and Learning Morphological Segmentation of Egyptian Colloquial Arabic
Kitten: a tool for normalizing HTML and extracting its textual content
A Portuguese-Spanish Corpus Annotated for Subject Realization and Referentiality
A Galician Syntactic Corpus with Application to Intonation Modeling
A Reference Dependency Bank for Analyzing Complex Predicates
The Influence of Corpus Quality on Statistical Measurements on Language Resources
Annotating Qualia Relations in Italian and French Complex Nominals
Terra: a Collection of Translation Error-Annotated Corpora
Speech and Language Resources for LVCSR of Russian
Automatic word alignment tools to scale production of manually aligned parallel texts
Developing and evaluating an emergency scenario dialogue corpus
A Framework for Evaluating Text Correction
Large Scale Lexical Analysis
NTUSocialRec: An Evaluation Dataset Constructed from Microblogs for Recommendation Applications in Social Networks
Creation and use of Language Resources in a Question-Answering eHealth System
Building and Exploiting a Corpus of Dialog Interactions between French Speaking Virtual and Human Agents
Collection of a Large Database of French-English SMT Output Corrections
Announcing Prague Czech-English Dependency Treebank 2.0
First Results in a Study Evaluating Pre-annotation and Correction Propagation for Machine-Assisted Syriac Morphological Analysis
A Corpus of Spontaneous Multi-party Conversation in Bosnian Serbo-Croatian and British English
French and German Corpora for Audience-based Text Type Classification
The IULA Treebank
Modality in Text: a Proposal for Corpus Annotation
Rule-based Entity Recognition and Coverage of SNOMED CT in Swedish Clinical Text
The Herme Database of Spontaneous Multimodal Human-Robot Dialogues
EXMARaLDA and the FOLK tools ― two toolsets for transcribing and annotating spoken language
A review corpus annotated for negation, speculation and their scope
Developing a large semantically annotated corpus
Collection of a corpus of Dutch SMS
RIDIRE-CPI: an Open Source Crawling and Processing Infrastructure for Supervised Web-Corpora Building
Analyzing the Impact of Prevalence on the Evaluation of a Manual Annotation Campaign
LAST MINUTE: a Multimodal Corpus of Speech-based User-Companion Interactions
Semantic annotation of French corpora: animacy and verb semantic classes
A contrastive review of paraphrase acquisition techniques
Expanding Arabic Treebank to Speech: Results from Broadcast News
Typing Race Games as a Method to Create Spelling Error Corpora
A Search Tool for FrameNet Constructicon
Corpus Annotation as a Scientific Task
DBpedia: A Multilingual Cross-domain Knowledge Base
Designing a search interface for a Spanish learner spoken corpus: the end-user's evaluation
Design and compilation of a specialized Spanish-German parallel corpus
Conventional Orthography for Dialectal Arabic
The annotation of the C-ORAL-BRASIL oral through the implementation of the Palavras Parser
Bulgarian X-language Parallel Corpus
The MASC Word Sense Corpus
A Multilingual Natural Stress Emotion Database
The Twins Corpus of Museum Visitor Questions
Development of a Web-Scale Chinese Word N-gram Corpus with Parts of Speech Information
Medical Term Extraction in an Arabic Medical Corpus
Annotation of response tokens and their triggering expressions in Japanese multi-party conversations
Method for Collection of Acted Speech Using Various Situation Scripts
The Minho Quotation Resource
Translog-II: a Program for Recording User Activity Data for Empirical Reading and Writing Research
Morphosyntactic Analysis of the CHILDES and TalkBank Corpora
Challenges in the development of annotated corpora of computer-mediated communication in Indian Languages: A Case of Hindi
Annotating Football Matches: Influence of the Source Medium on Manual Annotation
The C-ORAL-BRASIL I: Reference Corpus for Spoken Brazilian Portuguese
Coreference in Spoken vs. Written Texts: a Corpus-based Analysis
Towards Fully Automatic Annotation of Audio Books for TTS
Centroids: Gold standards with distributional variation
Multimodal Behaviour and Feedback in Different Types of Interaction
Multimedia database of the cultural heritage of the Balkans
ANALEC: a New Tool for the Dynamic Annotation of Textual Data
Feedback in Nordic First-Encounters: a Comparative Study
Annotating Opinions in German Political News
MultiUN v2: UN Documents with Multilingual Alignments
The CONCISUS Corpus of Event Summaries
IDENTIC Corpus: Morphologically Enriched Indonesian-English Parallel Corpus
The Joy of Parallelism with CzEng 1.0
LAMP: A Multimodal Web Platform for Collaborative Linguistic Analysis
The Polish Sejm Corpus
Creating and Curating a Cross-Language Person-Entity Linking Collection
A corpus of general and specific sentences from news
Annotation Trees: LDC's customizable, extensible, scalable, annotation infrastructure
Irony and Sarcasm: Corpus Generation and Analysis Using Crowdsourcing
YADAC: Yet another Dialectal Arabic Corpus
Hindi Subjective Lexicon: A Lexical Resource for Hindi Adjective Polarity Classification
Annotating Near-Identity from Coreference Disagreements
The REX corpora: A collection of multimodal corpora of referring expressions in collaborative problem solving dialogues
Brand Pitt: A Corpus to Explore the Art of Naming
Evaluating automatic cross-domain Dutch semantic role annotation
Syntactic annotation of spontaneous speech: application to call-center conversation data
Korean Children's Spoken English Corpus and an Analysis of its Pronunciation Variability
DECODA: a call-centre human-human spoken conversation corpus
The Trilingual ALLEGRA Corpus: Presentation and Possible Use for Lexicon Induction
Intelligibility assessment in forensic applications
Spontaneous Speech Corpora for language learners of Spanish, Chinese and Japanese
TED-LIUM: an Automatic Speech Recognition dedicated corpus
The SYNC3 Collaborative Annotation Tool
Automatic Translation of Scientific Documents in the HAL Archive
The REPERE Corpus: a multimodal corpus for person recognition
Efficient Dependency Graph Matching with the IMS Open Corpus Workbench
Croatian Dependency Treebank: Recent Development and Initial Experiments
A Treebank-driven Creation of an OntoValence Verb lexicon for Bulgarian
Customization of the Europarl Corpus for Translation Studies
A Parallel Corpus of Music and Lyrics Annotated with Emotions
Creation of a bottom-up corpus-based ontology for Italian Linguistics
Rhetorical Move Detection in English Abstracts: Multi-label Sentence Classifiers and their Annotated Corpora
Polish Multimodal Corpus ― a collection of referential gestures
DEGELS1: A comparable corpus of French Sign Language and co-speech gestures
Semi-Automatic Sign Language Corpora Annotation using Lexical Representations of Signs
Expanding Parallel Resources for Medium-Density Languages for Free
Beyond SoNaR: towards the facilitation of large corpus building efforts
A GUI to Detect and Correct Errors in Hindi Dependency Treebank
Iterative Refinement and Quality Checking of Annotation Guidelines ― How to Deal Effectively with Semantically Sloppy Named Entity Types, such as Pathological Phenomena
Annotating Factive Verbs
Annotating Errors in a Hungarian Learner Corpus
Specifying Treebanks, Outsourcing Parsebanks: FinnTreeBank 3
Romanian TimeBank: An Annotated Parallel Corpus for Temporal Information
Chinese Whispers: Cooperative Paraphrase Acquisition
The Nordic Dialect Corpus
The WeSearch Corpus, Treebank, and Treecache -- A Comprehensive Sample of User-Generated Content
Yes we can!? Annotating English modal verbs
Building a Multimodal Laughter Database for Emotion Recognition
Towards Emotion and Affect Detection in the Multimodal LAST MINUTE Corpus
Rapid creation of large-scale corpora and frequency dictionaries
Adaptive Dictionary for Bilingual Lexicon Extraction from Comparable Corpora
Prosomarker: a prosodic analysis tool based on optimal pitch stylization and automatic syllabification
METU Turkish Discourse Bank Browser
Evaluating Multi-focus Natural Language Queries over Data Services
Development and Application of a Cross-language Document Comparability Metric
Document Attrition in Web Corpora: an Exploration
A Repository of Data and Evaluation Resources for Natural Language Generation
The Quaero Evaluation Initiative on Term Extraction
Italian and Spanish Null Subjects. A Case Study Evaluation in an MT Perspective.
DGT-TM: A freely available Translation Memory in 22 languages
Simplified guidelines for the creation of Large Scale Dialectal Arabic Annotations
A Tool/Database Interface for Multi-Level Analyses
A Corpus of Scientific Biomedical Texts Spanning over 168 Years Annotated for Uncertainty
New language resources for the Pashto language
Extending the MPC corpus to Chinese and Urdu - A Multiparty Multi-Lingual Chat Corpus for Modeling Social Phenomena in Language
CALBC: Releasing the Final Corpora
Getting more data -- Schoolkids as annotators
Building Large Corpora from the Web Using a New Efficient Tool Chain
RWTH-PHOENIX-Weather: A Large Vocabulary Sign Language Recognition and Translation Corpus
Annotated Bibliographical Reference Corpora in Digital Humanities
Corpus of Children Voices for Mid-level Markers and Affect Bursts Analysis
Fivehundredmillionandone Tokens. Loading the AAC Container with Text Resources for Text Studies.
The I3MEDIA speech database: a trilingual annotated corpus for the analysis and synthesis of emotional speech
DramaBank: Annotating Agency in Narrative Discourse
JRC Eurovoc Indexer JEX - A freely available multi-label categorisation tool
Designing French Tale Corpora for Entertaining Text To Speech Synthesis
Le Petit Prince in UNL
Creating HAVIC: Heterogeneous Audio Visual Internet Collection
Multi-Layer Discourse Annotation of a Dutch Text Corpus
The Language Archive ― a new hub for language resources
A Holistic Approach to Bilingual Sentence Fragment Extraction from Comparable Corpora
Evaluation of Discourse Relation Annotation in the Hindi Discourse Relation Bank
From Grammar Rule Extraction to Treebanking: A Bootstrapping Approach
ULex: new data models and a mobile environment for corpus enrichment.
UniDic for Early Middle Japanese: a Dictionary for Morphological Analysis of Classical Japanese
An Annotation Scheme for Quantifier Scope Disambiguation
A generic formalism to represent linguistic corpora in RDF and OWL/DL
A Concise Query Language with Search and Transform Operations for Corpora with Multiple Levels of Annotation
Large aligned treebanks for syntax-based machine translation
Collecting and Using Comparable Corpora for Statistical Machine Translation
The Netlog Corpus. A Resource for the Study of Flemish Dutch Internet Language
Clause-based Discourse Segmentation of Arabic Texts
Building Japanese Predicate-argument Structure Corpus using Lexical Conceptual Structure
An Examination of Cross-Cultural Similarities and Differences from Social Media Data with respect to Language Use
Latvian and Lithuanian Named Entity Recognition with TildeNER
Collecting humorous expressions from a community-based question-answering-service corpus
The Political Speech Corpus of Bulgarian
A Database of Attribution Relations
A Mandarin-English Code-Switching Corpus
KPWr: Towards a Free Corpus of Polish
Structural alignment of plain text books
Turkish Paraphrase Corpus
Resource Evaluation for Usable Speech Interfaces: Utilizing Human-Human Dialogue
GATEtoGerManC: A GATE-based Annotation Pipeline for Historical German
PET: a Tool for Post-editing and Assessing Machine Translation
Enriching the ISST-TANL Corpus with Semantic Frames
Construction of the Turkish National Corpus (TNC)
Building a learner corpus
|
I |
Information Extraction, Information Retrieval |
Foundations of a Multilayer Annotation Framework for Twitter Communications During Crisis Events
Corpus based Semi-Automatic Extraction of Persian Compound Verbs and their Relations
Statistical Section Segmentation in Free-Text Clinical Records
NLP Challenges for Eunomos, a Tool to Build and Manage Legal Knowledge
Tree-Structured Named Entity Recognition on OCR Data: Analysis, Processing and Results
Dependency parsing for interaction detection in pharmacogenomics
Building a Resource of Patterns Using Semantic Types
A data and analysis resource for an experiment in text mining a collection of micro-blogs on a political topic.
A New Twitter Verb Lexicon for Natural Language Processing
Annotating Spatial Containment Relations Between Events
Aleda, a free large-scale entity database for French
Automatic Speech Recognition on a Firefighter TETRA Broadcast Channel
Large Scale Semantic Annotation, Indexing and Search at The National Archives
TIMEN: An Open Temporal Expression Normalisation Resource
A Gold Standard for Relation Extraction in the Food Domain
Learning Categories and their Instances by Contextual Features
Textual Characteristics for Language Engineering
A Survey of Text Mining Architectures and the UIMA Standard
Automatic annotation of head velocity and acceleration in Anvil
Detecting Reduplication in Videos of American Sign Language
EmpaTweet: Annotating and Detecting Emotions on Twitter
EVALIEX ― A Proposal for an Extended Evaluation Methodology for Information Extraction Systems
Event Nominals: Annotation Guidelines and a Manually Annotated Corpus in French
Task-Driven Linguistic Analysis based on an Underspecified Features Representation
GerNED: A German Corpus for Named Entity Disambiguation
Distractorless Authorship Verification
Automatically Extracting Procedural Knowledge from Instructional Texts using Natural Language Processing
Challenges in the Knowledge Base Population Slot Filling Task
A Cross-Lingual Dictionary for English Wikipedia Concepts
Linguistic Resources for Entity Linking Evaluation: from Monolingual to Cross-lingual
SUTime: A library for recognizing and normalizing time expressions
AVATecH ― automated annotation through audio and video analysis
Chinese Characters Mapping Table of Japanese, Traditional Chinese and Simplified Chinese
Evolution of Event Designation in Media: Preliminary Study
Biomedical Chinese-English CLIR Using an Extended CMeSH Resource to Expand Queries
A good space: Lexical predictors in word space evaluation
Semi-Supervised Technical Term Tagging With Minimal User Feedback
Relating Dominance of Dialogue Participants with their Verbal Intelligence Scores
Diachronic Changes in Text Complexity in 20th Century English Language: An NLP Approach
Expertise Mining for Enterprise Content Management
Comparing performance of different set-covering strategies for linguistic content optimization in speech corpora
CLTC: A Chinese-English Cross-lingual Topic Corpus
Revealing Contentious Concepts Across Social Groups
Temporal Tagging on Different Domains: Challenges, Strategies, and Gold Standards
Constructing Large Proposition Databases
Semantic Role Labeling with the Swedish FrameNet
A Resource-light Approach to Phrase Extraction for English and German Documents from the Patent Domain and User Generated Content
Identifying Nuggets of Information in GALE Distillation Evaluation
Creation and use of Language Resources in a Question-Answering eHealth System
Using Wikipedia to Validate the Terminology found in a Corpus of Basic Textbooks
The SERENOA Project: Multidimensional Context-Aware Adaptation of Service Front-Ends
An Evaluation of the Effect of Automatic Preprocessing on Syntactic Parsing for Biomedical Relation Extraction
Federated Search: Towards a Common Search Infrastructure
A review corpus annotated for negation, speculation and their scope
Evaluating the Impact of Phrase Recognition on Concept Tagging
Evaluation of Unsupervised Information Extraction
Extraction of unmarked quotations in Newspapers
Ontoterminology: How to unify terminology and ontology into a single paradigm
Págico: Evaluating Wikipedia-based information retrieval in Portuguese
Addressing polysemy in bilingual lexicon extraction from comparable corpora
Applying Random Indexing to Structured Data to Find Contextually Similar Words
The CONCISUS Corpus of Event Summaries
Building and Exploring Semantic Equivalences Resources
Creating and Curating a Cross-Language Person-Entity Linking Collection
The TARSQI Toolkit
YADAC: Yet another Dialectal Arabic Corpus
Supervised Topical Key Phrase Extraction of News Stories using Crowdsourcing, Light Filtering and Co-reference Normalization
From medical language processing to BioNLP domain
Designing an Evaluation Framework for Spoken Term Detection and Spoken Document Retrieval at the NTCIR-9 SpokenDoc Task
Effects of Document Clustering in Modeling Wikipedia-style Term Descriptions
Evaluation of a Complex Information Extraction Application in Specific Domain
The WeSearch Corpus, Treebank, and Treecache -- A Comprehensive Sample of User-Generated Content
Evaluating Query Languages for a Corpus Processing System
Identification of Manner in Bio-Events
A Corpus of Scientific Biomedical Texts Spanning over 168 Years Annotated for Uncertainty
Corpus of Children Voices for Mid-level Markers and Affect Bursts Analysis
Creating a Data Collection for Evaluating Rich Speech Retrieval
A hierarchical approach with feature selection for emotion recognition from speech
Combining Formal Concept Analysis and semantic information for building ontological structures from texts: an exploratory study
Collecting humorous expressions from a community-based question-answering-service corpus
Structural alignment of plain text books
Dealing with unknown words in statistical machine translation
|
L |
Language Identification |
Language Richness of the Web
FreeLing 3.0: Towards Wider Multilinguality
KALAKA-2: a TV Broadcast Speech Database for the Recognition of Iberian Languages in Clean and Noisy Environments
Development of a Web-Scale Chinese Word N-gram Corpus with Parts of Speech Information
Simplified guidelines for the creation of Large Scale Dialectal Arabic Annotations
Using the International Standard Language Resource Number: Practical and Technical Aspects
An Analytical Model of Language Resource Sustainability
|
Language modelling |
An Open Source Persian Computational Grammar
Boosting statistical tagger accuracy with simple rule-based grammars
MLSA ― A Multi-layered Reference Corpus for German Sentiment Analysis
Measuring Interlanguage: Native Language Identification with L1-influence Metrics
DISLOG: A logic-based language for processing discourse structures
Corpus-based Referring Expressions Generation
Portuguese Text Generation from Large Corpora
LIE: Leadership, Influence and Expertise
Item Development and Scoring for Japanese Oral Proficiency Testing
Using Verb Subcategorization for Word Sense Disambiguation
Rule-Based Detection of Clausal Coordinate Ellipsis
Sense Meets Nonsense - a dual-layer Danish speech corpus for perception studies
Concept-based Selectional Preferences and Distributional Representations from Wikipedia Articles
Word Alignment for English-Turkish Language Pair
Dbnary: Wiktionary as a LMF based Multilingual RDF network
Improving corpus annotation productivity: a method and experiment with interactive tagging
The goo300k corpus of historical Slovene
Unsupervised acquisition of concatenative morphology
Speech and Language Resources for LVCSR of Russian
Arabic Word Generation and Modelling for Spell Checking
Multimodal Behaviour and Feedback in Different Types of Interaction
Feedback in Nordic First-Encounters: a Comparative Study
Suffix Trees as Language Models
The Romanian Neuter Examined Through A Two-Gender N-Gram Classification System
Spell Checking for Chinese
Semi-Automatic Sign Language Corpora Annotation using Lexical Representations of Signs
Specifying Treebanks, Outsourcing Parsebanks: FinnTreeBank 3
The New IDS Corpus Analysis Platform: Challenges and Prospects
Simplified guidelines for the creation of Large Scale Dialectal Arabic Annotations
Extending the MPC corpus to Chinese and Urdu - A Multiparty Multi-Lingual Chat Corpus for Modeling Social Phenomena in Language
Linguistic Analysis Processing Line for Bulgarian
CLIMB grammars: three projects using metagrammar engineering
A platform-independent user-friendly dictionary from Italian to LIS
|
Lexicon, lexical database |
Corpus based Semi-Automatic Extraction of Persian Compound Verbs and their Relations
Word Sense Inventories by Non-Experts.
Building a fine-grained subjectivity lexicon from a web corpus
Building a database of French frozen adverbial phrases
Constraint Based Description of Polish Multiword Expressions
The Common Orthographic Vocabulary of the Portuguese Language: a set of open lexical resources for a pluricentric language
The Dependency-Parsed FrameNet Corpus
Automatic Translation of Scholarly Terms into Patent Terms Using Synonym Extraction Techniques
Semantic Annotations in Japanese FrameNet: Comparing Frames in Japanese and English
Generation of Verbal Stems in Derivationally Rich Language
Corpus+WordNet thesaurus generation for ontology enriching
A New Twitter Verb Lexicon for Natural Language Processing
Learning Sentiment Lexicons in Spanish
Logic Based Methods for Terminological Assessment
NeoTag: a POS Tagger for Grammatical Neologism Detection
Assigning Connotation Values to Events
Aleda, a free large-scale entity database for French
Cleaning noisy wordnets
Wordnet extension made simple: A multilingual lexicon-based approach using wiki resources
Building a Basque-Chinese Dictionary by Using English as Pivot
Mining Sentiment Words from Microblogs for Predicting Writer-Reader Emotion Transition
German Verb Patterns and Their Implementation in an Electronic Dictionary
Bootstrapping Sentiment Labels For Unannotated Documents With Polarity PageRank
Towards an LFG parser for Polish: An exercise in parasitic grammar development
First Steps towards the Semi-automatic Development of a Wordformation-based Lexicon of Latin
Representing the Translation Relation in a Bilingual Wordnet
AnIta: a powerful morphological analyser for Italian
Automatic classification of German "an" particle verbs
A Classification of Adjectives for Polarity Lexicons Enhancement
Mapping WordNet synsets to Wikipedia articles
SentiSense: An easily scalable concept-based affective lexicon for sentiment analysis
Identifying equivalents of specialized verbs in a bilingual comparable corpus of judgments: A frame-based methodology
Towards a richer wordnet representation of properties
The open lexical infrastructure of Språkbanken
Turk Bootstrap Word Sense Inventory 2.0: A Large-Scale Resource for Lexical Substitution
PoliMorf: a (not so) new open morphological dictionary for Polish
Introducing the Swedish Kelly-list, a new lexical e-resource for Swedish
Propbank-Br: a Brazilian Treebank annotated with semantic role labels
Constructing a Class-Based Lexical Dictionary using Interactive Topic Models
Dictionary Look-up with Katakana Variant Recognition
Two Database Resources for Processing Social Media English Text
Multilingual Central Repository version 3.0
A New Method for Evaluating Automatically Learned Terminological Taxonomies
Adding Morpho-semantic Relations to the Romanian Wordnet
The Rocky Road towards a Swedish FrameNet - Creating SweFN
An Empirical Study of the Occurrence and Co-Occurrence of Named Entities in Natural Language Corpora
Vreselijk mooi! (terribly beautiful): A Subjectivity Lexicon for Dutch Adjectives.
A proposal for improving WordNet Domains
Free/Open Source Shallow-Transfer Based Machine Translation for Spanish and Aragonese
Wordnet Based Lexicon Grammar for Polish
A database of semantic clusters of verb usages
Capturing syntactico-semantic regularities among terms: An application of the FrameNet methodology to terminology
Highlighting relevant concepts from Topic Signatures
Extending a wordnet framework for simplicity and scalability
A Framework for Spelling Correction in Persian Language Using Noisy Channel Model
Dbnary: Wiktionary as a LMF based Multilingual RDF network
Customizable SCF Acquisition in Italian
Statistical Evaluation of Pronunciation Encoding
Semantic Relations Established by Specialized Processes Expressed by Nouns and Verbs: Identification in a Corpus by means of Syntactico-semantic Annotation
German and English Treebanks and Lexica for Tree-Adjoining Grammars
Texto4Science: a Quebec French Database of Annotated Short Text Messages
Linguistic knowledge for specialized text production
Empty Argument Insertion in the Hindi PropBank
Visualizing Sentiment Analysis on a User Forum
Semantic Role Labeling with the Swedish FrameNet
Developing LMF-XML Bilingual Dictionaries for Colloquial Arabic Dialects
UBY-LMF -- A Uniform Model for Standardizing Heterogeneous Lexical-Semantic Resources in ISO-LMF
Applying cross-lingual WSD to wordnet development
Annotating Qualia Relations in Italian and French Complex Nominals
Large Scale Lexical Analysis
Automatic lexical semantic classification of nouns
In the same boat and other idiomatic seafaring expressions
Evaluating and improving syntactic lexica by plugging them within a parser
Collaborative semantic editing of linked data lexica
Extraction of unmarked quotations in Newspapers
Association Norms of German Noun Compounds
Word Sketches for Turkish
The MASC Word Sense Corpus
Representation of linguistic and domain knowledge for second language learning in virtual worlds
Addressing polysemy in bilingual lexicon extraction from comparable corpora
Automatically Generated Online Dictionaries
Automatic Extraction and Evaluation of Arabic LFG Resources
The Minho Quotation Resource
LexIt: A Computational Resource on Italian Argument Structure
Applying Random Indexing to Structured Data to Find Contextually Similar Words
Using semi-experts to derive judgments on word sense alignment: a pilot study
Hindi Subjective Lexicon: A Lexical Resource for Hindi Adjective Polarity Classification
Knowledge-Rich Context Extraction and Ranking with KnowPipe
The Trilingual ALLEGRA Corpus: Presentation and Possible Use for Lexicon Induction
A Treebank-driven Creation of an OntoValence Verb lexicon for Bulgarian
NgramQuery - Smart Information Extraction from Google N-gram using External Resources
Adapting and evaluating a generic term extraction tool
Legal electronic dictionary for Czech
A Corpus of Scientific Biomedical Texts Spanning over 168 Years Annotated for Uncertainty
Visualizing word senses in WordNet Atlas
Boosting the Coverage of a Semantic Lexicon by Automatically Extracted Event Nominalizations
Rendering Endangered Lexicons Interoperable through Standards Harmonization: the RELISH project
Empirical Comparisons of MASC Word Sense Annotations
Identifying Word Translations from Comparable Documents Without a Seed Lexicon
Detecting Japanese Compound Functional Expressions using Canonical/Derivational Relation
ULex: new data models and a mobile environment for corpus enrichment.
UniDic for Early Middle Japanese: a Dictionary for Morphological Analysis of Classical Japanese
A platform-independent user-friendly dictionary from Italian to LIS
Mapping WordNet to the Kyoto ontology
Tools for plWordNet Development. Presentation and Perspectives
Recognition of Polish Derivational Relations Based on Supervised Learning Scheme
Building Japanese Predicate-argument Structure Corpus using Lexical Conceptual Structure
Extending the adverbial coverage of a French morphological lexicon
Reconstructing the Diachronic Morphology of Romanian from Dictionary Citations
A Fast, Memory Efficient, Scalable and Multilingual Dictionary Retriever
A disambiguation resource extracted from Wikipedia for semantic annotation
Fine-grained German Sentiment Analysis on Social Media
|
LR Infrastructures and Architectures |
The Common Orthographic Vocabulary of the Portuguese Language: a set of open lexical resources for a pluricentric language
Building Synthetic Voices in the META-NET Framework
Representing General Relational Knowledge in ConceptNet 5
The META-SHARE Language Resources Sharing Infrastructure: Principles, Challenges, Solutions
Smooth Sailing for STEVIN
Tackling interoperability issues within UIMA workflows
A High-Quality Web Corpus of Czech
Towards a comprehensive open repository of Polish language resources
Textual Characteristics for Language Engineering
A Survey of Text Mining Architectures and the UIMA Standard
Cloud Logic Programming for Integrating Language Technology Resources
Aspects of a Legal Framework for Language Resource Management
Korp ― the corpus infrastructure of Språkbanken
The open lexical infrastructure of Språkbanken
The Rocky Road towards a Swedish FrameNet - Creating SweFN
A Basic Language Resource Kit for Persian
The Language Library: supporting community effort for collective resource production
The Australian National Corpus: National Infrastructure for Language Resources
A Grammar-informed Corpus-based Sentence Database for Linguistic and Computational Studies
PEARL: ProjEction of Annotations Rule Language, a Language for Projecting (UIMA) Annotations over RDF Knowledge Bases
A Scalable Architecture For Web Deployment of Spoken Dialogue Systems
Semantic metadata mapping in practice: the Virtual Language Observatory
Recent Developments in CLARIN-NL
A Distributed Resource Repository for Cloud-Based Machine Translation
Developing LMF-XML Bilingual Dictionaries for Colloquial Arabic Dialects
Parallel Data, Tools and Interfaces in OPUS
A Metadata Editor to Support the Description of Linguistic Resources
A Repository for the Sustainable Management of Research Data
UBY-LMF -- A Uniform Model for Standardizing Heterogeneous Lexical-Semantic Resources in ISO-LMF
German "nach"-Particle Verbs in Semantic Theory and Corpus Data
Federated Search: Towards a Common Search Infrastructure
EXMARaLDA and the FOLK tools ― two toolsets for transcribing and annotating spoken language
Dynamic web service deployment in a cloud environment
Towards a User-Friendly Platform for Building Language Resources based on Web Services
Proper Language Resource Centers
RIDIRE-CPI: an Open Source Crawling and Processing Infrastructure for Supervised Web-Corpora Building
Typing Race Games as a Method to Create Spelling Error Corpora
Standardizing a Component Metadata Infrastructure
Citing on-line Language Resources
The C-ORAL-BRASIL I: Reference Corpus for Spoken Brazilian Portuguese
On Using Linked Data for Language Resource Sharing in the Long Tail of the Localisation Market
LAMP: A Multimodal Web Platform for Collaborative Linguistic Analysis
An NLP Curator (or: How I Learned to Stop Worrying and Love NLP Pipelines)
Statistical Machine Translation without Source-side Parallel Corpus Using Word Lattice and Phrase Extension
The SYNC3 Collaborative Annotation Tool
ELRA in the heart of a cooperative HLT world
Using Language Resources in Humanities research
Integrating NLP Tools in a Distributed Environment: A Case Study Chaining a Tagger with a Dependency Parser
Glottolog/Langdoc: Increasing the visibility of grey literature for low-density languages
Creation of an Open Shared Language Resource Repository in the Nordic and Baltic Countries
Beyond SoNaR: towards the facilitation of large corpus building efforts
Example-Based Treebank Querying
The LRE Map. Harmonising Community Descriptions of Resources
The New IDS Corpus Analysis Platform: Challenges and Prospects
Evaluating Query Languages for a Corpus Processing System
Towards automation in using multi-modal language resources: compatibility and interoperability for multi-modal features in Kachako
A Tool/Database Interface for Multi-Level Analyses
Linguistic Analysis Processing Line for Bulgarian
The KnowledgeStore: an Entity-Based Storage System
Classifying Standard Linguistic Processing Functionalities based on Fundamental Data Operation Types
Linguagrid: a network of Linguistic and Semantic Services for the Italian Language.
The Language Archive ― a new hub for language resources
Ontologies of Linguistic Annotation: Survey and perspectives
A generic formalism to represent linguistic corpora in RDF and OWL/DL
RELcat: a Relation Registry for ISOcat data categories
International Multicultural Name Matching Competition: Design, Execution, Results, and Lessons Learned
GATEtoGerManC: A GATE-based Annotation Pipeline for Historical German
Building a learner corpus
|
LR national/international projects, organizational/policy issues |
The Common Orthographic Vocabulary of the Portuguese Language: a set of open lexical resources for a pluricentric language
Building Synthetic Voices in the META-NET Framework
Smooth Sailing for STEVIN
Towards a comprehensive open repository of Polish language resources
Balanced data repository of spontaneous spoken Czech
Aspects of a Legal Framework for Language Resource Management
PoliMorf: a (not so) new open morphological dictionary for Polish
Introducing the Swedish Kelly-list, a new lexical e-resource for Swedish
The Language Library: supporting community effort for collective resource production
The Australian National Corpus: National Infrastructure for Language Resources
Texto4Science: a Quebec French Database of Annotated Short Text Messages
Recent Developments in CLARIN-NL
Proper Language Resource Centers
Medical Term Extraction in an Arabic Medical Corpus
Web Service integration platform for Polish linguistic resources
The Polish Sejm Corpus
ELRA in the heart of a cooperative HLT world
Creation of an Open Shared Language Resource Repository in the Nordic and Baltic Countries
Beyond SoNaR: towards the facilitation of large corpus building efforts
The LRE Map. Harmonising Community Descriptions of Resources
Romanian TimeBank: An Annotated Parallel Corpus for Temporal Information
Legal electronic dictionary for Czech
The FLaReNet Strategic Language Resource Agenda
Fivehundredmillionandone Tokens. Loading the AAC Container with Text Resources for Text Studies.
The Open Linguistics Working Group
Collecting and Using Comparable Corpora for Statistical Machine Translation
Enriching the ISST-TANL Corpus with Semantic Frames
|
M |
Machine Translation, SpeechToSpeech Translation |
Alignment-based reordering for SMT
Same domain different discourse style - A case study on Language Resources for data-driven Machine Translation
Tajik-Farsi Persian Transliteration Using Statistical Machine Translation
Assessing the Comparability of News Texts
A finite-state morphological transducer for Kyrgyz
Annotated Corpora for Word Alignment between Japanese and English and its Evaluation with MAP-based Word Aligner
The KIT Lecture Corpus for Speech Translation
The IWSLT 2011 Evaluation Campaign on Automatic Talk Translation
Building a Basque-Chinese Dictionary by Using English as Pivot
SUMAT: Data Collection and Parallel Corpus Compilation for Machine Translation of Subtitles
Eye Tracking as a Tool for Machine Translation Error Analysis
Can Statistical Post-Editing with a Small Parallel Corpus Save a Weak MT Engine?
PaCo2: A Fully Automated tool for gathering Parallel Corpora from the Web
BLEU Evaluation of Machine-Translated English-Croatian Legislation
Parallel Aligned Treebanks at LDC: New Challenges Interfacing Existing Infrastructures
Experiences in Resource Generation for Machine Translation through Crowdsourcing
Involving Language Professionals in the Evaluation of Machine Translation
Chinese Characters Mapping Table of Japanese, Traditional Chinese and Simplified Chinese
Free/Open Source Shallow-Transfer Based Machine Translation for Spanish and Aragonese
Automatic MT Error Analysis: Hjerson Helping Addicter
An Analysis (and an Annotated Corpus) of User Responses to Machine Translation Output
Re-ordering Source Sentences for SMT
An English-Portuguese parallel corpus of questions: translation guidelines and application in SMT
The FAUST Corpus of Adequacy Assessments for Real-World Machine Translation Output
Word Alignment for English-Turkish Language Pair
PEXACC: A Parallel Sentence Mining Algorithm from Comparable Corpora
Evaluating Appropriateness Of System Responses In A Spoken CALL Game
A Richly Annotated, Multilingual Parallel Corpus for Hybrid Machine Translation
A Distributed Resource Repository for Cloud-Based Machine Translation
Terra: a Collection of Translation Error-Annotated Corpora
Automatic word alignment tools to scale production of manually aligned parallel texts
In the same boat and other idiomatic seafaring expressions
Collection of a Large Database of French-English SMT Output Corrections
Arabic-Segmentation Combination Strategies for Statistical Machine Translation
Source-Language Dictionaries Help Non-Expert Users to Enlarge Target-Language Dictionaries for Machine Translation
A light way to collect comparable corpora from the Web
MultiUN v2: UN Documents with Multilingual Alignments
The Joy of Parallelism with CzEng 1.0
Suffix Trees as Language Models
Statistical Machine Translation without Source-side Parallel Corpus Using Word Lattice and Phrase Extension
Automatic Translation of Scientific Documents in the HAL Archive
On the practice of error analysis for machine translation evaluation
Error profiling for evaluation of machine-translated text: a Polish-English case study
Expanding Parallel Resources for Medium-Density Languages for Free
VERTa: Linguistic features in MT evaluation
Development and Application of a Cross-language Document Comparability Metric
Italian and Spanish Null Subjects. A Case Study Evaluation in an MT Perspective.
DGT-TM: A freely available Translation Memory in 22 languages
New language resources for the Pashto language
Assessing Divergence Measures for Automated Document Routing in an Adaptive MT System
Identifying bilingual Multi-Word Expressions for Statistical Machine Translation
Identifying Word Translations from Comparable Documents Without a Seed Lexicon
A Study of Word-Classing for MT Reordering
Large aligned treebanks for syntax-based machine translation
Dealing with unknown words in statistical machine translation
PET: a Tool for Post-editing and Assessing Machine Translation
The ML4HMT Workshop on Optimising the Division of Labour in Hybrid Machine Translation
|
Metadata |
You Seem Aggressive! Monitoring Anger in a Practical Application
Fast Labeling and Transcription with the Speechalyzer Toolkit
Introducing the Reference Corpus of Contemporary Portuguese Online
Collecting and Analysing Chats and Tweets in SoNaR
Semantic metadata mapping in practice: the Virtual Language Observatory
A Richly Annotated, Multilingual Parallel Corpus for Hybrid Machine Translation
A Metadata Editor to Support the Description of Linguistic Resources
Collection of a corpus of Dutch SMS
Standardizing a Component Metadata Infrastructure
Bulgarian X-language Parallel Corpus
Challenges in the development of annotated corpora of computer-mediated communication in Indian Languages: A Case of Hindi
Iterative Refinement and Quality Checking of Annotation Guidelines ― How to Deal Effectively with Semantically Sloppy Named Entity Types, such as Pathological Phenomena
The LRE Map. Harmonising Community Descriptions of Resources
META-SHARE v2: An Open Network of Repositories for Language Resources including Data and Tools
The KnowledgeStore: an Entity-Based Storage System
The Open Linguistics Working Group
LDC Language Resource Database: Building a Bibliographic Database
The META-SHARE Metadata Schema for the Description of Language Resources
|
Morphology |
Constraint Based Description of Polish Multiword Expressions
Generation of Verbal Stems in Derivationally Rich Language
A finite-state morphological transducer for Kyrgyz
NeoTag: a POS Tagger for Grammatical Neologism Detection
First Steps towards the Semi-automatic Development of a Wordformation-based Lexicon of Latin
A Rule-based Morphological Analyzer for Murrinh-Patha
The Impact of Automatic Morphological Analysis & Disambiguation on Dependency Parsing of Turkish
AnIta: a powerful morphological analyser for Italian
PoliMorf: a (not so) new open morphological dictionary for Polish
Word Alignment for English-Turkish Language Pair
Unsupervised acquisition of concatenative morphology
Annotating and Learning Morphological Segmentation of Egyptian Colloquial Arabic
Arabic-Segmentation Combination Strategies for Statistical Machine Translation
First Results in a Study Evaluating Pre-annotation and Correction Propagation for Machine-Assisted Syriac Morphological Analysis
Source-Language Dictionaries Help Non-Expert Users to Enlarge Target-Language Dictionaries for Machine Translation
Evaluating Hebbian Self-Organizing Memories for Lexical Representation and Access
A Morphological Analyzer For Wolof Using Finite-State Techniques
Arabic Word Generation and Modelling for Spell Checking
Automatic Extraction and Evaluation of Arabic LFG Resources
IDENTIC Corpus: Morphologically Enriched Indonesian-English Parallel Corpus
The Romanian Neuter Examined Through A Two-Gender N-Gram Classification System
Expanding Parallel Resources for Medium-Density Languages for Free
Annotating Errors in a Hungarian Learner Corpus
Analyzing and Aligning German compound nouns
Joint Segmentation and POS Tagging for Arabic Using a CRF-based Classifier
Recognition of Polish Derivational Relations Based on Supervised Learning Scheme
The Netlog Corpus. A Resource for the Study of Flemish Dutch Internet Language
Reconstructing the Diachronic Morphology of Romanian from Dictionary Citations
Construction of the Turkish National Corpus (TNC)
|
Multilinguality |
Alignment-based reordering for SMT
Tajik-Farsi Persian Transliteration Using Statistical Machine Translation
An Open Source Persian Computational Grammar
Automatic Translation of Scholarly Terms into Patent Terms Using Synonym Extraction Techniques
Building Text-to-Speech Systems for Resource Poor Languages
Representing General Relational Knowledge in ConceptNet 5
Learning Sentiment Lexicons in Spanish
Unsupervised Word Sense Disambiguation with Multilingual Representations
Measuring Interlanguage: Native Language Identification with L1-influence Metrics
Light Verb Constructions in the SzegedParalellFX English--Hungarian Parallel Corpus
Extracting Directional and Comparable Corpora from a Multilingual Corpus for Translation Studies
Representing the Translation Relation in a Bilingual Wordnet
Building a multilingual parallel corpus for human users
BiBiKit - A Bilingual Bimodal Reading and Writing Tool for Sign Language Users
The Parallel-TUT: a multilingual and multiformat treebank
Identifying equivalents of specialized verbs in a bilingual comparable corpus of judgments: A frame-based methodology
HunOr: A Hungarian-Russian Parallel Corpus
Language Richness of the Web
A Universal Part-of-Speech Tagset
Parallel Aligned Treebanks at LDC: New Challenges Interfacing Existing Infrastructures
Linguistic Resources for Entity Linking Evaluation: from Monolingual to Cross-lingual
Multimodal Corpus of Multi-party Conversations in Second Language
Multilingual Central Repository version 3.0
Cross-lingual studies of ASR errors: paradigms for perceptual evaluations
Chinese Characters Mapping Table of Japanese, Traditional Chinese and Simplified Chinese
Biomedical Chinese-English CLIR Using an Extended CMeSH Resource to Expand Queries
A tree is a Baum is an árbol is a sach'a: Creating a trilingual treebank
SMALLWorlds -- Multilingual Content-Controlled Monologues
Mining Hindi-English Transliteration Pairs from Online Hindi Lyrics
A tool for enhanced search of multilingual digital libraries of e-journals
Dbnary: Wiktionary as a LMF based Multilingual RDF network
CLTC: A Chinese-English Cross-lingual Topic Corpus
Correlation between Similarity Measures for Inter-Language Linked Wikipedia Articles
FreeLing 3.0: Towards Wider Multilinguality
A new semantically annotated corpus with syntactic-semantic and cross-lingual senses
A Resource-light Approach to Phrase Extraction for English and German Documents from the Patent Domain and User Generated Content
Discovering Missing Wikipedia Inter-language Links by means of Cross-lingual Word Sense Disambiguation
French and German Corpora for Audience-based Text Type Classification
Bulgarian X-language Parallel Corpus
Automatically Generated Online Dictionaries
Feedback in Nordic First-Encounters: a Comparative Study
MultiUN v2: UN Documents with Multilingual Alignments
The CONCISUS Corpus of Event Summaries
Knowledge-Rich Context Extraction and Ranking with KnowPipe
Customization of the Europarl Corpus for Translation Studies
DEGELS1: A comparable corpus of French Sign Language and co-speech gestures
Spell Checking in Spanish: The Case of Diacritic Accents
Adaptive Dictionary for Bilingual Lexicon Extraction from Comparable Corpora
Linguistic Resources for Handwriting Recognition and Translation Evaluation
Italian and Spanish Null Subjects. A Case Study Evaluation in an MT Perspective.
DGT-TM: A freely available Translation Memory in 22 languages
Analyzing and Aligning German compound nouns
Accessing and standardizing Wiktionary lexical entries for the translation of labels in Cultural Heritage taxonomies
Extending the MPC corpus to Chinese and Urdu - A Multiparty Multi-Lingual Chat Corpus for Modeling Social Phenomena in Language
BUCEADOR, a multi-language search engine for digital libraries
CLIMB grammars: three projects using metagrammar engineering
Le Petit Prince in UNL
Identifying bilingual Multi-Word Expressions for Statistical Machine Translation
Identifying Word Translations from Comparable Documents Without a Seed Lexicon
A Mandarin-English Code-Switching Corpus
A Fast, Memory Efficient, Scalable and Multilingual Dictionary Retriever
Measuring the Divergence of Dependency Structures Cross-Linguistically to Improve Syntactic Projection Algorithms
An implementation of a Latvian resource grammar in Grammatical Framework
The ML4HMT Workshop on Optimising the Division of Labour in Hybrid Machine Translation
|
Multimedia Document Processing |
PAMOCAT: Automatic retrieval of specified postures
AVATecH ― automated annotation through audio and video analysis
Comparing computer vision analysis of signed language video with motion capture recordings
The REPERE Corpus : a multimodal corpus for person recognition
A Parallel Corpus of Music and Lyrics Annotated with Emotions
BUCEADOR, a multi-language search engine for digital libraries
Summarizing a multimodal set of documents in a Smart Room
Creating HAVIC: Heterogeneous Audio Visual Internet Collection
Creating a Data Collection for Evaluating Rich Speech Retrieval
|
MultiWord Expressions & Collocations |
Constraint Based Description of Polish Multiword Expressions
Annotated Corpora for Word Alignment between Japanese and English and its Evaluation with MAP-based Word Aligner
German Verb Patterns and Their Implementation in an Electronic Dictionary
Light Verb Constructions in the SzegedParalellFX English--Hungarian Parallel Corpus
Wordnet Based Lexicon Grammar for Polish
Linguistic knowledge for specialized text production
Automatic word alignment tools to scale production of manually aligned parallel texts
Measuring the compositionality of NV expressions in Basque by means of distributional similarity techniques
Association Norms of German Noun Compounds
Evaluating the Impact of External Lexical Resources into a CRF-based Multiword Segmenter and Part-of-Speech Tagger
Supervised Topical Key Phrase Extraction of News Stories using Crowdsourcing, Light Filtering and Co-reference Normalization
Statistical Machine Translation without Source-side Parallel Corpus Using Word Lattice and Phrase Extension
Evaluation of Classification Algorithms and Features for Collocation Extraction in Croatian
Analyzing and Aligning German compound nouns
Using Noun Similarity to Adapt an Acceptability Measure for Persian Light Verb Constructions
Identifying bilingual Multi-Word Expressions for Statistical Machine Translation
Automatic Term Recognition Needs Multiple Evidence
Detecting Japanese Compound Functional Expressions using Canonical/Derivational Relation
|
T |
Text mining |
Dependency parsing for interaction detection in pharmacogenomics
Evaluating the Similarity Estimator component of the TWIN Personality-based Recommender System
Mining Sentiment Words from Microblogs for Predicting Writer-Reader Emotion Transition
Large Scale Semantic Annotation, Indexing and Search at The National Archives
QurAna: Corpus of the Quran annotated with Pronominal Anaphora
Learning Categories and their Instances by Contextual Features
The BladeMistress Corpus: From Talk to Action in Virtual Worlds
Leveraging the Wisdom of the Crowds for the Acquisition of Multilingual Language Resources
Quantising Opinions for Political Tweets Analysis
Open-Source Boundary-Annotated Corpus for Arabic Speech and Language Processing
Automatically Extracting Procedural Knowledge from Instructional Texts using Natural Language Processing
A Cross-Lingual Dictionary for English Wikipedia Concepts
Mining Hindi-English Transliteration Pairs from Online Hindi Lyrics
Expertise Mining for Enterprise Content Management
A Framework for Spelling Correction in Persian Language Using Noisy Channel Model
SemSim: Resources for Normalized Semantic Similarity Computation Using Lexical Networks
An Evaluation of the Effect of Automatic Preprocessing on Syntactic Parsing for Biomedical Relation Extraction
Associative and Semantic Features Extracted From Web-Harvested Corpora
Evaluation of Unsupervised Information Extraction
From medical language processing to BioNLP domain
Irregularity Detection in Categorized Document Corpora
Spell Checking for Chinese
Identification of Manner in Bio-Events
CALBC: Releasing the Final Corpora
Annotated Bibliographical Reference Corpora in Digital Humanities
Unsupervised document zone identification using probabilistic graphical models
A Holistic Approach to Bilingual Sentence Fragment Extraction from Comparable Corpora
Automatic Term Recognition Needs Multiple Evidence
Improving K-Nearest Neighbor Efficacy for Farsi Text Classification
Collaborative Development and Evaluation of Text-processing Workflows in a UIMA-supported Web-based Workbench
|
Textual Entailment and Paraphrasing |
Constructing a Question Corpus for Textual Semantic Relations
Automatic Translation of Scholarly Terms into Patent Terms Using Synonym Extraction Techniques
Logical metonymies and qualia structures: an annotated database of logical metonymies for German
DSim, a Danish Parallel Corpus for Text Simplification
A contrastive review of paraphrase acquisition techniques
Annotating Factive Verbs
Chinese Whispers: Cooperative Paraphrase Acquisition
Diversifiable Bootstrapping for Acquiring High-Coverage Paraphrase Resource
Turkish Paraphrase Corpus
|
Tools, systems, applications |
Same domain different discourse style - A case study on Language Resources for data-driven Machine Translation
Tajik-Farsi Persian Transliteration Using Statistical Machine Translation
Statistical Section Segmentation in Free-Text Clinical Records
ATLIS: Identifying Locational Information in Text Automatically
Incorporating an Error Corpus into a Spellchecker for Maltese
Building a 70 billion word corpus of English from ClueWeb
A data and analysis resource for an experiment in text mining a collection of micro-blogs on a political topic.
Creating a Coreference Resolution System for Polish
LDC Forced Aligner
A finite-state morphological transducer for Kyrgyz
Evaluating the Similarity Estimator component of the TWIN Personality-based Recommender System
Annotating Agreement and Disagreement in Threaded Discussion
Logic Based Methods for Terminological Assessment
Fast Labeling and Transcription with the Speechalyzer Toolkit
SPPAS: a tool for the phonetic segmentation of speech
Orthographic Transcription: which enrichment is required for phonetization?
A High-Quality Web Corpus of Czech
TIMEN: An Open Temporal Expression Normalisation Resource
Risk Analysis and Prevention: LELIE, a Tool dedicated to Procedure and Requirement Authoring
WebAnnotator, an Annotation Tool for Web Pages
Towards a comprehensive open repository of Polish language resources
Constructive Interaction for Talking about Interesting Topics
Portuguese Text Generation from Large Corpora
SUMAT: Data Collection and Parallel Corpus Compilation for Machine Translation of Subtitles
From keystrokes to annotated process data: Enriching the output of Inputlog with linguistic information
Polaris: Lymba's Semantic Parser
CoALT: A Software for Comparing Automatic Labelling Tools
Textual Characteristics for Language Engineering
A Survey of Text Mining Architectures and the UIMA Standard
A Rule-based Morphological Analyzer for Murrinh-Patha
Representing the Translation Relation in a Bilingual Wordnet
Can Statistical Post-Editing with a Small Parallel Corpus Save a Weak MT Engine?
Detecting Reduplication in Videos of American Sign Language
EVALIEX ― A Proposal for an Extended Evaluation Methodology for Information Extraction Systems
BiBiKit - A Bilingual Bimodal Reading and Writing Tool for Sign Language Users
Leveraging the Wisdom of the Crowds for the Acquisition of Multilingual Language Resources
Using multimodal resources for explanation approaches in intelligent systems
CAT: the CELCT Annotation Tool
Robust clause boundary identification for corpus annotation
NKI-CCRT Corpus - Speech Intelligibility Before and After Advanced Head and Neck Cancer Treated with Concomitant Chemoradiotherapy
PaCo2: A Fully Automated tool for gathering Parallel Corpora from the Web
Making Ellipses Explicit in Dependency Conversion for a German Treebank
Item Development and Scoring for Japanese Oral Proficiency Testing
Further Developments in Treebank Error Detection Using Derivation Trees
Dictionary Look-up with Katakana Variant Recognition
SUTime: A library for recognizing and normalizing time expressions
Two Database Resources for Processing Social Media English Text
Experiences in Resource Generation for Machine Translation through Crowdsourcing
An Oral History Annotation Tool for INTER-VIEWs
Comparing computer vision analysis of signed language video with motion capture recordings
Automatic MT Error Analysis: Hjerson Helping Addicter
Semi-Supervised Technical Term Tagging With Minimal User Feedback
Diachronic Changes in Text Complexity in 20th Century English Language: An NLP Approach
Combining Language Resources Into A Grammar-Driven Swedish Parser
An ontological approach to model and query multimodal concurrent linguistic annotations
Extending a wordnet framework for simplicity and scalability
Comparing performance of different set-covering strategies for linguistic content optimization in speech corpora
PEXACC: A Parallel Sentence Mining Algorithm from Comparable Corpora
A Framework for Spelling Correction in Persian Language Using Noisy Channel Model
Customizable SCF Acquisition in Italian
Statistical Evaluation of Pronunciation Encoding
MISTRAL+: A Melody Intonation Speaker Tonal Range semi-automatic Analysis using variable Levels
A Grammar-informed Corpus-based Sentence Database for Linguistic and Computational Studies
ELAN development, keeping pace with communities' needs
Resource production of written forms of Sign Languages by a user-centered editor, SWift (SignWriting improved fast transcriber)
Rembrandt - a named-entity recognition framework
An Adaptive Framework for Named Entity Combination
Linguistic knowledge for specialized text production
FreeLing 3.0: Towards Wider Multilinguality
Inforex -- a web-based tool for text corpus management and semantic annotation
Massively Increasing TIMEX3 Resources: A Transduction Approach
Towards Automatic Gesture Stroke Detection
Developing LMF-XML Bilingual Dictionaries for Colloquial Arabic Dialects
SemSim: Resources for Normalized Semantic Similarity Computation Using Lexical Networks
A Resource-light Approach to Phrase Extraction for English and German Documents from the Patent Domain and User Generated Content
Kitten: a tool for normalizing HTML and extracting its textual content
A Metadata Editor to Support the Description of Linguistic Resources
A Repository for the Sustainable Management of Research Data
Large Scale Lexical Analysis
NTUSocialRec: An Evaluation Dataset Constructed from Microblogs for Recommendation Applications in Social Networks
Automatic lexical semantic classification of nouns
Building and Exploiting a Corpus of Dialog Interactions between French Speaking Virtual and Human Agents
Arabic-Segmentation Combination Strategies for Statistical Machine Translation
Using Wikipedia to Validate the Terminology found in a Corpus of Basic Textbooks
The SERENOA Project: Multidimensional Context-Aware Adaptation of Service Front-Ends
The Herme Database of Spontaneous Multimodal Human-Robot Dialogues
RIDIRE-CPI: an Open Source Crawling and Processing Infrastructure for Supervised Web-Corpora Building
A Search Tool for FrameNet Constructicon
Extraction of unmarked quotations in Newspapers
A Morphological Analyzer For Wolof Using Finite-State Techniques
Word Sketches for Turkish
Development of a Web-Scale Chinese Word N-gram Corpus with Parts of Speech Information
Annotation Facilities for the Reliable Analysis of Human Motion
Arabic Word Generation and Modelling for Spell Checking
Automatically Generated Online Dictionaries
Citing on-line Language Resources
Translog-II: a Program for Recording User Activity Data for Empirical Reading and Writing Research
Holaaa!! writin like u talk is kewl but kinda hard 4 NLP
Towards Fully Automatic Annotation of Audio Books for TTS
Multimedia database of the cultural heritage of the Balkans
ANALEC: a New Tool for the Dynamic Annotation of Textual Data
Annotating Opinions in German Political News
IDENTIC Corpus: Morphologically Enriched Indonesian-English Parallel Corpus
Web Service integration platform for Polish linguistic resources
The TARSQI Toolkit
Annotation Trees: LDC's customizable, extensible, scalable, annotation infrastructure
An NLP Curator (or: How I Learned to Stop Worrying and Love NLP Pipelines)
Hindi Subjective Lexicon: A Lexical Resource for Hindi Adjective Polarity Classification
Intelligibility assessment in forensic applications
Spontaneous Speech Corpora for language learners of Spanish, Chinese and Japanese
The SYNC3 Collaborative Annotation Tool
Lemmatising Serbian as Category Tagging with Bidirectional Sequence Classification
Efficient Dependency Graph Matching with the IMS Open Corpus Workbench
Strategies to Improve a Speaker Diarisation Tool
MaltOptimizer: A System for MaltParser Optimization
Evaluation of the KomParse Conversational Non-Player Characters in a Commercial Virtual World
Using Language Resources in Humanities research
NgramQuery - Smart Information Extraction from Google N-gram using External Resources
Adapting and evaluating a generic term extraction tool
A GUI to Detect and Correct Errors in Hindi Dependency Treebank
Example-Based Treebank Querying
Evaluation of a Complex Information Extraction Application in Specific Domain
Text Simplification Tools for Spanish
Specifying Treebanks, Outsourcing Parsebanks: FinnTreeBank 3
W-PhAMT: A web tool for phonetic multilevel timeline visualization
The Nordic Dialect Corpus
This also affects the context - Errors in extraction based summaries
Rapid creation of large-scale corpora and frequency dictionaries
Prosomarker: a prosodic analysis tool based on optimal pitch stylization and automatic syllabification
The DISCO ASR-based CALL system: practicing L2 oral skills and beyond
METU Turkish Discourse Bank Browser
The New IDS Corpus Analysis Platform: Challenges and Prospects
Application of a Semantic Search Algorithm to Semi-Automatic GUI Generation
Two Phase Evaluation for Selecting Machine Translation Services
A Graphical Citation Browser for the ACL Anthology
Service Composition Scenarios for Task-Oriented Translation
Towards automation in using multi-modal language resources: compatibility and interoperability for multi-modal features in Kachako
META-SHARE v2: An Open Network of Repositories for Language Resources including Data and Tools
A Tool/Database Interface for Multi-Level Analyses
Using an ASR database to design a pronunciation evaluation system in Basque
BUCEADOR, a multi-language search engine for digital libraries
Getting more data -- Schoolkids as annotators
Visualizing word senses in WordNet Atlas
Building Large Corpora from the Web Using a New Efficient Tool Chain
Assessing Divergence Measures for Automated Document Routing in an Adaptive MT System
Linguagrid: a network of Linguistic and Semantic Services for the Italian Language.
JRC Eurovoc Indexer JEX - A freely available multi-label categorisation tool
Rendering Endangered Lexicons Interoperable through Standards Harmonization: the RELISH project
The Language Archive ― a new hub for language resources
A Holistic Approach to Bilingual Sentence Fragment Extraction from Comparable Corpora
A methodology for the extraction of information about the usage of formulaic expressions in scientific texts
Tools for plWordNet Development. Presentation and Perspectives
A Concise Query Language with Search and Transform Operations for Corpora with Multiple Levels of Annotation
Collecting and Using Comparable Corpora for Statistical Machine Translation
Clause-based Discourse Segmentation of Arabic Texts
Latvian and Lithuanian Named Entity Recognition with TildeNER
LG-Eval: A Toolkit for Creating Online Language Evaluation Experiments
Collaborative Development and Evaluation of Text-processing Workflows in a UIMA-supported Web-based Workbench
A Fast, Memory Efficient, Scalable and Multilingual Dictionary Retriever
Rapidly Testing the Interaction Model of a Pronunciation Training System via Wizard-of-Oz
An implementation of a Latvian resource grammar in Grammatical Framework
Dealing with unknown words in statistical machine translation
PET: a Tool for Post-editing and Assessing Machine Translation
|
Topic detection & tracking |
An Examination of Cross-Cultural Similarities and Differences from Social Media Data with respect to Language Use
|
Typological databases |
Building a Corpus of Indefinite Uses Annotated with Fine-grained Semantic Functions
|