In this paper, we report on the development of a large-scale Finnish Internet parsebank, currently consisting of 1.5 billion tokens in 116 million sentences.
Authors: Jenna Kanerva, Juhani Luotolahti, Veronika Laippala, Filip Ginter. Editors: Andrius Utka, Gintarė Grigonytė, Jurgita Kapočiūtė-Dzikienė, ...
Syntactic N-gram Collection from a Large-Scale Corpus of Internet Finnish. J. Kanerva, J. Luotolahti, V. Laippala, F. Ginter.
This paper reports on the development of a large-scale Finnish Internet parsebank, currently consisting of 1.5 billion tokens in 116 million sentences, ...
Jul 24, 2015 · The syntactic ngrams are produced with the freely available Finnish Dependency Parser and Ngram Builder and the keystructures analyzed with a ...
The aim of this work is to create an N-gram collection based on a large-scale corpus of all Polish sites on the Internet provided by The Common Crawl Foundation ...
Syntactic N-gram Collection from a Large-Scale Corpus of Internet Finnish · Computer Science, Linguistics. Baltic HLT · 2014.
This paper presents the first results on the linguistic variation in the Finnish-language Internet by analyzing informality, machine translations and human ...
Aug 19, 2018 · In this paper, we report on the development of a large-scale Finnish Internet parsebank, currently consisting of 1.5 billion tokens in 116 ...