Conference Presentations by Mirella De Sisto
LREC2022 Proceedings, 2022
Sign Languages (SLs) are the primary means of communication for at least half a million people in Europe alone. However, the
development of SL recognition and translation tools is slowed down by a series of obstacles concerning resource scarcity and
standardization issues in the available data. The former challenge relates to the volume of data available for machine learning as well
as the time required to collect and process new data. The latter obstacle is linked to the variety of the data, i.e., annotation formats are
not unified and vary amongst different resources. The available data formats are often not suitable for machine learning, obstructing
the provision of automatic tools based on neural models. In the present paper, we give an overview of these challenges by comparing
various SL corpora and SL machine learning datasets. Furthermore, we propose a framework to address the lack of standardization at
format level, unify the available resources and facilitate SL research for different languages. Our framework takes ELAN files as inputs
and returns textual and visual data ready to train SL recognition and translation models. We present a proof of concept, training neural
translation models on the data produced by the proposed framework.
25/01/17
24/06/2015. Talk at the Dialect Meeting 2015 and CIDSM X, Leiden University Centre for Linguistics.
13/06/2014. Talk at the RomTin Workshop, Universiteit Leiden.
02/06/2014. Talk at the Workshop on Raddoppiamento Fonosintattico, Universiteit Leiden.
Papers by Mirella De Sisto
Linköping electronic conference proceedings, Jul 9, 2024
arXiv (Cornell University), Apr 26, 2024
Zenodo (CERN European Organization for Nuclear Research), Oct 16, 2023
Digital scholarship in the humanities, Feb 7, 2024
Studia Metrica et Poetica, Sep 10, 2023
Zenodo (CERN European Organization for Nuclear Research), Jun 9, 2023
INTERSPEECH 2023
arXiv (Cornell University), Apr 18, 2023
Isogloss, Mar 14, 2024
ILLA - Nuove Ricerche Umanistiche, 2021
28th Manchester Phonology Meeting, May 1, 2021
Moderna Sprak, 2020
In southern Italian dialects, possessives have an enclitic variant typically associated with kinship nouns (Rohlfs 1967, Sotiri 2007, Ledgeway 2009, D’Alessandro & Migliori 2017) (e.g. [ˈfratə-mə] ‘brother my’). The most common strategies to avoid violations of the three-syllable window are to avoid the enclitic form of the possessive or to shift stress, as in Lucanian (e.g. [ˌiennəˈru-mə] cf. [ˈiennərə], Lüdtke 1979:31). In the dialects of Airola and Boiano, a different strategy is attested: with proparoxytonic nouns (e.g. [ˈjennərə] ‘son-in-law’ in both varieties and [ˈsɔtʃəra]/[ˈswotʃəra] ‘mother/father-in-law’ in Boiano), the last unstressed syllable of the host is deleted (e.g. [ˈjennə-mə], [ˈsɔtʃə-mə], [ˈswotʃə-mə]). We claim that possessive enclitics in Airola and Boiano are internal clitics, that is, they amalgamate with the prosodic word that contains the host noun. We further propose that both proparoxytonic stress and the three-syllable window derive from internally layered ternary feet (Martínez-Paricio 2013). These feet need to be aligned with the right edge of their containing prosodic word. When a possessive enclitic is incorporated, the optimal strategy to comply with this alignment requirement is to build an internally layered ternary foot and delete the last syllable of the host noun, stress shift being excluded.
The idiosyncrasy of literary studies has for years been an obstacle to their technological advancement, especially to representing their knowledge in a machine-readable format. The richness, variety, and differing perspectives that scholars find in their studies make this task a highly complex challenge. This complexity is even more noticeable in the poetry genre, where each poetic tradition has independently developed its own analytical terminology and methodology. In this work, we address the construction of a poetry ontology to express the scholars' knowledge currently spread across isolated databases and works. The Ontopoetry ontology has been developed following the NeOn methodology, and it is structured in three modules: a) core, b) poetic analysis and c) transmission, covering the essential aspects of a literary study of poetry. The Ontopoetry core module has been aligned with the FRBRoo ontology, guaranteeing its interoperability. This paper is focused on the description of the core module, its ...
The main aim of the Poetry Standardization and Linked Open Data project (POSTDATA) is to provide the means for researchers of European poetry to publish and consume semantically enriched data. Developing a poetry ontology is therefore a pillar of its semantic domain. This ontology aims to enhance interoperability in the European poetry community and to capture European poetry domain knowledge.
The study of the poetic features of text, especially their rhythmic structure when forming verses, pertains to the different traditions, whose scholars established the rules that might govern poetry. Within this context, the POSTDATA Project formalized a network of ontologies able to express any poetic expression and its analysis at the European level, enabling scholars all over Europe to interchange their data using Linked Open Data. However, varied research interests result in corpora that might not share the same facets of an analysis. To alleviate this concern and foster the completeness of the interchanged corpora, our team set out to build a software toolkit to assist in the analysis of poetry. This paper introduces PoetryLab, an extensible open source toolkit for syllabification, scansion (extraction of stress patterns), enjambment detection (syntactical units split in two lines), rhyme detection, and historical named entity recognition for Spanish poetry. Our toolkit achieve...
... languages: from very ancient versifications (Sumerian, Akkadian, Hittite; Ancient Greek), through medieval (Old English, Old Icelandic, Old Saxon) and Renaissance verse to modern experiments (free verse, concrete poetry); from English and Russian through Spanish and German to Portuguese and Catalan. Not only written, but also spoken poetry has been analyzed.
create gender distinction in the plural of nouns; in fact, metaphony takes place in masculine plural forms, while RF marks feminine plural ones. Therefore, two distinct phenomena, one being phonological, namely metaphony, and one being phono-syntactic, namely RF, happen to interact within plural noun formation. These two processes, which developed separately, acquired, synchronically speaking, a value of gender distinction.
Metaphony is a well-known phenomenon of Italian dialects, which consists in the raising or diphthongization of a stressed vowel under the influence of a non-adjacent following high vowel (Rohlfs 1966, Fanciullo 1994, Ledgeway 2009, Maiden 2010). In the dialect of Airola, it only affects mid vowels, namely /ɔ, o, e, ɛ/, and its attestation is not limited to the nominal class; it occurs, in fact, in various word categories, such as adjectives, verbs and possessive pronouns.
RF is an external sandhi phenomenon which consists in the gemination of a word-initial consonant under the influence of a preceding word (Rohlfs 1970, Leone 1984, Loporcaro 1997, Borrelli 2002). In Airolano, RF is lexically triggered, unlike the RF attested in Standard Italian, which is stress-induced.
The aim of this thesis is to describe the two phenomena, metaphony and RF, in Airolano and to give an analysis of them in order to explain their division of labor. To do so, the processes are first analyzed separately. Then, a unified analysis is elaborated aiming to shed some light on the difference between genders in the plural of nouns.
The analysis of the two phenomena is based on data from Airolano collected by the author in December 2013 and April 2014. Ten informants were selected and classified into four age groups. All the recordings were subsequently transcribed in IPA, and they appear in this form in the text. The full set of data is stored in the Italian Dialect archive of Leiden University.