Additions to the IMDI metadata set for sign language corpora

The most important future development of the IMDI tools from the perspective of this proposal concerns the creation of ‘profiles’. Profiles will contain sets of key-value pairs specific to different subgroups of users, such as the sign language community. At the moment, we can simulate and share a sort of profile by creating and sharing a ‘master document’, such as the one that can be found on the workshop web page (from early May on). Ideally, one would be able to choose one or more profiles from a list within the IMDI editor; this will be developed in the near future. If the sign language community finds the set of sign language extensions useful and stable, the ‘sign language profile’ can perhaps be given a more flexible layout in the IMDI editor and browser.

To simulate the concept of element groups for the moment, we suggest key names with dots, such as ‘Deafness.Status’ for ‘Actor.keys’. Section numbers refer to paragraphs in the IMDI 3.02 proposal that already provide space for additional information in the form of ‘keys’.

3.3 Content

3.3.6 Content . Languages

  Content . Languages . Description
  Space for a prose description of code mixing, sign-supported speech, etc. used in this session. Do we need separate keys for describing code mixing and code switching between different languages or modalities?

3.3.7 Content . keys

  Language Variety
  Definition: Description of the language variety used in the session.
  Encoding: string
  Comments: Space for a more constrained description of the language variety used in this session. Information about the language skills of the individuals should be entered in the actor’s description (cf. Actor . keys).

  Elicitation Method
  Definition: A characterization of the specific prompts used for eliciting language production.
  Encoding: OV: no prompt / single picture prompt / picture story prompt / written language prompt / sign language prompt / video prompt / unknown
  Comments: Use ‘no prompt’ for spontaneous language. When working on the influence of German on DGS compounding, for example, it is essential to know whether spoken language competence has been activated by the elicitation situation. Content . Task might be appropriate for this purpose, but its open vocabulary seems to suggest different levels of detail: while Wizard of Oz is certainly not related to the topic of the utterances, some other values are, such as room reservation. The ‘Frog story’ is so well known that it could almost carry a trademark, and its name covers both the content and the elicitation method. Content . Involvement would be a good place, if it had an open vocabulary.

Interpreting Group
Definition: Properties of interpreting appearing in the session.
Encoding: Interpreting . Source, Interpreting . Target, Interpreting . Visibility, Interpreting . Audience
Comments:

  Interpreting . Source
  Definition: Source modality and language type.
  Encoding: OVL: sign language / speech / sign-supported speech / text / fingerspelling / unknown / unspecified
  Comments:

  Interpreting . Target
  Definition: Target modality and language type.
  Encoding: OVL: sign language / speech / sign-supported speech / text (subtitling) / fingerspelling / unknown / unspecified
  Comments:

  Interpreting . Visibility
  Definition: Visibility of the interpreter in the video recordings.
  Encoding: CCV: not visible / in view during whole session / in view during part of session / unknown / unspecified
  Comments:

  Interpreting . Audience
  Definition: Presence and nature of an audience that the interpreter is signing for.
  Encoding: CCV: audience not present (signing to camera) / audience known to the interpreter / heterogeneous group partly known to the interpreter / anonymous audience (e.g. theatre) / unknown / unspecified
  Comments: If Interpreting . Target is ‘text (subtitling)’, leave this field empty.

3.4 Actors

3.4.2 Actor

Actor . keys
We propose to add a number of keys describing different aspects of the actors, mainly to characterize their language background. All of these keys refer to relatively stable properties (skills) of the actors, not to their actual behaviour in the specific session at hand.

Note: descriptions of groups of keys are aligned with the left margin; descriptions of elements are all indented. Otherwise, the descriptions are formatted as in the IMDI documents. Keys that are further specified by a set of keys are followed by “(sub)” in the lists.

General comment: most of the subjective data could be paralleled by “objective” data, such as ‘dB left’ and ‘dB right’ for the item ‘hearing’, or scores in language competence tests. Is this needed? Does anyone have suggestions for specific fields and values that are often measured in your corpus?

Actor keys Group
Encoding: Deafness (sub), Sign Language Experience (sub), Family (sub), Education (sub)
Comments: Stable properties (skills) of the actor, not their actual use in a given session.

Deafness Group
Definition: Groups information about the deafness status of the actor. Only the first element is relevant for all actors; the other elements specify details about hearing loss.
Encoding: Deafness . Status, Deafness . Aid Type
Comments:

  Deafness . Status
  Definition: Actor’s ability to hear.
  Encoding: CCV: hearing / hard-of-hearing / deaf
  Comments:

  Deafness . Aid Type
  Definition: Type of hearing aid the actor has.
  Encoding: CCV: none / conventional / CI
  Comments: CI = cochlear implant.

Sign Language Experience Group
Definition: Groups (partly subjective) information on the actor’s experience with sign language.
Encoding: Sign Language Experience . Exposure Age, Sign Language Experience . Acquisition Location, Sign Language Experience . Sign Teaching
Comments:

  Sign Language Experience . Exposure Age
  Definition: Age at which exposure to and use of sign language started.
  Encoding: years;months
  Comments: Nativeness can be expressed by Language . Mother Tongue.

  Sign Language Experience . Acquisition Location
  Definition: Place where sign language was learnt.
  Encoding: OVL: home from family / home from tutor / preschool teachers / teachers / family beyond home / friends
  Comments:

  Sign Language Experience . Sign Teaching
  Definition: Amount of experience with teaching sign language.
  Encoding: OVL: none / some / extensive
  Comments:

Family Group
Definition: Describes the deafness status of the actor’s closest contact persons as well as the communication systems they preferably use.
Encoding: Family . Mother (sub), Family . Father (sub), Family . Partner (sub)

Family . Mother Group
Definition: Characterises language input from the actor’s mother.
Encoding: Family . Mother . Deafness, Family . Mother . Primary Communication Form

  Family . Mother . Deafness
  Definition: Describes the mother’s deafness status.
  Encoding: CCV: deaf / hard-of-hearing / hearing / n.a.
  Comments: Where appropriate, describe the deafness status of an alternative primary caregiver.

  Family . Mother . Primary Communication Form
  Definition: Describes the mother’s language input towards the actor.
  Encoding: OVL: sign / sign-supported speech / gesture / mix between signing and speaking / speech only / writing
  Comments: Where appropriate, describe the primary communication form of an alternative primary caregiver.

Family . Father Group
Definition: Characterises language input from the actor’s father.
Encoding: Family . Father . Deafness, Family . Father . Primary Communication Form

  Family . Father . Deafness
  Definition: Describes the father’s deafness status.
  Encoding: CCV: deaf / hard-of-hearing / hearing / n.a.
  Comments: Where appropriate, describe the deafness status of an alternative primary caregiver.

  Family . Father . Primary Communication Form
  Definition: Describes the father’s language input towards the actor.
  Encoding: OVL: sign / sign-supported speech / gesture / mix between signing and speaking / speech only / writing
  Comments: Where appropriate, describe the primary communication form of an alternative primary caregiver.

Family . Partner Group
Definition: Characterises language input from the actor’s partner.
Encoding: Family . Partner . Deafness, Family . Partner . Primary Communication Form

  Family . Partner . Deafness
  Definition: Describes the partner’s deafness status.
  Encoding: CCV: deaf / hard-of-hearing / hearing / n.a.
  Comments: Describe the situation at the time of the recording.

  Family . Partner . Primary Communication Form
  Definition: Describes the partner’s language input towards the actor.
  Encoding: OVL: sign / sign-supported speech / gesture / mix between signing and speaking / speech only / writing
  Comments:

Education Group
Definition: Describes where the actor was educated.
Encoding: Education . Age, Education . School Type, Education . Class Kind, Education . Education Model, Education . Location, Education . Boarding School
Comments: It should become possible in the editor to specify this whole set of elements repeatedly, once for each school the actor has attended. Currently this is not possible, and how it can be done will need to be determined in the future. In the meantime, we recommend that users specify values for multiple schools in each field, separated by commas.

  Education . Age
  Definition: Describes the age range during which the school was attended.
  Encoding: string
  Comments: Format: start age, dash, end age, with ages given in years or years;months. For example: 3-6, 6;3-12;2.

  Education . School Type
  Definition: Describes the type of school.
  Encoding: OV: bilingual home programme / kindergarten / preschool / primary school / vocational training / college / university
  Comments:

  Education . Class Kind
  Definition: Describes the kind of class in the school.
  Encoding: OV: deaf / hard-of-hearing / deaf class in hearing school / individually integrated
  Comments:

  Education . Education Model
  Definition: Describes the education model used at the school.

Workshop home page
The background document
Sign language master files for IMDI
Updated tools are available from September 2003.

IMDI (ISLE Metadata Initiative), 2001, Part 1B. Metadata elements for lexicon descriptions. Draft proposal version 2.1. June 2001.
IMDI (ISLE Metadata Initiative), 2001, Part 1C. Metadata elements for lexicon descriptions. Draft proposal version 1.0. December 2001.
Birgit Hellwig, 2003, IMDI Editor, version 2.0. Manual. Version: 02 Apr 2003.
Birgit Hellwig, 2003, IMDI Browser, version 1.4. Manual. Version: 12 Sep 2002.
Peter Wittenburg & Daan Broeder, 2003, Metadata in ECHO. Version: 10 Mar 2003.
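
Because the proposal simulates element groups with dotted key names and restricts several Actor . keys to closed vocabularies (CCV), a corpus could check its master documents mechanically. The Python below is a minimal sketch, assuming the keys have already been read out of the IMDI files as a flat list of (name, value) pairs. The functions `group_keys` and `check_closed_vocabularies` and the `CLOSED_VOCABULARIES` table are hypothetical illustrations, not part of the IMDI editor or browser; the table copies the closed vocabularies for Deafness . Status, Deafness . Aid Type and the Family deafness keys from this proposal.

```python
from typing import Dict, List, Tuple

# Hypothetical table of the closed vocabularies (CCV) given in this proposal.
# Key names are normalised to plain dots without spaces; that spelling is an assumption.
FAMILY_DEAFNESS = ["deaf", "hard-of-hearing", "hearing", "n.a."]
CLOSED_VOCABULARIES: Dict[str, List[str]] = {
    "Deafness.Status": ["hearing", "hard-of-hearing", "deaf"],
    "Deafness.Aid Type": ["none", "conventional", "CI"],
    "Family.Mother.Deafness": FAMILY_DEAFNESS,
    "Family.Father.Deafness": FAMILY_DEAFNESS,
    "Family.Partner.Deafness": FAMILY_DEAFNESS,
}


def split_key(name: str) -> List[str]:
    """Split 'Family . Mother . Deafness' into ['Family', 'Mother', 'Deafness']."""
    return [part.strip() for part in name.split(".")]


def group_keys(pairs: List[Tuple[str, str]]) -> Dict[str, object]:
    """Nest flat dotted keys so that each dot opens a (simulated) element group."""
    tree: Dict[str, object] = {}
    for name, value in pairs:
        *groups, leaf = split_key(name)
        node = tree
        for group in groups:
            child = node.setdefault(group, {})
            if not isinstance(child, dict):
                raise ValueError(f"{name!r}: {group!r} is already a plain key")
            node = child
        if isinstance(node.get(leaf), dict):
            raise ValueError(f"{name!r}: {leaf!r} is already a group")
        node[leaf] = value
    return tree


def check_closed_vocabularies(pairs: List[Tuple[str, str]]) -> List[str]:
    """Report values outside a closed vocabulary; open vocabularies are not checked."""
    problems: List[str] = []
    for name, value in pairs:
        key = ".".join(split_key(name))
        allowed = CLOSED_VOCABULARIES.get(key)
        if allowed is not None and value not in allowed:
            problems.append(f"{key}: {value!r} is not one of {allowed}")
    return problems


if __name__ == "__main__":
    actor_keys = [
        ("Deafness . Status", "deaf"),
        ("Deafness . Aid Type", "CI"),
        ("Family . Mother . Deafness", "hearing"),
        ("Family . Mother . Primary Communication Form", "sign-supported speech"),
        ("Family . Father . Deafness", "Deaf"),  # capitalised, so outside the CCV
    ]
    print(group_keys(actor_keys)["Family"])
    for problem in check_closed_vocabularies(actor_keys):
        print(problem)
```

Open vocabularies (OV, OVL) are deliberately left unchecked, since users are expected to add their own values there.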
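
Sign Language Experience . Exposure Age uses the years;months notation, and Education . Age holds ranges written as start age, dash, end age, with several schools separated by commas until the editor can repeat the Education group. As a minimal sketch, assuming that a bare number such as 3 means whole years and that the month part runs from 0 to 11, the hypothetical helpers `age_to_months` and `parse_education_age` below turn such values into (start, end) pairs in months, one pair per school. They are illustrations only and not part of the IMDI tools.

```python
import re
from typing import List, Tuple

# 'years' or 'years;months', e.g. '6' or '6;3'; a bare number is read as whole years (assumption).
AGE_PATTERN = re.compile(r"(\d+)(?:;(\d{1,2}))?")


def age_to_months(age: str) -> int:
    """Convert '6;3' (6 years, 3 months) or '6' (6 years) into months."""
    match = AGE_PATTERN.fullmatch(age.strip())
    if match is None:
        raise ValueError(f"not a years;months age: {age!r}")
    years = int(match.group(1))
    months = int(match.group(2) or 0)
    if months > 11:
        raise ValueError(f"month part must be between 0 and 11: {age!r}")
    return years * 12 + months


def parse_education_age(field: str) -> List[Tuple[int, int]]:
    """Split an Education . Age value such as '3-6, 6;3-12;2' into one
    (start, end) pair in months per school; schools are separated by commas."""
    ranges: List[Tuple[int, int]] = []
    for school in field.split(","):
        school = school.strip()
        if not school:
            continue  # tolerate stray or trailing commas
        start, dash, end = school.partition("-")
        if not dash:
            raise ValueError(f"expected 'start-end', got {school!r}")
        first, last = age_to_months(start), age_to_months(end)
        if last < first:
            raise ValueError(f"end age precedes start age: {school!r}")
        ranges.append((first, last))
    return ranges


if __name__ == "__main__":
    print(age_to_months("2;6"))  # 30
    print(parse_education_age("3-6, 6;3-12;2"))  # [(36, 72), (75, 146)]
```

Converting every age to months makes values from different corpora directly comparable, for example when selecting actors whose schooling started before age 6;0.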