Welsh Acquisition Database

    Database of the Welsh of Children 3-7 Years

    Transcription

     

    CHILDES

    The database uses the CHILDES transcription system, namely CHAT (Codes for the Human Analysis of Transcripts), in order to achieve an internationally recognized written version of the sound recordings of young children's spontaneous speech. A manual is available in .pdf format at the CHILDES Web site. A summary of the conventions used in the database is given below.

    Intonational features are not coded except for exceptional stress on individual words. The traditional sentence types of declarative, interrogative and exclamative are indicated by the standard orthographic conventions of using a full stop (period), question mark, or exclamation mark.

    The format of a data file

    Opening headers convey details about the speakers and the recordings:

    @Begin  
    @Participants: TRY [speaker 1's reference], Trystan [speaker 1's name], Target_child [speaker 1's role],
      HEL [speaker 2's reference], Heledd [speaker 2's name], Target_child [speaker 2's role],
      BMJ [speaker 3's reference], Bob+morris+jones [speaker 3's name], Investigator [speaker 3's role]
    @Filename: c3004.cha [the name of the electronic file with CHILDES extension]  

    @End is placed at the end of each file.

    Between the beginning and end headers is the transcript:

    • '*' occurs at the beginning of every line of data
    • The speaker's reference then follows e.g. TRY
    • The speaker's utterance then follows
    • The spelling reflects spoken forms
    • Square and angled brackets are used to enclose various comments and codes (see below)
    • @ introduces header and comment lines
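
    The line layout described above can be sketched in Python as follows. This is a minimal illustration and not part of the database or of the CHILDES tools: the name parse_line and both regular expressions are assumptions made for the example, and continuation lines are simply reported as 'other'.

```python
import re

# A data line: '*', the speaker's reference, ':', then the utterance.
UTTERANCE = re.compile(r"^\*([A-Z0-9]+):\s*(.*)$")
# A header or comment line: '@', a name, and an optional ':' plus value.
HEADER = re.compile(r"^@([^:\s]+):?\s*(.*)$")


def parse_line(line):
    """Classify one transcript line as ('utterance', speaker, text),
    ('header', name, value), or ('other', '', line)."""
    line = line.rstrip("\n")
    m = UTTERANCE.match(line)
    if m:
        return ("utterance", m.group(1), m.group(2))
    m = HEADER.match(line)
    if m:
        return ("header", m.group(1), m.group(2))
    return ("other", "", line)


# Sample lines taken from the example transcription on this page.
sample = [
    "@Begin",
    "*TRY:gei di gal yr un glas.",
    "@Comment:sw:n chwarae.",
    "*HEL:heina?",
    "@End",
]
for entry in map(parse_line, sample):
    print(entry)
```

    Keeping the speaker reference separate from the utterance makes it straightforward to group utterances by child or by investigator afterwards.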

    Conventions which are used

     

    The following summarises the transcriptional conventions in the files. More general observations follow the summary.

    Symbol | Meaning | Example
    [...] | Contain codes or comments on immediately preceding data. | tractor [?]
    < ... > | Enclose the words that comments/codes refer to. | <un cloc> [?]
     | Without angled brackets the comments/codes refer to one word. | un cloc [?]
    . ? ! | Indicate the end of a line of data: declarative, interrogative, exclamation. |
    +... | Unfinished declarative. |
    +..? | Unfinished interrogative. |
    +/. | Interruption. |
    ,,, | Left-peripheral material, i.e. on the left periphery of core syntax. | ie,,, heddiw.
    ,, | Right-peripheral material, i.e. on the right periphery of core syntax. | dim heddiw,, na.
    Comma | Between different items in a list, especially adjectives (but without noting the relationship). | cynffon mawr, hir
    Without comma | Repetition of the same item in a list, especially adjectives. | cynffon hir hir.
    Initial capital letter | Personal names, place names, brand names. The names of the children and adults (except investigators), place names and workplace names have been made anonymous by using nonsense alphabetic strings. A final '0' on the anonymous versions indicates the names of places and workplaces. | Steve-austin, Xrst ac Lmno0.
    [!!] | Contrastive word stress |
    [!] | Strong word stress | na [!]
    ["] | Quoting another speaker's words |
    [% Saesneg] | An English phrase or sentence | welish i <big christmas tree> [% Saesneg]
    [% ca:n] | Words from a song or nursery rhyme | <dau gi bach yn mynd i 'r coed> [% ca:n]
    [/] | Repetition | fi [/] fi sy 'n mynd
    [//] | Repetition with change | fi [//] ti sy 'n mynd
    [>] and [<] | Overlapping speech. Numbers can indicate successive pairings. |
    [= 'explanation'] | '=' indicates an explanation about the immediately preceding data | tlacdol [= tractor]
    [=? 'explanation'] | '=?' indicates a tentative explanation of the immediately preceding data | [=? 'di marw]
    [=! 'description'] | '=!' indicates how utterances are delivered | [=! prolonged 'r']
    [?] | 'Best guess' transcription | arian [?]
    xxx | Indecipherable data. The number of syllable beats is indicated thus: [% 2 sill] | xxx [% 2 sill]
    & | Unfinished word (not shortening) | &bre
    : | The colon symbol : is placed after a vowel instead of the circumflex diacritic ^ | ta:n in place of tân
    ,, | Precedes question tag | yn fanna mae 'o,, ynde?
    # | Pause in mid-utterance | rho hwnna # yn1 fanna
    @sn suffix | Noises and onomatopoeic forms | br+rr@sn
    @gl suffix | Nonsense words | nwci+nwcs@gl
    @l suffix | Letter from the alphabet | s@l
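
    As an illustration of how the bracketed codes in the summary might be picked out of an utterance, here is a minimal Python sketch. The function names bracket_codes and has_retrace are hypothetical, and the pattern assumes that square brackets are never nested.

```python
import re

# A bracketed code such as [?], [/], [//], [!], [% 2 sill] or [= tractor].
CODE = re.compile(r"\[([^\[\]]*)\]")


def bracket_codes(utterance):
    """Return the contents of every [...] code in an utterance, in order."""
    return [m.group(1) for m in CODE.finditer(utterance)]


def has_retrace(utterance):
    """True if the utterance contains [/] (repetition) or [//]
    (repetition with change)."""
    return any(code in ("/", "//") for code in bracket_codes(utterance))


print(bracket_codes("tlacdol [= tractor]"))   # ['= tractor']
print(bracket_codes("xxx [% 2 sill]"))        # ['% 2 sill']
print(has_retrace("fi [//] ti sy 'n mynd"))   # True
print(has_retrace("arian [?]"))               # False
```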

    Personal names, local place-names, and local places-of-work have been made anonymous by using random nonsense-strings of letters: all begin with an initial capital, and the place names have a final 0. The names of public figures, fictional characters, and more distant places have been retained. Making names anonymous loses some information about word-forms, especially about mutations - where they occur - and word-play.

    The children produced many noises while playing, and some attempt has been made to transcribe these, although the transcriptions are not intended to capture the phonetic details. They have the suffix @sn. Nonsense forms, in word-play for instance, have the suffix @gl. Both are declared in the 00depadd.cut file.
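
    A small Python sketch shows how the suffixes could be used to separate noises and nonsense forms from ordinary words. The name classify_form and the category labels are assumptions made for the example; the @l suffix is included from the summary of conventions.

```python
# Suffixes from the conventions summary and the kind of form each one marks.
SUFFIXES = {
    "@sn": "noise",
    "@gl": "nonsense",
    "@l": "letter",
}


def classify_form(token):
    """Return (form, kind), where kind is 'noise', 'nonsense', 'letter',
    or 'word' when the token carries none of the suffixes."""
    for suffix, kind in SUFFIXES.items():
        if token.endswith(suffix):
            return (token[: -len(suffix)], kind)
    return (token, "word")


print(classify_form("br+rr@sn"))       # ('br+rr', 'noise')
print(classify_form("nwci+nwcs@gl"))   # ('nwci+nwcs', 'nonsense')
print(classify_form("tywod"))          # ('tywod', 'word')
```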

    English is also spoken by various children to different degrees in the database. Single English words - either by themselves or within a Welsh utterance - are not marked. But phrases or sentences of English words are enclosed in scope symbols < ... >, and are followed by the comment [% Saesneg] - 'Saesneg' being the Welsh word for 'English'.

    Similarly, phrases and sentences which are from songs, nursery rhymes, and similar material are enclosed within < ... > and are followed by the comment [% ca:n] - 'ca:n' (or 'cân', to use the circumflex - see below) is the Welsh for 'song'.
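
    The scope-plus-comment pattern used for English and for song material can be sketched as below. This is only an illustration: scoped_phrases is a hypothetical helper, and accepting both the ASCII < > and the typographic ‹ › brackets is an assumption about how a given copy of the files may be encoded.

```python
import re

# A scoped phrase followed by a [% ...] comment. Both ASCII < > and the
# typographic variants are accepted (an assumption, see the note above).
SCOPED = re.compile(r"[<‹]([^<>‹›]+)[>›]\s*\[%\s*([^\]]+)\]")


def scoped_phrases(utterance, label):
    """Return the scoped phrases in an utterance whose comment equals label."""
    return [
        phrase.strip()
        for phrase, comment in SCOPED.findall(utterance)
        if comment.strip() == label
    ]


print(scoped_phrases("welish i <big christmas tree> [% Saesneg]", "Saesneg"))
# ['big christmas tree']
print(scoped_phrases("<dau gi bach yn mynd i 'r coed> [% ca:n]", "ca:n"))
# ["dau gi bach yn mynd i 'r coed"]
```

    Single English words are not marked in the transcripts, so a search of this kind finds only phrases and sentences.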

    Unfinished words (that is, fragments and not shortened words) are indicated by an initial &.

    There are many homonyms, many of which come about through phonological processes of elision and assimilation in spontaneous speech. Digits and the apostrophe are used to distinguish different word-forms which otherwise have the same spelling. The lexicon gives the lexeme to which they belong. The apostrophe is declared in the 00depadd.cut file to cater for word-initial occurrences.
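
    The digit convention can be illustrated with a short Python sketch that separates a word-form from its distinguishing digit. The helper name split_homonym is hypothetical. Capitalised tokens are left untouched on the assumption that a final 0 there marks an anonymised place name and not a homonym index.

```python
def split_homonym(token):
    """Split a word-form into (spelling, index). The index is None when
    the token carries no distinguishing digit."""
    # Assumption: capitalised tokens are names, and a final 0 on a name
    # marks an anonymised place, so names are returned unchanged.
    if token[:1].isupper():
        return (token, None)
    spelling = token.rstrip("0123456789")
    digits = token[len(spelling):]
    return (spelling, int(digits) if digits else None)


print(split_homonym("yn1"))     # ('yn', 1)
print(split_homonym("'na2"))    # ("'na", 2)
print(split_homonym("na"))      # ('na', None)
print(split_homonym("Lmno0"))   # ('Lmno0', None)
```

    Mapping each (spelling, index) pair to its lexeme still requires the lexicon, which this sketch does not model.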

    In spontaneous speech, patterns of a Welsh copula followed by a personal subject pronoun occur as a pronoun only. Such pronouns are indicated by a final apostrophe. There are instances, mainly of directive-like utterances within the context of a game, where it is not entirely clear what the pattern is. But these instances have likewise been given a final apostrophe.

    Welsh orthography contains circumflexed letters: 'âêîô' and also circumflexed 'w' and 'y' ('ŵ' and 'ŷ'), for which there is no ASCII provision. Circumflexed letters are not stable across different applications, as is well known. Consequently, they are represented as 'a: e: i: o:', a convention which can then be conveniently extended to 'w:' and 'y:'. This convention is mainly used where ambiguity would otherwise occur. Welsh also makes limited use of the diaeresis and the acute accent, but it has not been necessary to cater for these separately.
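
    For readers who want to display transcribed words in standard orthography, the colon convention can be reversed with a sketch like the following. The name restore_circumflex is hypothetical. The function is meant for individual words and not whole lines, because a speaker prefix such as *TRY: also ends in a letter followed by a colon.

```python
import re
import unicodedata

# One of the vowel letters named in the text, followed by the colon marker.
LONG_VOWEL = re.compile(r"([aeiowyAEIOWY]):")


def restore_circumflex(word):
    """Rewrite a form such as 'ta:n' with a circumflexed letter."""
    # U+0302 is the combining circumflex; NFC normalisation folds the
    # vowel and the combining mark into a single precomposed letter.
    with_marks = LONG_VOWEL.sub(lambda m: m.group(1) + "\u0302", word)
    return unicodedata.normalize("NFC", with_marks)


print(restore_circumflex("ta:n"))   # tân
print(restore_circumflex("sw:n"))   # sŵn
print(restore_circumflex("mwy"))    # mwy
```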

    The data files contain utterances by children and adults. The former are identified as Target_child or Child on the @Participants header line in the data files; the latter are identified as Investigator or Teacher. The utterances of the adults have been transcribed in full, but not as painstakingly as those of the children; in particular, homonyms have not all been disambiguated through transcription.

    Example of a transcription

    *HEL:mwy.
    *HEL:mwy.
    *HEL:'na2 ni!
    *HEL:'ei [= chwerthin].
    *HEL:'anna.
    *TRY:nagi.
    *HEL:na.
    *TRY:Heledd, na' i gal yr un melyn, 'de.
    *TRY:gei di gal yr un glas.
    @Comment:sw:n chwarae.
    *TRY:gei di 'm+ond rhyi [: rhoi] dwy [?].
    *TRY:ymm, nei di ryid, ymm +...
    *HEL:heina?
    *HEL:hwn.
    *TRY:ia.
    *TRY:&n [/] na.
    *TRY:na, ryid tywod i+mewn # efo fi.
    *HEL:naf.
    *TRY:xxx [% 2 sill].
    *TRY:<un cloc> [?]
    *TRY:<xxx [% 3 sill]> [>].
    *HEL:<dw i> [<] +...