
Syntactic analysis and tagging of a spoken corpus

Speech and writing differ in a number of respects: prosodic, social and, last but not least, syntactic. As far as syntax is concerned, research shows that the reference units of spontaneous speech are clauses rather than sentences, and that the boundaries between these units are far from easy to recognize. Voghera (1992) identifies three interrelated criteria for recognizing and delimiting clauses or sentences in speech: predication, autonomy and intonation. Moreover, in speech, coordinating or subordinating conjunctions may take on a prosodic status instead of a grammatical function inside the clause, and thus come to resemble discourse markers.
The syntactic tagging system AN.ANA.S was therefore created to label any type of text (spoken or written) using the XML standard. AN.ANA.S relies on XGATE, a software tool for manual tagging, together with a DTD, the set of grammar rules for the tagging process, which gives the text a tree structure. The DTD is composed of a series of elements, each representing a node of the syntactic tree. Every element has a tag, which can carry a series of attributes. Tagging consists of assigning a hierarchical role to a portion of text according to its syntactic level, and then describing it with its tag and the related attributes.
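The idea of elements, tags and attributes forming a syntactic tree can be illustrated with a minimal Python sketch. The element and attribute names used here (`text`, `clause`, `phrase`, `marker`, `mode`, `type`, `function`, `kind`) are hypothetical and chosen only for illustration; they are not taken from the actual AN.ANA.S DTD, and the sketch uses Python's standard XML parser rather than XGATE.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element and attribute names are illustrative only,
# not taken from the actual AN.ANA.S DTD.
SAMPLE = """
<text mode="spoken">
  <clause type="main">
    <phrase function="subject">the speaker</phrase>
    <phrase function="predicate">hesitates</phrase>
  </clause>
  <marker kind="discourse">well</marker>
</text>
"""


def outline(node, depth=0):
    """Return one indented line per node: the tag name plus its attributes."""
    attrs = " ".join(f'{key}="{value}"' for key, value in node.attrib.items())
    lines = ["  " * depth + f"{node.tag} {attrs}".rstrip()]
    for child in node:
        # Each child element is one level deeper in the syntactic tree.
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(SAMPLE)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

Each printed line corresponds to one node of the tree, indented by its depth, so the hierarchical role of every portion of text is visible alongside the attributes that describe it.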
As far as speech is concerned, the DTD is designed to take its features into account (e.g. the importance of the clause level, discourse markers, false starts, hesitations, etc.). However, the relationship between spoken language, which is always new and unrepeatable because of its ties to the varied contexts of utterance, and software, which demands a certain degree of rigour, is not unproblematic.
During the tagging process (carried out on a corpus of eight spoken texts from radio or TV), several problems emerged: syntactic problems, such as the boundaries between clauses, or the treatment of conjunctions with a pragmatic function and of particular noun modifiers; problems with the structure of the DTD, such as the treatment of inchoative verbs; and other problems, such as the recognition of multi-word expressions, or the treatment of interruptions (retrace-and-repair sequences).
In conclusion, although this work deals exclusively with the tagging process and its problems, the aim of AN.ANA.S is to study the structures of the language (both spoken and written) by querying the database created by the software for the percentage of each feature in the tagged corpus (e.g. unexpressed subjects).
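The kind of query described above, the percentage of a feature such as unexpressed subjects, can be sketched in Python under stated assumptions. The markup below (a `clause` element with a `subject` attribute) and the function `feature_percentage` are hypothetical; the real AN.ANA.S database and its query interface may look quite different.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagged corpus: the "clause" element and its "subject"
# attribute are assumptions made for this example only.
SAMPLE = """
<text>
  <clause subject="expressed">Maria parla</clause>
  <clause subject="unexpressed">parla</clause>
  <clause subject="unexpressed">arriva domani</clause>
  <clause subject="expressed">il treno parte</clause>
</text>
"""


def feature_percentage(root, tag, attribute, value):
    """Percentage of <tag> elements whose attribute equals the given value."""
    elements = list(root.iter(tag))
    if not elements:
        # No elements of this kind in the corpus: report 0 rather than divide by zero.
        return 0.0
    matching = sum(1 for element in elements if element.get(attribute) == value)
    return 100.0 * matching / len(elements)


root = ET.fromstring(SAMPLE)
share = feature_percentage(root, "clause", "subject", "unexpressed")
print(f"{share:.1f}% of clauses have an unexpressed subject")
```

The same function could be called with any other tag, attribute and value, which is why the feature is passed in as parameters rather than hard-coded.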


First-level degree (Laurea liv. I)

Faculty: Foreign Languages and Literatures (Lingue e Letterature Straniere)

Author: Annamaria Landolfi

Length: 111 pages.

 

This thesis has received 1424 clicks since 03/07/2007.

Available as a PDF; it can be consulted in digital format only.