
Linguistic Annotations: A Review

This document deals with linguistic annotation as applied in computer science, discussing how important it is to express information about a language resource and what instruments IT provides for this purpose. It reviews some practical examples of linguistic annotation, focusing in particular on the work of ISO, which has formed a working group [WitBro2002] to discuss the issues of linguistic annotation, analyse existing solutions and define a standard for annotating resources.
Language is the way in which a living being expresses itself. Whether it takes the form of movement, gesture, sound, call or sign, language gives a being the ability to communicate, that is, to enter into relation with the world that surrounds it, from earth and sky to other living creatures.
Everything that lives communicates, from the smallest cell to the largest animal. A cell can move or contract; a bee dances to show fellow bees how to find a flower; dolphins and whales use ultrasound to locate food and threats; many male animals use particular behaviours to court females. Lack of communication is almost a synonym for death, so the need to use a language is as old as life on Earth.
Whatever language we consider, one feature is always present: to be understandable, communication between living beings of the same kind must use a common code, accepted and adopted by everyone, that stably associates a particular sign (gesture, call, writing, etc.) with a precise meaning. For example, a cat arches its back and raises its fur to defy an enemy, and a dog wags its tail when it is happy.
The need to share a common code of communication, written or spoken, passed on to the next generations genetically or through learning, is a way of expressing information about a linguistic content (or language resource, as it will be called in this document), that is, a linguistic annotation.
Humans have developed a great variety of languages, starting from gestures in their early existence on Earth, followed by spoken words and finally written content. Writing, in turn, originated as a way to express ideas with signs, and then took a syllabic/phonetic approach, focusing more on the description of sounds than on their meaning. Meanwhile, humans settled across the entire planet, without easy communication between people so widely spread. This situation favoured the differentiation of languages, but when technological progress made communication easier again, the need for interoperability between languages arose.
It became necessary to translate words and phrases from one language to another, but to do this the structure of the text had to be identified. This was the first linguistic annotation: a classification of the parts of discourse based on their logical or grammatical structure. This is the approach teachers typically use to present a language to their students.
Another approach is that taken by those who study the pronunciation of words, trying to find a common code to express the sounds the voice makes when saying a word. In this way they created a phonetic alphabet suitable for the pronunciation of every natural language spoken by humans.
The last and most versatile approach focuses on semantics: the information the linguistic annotation carries now concerns the meaning of the resource it annotates. We could tell when and where a certain quote was said, and who said it, or we could summarize that quote, pointing out the key information. We could find relations between that quote and other quotes by the same speaker, about the same content, or expressing the same opinion, for further research. This is the typical work of journalists.
Technological progress extended the categorization of language resources from writing to audio and video, and the advent of computers, although somewhat later, increased their accessibility and usability. In recent years, attempts have also been made to automate annotation, especially with the growth of mark-up languages such as HTML and XML, which are well suited to this purpose. Recently, ISO recognized the importance of annotations and created a working group to discuss these issues and work out a standard solution.
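As a rough illustration of how a mark-up language can carry such annotations, the following Python sketch builds a small XML fragment in which a quote is annotated with semantic information (who said it and when) and each word with its grammatical category. The element names (quote, w), the attribute names (speaker, date, pos) and all values are hypothetical, chosen only for this example; they do not come from the thesis or from any ISO scheme.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation scheme: <quote> carries semantic metadata,
# each <w> (word) carries a grammatical category in its "pos" attribute.
quote = ET.Element("quote", {"speaker": "speaker-1", "date": "unknown"})
for word, pos in [("Dogs", "noun"), ("bark", "verb")]:
    token = ET.SubElement(quote, "w", {"pos": pos})
    token.text = word

# Serialize the annotated resource to an XML string.
xml_text = ET.tostring(quote, encoding="unicode")
print(xml_text)

# Read the annotations back: select every word tagged as a verb.
parsed = ET.fromstring(xml_text)
verbs = [w.text for w in parsed.iter("w") if w.get("pos") == "verb"]
print(verbs)
```

The same resource could equally be annotated with phonetic information by adding a further attribute to each word, which is one reason why attribute-based mark-up is convenient for combining the approaches described above.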
The first chapter introduces the problem and summarizes the rest of the document. The second chapter discusses the general aspects of linguistic annotation, in particular the different approaches to annotation and the instruments most commonly used to implement them on computer systems. Chapter 3 shows some examples of computer-based linguistic annotation, chosen to cover all kinds of approach and implementation. Chapter 4 focuses on the standardization process for language resources, and especially on the work of ISO. Chapter 5 draws conclusions.


First-level degree (Laurea)

Faculty: Mathematical, Physical and Natural Sciences

Author: Mattia Gentilini

101 pages.

 

This thesis has received 518 clicks since 20/03/2004.

Available in PDF; it can be consulted in digital format only.