Lexical features of political discourse: a corpus-based analysis of speeches about the European Union

Corpora can contain the total population of texts that we want to analyse, e.g. all the editions of a newspaper or all the publications of an author. These are called full-text corpora and differ from sample-text corpora, which instead represent only a sample of the population, containing for example a definite number of reports on economics. Texts are sampled according to specific criteria, with the researcher bearing specific objectives in mind, such as investigating particular characteristics of a genre (see Kennedy 1998: 19-23 for a more detailed description of the different types of corpora).

c) Representativeness

Representativeness refers to the degree to which the language sample contained in a corpus represents the entire language population from which it is drawn. In other words, the greater the quantity of material representing a language population in the corpus, the higher the probability that the whole population is properly represented in that corpus. A corpus therefore needs to be representative if it is to serve as a basis for generalisations about a language.

The concept of representativeness changes depending on whether we refer to large corpora or to small ones. Since the aim of large corpora is to represent a language population as a whole, they have to contain the greatest possible quantity and variety of texts belonging to that language. Small corpora, by contrast, usually aim to represent only a particular aspect of a language population, so they must contain texts of the same type and with the same characteristics.

According to Chomsky’s criticism, achieving appropriate representativeness of a language in a corpus would be a very difficult, if not impossible, task. This is because quantitative analysis entails risks in generalising a phenomenon found in a sample to some larger population.
On the other hand, as McEnery & Wilson (1996: 78) also point out, if such criticism were accepted, it would have to be applied not only to language corpora but to any form of scientific investigation that is based on sampling rather than on the exhaustive analysis of an entire, finite population. On the basis of these considerations, I agree with McEnery & Wilson (1996) that Chomsky’s criticism should not be taken so drastically, since corpus linguists have developed many safeguards and methods which may be applied in sampling for the maximal possible

