
Mixture models for count data: the ZIP model

This thesis, set within the framework of Generalized Linear Models (GLM), consists of a theoretical part followed by an application to the problem of excess zeros in an insurance dataset kindly provided by ANIA. The dataset is modeled with a Zero-Inflated Poisson (ZIP) model. Before the application, carried out with the statistical software R, the thesis gives an overview of mixture models. It then formalizes the EM algorithm and extends the discussion to a formalization of the ZIP model estimated via the EM algorithm.

In Chapter 1, I introduce mixture models, describing the origin, interpretation and issues of this class of models. The chapter ends with an example of a mixture of two normal distributions fitted to real data from the Old Faithful geyser. Chapter 2 is devoted to the formalization of the basic EM algorithm and also ends with an example, on transmission tomography. In Chapter 3, I give a brief introduction to the ZIP model, revisiting the themes of Chapters 1 and 2 with reference to this model: I formalize maximum likelihood estimation via the EM algorithm and show how to fit a Zero-Inflated Poisson model in R. Finally, in Chapter 4, I introduce my data, carry out a descriptive analysis that includes a 3D plot, and fit the ZIP model.

Preface

In the context of inference and statistical models, the basic tools available are often not enough to describe the real phenomenon we are trying to model with a common statistical model. To overcome specification problems and the limitations of basic models, a special class of models has been formulated: mixture models. Mixture models have gained increasing popularity in many fields of science. Their main feature is that commonly used density functions are employed as building blocks for more complex distributions; this allows great flexibility in statistical modeling and makes mixture models suitable for very complicated settings. In this work, I also highlight some of the reasons that lead to fitting a mixture model, such as the presence of k unobserved subpopulations (see Bruce G. Lindsay (1995), "Mixture Models: Theory, Geometry and Applications"). Even when there is no reason to believe that a latent structure affects the data-generating process, a mixture model can be fitted with the aim of exploiting its flexibility, as pointed out in McLachlan and Peel (2000). As Lindsay (1995) states, a mixture model is useful in two ways: on the one hand, it makes it possible to study the distribution of the outcome response (Y) when a covariate (Z) is missing; on the other hand, it uses a surrogate measure (Y) to learn about an unobserved variable (Z). Unfortunately, estimating the parameters of a mixture model presents a number of obstacles: first, model identification is not guaranteed; second, the estimates are sensitive to the starting values used for the optimization algorithm. In this work I briefly review some of these problems. Further, I examine in depth the EM algorithm, which allows one to overcome these estimation problems, reporting step by step the whole formalization of that iterative process.
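For reference, the building-block idea can be written as a finite mixture density. The notation below (k components, weights \pi_j, component densities f_j with parameters \theta_j, and \Psi collecting all parameters) is the usual textbook one and is assumed here, not taken from the thesis:

```latex
f(y;\Psi) = \sum_{j=1}^{k} \pi_j \, f_j(y;\theta_j),
\qquad \pi_j \ge 0, \quad \sum_{j=1}^{k} \pi_j = 1 .
```

In this reading, each unobserved subpopulation corresponds to one component f_j, and \pi_j is the proportion of the population that belongs to it.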
Finally, I study a very particular mixture model, the Zero-Inflated Poisson (ZIP) model, which helps when the data contain an excess number of zeros. (Other mixture models help with count variables when the classical Poisson regression model is inadequate because the data exhibit over- or underdispersion; in this case too we can use the ZIP model or the Negative Binomial, which is another type of mixture model.) I also fit the ZIP model to real data with the statistical software R. The real data used for the ZIP application are a dataset of 133707 observations on three variables, provided to me by ANIA (National Association of Insurance Companies) after a very long application procedure.
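As an illustration of how EM applies to a ZIP model, here is a minimal sketch in Python (the thesis itself works in R, and its code is not reproduced here). It assumes the simplest case with no covariates: a count is a structural zero with probability pi and otherwise Poisson with mean lam. The function names `fit_zip_em` and `sample_zip`, the starting values, and the stopping rule are all hypothetical choices made for this example.

```python
import math
import random


def sample_zip(n, pi, lam, seed=0):
    """Draw n counts from an intercept-only ZIP(pi, lam) model (illustrative helper)."""
    rng = random.Random(seed)
    limit = math.exp(-lam)
    counts = []
    for _ in range(n):
        if rng.random() < pi:
            counts.append(0)  # structural zero
        else:
            # Knuth's multiplication method for a Poisson(lam) draw
            k, p = 0, rng.random()
            while p > limit:
                k += 1
                p *= rng.random()
            counts.append(k)
    return counts


def fit_zip_em(counts, n_iter=500, tol=1e-10):
    """Estimate (pi, lam) of an intercept-only ZIP model by EM.

    Assumed model: P(Y=0) = pi + (1-pi)*exp(-lam),
                   P(Y=y) = (1-pi)*exp(-lam)*lam**y / y!  for y >= 1.
    """
    n = len(counts)
    n_zero = sum(1 for y in counts if y == 0)
    total = sum(counts)
    # Starting values (an arbitrary choice; EM can be sensitive to them)
    pi = 0.5 * n_zero / n
    lam = max(total / n, 1e-8)
    for _ in range(n_iter):
        # E-step: posterior probability that an observed zero is structural
        # (observations with y > 0 have probability 0 of being structural)
        z0 = pi / (pi + (1.0 - pi) * math.exp(-lam))
        # M-step: weighted maximum likelihood updates
        new_pi = n_zero * z0 / n
        new_lam = total / (n - n_zero * z0)
        converged = abs(new_pi - pi) + abs(new_lam - lam) < tol
        pi, lam = new_pi, new_lam
        if converged:
            break
    return pi, lam


if __name__ == "__main__":
    data = sample_zip(20000, pi=0.3, lam=2.0, seed=1)
    print(fit_zip_em(data))
```

Because only the zeros are ambiguous, the E-step reduces to a single number shared by all zero observations, which keeps each iteration cheap. A regression version with covariates would replace the two closed-form updates with weighted logistic and Poisson fits.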

First-level degree (Laurea, Bachelor's level)

Faculty: Statistical and Actuarial Sciences (Scienze Statistiche ed Attuariali)

Author: Leonardo Affinito

Length: 34 pages.

This thesis has received 58 clicks since 13/05/2014.

Available as a PDF; it can be consulted in digital format only.