A. Golan, G. Judge, and J. Perloff. Journal of the American Statistical Association, 91(434): 841-853 (June 1996).

The classical maximum entropy (ME) approach to estimating the unknown parameters of a multinomial discrete choice problem, which is equivalent to the maximum likelihood multinomial logit (ML) estimator, is generalized. The generalized maximum entropy (GME) model includes noise terms in the multinomial information constraints. Each noise term is modeled as the mean of a finite set of a priori known points in the interval [-1, 1] with unknown probabilities, and no parametric assumptions about the error distribution are made. A GME model for the multinomial probabilities and for the distributions associated with the noise terms is derived by maximizing the joint entropy of the multinomial and noise distributions, under the assumption of independence. The GME formulation reduces to the ME in the limit as the sample grows large or when no noise is included in the entropy maximization. Further, even though the GME and the logit estimators are conceptually different, the dual GME model is related to a generalized class of logit models. In finite samples, modeling the noise terms along with the multinomial probabilities results in a gain in efficiency of the GME over the ME and ML. In addition to analytical results, some extensions, sampling experiments, and an example based on real-world data are presented.
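The noise parameterization in the abstract (a noise value written as the probability-weighted mean of known support points, with the weights chosen by maximum entropy) can be illustrated in isolation. The sketch below is not the authors' GME estimator: it assumes the noise value `e` is already fixed and only recovers the maximum-entropy weights on a hypothetical support `v`. The function name `maxent_noise_weights` and the three-point support `[-1, 0, 1]` are illustrative assumptions. It uses the standard result that the entropy-maximizing distribution under a single mean constraint has the exponential form w_m ∝ exp(λ v_m), and solves for λ by bisection.

```python
import math


def maxent_noise_weights(v, e, tol=1e-12):
    """Maximum-entropy probabilities w on support points v with sum(w * v) == e.

    Hypothetical helper for illustration only (not from the paper). The
    solution has the form w_m proportional to exp(lam * v_m); the implied
    mean is strictly increasing in lam, so lam is found by bisection.
    """
    if not (min(v) < e < max(v)):
        raise ValueError("e must lie strictly inside the range of the support")

    def weights(lam):
        # Subtract the largest exponent before exponentiating for stability.
        s = [lam * vm for vm in v]
        top = max(s)
        ex = [math.exp(si - top) for si in s]
        z = sum(ex)
        return [x / z for x in ex]

    def mean(lam):
        return sum(wm * vm for wm, vm in zip(weights(lam), v))

    # Expand the bracket [lo, hi] until it contains the root mean(lam) == e.
    lo, hi = -1.0, 1.0
    while mean(lo) > e:
        lo *= 2.0
    while mean(hi) < e:
        hi *= 2.0

    # Bisection on lam.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean(mid) < e:
            lo = mid
        else:
            hi = mid
    return weights(0.5 * (lo + hi))


if __name__ == "__main__":
    support = [-1.0, 0.0, 1.0]  # assumed support points in [-1, 1]
    w = maxent_noise_weights(support, 0.5)
    print([round(x, 4) for x in w])
```

For a noise value of 0.5 on the support (-1, 0, 1), the weights come out near (0.116, 0.268, 0.616): the closed form here is e^λ = (1 + √13)/2. With a noise value of 0 the weights are uniform, which matches the abstract's remark that the formulation collapses to plain ME when no noise enters the problem. In the full GME model the noise weights are instead estimated jointly with the multinomial probabilities subject to the data constraints.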
A. Geronimus, J. Bound, and L. Neidert. Journal of the American Statistical Association, 91(434): 529-537 (June 1996).

Investigators of social differentials in health outcomes commonly augment incomplete microdata by appending socioeconomic characteristics of residential areas (such as median income in a zip code) to proxy for individual characteristics. But little empirical attention has been paid to how well this aggregate information serves as a proxy for the individual characteristics of interest. We build on recent work addressing the biases inherent in proxies and consider two health-related examples within a statistical framework that illuminates the nature and sources of these biases. Data from the Panel Study of Income Dynamics and the National Maternal and Infant Health Survey are linked to census data. We assess the validity of using the aggregate census information as a proxy for individual information when estimating main effects and when controlling for potential confounding between socioeconomic and sociodemographic factors in measures of general health status and infant mortality. We find a general, but not universal, tendency for aggregate proxies to exaggerate the effects of micro-level variables and to do more poorly than micro-level variables at controlling for confounding. The magnitude and direction of these biases vary across samples, however. Our statistical framework and empirical findings point to the difficulties in, and limits to, interpreting proxies derived from aggregate census data as if they were micro-level variables. The statistical framework that we outline for our study of health outcomes should be generally applicable to other situations where researchers have merged aggregate data with microdata samples.