Statistical Computing '99 - Schloß Reisensburg

A Semiparametric Approach to Analysis of Covariance

Michael G. Schimek
Karl-Franzens-Universität Graz
Institut für Medizinische Informatik, Statistik und Dokumentation
Engelgasse 13, A-8010 Graz, Austria

We discuss a partially linear (i.e. semiparametric) regression model whose predictor function consists of a categorical parametric component and a metric nonparametric component. We assume that responses $y_1,\ldots,y_n$ are obtained at non-stochastic values $t_1,\ldots,t_n$ of a covariate $t$. We consider the model

\begin{displaymath}y_i = {\bf u}_i^T\gamma + f(t_i) + \epsilon_i\end{displaymath}

for $i=1,\ldots,n$, where ${\bf u}_1,\ldots,{\bf u}_n$ are known $k$-dimensional vectors forming a design matrix ${\bf U}$, $\gamma$ is an unknown parameter vector, $f$ is an unknown smooth spline function, and the errors $\epsilon_1,\ldots,\epsilon_n$ are independent, zero-mean random variables with common variance $\sigma^2$. When the design matrix characterizes $k$ treatment conditions, this setting can be interpreted as an analysis of covariance. In matrix notation, our semiparametric model takes the form

\begin{displaymath}{\bf y}={\bf U}\gamma+{\bf f}+\epsilon\end{displaymath}

where ${\bf y}=(y_1,\ldots,y_n)^T$, ${\bf U}^T=[{\bf u}_1,\ldots,
{\bf u}_n]$, ${\bf f}=(f(t_1),\ldots, f(t_n))^T$ and $\epsilon=
(\epsilon_1,\ldots,\epsilon_n)^T$. The equation of (linear) analysis of covariance is

\begin{displaymath}{\bf y}={\bf X}\mu+{\bf Z}\beta+\epsilon\end{displaymath}

and has the same structure as that of the partially linear model. ${\bf X}$ is an $n \times k$ design matrix and ${\bf Z}$ is an $n \times l$ matrix of covariate measurements; $\mu$ and $\beta$ are the corresponding unknown parameter vectors. Further, it is assumed that ${\bf X}$ and ${\bf Z}$ are of full rank and uncorrelated. The semiparametric approach allows these assumptions to be relaxed and has the additional advantage that the relation between $y$ and $t$ can be arbitrary, subject only to certain smoothness requirements.
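
To make the partially linear model concrete, the following Python sketch simulates data from it and fits $\gamma$ and $f$ by penalized least squares with a cubic smoothing spline penalty. It is a generic partial-spline estimator and is not claimed to be the algorithm of Schimek (1999). The function names (`spline_penalty`, `fit_partial_spline`), the simulated data and the value of `lam` are illustrative assumptions. One treatment indicator is dropped so that the baseline level is absorbed into $f$, because the spline penalty leaves constants unpenalized.

```python
import numpy as np


def spline_penalty(t):
    """Penalty matrix K = Q R^{-1} Q^T of a natural cubic spline with knots t.

    Assumes t is sorted and has no ties. For a vector f of function values,
    f^T K f is the integrated squared second derivative of the natural
    cubic spline interpolating f at t.
    """
    t = np.asarray(t, dtype=float)
    n = t.size
    h = np.diff(t)                       # h[i] = t[i+1] - t[i]
    Q = np.zeros((n, n - 2))
    R = np.zeros((n - 2, n - 2))
    for j in range(n - 2):               # column j belongs to interior knot t[j+1]
        Q[j, j] = 1.0 / h[j]
        Q[j + 1, j] = -1.0 / h[j] - 1.0 / h[j + 1]
        Q[j + 2, j] = 1.0 / h[j + 1]
        R[j, j] = (h[j] + h[j + 1]) / 3.0
        if j + 1 < n - 2:
            R[j, j + 1] = R[j + 1, j] = h[j + 1] / 6.0
    return Q @ np.linalg.solve(R, Q.T)


def fit_partial_spline(y, U, t, lam):
    """Minimize ||y - U gamma - f||^2 + lam * f^T K f over gamma and f."""
    n = y.size
    I = np.eye(n)
    S = np.linalg.inv(I + lam * spline_penalty(t))   # smoother matrix
    A = U.T @ (I - S)
    gamma = np.linalg.solve(A @ U, A @ y)            # parametric part
    f = S @ (y - U @ gamma)                          # nonparametric part
    return gamma, f


# Simulated example (all values below are assumptions for illustration).
rng = np.random.default_rng(0)
n = 120
t = np.linspace(0.0, 1.0, n)
group = np.arange(n) % 3                              # three treatment conditions
U = np.column_stack([group == 1, group == 2]).astype(float)  # group 0 is baseline
gamma_true = np.array([1.0, -0.5])
f_true = np.sin(2.0 * np.pi * t)
y = U @ gamma_true + f_true + 0.2 * rng.standard_normal(n)

gamma_hat, f_hat = fit_partial_spline(y, U, t, lam=1e-3)
print("estimated treatment contrasts:", gamma_hat)
```

The dense matrix inverse keeps the sketch short; for large $n$ a banded solver would be the more economical choice.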

Schimek (1999) has proposed a general cubic spline-based algorithm for partially linear models which also allows for testing of the treatment conditions. As might be expected, the choice of the smoothing parameter is crucial. We recommend an unbiased risk criterion introduced by Eubank et al. (1998). The connection to so-called nonparametric analysis of covariance (Quade, 1982) is pointed out. Finally, a real-data example is given.
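
Why the smoothing parameter matters can be illustrated with a Mallows-type unbiased risk estimate evaluated over a grid of candidate values. This criterion is used here as a stand-in and is not claimed to be the specific one of Eubank et al. (1998). The sketch assumes a generic penalized least squares fit of the partially linear model, an error variance estimated from an undersmoothed pilot fit, and simulated data. All function names and numerical settings are hypothetical.

```python
import numpy as np


def spline_penalty(t):
    """Natural cubic spline penalty matrix K = Q R^{-1} Q^T (t sorted, no ties)."""
    t = np.asarray(t, dtype=float)
    n = t.size
    h = np.diff(t)
    Q = np.zeros((n, n - 2))
    R = np.zeros((n - 2, n - 2))
    for j in range(n - 2):
        Q[j, j] = 1.0 / h[j]
        Q[j + 1, j] = -1.0 / h[j] - 1.0 / h[j + 1]
        Q[j + 2, j] = 1.0 / h[j + 1]
        R[j, j] = (h[j] + h[j + 1]) / 3.0
        if j + 1 < n - 2:
            R[j, j + 1] = R[j + 1, j] = h[j + 1] / 6.0
    return Q @ np.linalg.solve(R, Q.T)


def hat_matrix(U, K, lam):
    """Matrix H with fitted values U gamma_hat + f_hat = H y."""
    n = K.shape[0]
    I = np.eye(n)
    S = np.linalg.inv(I + lam * K)       # spline smoother matrix
    A = U.T @ (I - S)
    B = np.linalg.solve(A @ U, A)        # gamma_hat = B y
    return S + (I - S) @ U @ B


def unbiased_risk(y, H, sigma2):
    """Mallows-type risk estimate: RSS/n + 2 sigma^2 tr(H)/n - sigma^2."""
    n = y.size
    rss = np.sum((y - H @ y) ** 2)
    return rss / n + 2.0 * sigma2 * np.trace(H) / n - sigma2


# Simulated example (assumed values, for illustration only).
rng = np.random.default_rng(1)
n = 120
t = np.linspace(0.0, 1.0, n)
group = np.arange(n) % 3
U = np.column_stack([group == 1, group == 2]).astype(float)
y = U @ np.array([1.0, -0.5]) + np.sin(2.0 * np.pi * t) + 0.2 * rng.standard_normal(n)

K = spline_penalty(t)
lams = np.logspace(-6, 1, 15)            # candidate smoothing parameters

# Assumption: variance estimated from the least-smoothed (pilot) fit.
H0 = hat_matrix(U, K, lams[0])
sigma2_hat = np.sum((y - H0 @ y) ** 2) / (n - np.trace(H0))

risks = np.array([unbiased_risk(y, hat_matrix(U, K, lam), sigma2_hat) for lam in lams])
lam_best = lams[np.argmin(risks)]
print("selected smoothing parameter:", lam_best)
```

Too large a value forces $f$ towards a straight line and inflates the estimated risk through bias, while too small a value lets the fit follow the noise.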


31. Statistical Computing '99