R scripts: hierarchical1.R

Mlxtran codes: model/hierarchical1a.txt ; model/hierarchical1b.txt

$$\newcommand{\esp}[1]{\mathbb{E}\left(#1\right)} \newcommand{\var}[1]{\mbox{Var}\left(#1\right)} \newcommand{\deriv}[1]{\dot{#1}(t)} \newcommand{\prob}[1]{ \mathbb{P}\!(#1)} \newcommand{\eqdef}{\mathop{=}\limits^{\mathrm{def}}} \newcommand{\by}{\boldsymbol{y}} \newcommand{\bc}{\boldsymbol{c}} \newcommand{\bpsi}{\boldsymbol{\psi}} \def\pmacro{\texttt{p}} \def\like{{\cal L}} \def\llike{{\cal LL}} \def\logit{{\rm logit}} \def\probit{{\rm probit}} \def\one{{\rm 1\!I}} \def\iid{\mathop{\sim}_{\rm i.i.d.}} \def\simh0{\mathop{\sim}_{H_0}} \def\df{\texttt{df}} \def\res{e} \def\xomega{x} \newcommand{\argmin}[1]{{\rm arg}\min_{#1}} \newcommand{\argmax}[1]{{\rm arg}\max_{#1}} \newcommand{\Rset}{\mbox{\mathbb{R}}} \def\param{\theta} \def\setparam{\Theta} \def\xnew{x_{\rm new}} \def\fnew{f_{\rm new}} \def\ynew{y_{\rm new}} \def\nnew{n_{\rm new}} \def\enew{e_{\rm new}} \def\Xnew{X_{\rm new}} \def\hfnew{\widehat{\fnew}} \def\degree{m} \def\nbeta{d} \newcommand{\limite}[1]{\mathop{\longrightarrow}\limits_{#1}} \def\ka{k{\scriptstyle a}} \def\ska{k{\scriptscriptstyle a}} \def\kel{k{\scriptstyle e}} \def\skel{k{\scriptscriptstyle e}} \def\cl{C{\small l}} \def\Tlag{T\hspace{-0.1em}{\scriptstyle lag}} \def\sTlag{T\hspace{-0.07em}{\scriptscriptstyle lag}} \def\Tk{T\hspace{-0.1em}{\scriptstyle k0}} \def\sTk{T\hspace{-0.07em}{\scriptscriptstyle k0}} \def\thalf{t{\scriptstyle 1/2}} \newcommand{\Dphi}[1]{\partial_\pphi #1} \def\asigma{a} \def\pphi{\psi} \newcommand{\stheta}{{\theta^\star}} \newcommand{\htheta}{{\widehat{\theta}}}$$

# 1 Introduction

The population approach means that each individual is represented by the same basic parametric model but not necessarily the exact same parameter values. Thus, individual $$i$$ has parameters $$\psi_i$$. If we consider that individuals are randomly selected from the population, then we can treat $$\psi_i$$ as a random vector of parameters.

The probability distribution of $$\psi_i$$ is a parametric distribution that depends on a vector $$\theta$$ of population parameters and possibly a set of individual covariates $$c_i$$. This dependence can be made clear by writing $$\pmacro(\, \psi_i \,;\theta,c_i)$$ for the pdf of $$\psi_i$$.

In this context, the model for individual $$i$$ is the joint distribution of the longitudinal data $$y_i$$ and the individual parameters $$\psi_i$$: $\pmacro(y_i , \psi_i; \theta, c_i)=\pmacro(y_i | \psi_i ) \, \pmacro(\psi_i; \theta, c_i).$

The model $$\pmacro(y_i | \psi_i )$$ for the longitudinal data is implemented in the section [LONGITUDINAL] and the model $$\pmacro(\psi_i; \theta, c_i)$$ for the individual parameters is implemented in the section [INDIVIDUAL]. Its inputs are the population parameters $$\theta$$ and the individual covariates $$c_i$$.

# 2 Examples

## 2.1 Example 1

Consider the following model for continuous data:

1. $y_{ij} = \frac{100}{V_i}e^{-k \, t_{ij}} + a \, e_{ij}$ where $$e_{ij} \iid {\cal N}(0,1)$$.

Given $$V_i$$, each $$y_{ij}$$ is therefore normally distributed:

1. $y_{ij} | V_i \sim {\cal N} \left( \frac{100}{V_i}e^{-k \, t_{ij}} , a^2 \right) .$

Here, $$V_i$$ is an individual parameter which is assumed to be lognormally distributed:

1. $\log(V_i) \iid {\cal N}\left(\log \left(V_{\rm pop}\right) \ ,\ \omega_V^2\right).$

The other parameters of model (2) are $$k_i=k$$ and $$a_i=a$$. They are fixed in this example: we donâ€™t need to define their probability distributions.

The model therefore consists of the conditional distribution of the longitudinal data $$(y_ij)$$ defined in (2) and the distribution of the individual parameter $$V_i$$ defined in (3).

This model is implemented in hierarchical1a.txt.

[LONGITUDINAL]
input = {V, k, a}

EQUATION:
f = 100/V*exp(-k*t)

DEFINITION:
y = {distribution = normal, prediction = f, sd = a}

;----------------------------------------------
[INDIVIDUAL]
input = {V_pop, omega_V}

DEFINITION:
V = {distribution = lognormal, prediction = V_pop, sd = omega_V}


We will use simulx for sampling one individual parameter $$V_i$$ and one sequence $$y_i=(y_{ij})$$ for one individual $$i=1$$, with some given values of the population parameters $$V_{pop}$$, $$\omega_V$$, $$k$$ and $$a$$

p <- c(V_pop=10, omega_V=0.3, k=0.1, a=0.5)
f <- list(name='f', time=seq(0, 30, by=0.1))
y <- list(name='y', time=seq(1, 30, by=3))
V <- list(name='V')

res1 <- simulx(model     = 'model/hierarchical1a.txt',
parameter = p,
output    = list(V, f, y),
settings  = list(seed = 12345))

Here, res1 is a list with three elements:

names(res1)
## [1] "f"         "y"         "parameter"

Let us see the simulated value of $$V_1$$ and plot the simulated data

print(res1$parameter) ## V ## 1 10.2493 print(ggplot() + geom_line(data=res1$f, aes(x=time, y=f), size=1) +
geom_point(data=res1$y, aes(x=time, y=y), colour='red', size=2)) Instead of only one individual, let us now simulate a group of 5 individuals with the same model. An individual parameter $$V_i$$ is drawn for each individual $$i=1,2,\ldots,5$$. The level of randomization is therefore â€˜individualâ€™. g <- list( size = 5, level = 'individual') res2 <- simulx(model = 'model/hierarchical1a.txt', parameter = p, output = list(V, f, y), group = g, settings = list(seed = 12345)) print(res2$parameter)
##   id         V
## 1  1 10.249302
## 2  2 12.908722
## 3  3  9.909589
## 4  4  8.703464
## 5  5 11.433969
print(ggplot() +
geom_line(data=res2$f, aes(x=time, y=f, colour=id),size=1) + geom_point(data=res2$y, aes(x=time, y=y, colour=id), size=2))

If we now want to sample a single individual parameter $$V_1$$ for a single individual $$i=1$$, and 5 replicates of the longitudinal data for this individual, we must define the level of randomization as â€˜longitudinalâ€™.

g <- list( size  = 5,
level = 'longitudinal')

res3 <- simulx(model     = 'model/hierarchical1a.txt',
parameter = p,
output    = list(V, f, y),
group     = g,
settings  = list(seed = 12345))

print(res3$parameter) ## id V ## 1 1 10.2493 ## 2 2 10.2493 ## 3 3 10.2493 ## 4 4 10.2493 ## 5 5 10.2493 print(ggplot() + geom_line(data=res3$f, aes(x=time, y=f), colour="grey", size=1) +
geom_point(data=res3$y, aes(x=time, y=y, colour=id), size=2)) It is possible to combine several levels of randomization. We may want, for example, to sample 2 individual parameters $$V_1$$ and $$V_2$$, and 3 replicates of the longitudinal data for each individual. g <- list( size = c(2,3), level = c('individual','longitudinal')) res4 <- simulx(model = 'model/hierarchical1a.txt', parameter = p, output = list(V, f, y), group = g, settings = list(seed = 12345)) print(res4$parameter)
##   id        V
## 1  1 10.24930
## 2  2 10.24930
## 3  3 10.24930
## 4  4 12.90872
## 5  5 12.90872
## 6  6 12.90872
print(ggplot() +
geom_line(data=res4$f, aes(x=time, y=f, colour=id), size=1) + geom_point(data=res4$y, aes(x=time, y=y, colour=id), size=2))

## 2.2 Example 2

We still consider model (2) for the longitudinal data but $$V_i$$ is now function of the weight of individual $$i$$:

1. $\log(V_i) \iid {\cal N}\left(\log \left(V_{\rm pop} \left({w_i}/{w_{\rm pop}}\right)^{\beta}\right) \ ,\ \omega_V^2\right).$

This model is implemented in hierarchical1b.txt.

[LONGITUDINAL]
input = {V, k, a}
EQUATION:
f = 100/V*exp(-k*t)
DEFINITION:
y = {distribution = normal, prediction = f, sd = a}

[INDIVIDUAL]
input = {V_pop, omega_V, beta, w, w_pop}
EQUATION:
V_pred = V_pop*(w/w_pop)^beta
DEFINITION:
V = {distribution = lognormal, prediction = V_pred, sd = omega_V}


We therefore need to define the individual weight $$w_i$$ and the population weight $$w_{\rm pop}$$ as inputs of the model by adding them to the vector parameter.

p <- c(V_pop=10, omega_V=0.3, beta=1, w=75, w_pop=70, k=0.1, a=0.5)
f <- list(name='f', time=seq(0, 30, by=0.1))
y <- list(name='y', time=seq(1, 30, by=3))
V <- list(name='V')

g <- list( size  = 5,
level = 'individual')

res5 <- simulx(model     = 'model/hierarchical1b.txt',
parameter = p,
output    = list(V, f, y),
group     = g,
settings  = list(seed = 12345))

print(res5$parameter) ## id V ## 1 1 10.98140 ## 2 2 13.83077 ## 3 3 10.61742 ## 4 4 9.32514 ## 5 5 12.25068 print(ggplot() + geom_line(data=res5$f, aes(x=time, y=f, colour=id),size=1) +
geom_point(data=res5\$y, aes(x=time, y=y, colour=id), size=2))

Remark 1: The model for the individual parameter $$V_i$$ is a linear Gaussian model. Indeed, an equivalent mathematical representation of model (4) is the following one:

$\log(V_i) = \log(V_{\rm pop}) + \beta \log(w_i /w_{\rm pop}) + \eta_i$ where $$\eta_i \iid {\cal N}(0, \ ,\ \omega_V^2 )$$.

It is then possible to take advantage of this representation and implement the model for the individual parameters as a linear model. We could therefore replace the section [INDIVIDUAL] of hierarchical1b.txt with the following one:

[INDIVIDUAL]
input = {V_pop, omega_V, beta, w, w_pop}
EQUATION:
lw = log(w/w_pop)
DEFINITION:
V = {distribution=lognormal, reference=V_pop, covariate=lw, coefficient=beta, sd=omega_V}


Remark 2: The individual covariate $$w_i$$ is fixed and the same for all the individuals ($$w_i=75$$) in this examples. We will see in the next article how to simulate individual covariates. Individual covariates can also be provided in a data file.