R script: random.R


$$ \newcommand{\esp}[1]{\mathbb{E}\left(#1\right)} \newcommand{\var}[1]{\mbox{Var}\left(#1\right)} \newcommand{\deriv}[1]{\dot{#1}(t)} \newcommand{\prob}[1]{ \mathbb{P}\!(#1)} \newcommand{\eqdef}{\mathop{=}\limits^{\mathrm{def}}} \newcommand{\by}{\boldsymbol{y}} \newcommand{\bc}{\boldsymbol{c}} \newcommand{\bpsi}{\boldsymbol{\psi}} \def\pmacro{\texttt{p}} \def\like{{\cal L}} \def\llike{{\cal LL}} \def\logit{{\rm logit}} \def\probit{{\rm probit}} \def\one{{\rm 1\!I}} \def\iid{\mathop{\sim}_{\rm i.i.d.}} \def\simh0{\mathop{\sim}_{H_0}} \def\df{\texttt{df}} \def\res{e} \def\xomega{x} \newcommand{\argmin}[1]{{\rm arg}\min_{#1}} \newcommand{\argmax}[1]{{\rm arg}\max_{#1}} \newcommand{\Rset}{\mbox{$\mathbb{R}$}} \def\param{\theta} \def\setparam{\Theta} \def\xnew{x_{\rm new}} \def\fnew{f_{\rm new}} \def\ynew{y_{\rm new}} \def\nnew{n_{\rm new}} \def\enew{e_{\rm new}} \def\Xnew{X_{\rm new}} \def\hfnew{\widehat{\fnew}} \def\degree{m} \def\nbeta{d} \newcommand{\limite}[1]{\mathop{\longrightarrow}\limits_{#1}} \def\ka{k{\scriptstyle a}} \def\ska{k{\scriptscriptstyle a}} \def\kel{k{\scriptstyle e}} \def\skel{k{\scriptscriptstyle e}} \def\cl{C{\small l}} \def\Tlag{T\hspace{-0.1em}{\scriptstyle lag}} \def\sTlag{T\hspace{-0.07em}{\scriptscriptstyle lag}} \def\Tk{T\hspace{-0.1em}{\scriptstyle k0}} \def\sTk{T\hspace{-0.07em}{\scriptscriptstyle k0}} \def\thalf{t{\scriptstyle 1/2}} \newcommand{\Dphi}[1]{\partial_\pphi #1} \def\asigma{a} \def\pphi{\psi} \newcommand{\stheta}{{\theta^\star}} \newcommand{\htheta}{{\widehat{\theta}}}$$

1 Introduction

It is possible to define normal random variables in the EQUATION block instead of the DEFINITION block of the Mlxtran script, using the syntax

x ~ normal(mu, sigma)

where mu is the mean and sigma the standard deviation of the normal random variable x.

Then x can be used in the EQUATION block as any other variable.

Remark: For simulation, we can either draw random variables in the block EQUATION, or define its probability distribution in the block DEFINITION. This is not the case for other tasks, such estimation, where an explicit description of the probability distributions is required by the estimation algotihms. Then, this definition should be provided in the block DEFINITION.


2 Using random variables in Section [LONGITUDINAL]

Consider the following model \[ y_j = (a + b\, t_j)e^{e_j} \] where \(e_j \iid {\cal N}(0, s^2)\).

When this model is used for simulation only, it is possible to define the sequence of residual errors \((e_j)\) directly in the block EQUATION:

myModel1 = inlineModel("
[LONGITUDINAL]
input =  {a, b, s}

EQUATION:
f = a + b*t
e ~ normal(0,s)
y = f*exp(e)
")

f <- list(name='f',time=seq(0, 100, by=1))
y <- list(name='y',time=seq(5, 100, by=10))
e <- list(name='e',time=seq(5, 100, by=10))
p <- c(a=10, b=0.5, s=0.2)
s <- list(seed=12345)

res <- simulx(model     = myModel1,
              parameter = p,
              settings  = s,
              output    = list(f, y, e))

We can then plot both sequences \((y_j)\) and \((e_j)\)

library(gridExtra)
pl1 <- ggplot() + 
  geom_point(aes(x=time, y=y), data=res$y, color="red", size=3) +
  geom_line(aes(x=time, y=f), data=res$f, size=1) 

pl2 <- ggplot(aes(x=time, y=e), data=res$e) +  geom_point(color="red", size=3) +
  geom_hline(yintercept = 0)

grid.arrange(pl1, pl2, nrow=1)

print(head(res$y))
##   time        y
## 1    5 12.70690
## 2   15 14.72222
## 3   25 16.84884
## 4   35 29.41867
## 5   45 28.16719
## 6   55 47.63037

Remark: In this example, it would be equivalent to define the probability distribution of \((y_j)\) in the block DEFINITION:

myModel2 = inlineModel("
[LONGITUDINAL]
input =  {a, b, s}

EQUATION:
f = a + b*t

DEFINITION:
y = {distribution=lognormal, prediction=f, sd=s}                      
")
res <- simulx(model     = myModel2,
              parameter = p,
              settings  = s,
              output    = list(f, y))

print(head(res$y))
##   time        y
## 1    5 12.70690
## 2   15 14.72222
## 3   25 16.84884
## 4   35 29.41867
## 5   45 28.16719
## 6   55 47.63037

The results are the same because the same seed is used with both models.


3 Using random variables in Section [INDIVIDUAL]

Random normal variables can also be defined in the bloc EQUATIONof Section [INDIVIDUAL]

myModel3 = inlineModel("
[LONGITUDINAL]
input =  {a, b, s}

EQUATION:
f = a + b*t
e ~ normal(0,s)
y = f*exp(e)

[INDIVIDUAL]
input = {a_pop, omega_a, b_pop, omega_b}
EQUATION:
a ~ normal(a_pop,omega_a)
b ~ normal(b_pop,omega_b)
")

i <- list(name=c('a', 'b'))

res <- simulx(model     = myModel3,
              parameter = c(a_pop=10, omega_a=1, b_pop=0.5, omega_b=0.1, s=0.2),
              settings  = list(seed=12345),
              output    = list(y, i))

print(res$parameter)
##          a         b
## 1 10.08208 0.4135785
print(head(res$y))
##   time        y
## 1    5 14.40448
## 2   15 12.69401
## 3   25 24.52023
## 4   35 21.36594
## 5   45 30.87609
## 6   55 38.49637

This is equivalent to defining the probability distributions in the 2 blocs DEFINITION:

myModel4 = inlineModel("
[LONGITUDINAL]
input =  {a, b, s}
EQUATION:
f = a + b*t
DEFINITION:
y = {distribution=lognormal, prediction=f, sd=s}                      

[INDIVIDUAL]
input = {a_pop, omega_a, b_pop, omega_b}
DEFINITION:
a = {distribution=normal, mean=a_pop, sd=omega_a}                      
b = {distribution=normal, mean=b_pop, sd=omega_b}                      
")

res <- simulx(model     = myModel4,
              parameter = c(a_pop=10, omega_a=1, b_pop=0.5, omega_b=0.1, s=0.2),
              settings  = list(seed=12345),
              output    = list(y, i))

print(res$parameter)
##          a         b
## 1 10.08208 0.4135785
print(head(res$y))
##   time        y
## 1    5 14.40448
## 2   15 12.69401
## 3   25 24.52023
## 4   35 21.36594
## 5   45 30.87609
## 6   55 38.49637