R scripts: catcov.R
Mlxtran codes: model/catcov1A.txt ; catcov1B.txt ; catcov2A.txt ; catcov2B.txt
Categorical variables take a finite number of values from a set that is not necessarily numerical or even ordered, e.g., gender, country or ethnicity.
The approach taken for continuous covariates extends easily to categorical ones. For simplicity’s sake, let us consider a unique covariate \({c}_i\) that takes its values in \(\{ a_1, a_2, \ldots, a_K\}\), and a unique parameter \(\psi_i\).
Here, a reference covariate value \({c}_{\rm pop}\) is a reference category, i.e., a specific element \(a_{k^\ast}\) of \(\{ a_1, a_2, \ldots, a_K\}\).
Assume for instance a linear model of covariates. Then, the prediction \(\tilde{\psi}_i\) for \(\psi_i\) is given by :
\[ h(\tilde{\psi}_i) = h(\psi_{\rm pop})+ \beta_1 \one_{{c}_i=a_1} + \beta_2 \one_{{c}_i=a_2} + \ldots + \beta_K \one_{{c}_i=a_K} , \]
with \(\beta_{k^\ast} = 0\) and where \(\one\) is the indicator function (\(\one_A = 1\) if \(A\) is true, 0 otherwise).
This model is equivalent to \[ h(\tilde{\psi}_i) = \left\{ \begin{array}{ll} h(\psi_{\rm pop}) & \hbox{if ${c}_i=a_{k^\ast}$} \\ h(\psi_{\rm pop}) + \beta_k & \hbox{if ${c}_i=a_k \neq a_{k^\ast}$} . \end{array} \right. \]
We see that if the covariate has \(K\) categories, \(K-1\) coefficients \((\beta_k)\) are required for defining the covariate model.
Consider a model where gender takes the values \(M\) and \(F\) with \[ \prob{{\rm gender}_i=F} = p_F \]
Then, gender is used as a categorical covariate for describing the variability of \(k\)
\[ \log(k_i) = \log(k_{\rm pop}) + \beta_F \one_{{\rm gender}_i=F} + \eta_i \]
That means that the typical values of \(k\) for a male and a female are, respectively, \(k_{\rm pop}\) and \(k_{\rm pop}e^{\beta_F}\).
Function of time is then defined as \[f(t ; k_i) = e^{-k_i \, t}\]
This model is implemented in catcov1A.txt
:
[COVARIATE] input={p_F} DEFINITION: gender = { type = categorical, categories = {F,M}, P(gender=F) = p_F } ;--------------------------------------- [INDIVIDUAL] input={k_pop, omega_k, gender, beta_F} gender={type=categorical,categories={F,M}} DEFINITION: k = { distribution = lognormal, reference = k_pop, covariate = gender, coefficient = {beta_F,0}, sd = omega_k } ;--------------------------------------- [LONGITUDINAL] input = {k} EQUATION: f = exp(-k*t)
Let us generate 12 simulated genders, 12 individual parameters \(k_i\) (one per simulated gender) and 12 functions \(f(t;k_i)\) (one per individual parameter) with this model:
p <- c(p_F=0.4,k_pop=0.2, omega_k=0.1, beta_F=0.6)
f <- list(name='f', time=seq(0, 30, by=0.1))
ind <- list(name=c("gender","k"))
res1 <- simulx(model = "model/catcov1A.txt",
parameter= p,
output = list(ind, f),
group = list(size=12, level='covariate'),
settings = list(seed=12345))
print(head(res1$parameter))
## id gender k
## 1 1 F 0.3967964
## 2 2 M 0.1993954
## 3 3 F 0.3479398
## 4 4 M 0.2091361
## 5 5 F 0.2885420
## 6 6 M 0.1754243
fg <- merge(res1$f,res1$parameter)
plot(ggplot(data=fg) + geom_line(aes(x=time, y=f, group=id, colour=gender), size=0.5))
Remark: simulx
supports any alphanumeric (alphabetic and numeric) character to define the categories of a categorical variable in a block a block EQUATION
or in a bloc DEFINITION
.
The model for \(k_i\) can be represented in an equivalent manner as follows: \[ \begin{align} \tilde{k}_i & = \left\{ \begin{array}{ll} k_{\rm pop} & \text{if } \ \ {\rm gender}_i=M \\ k_{\rm pop} e^{\beta_F} & \text{if } \ \ {\rm gender}_i=F . \end{array} \right. \\ \log(k_i) &= \log(\tilde{k}_i)) + \eta_i \end{align} \]
This model is implemented in catcov1B.txt
:
[COVARIATE] input={p_F} DEFINITION: gender = { type=categorical, categories={F,M}, P(gender=F)=p_F } [INDIVIDUAL] input={k_pop, omega_k, gender, beta_F} gender={type=categorical,categories={F,M}} EQUATION: if gender==F k_pred = k_pop*exp(beta_F) else k_pred = k_pop end DEFINITION: k = {distribution=logNormal, prediction=k_pred, sd=omega_k} [LONGITUDINAL] input = {k} EQUATION: f = exp(-k*t)
Implementations catcov1A.txt
and catcov1B.txt
are equivalent.
Let us consider now an extension of the previous model, with a continuous covariate \(w\) and a categorical covariate \(trt\) with three categories \(TA\), \(TB\) and \(TC\). The reference category is \(B\) in this example.
This model is implemented in catcov2A.txt
:
[COVARIATE] input = {w_pop, omega_w, p_A, p_B} DEFINITION: w = { distribution=normal, mean=w_pop, sd=omega_w } trt = { type=categorical, categories={TA,TB,TC}, P(trt=TA)=p_A, P(trt=TB)=p_B } ;------------------------------------------- [INDIVIDUAL] input = {k_pop, omega_k, w, trt} trt = {type=categorical,categories={TA,TB,TC}} EQUATION: lw70 = log(w/70) DEFINITION: k = { distribution = lognormal, reference = k_pop, covariate = {lw70,trt}, coefficient = {0.2,{-0.6,0,0.8}}, sd = omega_k } ;------------------------------------------- [LONGITUDINAL] input = {k} EQUATION: f = exp(-k*t)
We will use simulx
for generating 200 individuals with different simulated covariates:
p <- c(w_pop=70, omega_w=15, p_A=0.2, p_B=0.4, k_pop=0.2, omega_k=0.05)
f <- list(name='f', time=seq(0, 30, by=0.1))
ind <- list(name=c('w','k','trt'))
res3 <- simulx(model = "model/catcov2A.txt",
parameter = p,
output = list(ind, f),
group = list(size=200, level='covariate'),
settings = list(seed=12345))
print(head(res3$parameter))
## id w k trt
## 1 1 71.23123 0.2094233 TB
## 2 2 82.76591 0.2065015 TB
## 3 3 69.54589 0.4343591 TC
## 4 4 63.05680 0.2002886 TB
## 5 5 76.70018 0.4033726 TC
## 6 6 34.97849 0.1630424 TB
fg<-merge(res3$f,res3$parameter)
plot(ggplot(data=fg) + geom_line(aes(x=time, y=f, group=id, colour=trt), size=0.5))
Until now, the categorical covariates were simulated using section [COVARIATE]
of the model Mlxtran
. Categorical covariates can also be used as inputs of the model.
FOr example, model catcov2B.txt
has only two sections [INDIVIDUAL]
and [LONGITUDINAL]
:
[INDIVIDUAL] input = {k_pop, omega_k, w, trt} trt = {type=categorical, categories={TA,TB,TC}} EQUATION: lw70 = log(w/70) DEFINITION: k = { distribution = lognormal, reference = k_pop, covariate = {lw70,trt}, coefficient = {0.2,{-0.6,0,0.8}}, sd = omega_k } [LONGITUDINAL] input = {k} EQUATION: f = exp(-k*t)
The values of the covariates \(w\) and \(trt\) for 6 individuals are defined in the following R
script. They are then added to the list of input parameters.
p.indiv <- inlineDataFrame("
id w trt
1 80 TA
2 60 TB
3 85 TC
4 55 TA
5 90 TB
6 60 TC
")
p.pop <- c(k_pop=0.2, omega_k=0.1)
f <- list(name='f', time=seq(0, 30, by=0.1))
ind <- list(name=c('w','k','trt'))
res4 <- simulx(model = "model/catcov2B.txt",
parameter = list(p.indiv, p.pop),
output = list(ind, f),
settings = list(seed=123))
print(res4$parameter)
## id w k trt
## 1 1 80 0.11119100 TA
## 2 2 60 0.18139348 TB
## 3 3 85 0.53408323 TC
## 4 4 55 0.08999034 TA
## 5 5 90 0.21648495 TB
## 6 6 60 0.40827956 TC
plot(ggplot(data=res4$f, aes(x=time, y=f, colour=id)) + geom_line(size=0.75))