The objective of this application is to give an introduction to population PK modelling. Here are the steps you should follow:

### Data & Model

Click on ** Project** to select the datafile and define the model.

- Use the tab
`dataset`

to select one of the 3 following PK examples- phenobarbital
- theophylline
- warfarin

- Use the tab
`model`

to define the model

Select one of the following structural models (all these models are one compartment PK models)

- (ka, V, Cl) first order absorption with linear elimination (model for oral administration)
- (Tk0, V, Cl) zero order absorption with linear elimination (model for oral administration)
- (Tlag, ka, V, Cl) lag, first order absorption with linear elimination (model for oral administration)
- (Tlag, Tk0, V, Cl) lag, zero order absorption with linear elimination (model for oral administration)
- (V, Cl) linear elimination (model for intravenous administration)

Select one of the 3 following residual error models (more information about residual error models can be found here. )

- constant: \(y = f + a \, e\)
- proportional: \(y = f + b \, f \, e\)
- combined: \(y = f + (a + b \, f) \, e\)

- Select which individual PK parameters vary within the population and which ones are fixed. Here, all the PK parameters are log-normally distributed.

- Select which individual PK parameters are function of the weight. Here weight is the only covariate used in the model for explaining part of the inter individual variability of the parameters. Since the individual parameters are log-normally distributed, we use the following model for parameter \(\psi_i\): \[\log(\psi_i) = \log(\psi_{\rm pop}) + \beta \log(w_i/w_{\rm pop}) + \eta_i\] where \(w_i\) is the weight of patient \(i\) and \(w_{\rm pop}\) a typical weight in the population (the median weight is used here).

### Tasks

Once the model is defined, you can run a reduced version of Monolix for

estimating the population parameters, the Fisher information matrix, the individual parameters and some information criteria,

performing statistical tests for the covariate model.

Estimation of the population parameters using the SAEM requires to provide some initial values and some settings. Use the tabs `initial values`

and `settings`

if you want to see and/or modify the default values.

See Section `Tasks and Methods`

for more details about the methods and the algorithms used in this application.

**Remark:** the project defined in this application (data, model, initial values, settings) is automatically converted into a Mlxtran script file (use the tab `mlxtran`

to display this model file). Monolix then uses this project file to run the sequence of predifined tasks.

### Results

At the end of the run, you can either display some tables with numerical results or some diagnostic plots.

\[ \newcommand{\deriv}[1]{\dot{#1}(t)} \def\iid{\mathop{\sim}_{\rm i.i.d.}} \newcommand{\prob}[1]{ \mathbb{P}\!(#1)} \newcommand{\by}{\boldsymbol{y}} \newcommand{\bc}{\boldsymbol{c}} \newcommand{\bpsi}{\boldsymbol{\psi}} \newcommand{\pmacro}{\texttt{p}} \newcommand{\thmle}{\hat{\theta}} \]

### Definitions and notations

The models we are interested with are mixed effects models (see also this web animation), i.e. hierarchical models that involves different types of variables:

We call \(y_i=(y_{ij},1\leq j \leq n_i)\) the set of

*longitudinal data*recorded at times \((t_{ij},1\leq j \leq n_i)\) for subject \(i\), and \(\by\) the combined set of observations for all \(N\) individuals: \(\by = (y_1,\ldots,y_N)\).We write \(\psi_i\) for the vector of

*individual parameters*for individual \(i\) and \(\bpsi\) the set of individual parameters for all \(N\) individuals: \(\bpsi = (\psi_1,\ldots,\psi_N)\).The distribution of the individual parameters \(\psi_i\) of subject \(i\) may depend on a vector of

*individual covariates*\(c_i\). Then, let \(\bc = (c_1,c_2,\ldots , c_N)\). Here, the only available covariate is the weight.In a population approach context, we call \(\theta\) the vector of

*population parameters*.

Considering observations and the individual parameters as random variables, the joint probability distribution of \(\by\) and \(\bpsi\) can be decomposed as follows:

\[ \pmacro(\by,\bpsi ; \bc,\theta) = \pmacro(\by|\bpsi,\theta)\pmacro(\bpsi|\bc,\theta)\]

You can run 2 tasks with this application:

`Model fitting`

: the application will- estimate the population parameters,
- estimate the Fisher information matrix and the standard errors of the estimates,
- estimate the individual parameters,
- estimate the observed likelihood and some information criteria,
- display the numerical results in several tables,
- generate several diagnostic plots.

`Covariate test`

: the application will use several statistical criteria for comparing 2 covariate models.

### Model Fitting

#### Estimation of the population parameters

*Maximum likelihood estimation* of the vector of population parameters \(\theta\) consists of maximizing with respect to \(\theta\) the *observed likelihood function* defined by
\[{\cal L}_{\by}(\theta) = \pmacro(\by ; \theta) = \int \pmacro(\by,\bpsi ;\theta) \, d \bpsi .
\]

The stochastic approximation expectation-maximization (SAEM) algorithm as implemented in Monolix for computing the maximum likelihood (ML) estimate has appealing practical and theoretical properties (convergence of SAEM has been rigorously proved).

It is important to keep in mind some of its key properties:

SAEM is an iterative algorithm:

- Initial estimates must be provided by the user. Even though SAEM is relatively insensitive to initial parameter values, it is always preferable to provide as good as possible initial values to minimize the number of iterations required and also increase the probability of converging to the global maximum of the likelihood.

Default initial estimates can be modified with the tab`Initial values`

. - SAEM as implemented in \monolix has two phases. The goal of the first is to get to a neighborhood of the solution in only a few iterations. A
*simulated annealing*version of SAEM accelerates this process when the initial value is far from the actual solution. The second phase consists of convergence to the located maximum with behavior that becomes increasingly deterministic, like a gradient algorithm.

Default settings for SAEM can be modified with the tab`Settings`

.

- Initial estimates must be provided by the user. Even though SAEM is relatively insensitive to initial parameter values, it is always preferable to provide as good as possible initial values to minimize the number of iterations required and also increase the probability of converging to the global maximum of the likelihood.
SAEM is a stochastic algorithm:

- We cannot claim that SAEM
*always*converges to the global maximum of the likelihood. We can only say that it converges under quite general hypotheses to a maximum – global or perhaps local – of the likelihood. A large number of simulation studies have shown that SAEM converges with high probability to a “good'' solution – it is hoped the global maximum – after a small number of iterations. - The trajectory of the outputs of SAEM depends on the sequence of random numbers used by the algorithm. This sequence is entirely determined by the seed.'' In this way, two runs of SAEM using the same seed will produce exactly the same results. If different seeds are used, the trajectories will be different but convergence occurs to the same solution under quite general hypotheses.

- We cannot claim that SAEM

#### Estimation of the Fisher information matrix

The variance of the maximum likelihood estimate \(\thmle\), and thus confidence intervals, can be derived from the observed Fisher information matrix (FIM), itself derived from the observed likelihood (i.e., the pdf of observations \(\by\)): \[I_{\by}(\thmle) \ = \ - \frac{\partial^2}{\partial \theta^2} \log({\cal L}_{\by}(\theta)).\]

The variance-covariance matrix of \(\thmle\) can then be estimated by the inverse of the observed FIM. Standard errors (s.e.) for each component of \(\thmle\) are the standard deviations, i.e., the square root of the diagonal elements of the variance-covariance matrix.

A stochastic approximation algorithm using a Markov chain Monte Carlo (MCMC) algorithm is implemented in Monolix for estimating the FIM.

Assume that the volume \(V_i\) is function of the weight \(w_i\) in the model: \[\log(Vi_i) = \log(V_{\rm pop}) + \beta_V \log(w_i/w_{\rm pop}) + \eta_{V,i}.\] Then, We can use the estimated standard error \(\widehat{\rm s.e.}(\hat{\beta}_V)\) to perform a Wald test of whether \(\beta_V = 0\). In fact, under this hypothesis, the distribution of the Wald statistic \(\hat{\beta}_V/\widehat{\rm s.e.}(\hat{\beta}_V)\) can be approximated by a normal distribution with mean 0 and variance 1. The \(p\)-value of the Wald test displayed in the table is the value of \(\prob{|{\mathcal N}(0,1)| > |\hat{\beta}_V|/\widehat{\rm s.e.}(\hat{\beta}_V)}\).

#### Estimation of the individual parameters

Once the vector of population parameters \(\theta\) has been estimated, the conditional distribution \(\pmacro(\psi_i | y_i ; \hat{\theta})\) can be used for estimating the vector of individual parameters \(\psi_i\).

The conditional mode \(\hat{\psi}_i^{\rm mode}\) maximizes this conditional distribution:

\[\hat{\psi}_i^{\rm mode} = {\rm arg}\max_{\psi_i}\pmacro(\psi_i | y_i ; \hat{\theta}).\]

**Remark:** Conditional modes are also called empirical Bayes estimates (EBEs), or maximum a posteriori (MAP) estimates.

By default, Monolix uses the conditional mode for computing predictions, taking the philosophy that the most likely values of the individual parameters are the most suited for computing the most likely predictions.

When individual data are sparse, the distribution of conditional modes/EBEs can shrink'' towards the same population value, and as a direct consequence, resulting diagnostics for the distribution of the random effects can be misleading. On the other hand, random effects randomly sampled from their conditional distributions leads to more reliable results.

For each \(i\), the MCMC algorithm generates a sequence \((\eta_i^{(k)}, 1 \leq k \leq K)\) which converges in distribution to the conditional distribution \(\pmacro(\eta_i | y_i ; \hat{\theta})\) and can be used for estimating any summary statistic of it (mean, standard deviation, quantiles, etc.).

#### Estimation of the log-likelihood

Performing likelihood ratio tests and computing information criteria for a given model requires computation of the *observed log-likelihood*
\[
{\cal LL}_{\by}(\thmle) = \log({\cal L}_{\by}(\thmle)) = \log(\pmacro(\by;{\thmle})) ,
\]
where \(\thmle\) is the vector of population parameter estimates for the model being considered.

The log-likelihood cannot be computed in closed form for nonlinear mixed effects models. It can however be estimated in a general framework for all kinds of data and models using the importance sampling Monte Carlo method implemented in Monolix. This method has the advantage of providing an unbiased estimate of the log-likelihood – even for nonlinear models – whose variance can be controlled by the Monte Carlo size.

Statistical tools for model selection are then based on the likelihood: information criteria such as the Akaike information criterion (AIC) and Bayesian information criterion (BIC), and hypothesis tests such as the likelihood ratio test (LRT).

### Covariate model test

A covariate model is defined as a sequence of 0 and 1. Assume, for instance that model (Tk0, V, Cl) is used. Then,

`[0 0 0]`

means that none of the PK parameters are function of weight,`[0 1 0]`

means that only volume V is function of weight,`[0 1 1]`

means that both volume V and clearance Cl are function of weight.

When the two models to compare are defined, Monolix returns the model selected according to several criteria:

- AIC
- BIC
- Likelihood ratio test (LRT)

**Remark:** The two covariate models shoud be *nested* for performing the likelihood ratio test.

\[ \newcommand{\deriv}[1]{\dot{#1}(t)} \def\iid{\mathop{\sim}_{\rm i.i.d.}} \newcommand{\prob}[1]{ \mathbb{P}\!(#1)} \newcommand{\by}{\boldsymbol{y}} \newcommand{\bc}{\boldsymbol{c}} \newcommand{\bpsi}{\boldsymbol{\psi}} \newcommand{\pmacro}{\texttt{p}} \newcommand{\thmle}{\hat{\theta}} \]

### Tables

Several tables are produced:

#### Population parameters

For each population parameter of the model, the table displays:

- the value of the maximum likelihood estimate
- the standard error of the estimate
- the relative standard error = s.e./value*100
- the p-value of the Wald test for testing if a coefficient \(\beta=0\) (only for the \(\beta\)'s).

#### Correlation matrix of the estimates

The variance-covariance matrix \(C = (c_{k\ell})\) of the ML estimate of \(\theta\) is computed as the inverse of the estimated Fisher information matrix.

Then, the correlation matrix \(R = (\rho_{k\ell})\) is derived from this covariance matrix: \(\rho_{k\ell} = c_{k\ell}/\sqrt{c_{kk}c_{\ell\ell}}\).

#### Log-likelihood

The log-likelihood \({\cal LL}_{\by}(\thmle)\), the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) are displayed.

AIC and BIC are defined by \[\text{AIC} = - 2 {\cal LL}_{\by}(\thmle) + 2 P \] \[\text{BIC} = - 2 {\cal LL}_{\by}(\thmle) + \log(N) P \] where \(P\) is the total number of parameters to be estimated and \(N\) the number of subjects.

#### Predictions

The model for the PK data is a model for continuous data with the following mathematical representation \[y_{ij} = f(t_{ij} , \psi_i) + g(t_{ij} , \psi_i)e_{ij}\] where \(f\) is the structural PK model and \(\psi_i\) the vector of PK parameters for individual \(i\).

Then, the concentration at time \(t_{ij}\) can be predicted by \[ \tilde{y}_{ij} = f(t_{ij} , \hat{\psi}_i))\] where \(\hat{\psi}_i\) is an estimate of \(\psi_i\). The residual error at time \(t_{ij}\) is then defined by \[\tilde{e}_{ij} = (y_{ij} - f(t_{ij} , \hat{\psi}_i)/g(t_{ij} , \hat{\psi}_i).\]

The choice of \(\hat{\psi}\) is not unique:

the conditional mode (or EBE) \(\hat{\psi}_i^{\rm mode}\) maximizes the conditional distribution \[\hat{\psi}_i^{\rm mode} = {\rm arg}\max_{\psi_i}\pmacro(\psi_i | y_i ; \hat{\theta}).\] In other word, \(\hat{\psi}_i^{\rm mode}\) is the most likely value of \(\psi_i\) given the observed concentrations \((y_{ij})\) and the parameter estimate \(\hat{\theta}\).

the predicted parameter \(\hat{\psi}_i^{\rm pred}\) depends on the individual covariates, i.e. the weight of individual \(i\) and ignore the random effects.

Let assume that \(\psi_i\) is a PK parameter function of the weight \(w_i\): \[\log(\psi_i) = \log(\psi_{\rm pop}) + \beta \log(w_i/w_{\rm pop}) + \eta_i\] where \(w_{\rm pop}\) is a typical weight in the population (the median is used here). Then, \[\hat{\psi}_i^{\rm pred} = \hat{\psi}_{\rm pop}(w_i/w_{\rm pop})^\hat{\beta} \] We can notice that individuals who share the same covariates will also share the same predicted parameter.

- the population parameter \(\hat{\psi}_i^{\rm pop} = \hat{\psi}_{\rm pop}\) is the typical parameter value in the population. This value is also the predicted parameter value for an individual with typical weight \(w_{\rm pop}\).

The application displays a table with:

- the observation time points \((t_{ij}\))
- the measured concentrations \((y_{ij}\))
- the predicted concentrations computed using the conditional modes/EBEs \((\hat{\psi}_i^{\rm mode})\),
- the predicted concentrations computed using the predicted parameters \((\hat{\psi}_i^{\rm pred})\),
- the predicted concentrations computed using the population parameters \(\hat{\psi}_{\rm pop}\).
- the three sequences of residuals errors computed with these three sequences of predictions.

The best fit is expected to be obtained with \(\hat{\psi}_i^{\rm mode}\) since both the random effects and the individual covariates are used to explain and describe the inter individual variability (IIV) of the PK data. The worst fit is expected to be obtain with the population parameter \(\hat{\psi}_{\rm pop}\) that ignore any source of IIV while \(\hat{\psi}_i^{\rm pred}\) is somewhere “in between'' (only the weight is used to explain the IIV).

### Diagnostic plots

#### Individual fits

The observed concentrations \((y_{ij})\) can be displayed for each individual, together with predicted concentrations \(f(t,\hat{\psi}_i)\). Here, the predicted concentrations are computed on a fine grid, in order to display smooth functions of time.

Use the checkboxes to select which estimate(s) \(\hat{\psi}_i\) to use, i.e. which prediction(s) to display.

#### Observations versus predictions

The predicted concentrations \(f(t,\hat{\psi}_i)\) are displayed on the x-axis and the observed concentrations \((y_{ij})\) on the y-axis.

Use the radiobuttons to select which estimate \(\hat{\psi}_i\) to use, i.e. which prediction to display.

#### Residual errors

The residual errors \((\tilde{e}_{ij})\) are displayed either as function of the time points \((t_{ij})\) or the predicted concentrations \((f(t,\hat{\psi}_i))\).

Use the radiobuttons to select which estimate \(\hat{\psi}_i\) to use, i.e. which sequence of residual error to display.

#### Random effects

The empirical distributions of the estimated random effects are displayed using boxplots.

You can select between the conditional modes/EBEs of the random effects, or random effects randomly sampled from the conditional distributions \(\pmacro(\psi_i | y_i ; \hat{\theta})\).

When individual data are sparse, the distribution of conditional modes/EBEs can shrink'' towards 0, and as a direct consequence, resulting diagnostics for the distribution of the random effects can be misleading. On the other hand, random effects randomly sampled from their conditional distributions leads to more reliable results.

#### Convergence of SAEM

Each subplot displays the sequence of estimates computed by SAEM along the iterations.

The algorithm is designed to converge quickly - and randomly - to a neighborhood of the solution during the first iterations (before the vertical red line). Then, SAEM is expected to converge toward the solution (the maximum likelihood estimate of \(\theta\)).

### Introduction

Several diagnostic plots are available to qualitatively evaluate a given model, and quantitative criteria for comparing potential models. These must now all be used effectively to build a model. Defining an optimal strategy for model building is far from easy because a model is the assembled product of numerous components that need to been evaluated and perhaps improved: the structural model, residual error model, covariate model, covariance model, etc.

#### Diagnostic plots

The objective of a diagnostic plot is twofold:

- first we want to visually check if the assumptions made on the model are valid or not ;
- then, if some assumptions are rejected, we want to get some guidance on how to improve the model.

Model diagnostics is therefore used to eliminate model candidates that do not seem capable of reproducing the observed data. In such process of model building, by definition, none of the features of the final model should be rejected.

As is the usual case in statistics, it is not because this final model has not been rejected that it is necessarily the “true'' one. All that we can say is that the experimental data does not allow us to reject it. It is merely one of perhaps many models that cannot be rejected.

#### Model selection criteria

When several possible models are retained, we can try to select the best'' one (or best ones if no leader distinguishes itself from the rest). This means developing a model selection process which allows us to compare models. But with what criteria?

In a purely explanatory context, Occam's razor is a useful parsimony principle which states that among competing hypotheses, the one with the fewest assumptions should be selected. In the modeling context, this means that among valid competing models, the most simple one should be selected.

Model diagnostic tools are for the most part graphical (we see'' when something is not right between a chosen model and the data it is hypothesized to describe). Model selection tools, on the other hand, are analytical: we calculate the value of some criterion that allows us to compare models with each other.

Statistical tools proposed in this application for model selection include

information criteria that we aim to minimize:

- the Akaike information criterion (AIC)
- the Bayesian information criterion (BIC)

hypothesis tests:

- the Wald test for testing if a coefficient \(\beta_V\) is null, i.e. for comparing models with and without covariate effect on some parameters.
- likelihood ratio test (LRT) for nested models.

### Examples

We will use the warfarin PK data to illustrate the use of diagnostic plots and model selection criteria.

Fit the PK model (V, Cl)

- Which diagnostic plots allow you to reject this hypothesis about the structural model?
- Do you need to look at any objective function (log-likelihood, AIC, BIC…) for taking your decision about the model?

Fit the PK model (ka, V, Cl)

- Does the individual fits plot allow you to reject this model?
- Look at the Observations versus predictions'' plot ? What information about the PK model can we obtain from this plot?

Fit the PK model (Tlag, ka, V, Cl)

- Look at the Observations versus predictions'' plot
- Look at the distribution of the random effects. What happens if you select EBEs?

Compare PK models with first order and zero order absorption processes

- Looking at the diagnostic plots
- Looking at information criteria

Compare different error models

- Looking at the diagnostic plots
- Looking at information criteria

Use the weight to explain part of the variability of some PK parameters

- ka function of weight: should we select this new model?
- V function of weight: should we select this new model?

Compare the model where only V is function of weight with the model where both V and Cl are function of weights

- Using the Wald test
- Using the tool for covariate model test.

In conclusion, what would be the final model'' for this PK data?

This shiny application has been developed by:

Francois-Marie Floch & Marc Lavielle,

Inria Saclay & Ecole Polytechnique, Xpop team

June 6th, 2016

This shiny application uses a C++ version of the Monolix Suite developed by Lixoft

### Monolix project

### Summary

### Scale

### Dataset

### Model

### Initial values

### Settings

### Fits

### Display

### Predictions

### Residuals

### x-axis

### Random effects

### Display

### Individual estimates

### Linear regression

### Hypothesis 0

### Hypothesis 1

### Covariate test

#### Predicted concentrations and normalized residual errors computed using:

(pred1, err1): the individual parameters

(pred2, err2): the population parameters and the individual covariates

(pred3, err3): the population parameters and the population covariates