# Fast derivatives of likelihood functionals for ODE based models using adjoint-state method


## Latest revision as of 15:20, 25 February 2020

### 1 Citation

Melicher, V., Haber, T. & Vanroose, W. [Fast derivatives of likelihood functionals for ODE based models using adjoint-state method](https://doi.org/10.1007/s00180-017-0765-8). Comput Stat 32, 1621–1643 (2017).

### 2 Summary

In this paper, the adjoint-state method (ASM) for computing the gradient and the Hessian of likelihood functionals for time series data modelled by ordinary differential equations (ODEs) is derived and analyzed. Discrete data and the continuous model are interfaced at the level of the likelihood functional, using the concept of point-wise distributions.

This alternative approach is compared to sensitivity equations (SE) and finite differences (FD).
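To make the comparison concrete, the following is a sketch of the standard textbook form of both approaches, not the paper's own notation. It assumes an ODE right-hand side f with Jacobians f_x and f_p, an initial state x_0 that does not depend on the parameters p, and a negative log-likelihood J that is a sum of point-wise terms g_k at the measurement times t_1 < ... < t_N. All symbols are introduced here for illustration only.

```latex
% Model and objective (notation assumed for this sketch)
\dot{x}(t) = f\bigl(x(t), p, t\bigr), \qquad x(0) = x_0, \qquad
J(p) = \sum_{k=1}^{N} g_k\bigl(x(t_k), p\bigr)

% Sensitivity equations: S = \partial x / \partial p is an n_x \times n_p matrix
\dot{S}(t) = f_x\, S(t) + f_p, \qquad S(0) = 0, \qquad
\frac{dJ}{dp} = \sum_{k=1}^{N} \left( \frac{\partial g_k}{\partial x}\, S(t_k)
              + \frac{\partial g_k}{\partial p} \right)

% Adjoint state: \lambda has n_x components, is integrated backwards in time,
% and jumps at every measurement time
\dot{\lambda}(t) = -f_x^{\top} \lambda(t) \quad (t \neq t_k), \qquad
\lambda(t_N^{+}) = 0, \qquad
\lambda(t_k^{-}) = \lambda(t_k^{+}) + \left( \frac{\partial g_k}{\partial x} \right)^{\top}

\frac{dJ}{dp} = \sum_{k=1}^{N} \frac{\partial g_k}{\partial p}
              + \int_{0}^{t_N} \lambda(t)^{\top} f_p \, dt
```

In this form, SE adds n_x × n_p equations to the forward solve, whereas the adjoint system has only n_x equations regardless of the number of parameters but must be interrupted at every measurement time to apply the jump condition.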

### 3 Study outcomes

#### 3.1 Outcome O1

When using ASM for computing the **gradient** of linear (and diagonal) ODE models, the speed-up grows linearly with the number of states, which here equals the number of parameters. The higher the dimensionality of the problem, the more beneficial ASM becomes.

Outcome O1 is presented as Figure 1 in the original publication.

#### 3.2 Outcome O2

When using ASM for computing the **gradient** of linear (and diagonal) ODE models, the acceleration of ASM compared to SE declines exponentially with the number of data points. The more observations the problem has, the less beneficial ASM becomes. For the largest number of data points investigated, both procedures are equally fast.

Outcome O2 is presented as Figure 2 in the original publication.

#### 3.3 Outcome O3

When using ASM for computing the **Hessian** of linear (and diagonal) ODE models, the speed-up grows linearly with the number of states, which here equals the number of parameters. The higher the dimensionality of the problem, the more beneficial the adjoint approaches become. Adjoint FD is slightly faster than adjoint SE.

The accuracy of adjoint FD is lower than that of adjoint SE but is usually sufficient.

Outcome O3 is presented as Figure 3 in the original publication.
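The trade-off between speed and accuracy can be illustrated with a minimal sketch. It assumes that "adjoint FD" means applying finite differences to a gradient routine (for example one based on the adjoint state); the summary does not spell this out. The function name `hessian_from_gradient_fd`, the step-size rule, and the final symmetrization are hypothetical choices for illustration, not the paper's implementation.

```python
def hessian_from_gradient_fd(grad_fn, p, rel_step=1e-6):
    """Approximate the Hessian by central differences of a gradient routine.

    grad_fn  : callable mapping a parameter list to a gradient list
               (assumed here to be an adjoint-based gradient)
    p        : list of parameter values
    rel_step : relative step size (an illustrative choice)
    """
    n = len(p)
    hess = [[0.0] * n for _ in range(n)]
    for j in range(n):
        h = rel_step * max(1.0, abs(p[j]))
        p_plus = list(p)
        p_minus = list(p)
        p_plus[j] += h
        p_minus[j] -= h
        g_plus = grad_fn(p_plus)
        g_minus = grad_fn(p_minus)
        # Column j of the Hessian is the derivative of the gradient w.r.t. p_j.
        for i in range(n):
            hess[i][j] = (g_plus[i] - g_minus[i]) / (2.0 * h)
    # Average off-diagonal pairs so the result is exactly symmetric.
    for i in range(n):
        for j in range(i + 1, n):
            avg = 0.5 * (hess[i][j] + hess[j][i])
            hess[i][j] = avg
            hess[j][i] = avg
    return hess
```

This variant needs two gradient evaluations per parameter, and its accuracy is limited by the step size and by the accuracy of the gradient itself.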

#### 3.4 Outcome O4

When using ASM for computing the **gradient** of one specific nonlinear ODE model, the acceleration factor of ASM compared to SE declines exponentially with the number of data points n. At roughly n = 6, the two methods are equally fast. For smaller n, ASM is preferable. For larger n, SE is preferable.

Outcome O4 is presented as Figure 4 in the original publication.

#### 3.5 Outcome O5

When using ASM for computing the **Hessian** of one specific nonlinear ODE model, the acceleration factor of ASM compared to both SE and FD declines exponentially with the number of data points n.
Adjoint FD is faster than adjoint SE and is therefore preferable to it.
At roughly n = 7, adjoint FD and FD are equally fast. For smaller n, adjoint FD is preferable. For larger n, FD is preferable.

Outcome O5 is presented as Figure 5 in the original publication.

#### 3.6 Further outcomes

- The implementation of the SE approach is so efficient that, given its superior accuracy, it renders the finite difference approximation practically obsolete.
- The efficiency of ASM depends on the number of measurement times, which is not the case for the SE approach.
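The dependence on the number of measurement times shows up directly in a sketch of the backward sweep. The example below assumes the diagonal linear model dx_i/dt = p_i x_i, whose solution is known in closed form, and a least-squares objective with a known noise level `sigma`. No ODE solver is involved, and all function names are hypothetical. The adjoint sweep has to stop at every measurement time to apply a jump, while the sensitivity-based gradient only evaluates the sensitivities there.

```python
import math


def model(p, x0, t):
    """Closed-form solution x_i(t) = x0_i * exp(p_i * t) of dx_i/dt = p_i * x_i."""
    return [x0_i * math.exp(p_i * t) for p_i, x0_i in zip(p, x0)]


def objective(p, x0, times, data, sigma):
    """Least-squares negative log-likelihood (up to a constant), sigma assumed known."""
    total = 0.0
    for t, y in zip(times, data):
        x = model(p, x0, t)
        total += sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return total / (2.0 * sigma ** 2)


def gradient_se(p, x0, times, data, sigma):
    """Gradient via sensitivities; for this model dx_i/dp_i = t * x_i(t)."""
    grad = [0.0] * len(p)
    for t, y in zip(times, data):
        x = model(p, x0, t)
        for i in range(len(p)):
            grad[i] += (x[i] - y[i]) / sigma ** 2 * (t * x[i])
    return grad


def gradient_adjoint(p, x0, times, data, sigma):
    """Gradient via a backward adjoint sweep with a jump at each measurement time."""
    n = len(p)
    lam = [0.0] * n          # adjoint state just after the last measurement
    grad = [0.0] * n
    t_next = times[-1]
    for k in range(len(times) - 1, -1, -1):
        t_k = times[k]
        x_next = model(p, x0, t_next)
        for i in range(n):
            # On (t_k, t_next) the integrand lam_i(t) * x_i(t) is constant
            # for this model, so the integral is available exactly.
            grad[i] += lam[i] * x_next[i] * (t_next - t_k)
            # Propagate the adjoint backwards: d(lam_i)/dt = -p_i * lam_i.
            lam[i] *= math.exp(p[i] * (t_next - t_k))
        # Jump condition: add the derivative of the data term at t_k.
        x_k = model(p, x0, t_k)
        for i in range(n):
            lam[i] += (x_k[i] - data[k][i]) / sigma ** 2
        t_next = t_k
    # Remaining interval from the first measurement time back to t = 0.
    x_next = model(p, x0, t_next)
    for i in range(n):
        grad[i] += lam[i] * x_next[i] * t_next
    return grad
```

Both routines return the same gradient for this model; with a numerical ODE solver, each jump in the adjoint sweep would correspond to a solver restart.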

### 4 Study design and evidence level

#### 4.1 General aspects

- They use the CVODES solver from the SUNDIALS suite.
- As a modelling toolbox, they use *DiffMEM*, which originates in mixed-effects modelling. It is a C library with R and Python interfaces.
- Accuracies (tolerances) of the ODE solver are provided and claimed to be sufficient to have no effect on the results.

Critical comments:

- The linear model is diagonal. Is this a relevant setting?
- Only one ODE model from an experimental setting was investigated. Its parameters were varied only within a very small window.

#### 4.2 Design for Outcome O1

- ODE models with linear and diagonal RHS (-> number of states = number of parameters) are randomly generated with -1.1 < p_i < -0.1.
- The dimensionality of the problem is varied between 2 and 122, using 13 different values.
- Synthetic data: 11 equidistant measurements between 0 and 100.
- 100 repetitions to estimate variance of the performance.
- Gaussian noise with variance 1% of maximum prediction.
- Quantities of interest: How long does the gradient calculation take, and how accurate is it?
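The setup above can be sketched as a small data generator. Several details are assumptions, since the summary does not specify them: the parameters are drawn uniformly, the initial state is 1 for every component, and "variance 1% of maximum prediction" is read literally as a noise variance of 0.01 times the largest predicted value. The function name `make_diagonal_problem` is hypothetical.

```python
import math
import random


def make_diagonal_problem(n_states, n_times=11, t_end=100.0, seed=0):
    """Generate a random diagonal linear ODE problem with noisy synthetic data."""
    rng = random.Random(seed)
    # One decay rate per state, so number of states = number of parameters.
    # Uniform sampling is an assumption of this sketch.
    p = [rng.uniform(-1.1, -0.1) for _ in range(n_states)]
    x0 = [1.0] * n_states  # assumed initial state
    # Equidistant measurement times between 0 and t_end (inclusive).
    times = [t_end * k / (n_times - 1) for k in range(n_times)]
    # Closed-form solution of dx_i/dt = p_i * x_i.
    pred = [[x0[i] * math.exp(p[i] * t) for i in range(n_states)] for t in times]
    max_pred = max(max(row) for row in pred)
    # Noise variance = 1% of the maximum prediction (literal reading).
    sigma = math.sqrt(0.01 * max_pred)
    data = [[v + rng.gauss(0.0, sigma) for v in row] for row in pred]
    return p, x0, times, data, sigma
```

Calling `make_diagonal_problem(n)` for the 13 dimensions and repeating it with 100 different seeds would reproduce the shape of the experiment, though not the paper's actual random draws.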

#### 4.3 Design for Outcome O2

- ODE models with linear and diagonal RHS (-> number of states = number of parameters) are randomly generated with -1.1 < p_i < -0.1.
- The dimensionality of the problem is fixed to 50.
- Synthetic data: The number of observation times is varied between 2 and 122 in 12 steps, replacing the fixed 11 equidistant measurements between 0 and 100 used for O1.
- 100 repetitions to estimate variance of the performance.
- Gaussian noise with variance 1% of maximum prediction.
- Quantities of interest: How long does the gradient calculation take, and how accurate is it?

#### 4.4 Design for Outcome O3

- Essentially the same as the design for Outcome O1, but Hessians are investigated instead of gradients.

#### 4.5 Design for Outcome O4

- Latent dynamic HIV model from Lavielle et al. (2011). It consists of 6 states, 11 parameters and 2 observables.
- Parameters are randomly perturbed by up to 5%.
- Other procedures are identical to O2, i.e. the number of observations is varied from 2 to 122.
- Quantities of interest: How long does the gradient calculation take, and how accurate is it?

#### 4.6 Design for Outcome O5

- Uses the HIV model and varies the number of observations (as in Outcome O4).
- Investigates the differences between the methods (as in Outcome O2).
- Quantities of interest: How long does the Hessian calculation take, and how accurate is it?

### 5 Further comments and aspects

- Quite a lot of calculus is provided, also in the appendix.
- Rather abstract problem formulation, i.e. it is not specifically designed for systems biology modeling problems.
- "For models with a high-number of parameters and a small number of measurement times, the ASM is a clear winner."
- Special case of linear and diagonal system. How do results translate to non-diagonal linear systems?
