Fast derivatives of likelihood functionals for ODE based models using adjoint-state method

Revision as of 14:20, 25 February 2020

1 Citation

Melicher, V., Haber, T. & Vanroose, W. Fast derivatives of likelihood functionals for ODE based models using adjoint-state method. Comput Stat 32, 1621–1643 (2017).

2 Summary

In this paper, the adjoint-state method (ASM) for computing the gradient and the Hessian of likelihood functionals for time-series data modelled by ordinary differential equations (ODEs) is derived and analyzed. Discrete data and the continuous model are interfaced at the level of the likelihood functional, using the concept of point-wise distributions.

This approach is compared to the sensitivity-equation (SE) approach and to finite differences (FD).
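To make the comparison concrete, here is a minimal sketch (our illustration, not from the paper) of both gradient routes for a scalar linear ODE x' = p·x with a least-squares objective J(p) = ½ Σ (x(tᵢ) − yᵢ)². The SE route uses the sensitivity dx/dp; the ASM route sweeps an adjoint variable backward over the observation times. Model, initial condition and data are illustrative assumptions.

```python
import numpy as np

def simulate(p, x0, t):
    """Exact solution of x' = p*x, x(0) = x0 (scalar linear model)."""
    return x0 * np.exp(p * t)

def grad_sensitivity(p, x0, t_obs, y):
    """SE route: the sensitivity s = dx/dp solves s' = p*s + x, s(0) = 0,
    giving s(t) = x0 * t * exp(p*t); then dJ/dp = sum_i (x_i - y_i) * s_i."""
    x = simulate(p, x0, t_obs)
    s = x0 * t_obs * np.exp(p * t_obs)
    return float(np.sum((x - y) * s))

def grad_adjoint(p, x0, t_obs, y):
    """ASM route: lambda(T) = 0, lambda' = -p*lambda between observations,
    with a jump of (x_i - y_i) at each t_i; dJ/dp = integral of lambda*x.
    For this model lambda*x is constant between observations, so the
    integral reduces to a sum of rectangle areas."""
    x = simulate(p, x0, t_obs)
    left = np.concatenate(([0.0], t_obs[:-1]))      # left endpoint of each interval
    lam, grad = 0.0, 0.0
    for i in range(len(t_obs) - 1, -1, -1):
        lam += x[i] - y[i]                          # jump at observation t_i
        grad += lam * x[i] * (t_obs[i] - left[i])   # exact integral over interval
        lam *= np.exp(p * (t_obs[i] - left[i]))     # propagate backward to t_{i-1}
    return float(grad)
```

Both functions return the same gradient here; the practical difference appears when the state is n-dimensional with n parameters, where SE must integrate one extra n-dimensional sensitivity system per parameter while ASM integrates a single adjoint system backward.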

3 Study outcomes

3.1 Outcome O1

When using ASM to compute the gradient of linear (and diagonal) ODE models, the speedup grows linearly with the number of states, which here equals the number of parameters. So the higher the dimensionality of the problem, the more beneficial the use of ASM.

Outcome O1 is presented as Figure 1 in the original publication.

3.2 Outcome O2

When using ASM to compute the gradient of linear (and diagonal) ODE models, the acceleration of ASM over SE declines exponentially with the number of data points. So the more observations there are, the less beneficial the use of ASM. For the maximal number of data points investigated here, both procedures are equally fast.

Outcome O2 is presented as Figure 2 in the original publication.

3.3 Outcome O3

When using ASM to compute the Hessian of linear (and diagonal) ODE models, the speedup grows linearly with the number of states, which here equals the number of parameters. So the higher the dimensionality of the problem, the more beneficial the use of adjoints. Adjoint FD is slightly faster than adjoint SE.

The accuracy of adjoint FD is not as high as that of adjoint SE, but it is usually sufficient.

Outcome O3 is presented as Figure 3 in the original publication.
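One way to see why an FD flavour of the adjoint Hessian can be cheap: any gradient routine (e.g. an adjoint one) can be turned into approximate Hessian-vector products by central differences of the gradient. This is a generic sketch of that idea only, not the paper's exact scheme, and `grad` is a placeholder for a user-supplied gradient function.

```python
import numpy as np

def hessian_vector_fd(grad, p, v, eps=1e-6):
    """Approximate H(p) @ v by a central finite difference of the gradient:
    H v ~ (grad(p + eps*v) - grad(p - eps*v)) / (2*eps).
    Costs two gradient evaluations per product, regardless of dimension."""
    p, v = np.asarray(p, float), np.asarray(v, float)
    return (grad(p + eps * v) - grad(p - eps * v)) / (2.0 * eps)
```

With an adjoint gradient, each Hessian-vector product then costs two forward/backward solves; the finite-difference step is what limits the accuracy, consistent with the observation above that adjoint FD trades some accuracy for speed.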

3.4 Outcome O4

When using ASM to compute the gradient of one specific nonlinear ODE model, the acceleration factor of ASM over SE declines exponentially with the number of data points. At roughly n = 6 data points, the two methods are equally fast. For smaller n, ASM is preferable; for larger n, SE is preferable.

Outcome O4 is presented as Figure 4 in the original publication.

3.5 Outcome O5

When using ASM to compute the Hessian of one specific nonlinear ODE model, the acceleration factor of ASM over both SE and FD declines exponentially with the number of data points. Adjoint FD is faster than adjoint SE and is therefore preferable. At roughly n = 7, adjoint FD and plain FD are equally fast. For smaller n, adjoint FD is preferable; for larger n, FD is preferable.

Outcome O5 is presented as Figure 5 in the original publication.

3.6 Further outcomes

  1. The implementation of the SE approach is so efficient that, owing to its superior accuracy, it renders the finite-difference approximation practically obsolete.
  2. The efficiency of ASM depends on the number of measurement times, which is not the case for the SE approach.
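A back-of-the-envelope count (our illustration, not a computation from the paper) shows where these scalings come from: for n states and q parameters, SE integrates the state plus one n-dimensional sensitivity system per parameter in a single forward pass, independent of the number of measurement times, while ASM integrates the n-dimensional state forward and one n-dimensional adjoint backward, but the backward solver must stop and apply a jump at each measurement time.

```python
def se_system_size(n_states, n_params):
    # forward solve carries the state and one sensitivity vector per parameter
    return n_states * (1 + n_params)

def asm_system_size(n_states):
    # one forward solve (state) plus one backward solve (adjoint)
    return 2 * n_states

# for the linear diagonal benchmark, n_states == n_params:
n = 50
print(se_system_size(n, n), asm_system_size(n))  # -> 2550 100
```

The integrated system is roughly (q + 1)/2 times larger for SE, consistent with the linear-in-n speedup of O1/O3, while the per-measurement restarts of the backward solve explain why the ASM advantage shrinks as measurement times are added (O2/O4).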

4 Study design and evidence level

4.1 General aspects

  • They use the CVODES solver from the SUNDIALS suite.
  • As a modelling toolbox, they use DiffMEM, which originated in mixed-effects modelling. It is a C library with R and Python interfaces.
  • Accuracies (tolerances) of the ODE solver are provided and are claimed to be tight enough to have no effect on the results.

4.2 Design for Outcome O1

  • ODE models with a linear and diagonal right-hand side (RHS), so that the number of states equals the number of parameters, are randomly generated with -1.1 < pi < -0.1.
  • The dimensionality of the problem is varied between 2 and 122, using 13 different dimensions.
  • Synthetic data: 11 equidistant measurement times between 0 and 100.
  • 100 repetitions to estimate the variance of the performance.
  • Gaussian noise with variance equal to 1% of the maximum prediction.
  • The runtime and accuracy of the gradient computation are measured.
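The generation of one synthetic benchmark problem as described above can be sketched as follows (the initial condition x0 = 1 is our assumption; the review does not state it):

```python
import numpy as np

def make_linear_diagonal_problem(n, rng, n_obs=11, t_end=100.0, noise_var_frac=0.01):
    """Random diagonal linear model x_i' = p_i * x_i with -1.1 < p_i < -0.1,
    observed at n_obs equidistant times in [0, t_end], with additive Gaussian
    noise whose variance is noise_var_frac of the maximum prediction."""
    p = rng.uniform(-1.1, -0.1, size=n)
    t = np.linspace(0.0, t_end, n_obs)
    x0 = np.ones(n)                            # assumed initial condition
    x = x0 * np.exp(np.outer(t, p))            # exact trajectories, shape (n_obs, n)
    sigma2 = noise_var_frac * x.max()          # noise variance: 1% of max prediction
    y = x + rng.normal(0.0, np.sqrt(sigma2), size=x.shape)
    return p, t, y

rng = np.random.default_rng(0)
p, t, y = make_linear_diagonal_problem(50, rng)  # one of the 100 repetitions
```

Because the model is diagonal and linear, the exact solution is available in closed form, so no numerical ODE solver is needed to generate the data.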

4.3 Design for Outcome O2

  • ODE models with a linear and diagonal RHS (so that the number of states equals the number of parameters) are randomly generated with -1.1 < pi < -0.1.
  • The dimensionality of the problem is fixed to 50.
  • Synthetic data: the number of observation times is varied between 2 and 122 in 12 steps, with measurements equidistant between 0 and 100.
  • 100 repetitions to estimate the variance of the performance.
  • Gaussian noise with variance equal to 1% of the maximum prediction.
  • The runtime and accuracy of the gradient computation are measured.

4.4 Design for Outcome O3

  • Basically the same as the design for Outcome O1, but Hessians instead of gradients are now investigated.

4.5 Design for Outcome O4

  • Latent dynamic HIV model from Lavielle et al. (2011), consisting of 6 states, 11 parameters and 2 observables.
  • Parameters are randomly perturbed within a 5% deviation.
  • All other procedures are identical to O2, i.e. the number of observations is varied from 2 to 122.
  • The runtime and accuracy of the gradient computation are measured.

4.6 Design for Outcome O5

  • The HIV model is used and the number of observations is varied (as for Outcome O4).
  • The methods are compared (as for Outcome O2).
  • The runtime and accuracy of the Hessian computation are measured.

5 Further comments and aspects

  • Quite a lot of calculus is provided, including in the appendix.
  • The problem formulation is rather abstract, i.e. it is not specifically tailored to systems biology modelling problems.
  • "For models with a high-number of parameters and a small number of measurement times, the ASM is a clear winner."
  • Only the special case of a linear and diagonal system is studied. How do the results translate to non-diagonal linear systems?

6 References

The list of cited or related literature is placed here.