Introduction to the construction of Markov models

Andrew Sims

September 2024

Introduction

Sonnenberg and Beck’s 1993 practical guide to using Markov models in decision making1 is widely cited, and describes the principles of using Markov models for cohort simulations. As an example, it introduces an idealised three health state model applied to patients who have received a prosthetic heart valve (PHV). This vignette explains how to use rdecision to implement the example case, and replicates the published results.

Model structure

In the example, there are three states: “Well”, “Disabled” and “Dead”, which are represented by variables of type MarkovState. As a minimum, only the name property of each state must be set; the utility and annual occupancy cost can be set when a state is created or set later. Because we will be setting these properties later, we create the states as named variables. A Markov state object is represented in rdecision as a node in a graph.

s_well <- MarkovState$new(name = "Well")
s_disabled <- MarkovState$new(name = "Disabled")
s_dead <- MarkovState$new(name = "Dead")

Each allowed transition between states is represented as an object of type Transition. In rdecision, transitions are the directed edges of a graph. Transitions are defined by the source and target states that they join, and can have the optional properties of a cost associated with making the transition, and a label. In this example, we do not need to set any of the optional properties, and can create the transitions as a list of unnamed variables. Unless states are temporary states (occupied for one cycle only), we must also define transitions from each state to itself, to represent people who remain in a state between cycles.

E <- list(
  Transition$new(s_well, s_well),
  Transition$new(s_dead, s_dead),
  Transition$new(s_disabled, s_disabled),
  Transition$new(s_well, s_disabled),
  Transition$new(s_well, s_dead),
  Transition$new(s_disabled, s_dead)
)

The model itself is created as a variable of type SemiMarkovModel, which represents a directed graph with nodes (states) and edges (transitions). Properties of the model, including the cycle time and discount rates can be set when the model is created. For this example, we leave these as their default values (one year cycle length, no discounting applied to costs or utilities).

m <- SemiMarkovModel$new(V = list(s_well, s_disabled, s_dead), E)

The model can be saved as a graph object and rendered as a diagram. Method as_gml() creates a representation of the graph in GML format, which can be opened and plotted using the igraph package, optionally with additional manipulation of the graph’s appearance to achieve the desired effect, as below.

Model variables

In the prosthetic heart valve example, there are only 4 model variables: three probabilities of transition during one cycle, and one utility (disabled state).

The default utility of each state is 1, so we have to set the utilities of the disabled and dead states, assuming those in the well state have full utility, as follows:

s_disabled$set_utility(0.7)
s_dead$set_utility(0.0)

The probabilities of making a transition between states in a semi-Markov model must be defined as a matrix. Specifically, these are the probabilities of starting a cycle in one state and finishing it in another. rdecision requires the matrix to have specific properties:

In this example there are three values for transition probabilities (well to disabled, well to dead, disabled to dead), an assumption that there is no transition from disabled to well, and an assumption that dead is an absorbing state. The matrix is created and set as follows:

snames <- c("Well", "Disabled", "Dead")
pt <- matrix(
  data = c(NA, 0.2, 0.2, 0.0, NA, 0.4, 0.0, 0.0, NA),
  nrow = 3L, byrow = TRUE,
  dimnames = list(source = snames, target = snames)
)
m$set_probabilities(pt)
Well Disabled Dead
Well NA 0.2 0.2
Disabled 0 NA 0.4
Dead 0 0.0 NA

The transition probability matrix can be extracted from the model using the function transition_probabilities(). The values set as NA are replaced as required, and the order of rows and columns may differ from the one provided. For this example it is as follows:

Well Disabled Dead
Disabled 0.0 0.6 0.4
Well 0.6 0.2 0.2
Dead 0.0 0.0 1.0

Running the model

In a cohort Markov model, it is necessary to define the starting populations in each state. The total population size is arbitrary; it is the relative proportions starting in each state that matters. In Sonnenberg and Beck’s PHV example, they assume there are 10,000 people who start in the “Well” state. In rdecision this is achieved by resetting the model; the elapsed time and cycle number can be reset with the same call, here we leave them as their default values.

In this case, the state populations are given as integers, but in a cohort simulation, most practical transition probability matrices lead to state occupancies involving fractions of patients as the simulation proceeds. Thus the starting populations can also be given as real numbers.

m$reset(populations = c(Well = 10000L, Disabled = 0L, Dead = 0L))

To run the model, we call the cycles() method. Following the example, there are 25 yearly cycles, and there is no half cycle correction. This correction can be applied independently to the output to the Markov trace after each cycle for the state population, cycle cost and incremental QALYs.

mt <- m$cycles(25L, hcc.pop = FALSE, hcc.cost = FALSE, hcc.QALY = FALSE)

The cycles() method returns a Markov trace, a data frame which contains one row per cycle with details of state populations, cumulative costs and QALYs. The trace for this example is as follows:

Cycle Well Disabled Dead QALY cQALY
0 10000.00 0.00 0.00 0.0000 0.0000
1 6000.00 2000.00 2000.00 0.7400 0.7400
2 3600.00 2400.00 4000.00 0.5280 1.2680
3 2160.00 2160.00 5680.00 0.3672 1.6352
4 1296.00 1728.00 6976.00 0.2506 1.8858
5 777.60 1296.00 7926.40 0.1685 2.0542
6 466.56 933.12 8600.32 0.1120 2.1662
7 279.94 653.18 9066.88 0.0737 2.2399
8 167.96 447.90 9384.14 0.0481 2.2881
9 100.78 302.33 9596.89 0.0312 2.3193
10 60.47 201.55 9737.98 0.0202 2.3395
11 36.28 133.03 9830.69 0.0129 2.3524
12 21.77 87.07 9891.16 0.0083 2.3607
13 13.06 56.60 9930.34 0.0053 2.3660
14 7.84 36.57 9955.59 0.0033 2.3693
15 4.70 23.51 9971.79 0.0021 2.3714
16 2.82 15.05 9982.13 0.0013 2.3728
17 1.69 9.59 9988.72 0.0008 2.3736
18 1.02 6.09 9992.89 0.0005 2.3741
19 0.61 3.86 9995.53 0.0003 2.3745
20 0.37 2.44 9997.20 0.0002 2.3747
21 0.22 1.54 9998.25 0.0001 2.3748
22 0.13 0.97 9998.90 0.0001 2.3749
23 0.08 0.61 9999.32 0.0001 2.3749
24 0.05 0.38 9999.57 0.0000 2.3749
25 0.03 0.24 9999.73 0.0000 2.3750

The column labelled “QALY” is the per-patient quality adjusted life years accumulated at each cycle (in Sonnenberg and Beck’s Table 2 this is labelled as “Cycle Sum” and is for the whole cohort of 10,000 people), and the column labelled “cQALY” is the per-patient cumulative quality adjusted life years over the simulation (in Sonnenberg and Beck’s Table 2 this is labelled as “Cumulative Utility” and is for the whole cohort).

References

1.
Sonnenberg, F. A. & Beck, J. R. Markov Models in Medical Decision Making: A Practical Guide. Medical Decision Making 13, 322–338 (1993).