BeviMed, which stands for Bayesian Evaluation of
Variant Involvement in Mendelian Disease [1], is an association
test which estimates the probability of an association between a given
set of variants and a case/control label, the mode of inheritance for
the disease, and the probability that each individual variant is
pathogenic. This vignette gives a quick description of what can be done
with the package and demonstrates how to use it on simulated data. For
more detailed explanations, see the ‘BeviMed Guide’ vignette and the
individual function help pages.
Inference is performed by the bevimed function, which
evaluates the data with respect to three models: a model of no
association between case/control label and allele counts, and models of
dominant and recessive association. The function depends on input
parameters:
y, a length N (number
of samples) logical vector,G, an N by k
integer matrix of allele counts for N individuals at
k rare variant sites,prior_prob_association - the prior probability of
association between the disease label and the variants in the locus.
Defaults to 0.01.prior_prob_dominant - the prior probability of dominant
as opposed to recessive inheritance, given an association with the
locus. Defaults to 0.5.ploidy - a vector the same length as the case-control
label y giving the ploidy of each individual in the locus.
Defaults to 2 for each sample.?bevimed for more
details).bevimed returns an object of class BeviMed
which contains the output of the (MCMC-based) inference procedure,
including samples from the posterior distributions of the model
parameters. The object can be evaluated at the command line to print a
summary of inference, telling you summary statistics of interest,
including the probability of association. The object is likely to take
up a lot of memory, so it is useful to store a summary, computed with
summary, for each result if the procedure is being applied
to multiple loci.
Summary statistics can also be computed directly from the arguments using the functions (see help for individual functions for more information):
prob_association - returning the probability of
association, optionally broken down by mode of inheritance.conditional_prob_pathogenic - the probabilities of
pathogenicity for the individual variants conditional on a mode of
inheritance.expected_explained - the expected number of cases
explained by variants.explaining_variants - the expected number of variants
involved in explained cases.Here we demonstrate a simple application of BeviMed for some simulated data.
Firstly, we’ll generate a random allele-count matrix G
for 100 samples at 20 variant sites (each with an allele frequency of
0.02) and an independently generated case-control label,
y_random.
G <- matrix(rbinom(size=2, prob=0.02, n=100*20), nrow=100, ncol=20)
y_random <- runif(n=nrow(G)) < 0.1
prob_association(G=G, y=y_random)## [1] 0.002810699The results indicate that there is a low probability of association.
We now generate a new case control label y_dependent which
depends on G - specifically, we treat variants 1 to 3 as
‘pathogenic’, and label any samples harbouring alleles for any of these
variants as cases.
y_dependent <- apply(G, 1, function(variants) sum(variants[1:3]) > 0)
prob_association(G=G, y=y_dependent)## [1] 0.9998203Notice that there is now a higher estimated probability of association.
By default, prob_association integrates over mode of
inheritance (e.g. are at least 1 or 2 pathogenic variants required for a
pathogenic configuration?). The probabilities of association with each
mode of inheritance can by shown by passing the option
by_MOI=TRUE (for more details, including how to set the
ploidy of the samples within the region, see
?prob_pathogenic).
For a more detailed output, the bevimed function can be
used, and it’s returned values can be summarised and stored/printed.
## -------------------------------------------------------------------------------- 
## Posterior probability of association: 
##  1 [prior: 0.01]
## -------------------------------------------------------------------------------- 
##      Model MOI Prior     Post Cases Variants
##   dominant dom   0.5 0.999676  6.93     2.95
##  recessive rec   0.5 0.000324  4.59     5.96
## 
## MOI: mode of inheritance, dominant (dom) or recessive (rec)
## Prior: prior probability of model given association
## Post: posterior probability of model given association
## Cases: posterior expected number of cases explained
## Variants: posterior expected number of variants involved in explained cases
## -------------------------------------------------------------------------------- 
## Probabilities of pathogenicity for individual variants given association
## 
##  Var   Probability pathogenic
##    1 [1.00 =================]
##    2 [1.00 =================]
##    3 [0.94 ================ ]
##    4 [0.00                  ]
##    5 [0.00                  ]
##    6 [0.00                  ]
##    7 [0.00                  ]
##    8 [0.00                  ]
##    9 [0.00                  ]
##   10 [0.00                  ]
##   11 [0.00                  ]
##   12 [0.01                  ]
##   13 [0.00                  ]
##   14 [0.02                  ]
##   15 [0.00                  ]
##   16 [0.00                  ]
##   17 [0.00                  ]
##   18 [0.00                  ]
##   19 [0.00                  ]
##   20 [0.00                  ]
## --------------------------------------------------------------------------------