Source: R/permutation-testing.R (documented in cluster_permute_test.Rd)
This tests a statistic for association between labels (for instance, cluster/clonal ID) and covariates (for instance, subject or treatment) by permuting the link between the two.
Each observation represents a cell.
The statistic may be any function of the labels and the covariates.
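Below is a bare-bones base R version of this idea. It is a hypothetical illustration, not the package's implementation. The names perm_test_sketch() and n_distinct_pairs() are made up for this example, and it handles neither stratification nor contrasts. The toy statistic counts distinct (label, covariate) combinations, so it is small when each label is concentrated within few covariate levels.

```r
# Hypothetical toy statistic: number of distinct (label, covariate) rows.
# `covariate` is a data.frame, matching the documented statistic signature.
n_distinct_pairs <- function(label, covariate) {
  nrow(unique(data.frame(label = label, covariate)))
}

# Hypothetical sketch of an unstratified permutation test.
# Shuffling `label` breaks any link between labels and covariates while
# keeping both marginal distributions fixed.
perm_test_sketch <- function(label, covariate, statistic, n_perm) {
  observed <- statistic(label, covariate)
  permuted <- replicate(n_perm, statistic(sample(label), covariate))
  list(observed = observed, permuted = permuted,
       expected = mean(permuted))
}

set.seed(1)
cluster_idx <- c(1, 1, 1, 2, 2, 3, 3)
covariate <- data.frame(subject = c('A', 'A', 'B', 'B', 'B', 'C', 'C'))
res <- perm_test_sketch(cluster_idx, covariate, n_distinct_pairs, n_perm = 50)
res$observed  # 4: pairs (1,A), (1,B), (2,B), (3,C)
```

Because every cluster and every subject must appear at least once, any shuffle of these data yields between 3 and 7 distinct pairs. The observed value of 4 therefore sits near the "concentrated" end of that range.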
Usage
cluster_permute_test(
ccdb,
cell_covariate_keys,
cell_label_key = ccdb$cluster_pk,
cell_stratify_keys,
statistic,
contrasts = NULL,
n_perm,
alternative = c("two.sided", "less", "greater"),
sanity_check_strata = TRUE,
...
)
Arguments
ccdb: a ContigCellDB object.
cell_covariate_keys: character naming fields in ccdb$cell_tbl.
cell_label_key: character naming a single field in ccdb$cell_tbl.
cell_stratify_keys: optional character naming fields in ccdb$cell_tbl within which permutations of cell_label_key will occur. The test is therefore conditional on these covariates. Must be disjoint from cell_covariate_keys.
statistic: function of label (vector) and covariate (data.frame). If this returns a vector, then by default each level is compared against every other level, pairwise (but see contrasts).
contrasts: an optional list of numeric vectors. Each is dotted with the statistic (that is, its dot product with the statistic is tested). Alternatively, a matrix may be provided, in which case each row is tested one by one.
n_perm: number of permutations to run.
alternative: character naming the direction in which statistic should fall under the alternative hypothesis.
sanity_check_strata: logical, should cell_stratify_keys be checked for sanity?
...: passed to statistic.
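The two mechanisms described under cell_stratify_keys and contrasts can be sketched in base R. This is an illustration under assumptions, not the package's code. The helpers permute_within_strata() and apply_contrast(), the batch stratum, and the statistic values are all hypothetical. Labels are shuffled only inside each stratum. A contrast reduces a vector-valued statistic to a single number via a dot product, and a contrast matrix gives one such number per row.

```r
# Hypothetical helper: shuffle labels only within each stratum, so the
# test is conditional on the stratifying covariate.
permute_within_strata <- function(label, strata) {
  out <- label
  for (idx in split(seq_along(label), strata)) {
    # Index via sample.int(): sample(x) on a length-one numeric x would
    # draw from 1:x instead of returning x unchanged.
    out[idx] <- label[idx][sample.int(length(idx))]
  }
  out
}

# Hypothetical helper: reduce a vector-valued statistic to one number
# by taking its dot product with a contrast vector.
apply_contrast <- function(stat_vec, contrast) {
  stopifnot(length(stat_vec) == length(contrast))
  sum(stat_vec * contrast)
}

set.seed(2)
label <- c(1, 1, 1, 2, 2, 3, 3)
batch <- c('x', 'x', 'y', 'y', 'y', 'x', 'y')  # hypothetical stratifying covariate
shuffled <- permute_within_strata(label, batch)

stat_vec <- c(0.2, 0.5, 0.9)  # made-up per-level values of a statistic
apply_contrast(stat_vec, c(1, -1, 0))  # first level minus second: -0.3

# A matrix of contrasts: each row yields its own value to be tested
contrast_mat <- rbind(c(1, -1, 0), c(0, 1, -1))
drop(contrast_mat %*% stat_vec)  # -0.3 -0.4
```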
Value
A list containing the observed value of the statistic, the permuted values of the statistic, its expectation (under independence), a p-value, and the Monte Carlo standard error (of the expected value).
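The summary quantities listed above could be computed from the permuted draws of a scalar statistic roughly as follows. This is a hypothetical sketch, and summarise_perm() is not a package function. It assumes the common (count + 1) / (n_perm + 1) p-value correction and estimates the Monte Carlo standard error as sd / sqrt(n_perm). The package may use different conventions for both.

```r
# Hypothetical sketch: summarise permuted draws of a scalar statistic.
summarise_perm <- function(observed, permuted,
                           alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  n_perm <- length(permuted)
  # One-sided tail probabilities, with the +1 correction (an assumption)
  p_less    <- (sum(permuted <= observed) + 1) / (n_perm + 1)
  p_greater <- (sum(permuted >= observed) + 1) / (n_perm + 1)
  p_value <- switch(alternative,
                    less = p_less,
                    greater = p_greater,
                    two.sided = min(1, 2 * min(p_less, p_greater)))
  list(observed = observed,
       expected = mean(permuted),            # expectation under independence
       p.value = p_value,
       mc.se = sd(permuted) / sqrt(n_perm))  # Monte Carlo SE of `expected`
}

s <- summarise_perm(observed = 1, permuted = c(2, 3, 4, 5), alternative = "less")
s$p.value   # 0.2
s$expected  # 3.5
```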
Examples
library(CellaRepertorium)
library(dplyr)
# covariate should name one or more columns in `cell_tbl`
cluster_idx = c(1, 1, 1, 2, 2, 3, 3)
subject = c('A', 'A', 'B', 'B', 'B', 'C', 'C')
contig_tbl = tibble(contig_pk = seq_along(cluster_idx), cluster_idx, subject)
ccdb_test = ContigCellDB(contig_tbl = contig_tbl, contig_pk = 'contig_pk',
cell_pk = c('contig_pk', 'subject', 'cluster_idx'), cluster_pk = 'cluster_idx')
ccdb_test$cell_tbl
#> # A tibble: 7 × 3
#> contig_pk subject cluster_idx
#> <int> <chr> <dbl>
#> 1 1 A 1
#> 2 2 A 1
#> 3 3 B 1
#> 4 4 B 2
#> 5 5 B 2
#> 6 6 C 3
#> 7 7 C 3
clust_test = cluster_permute_test(ccdb_test, 'subject', 'cluster_idx',
statistic = purity, n_perm = 50)
library(ggplot2)
plot_permute_test(perm_test = clust_test)
#> Loading required namespace: cowplot
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
tidy.PermuteTest(clust_test)
#> # A tibble: 50 × 5
#> statistics observed expected p.value mc.se
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0.6 0.25 0.731 0.02 0.0364
#> 2 1 0.25 0.731 0.02 0.0364
#> 3 0.25 0.25 0.731 0.02 0.0364
#> 4 0.6 0.25 0.731 0.02 0.0364
#> 5 0.6 0.25 0.731 0.02 0.0364
#> 6 1 0.25 0.731 0.02 0.0364
#> 7 0.833 0.25 0.731 0.02 0.0364
#> 8 0.6 0.25 0.731 0.02 0.0364
#> 9 0.25 0.25 0.731 0.02 0.0364
#> 10 0.833 0.25 0.731 0.02 0.0364
#> # … with 40 more rows