This tests a statistic for association between labels (for instance, cluster/clonal ID) and covariates (for instance, subject or treatment) by permuting the link between the two. Each observation represents a cell. The statistic can be any function of the labels and covariates.
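The permutation idea can be sketched in base R. The helper permute_label_test and the statistic n_shared below are hypothetical illustrations, not CellaRepertorium functions, and the p-value is a plain upper-tail Monte Carlo proportion; the package's own conventions may differ. Labels are shuffled across all cells, or only among cells that share a stratum when strata is supplied, mirroring what cell_stratify_keys describes.

```r
# Hypothetical helper (not part of CellaRepertorium): a minimal label-permutation test.
# If strata is given, labels are shuffled only among cells sharing a stratum,
# so the test is conditional on the strata.
permute_label_test = function(label, covariate, statistic, n_perm = 100, strata = NULL) {
  if (is.null(strata)) strata = rep(1, length(label))
  shuffle = function() {
    out = label
    for (s in unique(strata)) {
      idx = which(strata == s)
      # sample.int avoids sample()'s special case for a length-one numeric input
      out[idx] = label[idx][sample.int(length(idx))]
    }
    out
  }
  observed = statistic(label, covariate)
  # Break the label-covariate link and recompute the statistic on each shuffle
  permuted = vapply(seq_len(n_perm), function(i) statistic(shuffle(), covariate), numeric(1))
  list(observed = observed, permuted = permuted,
       expected = mean(permuted),             # expectation under independence
       p.value = mean(permuted >= observed))  # upper-tail Monte Carlo p-value
}

# Hypothetical statistic: number of clusters containing cells from more than one subject
n_shared = function(label, covariate) {
  subjects_per_cluster = vapply(split(covariate$subject, label),
                                function(s) length(unique(s)), integer(1))
  sum(subjects_per_cluster > 1)
}

cluster_idx = c(1, 1, 1, 2, 2, 3, 3)
covariates = data.frame(subject = c('A', 'A', 'B', 'B', 'B', 'C', 'C'))
set.seed(1)
res = permute_label_test(cluster_idx, covariates, n_shared, n_perm = 200)
res$observed   # 1: only cluster 1 spans two subjects

# Stratifying on the tested covariate itself
res_strat = permute_label_test(cluster_idx, covariates, n_shared, n_perm = 50,
                               strata = covariates$subject)
```

In the stratified call the strata coincide with the tested covariate, so every permutation reproduces the observed value and the p-value is 1; this shows why overlapping stratification and covariate keys would make the test degenerate.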

cluster_permute_test(
  ccdb,
  cell_covariate_keys,
  cell_label_key = ccdb$cluster_pk,
  cell_stratify_keys,
  statistic,
  contrasts = NULL,
  n_perm,
  alternative = c("two.sided", "less", "greater"),
  sanity_check_strata = TRUE,
  ...
)

Arguments

ccdb

ContigCellDB

cell_covariate_keys

character naming fields in ccdb$cell_tbl

cell_label_key

character naming a single field in ccdb$cell_tbl

cell_stratify_keys

optional character naming fields in ccdb$cell_tbl within which the values of cell_label_key are permuted, so that the test is conditional on these covariates. Must be disjoint from cell_covariate_keys.

statistic

function of label (vector) and covariate (data.frame). If it returns a vector, then by default each element is compared pairwise against every other element; see contrasts to test other comparisons.

contrasts

an optional list of numeric vectors, each of which will be dotted with the statistic; alternatively, a matrix, in which case each row is tested separately.

n_perm

number of permutations to run

alternative

character naming the direction in which statistic should fall under the alternative hypothesis

sanity_check_strata

logical, should cell_stratify_keys be checked for sanity?

...

passed to statistic
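To make the statistic and contrasts arguments concrete, here is a hedged base-R sketch. distinct_per_level and pairwise_contrasts are hypothetical names, and treating the default pairwise comparison as a set of +1/-1 contrast rows is an assumption about the package's behaviour, not a description of its code.

```r
# Hypothetical vector-valued statistic with the documented signature
# (label: vector, covariate: data.frame). It returns the number of distinct
# labels (e.g. clusters) in each level of the first covariate column.
distinct_per_level = function(label, covariate) {
  vapply(split(label, covariate[[1]]),
         function(x) length(unique(x)), integer(1))
}

# Hypothetical helper: one row per pair of levels, +1 for the first, -1 for the second
pairwise_contrasts = function(levels) {
  pairs = combn(seq_along(levels), 2)
  rows = seq_len(ncol(pairs))
  m = matrix(0, nrow = ncol(pairs), ncol = length(levels),
             dimnames = list(paste(levels[pairs[1, ]], levels[pairs[2, ]], sep = "-"),
                             levels))
  m[cbind(rows, pairs[1, ])] = 1
  m[cbind(rows, pairs[2, ])] = -1
  m
}

label = c(1, 1, 1, 2, 2, 3, 3)
covariate = data.frame(subject = c("A", "A", "B", "B", "B", "C", "C"))
stat = distinct_per_level(label, covariate)   # A = 1, B = 2, C = 1

# Matrix form: each row is dotted with the statistic and tested separately
cmat = pairwise_contrasts(names(stat))
drop(cmat %*% stat)                           # A-B = -1, A-C = 0, B-C = 1

# A single contrast as a numeric vector: B versus the average of A and C
sum(c(-0.5, 1, -0.5) * stat)                  # 1
```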

Value

a list containing the observed value of the statistic, the permuted values of the statistic, its expectation (under independence), a p-value, and the Monte Carlo standard error (of the expected value).
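The quantities in the returned list can be approximated from a vector of permuted statistics, as sketched below. summarise_permutations is hypothetical; the two-sided rule (distance from the permutation mean) and sd(permuted) / sqrt(n_perm) for the Monte Carlo standard error are common conventions assumed here, not taken from the package.

```r
# Hypothetical helper (not the package's code): summarise permuted statistics.
# Assumes mc.se is sd(permuted) / sqrt(n_perm), the usual standard error of a mean,
# and uses plain Monte Carlo proportions for the p-value (no +1 correction).
summarise_permutations = function(observed, permuted,
                                  alternative = c("two.sided", "less", "greater")) {
  alternative = match.arg(alternative)
  expected = mean(permuted)
  p.value = switch(alternative,
    less = mean(permuted <= observed),
    greater = mean(permuted >= observed),
    # two-sided: deviation from the permutation mean at least as large as observed
    two.sided = mean(abs(permuted - expected) >= abs(observed - expected)))
  list(observed = observed, permuted = permuted, expected = expected,
       p.value = p.value, mc.se = sd(permuted) / sqrt(length(permuted)))
}

permuted = c(1, 2, 2, 3, 3, 3, 4, 4, 5, 6)   # toy values, not package output
summarise_permutations(1, permuted, "less")$p.value       # 0.1
summarise_permutations(1, permuted, "greater")$p.value    # 1
summarise_permutations(1, permuted)$p.value               # 0.2 (two-sided)
```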

Examples

library(dplyr)
# covariate should name one or more columns in `cell_tbl`

cluster_idx = c(1, 1, 1, 2, 2, 3, 3)
subject = c('A', 'A', 'B', 'B', 'B', 'C', 'C')
contig_tbl = tibble(contig_pk = seq_along(cluster_idx), cluster_idx, subject)
ccdb_test = ContigCellDB(contig_tbl = contig_tbl, contig_pk = 'contig_pk',
                         cell_pk = c('contig_pk', 'subject', 'cluster_idx'),
                         cluster_pk = 'cluster_idx')
ccdb_test$cell_tbl
#> # A tibble: 7 × 3
#>   contig_pk subject cluster_idx
#>       <int> <chr>         <dbl>
#> 1         1 A                 1
#> 2         2 A                 1
#> 3         3 B                 1
#> 4         4 B                 2
#> 5         5 B                 2
#> 6         6 C                 3
#> 7         7 C                 3

clust_test = cluster_permute_test(ccdb_test, 'subject', 'cluster_idx',
                                  statistic = purity, n_perm = 50)
library(ggplot2)
plot_permute_test(perm_test = clust_test)
#> Loading required namespace: cowplot
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

tidy.PermuteTest(clust_test)
#> # A tibble: 50 × 5
#>    statistics observed expected p.value  mc.se
#>         <dbl>    <dbl>    <dbl>   <dbl>  <dbl>
#>  1      0.6       0.25    0.731    0.02 0.0364
#>  2      1         0.25    0.731    0.02 0.0364
#>  3      0.25      0.25    0.731    0.02 0.0364
#>  4      0.6       0.25    0.731    0.02 0.0364
#>  5      0.6       0.25    0.731    0.02 0.0364
#>  6      1         0.25    0.731    0.02 0.0364
#>  7      0.833     0.25    0.731    0.02 0.0364
#>  8      0.6       0.25    0.731    0.02 0.0364
#>  9      0.25      0.25    0.731    0.02 0.0364
#> 10      0.833     0.25    0.731    0.02 0.0364
#> # … with 40 more rows