R/pairing-methods.R
pairing_tables.Rd
A contingency table of every combination of cluster_idx up to table_order
is generated. Combinations that are found in at least min_expansion
cells are reported. All cells that have these combinations are returned,
as well as cells that match on only orphan_level of the cluster_idx.
pairing_tables(
ccdb,
ranking_key = "grp_rank",
table_order = 2,
min_expansion = 2,
orphan_level = 1,
cluster_keys = character(),
cluster_whitelist = NULL,
cluster_blacklist = NULL
)
ccdb: ContigCellDB
ranking_key: field in ccdb$contig_tbl giving the ranking of each contig per cell. Probably generated by a call to rank_prevalence_ccdb() or rank_chain_ccdb().
table_order: Integer larger than 1. What order of cluster_idx will be paired, e.g., table_order = 2 means that the first and second highest ranked contigs will be sought and paired in each cell.
min_expansion: the minimal number of times a pairing needs to occur for it to be reported.
orphan_level: Integer in interval [1, table_order]. Given that at least min_expansion cells are found that have table_order chains identical, how many cluster_idx pairs will we match on to select other cells. Example: orphan_level = 1 means that cells that share just a single chain with an expanded pair will be reported.
cluster_keys: optional character naming additional columns in ccdb$cluster_tbl to be reported in the pairing.
cluster_whitelist: a table of pairings or clusters that should always be reported. Here the clusters must be named "cluster_idx.1", "cluster_idx.2" (if order-2 pairs are being selected) rather than with ccdb$cluster_pk.
cluster_blacklist: a table of pairings or clusters that will never be reported. Must be named as per cluster_whitelist.
Returns a list of tables. The cell_tbl is keyed by the cell identifiers, with fields "cluster_idx.1", "cluster_idx.2", etc., identifying the contigs present in each cell. "cluster_idx.1_fct" and "cluster_idx.2_fct" cast these fields to factors, reordered to maximize the number of pairs along the diagonal. The idx1_tbl and idx2_tbl report information about the cluster_idx (passed in by feature_tbl). The cluster_pair_tbl reports all contig pairings found and the number of times each was observed.
For example, if table_order=2 and min_expansion=2 then heavy/light or
alpha/beta pairs found two or more times will be returned
(as well as alpha-alpha pairs, etc, if those are present).
If orphan_level=1 then all cells that share just a single chain with an
expanded clone will be returned.
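The selection rule described above can be sketched in base R on a toy table. This is only an illustration of the idea, not the package's implementation: the data frame `cells` and its values are hypothetical, and matching orphans by position (first chain against first chain, second against second) is an assumption.

```r
# Hypothetical data: one row per cell, holding its two top-ranked clusters.
cells <- data.frame(
  cell_idx = 1:5,
  cluster_idx.1 = c("a1", "a1", "a2", "a1", "a3"),
  cluster_idx.2 = c("b1", "b1", "b2", "b3", "b1"),
  stringsAsFactors = FALSE
)
min_expansion <- 2

# Count how many cells carry each (cluster_idx.1, cluster_idx.2) pair.
pair_counts <- aggregate(
  list(n = cells$cell_idx),
  by = list(cluster_idx.1 = cells$cluster_idx.1,
            cluster_idx.2 = cells$cluster_idx.2),
  FUN = length
)

# Pairs seen in at least min_expansion cells count as expanded.
expanded <- pair_counts[pair_counts$n >= min_expansion, ]

# orphan_level = 1 (assumed semantics): also keep cells that share
# at least one chain with an expanded pair.
shares_chain <- cells$cluster_idx.1 %in% expanded$cluster_idx.1 |
  cells$cluster_idx.2 %in% expanded$cluster_idx.2
reported <- cells[shares_chain, ]
```

In this toy data only the pair (a1, b1) is expanded, and cells 4 and 5 are kept as orphans because each shares one chain with it.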
The cluster_idx.1_fct and cluster_idx.2_fct fields in cell_tbl,
idx1_tbl, idx2_tbl are cast to factors and ordered such that pairings will
tend to occur along the diagonal when they are cross-tabulated.
This facilitates plotting.
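Why the factor ordering matters can be seen with a small base R sketch. The ordering heuristic here (order the second factor by the first-seen partner of each level of the first factor) is a simple stand-in chosen for illustration; the source does not state which algorithm the package uses, and the data frame `obs` is hypothetical.

```r
# Hypothetical pairings: each cluster_idx.1 value has one usual partner.
obs <- data.frame(
  cluster_idx.1 = c("a1", "a1", "a2", "a2", "a3"),
  cluster_idx.2 = c("b3", "b3", "b1", "b1", "b2"),
  stringsAsFactors = FALSE
)

# Default (alphabetical) levels leave the counts off the diagonal.
default_tab <- table(obs$cluster_idx.1, obs$cluster_idx.2)

# Order the levels of the second factor by the partner of each
# level of the first factor, in order of first appearance.
lev1 <- sort(unique(obs$cluster_idx.1))
lev2 <- unique(obs$cluster_idx.2[order(match(obs$cluster_idx.1, lev1))])

diag_tab <- table(
  factor(obs$cluster_idx.1, levels = lev1),
  factor(obs$cluster_idx.2, levels = lev2)
)
```

With the reordered levels every observation in this toy example falls on the diagonal of `diag_tab`, which is the layout a heatmap or tile plot of the pairings benefits from.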
library(dplyr)
tbl = tibble(clust_idx = gl(3, 2), cell_idx = rep(1:3, times = 2), contig_idx = 1:6)
ccdb = ContigCellDB(tbl, contig_pk = c('cell_idx', 'contig_idx'),
                    cell_pk = 'cell_idx', cluster_pk = 'clust_idx')
# add `grp_rank` to ccdb$contig_tbl indicating how frequent a cluster is
ccdb = rank_prevalence_ccdb(ccdb, tie_break_keys = character())
# using `grp_rank` to determine pairing
# no pairs found twice
pt1 = pairing_tables(ccdb)
#> Warning: No pairs found
# all pairs found, found once.
pt2 = pairing_tables(ccdb, min_expansion = 1)
pt2$cell_tbl
#> # A tibble: 3 × 6
#>   cell_idx cluster_idx.1 cluster_idx.2 max_pairs cluster_idx.1_fct
#>      <int> <fct>         <fct>             <int> <fct>
#> 1        1 1             2                     1 1
#> 2        2 1             3                     1 1
#> 3        3 2             3                     1 2
#> # … with 1 more variable: cluster_idx.2_fct <fct>
tbl2 = bind_rows(tbl, tbl %>% mutate(cell_idx = rep(4:6, times = 2)))
ccdb2 = ContigCellDB(tbl2, contig_pk = c('cell_idx', 'contig_idx'), cell_pk = 'cell_idx',
                     cluster_pk = 'clust_idx') %>% rank_prevalence_ccdb(tie_break_keys = character())
# all pairs found twice
pt3 = pairing_tables(ccdb2, min_expansion = 1)
pt3$cell_tbl
#> # A tibble: 6 × 6
#>   cell_idx cluster_idx.1 cluster_idx.2 max_pairs cluster_idx.1_fct
#>      <int> <fct>         <fct>             <int> <fct>
#> 1        1 1             2                     2 1
#> 2        2 1             3                     2 1
#> 3        3 2             3                     2 2
#> 4        4 1             2                     2 1
#> 5        5 1             3                     2 1
#> 6        6 2             3                     2 2
#> # … with 1 more variable: cluster_idx.2_fct <fct>
ccdb2$contig_tbl = ccdb2$contig_tbl %>%
  mutate(umis = 1, reads = 1, chain = rep(c('TRA', 'TRB'), times = 6))
ccdb2 = rank_chain_ccdb(ccdb2, tie_break_keys = character())
pt4 = pairing_tables(ccdb2, min_expansion = 1, table_order = 2)