R/pairing-methods.R
pairing_tables.Rd
A contingency table of every combination of cluster_idx up to table_order is generated. Combinations that are found in at least min_expansion number of cells are reported. All cells that have these combinations are returned, as well as cells that only have orphan_level of matching cluster_idx.
pairing_tables(
ccdb,
ranking_key = "grp_rank",
table_order = 2,
min_expansion = 2,
orphan_level = 1,
cluster_keys = character(),
cluster_whitelist = NULL,
cluster_blacklist = NULL
)
ccdb: ContigCellDB
ranking_key: field in ccdb$contig_tbl giving the ranking of each contig per cell. Probably generated by a call to rank_prevalence_ccdb() or rank_chain_ccdb().
table_order: Integer larger than 1. The order of cluster_idx that will be paired, e.g., table_order = 2 means that the first and second highest ranked contigs will be sought and paired in each cell.
min_expansion: the minimal number of times a pairing needs to occur for it to be reported.
orphan_level: Integer in interval [1, table_order]. Given that at least min_expansion cells are found that have table_order chains identical, how many cluster_idx pairs will we match on to select other cells. Example: orphan_level=1 means that cells that share just a single chain with an expanded pair will be reported.
cluster_keys: optional character naming additional columns in ccdb$cluster_tbl to be reported in the pairing.
cluster_whitelist: a table of pairings or clusters that should always be reported. Here the clusters must be named "cluster_idx.1", "cluster_idx.2" (if order-2 pairs are being selected) rather than with ccdb$cluster_pk.
cluster_blacklist: a table of pairings or clusters that will never be reported. Must be named as per cluster_whitelist.
A list of tables. The cell_tbl is keyed by the cell_identifiers, with fields "cluster_idx.1", "cluster_idx.2", etc., identifying the contigs present in each cell. "cluster_idx.1_fct" and "cluster_idx.2_fct" cast these fields to factors, reordered to maximize the number of pairs along the diagonal. The idx1_tbl and idx2_tbl report information about the cluster_idx (passed in by feature_tbl). The cluster_pair_tbl reports all pairings of contigs found and the number of times each was observed.
For example, if table_order=2 and min_expansion=2 then heavy/light or alpha/beta pairs found two or more times will be returned (as well as alpha-alpha pairs, etc, if those are present). If orphan_level=1 then all cells that share just a single chain with an expanded clone will be returned.
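The selection rule described above can be sketched with plain dplyr. This is only an illustration under stated assumptions, not the package's implementation: the `cells` table and its cluster labels are hypothetical, and the logic covers just the order-2, orphan_level=1 case.

```r
library(dplyr)

# Hypothetical toy data: one row per cell, holding its two top-ranked clusters.
cells <- tibble(
  cell_idx      = 1:5,
  cluster_idx.1 = c("A1", "A1", "A2", "A1", "A3"),
  cluster_idx.2 = c("B1", "B1", "B2", "B9", "B1")
)
min_expansion <- 2

# Order-2 contingency table: how many cells carry each pair of clusters.
pair_counts <- count(cells, cluster_idx.1, cluster_idx.2, name = "n_cells")

# Pairs seen in at least `min_expansion` cells are treated as expanded.
expanded <- filter(pair_counts, n_cells >= min_expansion)

# orphan_level = 1 (assumed reading): keep every cell that shares at least
# one cluster with an expanded pair, including the fully matching cells.
reported <- filter(
  cells,
  cluster_idx.1 %in% expanded$cluster_idx.1 |
    cluster_idx.2 %in% expanded$cluster_idx.2
)
```

Here only the pair (A1, B1) is expanded. Cells 1 and 2 match it fully, cells 4 and 5 each share one chain with it, and cell 3 shares nothing and is dropped.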
The cluster_idx.1_fct and cluster_idx.2_fct fields in cell_tbl, idx1_tbl, idx2_tbl are cast to factors and ordered such that pairings will tend to occur along the diagonal when they are cross-tabulated. This facilitates plotting.
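To see why the factor ordering matters, the base R sketch below cross-tabulates a few hypothetical pairings after ordering the second factor's levels by the partner each label co-occurs with. The greedy ordering used here is an assumption chosen for illustration; the ordering the package actually applies may differ.

```r
# Hypothetical pairings: element i gives the two cluster labels of cell i.
idx1 <- c("a", "b", "c", "a", "b")
idx2 <- c("z", "x", "y", "z", "x")

# With default (alphabetical) levels, the pairs scatter off the diagonal.
xt_default <- table(factor(idx1), factor(idx2))

# Greedy reordering: walk through idx1 in level order and take the idx2
# labels in the order their partners appear.
lev1 <- sort(unique(idx1))
lev2 <- unique(idx2[order(match(idx1, lev1))])

# Cross-tabulate with the reordered levels; paired labels now line up.
xt <- table(factor(idx1, levels = lev1), factor(idx2, levels = lev2))
```

With the reordered levels every count in `xt` sits on the diagonal, which is the layout a heatmap or tile plot of the pairings benefits from.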
library(dplyr)
tbl = tibble(clust_idx = gl(3, 2), cell_idx = rep(1:3, times = 2), contig_idx = 1:6)
ccdb = ContigCellDB(tbl, contig_pk = c('cell_idx', 'contig_idx'),
                    cell_pk = 'cell_idx', cluster_pk = 'clust_idx')
# add `grp_rank` to ccdb$contig_tbl indicating how frequent a cluster is
ccdb = rank_prevalence_ccdb(ccdb, tie_break_keys = character())
# using `grp_rank` to determine pairing
# no pairs found twice
pt1 = pairing_tables(ccdb)
#> Warning: No pairs found
# all pairs found, found once.
pt2 = pairing_tables(ccdb, min_expansion = 1)
pt2$cell_tbl
#> # A tibble: 3 × 6
#>   cell_idx cluster_idx.1 cluster_idx.2 max_pairs cluster_idx.1_fct
#>      <int> <fct>         <fct>             <int> <fct>
#> 1        1 1             2                     1 1
#> 2        2 1             3                     1 1
#> 3        3 2             3                     1 2
#> # … with 1 more variable: cluster_idx.2_fct <fct>
tbl2 = bind_rows(tbl, tbl %>% mutate(cell_idx = rep(4:6, times = 2)))
ccdb2 = ContigCellDB(tbl2, contig_pk = c('cell_idx', 'contig_idx'), cell_pk = 'cell_idx',
                     cluster_pk = 'clust_idx') %>% rank_prevalence_ccdb(tie_break_keys = character())
# all pairs found twice
pt3 = pairing_tables(ccdb2, min_expansion = 1)
pt3$cell_tbl
#> # A tibble: 6 × 6
#>   cell_idx cluster_idx.1 cluster_idx.2 max_pairs cluster_idx.1_fct
#>      <int> <fct>         <fct>             <int> <fct>
#> 1        1 1             2                     2 1
#> 2        2 1             3                     2 1
#> 3        3 2             3                     2 2
#> 4        4 1             2                     2 1
#> 5        5 1             3                     2 1
#> 6        6 2             3                     2 2
#> # … with 1 more variable: cluster_idx.2_fct <fct>
ccdb2$contig_tbl = ccdb2$contig_tbl %>%
    mutate(umis = 1, reads = 1, chain = rep(c('TRA', 'TRB'), times = 6))
ccdb2 = rank_chain_ccdb(ccdb2, tie_break_keys = character())
pt4 = pairing_tables(ccdb2, min_expansion = 1, table_order = 2)