A contingency table of every combination of cluster_idx up to table_order is generated. Combinations that are found in at least min_expansion cells are reported. All cells that have these combinations are returned, as well as cells that match only orphan_level of the cluster_idx in a reported combination.
pairing_tables(
  ccdb,
  ranking_key = "grp_rank",
  table_order = 2,
  min_expansion = 2,
  orphan_level = 1,
  cluster_keys = character(),
  cluster_whitelist = NULL,
  cluster_blacklist = NULL
)
ccdb: A ContigCellDB object.
ranking_key: Field in ccdb$contig_tbl giving the ranking of each contig per cell. Probably generated by a call to a ranking function such as rank_prevalence_ccdb() or rank_chain_ccdb(), both of which appear in the examples.
table_order: Integer larger than 1. The order of cluster_idx combinations to pair. For example, table_order = 2 means that the first and second highest ranked contigs will be sought and paired in each cell.
min_expansion: The minimal number of times a pairing needs to occur for it to be reported.
orphan_level: Integer in the interval [1, table_order]. Given that at least min_expansion cells are found that have table_order chains identical, the number of cluster_idx that must match in order to select other cells. Example: orphan_level = 1 means that cells that share just a single chain with an expanded pair will be reported.
cluster_keys: Character vector naming additional columns in ccdb$cluster_tbl to be reported in the pairing.
cluster_whitelist: A table of pairings or clusters that should always be reported. Here the clusters must be named "cluster_idx.1", "cluster_idx.2" (if order-2 pairs are being selected) rather than with ccdb$cluster_pk.
cluster_blacklist: A table of pairings or clusters that will never be reported. Must be named as per cluster_whitelist.
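The naming rule for cluster_whitelist and cluster_blacklist can be illustrated with a small base R sketch. The helper pairing_colnames() is hypothetical and not part of the package, and the cluster values are invented for illustration. Whether the columns should be factors or plain vectors is not stated here, so treat the column types as an assumption.

```r
# Hypothetical helper (not part of the package): the column names a
# whitelist or blacklist is expected to carry for a given table_order.
pairing_colnames <- function(table_order = 2) {
  paste0("cluster_idx.", seq_len(table_order))
}

# A whitelist of two order-2 pairings; the cluster values are illustrative.
whitelist <- data.frame(c(1, 2), c(2, 3))
names(whitelist) <- pairing_colnames(2)

names(whitelist)
```

A table built this way would then be supplied through the cluster_whitelist (or cluster_blacklist) argument shown in the usage.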
A list of tables. The cell_tbl is keyed by the cell identifiers, with fields "cluster_idx.1", "cluster_idx.2", etc., identifying the contigs present in each cell. "cluster_idx.1_fct" and "cluster_idx.2_fct" cast these fields to factors, reordered to maximize the number of pairs along the diagonal. The idx1_tbl and idx2_tbl report information about the cluster_idx in the first and second positions, such as the columns requested in cluster_keys. The cluster_pair_tbl reports all pairings of contigs found and the number of times each was observed.
For example, with the default table_order=2, if min_expansion=2 then heavy/light or alpha/beta pairs found two or more times will be returned (as well as alpha-alpha pairs, etc., if those are present). If orphan_level=1 then all cells that share just a single chain with an expanded clone will be returned.
The cluster_idx.1_fct and cluster_idx.2_fct fields in cell_tbl, idx1_tbl, and idx2_tbl are cast to factors and ordered such that pairings will tend to occur along the diagonal when they are cross-tabulated. This facilitates plotting.
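The selection logic described above can be sketched in base R on toy data. This is a simplified illustration, not the package's implementation: the data are invented, and it assumes an orphan must share a cluster in the same position (first or second) as an expanded pairing, which the text does not confirm.

```r
# Toy data: one row per contig, with its cell, cluster, and within-cell rank.
contigs <- data.frame(
  cell_idx    = c(1, 1, 2, 2, 3, 3, 4, 4),
  cluster_idx = c("A", "B", "A", "B", "A", "C", "D", "E"),
  grp_rank    = c(1, 2, 1, 2, 1, 2, 1, 2)
)

# One row per cell holding the first- and second-ranked clusters
# (the table_order = 2 case).
first  <- contigs[contigs$grp_rank == 1, c("cell_idx", "cluster_idx")]
second <- contigs[contigs$grp_rank == 2, c("cell_idx", "cluster_idx")]
cells  <- merge(first, second, by = "cell_idx", suffixes = c(".1", ".2"))

# Count how many cells carry each pairing.
cells$pair  <- paste(cells$cluster_idx.1, cells$cluster_idx.2, sep = "_")
pair_counts <- table(cells$pair)

# Keep pairings seen in at least min_expansion cells.
min_expansion  <- 2
expanded       <- names(pair_counts)[as.vector(pair_counts) >= min_expansion]
expanded_cells <- cells[cells$pair %in% expanded, ]

# orphan_level = 1 (assumed same-position matching): cells outside the
# expanded set that still share one cluster with an expanded pairing.
orphans <- cells[
  !(cells$pair %in% expanded) &
    (cells$cluster_idx.1 %in% expanded_cells$cluster_idx.1 |
       cells$cluster_idx.2 %in% expanded_cells$cluster_idx.2),
]
```

Here cells 1 and 2 carry the expanded pairing A/B, cell 3 (A/C) is picked up as an orphan because it shares cluster A, and cell 4 (D/E) is dropped.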
library(dplyr)
tbl = tibble(clust_idx = gl(3, 2), cell_idx = rep(1:3, times = 2), contig_idx = 1:6)
ccdb = ContigCellDB(tbl, contig_pk = c('cell_idx', 'contig_idx'),
  cell_pk = 'cell_idx', cluster_pk = 'clust_idx')
# add `grp_rank` to ccdb$contig_tbl indicating how frequent a cluster is
ccdb = rank_prevalence_ccdb(ccdb, tie_break_keys = character())
# using `grp_rank` to determine pairing
# no pairs found twice
pt1 = pairing_tables(ccdb)
#> Warning: No pairs found
# all pairs found, found once.
pt2 = pairing_tables(ccdb, min_expansion = 1)
pt2$cell_tbl
#> # A tibble: 3 × 6
#>   cell_idx cluster_idx.1 cluster_idx.2 max_pairs cluster_idx.1_fct
#>      <int> <fct>         <fct>             <int> <fct>
#> 1        1 1             2                     1 1
#> 2        2 1             3                     1 1
#> 3        3 2             3                     1 2
#> # … with 1 more variable: cluster_idx.2_fct <fct>
tbl2 = bind_rows(tbl, tbl %>% mutate(cell_idx = rep(4:6, times = 2)))
ccdb2 = ContigCellDB(tbl2, contig_pk = c('cell_idx', 'contig_idx'),
  cell_pk = 'cell_idx', cluster_pk = 'clust_idx') %>%
  rank_prevalence_ccdb(tie_break_keys = character())
# all pairs found twice
pt3 = pairing_tables(ccdb2, min_expansion = 1)
pt3$cell_tbl
#> # A tibble: 6 × 6
#>   cell_idx cluster_idx.1 cluster_idx.2 max_pairs cluster_idx.1_fct
#>      <int> <fct>         <fct>             <int> <fct>
#> 1        1 1             2                     2 1
#> 2        2 1             3                     2 1
#> 3        3 2             3                     2 2
#> 4        4 1             2                     2 1
#> 5        5 1             3                     2 1
#> 6        6 2             3                     2 2
#> # … with 1 more variable: cluster_idx.2_fct <fct>
ccdb2$contig_tbl = ccdb2$contig_tbl %>%
  mutate(umis = 1, reads = 1, chain = rep(c('TRA', 'TRB'), times = 6))
ccdb2 = rank_chain_ccdb(ccdb2, tie_break_keys = character())
pt4 = pairing_tables(ccdb2, min_expansion = 1, table_order = 2)