A contingency table of every combination of cluster_idx up to table_order is generated. Combinations found in at least min_expansion cells are reported. All cells that have these combinations are returned, along with cells that match only orphan_level of the cluster_idx in a reported combination.

pairing_tables(
  ccdb,
  ranking_key = "grp_rank",
  table_order = 2,
  min_expansion = 2,
  orphan_level = 1,
  cluster_keys = character(),
  cluster_whitelist = NULL,
  cluster_blacklist = NULL
)

Arguments

ccdb

ContigCellDB

ranking_key

field in ccdb$contig_tbl giving the ranking of each contig per cell. Probably generated by a call to rank_prevalence_ccdb() or rank_chain_ccdb().

table_order

Integer larger than 1. The order of cluster_idx combinations to pair; e.g., table_order = 2 means that the first and second highest ranked contigs in each cell will be sought and paired.

min_expansion

the minimal number of times a pairing needs to occur for it to be reported

orphan_level

Integer in the interval [1, table_order]. Given that at least min_expansion cells are found that have table_order chains identical, the number of cluster_idx that must match for other cells to be selected. Example: orphan_level=1 means that cells that share just a single chain with an expanded pair will be reported.

cluster_keys

optional character naming additional columns in ccdb$cluster_tbl to be reported in the pairing

cluster_whitelist

a table of pairings or clusters that should always be reported. Here the clusters must be named "cluster_idx.1", "cluster_idx.2" (if order-2 pairs are being selected) rather than with `ccdb$cluster_pk`.

cluster_blacklist

a table of pairings or clusters that will never be reported. Must be named as per cluster_whitelist.

Value

list of tables. The cell_tbl is keyed by the cell identifiers, with fields "cluster_idx.1", "cluster_idx.2", etc., identifying the contigs present in each cell. "cluster_idx.1_fct" and "cluster_idx.2_fct" cast these fields to factors, reordered to maximize the number of pairs along the diagonal. The idx1_tbl and idx2_tbl report information about the cluster_idx (passed in by feature_tbl). The cluster_pair_tbl reports all contig pairings found and the number of times each was observed.

Details

For example, if table_order=2 and min_expansion=2 then heavy/light or alpha/beta pairs found two or more times will be returned (as well as alpha-alpha pairs, etc, if those are present). If orphan_level=1 then all cells that share just a single chain with an expanded clone will be returned.
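The interaction of min_expansion and orphan_level can be sketched in base R. This is only a loose illustration of the idea on made-up data, not the package's implementation: the data frame `cells`, its values, and the position-wise matching rule used for orphans are all assumptions chosen for the example.

```r
# Hypothetical data: the two top-ranked clusters in each of five cells
cells <- data.frame(
  cell_idx      = 1:5,
  cluster_idx.1 = c("A", "A", "B", "A", "C"),
  cluster_idx.2 = c("x", "x", "y", "z", "y"),
  stringsAsFactors = FALSE
)

# Contingency table of every order-2 combination, in long form
pair_counts <- as.data.frame(
  table(cluster_idx.1 = cells$cluster_idx.1,
        cluster_idx.2 = cells$cluster_idx.2),
  responseName = "n_cells", stringsAsFactors = FALSE
)

# Keep combinations seen in at least min_expansion cells
min_expansion <- 2
expanded <- pair_counts[pair_counts$n_cells >= min_expansion, ]

# Assumed orphan rule (orphan_level = 1): also keep cells that share
# at least one cluster, in the same position, with an expanded pair
keep <- cells$cluster_idx.1 %in% expanded$cluster_idx.1 |
  cells$cluster_idx.2 %in% expanded$cluster_idx.2
selected <- cells[keep, ]
print(selected)
```

In this toy data only the A/x pair is expanded, so cells 1 and 2 are selected as full matches and cell 4 is selected as an orphan because it shares cluster A.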

The cluster_idx.1_fct and cluster_idx.2_fct fields in cell_tbl, idx1_tbl, idx2_tbl are cast to factors and ordered such that pairings will tend to occur along the diagonal when they are cross-tabulated. This facilitates plotting.
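The effect of reordering factor levels on a cross-tabulation can be seen in a small base R sketch. The level-ordering heuristic below (first appearance after sorting on the first index) is an assumption made for illustration and is not claimed to be the ordering the package uses; the `pairs` data are made up.

```r
# Hypothetical pairings: five cells, three distinct pairs
pairs <- data.frame(
  idx1 = c("1", "1", "2", "3", "3"),
  idx2 = c("b", "b", "c", "a", "a"),
  stringsAsFactors = FALSE
)

# With default alphabetical levels the pairs fall off the diagonal
default_tab <- table(pairs$idx1, pairs$idx2)

# Assumed heuristic: order idx2 levels by first appearance
# after sorting the rows on idx1
sorted <- pairs[order(pairs$idx1), ]
idx2_levels <- unique(sorted$idx2)

diag_tab <- table(
  factor(pairs$idx1, levels = sort(unique(pairs$idx1))),
  factor(pairs$idx2, levels = idx2_levels)
)

# Number of cells on the diagonal before and after reordering
on_diag_default <- sum(diag(default_tab))
on_diag_reordered <- sum(diag(diag_tab))
print(diag_tab)
```

With the reordered levels, every cell in this toy example lands on the diagonal, which is the layout that makes a heatmap of the pairings easy to read.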

Examples

library(CellaRepertorium)
library(dplyr)
tbl = tibble(clust_idx = gl(3, 2), cell_idx = rep(1:3, times = 2), contig_idx = 1:6)
ccdb = ContigCellDB(tbl, contig_pk = c('cell_idx', 'contig_idx'),
    cell_pk = 'cell_idx', cluster_pk = 'clust_idx')
# add `grp_rank` to ccdb$contig_tbl indicating how frequent a cluster is
ccdb = rank_prevalence_ccdb(ccdb, tie_break_keys = character())
# using `grp_rank` to determine pairing
# no pairs found twice
pt1 = pairing_tables(ccdb)
#> Warning: No pairs found
# all pairs found, each found once
pt2 = pairing_tables(ccdb, min_expansion = 1)
pt2$cell_tbl
#> # A tibble: 3 × 6
#>   cell_idx cluster_idx.1 cluster_idx.2 max_pairs cluster_idx.1_fct
#>      <int> <fct>         <fct>             <int> <fct>            
#> 1        1 1             2                     1 1                
#> 2        2 1             3                     1 1                
#> 3        3 2             3                     1 2                
#> # … with 1 more variable: cluster_idx.2_fct <fct>
# duplicate the contigs into cells 4 to 6 so that each pairing occurs twice
tbl2 = bind_rows(tbl, tbl %>% mutate(cell_idx = rep(4:6, times = 2)))
ccdb2 = ContigCellDB(tbl2, contig_pk = c('cell_idx', 'contig_idx'),
    cell_pk = 'cell_idx', cluster_pk = 'clust_idx') %>%
    rank_prevalence_ccdb(tie_break_keys = character())
# all pairs found twice
pt3 = pairing_tables(ccdb2, min_expansion = 1)
pt3$cell_tbl
#> # A tibble: 6 × 6
#>   cell_idx cluster_idx.1 cluster_idx.2 max_pairs cluster_idx.1_fct
#>      <int> <fct>         <fct>             <int> <fct>            
#> 1        1 1             2                     2 1                
#> 2        2 1             3                     2 1                
#> 3        3 2             3                     2 2                
#> 4        4 1             2                     2 1                
#> 5        5 1             3                     2 1                
#> 6        6 2             3                     2 2                
#> # … with 1 more variable: cluster_idx.2_fct <fct>
# add umis, reads and chain fields, then re-rank with rank_chain_ccdb()
ccdb2$contig_tbl = ccdb2$contig_tbl %>%
    mutate(umis = 1, reads = 1, chain = rep(c('TRA', 'TRB'), times = 6))
ccdb2 = rank_chain_ccdb(ccdb2, tie_break_keys = character())
pt4 = pairing_tables(ccdb2, min_expansion = 1, table_order = 2)