Using filtering in contig_filter_args and sorting in tie_break_keys and order, find a single, canonical contig to represent each cell. Fields in contig_fields will be copied over to the cell_tbl.

canonicalize_cell(
  ccdb,
  contig_filter_args = TRUE,
  tie_break_keys = c("umis", "reads"),
  contig_fields = tie_break_keys,
  order = 1,
  overwrite = TRUE
)

Arguments

ccdb

ContigCellDB()

contig_filter_args

An expression passed to dplyr::filter(). Unlike filter(), multiple criteria must be combined with & rather than separated by commas. The criteria act on ccdb$contig_tbl.

tie_break_keys

(optional) character vector naming fields in contig_tbl that are used to sort the contig table in descending order. Used to break ties if contig_filter_args does not return a unique contig for each cell.

contig_fields

Optional fields from contig_tbl that will be copied into the cell_tbl from the canonical contig.

order

The rank of the contig to return, based on tie_break_keys. If tie_break_keys includes an ordered factor (such as chain), this could be used to return the second chain.

overwrite

logical: should non-key fields in y be overwritten using x, or should a suffix (".y") be added?
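
The interplay of contig_filter_args, tie_break_keys, order, and contig_fields can be illustrated on a toy table with plain dplyr. This is only a sketch of the behavior described above, not the package's implementation: the toy data, the helper pick_canonical(), and its argument rank_to_keep are hypothetical names invented for illustration, and the filter is hard-coded rather than passed as an expression.

```r
library(dplyr)

# Hypothetical toy data: cell "A" has two TRB contigs tied on umis,
# cell "B" has one TRB contig, cell "C" has no contigs at all.
contig_tbl <- tibble(
  barcode = c("A", "A", "A", "B", "B"),
  chain   = c("TRB", "TRB", "TRA", "TRB", "TRA"),
  umis    = c(10, 10, 4, 3, 8),
  reads   = c(500, 900, 120, 60, 300)
)
cell_tbl <- tibble(barcode = c("A", "B", "C"))

# Hypothetical helper mimicking the documented steps.
pick_canonical <- function(contigs, cells, rank_to_keep = 1) {
  chosen <- contigs %>%
    filter(chain == "TRB") %>%                  # role of contig_filter_args
    group_by(barcode) %>%
    arrange(desc(umis), desc(reads),            # role of tie_break_keys
            .by_group = TRUE) %>%
    slice(rank_to_keep) %>%                     # role of order
    ungroup() %>%
    select(barcode, umis, reads)                # role of contig_fields
  # Every cell is kept; cells without a qualifying contig get NA.
  left_join(cells, chosen, by = "barcode")
}

canon  <- pick_canonical(contig_tbl, cell_tbl)
second <- pick_canonical(contig_tbl, cell_tbl, rank_to_keep = 2)
```

In this sketch, cell A is represented by the TRB contig with 900 reads, because the tie on umis is broken by reads. Cell C keeps its row but has missing values, and with rank_to_keep = 2 only cell A has a second-ranked TRB contig to report.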

Value

ContigCellDB() with the same number of clusters/contigs/cells, but with "canonical" values copied into cell_tbl

Examples

# Report beta chain with highest UMI count, breaking ties with reads
data(ccdb_ex)
beta = canonicalize_cell(ccdb_ex, chain == 'TRB',
  tie_break_keys = c('umis', 'reads'),
  contig_fields = c('umis', 'reads', 'chain', 'v_gene', 'd_gene', 'j_gene'))
head(beta$cell_tbl)
#> # A tibble: 6 × 9
#>    umis  reads chain v_gene   d_gene j_gene  pop   sample barcode           
#>   <dbl>  <dbl> <chr> <chr>    <chr>  <chr>   <chr> <chr>  <chr>             
#> 1   215 125913 TRB   TRBV13-1 None   TRBJ2-2 balbc 1      GGACATTCATTCTTAC-1
#> 2    36  56182 TRB   TRBV3    TRBD1  TRBJ1-1 b6    5      CACCACTCACCAGTTA-1
#> 3    35  60073 TRB   TRBV16   TRBD2  TRBJ2-7 balbc 1      GGAATAATCGATGAGG-1
#> 4    34 144332 TRB   TRBV13-3 TRBD1  TRBJ2-7 b6    6      TCTTCGGCACCACGTG-1
#> 5    33  98113 TRB   TRBV17   TRBD1  TRBJ2-3 b6    6      CTAACTTCAGGTCCAC-1
#> 6    30  67937 TRB   TRBV19   None   TRBJ2-7 b6    6      AGCGGTCTCACAGGCC-1

# Stable: only adds fields to `cell_tbl`
stopifnot(dplyr::all_equal(beta$cell_tbl[ccdb_ex$cell_pk],
  ccdb_ex$cell_tbl[ccdb_ex$cell_pk], ignore_row_order = TRUE))

# Report CDR3 with highest UMI count, but only when > 5 UMIs support it
umi5 = canonicalize_cell(ccdb_ex, umis > 5,
  tie_break_keys = c('umis', 'reads'), contig_fields = c('umis', 'cdr3'))
stopifnot(all(umi5$cell_tbl$umis > 5, na.rm = TRUE))
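
As the description of contig_filter_args notes, several criteria are joined with & rather than separated by commas. The following sketch combines the two filters used in the examples above. It is not part of the original examples and assumes the same ccdb_ex data set, with the package that provides canonicalize_cell() already attached.

```r
# Report beta chain with highest UMI count, but only when > 5 UMIs support it.
# The two criteria are combined with `&`, not separated by a comma.
data(ccdb_ex)
trb5 = canonicalize_cell(ccdb_ex, chain == 'TRB' & umis > 5,
  tie_break_keys = c('umis', 'reads'), contig_fields = c('umis', 'chain'))
# Cells without a qualifying contig are expected to carry NA, hence na.rm = TRUE
stopifnot(all(trb5$cell_tbl$chain == 'TRB', na.rm = TRUE))
```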