Using the filtering in contig_filter_args and the sorting in tie_break_keys and order, find a single, canonical contig to represent each cell. Fields in contig_fields will be copied over to the cell_tbl.
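The filter, sort, and pick-one-per-cell steps can be sketched in base R. This is an illustration under stated assumptions, not CellaRepertorium's implementation: canonicalize_sketch() and its arguments (keep, cell_pk) are hypothetical, and the toy contig and cell tables are invented for the example.

```r
# Base-R sketch of "filter, sort descending, keep one contig per cell".
# Hypothetical: canonicalize_sketch() is NOT CellaRepertorium's implementation.
canonicalize_sketch = function(contig_tbl, cell_tbl, cell_pk, keep,
                               tie_break_keys, contig_fields, order = 1) {
  # 1. Filter: `keep` is one logical per contig; NA counts as FALSE
  ct = contig_tbl[keep & !is.na(keep), , drop = FALSE]
  # 2. Sort all contigs in descending order of the tie-break keys
  #    (base::order is spelled out because the numeric argument `order`
  #    shadows the function name when passed as a value)
  o = do.call(base::order, c(unname(as.list(ct[tie_break_keys])), decreasing = TRUE))
  ct = ct[o, , drop = FALSE]
  # 3. Rank contigs within each cell (1 = best) and keep rank `order`
  cell_id = do.call(paste, c(unname(as.list(ct[cell_pk])), sep = "|"))
  rank_in_cell = ave(seq_len(nrow(ct)), cell_id, FUN = seq_along)
  canon = ct[rank_in_cell == order, unique(c(cell_pk, contig_fields)), drop = FALSE]
  # 4. Copy the canonical fields onto every cell; cells without one get NA
  merge(cell_tbl, canon, by = cell_pk, all.x = TRUE, sort = FALSE)
}

# Toy data (invented for illustration)
contigs = data.frame(
  barcode = c("A", "A", "A", "B", "B", "C"),
  chain   = c("TRB", "TRB", "TRA", "TRB", "TRB", "TRA"),
  umis    = c(5, 5, 9, 2, 7, 3),
  reads   = c(100, 300, 50, 40, 10, 20)
)
cells = data.frame(barcode = c("A", "B", "C"))

# Best TRB contig per cell: A -> umis 5 / reads 300 (tie broken by reads),
# B -> umis 7, C -> NA (no TRB contig)
beta = canonicalize_sketch(contigs, cells, cell_pk = "barcode",
                           keep = contigs$chain == "TRB",
                           tie_break_keys = c("umis", "reads"),
                           contig_fields = c("umis", "reads"))

# Second-ranked TRB contig per cell: A -> reads 100, B -> umis 2
second = canonicalize_sketch(contigs, cells, cell_pk = "barcode",
                             keep = contigs$chain == "TRB",
                             tie_break_keys = c("umis", "reads"),
                             contig_fields = c("umis", "reads"), order = 2)
```

Every cell is kept in the result, which mirrors the package example where cells with no qualifying contig carry NA values.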
canonicalize_cell(
ccdb,
contig_filter_args = TRUE,
tie_break_keys = c("umis", "reads"),
contig_fields = tie_break_keys,
order = 1,
overwrite = TRUE
)
contig_filter_args: an expression passed to dplyr::filter(). Unlike filter(), multiple criteria must be combined with & rather than separated by commas. These act on ccdb$contig_tbl.
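Why the criteria must be joined with &: the filter is captured as one expression, so there is only one slot for it. A base-R sketch of that idea, using a hypothetical helper filter_contigs() built on substitute() and eval() (the same evaluation style base subset() uses); it is not the package's code.

```r
# Hypothetical helper (not part of CellaRepertorium): evaluate ONE filter
# expression against a table, as subset() does with its `subset` argument.
filter_contigs = function(tbl, expr = TRUE) {
  e = substitute(expr)                        # capture the unevaluated expression
  keep = eval(e, tbl, parent.frame())         # columns of `tbl` act as variables
  keep = rep_len(as.logical(keep), nrow(tbl)) # a bare TRUE is recycled to every row
  tbl[keep & !is.na(keep), , drop = FALSE]    # NA is treated as "drop"
}

contigs = data.frame(chain = c("TRA", "TRB", "TRB"), umis = c(9, 3, 12))

# Two criteria joined with `&` form a single expression
both = filter_contigs(contigs, umis > 5 & chain == "TRB")
both                           # only the TRB contig with 12 UMIs
nrow(filter_contigs(contigs))  # 3: the default TRUE keeps every contig
# filter_contigs(contigs, umis > 5, chain == "TRB") would fail with
# "unused argument", because there is only one slot for the expression
```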
tie_break_keys: (optional) character vector naming fields in contig_tbl that are used to sort the contig table in descending order. Used to break ties if contig_filter_args does not return a unique contig for each cell.
contig_fields: optional fields from contig_tbl that will be copied into the cell_tbl from the canonical contig.
order: the rank order of the contig to return, based on tie_break_keys. If tie_break_keys included an ordered factor (such as chain), this could be used to return the second chain.
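How order interacts with an ordered factor can be sketched in base R: sorting descending puts the highest factor level first, so rank 1 is the "top" chain and rank 2 the next one within each cell. The toy table and the pick() helper below are hypothetical and only illustrate the ranking idea, not the package's internals.

```r
# One cell (A) with two chains, one cell (B) with one; `chain` is an
# ordered factor with TRA < TRB (toy data, invented for illustration)
contigs = data.frame(
  barcode = c("A", "A", "B"),
  chain   = factor(c("TRA", "TRB", "TRB"), levels = c("TRA", "TRB"), ordered = TRUE),
  umis    = c(4, 10, 6)
)

# Descending sort on the tie-break keys: chain first, then umis
sorted = contigs[order(contigs$chain, contigs$umis, decreasing = TRUE), ]

# Rank contigs within each cell: 1 = first after sorting
sorted$rank = ave(seq_len(nrow(sorted)), sorted$barcode, FUN = seq_along)

# Hypothetical helper: the contig of rank k in every cell that has one
pick = function(k) sorted[sorted$rank == k, c("barcode", "chain")]

pick(1)  # cell A -> TRB, cell B -> TRB
pick(2)  # cell A -> TRA; cell B has no second contig
```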
overwrite: logical. Should non-key fields in y be overwritten using x, or should a suffix (".y") be added?
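The two overwrite behaviours can be sketched with base merge(). The helper join_overwrite() is hypothetical and follows only the wording above (x supplies new values, y is the table being joined onto); which table plays x or y inside canonicalize_cell() is an assumption not confirmed here.

```r
# Hypothetical helper mimicking the described `overwrite` choice:
# overwrite = TRUE  -> y's clashing non-key columns are replaced by x's
# overwrite = FALSE -> both are kept and y's copy gets a ".y" suffix
join_overwrite = function(x, y, by, overwrite = TRUE) {
  clash = setdiff(intersect(names(x), names(y)), by)
  if (overwrite) {
    y = y[setdiff(names(y), clash)]               # drop y's clashing columns
    merge(x, y, by = by)
  } else {
    merge(x, y, by = by, suffixes = c("", ".y"))  # keep x's name, suffix y's
  }
}

canon = data.frame(barcode = c("A", "B"), umis = c(10, 6))          # x: new values
cells = data.frame(barcode = c("A", "B"), umis = c(1, 2),
                   sample = c("s1", "s2"))                          # y: existing table

a = join_overwrite(canon, cells, by = "barcode", overwrite = TRUE)   # barcode, umis, sample
b = join_overwrite(canon, cells, by = "barcode", overwrite = FALSE)  # barcode, umis, umis.y, sample
```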
Returns a ContigCellDB() with some number of clusters/contigs/cells, but with "canonical" values copied into cell_tbl.
# Report the beta chain with the highest UMI count, breaking ties with reads
data(ccdb_ex)
beta = canonicalize_cell(ccdb_ex, chain == 'TRB',
tie_break_keys = c('umis', 'reads'),
contig_fields = c('umis', 'reads', 'chain', 'v_gene', 'd_gene', 'j_gene'))
head(beta$cell_tbl)
#> # A tibble: 6 × 9
#> umis reads chain v_gene d_gene j_gene pop sample barcode
#> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 215 125913 TRB TRBV13-1 None TRBJ2-2 balbc 1 GGACATTCATTCTTAC-1
#> 2 36 56182 TRB TRBV3 TRBD1 TRBJ1-1 b6 5 CACCACTCACCAGTTA-1
#> 3 35 60073 TRB TRBV16 TRBD2 TRBJ2-7 balbc 1 GGAATAATCGATGAGG-1
#> 4 34 144332 TRB TRBV13-3 TRBD1 TRBJ2-7 b6 6 TCTTCGGCACCACGTG-1
#> 5 33 98113 TRB TRBV17 TRBD1 TRBJ2-3 b6 6 CTAACTTCAGGTCCAC-1
#> 6 30 67937 TRB TRBV19 None TRBJ2-7 b6 6 AGCGGTCTCACAGGCC-1
# Stable: the cell keys are unchanged; canonicalize_cell() only adds fields to `cell_tbl`
stopifnot(dplyr::all_equal(beta$cell_tbl[ccdb_ex$cell_pk],
ccdb_ex$cell_tbl[ccdb_ex$cell_pk], ignore_row_order = TRUE))
# Report the cdr3 with the highest UMI count, but only when more than 5 UMIs support it
umi5 = canonicalize_cell(ccdb_ex, umis > 5,
tie_break_keys = c('umis', 'reads'), contig_fields = c('umis', 'cdr3'))
# Cells with no contig passing the filter get NA, hence na.rm = TRUE
stopifnot(all(umi5$cell_tbl$umis > 5, na.rm = TRUE))