Rank contigs, per cell, by experiment-wide prevalence of cluster_pk, which is added as the prevalence field

rank_prevalence_ccdb(
  ccdb,
  contig_filter_args = TRUE,
  tie_break_keys = c("umis", "reads")
)

rank_chain_ccdb(
  ccdb,
  contig_filter_args = TRUE,
  tie_break_keys = c("umis", "reads"),
  chain_key = "chain",
  contig_fields = tie_break_keys,
  chain_levels = c("IGL", "IGK", "TRA", "TRB", "IGH")
)

Arguments

ccdb: ContigCellDB()
contig_filter_args: an expression passed to dplyr::filter(). Unlike filter, multiple criteria must be combined with & rather than separated by commas. The expression is evaluated on ccdb$contig_tbl
tie_break_keys: (optional) character naming fields in contig_tbl that are used to sort the contig table in descending order. Used to break ties if contig_filter_args does not return a unique contig for each cluster
chain_key: character naming the field in contig_tbl to be sorted on.
contig_fields: Optional fields from contig_tbl that will be copied into the cluster_tbl from the canonical contig.
chain_levels: an optional character vector providing the sort order of the chain column in tbl. If set to length zero, then the ordering will be alphabetical
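
Because contig_filter_args is a single expression, several conditions have to be joined with & instead of being supplied as separate comma-delimited arguments. The sketch below uses plain dplyr on a made-up table (the column names is_cell, high_confidence and umis are borrowed from the printed output further down; the values are invented) to show that the two spellings select the same rows in an ordinary dplyr::filter() call.

```r
library(dplyr)

# Toy stand-in for a contig table; values are illustrative only.
toy_contigs <- tibble(
  barcode         = c("c1", "c1", "c2", "c3"),
  is_cell         = c(TRUE, TRUE, FALSE, TRUE),
  high_confidence = c(TRUE, FALSE, TRUE, TRUE),
  umis            = c(5, 2, 7, 3)
)

# Usual dplyr style: criteria separated by commas.
with_commas <- filter(toy_contigs, is_cell, high_confidence)

# Single-expression style: criteria combined with &.
with_amp <- filter(toy_contigs, is_cell & high_confidence)

identical(with_commas, with_amp)
```

Under that reading, a call such as rank_prevalence_ccdb(ccdb_ex, contig_filter_args = is_cell & high_confidence) would be the single-expression form; treat that call as an untested illustration.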

Value

ContigCellDB with modified contig_tbl
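
To make the prevalence, n_grp and grp_rank columns in the example output easier to interpret, here is a rough dplyr sketch of a per-cell prevalence ranking on a toy table. It is not the package's implementation. It assumes that prevalence is the number of contigs in a cluster across the whole table, that n_grp is the number of contigs in the cell, and that grp_rank is the within-cell position after sorting; the toy data and the cluster_idx/barcode keys are stand-ins.

```r
library(dplyr)

# Toy contig table; values are invented for illustration.
toy_contigs <- tibble(
  barcode     = c("c1", "c1", "c2", "c2", "c3", "c4", "c4"),
  cluster_idx = c(1L, 2L, 1L, 3L, 1L, 2L, 3L),
  umis        = c(5, 9, 4, 4, 7, 3, 8),
  reads       = c(50, 90, 40, 60, 70, 30, 80)
)

ranked <- toy_contigs %>%
  # Assumed definition: prevalence = number of contigs per cluster, table-wide.
  add_count(cluster_idx, name = "prevalence") %>%
  group_by(barcode) %>%
  # Most prevalent cluster first; ties broken by umis, then reads (descending).
  arrange(desc(prevalence), desc(umis), desc(reads), .by_group = TRUE) %>%
  # Assumed meaning of the bookkeeping columns.
  mutate(n_grp = n(), grp_rank = row_number()) %>%
  ungroup()

# Top-ranked contig in each cell.
top_contig <- filter(ranked, grp_rank == 1)
```

In cell c4 both clusters have the same prevalence in this toy table, so the umis tie-break decides which contig gets grp_rank 1.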

Functions

rank_chain_ccdb: return a canonical contig by chain type, with TRB/IGH returned first. By default, ties are broken by umis and reads.
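
The chain ordering can be pictured as sorting on an ordered factor built from chain_levels, in descending order, so that the last levels (IGH, TRB) come first. The sketch below is a plain dplyr approximation on invented data, not the package's code; n_grp and grp_rank are given the assumed meanings of group size and within-cell position.

```r
library(dplyr)

# Default levels from the usage section; later levels sort first when descending.
chain_levels <- c("IGL", "IGK", "TRA", "TRB", "IGH")

# Toy contig table; values are invented for illustration.
toy_contigs <- tibble(
  barcode = c("c1", "c1", "c2", "c2", "c2"),
  chain   = c("TRA", "TRB", "TRB", "TRB", "TRA"),
  umis    = c(6, 2, 3, 8, 10),
  reads   = c(60, 20, 30, 80, 100)
)

ranked_chain <- toy_contigs %>%
  # Ordered factor, matching the <ord> type of chain in the printed output.
  mutate(chain = factor(chain, levels = chain_levels, ordered = TRUE)) %>%
  group_by(barcode) %>%
  # Chain first (TRB before TRA), then umis and reads as tie-breakers.
  arrange(desc(chain), desc(umis), desc(reads), .by_group = TRUE) %>%
  mutate(n_grp = n(), grp_rank = row_number()) %>%
  ungroup()
```

Note that the TRA contig in cell c2 has the most umis but still ranks last, because chain type takes precedence over the tie-break keys.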

Examples

data(ccdb_ex)
ccdb_ex = cluster_germline(ccdb_ex)
rank_prev = rank_prevalence_ccdb(ccdb_ex)
rank_prev$contig_tbl
#> # A tibble: 1,508 × 26
#>    v_gene   j_gene  chain cluster_idx anno_file     pop   sample barcode is_cell
#>    <chr>    <chr>   <chr>       <int> <chr>         <chr> <chr>  <chr>   <lgl>  
#>  1 TRBV13-2 TRBJ2-7 TRB           557 /Users/amcda… b6    6      GAAGCA… TRUE   
#>  2 TRBV13-2 TRBJ2-7 TRB           557 /Users/amcda… balbc 1      TCAATC… TRUE   
#>  3 TRBV13-2 TRBJ2-7 TRB           557 /Users/amcda… b6    4      GTACGT… TRUE   
#>  4 TRBV13-2 TRBJ2-7 TRB           557 /Users/amcda… b6    5      GGCTGG… TRUE   
#>  5 TRBV13-2 TRBJ2-7 TRB           557 /Users/amcda… balbc 1      CCGGTA… TRUE   
#>  6 TRBV13-2 TRBJ2-7 TRB           557 /Users/amcda… balbc 3      CAACTA… TRUE   
#>  7 TRBV13-2 TRBJ2-7 TRB           557 /Users/amcda… balbc 3      TCAGGT… TRUE   
#>  8 TRBV13-2 TRBJ2-7 TRB           557 /Users/amcda… b6    5      AAGGTT… TRUE   
#>  9 TRBV13-2 TRBJ2-7 TRB           557 /Users/amcda… b6    6      GGGATG… TRUE   
#> 10 TRBV13-2 TRBJ2-7 TRB           557 /Users/amcda… b6    4      CTTTGC… TRUE   
#> # … with 1,498 more rows, and 17 more variables: contig_id <chr>,
#> #   high_confidence <lgl>, length <dbl>, d_gene <chr>, c_gene <chr>,
#> #   full_length <lgl>, productive <chr>, cdr3 <chr>, cdr3_nt <chr>,
#> #   reads <dbl>, umis <dbl>, raw_clonotype_id <chr>, raw_consensus_id <chr>,
#> #   celltype <chr>, prevalence <int>, n_grp <int>, grp_rank <int>
rank_chain = rank_chain_ccdb(ccdb_ex)
rank_chain$contig_tbl
#> # A tibble: 1,508 × 25
#>    v_gene   j_gene  chain cluster_idx anno_file     pop   sample barcode is_cell
#>    <chr>    <chr>   <ord>       <int> <chr>         <chr> <chr>  <chr>   <lgl>  
#>  1 TRBV13-1 TRBJ2-2 TRB           541 /Users/amcda… balbc 1      GGACAT… TRUE   
#>  2 TRBV3    TRBJ1-1 TRB           662 /Users/amcda… b6    5      CACCAC… TRUE   
#>  3 TRBV16   TRBJ2-7 TRB           597 /Users/amcda… balbc 1      GGAATA… TRUE   
#>  4 TRBV13-3 TRBJ2-7 TRB           569 /Users/amcda… b6    6      TCTTCG… TRUE   
#>  5 TRBV17   TRBJ2-3 TRB           601 /Users/amcda… b6    6      CTAACT… TRUE   
#>  6 TRBV19   TRBJ2-7 TRB           615 /Users/amcda… b6    6      AGCGGT… TRUE   
#>  7 TRBV12-2 TRBJ1-5 TRB           530 /Users/amcda… b6    5      GGAGCA… TRUE   
#>  8 TRBV1    TRBJ1-1 TRB           511 /Users/amcda… b6    4      TGTATT… TRUE   
#>  9 TRBV17   TRBJ2-1 TRB           599 /Users/amcda… balbc 3      TGAGGG… TRUE   
#> 10 TRBV15   TRBJ1-4 TRB           582 /Users/amcda… balbc 2      ATCCGA… TRUE   
#> # … with 1,498 more rows, and 16 more variables: contig_id <chr>,
#> #   high_confidence <lgl>, length <dbl>, d_gene <chr>, c_gene <chr>,
#> #   full_length <lgl>, productive <chr>, cdr3 <chr>, cdr3_nt <chr>,
#> #   reads <dbl>, umis <dbl>, raw_clonotype_id <chr>, raw_consensus_id <chr>,
#> #   celltype <chr>, n_grp <int>, grp_rank <int>