Rank contigs, per cell, by experiment-wide prevalence of cluster_pk, which is added as the prevalence field

rank_prevalence_ccdb(
  ccdb,
  contig_filter_args = TRUE,
  tie_break_keys = c("umis", "reads")
)

rank_chain_ccdb(
  ccdb,
  contig_filter_args = TRUE,
  tie_break_keys = c("umis", "reads"),
  chain_key = "chain",
  contig_fields = tie_break_keys,
  chain_levels = c("IGL", "IGK", "TRA", "TRB", "IGH")
)

Arguments

ccdb

ContigCellDB()

contig_filter_args

an expression passed to dplyr::filter(). Unlike filter(), multiple criteria must be combined with & rather than separated by commas. The expression is evaluated on ccdb$contig_tbl

tie_break_keys

(optional) character vector naming fields in contig_tbl that are used to sort the contig table in descending order. Used to break ties if contig_filter_args does not return a unique contig for each cluster

chain_key

character naming the field in contig_tbl to be sorted on.

contig_fields

Optional fields from contig_tbl that will be copied into the cluster_tbl from the canonical contig.

chain_levels

an optional character vector providing the sort order of the chain column in contig_tbl. If set to length zero, then the ordering will be alphabetical
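
As an illustration of the contig_filter_args rule above, the sketch below uses a toy data frame (not ccdb_ex) to show that a comma-separated dplyr::filter() call and a single expression joined with & select the same rows. The column names is_cell and high_confidence match those shown in the example output; the values are made up for illustration.

```r
# Toy contig table; values are invented for illustration only
contig_tbl <- data.frame(
  chain = c("TRA", "TRB", "TRB"),
  is_cell = c(TRUE, TRUE, FALSE),
  high_confidence = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Usual dplyr style: criteria separated by commas
by_comma <- dplyr::filter(contig_tbl, is_cell, high_confidence)

# Single-expression style: criteria joined with &
by_and <- dplyr::filter(contig_tbl, is_cell & high_confidence)

# Both keep only the first row
identical(by_comma, by_and)

# The second form is the one to hand to the ranking functions, e.g.
# rank_prevalence_ccdb(ccdb, contig_filter_args = is_cell & high_confidence)
```

Because only one expression is accepted, any criterion that would normally be a separate comma-delimited argument has to be folded into the same logical expression.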

Value

ContigCellDB with modified contig_tbl
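
The example output below shows prevalence, n_grp and grp_rank columns in the modified contig_tbl. The following base R sketch shows one way such a per-cell ranking could be computed on a toy table. It is not the package's implementation: it assumes that prevalence counts the contigs sharing a cluster across the whole experiment, that n_grp is the number of contigs in the cell, and that grp_rank is the position within the cell after sorting by prevalence, then umis, then reads, all descending.

```r
# Toy contig table; barcodes, clusters and counts are invented
contigs <- data.frame(
  barcode     = c("c1", "c1", "c2", "c2", "c3"),
  cluster_idx = c(1, 2, 3, 4, 1),
  umis        = c(5, 9, 4, 4, 7),
  reads       = c(50, 90, 40, 60, 70),
  stringsAsFactors = FALSE
)

# Assumed definition: experiment-wide number of contigs in each cluster
contigs$prevalence <- as.integer(
  ave(contigs$cluster_idx, contigs$cluster_idx, FUN = length)
)

# Sort within cell by prevalence, then break ties with umis and reads
ord <- order(contigs$barcode, -contigs$prevalence, -contigs$umis, -contigs$reads)
ranked <- contigs[ord, ]

# Rank within each cell and record the cell's contig count
ranked$grp_rank <- ave(seq_len(nrow(ranked)), ranked$barcode, FUN = seq_along)
ranked$n_grp    <- ave(seq_len(nrow(ranked)), ranked$barcode, FUN = length)

ranked
```

In cell c1 the contig from cluster 1 ranks first because that cluster is seen twice in the experiment, even though the other contig has more umis. In cell c2 both clusters are equally prevalent and have equal umis, so reads decides the order.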

Functions

  • rank_chain_ccdb: return a canonical contig by chain type, with TRB/IGH returned first. By default, ties are broken by umis and reads.
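
The default chain_levels lists TRB and IGH last, yet these chains are returned first, and the example output shows chain as an ordered factor. This is consistent with a descending sort on an ordered factor, which the base R sketch below illustrates with toy values. Treat the descending sort as an assumption about the behaviour rather than a description of the package internals.

```r
# Toy chain values; not taken from ccdb_ex
chain <- c("TRA", "TRB", "IGH", "IGK", "TRB")
chain_levels <- c("IGL", "IGK", "TRA", "TRB", "IGH")

# Ordered factor: levels listed later compare as greater
chain_ord <- factor(chain, levels = chain_levels, ordered = TRUE)

# A descending sort puts the last-listed levels (IGH, then TRB) first
sorted_desc <- as.character(sort(chain_ord, decreasing = TRUE))
sorted_desc

# Without explicit levels, factor() uses the sorted unique values,
# which gives an alphabetical ordering
chain_alpha <- levels(factor(chain))
chain_alpha
```

Reordering chain_levels is therefore the way to change which chain type is preferred when a cell has contigs from several chains.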

Examples

data(ccdb_ex)
ccdb_ex = cluster_germline(ccdb_ex)
rank_prev = rank_prevalence_ccdb(ccdb_ex)
rank_prev$contig_tbl
#> # A tibble: 1,508 × 26
#>    v_gene   j_gene  chain cluster_idx anno_file     pop   sample barcode is_cell
#>    <chr>    <chr>   <chr>       <int> <chr>         <chr> <chr>  <chr>   <lgl>  
#>  1 TRBV13-2 TRBJ2-7 TRB           557 /Users/amcda… b6    6      GAAGCA… TRUE   
#>  2 TRBV13-2 TRBJ2-7 TRB           557 /Users/amcda… balbc 1      TCAATC… TRUE   
#>  3 TRBV13-2 TRBJ2-7 TRB           557 /Users/amcda… b6    4      GTACGT… TRUE   
#>  4 TRBV13-2 TRBJ2-7 TRB           557 /Users/amcda… b6    5      GGCTGG… TRUE   
#>  5 TRBV13-2 TRBJ2-7 TRB           557 /Users/amcda… balbc 1      CCGGTA… TRUE   
#>  6 TRBV13-2 TRBJ2-7 TRB           557 /Users/amcda… balbc 3      CAACTA… TRUE   
#>  7 TRBV13-2 TRBJ2-7 TRB           557 /Users/amcda… balbc 3      TCAGGT… TRUE   
#>  8 TRBV13-2 TRBJ2-7 TRB           557 /Users/amcda… b6    5      AAGGTT… TRUE   
#>  9 TRBV13-2 TRBJ2-7 TRB           557 /Users/amcda… b6    6      GGGATG… TRUE   
#> 10 TRBV13-2 TRBJ2-7 TRB           557 /Users/amcda… b6    4      CTTTGC… TRUE   
#> # … with 1,498 more rows, and 17 more variables: contig_id <chr>,
#> #   high_confidence <lgl>, length <dbl>, d_gene <chr>, c_gene <chr>,
#> #   full_length <lgl>, productive <chr>, cdr3 <chr>, cdr3_nt <chr>,
#> #   reads <dbl>, umis <dbl>, raw_clonotype_id <chr>, raw_consensus_id <chr>,
#> #   celltype <chr>, prevalence <int>, n_grp <int>, grp_rank <int>
rank_chain = rank_chain_ccdb(ccdb_ex)
rank_chain$contig_tbl
#> # A tibble: 1,508 × 25
#>    v_gene   j_gene  chain cluster_idx anno_file     pop   sample barcode is_cell
#>    <chr>    <chr>   <ord>       <int> <chr>         <chr> <chr>  <chr>   <lgl>  
#>  1 TRBV13-1 TRBJ2-2 TRB           541 /Users/amcda… balbc 1      GGACAT… TRUE   
#>  2 TRBV3    TRBJ1-1 TRB           662 /Users/amcda… b6    5      CACCAC… TRUE   
#>  3 TRBV16   TRBJ2-7 TRB           597 /Users/amcda… balbc 1      GGAATA… TRUE   
#>  4 TRBV13-3 TRBJ2-7 TRB           569 /Users/amcda… b6    6      TCTTCG… TRUE   
#>  5 TRBV17   TRBJ2-3 TRB           601 /Users/amcda… b6    6      CTAACT… TRUE   
#>  6 TRBV19   TRBJ2-7 TRB           615 /Users/amcda… b6    6      AGCGGT… TRUE   
#>  7 TRBV12-2 TRBJ1-5 TRB           530 /Users/amcda… b6    5      GGAGCA… TRUE   
#>  8 TRBV1    TRBJ1-1 TRB           511 /Users/amcda… b6    4      TGTATT… TRUE   
#>  9 TRBV17   TRBJ2-1 TRB           599 /Users/amcda… balbc 3      TGAGGG… TRUE   
#> 10 TRBV15   TRBJ1-4 TRB           582 /Users/amcda… balbc 2      ATCCGA… TRUE   
#> # … with 1,498 more rows, and 16 more variables: contig_id <chr>,
#> #   high_confidence <lgl>, length <dbl>, d_gene <chr>, c_gene <chr>,
#> #   full_length <lgl>, productive <chr>, cdr3 <chr>, cdr3_nt <chr>,
#> #   reads <dbl>, umis <dbl>, raw_clonotype_id <chr>, raw_consensus_id <chr>,
#> #   celltype <chr>, n_grp <int>, grp_rank <int>