Skip to contents

Reads RNA Polymerase DamID binding profiles either from bedGraph files or directly from a named list of GRanges objects. Calculates binding occupancy summarised over genes.

Usage

load_data_genes(
  binding_profiles_path = NULL,
  binding_profiles = NULL,
  quantile_norm = FALSE,
  organism = "drosophila melanogaster",
  ensdb_genes = NULL,
  BPPARAM = BiocParallel::bpparam()
)

Arguments

binding_profiles_path

Character vector of directories or file globs containing log2 ratio binding tracks in bedGraph format. Wildcards ('*') supported.

binding_profiles

Named list of GRanges objects representing binding profiles.

quantile_norm

Logical (default: FALSE) quantile-normalise across all signal columns if TRUE.

organism

Organism string (lower case) to obtain genome annotation from (if not providing a custom `ensdb_genes` object) Defautls to "drosophila melanogaster".

ensdb_genes

GRanges object: gene annotation. Automatically obtained from `organism` if NULL.

BPPARAM

BiocParallel function (defaults to BiocParallel::bpparam())

Value

List with elements:

binding_profiles_data

data.frame of merged binding profiles, with chr, start, end, sample columns.

occupancy

data.frame of occupancy values summarised over genes.

test_category

Character scalar; will be "expressed".

Details

One of `binding_profiles_path` or `binding_profiles` must be provided.

When supplying GRanges lists, each GRanges should contain exactly one numeric metadata column representing the signal, and `binding_profiles` must be a named list, with element names used as sample names.

Examples

# Create a mock GRanges object for gene annotations
# This object, based on the package's unit tests, avoids network access
# and includes a very long gene to ensure overlaps with sample data.
mock_genes_gr <- GenomicRanges::GRanges(
    seqnames = S4Vectors::Rle("2L", 7),
    ranges = IRanges::IRanges(
        start = c(1000, 2000, 3000, 5000, 6000, 7000, 8000),
        end = c(1500, 2500, 3500, 5500, 6500, 7500, 20000000)
    ),
    strand = S4Vectors::Rle(GenomicRanges::strand(c("+", "-", "+", "+", "-", "-", "+"))),
    gene_id = c("FBgn001", "FBgn002", "FBgn003", "FBgn004", "FBgn005", "FBgn006", "FBgn007"),
    gene_name = c("geneA", "geneB", "geneC", "geneD", "geneE", "geneF", "LargeTestGene")
)

# Get path to sample data files included with the package
data_dir <- system.file("extdata", package = "damidBind")

# Run loading function using sample files and mock gene annotations
# This calculates occupancy over genes instead of peaks.
loaded_data_genes <- load_data_genes(
    binding_profiles_path = data_dir,
    ensdb_genes = mock_genes_gr,
    quantile_norm = FALSE
)
#> Locating binding profile files
#> Building binding profile dataframe from input files ...
#>  - Loaded: Bsh_Dam_L4_r1-ext300-vs-Dam.kde-norm
#>  - Loaded: Bsh_Dam_L4_r2-ext300-vs-Dam.kde-norm
#>  - Loaded: Bsh_Dam_L5_r1-ext300-vs-Dam.kde-norm
#>  - Loaded: Bsh_Dam_L5_r2-ext300-vs-Dam.kde-norm
#> Calculating average occupancy per gene ...

# View the head of the occupancy table
head(loaded_data_genes$occupancy)
#>                      name nfrags Bsh_Dam_L4_r1-ext300-vs-Dam.kde-norm
#> 2L:1000-1500 2L:1000-1500      4                             0.000000
#> 2L:2000-2500 2L:2000-2500      4                             0.000000
#> 2L:3000-3500 2L:3000-3500      4                             0.000000
#> 2L:5000-5500 2L:5000-5500      4                             1.465469
#> 2L:6000-6500 2L:6000-6500      2                             1.515389
#> 2L:7000-7500 2L:7000-7500      1                            -0.130000
#>              Bsh_Dam_L4_r2-ext300-vs-Dam.kde-norm
#> 2L:1000-1500                             0.000000
#> 2L:2000-2500                             0.000000
#> 2L:3000-3500                             0.000000
#> 2L:5000-5500                             1.444371
#> 2L:6000-6500                             1.791916
#> 2L:7000-7500                            -0.110000
#>              Bsh_Dam_L5_r1-ext300-vs-Dam.kde-norm
#> 2L:1000-1500                            0.0000000
#> 2L:2000-2500                            0.0000000
#> 2L:3000-3500                            0.0000000
#> 2L:5000-5500                            1.2137325
#> 2L:6000-6500                            0.2373453
#> 2L:7000-7500                           -0.0300000
#>              Bsh_Dam_L5_r2-ext300-vs-Dam.kde-norm gene_names gene_ids
#> 2L:1000-1500                            0.0000000      geneA  FBgn001
#> 2L:2000-2500                            0.0000000      geneB  FBgn002
#> 2L:3000-3500                            0.0000000      geneC  FBgn003
#> 2L:5000-5500                            0.8943313      geneD  FBgn004
#> 2L:6000-6500                            0.6489222      geneE  FBgn005
#> 2L:7000-7500                           -0.0900000      geneF  FBgn006