Load genome-wide binding data for gene expression (RNA polymerase occupancy)
Source:R/load_data.R
load_data_genes.Rd
Reads RNA Polymerase DamID binding profiles either from bedGraph files or directly from a named list of GRanges objects. Calculates binding occupancy summarised over genes.
Usage
load_data_genes(
binding_profiles_path = NULL,
binding_profiles = NULL,
quantile_norm = FALSE,
organism = "drosophila melanogaster",
ensdb_genes = NULL,
BPPARAM = BiocParallel::bpparam()
)
Arguments
- binding_profiles_path
Character vector of directories or file globs containing log2 ratio binding tracks in bedGraph format. Wildcards ('*') supported.
- binding_profiles
Named list of GRanges objects representing binding profiles.
- quantile_norm
Logical (default: FALSE) quantile-normalise across all signal columns if TRUE.
- organism
Organism string (lower case) to obtain genome annotation from (if not providing a custom `ensdb_genes` object) Defautls to "drosophila melanogaster".
- ensdb_genes
GRanges object: gene annotation. Automatically obtained from `organism` if NULL.
- BPPARAM
BiocParallel function (defaults to BiocParallel::bpparam())
Value
List with elements:
- binding_profiles_data
data.frame of merged binding profiles, with chr, start, end, sample columns.
- occupancy
data.frame of occupancy values summarised over genes.
- test_category
Character scalar; will be "expressed".
Details
One of `binding_profiles_path` or `binding_profiles` must be provided.
When supplying GRanges lists, each GRanges should contain exactly one numeric metadata column representing the signal, and `binding_profiles` must be a named list, with element names used as sample names.
Examples
# Create a mock GRanges object for gene annotations
# This object, based on the package's unit tests, avoids network access
# and includes a very long gene to ensure overlaps with sample data.
mock_genes_gr <- GenomicRanges::GRanges(
seqnames = S4Vectors::Rle("2L", 7),
ranges = IRanges::IRanges(
start = c(1000, 2000, 3000, 5000, 6000, 7000, 8000),
end = c(1500, 2500, 3500, 5500, 6500, 7500, 20000000)
),
strand = S4Vectors::Rle(GenomicRanges::strand(c("+", "-", "+", "+", "-", "-", "+"))),
gene_id = c("FBgn001", "FBgn002", "FBgn003", "FBgn004", "FBgn005", "FBgn006", "FBgn007"),
gene_name = c("geneA", "geneB", "geneC", "geneD", "geneE", "geneF", "LargeTestGene")
)
# Get path to sample data files included with the package
data_dir <- system.file("extdata", package = "damidBind")
# Run loading function using sample files and mock gene annotations
# This calculates occupancy over genes instead of peaks.
loaded_data_genes <- load_data_genes(
binding_profiles_path = data_dir,
ensdb_genes = mock_genes_gr,
quantile_norm = FALSE
)
#> Locating binding profile files
#> Building binding profile dataframe from input files ...
#> - Loaded: Bsh_Dam_L4_r1-ext300-vs-Dam.kde-norm
#> - Loaded: Bsh_Dam_L4_r2-ext300-vs-Dam.kde-norm
#> - Loaded: Bsh_Dam_L5_r1-ext300-vs-Dam.kde-norm
#> - Loaded: Bsh_Dam_L5_r2-ext300-vs-Dam.kde-norm
#> Calculating average occupancy per gene ...
# View the head of the occupancy table
head(loaded_data_genes$occupancy)
#> name nfrags Bsh_Dam_L4_r1-ext300-vs-Dam.kde-norm
#> 2L:1000-1500 2L:1000-1500 4 0.000000
#> 2L:2000-2500 2L:2000-2500 4 0.000000
#> 2L:3000-3500 2L:3000-3500 4 0.000000
#> 2L:5000-5500 2L:5000-5500 4 1.465469
#> 2L:6000-6500 2L:6000-6500 2 1.515389
#> 2L:7000-7500 2L:7000-7500 1 -0.130000
#> Bsh_Dam_L4_r2-ext300-vs-Dam.kde-norm
#> 2L:1000-1500 0.000000
#> 2L:2000-2500 0.000000
#> 2L:3000-3500 0.000000
#> 2L:5000-5500 1.444371
#> 2L:6000-6500 1.791916
#> 2L:7000-7500 -0.110000
#> Bsh_Dam_L5_r1-ext300-vs-Dam.kde-norm
#> 2L:1000-1500 0.0000000
#> 2L:2000-2500 0.0000000
#> 2L:3000-3500 0.0000000
#> 2L:5000-5500 1.2137325
#> 2L:6000-6500 0.2373453
#> 2L:7000-7500 -0.0300000
#> Bsh_Dam_L5_r2-ext300-vs-Dam.kde-norm gene_names gene_ids
#> 2L:1000-1500 0.0000000 geneA FBgn001
#> 2L:2000-2500 0.0000000 geneB FBgn002
#> 2L:3000-3500 0.0000000 geneC FBgn003
#> 2L:5000-5500 0.8943313 geneD FBgn004
#> 2L:6000-6500 0.6489222 geneE FBgn005
#> 2L:7000-7500 -0.0900000 geneF FBgn006