Load genome-wide binding data and associated peak files or GRanges objects
Source:R/load_data.R
load_data_peaks.Rd
Reads DamID-seq log2 ratio binding data either from bedGraph files or directly from a list of GRanges objects, and associated peak regions either from GFF/bed files or from a list of GRanges objects. This function is suitable for transcription factor binding analyses. For peak discovery, use an external peak caller (e.g. 'find_peaks').
Usage
load_data_peaks(
binding_profiles_path = NULL,
peaks_path = NULL,
binding_profiles = NULL,
peaks = NULL,
quantile_norm = FALSE,
organism = "drosophila melanogaster",
ensdb_genes = NULL,
BPPARAM = BiocParallel::bpparam()
)
Arguments
- binding_profiles_path
Character vector. Path(s) to directories or file globs containing log2 ratio binding tracks in bedGraph format. Wildcards ('*') supported.
- peaks_path
Character vector. Path(s) to directories or file globs containing the peak calls in GFF or BED format.
- binding_profiles
List of GRanges objects with binding profiles, one per sample.
- peaks
List of GRanges objects representing peak regions.
- quantile_norm
Logical (default: FALSE). If TRUE, quantile-normalise the signal columns across all datasets.
- organism
Organism string (lower case) to obtain genome annotation from (if not providing a custom `ensdb_genes` object) Defautls to "drosophila melanogaster".
- ensdb_genes
GRanges object: gene annotation. Automatically obtained from `organism` if NULL.
- BPPARAM
BiocParallel function (defaults to BiocParallel::bpparam())
Value
A list with components:
- binding_profiles_data
data.frame: Signal matrix for all regions, with columns chr, start, end, sample columns.
- peaks
list(GRanges): All loaded peak regions from input files or directly supplied.
- pr
GRanges: Reduced (union) peak regions across samples.
- occupancy
data.frame: Binding values summarised over reduced peaks, with overlap annotations.
- test_category
Character scalar; will be "bound".
Details
One of `binding_profiles_path` or `binding_profiles` must be provided. Similarly, one of `peaks_path` or `peaks` must be provided.
When supplying GRanges lists, each GRanges should contain exactly one numeric metadata column representing the binding signal, and all GRanges should be supplied as a named list, with element names used as sample names.
Examples
# Create a mock GRanges object for gene annotation
# This object, based on the package's unit tests, avoids network access
# and includes a very long gene to ensure overlaps with sample data.
mock_genes_gr <- GenomicRanges::GRanges(
seqnames = S4Vectors::Rle("2L", 7),
ranges = IRanges::IRanges(
start = c(1000, 2000, 3000, 5000, 6000, 7000, 8000),
end = c(1500, 2500, 3500, 5500, 6500, 7500, 20000000)
),
strand = S4Vectors::Rle(GenomicRanges::strand(c("+", "-", "+", "+", "-", "-", "+"))),
gene_id = c("FBgn001", "FBgn002", "FBgn003", "FBgn004", "FBgn005", "FBgn006", "FBgn007"),
gene_name = c("geneA", "geneB", "geneC", "geneD", "geneE", "geneF", "LargeTestGene")
)
# Get path to sample data files included with the package
data_dir <- system.file("extdata", package = "damidBind")
# Run loading function using sample files and mock gene annotations
loaded_data <- load_data_peaks(
binding_profiles_path = data_dir,
peaks_path = data_dir,
ensdb_genes = mock_genes_gr,
quantile_norm = TRUE
)
#> Locating binding profile files
#> Building binding profile dataframe from input files ...
#> - Loaded: Bsh_Dam_L4_r1-ext300-vs-Dam.kde-norm
#> - Loaded: Bsh_Dam_L4_r2-ext300-vs-Dam.kde-norm
#> - Loaded: Bsh_Dam_L5_r1-ext300-vs-Dam.kde-norm
#> - Loaded: Bsh_Dam_L5_r2-ext300-vs-Dam.kde-norm
#> Applying quantile normalisation
#> Locating peak files
#> Calculating occupancy over peaks
#> Calculating average occupancy per region ...
# View the structure of the output
str(loaded_data, max.level = 1)
#> List of 5
#> $ binding_profiles_data:'data.frame': 62464 obs. of 7 variables:
#> $ peaks :List of 4
#> $ pr :Formal class 'GRanges' [package "GenomicRanges"] with 7 slots
#> $ occupancy :'data.frame': 1208 obs. of 8 variables:
#> $ test_category : chr "bound"