Compute average occupancy for each gene, using DamID binding profiles
Source:R/granges_functions.R
gene_occupancy.Rd
For each gene in the provided annotation, this function calculates the average log2 score by taking a weighted mean of all overlapping fragments from the binding data. The weighting is based on the width of the overlap between each fragment and the gene body.
Usage
gene_occupancy(
binding_data,
ensdb_genes = get_ensdb_genes()$genes,
buffer = 0,
BPPARAM = BiocParallel::bpparam()
)
Arguments
- binding_data
A data.frame as produced by `build_dataframes()`. It must contain columns 'chr', 'start', 'end', and then one numeric column per sample.
- ensdb_genes
A GRanges object of gene annotations. A custom GRanges object can be provided, but it must contain metadata columns 'gene_name' and 'gene_id'. If not provided, it is retrieved via `get_ensdb_genes`.
- buffer
Integer. Number of bases to extend gene boundaries on both sides before calculating overlaps. Default is 0.
- BPPARAM
A BiocParallel parameter object for parallel computation. Default: `BiocParallel::bpparam()`.
Value
A data.frame where rows are genes and columns include one per sample (containing the weighted mean occupancy), plus annotation columns such as gene name, number of overlapping fragments ('nfrags'), and gene ID.
Examples
# 1. Create a GRanges object for gene annotations
genes_gr <- GenomicRanges::GRanges(
"chrY",
IRanges::IRanges(c(1000, 3000), width = 500),
gene_name = c("gene1", "gene2"),
gene_id = c("ID1", "ID2")
)
# 2. Create mock binding data
binding_df <- data.frame(
chr = "chrY",
start = c(950, 1200, 3100),
end = c(1050, 1300, 3200),
sampleA = c(1.5, 2.0, 0.5),
sampleB = c(1.8, 2.2, 0.4)
)
# 3. Calculate average gene occupancy
# Use BiocParallel::SerialParam() for deterministic execution in examples
if (requireNamespace("BiocParallel", quietly = TRUE)) {
gene_occ <- gene_occupancy(binding_df, genes_gr,
BPPARAM = BiocParallel::SerialParam()
)
print(gene_occ)
}
#> Calculating average occupancy per gene ...
#> name nfrags sampleA sampleB gene_names gene_ids
#> chrY:1000-1499 chrY:1000-1499 2 1.832237 2.065789 gene1 ID1
#> chrY:3000-3499 chrY:3000-3499 1 0.500000 0.400000 gene2 ID2