Compute occupancy for genomic regions — calculate

For each interval in the `regions` GRanges object, this function finds all fragments in `binding_data` that overlap it and computes, for each sample, the mean of their signal values weighted by the number of base pairs each fragment shares with the interval. Any metadata columns in the input `regions` object are preserved in the output data.frame.
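The per-region arithmetic can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the helper name `overlap_weighted_mean` is hypothetical, coordinates are treated as 1-based and closed, and each fragment is weighted by its overlap length in base pairs. Under those assumptions the sketch reproduces the `sampleA` values shown in the Examples section.

```r
# Hypothetical helper (not part of the package API): overlap-weighted mean
# of fragment signal over one region. Assumes 1-based, closed coordinates
# and weights equal to the number of overlapping base pairs.
overlap_weighted_mean <- function(frag_start, frag_end, signal,
                                  region_start, region_end) {
  # Base pairs shared by each fragment and the region (0 if disjoint)
  overlap <- pmax(0, pmin(frag_end, region_end) -
                     pmax(frag_start, region_start) + 1)
  hit <- overlap > 0
  if (!any(hit)) {
    return(list(occupancy = NA_real_, nfrags = 0L))
  }
  list(
    occupancy = sum(signal[hit] * overlap[hit]) / sum(overlap[hit]),
    nfrags = sum(hit)  # number of overlapping fragments
  )
}

# Fragment coordinates and sampleA values taken from the Examples section
frag_start <- c(90, 150, 480, 550)
frag_end   <- c(110, 170, 520, 580)
sampleA    <- c(1.2, 0.8, 2.5, 3.0)

r1 <- overlap_weighted_mean(frag_start, frag_end, sampleA, 100, 199)
r2 <- overlap_weighted_mean(frag_start, frag_end, sampleA, 500, 599)
r1$occupancy  # 0.9375   = (1.2 * 11 + 0.8 * 21) / 32
r2$occupancy  # 2.798077 = (2.5 * 21 + 3.0 * 31) / 52
```

Weighting by overlap length means a fragment that only grazes the edge of a region contributes little, while one lying fully inside it contributes its whole width.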

Usage

calculate_occupancy(
  binding_data,
  regions,
  buffer = 0,
  BPPARAM = BiocParallel::bpparam()
)

Arguments

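binding_data: A data.frame as produced by `build_dataframes()`. It must contain the columns 'chr', 'start' and 'end', followed by one numeric column per sample. The Examples section instead passes a GRanges object whose metadata columns hold the sample values.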
regions: A GRanges object of genomic intervals (e.g., genes or reduced peaks) over which to calculate occupancy.
buffer: Optional integer. Number of base pairs to expand each interval in `regions` on both sides before calculating occupancy. Default is 0.
BPPARAM: A BiocParallel parameter object for parallel computation. Default is `BiocParallel::bpparam()`.
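For reference, a data.frame with the layout described for `binding_data` can be built in base R from the same values as the GRanges in the Examples section. The column check below is an assumption about what the function expects, not its actual validation code, and the buffer arithmetic assumes `buffer` simply subtracts from each start and adds to each end (clipping at chromosome bounds is not shown).

```r
# Hypothetical illustration of the documented binding_data layout:
# 'chr', 'start', 'end', then one numeric column per sample.
binding_df <- data.frame(
  chr     = "chrX",
  start   = c(90, 150, 480, 550),
  end     = c(110, 170, 520, 580),
  sampleA = c(1.2, 0.8, 2.5, 3.0),
  sampleB = c(1.0, 0.9, 2.8, 2.9)
)

# Minimal layout check (assumed requirements, taken from the description)
stopifnot(
  identical(names(binding_df)[1:3], c("chr", "start", "end")),
  all(vapply(binding_df[-(1:3)], is.numeric, logical(1)))
)

# Effect of 'buffer', assuming a symmetric expansion of each region:
# chrX:100-199 with buffer = 50 is evaluated as chrX:50-249.
buffer <- 50
region_start <- c(100, 500)
region_end   <- c(199, 599)
buffered_start <- region_start - buffer  # 50, 450
buffered_end   <- region_end + buffer    # 249, 649
```

With `buffer = 50`, the fragment at 90-110 would overlap the first region over its full 21 bp rather than 11 bp, so a nonzero buffer can change both `nfrags` and the weights.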

Value

A data.frame with one row per region in the input `regions` object. It contains all original metadata columns from `regions`, a `name` column holding the region coordinates as `chr:start-end`, `nfrags` (the number of overlapping fragments), and one column of weighted mean occupancy per sample. Row names are the same coordinate strings, so they are unique as long as the regions themselves are distinct.
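The coordinate-string format visible in the Examples output (`chrX:100-199`) can be reproduced with `paste0()`. This is a reconstruction from the printed output under the assumption of 1-based, closed coordinates; the variable names are hypothetical and the package may build the strings differently.

```r
# Hypothetical reconstruction of the "<chr>:<start>-<end>" row names
# seen in the Examples output (1-based, closed coordinates assumed).
region_chr   <- c("chrX", "chrX")
region_start <- c(100L, 500L)
region_end   <- c(199L, 599L)

row_ids <- paste0(region_chr, ":", region_start, "-", region_end)
row_ids
# [1] "chrX:100-199" "chrX:500-599"
```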

Examples

# Create a set of regions with metadata
regions_gr <- GenomicRanges::GRanges(
    "chrX", IRanges::IRanges(start = c(100, 500), width = 100),
    gene_name = c("MyGene1", "MyGene2"), score = c(10, 20)
)

# Create a mock binding data GRanges object
binding_gr <- GenomicRanges::GRanges(
    seqnames = "chrX",
    ranges = IRanges::IRanges(
        start = c(90, 150, 480, 550),
        end = c(110, 170, 520, 580)
    ),
    sampleA = c(1.2, 0.8, 2.5, 3.0),
    sampleB = c(1.0, 0.9, 2.8, 2.9)
)

# Calculate occupancy over the regions
# Use BiocParallel::SerialParam() for deterministic execution in examples
if (requireNamespace("BiocParallel", quietly = TRUE)) {
    occupancy_data <- calculate_occupancy(binding_gr, regions_gr,
        BPPARAM = BiocParallel::SerialParam()
    )
    print(occupancy_data)
}
#> Calculating average occupancy for 2 regions...
#>              gene_name score         name nfrags  sampleA  sampleB
#> chrX:100-199   MyGene1    10 chrX:100-199      2 0.937500 0.934375
#> chrX:500-599   MyGene2    20 chrX:500-599      2 2.798077 2.859615