Find best gene overlap(s) for each query interval
Source: R/granges_functions.R
all_overlaps_to_original.Rd
Annotates each input region with the gene(s) it overlaps. A gene counts as overlapping if its body lies within `maxgap` base pairs of the query region. If a query region overlaps multiple genes, their names and IDs are returned as comma-separated strings.
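The behaviour described above can be approximated with `GenomicRanges::findOverlaps()`. The following is only a sketch under that assumption: the source does not show how `all_overlaps_to_original` is implemented, and `collapse_overlaps` is a hypothetical name used here for illustration.

```r
library(GenomicRanges)

# Hypothetical helper (NOT the package's implementation): collapse the
# gene_name values of all subject regions hitting each query region.
collapse_overlaps <- function(query, subject, maxgap = 0L) {
  hits <- findOverlaps(query, subject, maxgap = maxgap)
  # Use every query index as a factor level so queries without hits are kept.
  q_idx <- factor(queryHits(hits), levels = seq_along(query))
  genes <- split(subject$gene_name[subjectHits(hits)], q_idx)
  # paste() on an empty vector with `collapse` yields "", i.e. "no overlap".
  vapply(genes, paste, character(1), collapse = ",", USE.NAMES = FALSE)
}

query <- GRanges("chr1", IRanges(c(100, 500), width = 50))
subject <- GRanges(
  "chr1",
  IRanges(c(90, 200, 525), width = c(30, 50, 50)),
  gene_name = c("geneA", "geneB", "geneC")
)
res <- collapse_overlaps(query, subject, maxgap = 0L)
res_gapped <- collapse_overlaps(query, subject, maxgap = 50L)
```

Grouping by a factor with one level per query region is what keeps the result the same length as `query`, including regions that hit nothing.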
Arguments
- query
A GRanges object containing the regions to be annotated.
- subject
A GRanges object of gene annotations. It must have a metadata column named `gene_name` and may optionally have one named `gene_id`.
- maxgap
Integer. The maximum number of base pairs allowed between a query region and a subject region for them to be considered overlapping. Default is 0 (the regions must touch, with no bases between them).
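To make the `maxgap` distance concrete, here is a minimal base-R sketch of the gap between two ranges on the same chromosome. It assumes the GenomicRanges convention that adjacent ranges have a gap of 0 and intersecting ranges a gap of -1. Whether this function follows that convention exactly is an assumption, and `range_gap` is a hypothetical helper, not part of this package.

```r
# Hypothetical helper: number of positions separating two closed ranges.
# Assumed convention: adjacent ranges -> 0, intersecting ranges -> -1.
range_gap <- function(start1, end1, start2, end2) {
  gap <- max(start1, start2) - min(end1, end2) - 1L
  max(gap, -1L)
}

range_gap(100L, 149L, 200L, 249L)  # 50: positions 150..199 lie between the ranges
range_gap(100L, 149L, 150L, 199L)  # 0: the ranges are directly adjacent
range_gap(100L, 149L, 90L, 119L)   # -1: the ranges intersect
```

Under this reading, a subject region is reported for a query region whenever the gap is less than or equal to `maxgap`.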
Value
A list containing two character vectors of the same length as `query`:
- genes
A character vector where each element contains a comma-separated list of `gene_name` values from subject regions overlapping the corresponding query region. An empty string `""` indicates no overlap.
- ids
A character vector with the corresponding `gene_id` values in the same comma-separated format, returned if the `gene_id` column exists in the subject.
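Because each element is a comma-separated string, downstream code usually needs to split it again. The sketch below uses a hand-written list in the documented shape as a stand-in for a real return value. The third, empty element is an illustrative assumption that shows how a region with no overlap behaves.

```r
# Literal stand-in for the documented return value (no package call needed).
overlaps <- list(
  genes = c("geneA,geneB", "geneC", ""),
  ids   = c("FBgn01,FBgn02", "FBgn03", "")
)

# Split each comma-separated string into individual gene names.
genes_per_region <- strsplit(overlaps$genes, ",", fixed = TRUE)

# Count overlapping genes per query region; "" splits to character(0).
n_genes <- lengths(genes_per_region)
n_genes
#> [1] 2 1 0
```

Since both vectors have the same length as `query`, they can also be attached to it as metadata columns, for example with `query$genes <- overlaps$genes`.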
Examples
# Create a query GRanges object with regions of interest
query_regions <- GenomicRanges::GRanges("chr1", IRanges::IRanges(c(100, 500), width = 50))
# Create a subject GRanges object with gene annotations
gene_annotations <- GenomicRanges::GRanges(
"chr1",
IRanges::IRanges(c(90, 200, 525), width = c(30, 50, 50)),
gene_name = c("geneA", "geneB", "geneC"),
gene_id = c("FBgn01", "FBgn02", "FBgn03")
)
# Find overlaps (query 1 overlaps geneA; query 2 overlaps geneC)
overlaps <- all_overlaps_to_original(query_regions, gene_annotations, maxgap = 0)
print(overlaps)
#> $genes
#> [1] "geneA" "geneC"
#>
#> $ids
#> [1] "FBgn01" "FBgn03"
#>
# With a larger gap, query 1 now also overlaps geneB
overlaps_gapped <- all_overlaps_to_original(query_regions, gene_annotations, maxgap = 50)
print(overlaps_gapped)
#> $genes
#> [1] "geneA,geneB" "geneC"
#>
#> $ids
#> [1] "FBgn01,FBgn02" "FBgn03"
#>