Annotates each input region with the gene(s) it overlaps. A gene is considered overlapping if its body lies within the specified `maxgap` of the query region. If a query region overlaps multiple genes, their names and IDs are each returned as a single comma-separated string.

Usage

all_overlaps_to_original(query, subject, maxgap = 0)

Arguments

query

A GRanges object containing the regions to be annotated.

subject

A GRanges object of gene annotations. It must have a metadata column named `gene_name` and may optionally have one named `gene_id`.

maxgap

Integer. The maximum gap, in base pairs, allowed between a query region and a subject region for them to be considered overlapping. Default is 0 (the regions must be touching).
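
To make the `maxgap` arithmetic concrete, here is a minimal base-R sketch of the distance between two ranges with 1-based, inclusive coordinates. The helper `range_gap` is hypothetical and not part of this package. It assumes the gap is the number of bases lying strictly between the two ranges, which may differ in detail from how the function itself measures distance.

```r
# Hypothetical helper (not part of this package): number of bases strictly
# between two 1-based, inclusive ranges. Returns 0 for overlapping or
# directly adjacent ranges.
range_gap <- function(start1, end1, start2, end2) {
    max(0L, max(start1, start2) - min(end1, end2) - 1L)
}

# Coordinates taken from the Examples section: query 1 spans 100-149,
# geneA spans 90-119, geneB spans 200-249.
gap_to_geneA <- range_gap(100L, 149L, 90L, 119L)   # ranges overlap
gap_to_geneB <- range_gap(100L, 149L, 200L, 249L)  # bases 150-199 lie between

print(c(geneA = gap_to_geneA, geneB = gap_to_geneB))
```

Under this assumption geneB sits exactly 50 bases from query 1, which is consistent with it being reported at `maxgap = 50` but not at `maxgap = 0` in the examples.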

Value

A list containing two character vectors of the same length as `query`:

genes

A character vector where each element contains a comma-separated list of `gene_name` values from subject regions overlapping the corresponding query region. An empty string `""` indicates no overlap.

ids

A character vector with the corresponding `gene_id` values, if the `gene_id` column exists in the subject.
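
Because multiple hits are packed into one comma-separated string, downstream code often needs to split them apart again. The following is a minimal base-R sketch that uses a hand-written vector shaped like the `genes` element described above. The values are illustrative and were not produced by running the function.

```r
# Illustrative stand-in for the `genes` element of the returned list.
# The third entry is the empty string used for a region with no overlap.
genes <- c("geneA,geneB", "geneC", "")

# Split each comma-separated string into one character vector per region.
# strsplit() returns character(0) for "", so regions without overlaps
# end up with zero genes.
genes_per_region <- strsplit(genes, ",", fixed = TRUE)
n_genes <- lengths(genes_per_region)

# Logical flag marking regions that overlap at least one gene
has_gene <- nzchar(genes)

print(n_genes)
print(has_gene)
```

Testing with `nzchar()` avoids having to split the strings when only a yes/no answer per region is needed.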

Examples

# Create a query GRanges object with regions of interest
query_regions <- GenomicRanges::GRanges("chr1", IRanges::IRanges(c(100, 500), width = 50))

# Create a subject GRanges object with gene annotations
gene_annotations <- GenomicRanges::GRanges(
    "chr1",
    IRanges::IRanges(c(90, 200, 525), width = c(30, 50, 50)),
    gene_name = c("geneA", "geneB", "geneC"),
    gene_id = c("FBgn01", "FBgn02", "FBgn03")
)

# Find overlaps (query 1 overlaps geneA; query 2 overlaps geneC)
overlaps <- all_overlaps_to_original(query_regions, gene_annotations, maxgap = 0)
print(overlaps)
#> $genes
#> [1] "geneA" "geneC"
#> 
#> $ids
#> [1] "FBgn01" "FBgn03"
#> 

# With a larger gap, query 1 now also overlaps geneB
overlaps_gapped <- all_overlaps_to_original(query_regions, gene_annotations, maxgap = 50)
print(overlaps_gapped)
#> $genes
#> [1] "geneA,geneB" "geneC"      
#> 
#> $ids
#> [1] "FBgn01,FBgn02" "FBgn03"       
#>