Skip to contents

This function performs Gene Ontology (GO) enrichment analysis using `clusterProfiler` for either the upCond1ulated or upCond2ulated regions/genes identified by `differential_binding()` or `differential_accessibility()`. It automatically extracts the relevant Flybase IDs (FBgnIDs) and the background universe from the input `DamIDResults` object.

Usage

analyse_go_enrichment(
  diff_results,
  direction = "cond1",
  org_db = org.Dm.eg.db::org.Dm.eg.db,
  ontology = "BP",
  pvalue_cutoff = 0.05,
  qvalue_cutoff = 0.2,
  plot_title = NULL,
  show_category = 12,
  label_format_width = 30,
  save = NULL,
  save_results_path = NULL,
  maxGSSize = 1000,
  minGSSize = 10
)

Arguments

diff_results

A `DamIDResults` object, as returned by `differential_binding()` or `differential_accessibility()`.

direction

Character string. Specifies which set of genes to analyse, either using condition names, "cond1" or "cond2", or "all" (for all significantly enriched genes from either direction). Default is "cond1".

org_db

An OrgDb object specifying the organism's annotation database. For Drosophila, use `org.Dm.eg.db::org.Dm.eg.db`.

ontology

Character string. The GO ontology to use: "BP" (Biological Process), "MF" (Molecular Function), or "CC" (Cellular Component). Default is "BP".

pvalue_cutoff

Numeric. Adjusted p-value cutoff for significance. Default: 0.05.

qvalue_cutoff

Numeric. Q-value cutoff for significance. Default: 0.2.

plot_title

Character string. Title for the generated dot plot.

show_category

Integer. Number of top enriched GO categories to display in the plot. Default: 12.

label_format_width

Integer. Max character length for GO term labels on the plot. Default: 30.

save

List or `NULL`. Controls saving the plot to a file (dot plot). If `NULL`, `FALSE`, or `0`, the plot is not saved. If a `list`, it specifies saving parameters:

  • filename (character): The path and base name for the output file. If not specified, the default name "damidBind_GSEA_dotplot" is used.

  • format (character): File format ("pdf", "svg", or "png"). Default is "pdf".

  • width (numeric): Width of the plot in inches. Default is 6.

  • height (numeric): Height of the plot in inches. Default is 6.

save_results_path

Character string or NULL. If a path is provided (e.g., "go_results.csv"), the enrichment results table will be saved to this CSV file.

maxGSSize

Integer. Maximum size of gene sets to consider. Default: 1000.

minGSSize

Integer. Minimum size of gene sets to consider. Default: 10.

Value

A list containing:

enrich_go_object

`enrichResult` object from `clusterProfiler`.

results_table

Data frame of enrichment results.

dot_plot

`ggplot` object of the dot plot.

NULL if no significant enrichment is found or if input validation fails.

Details

This function assumes that the `analysis` slot in the `diff_results` object contains a `gene_ids` column. If this column is not present, or cannot be processed, the function will return NULL.

The function includes an internal helper `clean_gene_symbols` which filters common ambiguous gene symbols (snoRNA, snRNA, tRNA) that may not be useful for GO enrichment.

Examples

# This example requires the 'org.Dm.eg.db' package
if (requireNamespace("org.Dm.eg.db", quietly = TRUE)) {
    # Helper function to create a sample DamIDResults object
    .generate_example_results <- function() {
        # Define a mock gene object. Note: Real, mappable FlyBase IDs are
        # used for the 'gene_id' column to ensure the example runs.
        mock_genes_gr <- GenomicRanges::GRanges(
            seqnames = S4Vectors::Rle("2L", 7),
            ranges = IRanges::IRanges(
                start = c(1000, 2000, 3000, 5000, 6000, 7000, 8000),
                end = c(1500, 2500, 3500, 5500, 6500, 7500, 20000000)
            ),
            gene_id = c(
                "FBgn0034439", "FBgn0031267", "FBgn0051138", "FBgn0031265",
                "FBgn0004655", "FBgn0000251", "FBgn0000252"
            ),
            gene_name = c("ap", "dpr1", "side", "dpr2", "eg", "bi", "br")
        )
        data_dir <- system.file("extdata", package = "damidBind")
        loaded_data <- load_data_peaks(
            binding_profiles_path = data_dir,
            peaks_path = data_dir,
            ensdb_genes = mock_genes_gr,
            quantile_norm = TRUE
        )
        diff_results <- differential_binding(
            loaded_data,
            cond = c("L4", "L5"),
            cond_names = c("L4 Neurons", "L5 Neurons")
        )
        return(diff_results)
    }
    diff_results <- .generate_example_results()

    # Run GO Enrichment for genes enriched in the first condition ("L4")
    # Note: with tiny sample data, this may not find significant terms.
    go_results <- analyse_go_enrichment(
        diff_results,
        direction = "L4",
        org_db = org.Dm.eg.db::org.Dm.eg.db
    )

    # Print the results table if any enrichment was found
    if (!is.null(go_results)) {
        print(go_results$results_table)
    }
}
#> Locating binding profile files
#> Building binding profile dataframe from input files ...
#>  - Loaded: Bsh_Dam_L4_r1-ext300-vs-Dam.kde-norm
#>  - Loaded: Bsh_Dam_L4_r2-ext300-vs-Dam.kde-norm
#>  - Loaded: Bsh_Dam_L5_r1-ext300-vs-Dam.kde-norm
#>  - Loaded: Bsh_Dam_L5_r2-ext300-vs-Dam.kde-norm
#> Applying quantile normalisation
#> Locating peak files
#> Calculating occupancy over peaks
#> Calculating average occupancy per region ...
#> Differential analysis setup:
#> Condition 1: 'L4' (display as 'L4 Neurons')
#>   Found 2 replicates:
#>     Bsh_Dam_L4_r1-ext300-vs-Dam.kde-norm_qnorm
#>     Bsh_Dam_L4_r2-ext300-vs-Dam.kde-norm_qnorm
#> Condition 2: 'L5' (display as 'L5 Neurons')
#>   Found 2 replicates:
#>     Bsh_Dam_L5_r1-ext300-vs-Dam.kde-norm_qnorm
#>     Bsh_Dam_L5_r2-ext300-vs-Dam.kde-norm_qnorm
#> limma contrasts: L4-L5
#> 
#> 277 loci enriched in L4 Neurons
#> Highest-ranked genes:
#> br, br, br, br, br, br, br, br, br, br
#> 
#> 173 loci enriched in L5 Neurons
#> Highest-ranked genes:
#> br, br, br, br, br, br, br, br, br, br
#> Universe of Flybase IDs is too small or invalid for enrichment analysis. Returning NULL.