Extract unique sample names from complex labels
Source: R/diagnostic_plots.R (documentation: extract_unique_sample_ids.Rd)
This function takes a vector of complex sample labels and iteratively builds a simplified, unique name for each. It splits every label into blocks on delimiter, identifies the blocks that differ across the sample set, and progressively adds them to a base name until the combination of base name and replicate identifier (a block matching replicate_pattern) is unique for every sample.
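The procedure described above can be illustrated with a short base R sketch. This is not the package's implementation: sketch_unique_ids() is a hypothetical name. The sketch assumes that every label splits into the same number of blocks, so blocks can be compared position by position, and that the base name is built only from differing, non-replicate blocks.

```r
# A minimal sketch of the idea, NOT the package's implementation.
# Assumption: every label splits into the same number of blocks, so blocks
# can be compared position by position.
sketch_unique_ids <- function(sample_names,
                              delimiter = "[-_\\.]",
                              replicate_pattern = "^(n|N|r|rep|replicate|sample)\\d+") {
  blocks <- strsplit(sample_names, delimiter)
  # Fallback: labels with differing block counts are returned unchanged
  if (length(unique(lengths(blocks))) != 1) return(sample_names)
  mat <- do.call(rbind, blocks)  # rows = samples, columns = block positions

  # Replicate identifier: the first block in each label matching the pattern
  rep_id <- apply(mat, 1, function(row) {
    hit <- row[grepl(replicate_pattern, row)]
    if (length(hit) == 0) NA_character_ else hit[1]
  })

  # Block positions that differ across samples and are not replicate blocks
  varies <- apply(mat, 2, function(col) length(unique(col)) > 1)
  is_rep <- apply(mat, 2, function(col) any(grepl(replicate_pattern, col)))
  candidates <- which(varies & !is_rep)

  # Add differing blocks one at a time until base name + replicate is unique
  for (k in seq_along(candidates)) {
    base <- apply(mat[, candidates[seq_len(k)], drop = FALSE], 1,
                  paste, collapse = "_")
    ids <- paste(base, rep_id, sep = "_")
    if (!anyDuplicated(ids)) {
      # Labels without a replicate block keep their original name
      return(ifelse(is.na(rep_id), sample_names, ids))
    }
  }
  sample_names  # no unique combination found: fall back to the original labels
}
```

Applied to the labels in the Examples section, this sketch returns the same six names shown there. The real function may treat labels with unequal structure differently.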
Usage
extract_unique_sample_ids(
  sample_names,
  delimiter = "[-_\\.]",
  replicate_pattern = "^(n|N|r|rep|replicate|sample)\\d+"
)
Value
A vector of simplified, unique names. If a unique name cannot be formed or essential information is missing for a sample, the original label for that sample is returned as a fallback.
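The per-sample fallback described above can be sketched as follows. apply_fallback() is a hypothetical helper. Treating both missing and duplicated candidate names as unusable is an assumption about what "cannot be formed" means, and the package may apply additional checks.

```r
# Hypothetical helper illustrating a per-sample fallback; not package code.
apply_fallback <- function(original, candidate) {
  # A candidate is unusable if it is missing or shared with another sample
  bad <- is.na(candidate) |
    duplicated(candidate) |
    duplicated(candidate, fromLast = TRUE)
  # Keep the simplified name where usable, otherwise the original label
  ifelse(bad, original, candidate)
}
```

For example, apply_fallback(c("A-long", "B-long", "C-long", "D-long"), c("a_n1", NA, "x_n1", "x_n1")) keeps "a_n1" and falls back to the original labels for the other three samples.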
Examples
labels <- c(
  "RNAPII_elav-GSE77860-n1-SRR3164378-2017-vs-Dam.scaled.kde-norm",
  "RNAPII_elav-GSE77860-n2-SRR3164379-2017-vs-Dam.scaled.kde-norm",
  "RNAPII_elav-GSE77860-n4-SRR3164380-2017-vs-Dam.scaled.kde-norm",
  "RNAPII_Wor-GSE77860-n1-SRR3164346-2017-vs-Dam.scaled.kde-norm",
  "RNAPII_Wor-GSE77860-n2-SRR3164347-2017-vs-Dam.scaled.kde-norm",
  "RNAPII_Wor-GSE77860-sample1-SRR2038537-2017-vs-Dam.scaled.kde-norm"
)
extract_unique_sample_ids(labels)
#> [1] "elav_n1" "elav_n2" "elav_n4" "Wor_n1" "Wor_n2"
#> [6] "Wor_sample1"
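To see how the default delimiter and replicate_pattern values from the Usage section act on one of the example labels, the following snippet applies them directly with base R strsplit() and grepl(), without calling the package. The vector of tokens is made up for illustration.

```r
# Default regular expressions from the Usage section, applied with base R only
delimiter <- "[-_\\.]"
replicate_pattern <- "^(n|N|r|rep|replicate|sample)\\d+"

# Split one example label into its blocks
label <- "RNAPII_Wor-GSE77860-sample1-SRR2038537-2017-vs-Dam.scaled.kde-norm"
blocks <- strsplit(label, delimiter)[[1]]
print(blocks)

# Which blocks of this label look like replicate identifiers?
print(blocks[grepl(replicate_pattern, blocks)])

# Made-up tokens: the first six match the default pattern, the last two do not
tokens <- c("n1", "N2", "r3", "rep4", "replicate5", "sample6", "norm", "SRR3164378")
is_replicate <- grepl(replicate_pattern, tokens)
print(is_replicate)
```

Because the pattern is anchored only at the start (^) and not at the end, a block such as n1b would also be treated as a replicate identifier.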