v1.5.3

  • New feature: --coverage flag outputs a coverage bedGraph track for each sample (useful with Dam-only tracks for CATaDa analysis). --just_coverage exits after the coverage tracks are written.
  • New feature: Kernel density normalisation routines sped up ~10x via Inline::C. This is enabled by default but only works if the Inline::C module is installed; damidseq_pipeline falls back on the Perl-coded routine if it is not present. To control this explicitly, use the --kde_inline flag.
  • New feature: ChIP-seq-specific processing routines added: --remdups removes PCR duplicates, --unique maps only unique reads. Enable all of these with --chipseq (which also turns off GATC-level binning and GATC-based read extension, and outputs only coverage tracks, not ratios)
  • New feature: New normalisation method, “rawbins”. This writes the normalisation data to TSV format and runs a user-provided script for custom normalisation. Your script should read the data file and output the normalisation factor. Advanced users only, use with caution.
  • New feature: all occupancy scores are now treated as RPM (reads per million mapped reads) by default. Control with the --scores_as_rpm flag.
  • Changed: bowtie2 command handling refactored more elegantly
  • Added: ability to add custom flags to bowtie2 command with --bowtie2_add_flags
  • Fixed: small changes to read mapping code for more accurate mapping
  • Changed: filename filters are now turned off by default (use --no_file_filters=0 to turn back on)
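
As a rough illustration of the RPM scaling mentioned above (reads per million mapped reads), here is a minimal Python sketch. The function name and its inputs are hypothetical, and this is not the pipeline's own code (which is written in Perl); it only shows the arithmetic.

```python
def scores_as_rpm(bin_counts, total_mapped_reads):
    """Scale raw per-bin read counts to reads per million mapped reads.

    Hypothetical helper: bin_counts is a list of raw counts and
    total_mapped_reads is the number of mapped reads in the sample.
    """
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    # Multiply (not divide) by the per-million factor; inverting this
    # step would invert the normalised values.
    scale = 1_000_000 / total_mapped_reads
    return [count * scale for count in bin_counts]
```

For example, a bin holding 5 reads in a sample with 2,000,000 mapped reads would score 2.5 RPM.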

v1.4.6

  • New feature: paired-end reads are now handled natively at the fastq level. Use the --paired flag to enable, and your paired reads should be automagically detected and aligned
  • Changed: BAM file format is now used natively without SAM intermediates at all levels (also fixes the recent samtools version handling bug)
  • Changed: paired-end reads are selected based on the 0x2 bit of the SAM format bitwise FLAG field (reads aligned in proper pair)
  • Fixed: paired-end read counts are enumerated correctly
  • Removed: --keep_sam and --only_sam flags (as SAM format is no longer used)
  • Added: --keep_original_bams flag to retain the non-extended reads BAM file when using single-end sequencing
  • Changed: warning on missing chromosomal identities in GATC alignment made more friendly and less confusing
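
The proper-pair selection described above comes down to a single bit test on the SAM FLAG value. The sketch below shows that test in Python; the constant and function names are illustrative only and are not taken from the pipeline.

```python
# 0x2 is the SAM FLAG bit meaning "each segment properly aligned
# according to the aligner" (i.e. the read is part of a proper pair).
PROPER_PAIR = 0x2


def is_proper_pair(flag):
    """Return True if the 0x2 bit is set in a SAM FLAG integer."""
    return bool(flag & PROPER_PAIR)
```

A FLAG of 99 (1 + 2 + 32 + 64) passes this test, whereas 65 (1 + 64) does not. On the command line, `samtools view -f 2` applies the same filter.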

v1.4.5

  • Added --ps_debug option to explicitly display the pseudocount calculation

v1.4.4

  • Fixed issue with some externally-generated BAM files

v1.4.3

  • Better handling of chromosome names

v1.4.2

  • Small improvements to SAM file handling

v1.4.1

  • Fixed issues with samtools v1.3 (this version of samtools introduced backwards incompatibilities when using the ‘sort’ function. damidseq_pipeline now checks for the version number and should support all versions of samtools.)
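
A version check of this kind might look like the Python sketch below. It assumes, based on the samtools documentation rather than on damidseq_pipeline's own code, that versions before 1.3 accept `samtools sort in.bam out.prefix` and that 1.3 and later require `-o`. All function names are hypothetical.

```python
import re


def parse_version(text):
    """Extract (major, minor) from text such as 'samtools 1.3.1'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return int(match.group(1)), int(match.group(2))


def sort_command(version, in_bam, out_prefix):
    """Build a samtools sort argument list suited to the given version."""
    if version >= (1, 3):
        # Newer syntax: the output file is named explicitly with -o.
        return ["samtools", "sort", "-o", out_prefix + ".bam", in_bam]
    # Legacy syntax: samtools appends '.bam' to the output prefix itself.
    return ["samtools", "sort", in_bam, out_prefix]
```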

v1.4

  • New read-extension method: by default, reads are now only extended as far as the next GATC fragment. Use --extension_method=full to disable this feature and extend every read by the value of --len.
  • Output format is now bedgraph by default. Use --output_format=gff to restore the previous default. Changing the default to bedgraph allows users to create TDF tracks directly within the graphical IGV tools, making it easier for end users.
  • Minor code cleanups
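
The difference between the two extension methods can be sketched as follows. This Python example is an illustration under stated assumptions, not the pipeline's implementation: it assumes that --len gives the total extended length, that a read stops at the first GATC site beyond its 3' end, and that coordinates are plain integers. The function and parameter names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def extend_read(start, end, strand, ext_len, gatc_sites, method="gatc"):
    """Extend a read in its 3' direction and return (start, end).

    gatc_sites must be a sorted list of GATC site positions.
    method="full" ignores GATC sites and always extends to ext_len.
    """
    if strand == "+":
        new_end = max(end, start + ext_len)
        if method == "gatc":
            i = bisect_right(gatc_sites, end)  # first site beyond the read
            if i < len(gatc_sites):
                new_end = min(new_end, gatc_sites[i])
        return start, new_end
    # Minus strand: the read extends towards lower coordinates.
    new_start = max(0, min(start, end - ext_len))
    if method == "gatc":
        i = bisect_left(gatc_sites, start) - 1  # last site before the read
        if i >= 0:
            new_start = max(new_start, gatc_sites[i])
    return new_start, end
```

With GATC sites at 100, 500 and 900 and an extension length of 300, a plus-strand read at 400-450 is extended to 400-500 under the assumed "gatc" behaviour and to 400-700 under "full".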

v1.3

  • Major bugfix: reads from the minus strand were not being extended correctly during processing. The overall impact is minor (correlation between old and new read extension methods is >0.95) but this new method is technically more accurate.
  • Added --keep_sam (do not delete the temporary SAM file) option
  • Added --only_sam (do not generate BAM files) option (both options are intended for debug purposes only)

v1.2.7

  • New option --no_file_filters to prevent any filename trimming/filtering (by default input filenames are trimmed to the first underscore)
  • Small filename issue fixes

v1.2.6

  • Now uses File::Basename to handle filenames
  • Fixed/cleaned up a number of rare problems with filename handling

v1.2.5

  • Added --dam and --out_name options. Use these to set the Dam-only control sample and/or a custom output name
  • Added more sanity checks
  • Minor bugfixes

v1.2.4

  • Added explicit checks for bowtie2 and samtools executables, and for bowtie2 output

v1.2.3

  • Fixed a serious error in RPM normalisation calculations (values were inverted) – please re-run on your samples if you have used this method on them
  • Minor code cleanups

v1.2.1

  • Added ability to process BAM files generated from paired-end sequencing
  • Cleaner reporting of missing assembly fragments in GATC files
  • Some small bugs fixed
  • General code clean-ups

v1.2

  • Completely re-written normalisation routine based on kernel density estimation
  • Genomic coverage is now calculated internally rather than using bedtools (uses much less memory, is slightly faster, and drops the requirement for an external binned windows file)
  • Binned window files are no longer required (bins are calculated automatically using the sequence information provided in the BAM headers, and the bin size specified by the --bins command-line option)
  • Better handling of GATC fragment files (should prevent hangs/pauses when creating GATC fragment arrays)
  • Memory optimisation for large files (greatly reduces usage for processing mouse/human data)
  • Added ability to process BAM files directly
  • Much better file-handling all round (now takes sample names directly from filenames by default; the option to use an index.txt file remains but is essentially deprecated)
  • New option: --norm_method=kde/rpm: “kde” is the default method, using kernel density estimation; “rpm” normalises on read counts per million reads only (the “rpm” method is not recommended except for the very rare cases in which a Dam-fusion protein fails to methylate accessible genomic regions, making kde normalisation inappropriate)
  • Re-written --help output routines (better formatted and more informative)
  • Ability to read gzipped GATC files
  • Ability to save sets of defaults to enable quick switching between different genomes (use --save_defaults=name to save; use --load_defaults=name to load; use --load_defaults=list to list the currently available sets)
  • New location for config files (in ~/.config/damid_pipeline/). Existing config file will be migrated automatically
  • Various small bugfixes and code clean-ups

** NB: a number of default parameters have changed with this release. It is strongly advised to reset all parameters to the default value with --reset_defaults.
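
To illustrate how fixed-width bins can be derived from BAM header information alone, as described in the v1.2 notes above, here is a minimal Python sketch. It assumes the header is available as SAM-style text with @SQ lines and uses 0-based, half-open bin coordinates; both choices, and the function names, are assumptions for the example rather than details of the pipeline.

```python
def parse_sq_lines(header_text):
    """Return {sequence name: length} from the @SQ lines of a SAM header."""
    lengths = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each field after the record type looks like 'TAG:value'.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
        lengths[fields["SN"]] = int(fields["LN"])
    return lengths


def make_bins(lengths, bin_size):
    """Yield (chrom, start, end) bins; the last bin may be shorter."""
    for chrom, length in lengths.items():
        for start in range(0, length, bin_size):
            yield chrom, start, min(start + bin_size, length)
```

A 250 bp sequence with a bin size of 100 would yield bins 0-100, 100-200 and 200-250.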

v1.0

  • Initial release