Monday, October 17, 2011

Denoising pyrosequencing reads

Tools to denoise pyrosequencing reads.


1. Denoiser
- Software for rapidly denoising pyrosequencing amplicon reads by exploiting rank-abundance distributions.
- It is an alternative to the PyroNoise software suite.
- It is included in the new QIIME 1.3 release.

Download

$ wget http://www.microbio.me/denoiser/Denoiser_0.91.tgz
$ tar zxvf Denoiser_0.91.tgz
$ cd Denoiser_0.91


Pre-requisites

Python: >= 2.5.1        http://www.python.org
PyCogent toolkit: >= 1.4        http://pycogent.sourceforge.net
ghc: >= 6.8 (recommended for install)  http://haskell.org/ghc


Install pre-requisites
$ sudo apt-get install ghc

Install Denoiser

$ cd FlowgramAlignment
$ make
ghc --make -O2 FlowgramAli_4frame
[1 of 3] Compiling ADPCombinators   ( ADPCombinators.lhs, ADPCombinators.o )
[2 of 3] Compiling FlowgramUtils    ( FlowgramUtils.lhs, FlowgramUtils.o )
[3 of 3] Compiling Main             ( FlowgramAli_4frame.lhs, FlowgramAli_4frame.o )
Linking FlowgramAli_4frame ...
$ make install
cp FlowgramAli_4frame ../bin/

Add the Denoiser directory to PYTHONPATH in the ~/.bashrc file (this example assumes the archive was unpacked under $HOME/software)

PYTHONPATH=$PYTHONPATH:$HOME/software/Denoiser_0.91/
export PYTHONPATH

Define the PROJECT_HOME variable in Denoiser/settings.py

PROJECT_HOME = home + "/path/to/Denoiser_0.91/"

Follow the mini-tutorial in the README to denoise the sequences.
First, extract the FASTA file, quality file and SFF text file from the SFF file

$ sffinfo  454Reads.sff > 454Reads.sff.txt
$ sffinfo -s 454Reads.sff > 454Reads.fasta
$ sffinfo -q 454Reads.sff > 454Reads.qual
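
When several SFF files need converting, the three sffinfo calls above can be wrapped in a small script. The following is only a sketch: the function names sffinfo_jobs and convert are made up for illustration, and it assumes sffinfo is on the PATH and behaves as in the commands above (no flag for the text dump, -s for sequences, -q for quality scores).

```python
import subprocess
from pathlib import Path


def sffinfo_jobs(sff_path):
    """Return (argv, output_path) pairs for the three sffinfo calls above."""
    sff = Path(sff_path)
    stem = sff.with_suffix("")  # 454Reads.sff -> 454Reads
    return [
        (["sffinfo", str(sff)], Path(str(sff) + ".txt")),           # SFF text file
        (["sffinfo", "-s", str(sff)], Path(str(stem) + ".fasta")),  # sequences
        (["sffinfo", "-q", str(sff)], Path(str(stem) + ".qual")),   # quality scores
    ]


def convert(sff_path):
    """Run each sffinfo call, writing its stdout to the matching output file."""
    for argv, out_path in sffinfo_jobs(sff_path):
        with open(out_path, "w") as handle:
            subprocess.run(argv, stdout=handle, check=True)
```

Calling convert("454Reads.sff") should produce the same three files as the commands above; sffinfo_jobs can be inspected first to see what would be run.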

Quality Filtering and barcode assignment

$ denoiser/split_libraries.py -f 454Reads.fasta -q 454Reads.qual -m barcode_to_sample_mapping.txt -w 50 -r -l 150 -L 350
where,
-f fasta file
-q quality file
-m barcode mapping file
-w enable sliding window test for quality scores, with the given window size (50 here)
-r remove unassigned reads (deprecated)
-l minimum sequence length
-L maximum sequence length
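
To make the -w option more concrete, here is a minimal sketch of a sliding-window quality test. It is not the code used by split_libraries.py: the function name is hypothetical, the threshold of 25 is an assumption, and whether the real script truncates or discards a failing read may depend on its version and options. This sketch truncates.

```python
def truncate_at_low_quality_window(quals, window=50, min_mean=25):
    """Return the number of leading bases to keep.

    Slides a window of `window` bases along the read. At the first window
    whose mean quality is below `min_mean`, the read is cut at the start
    of that window. Reads with no failing window are kept whole.
    """
    if len(quals) < window:
        return len(quals)
    total = sum(quals[:window])
    for start in range(len(quals) - window + 1):
        if start > 0:
            # Slide by one base: add the new base, drop the old one.
            total += quals[start + window - 1] - quals[start - 1]
        if total / window < min_mean:
            return start
    return len(quals)


# A read whose quality drops from 40 to 10 halfway through.
quals = [40] * 100 + [10] * 100
keep = truncate_at_low_quality_window(quals)
print(keep)  # 76
```

In this example the truncated read (76 bases) would then fail the minimum length of 150 set with -l.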


Prefix clustering

$ preprocess.py -i 454Reads.sff.txt -f seqs.fna -o example_pp -s -v -p CATGCTGCCTCCCGTAGGAGT
where, 
-i SFF text file
-f quality filtered sequence file
-o output directory
-s squeeze, run-length encoding for prefix-filtering
-v verbose
-p the primer sequence
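
The idea behind squeezing and prefix clustering can be sketched in a few lines. This is an illustration only, not the preprocess.py implementation: the function names are made up, it works on plain sequences rather than flowgrams, and it uses a simple quadratic scan where the real tool is presumably more efficient.

```python
from itertools import groupby


def squeeze(seq):
    """Collapse each homopolymer run to a single base: 'AAACCG' -> 'ACG'."""
    return "".join(base for base, _ in groupby(seq))


def prefix_clusters(reads):
    """Group reads whose squeezed sequence is a prefix of a longer squeezed read.

    `reads` maps read id -> sequence.
    Returns {representative id: [member ids, representative first]}.
    """
    squeezed = {rid: squeeze(seq) for rid, seq in reads.items()}
    # Longest first, so each representative is seen before its prefixes.
    order = sorted(squeezed, key=lambda rid: (-len(squeezed[rid]), rid))
    clusters = {}
    for rid in order:
        for rep in clusters:
            if squeezed[rep].startswith(squeezed[rid]):
                clusters[rep].append(rid)
                break
        else:
            # No existing representative covers this read: start a new cluster.
            clusters[rid] = [rid]
    return clusters


reads = {"r1": "AACCGGTT", "r2": "ACCGG", "r3": "ACGTA", "r4": "TTGA"}
print(prefix_clusters(reads))  # {'r3': ['r3', 'r1', 'r2'], 'r4': ['r4']}
```

Squeezing makes the comparison insensitive to homopolymer length, which is where pyrosequencing errors tend to concentrate.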

Flowgram clustering and Denoising

$ denoiser.py -i 454Reads.sff.txt -p example_pp -v -o example_denoised
where,
-i SFF text file
-p output directory of the prefix clustering step
-v verbose
-o output directory




2. Pre-cluster
- It is part of the mothur package
- It is a pseudo-single linkage algorithm that aims to remove sequences that are likely due to pyrosequencing errors.
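
A rough sketch of the pre-clustering idea follows. It is not mothur's implementation: the function names, the max_diffs parameter and the toy data are assumptions for illustration, and sequences are assumed to be aligned and of equal length.

```python
def mismatches(a, b):
    """Count differing positions between two equal-length aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def pre_cluster(abundance, max_diffs=1):
    """Merge rare sequences into more abundant ones within `max_diffs` mismatches.

    `abundance` maps aligned sequence -> read count. Returns merged counts.
    """
    # Most abundant first; ties broken by sequence for determinism.
    order = sorted(abundance, key=lambda s: (-abundance[s], s))
    merged = {}
    absorbed = set()
    for i, seq in enumerate(order):
        if seq in absorbed:
            continue
        merged[seq] = abundance[seq]
        for other in order[i + 1:]:
            if other in absorbed or len(other) != len(seq):
                continue
            if mismatches(seq, other) <= max_diffs:
                merged[seq] += abundance[other]
                absorbed.add(other)
    return merged


counts = {"ACGTACGT": 100, "ACGTACGA": 3, "ACGTTCGA": 1, "TTTTACGT": 20}
print(pre_cluster(counts))
# {'ACGTACGT': 103, 'TTTTACGT': 20, 'ACGTTCGA': 1}
```

In this sketch, ACGTTCGA is one mismatch away from ACGTACGA but is not pulled in, because an absorbed sequence does not recruit further neighbours. That is one way to read the "pseudo" in pseudo-single linkage.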


3. SeqNoise
