Download source code from
http://bmf2.colorado.edu/unifrac/about.psp
$ unzip unifrac.zip
$ cd unifrac
Read README.txt
Requirements
- Python
- Python module Numeric
To check whether the Numeric module is present:
$ python
Python 2.6.5 (r265:79063, Apr 16 2010, 13:57:41)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import Numeric
>>>
- If the import above returns an error, install the package:
$ sudo apt-get install python-numeric-ext-dbg
- I first tried installing only python-numeric, but that gave an error saying that the MLab module was missing.
- So I used the package above instead.
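The same check can be scripted instead of typed at the prompt. This is only a minimal sketch: module_available is a made-up helper name, not part of UniFrac, and it only tests whether the import succeeds. It checks Numeric and also MLab, since MLab was the module reported missing above.

```python
# Hypothetical helper (not part of UniFrac): report whether a module imports.
def module_available(name):
    try:
        __import__(name)
    except ImportError:
        return False
    return True


if __name__ == '__main__':
    # Numeric and MLab are the two modules mentioned in these notes.
    for name in ('Numeric', 'MLab'):
        status = 'found' if module_available(name) else 'MISSING'
        print('%s: %s' % (name, status))
```

If either module is reported as MISSING, the apt-get step above is the fix that worked for me.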
Steps
>>> from sets import Set
- The code relies on some deprecated features, so run this line first.
>>> from tree_comparison_api import TreeAnalysis
- To start with the analysis
>>> a=TreeAnalysis()
- create object a
>>> print a.__doc__
- This lists all the methods along with the parameters they require:
session for doing an analysis on a tree
Class: TreeAnalysis(interactive=True)
Usage: session = TreeAnalysis()
Starts a new session called 'session'.
Interactive indicates whether the session will be run on the command
line (True) or from a script (False). By default is True.
Properties:
Tree: tree on which all called stats functions are run
LastSession: saves the current TreeAnalysis object whenever
the tree is changed
DataInfo: maps the name of a node (node.Data) to the node object
Output: records the results of analyses done during a session as a list
SessionLog: records the steps performed during a session as a list
Methods:
To get detailed info on any method type:
print TreeAnalysis.method_name.__doc__
***Loading a Tree:***
loadTreeFromFile(self, file_path, format, tree_name=None)
loads a tree from a Nexus or Newick from Arb formatted file
loadTreeFromString(self, input)
loads a tree from a string or list of strings
***Setting the Branch Lengths:***
setBlFromFile(self, file_path, format='NexLog')
sets node BranchLengths from info in a file
setBlToValue(self, value=1.0)
sets the BranchLengths property of all nodes to value
***Loading Environment Information***
loadEnvs(self, format, file_path=None):
sets Envs property of nodes
***Modifying Tree Contents (taxa represented by tree)***
pruneTree(self):
removes terminal descendants that have no environment assigned
removeNodeByName(name):
removes a node from the tree identified by name (Data property)
removeNodeByLCA(self, first_name, second_name):
removes a node identified as Last Common Ancestor of 2 nodes
isolateNodeByName(self, name):
sets the Tree to a node identified by name
isolateNodeByLCA(self, first_name, second_name):
sets the Tree to the Last Common Ancestor of 2 nodes
restoreLastSession(self):
sets the current TreeAnalysis session to LastSession
***Modify Node Names (Data property of nodes in a tree)***
nameUnnamedNodes(self):
assigns an arbitrary name to unnamed nodes
addEnvsToName(self):
appends the envs to the node name (Data property of node)
***Statistical analysis***
phylogeneticTestP(self, p_output='Tree', pop_size):
generates p-values for the Tree using the Phylogenetic (P) test
uniFracP(self, p_output='Tree', pop_size=1000, Weight=False):
generates p-values for the Tree using the UniFrac metric
makeEnvDistanceMatrix(self, Weight=False, pad=None, norm=False):
generates a Distance Matrix for environments in Tree using UniFrac
clusterEnvs(self, Weight=False, norm=False):
uses UPGMA to cluster the environments in a tree based on UniFrac
UniFracPCA(self, Weight=False, norm=False):
performs principal coordinates analysis on self.Tree using UniFrac
PD_rarefaction(self, file_path_base='PD_rarefaction_output', env_list=None
num_reps = 50, stride=1)
Creates PD rarefaction curves
G_rarefaction(self, file_path_base='G_rarefaction_output', env_list=None
num_reps = 50, stride=1)
Creates G rarefaction curves
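Several of the statistical methods listed above are based on the UniFrac metric. As a rough illustration of what the unweighted metric measures: the distance between two environments is the branch length leading to sequences from only one of them, divided by the branch length leading to sequences from either. This is not the code used inside tree_comparison_api; the function name, the branch representation and the toy values are all assumptions made for this sketch.

```python
# Illustrative sketch only, not the UniFrac library implementation.
# Each branch is (length, set of environments seen among its descendants).
def unweighted_unifrac(branches, env_a, env_b):
    unique = 0.0  # branch length leading to only one of the two environments
    total = 0.0   # branch length leading to either environment
    for length, envs in branches:
        in_a = env_a in envs
        in_b = env_b in envs
        if in_a or in_b:
            total += length
            if in_a != in_b:
                unique += length
    if total == 0.0:
        raise ValueError('neither environment is present in the tree')
    return unique / total


# Toy tree ((s1:1,s2:1):1,(s3:0.25,s4:0.25):0.5); the values are made up.
# s1, s3 and s4 come from 'soil', s2 comes from 'water'.
toy_branches = [
    (1.0, set(['soil'])),            # s1
    (1.0, set(['water'])),           # s2
    (1.0, set(['soil', 'water'])),   # parent of s1 and s2 (shared)
    (0.25, set(['soil'])),           # s3
    (0.25, set(['soil'])),           # s4
    (0.5, set(['soil'])),            # parent of s3 and s4
]

if __name__ == '__main__':
    # 3.0 of the 4.0 units of branch length are unique to one environment.
    print(unweighted_unifrac(toy_branches, 'soil', 'water'))  # 0.75
```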
The exact Python code depends on the type of UniFrac analysis you want to run (e.g. UPGMA clustering, PCoA, or pairwise significance testing) and on whether the analysis is weighted or unweighted. It also depends on details such as the format of your tree file (Newick or Nexus). An example of the Python commands one might type for an analysis is below:
#load the necessary code
>>> from tree_comparison_api import TreeAnalysis
#make a UniFrac session object
>>> a = TreeAnalysis()
#load a tree file in Newick format
>>> a.loadTreeFromFile('path_to_tree_file.tree', 'NwA')
#load Environment information that has no abundance information
>>> a.loadEnvs('TabDel', 'env_file.txt')
#run unweighted UniFrac pairwise significance test
>>> a.uniFracP(p_output='Pairwise', Weight=False, pop_size=1000)
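To run the same steps from a script rather than the prompt, they can be wrapped in a function. This is a sketch under assumptions: run_pairwise_unifrac and its parameter names are made up for illustration, and I have not checked what uniFracP actually returns, so the function just passes that value back. The method names and arguments are the ones used in the session above.

```python
# Hypothetical wrapper (not part of UniFrac) around the three calls above.
# 'session' is expected to be a TreeAnalysis object.
def run_pairwise_unifrac(session, tree_path, env_path, permutations=1000):
    # load a tree file in Newick format
    session.loadTreeFromFile(tree_path, 'NwA')
    # load tab-delimited environment information
    session.loadEnvs('TabDel', env_path)
    # run the unweighted UniFrac pairwise significance test
    return session.uniFracP(p_output='Pairwise', Weight=False,
                            pop_size=permutations)


# Example call, assuming the UniFrac source directory is on the Python path:
#   from tree_comparison_api import TreeAnalysis
#   session = TreeAnalysis(interactive=False)
#   run_pairwise_unifrac(session, 'path_to_tree_file.tree', 'env_file.txt')
```

The class documentation says interactive=False is meant for sessions run from a script, which is why the example call uses it.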