Notes on "Fundamentals of Molecular Evolution" by Graur and Li
Chapter 1: Genes, Genetic codes, and mutations
Purines: A,G (Adenine, Guanine). They are heavier.
Pyrimidines: T, C (Thymine, Cytosine)
A:T is a weak bond (two hydrogen bonds). G:C is a strong bond (three hydrogen bonds).
5' to 3' is the "downstream" direction. 3' to 5' is the "upstream" direction.
The heavy strand of the DNA is the one that contains more than 50% of the heavier nucleotides, the purines A and G.
Types of genes: (1) protein-coding, (2) RNA-specifying, and (3) untranscribed.
In eukaryotes: wide distribution of intron number, size and location.
Exons peak at 150 bp.
Extensive post-transcriptional modification (RNA editing): splicing,
<Is the content of an intron important, except the "donor and acceptor sites" and the TACTAAC box near the edges?> <If the rest is not important, why is it so big? To slow down transcription rate? Surely there could be less wasteful mechanisms? More generally, why is there so much unused medium ("blank paper") in the genome? Does it enhance inventiveness? What would happen if we removed all the introns from the genome of a yeast?>
Pseudogenes: unprocessed (ch. 6) and processed (ch. 7). Ubiquitous; a few get transcribed, and a handful are actually translated <so why are they considered pseudogenes?>.
Amino Acids.
Proteins: amino terminal (N) and Carboxyl terminal (C).
Primary structure, secondary, tertiary, quaternary (subunit arrangement and contacts).
Genetic Codes:
Universal (more accurately, "standard") for nuclear genes and plastids: 61 sense codons, 3 stop codons.
A slightly different one for vertebrate mitochondrial genes.
Mapping is unambiguous except in rare cases involving termination codons.
Synonymous vs. nonsynonymous codons. Codon family.
Usually AUG (Methionine) is the first AA in the protein, later removed.
From the genetic code table: the 2nd position is most significant, then the 1st, then the 3rd. This points to the evolutionary history of the code. The genetic code table (and other visualizations) should be rearranged to follow this order: (2,1,3). From wobble theory, the 3rd position is flexible.
Mutations:
Point mutations vs. segmental mutations. Also: substitutions, recombinations, deletions, insertions, inversions.
Substitutions: 4 possible transitions (A<-->G, C<-->T) and 8 possible transversions. Thought to arise mainly from base mispairing during DNA replication.
Synonymous vs. nonsynonymous mutation. Synonymous is also "silent" unless e.g. it creates a new splicing site.
Nonsynonymous: missense vs. nonsense.
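The count of 4 transitions and 8 transversions can be checked by enumerating all ordered pairs of distinct bases. This is only an illustrative sketch; the helper name `classify` is made up for these notes. It also shows why a uniform substitution process would give a 1:2 transition/transversion ratio.

```python
from itertools import permutations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify(src: str, dst: str) -> str:
    """Return 'transition' if both bases belong to the same chemical class, else 'transversion'."""
    same_class = {src, dst} <= PURINES or {src, dst} <= PYRIMIDINES
    return "transition" if same_class else "transversion"

counts = {"transition": 0, "transversion": 0}
for src, dst in permutations("ACGT", 2):   # the 12 ordered pairs of distinct bases
    counts[classify(src, dst)] += 1

print(counts)
```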
Recombination: crossing over (reciprocal) vs. gene conversion (non-reciprocal). Also site-specific recombination, where a long segment replaces (splices in) a few nucleotides. This is how phages get into bacterial genomes -- really an insertion.
Deletion/insertions happen by several mechanisms, including unequal crossing over. Length follows a bimodal distribution.
Inversions: vast majority are very long (100s to 1000s of nucleotides).
Mutation rates:
Difficult to determine directly because mutations are rare. Can be estimated from pseudogenes: 3-5 x 10^-9 substitutions / site / year.
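To get a feel for what this per-site rate means genome-wide, here is a small arithmetic sketch. The genome size (3 x 10^9 bp) and the generation time range (15-25 years) are assumptions, chosen because they reproduce the human figures quoted in these notes; they are not taken from the book.

```python
GENOME_SIZE_BP = 3e9                  # assumed human genome size in base pairs
RATE_PER_SITE_YEAR = (3e-9, 5e-9)     # low and high pseudogene-based estimates
GENERATION_YEARS = (15, 25)           # assumed short and long generation times

# Expected new mutations per genome per year (low, high).
per_year = tuple(rate * GENOME_SIZE_BP for rate in RATE_PER_SITE_YEAR)

# Low estimate pairs the low rate with the short generation, high with the long one.
per_generation = (per_year[0] * GENERATION_YEARS[0], per_year[1] * GENERATION_YEARS[1])

print(f"per year: {per_year[0]:.0f}-{per_year[1]:.0f}")
print(f"per generation: {per_generation[0]:.0f}-{per_generation[1]:.0f}")
```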
<Why per year rather than per duplication? How would a bacterium year compare to a mammal year?>
<In humans this comes to 9-15 mutations/year, or 135-375 mutations/generation.>
Mutation rate varies enormously with genomic region <defined by content or chromosomal location?>: e.g. in microsatellites it's about 10^-3.
In mammalian nuclear DNA, G,C mutate more frequently than A,T.
In mammalian mitochondria: at least 10 times higher than the average nuclear rate.
In animal nuclear DNA, the transition/transversion ratio is 2:1 (it should be 1:2 if uniform). In animal mitochondria, it's 15:1 to 20:1.
RNA viruses lack correction mechanisms, so their rate is several orders of magnitude higher, e.g. for flu and sarcoma virus, 10^-2/site/year <what does it mean to measure virus mutation rate per year? without selection pressure? Read Yokoyama (1985).>
For Rous Sarcoma virus, ~10^-4 mutations/site/replication.
There are very few estimates of rates of other mutations.
Spatial distribution of mutations: "hotspots", e.g. CpG ("CG"-->"TG"). In prokaryotes, "TT" is also a hotspot.
In bacteria, palindromes are more prone to mutate.
In eukaryotes, short tandem repeats are prone to deletions/insertions (probably slipped-strand mispairing).
Runs of purine-pyrimidine dimers can adopt a left-handed conformation (Z-DNA), which is prone to even-length deletions.
Some fierce debate on whether mutations are truly random wrt their effect on fitness.
Chapter 2 - Dynamics of Genes in Populations
Main Topic: Population genetics deals with genetic changes that occur within populations on the gene level.
Basic problem: determine how the frequency of a mutant will change in time under the effect of various evolutionary forces. The driving forces of allele frequency changes include
Natural Selection (directional, classical evolutionary study),
Random Genetic Drift (random, molecular level).
1. Natural selection: the differential reproduction of genetically distinct individuals or genotypes within a population. The differential reproduction is caused by differences in fitness. The fitness dynamics of alleles vary with the dominance relationship, i.e. codominance, dominance, over/underdominance.
Consequence: directional selection.
Unlike morphological changes, many molecular changes have little effect on the phenotype, and consequently on fitness.
To study genetic changes in populations, there are two math models:
Deterministic models are less accurate but easier to develop;
Stochastic models are more realistic.
Absolute fitness (w): the individual's ability to survive and reproduce.
Relative Fitness: fitness in comparison to other genotypes in the population. More relevant b/c pop size is constrained by the carrying capacity of the environment.
Assuming: (1) fitness determined solely by genotype, and (2) contribution of the different loci to fitness is independent.
Most mutations are deleterious, leading to negative or purifying selection. Some are neutral. Very rarely, some are advantageous, leading to advantageous or positive selection.
A deterministic model in a diploid: 2 alleles, genotypes A1A1, A1A2, A2A2, with relative fitnesses w11, w12, w22 respectively.
Assuming random mating, we get the Hardy-Weinberg equilibrium on their frequencies: p^2, 2pq, q^2.
Consider A2 as the mutant, with initial frequency q0. Then delta-q can be expressed as a fraction involving p,q,w11,w12,w22:
delta-q = pq[p(w12-w11) + q(w22-w12)] / (p^2*w11 + 2pq*w12 + q^2*w22)
Now 5 cases, depending on fitness values:
- Codominance (1, 1+s, 1+2s) (aka genic selection): q grows like a sigmoid. This is a type of directional selection (q always grows).
<Why is growth slow when q is small? Book says it's because a small fraction of A2 is in A2A2 homozygotes, which experience more selective advantage. But I disagree. Each allele A2 contributes equally regardless of which parent it came from. The sigmoid shape should also hold in the haploid case. Derive the formula!>
- Dominance/A2 dominant (1, 1+s, 1+s): q grows like a sigmoid. <The book shows it growing faster than (1, 1+s, 1+2s). That doesn't make sense.>
- Dominance/A2 recessive (1, 1, 1+s): q grows very, very slowly, remaining concave almost the whole time. --> selection is not very effective at reducing deleterious recessive alleles. --> It's almost impossible to get rid of them (Tay-Sachs, cystic fibrosis).
- Overdominance (1, 1+s, 1+t), s>0, s>t: the heterozygote has the highest fitness, resulting in a stable equilibrium with q* = s/(2s-t), regardless of starting conditions.
- Underdominance (1, 1+s, 1+t), s<0, s<t: the heterozygote has the lowest fitness, resulting in an unstable equilibrium. Any deviation leads to the loss of one of the alleles.
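The five cases can be explored numerically by iterating the standard one-locus, two-allele recursion, delta-q = pq[p(w12-w11) + q(w22-w12)] / mean fitness. This is a minimal sketch under random mating and an infinite population; the function names and the particular values of s, t and q0 are arbitrary choices for illustration, not values from the book.

```python
def delta_q(q, w11, w12, w22):
    """One-generation change in the frequency q of allele A2 under selection."""
    p = 1.0 - q
    mean_w = p * p * w11 + 2 * p * q * w12 + q * q * w22   # mean population fitness
    return p * q * (p * (w12 - w11) + q * (w22 - w12)) / mean_w

def trajectory(q0, w11, w12, w22, generations):
    """Return the list of A2 frequencies over the given number of generations."""
    qs = [q0]
    for _ in range(generations):
        qs.append(qs[-1] + delta_q(qs[-1], w11, w12, w22))
    return qs

s = 0.01
codominant = trajectory(0.01, 1.0, 1.0 + s, 1.0 + 2 * s, 1000)
recessive = trajectory(0.01, 1.0, 1.0, 1.0 + s, 1000)

# Overdominance with s = 0.04, t = 0.02: predicted equilibrium q* = s / (2s - t).
over = trajectory(0.01, 1.0, 1.04, 1.02, 5000)

print(f"codominant after 1000 generations: {codominant[-1]:.3f}")
print(f"recessive  after 1000 generations: {recessive[-1]:.4f}")
print(f"overdominant after 5000 generations: {over[-1]:.4f}")
```

Running the same function with haploid-style fitnesses, or plotting the trajectories, is one way to test the question raised above about why growth is slow when q is small.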
2. Genetic Drift: the process of change in allele frequency due solely to chance effects. An important factor in genetic drift is the random sampling of gametes in the process of reproduction, in which the number of gametes available in any generation is much larger than the number of adult individuals produced.
Consequence: loss/extinction or fixation.
Even with no selection, one allele will eventually fixate. Over many generations, the mean of the A1 frequency remains constant at p0, and the variance grows and asymptotes at p0*(1-p0), i.e. the variance of the final all-or-nothing outcome (A1 is fixed with probability p0 and lost otherwise).
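A small Wright-Fisher simulation illustrates these statements. This is a sketch under the usual idealizations (constant population size, non-overlapping generations, no selection); the function name, population size, seed and replicate count are arbitrary illustrative choices.

```python
import random

def a1_fixes(p0, n_diploid, rng):
    """Run one neutral population until allele A1 is fixed or lost.

    Returns True if A1 reached frequency 1, False if it was lost.
    """
    copies = 2 * n_diploid                  # gene copies in each generation
    count = round(p0 * copies)              # current number of A1 copies
    while 0 < count < copies:
        p = count / copies
        # Every gene copy of the next generation is drawn independently
        # from the current pool (binomial sampling of gametes).
        count = sum(rng.random() < p for _ in range(copies))
    return count == copies

rng = random.Random(0)                      # fixed seed so the run is repeatable
replicates = 2000
fixed_fraction = sum(a1_fixes(0.3, 10, rng) for _ in range(replicates)) / replicates
print(f"A1 fixed in {fixed_fraction:.3f} of runs (initial frequency 0.3)")
```

Because the mean frequency stays at p0 while every run ends at 0 or 1, the fraction of runs ending in fixation should be close to p0.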
<Each member of the allele population has an equal 1/2N chance of saturating the population.>
Gene Substitution: the process whereby a mutant allele completely replaces the predominant or wild type allele in a population. Issues involved in gene substitution: fixation probability, fixation time, and the rate of gene substitution.
Consequence: a mutant allele replaces the wild type allele.
Effective population size N_e: the size of an idealized population that would have the same effect of random sampling on allele frequencies. Factors affecting it include limited reproductive periods, non-reproductive castes, and male-female ratio. For mosquitoes in Kenya, N_e=2000. For humans, N_e ~= N/3. If only 1 male is involved in reproduction, N_e approaches 4 no matter how large N is!
Long-term N_e is the harmonic mean of the per-generation effective sizes, and is therefore dominated by bottlenecks. For humans in the last 2 million years, N_e=10,000.
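Both effects can be checked numerically. The sketch below assumes the standard unequal-sex-ratio formula N_e = 4*N_m*N_f/(N_m+N_f), which I take to be the one behind the single-male remark, and computes the long-term value as a harmonic mean. Function names and the example population sizes are illustrative only.

```python
def ne_sex_ratio(n_males, n_females):
    """Effective size with unequal numbers of breeding males and females."""
    return 4 * n_males * n_females / (n_males + n_females)

def ne_long_term(sizes):
    """Harmonic mean of the effective sizes of successive generations."""
    return len(sizes) / sum(1.0 / n for n in sizes)

# One breeding male: N_e stays just below 4 however many females there are.
print(ne_sex_ratio(1, 100), ne_sex_ratio(1, 1_000_000))

# Nine generations at 1000 and one bottleneck generation at 10.
print(ne_long_term([1000] * 9 + [10]))
```

In the second example the single bottleneck generation pulls the long-term value far below the arithmetic mean of the sizes.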
Fixation probability: depends on initial frequency q, selective advantage s, and effective population size N_e. Under genic (=co-dominant) selection (1, 1+s, 1+2s), there is a simple exponential formula, Prob(fixation) = (1 - exp(-4*N_e*s*q)) / (1 - exp(-4*N_e*s)), which, as s goes to 0, becomes Prob(fixation)=q, the initial allele frequency. Therefore, for a new neutral mutant, Prob(fixate)=1/2N. For a new advantageous mutant (assuming N_e=N large and s small), Prob(fixate)=2s. As the population size increases, the fixation probability for advantageous mutations remains 2s <it just needs to "take">, for neutral ones it becomes smaller (1/2N), and for deleterious ones it becomes exponentially small.
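A minimal sketch of these limits, assuming Kimura's diffusion formula for genic selection, P = (1 - exp(-4*N_e*s*q)) / (1 - exp(-4*N_e*s)). The function name and the example values are illustrative; for very large negative values of N_e*s the exponentials overflow, so the sketch sticks to a modest population size.

```python
import math

def fixation_probability(q, s, n_e):
    """Fixation probability of an allele at frequency q under genic selection."""
    if s == 0:
        return q                                    # neutral limit
    numerator = -math.expm1(-4 * n_e * s * q)       # 1 - exp(-4 N_e s q)
    denominator = -math.expm1(-4 * n_e * s)         # 1 - exp(-4 N_e s)
    return numerator / denominator

N = 1000
q_new = 1 / (2 * N)                                 # a single new mutant copy

print("neutral:     ", fixation_probability(q_new, 0.0, N))
print("advantageous:", fixation_probability(q_new, 0.01, N), "(compare 2s = 0.02)")
print("deleterious: ", fixation_probability(q_new, -0.001, N))
```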
Fixation time: depends on initial frequency q, selective advantage s, and population size N <? the example that follows uses N_e>. The mean conditional fixation time (conditioned on knowing that fixation will eventually occur) is 4N generations for neutral mutations. For advantageous ones it's much less: (2/s)*ln(2N) generations -- the same as for disadvantageous ones with -s! So, advantageous alleles are rapidly lost or fixated, whereas neutral ones change frequency slowly, and are responsible for a disproportionately large fraction of transient polymorphism.
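The two formulas can be compared directly. The sketch below uses the example that appears later in these notes (population size 10^6, generation time 2 years, 1% selective advantage); the function names are made up for illustration.

```python
import math

def neutral_fixation_generations(n):
    """Mean conditional fixation time of a neutral mutation, in generations."""
    return 4 * n

def selected_fixation_generations(s, n):
    """Mean conditional fixation time under genic selection with coefficient s."""
    return (2 / abs(s)) * math.log(2 * n)

N = 10**6
GENERATION_YEARS = 2

t_neutral = neutral_fixation_generations(N)
t_selected = selected_fixation_generations(0.01, N)

print(f"neutral:  {t_neutral:,} generations = {t_neutral * GENERATION_YEARS:,} years")
print(f"selected: {t_selected:,.0f} generations = {t_selected * GENERATION_YEARS:,.0f} years")
```

The selected case comes out at roughly 2,900 generations, or about 5,800 years, which matches the figure quoted from p. 56.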
Rate K of Gene Substitution (# fixations per unit time): suppose mutations occur at a rate of u per gene <rather, per locus?> per generation. Then for neutral mutations, K=u (larger populations give rise to more mutations, but each is less likely to fixate). For advantageous mutations and genic selection, K=4Nsu -- fixation rate grows with the population size!
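Both results follow from multiplying the number of new mutants arising per generation (2N gene copies, each mutating at rate u) by the fixation probability of a single new mutant. A short derivation, assuming N_e = N and the fixation probabilities 1/2N and 2s given above:

```latex
K_{\text{neutral}}
  = \underbrace{2Nu}_{\text{new mutants per generation}}
    \times \underbrace{\frac{1}{2N}}_{\text{fixation probability}}
  = u,
\qquad
K_{\text{advantageous}}
  = 2Nu \times 2s
  = 4Nsu .
```

In the neutral case the population size cancels; in the advantageous case it does not, which is why the rate grows with N.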
Genetic Polymorphism: usually defined pragmatically and arbitrarily as "highest allele is less than 99%".
Single-locus expected heterozygosity, h=1-\sum_i Pi^2, is the probability that 2 randomly selected alleles are different. The average of h over all loci studied is the mean expected heterozygosity. When dealing with DNA sequences, more appropriate is the nucleotide diversity, which is the pairwise average of the % difference in nucleotides.
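Both measures are easy to compute from data. The sketch below treats nucleotide diversity as the plain average, over all pairs of sampled sequences, of the proportion of differing sites; this assumes aligned sequences of equal length, one sequence per sampled gene copy. Function names and the toy data are illustrative only.

```python
from itertools import combinations

def expected_heterozygosity(freqs):
    """h = 1 - sum(p_i^2) for the allele frequencies at one locus."""
    return 1.0 - sum(p * p for p in freqs)

def nucleotide_diversity(seqs):
    """Average proportion of differing sites over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(
        sum(a != b for a, b in zip(s1, s2)) / len(s1)
        for s1, s2 in pairs
    )
    return total / len(pairs)

print(expected_heterozygosity([0.5, 0.5]))             # two equally common alleles
print(expected_heterozygosity([0.99, 0.01]))           # nearly monomorphic locus
print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))
```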
Evolutionary Paradigms: Mutationists explain things mostly in terms of mutational input and random genetic drift.
Neutralists explain in terms of mutation, random genetic drift, plus purifying selection.
Selectionists emphasize the effect of advantageous and balancing selection.
Pan-selectionism (a brand of neo-Darwinism which explains everything in terms of natural (=positive) selection, and ignores mutations and drift) claims that gene substitution is caused by advantageous selection, and polymorphism is maintained by balancing selection. Namely, these are two distinct processes. The neutral theory claims that, at the molecular level, most changes and cross-species variability are due to random drift (this is true if |s| < 1/2N_e ).
"The essence of the dispute between neutralists and selectionists concerns the distribution of fitness values of mutant alleles." Both agree that most new mutations are deleterious and quickly disappear. But among the rest, selectionists believe that very few mutations are selectively neutral, whereas neutralists believe the majority are.
"Fixed differences" between two species represent fixation events since they split. According to neutralists, polymorphism represents a transient picture, on the way to random fixation or loss, and therefore the ratio of polymorphisms to fixations should not depend on synonymity. If it does, and fixations are more numerous in non-synonymous loci, this means many of them are advantageous.
Concepts:
population genetics
locus
allele
allele frequency / gene frequency
gene pool
deterministic / stochastic model
natural selection
genotype
fitness
absolute / relative fitness
deleterious, negative / purifying selection
neutral, no selection
advantageous, positive / advantageous selection
Hardy-Weinberg equilibrium
selective advantage / disadvantage / neutrality
directional selection
codominance / genic selection, dominance, overdominance, underdominance
gene substitution
fixation probability
fixation time
rate of gene substitution
monomorphic, polymorphic
heterozygosity / gene diversity
single-locus expected heterozygosity
neo-Darwinism, pan-selectionism, neutral theory of molecular evolution
Chapter 3 - Evolutionary Change in Nucleotide Sequences
Main Topic: evolutionary change on the nucleotide level.
Nucleotide substitution model
The model of Nucleotide substitution in a DNA sequence.
- Jukes and Cantor's one-parameter model: complete symmetry among the 4 bases. Each base changes to each of the other three at rate \alpha (written a in the formulas below), which leads to a simple difference or differential equation, with the probability of finding each base at a site exponentially asymptoting to 1/4.
Pii(t) = 1/4 + 3/4*exp(-4at)
Pij(t) = 1/4 - 1/4*exp(-4at)
- Kimura's two-parameter model: Pr(transition)=\alpha, Pr(transversion)=\beta (written a and b in the formulas below). This leads to a set of 4 differential equations, with Pr_t(remain_same), Pr_t(differs_by_transition), and Pr_t(differs_by_transversion) for each of the two possible transversions all exponentially asymptoting to 1/4.
X(t) = 1/4 + 1/4*exp(-4bt) + 1/2*exp(-2(a+b)t) (same base)
Y(t) = 1/4 + 1/4*exp(-4bt) - 1/2*exp(-2(a+b)t) (transition)
Z(t) = 1/4 - 1/4*exp(-4bt) (each of the two transversions)
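The formulas for the two models can be evaluated directly to check their limiting behaviour. In this sketch the function names are made up; the "same base" probability in the two-parameter model is obtained from the requirement that the probabilities of the same base, the transition, and the two transversions sum to 1.

```python
import math

def jukes_cantor(alpha, t):
    """Return (P_ii, P_ij): probability of the same base, and of one specific other base."""
    e = math.exp(-4 * alpha * t)
    return 0.25 + 0.75 * e, 0.25 - 0.25 * e

def kimura_two_parameter(alpha, beta, t):
    """Return (X, Y, Z): same base, the transition, and each one of the two transversions."""
    e1 = math.exp(-4 * beta * t)
    e2 = math.exp(-2 * (alpha + beta) * t)
    x = 0.25 + 0.25 * e1 + 0.5 * e2
    y = 0.25 + 0.25 * e1 - 0.5 * e2
    z = 0.25 - 0.25 * e1
    return x, y, z

for t in (0, 10, 100, 10_000):
    p_same, p_other = jukes_cantor(0.001, t)
    x, y, z = kimura_two_parameter(0.002, 0.0005, t)
    print(f"t={t:>6}  JC: {p_same:.3f} {p_other:.3f}   K2P: {x:.3f} {y:.3f} {z:.3f}")
```

At t=0 all the probability sits on "same base"; for large t every entry approaches 1/4.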
Number of nucleotide substitution between 2 DNA seqs
- noncoding sequences
Indels are ignored for now. Multiple hits become a problem when divergence is substantial. Using the models above, we can relate the probability (or, equivalently, expected proportion) of identical nucleotides to the step-probability of mutation (\alpha) and the time since divergence (t). Since t is usually not known, and thus we can't estimate \alpha, we instead estimate the number of substitutions per site since divergence, K, which is 2(3\alpha t) for the Jukes-Cantor model (and is to be contrasted with the (observable) number of differences per site). So we get an estimate of K from the observable proportion of differences p, K = -(3/4)*ln(1 - (4/3)*p) for Jukes-Cantor, and can also derive the variance. For Kimura's model, the estimate of K involves the proportions of transitional and transversional differences between the two sequences. Variance goes down as 1/L (L is the sequence length).
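A sketch of the estimation, assuming the standard estimators: K = -(3/4)ln(1 - 4p/3) with variance p(1-p)/[L(1 - 4p/3)^2] for the one-parameter model, and K = (1/2)ln[1/(1 - 2P - Q)] + (1/4)ln[1/(1 - 2Q)] for the two-parameter model, where P and Q are the observed proportions of transitional and transversional differences. Function names and the toy sequences are illustrative only.

```python
import math

PURINES = {"A", "G"}

def difference_proportions(seq1, seq2):
    """Return (P, Q): proportions of transitional and transversional differences."""
    transitions = transversions = 0
    for a, b in zip(seq1, seq2):
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):   # both purines or both pyrimidines
            transitions += 1
        else:
            transversions += 1
    return transitions / len(seq1), transversions / len(seq1)

def k_jukes_cantor(p):
    """Substitutions per site from the observed proportion of differences p."""
    return -0.75 * math.log(1 - 4 * p / 3)

def var_jukes_cantor(p, length):
    """Approximate sampling variance of the Jukes-Cantor estimate."""
    return p * (1 - p) / (length * (1 - 4 * p / 3) ** 2)

def k_kimura(P, Q):
    """Substitutions per site under the two-parameter model."""
    return 0.5 * math.log(1 / (1 - 2 * P - Q)) + 0.25 * math.log(1 / (1 - 2 * Q))

seq1 = "ACGTACGTACGTACGTACGT"
seq2 = "GTTTACGTACGTACGTACGT"    # two transitions and one transversion
P, Q = difference_proportions(seq1, seq2)
print(f"P={P}, Q={Q}")
print(f"K (Jukes-Cantor) = {k_jukes_cantor(P + Q):.4f}")
print(f"K (Kimura)       = {k_kimura(P, Q):.4f}")
```

Both estimates are larger than the raw proportion of differences (0.15), which is the multiple-hits correction at work. The estimators fail (log of zero or a negative number) when p reaches 3/4 or when 2P + Q reaches 1.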
- coding sequences
We want to distinguish synonymous (n) and nonsynonymous (ns) substitutions. To compute the per-site substitution rate, we need to estimate the denominators, namely the number of synonymous (N_s) and nonsynonymous (N_a) sites. But these are fractional, and due to multiple hits they are also latent.
Substitution schemes with more than two parameters
Can go up to 4x3=12 parameters. But "for the 6-parameter model, ... one must assume that the common ancestral sequence was at equilibrium wrt nucleotide frequencies" <why?>. More importantly, with more parameters the estimates become unstable and "inapplicable" <involving log of zero or negative>, especially with a high degree of divergence. E.g. even for L=3000, if K=2.0 (very high!), the 4-parameter model is "inapplicable" in 25% of cases. <Is there room for improvement here using shrinkage/backoff and/or a Bayesian framework?>.
Violation of assumptions: The models above assume a stationary, site-independent, context-independent, history-independent substitution matrix. In reality all of these are violated at different times. Some corrections were developed to relax some of these assumptions.
Number of amino acid replacements between two proteins
Alignment of nucleotide and amino acid sequences
- Dot matrix method
- Distance and similarity methods
- Alignment algorithms: Needleman-Wunsch algorithm, multiple alignments
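A compact sketch of global pairwise alignment by dynamic programming in the Needleman-Wunsch style. The scoring scheme (match +1, mismatch -1, linear gap penalty -2) is an arbitrary assumption for illustration, and the function name is made up.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b."""
    def sub(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),   # align two residues
                score[i - 1][j] + gap,                           # gap in b
                score[i][j - 1] + gap,                           # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

print(needleman_wunsch("ACGT", "AGT"))
```

The gap penalty controls the trade-off listed under the concepts below: a harsher penalty favours mismatches over gaps.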
Concepts:
one-parameter model
two-parameter model
degree of divergence / Hamming distance
multiple substitution / multiple hits
synonymous / nonsynonymous
weighted / unweighted method
nondegenerate, twofold degenerate, fourfold degenerate
gap penalty
Chapter 4 - Rates and Patterns of Nucleotide Substitution
Main Topic: the rates and patterns of nucleotide substitution and the factors affecting them.
Rates of nucleotide substitution: The rate in coding regions is slower than the rate in noncoding regions. The synonymous rate is faster than the nonsynonymous rate, but varies less from gene to gene.
Causes of variation in substitution rates:
- Functional constraints
Synonymous vs nonsynonymous rates
different gene regions
different genes
- Positive selection
- Mutational input
Patterns of substitution and replacement:
- Pattern of spontaneous mutation
- Pattern of substitution in human mitochondrial DNA
- Patterns of amino acid replacement
Biased usage of codons
Molecular clocks
- relative rate test
- Margoliash, Sarich, and Wilson's test
- Tajima's 1D method
- local clocks
- Nearly equal rates in mice and rats
- lower rates in humans than in African apes and monkeys
- higher rates in rodents than in primates
- evaluation of the molecular clock hypothesis
- rates of substitution in organelle DNA
- rates of substitution in RNA viruses
Concepts:
similarity profile
relaxation of selection
male-driven evolution
pattern of nucleotide substitution
pattern of spontaneous mutation
molecular clock
relative rate test
replication-dependent / replication-independent factors
generation time effect
living fossils
phyletic gradualism
punctuated equilibria hypothesis
Roni's questions about evolutionary explanations:
<evolution in response to environmental pressure: for it to be fast (on the order of tens to thousands of generations), it is restricted to changes in the frequencies of existing alleles, as opposed to creation of new meaningful mutations, which should take much, much longer (and at that point the species will become extinct, and/or the environment will change).
Example: the introduction of dairy farming, and the consumption of milk. The allele that de-suppressed lactase generation must have already existed, no? How likely is it to be invented in a few hundred generations? On the other hand, if it was invented in the more distant past, what kept it from mutating away? What does it take to keep an allele around?>
<If a typical species lasts for some 10MY, for mammals that's about 5-10 million generations. P. 56 calculates that, for a typical mammalian species with effective pop size = 10^6 and mean generation time of 2 years, a mutation with a 1% selective advantage, if fixed, will take an average of 5,800 years, or ~3,000 generations. This allows for just over a thousand consecutive fixations. However, fixations don't need to be consecutive. What limit can we derive from this that can be compared with the observed # of differences between sibling species? >
<Why don't eukaryotic genes overlap? If they do occasionally, why don't they overlap much more often? If introns are not subject to evolutionary pressure, and the coding regions are a small fraction of the full gene length, then why not overlapping exons? Could this point to duplicate-and-mutate playing a very big role, and "out-of-the-blue" appearance of genes being very, very rare?>
<Is there room for improvement in estimating K (ch. 3), using shrinkage/backoff and/or a Bayesian framework?>.
Chapter 5 - Molecular Phylogenetics
Molecular phylogenetics: the study of evolutionary relationships among organisms by using molecular data, e.g. DNA and protein sequences, insertions of transposable elements, or other molecular markers.
The study of molecular phylogeny has a long history, and has been greatly impacted by the rapid accumulation of DNA sequence data since the late 1970s.
Advantages of molecular data in phylogenetic studies: 1) DNA and protein sequences are heritable entities. 2) Descriptions of molecular data are unambiguous. 3) Molecular traits evolve in a more regular manner. 4) Molecular data are more amenable to mathematical and statistical theories. 5) Homology is easy to assess. 6) Evolutionary relationships among distantly related organisms can be assessed. 7) Molecular data are abundant.
Terminology of Phylogenetic trees:
- Phylogenetic tree: a graph composed of nodes and branches, in which only one branch connects any two adjacent nodes.
- Topology: the branching pattern of a tree.
- terminal/internal nodes, external/internal branches
- OTUs/HTUs (operational taxonomic units/hypothetical taxonomic units)
- bifurcating/multifurcating nodes: In evolutionary studies it's assumed that the process of speciation is usually a binary one.
- rooted/unrooted tree:
- scaled/unscaled tree: the length of a branch is/isn't proportional to the number of changes.
- Newick format: a representation of trees in linear form as a series of nested parentheses.
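As an illustration of the nested-parentheses idea, here is a minimal sketch that writes a tree, stored as nested Python tuples, in Newick form without branch lengths. The tuple representation, the function name and the example taxa are assumptions made for this example.

```python
def to_newick(tree):
    """Convert a leaf name (str) or a tuple of subtrees into a Newick string."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_newick(child) for child in tree) + ")"

tree = (("human", "chimp"), ("mouse", "rat"))
print(to_newick(tree) + ";")   # prints ((human,chimp),(mouse,rat));
```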
Number of possible phylogenetic trees:
- Bifurcating rooted tree (Nr) for n OTUs: Nr(n) = (2n-3)!/[ 2^(n-2)*(n-2)! ]
- Bifurcating unrooted tree (Nu) for n OTUs: Nu(n) = (2n-5)!/[ 2^(n-3)*(n-3)! ]
- Nr(n-1) = Nu(n)
- The number of possible phylogenetic trees is extremely large when n is large, so it is usually very difficult to identify the true one.
- True tree: unique
- Inferred trees: obtained from a certain set of data and a certain method of tree reconstruction.
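The two counting formulas are easy to evaluate, which makes the growth rate concrete. A minimal sketch; the function names are made up, and the formulas are valid for n >= 2 (rooted) and n >= 3 (unrooted).

```python
from math import factorial

def n_rooted(n):
    """Number of bifurcating rooted trees for n OTUs (n >= 2)."""
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))

def n_unrooted(n):
    """Number of bifurcating unrooted trees for n OTUs (n >= 3)."""
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))

for n in (3, 4, 5, 10, 20):
    print(n, n_rooted(n), n_unrooted(n))
```

For 10 OTUs there are already 34,459,425 rooted and 2,027,025 unrooted trees.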
Gene trees & Species trees: they might differ with each other.
Taxa and clades
Types of data: character data, distance data
Methods of tree reconstruction: Inferring a phylogeny is an estimation procedure using a set of OTUs to infer the whole phylogenetic history.
Two steps: 1) definition of an optimality criterion 2) design of specific algorithms to compute the value of the objective function.
Cladistics(Cladogram) vs. Phenetics(Phenogram)
Distance matrix vs. character state approaches
- Distance matrix methods
1) UPGMA: employs a sequential clustering algorithm (assuming rates of evolution are constant among different lineages).
Advantages: high speed of computation.
Disadvantages: works well only if rate constancy holds. Since other fast algorithms are now available, it's rarely used except for pedagogic purposes.
2) Transformed distance method: uses an outgroup as a reference in order to correct for unequal rates among lineages. To determine the outgroup: for a rooted tree, first construct the tree using UPGMA, then reconstruct each side using the other side as the outgroup; for an unrooted tree, each node can be used as a reference.
3) Neighbors-relation method (Sattath and Tversky): based on the neighborliness approach, which uses the four-point condition to construct the tree. Sattath and Tversky extended the method to more than four OTUs.
4) Neighbor-joining method (Saitou and Nei): aims to find the shortest tree by sequentially finding neighbors that minimize the total length of the tree.
2)3)4) Advantages: fast, can be used on enormous numbers of OTUs; free of systematic error if the distance data satisfy the four-point condition. Work very well when the distances are small and the sequences used are long, even under nonconstant rates of evolution.
Disadvantages: the performance depends on the method used to transform the raw data into distances. The performance may be compromised if the methods used do not compensate adequately for multiple substitutions at a site, if the sequences are short, if the distances are large, or if the rate varies greatly among sites.
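The sequential clustering behind UPGMA can be sketched in a few lines. This is an illustrative implementation, not the book's: the input format (a dict keyed by frozensets of OTU names), the internal cluster labels like "#0", and the returned nested-tuple tree are all assumptions made for this example.

```python
from itertools import combinations

def upgma(names, dist):
    """Cluster OTUs by UPGMA.

    names: list of OTU labels.
    dist:  dict mapping frozenset({a, b}) -> pairwise distance.
    Returns a nested tuple (left, right, node_height); leaves are label strings.
    """
    trees = {name: name for name in names}    # cluster id -> subtree built so far
    sizes = {name: 1 for name in names}       # cluster id -> number of OTUs in it
    d = dict(dist)                            # working copy of the distances
    next_id = 0
    while len(trees) > 1:
        # Join the two clusters separated by the smallest distance.
        a, b = min(combinations(trees, 2), key=lambda pair: d[frozenset(pair)])
        new = f"#{next_id}"
        next_id += 1
        height = d[frozenset((a, b))] / 2     # rate constancy: the node sits halfway
        for c in trees:
            if c not in (a, b):
                # Size-weighted average = arithmetic mean over all OTU pairs.
                d[frozenset((new, c))] = (
                    sizes[a] * d[frozenset((a, c))] + sizes[b] * d[frozenset((b, c))]
                ) / (sizes[a] + sizes[b])
        trees[new] = (trees[a], trees[b], height)
        sizes[new] = sizes[a] + sizes[b]
        for old in (a, b):
            del trees[old], sizes[old]
    return next(iter(trees.values()))

names = ["A", "B", "C", "D"]
dist = {
    frozenset(("A", "B")): 2, frozenset(("C", "D")): 4,
    frozenset(("A", "C")): 8, frozenset(("A", "D")): 8,
    frozenset(("B", "C")): 8, frozenset(("B", "D")): 8,
}
print(upgma(names, dist))
```

The toy distances are ultrametric (consistent with a constant rate), so UPGMA recovers the tree exactly; with unequal rates among lineages it can join the wrong pair.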
- Maximum parsimony methods: identify a topology that requires the smallest number of evolutionary changes. This follows the principle that the best hypothesis is the one requiring the smallest number of assumptions. The resulting tree is called the maximum parsimony tree, and there can be several equally parsimonious trees. To construct a maximum parsimony tree, we can look only at informative sites and ignore uninformative sites, since the number of changes at uninformative sites is the same for all trees. The total number of substitutions at both informative and uninformative sites in a tree is the tree length, and it is minimized in maximum parsimony trees. If all nucleotide substitutions are given equal weight, it's called unweighted parsimony; otherwise it's weighted parsimony. Transversion parsimony is when transitions are completely ignored.
Searching for the maximum parsimony tree: 1) exhaustive search. 2) branch-and-bound method: because adding branches to a tree can only increase its length, we can skip all descendant trees of any partial tree that is already longer than L (the bound), which may greatly reduce the total number of trees to be considered. If the number of OTUs is greater than 20, we need to use heuristic searches, in which topologically similar trees are constructed and compared. There are several methods of branch swapping or rearrangement that can be used to generate topologically similar trees, e.g. subtree pruning and regrafting.
Advantages: no explicit assumptions except that a tree that requires fewer substitutions is better. Works well when the degree of divergence between sequences is small (homoplasy is rare). Efficient if the number of informative sites is large.
Disadvantages: Time-consuming because an exact search needs to compare all possible trees; infeasible if the number of OTUs is large and the sequences are long, and heuristic methods are not guaranteed to find the maximum parsimony tree. May perform poorly whenever some branches of the tree are much longer than the others (long-branch attraction, or the Felsenstein zone). Performs poorly when homoplasy is high (when the degree of divergence is large, or if transitions occur more often than transversions). Inefficient if the number of informative sites is small.
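To make "tree length" concrete, the sketch below counts the minimum number of substitutions on a given topology using Fitch's set-intersection algorithm for unweighted parsimony. The notes do not name this algorithm, so treat it as one possible way of doing the counting. Trees are written as nested pairs with an arbitrary root, which does not affect the unweighted length; the names and the toy alignment are illustrative.

```python
def fitch(tree, site):
    """Return (possible_states, changes) for one site on a binary tree.

    tree: a leaf label (str) or a pair (left, right) of subtrees.
    site: dict mapping leaf label -> nucleotide at this site.
    """
    if isinstance(tree, str):
        return {site[tree]}, 0
    left_states, left_changes = fitch(tree[0], site)
    right_states, right_changes = fitch(tree[1], site)
    shared = left_states & right_states
    if shared:
        return shared, left_changes + right_changes
    # No shared state: at least one substitution is needed at this node.
    return left_states | right_states, left_changes + right_changes + 1

def tree_length(tree, alignment):
    """Sum the minimum number of changes over all sites of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )

alignment = {"a": "AGT", "b": "AGA", "c": "GAT", "d": "GAA"}
topologies = {
    "((a,b),(c,d))": (("a", "b"), ("c", "d")),
    "((a,c),(b,d))": (("a", "c"), ("b", "d")),
    "((a,d),(b,c))": (("a", "d"), ("b", "c")),
}
for label, topology in topologies.items():
    print(label, tree_length(topology, alignment))
```

All three sites here are informative. The first two favour ((a,b),(c,d)) and the third favours ((a,c),(b,d)), so the first topology is the most parsimonious with length 4.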
- Maximum likelihood methods: maximize the likelihood L, which is the probability of observing the data under a given tree and a specified model of character state changes. For a particular tree, the likelihood is the product of the individual likelihoods for all n sites, and the likelihood of each site equals the sum of the probabilities of every possible reconstruction of ancestral states at internal nodes.
Advantages: uses 'full' information (compared with the parsimony method), since it uses the character state information at all sites.
Disadvantages: Time-consuming because an exact search needs to consider all alternative trees; infeasible if the number of OTUs is large and the sequences are long, and heuristic methods are not guaranteed to find the maximum likelihood tree. Requires explicit assumptions (rate and pattern of nucleotide substitution). Performs poorly if the stochastic model used is unrealistic and if some sequences are highly divergent.
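A sketch of the per-site likelihood for a fixed tree with fixed branch lengths. Several choices here are assumptions made for this example: the Jukes-Cantor model, equal base frequencies of 1/4 at the root, branch lengths measured in expected substitutions per site, and the nested-tuple tree format. The sum over all ancestral reconstructions is organized recursively (partial likelihoods per node), which gives the same value as enumerating every reconstruction.

```python
import math

BASES = "ACGT"

def jc_prob(i, j, length):
    """Jukes-Cantor probability that base i becomes base j along a branch
    of the given length (expected substitutions per site)."""
    e = math.exp(-4.0 * length / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def partial(tree, site):
    """Return {base: P(observed leaves below this node | node has that base)}.

    tree: a leaf label (str) or a tuple ((left, left_len), (right, right_len)).
    site: dict mapping leaf label -> observed nucleotide.
    """
    if isinstance(tree, str):
        return {b: 1.0 if b == site[tree] else 0.0 for b in BASES}
    result = {b: 1.0 for b in BASES}
    for child, length in tree:
        below = partial(child, site)
        for b in BASES:
            # Sum over every possible state c of the child node.
            result[b] *= sum(jc_prob(b, c, length) * below[c] for c in BASES)
    return result

def site_likelihood(tree, site):
    """Likelihood of one site, averaging over the unknown root state."""
    return sum(0.25 * p for p in partial(tree, site).values())

def log_likelihood(tree, alignment):
    """Sum of log site likelihoods (log of the product over independent sites)."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        math.log(site_likelihood(tree, {k: s[i] for k, s in alignment.items()}))
        for i in range(n_sites)
    )

tree = (((("a", 0.1), ("b", 0.1)), 0.05), ((("c", 0.1), ("d", 0.1)), 0.05))
alignment = {"a": "AGT", "b": "AGA", "c": "GAT", "d": "GAA"}
print(log_likelihood(tree, alignment))
```

A full maximum likelihood analysis would also optimize the branch lengths and repeat the calculation over candidate topologies; this sketch only evaluates one fixed tree.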
- Rooting unrooted trees: 1) Using an outgroup. The outgroup is believed to have branched off earlier than the taxa under study, and it should not be too distant. 2) Without an outgroup. By assuming that the rate of evolution has been approximately uniform over all the branches, we put the root at the midpoint of the longest pathway between two OTUs.
- Estimating branch lengths: UPGMA and maximum likelihood estimate the branch lengths as part of tree reconstruction; the maximum parsimony method does not.
- Estimating species divergence times
- Topological comparisons: Penny and Hendy's topological distance is used to measure the dissimilarity between two tree topologies. When several trees are generated rather than one unique phylogeny, a consensus tree is drawn using the strict consensus or majority-rule consensus method.
- Assessing tree reliability: the bootstrap is used to assess the confidence level of a tree. To test two competing trees, Kishino and Hasegawa devised a parametric test under the assumptions that all nucleotide sites are independent and equivalent.
- Strategies to minimize error: 1) use large amounts of data. 2) use sequences that evolve at an appropriate rate. 3) use more realistic models or more suitable methods. 4) examine the assumption of independent evolution among sites. 5) check that all the regions studied have similar rates of substitution. 6) keep the distances among OTUs small. 7) identify 'misinformative' characters.
Molecular phylogenetic examples
The universal phylogeny