Title: Initial assessment of character sets from five nuclear gene sequences in animals
Document type: printed text
Authors: Timothy P. Friedlander, Author; Jerome C. Regier, Author; Charles Mitter, Author
Year of publication: 1996
Extent: p. 301-320
Languages: English (eng)
Categories: LIFE SCIENCES
Keywords: BIODIVERSITY; NUCLEAR GENE; GENE SEQUENCE; PHYLOGENY; PHYLOGENETIC ANALYSIS
Abstract:
There is growing agreement that, because any single sequence alone may be misleading, molecular systematic inferences can rest securely only on concordant results from multiple independent sequences. For nucleotide sequences, most inferences in animals have been based on either the mitochondrial genome or the nuclear ribosomal gene family. The analyses described herein are directed at documenting the informativeness of additional nuclear gene sequences. The selection of phylogenetically informative sequences from the nuclear genome is not trivial, and current methods of inference deal most effectively with point substitutions and simple insertion/deletion events within independently evolving orthologous sequences. Problems such as distinguishing orthologs from paralogs or detecting nonindependent evolution are often intractable. Even data on point substitutions are hard to interpret when divergence is great and multiple hits are common. For these reasons, most of the nuclear genome probably is not useful for any given systematic question. For example, nontranscribed regions, which constitute the vast majority of the nuclear genome, are unlikely to be informative about higher-level taxonomic relationships. The difficulty of a priori selection of likely sequences currently limits the exploitation of nuclear genes for phylogenetic studies.
In a previous study, we delimited 14 protein-encoding nuclear genes whose sequences are likely to contain interpretable phylogenetic information, largely in the form of point substitutions. The criteria for their selection included appropriate levels of sequence conservation for deep taxonomic splits as inferred from comparisons of published sequences and desirable features of gene structure. In particular, these genes are present in just one or a few copies, simplifying identification of orthologous comparisons. They each contain over 1000 basepairs of fairly uniformly evolving coding regions, hence many potential characters. They are free of internal repetitive elements or obvious nucleotide bias that would complicate sequence alignments and analysis.
In the current study, we have sought to gauge the phylogenetic information content of five promising genes for animal phylogenetics. They were examined for phylogenetic informativeness in three ways. First, the sequences were mapped onto accepted phylogenies strongly supported by previous evidence; this permitted the assessment of character support for each clade and the temporal partitioning of characters by their times of divergence. Mapping also permitted identification of homoplasious characters.
Second, the sequences were analyzed by parsimony and the resulting minimum-length trees compared with the accepted phylogeny. Character sets consisting of amino acids, of total nucleotides, and of nucleotides from each of the three codon positions were analyzed separately. Characters were partitioned in this manner because nucleotides within protein-encoding sequences evolve at different rates and thus are likely to be maximally informative at different taxonomic levels. Such tests provide the strongest available validation of the phylogenetic utility of a candidate gene.
A third, less direct but readily obtainable predictor of phylogenetic informativeness is pairwise sequence divergence. Over the lowest part of its range, pairwise divergence should be related to the number of informative characters. This model, in conjunction with empirical saturation plots, provides heuristic guidelines for the onset of saturation. These guidelines, in combination with the phylogenetic concordance studies, permit a first estimate of the taxonomic level over which each character set in the five genes will be maximally informative.
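Cataloguer's illustration (not part of the published abstract): the codon-position partitioning and pairwise divergence described above can be sketched roughly as follows. This is a minimal sketch only, assuming aligned, in-frame coding sequences and an uncorrected divergence measure (the fraction of differing sites); the abstract does not say which divergence measure or software the authors used. The function names and the toy sequences are hypothetical and made up for illustration.

```python
from itertools import combinations


def codon_position_partitions(seq):
    """Split an in-frame coding sequence into its 1st, 2nd and 3rd codon positions."""
    return {pos + 1: seq[pos::3] for pos in range(3)}


def p_distance(a, b):
    """Uncorrected pairwise divergence: fraction of compared sites that differ.

    Sites with a gap or ambiguity code in either sequence are skipped
    (an assumption of this sketch, not a rule taken from the abstract).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in valid and y in valid:
            compared += 1
            differing += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def divergence_by_partition(alignment):
    """Pairwise divergence for total nucleotides and for each codon position.

    alignment: dict mapping taxon name -> aligned in-frame coding sequence.
    Returns {(name1, name2): {"all": d, 1: d1, 2: d2, 3: d3}}.
    """
    result = {}
    for (n1, s1), (n2, s2) in combinations(alignment.items(), 2):
        p1 = codon_position_partitions(s1)
        p2 = codon_position_partitions(s2)
        row = {"all": p_distance(s1, s2)}
        for pos in (1, 2, 3):
            row[pos] = p_distance(p1[pos], p2[pos])
        result[(n1, n2)] = row
    return result


# Toy, made-up sequences (four codons each), not data from the study.
toy = {
    "taxon_A": "ATGGCCAAGTTT",
    "taxon_B": "ATGGCTAAATTC",
    "taxon_C": "ATGGCGAAGTTC",
}

for pair, row in divergence_by_partition(toy).items():
    print(pair, row)
```

In the toy data, taxon_A and taxon_B differ only at third codon positions, so the third-position partition shows a much higher divergence than the total-nucleotide figure. Plotting such per-partition values against a divergence measure for the whole sequence is one common way to draw an empirical saturation plot.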
Document number: A/BIO
Bibliographic level: 2
Bull1 (Main theme): BIOLOGY
Bull2 (Secondary theme): GENERAL BIOLOGY