Phylogeny and Evolution of RNA Structure
Darwin’s conviction that all living beings on Earth are related and the graph of relatedness is tree-shaped has been essentially confirmed by phylogenetic reconstruction first from morphology and later from data obtained by molecular sequencing. Limitations of the phylogenetic tree concept were recognized as more and more sequence information became available. The other path-breaking idea of Darwin, natural selection of fitter variants in populations, is cast into simple mathematical form and extended to mutation-selection dynamics. In this form the theory is directly applicable to RNA evolution in vitro and to virus...
Source: Springer protocols feed by Bioinformatics - December 4, 2013 Category: Bioinformatics Source Type: news

De Novo Discovery of Structured ncRNA Motifs in Genomic Sequences
De novo discovery of “motifs” capturing the commonalities among related noncoding structured RNAs is among the most difficult problems in computational biology. This chapter outlines the challenges presented by this problem, together with some approaches towards solving them, with an emphasis on an approach based on the CMfinder program as a case study. Applications to genomic screens for novel de novo structured ncRNAs, including structured RNA elements in untranslated portions of protein-coding ge...
Source: Springer protocols feed by Bioinformatics - December 4, 2013 Category: Bioinformatics Source Type: news

RNA Structural Alignments, Part II: Non-Sankoff Approaches for Structural Alignments
In structural alignments of RNA sequences, the computational cost of the Sankoff algorithm, which simultaneously optimizes the score of the common secondary structure and the score of the alignment, is too high for long sequences (O(L^6) time for two sequences of length L). In this chapter, we introduce methods that predict the structures and the alignment separately to avoid the heavy computations of the Sankoff algorithm. In those methods, neither of the two prediction processes is independent; each utilizes the information of the other process. The first process typically includes prediction of b...
Source: Springer protocols feed by Bioinformatics - December 4, 2013 Category: Bioinformatics Source Type: news

RNA Structural Alignments, Part I: Sankoff-Based Approaches for Structural Alignments
Simultaneous alignment and secondary structure prediction of RNA sequences is often referred to as “RNA structural alignment.” A class of the methods for structural alignment is based on the principles proposed by Sankoff more than 25 years ago. The Sankoff algorithm simultaneously folds and aligns two or more sequences. The advantage of this algorithm over those that separate the folding and alignment steps is that it makes better predictions. The disadvantage is that it is slower and requires more computer memory to run. The amount of computational resources needed to run the Sankoff algorithm is so high that...
Source: Springer protocols feed by Bioinformatics - December 4, 2013 Category: Bioinformatics Source Type: news

Introduction to RNA Secondary Structure Comparison
Many methods have been proposed for RNA secondary structure comparison, and new ones are still being developed. In this chapter, we first consider structure representations and discuss their suitability for structure comparison. Then, we take a look at the more commonly used methods, restricting ourselves to structures without pseudo-knots. For comparing structures of the same sequence, we study base pair distances. For structures of different sequences (and of different length), we study variants of the tree edit model. We name some of the available tools and give pointers to the literature. We end with a short review on ...
Source: Springer protocols feed by Bioinformatics - December 4, 2013 Category: Bioinformatics Source Type: news
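
The base pair distance mentioned in the abstract above can be illustrated with a minimal sketch. It assumes both structures are given in dot-bracket notation for the same sequence and contain no pseudoknots; the function names `base_pairs` and `base_pair_distance` are hypothetical and not taken from the chapter or from any particular tool.

```python
def base_pairs(structure):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack = []
    pairs = set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced closing bracket at position %d" % i)
            pairs.add((stack.pop(), i))
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced opening bracket")
    return pairs


def base_pair_distance(a, b):
    """Number of base pairs present in exactly one of the two structures."""
    if len(a) != len(b):
        raise ValueError("structures must belong to the same sequence")
    # Symmetric difference: pairs that must be opened or closed to turn a into b.
    return len(base_pairs(a) ^ base_pairs(b))
```

For example, `base_pair_distance("((..))", "(....)")` counts the single inner pair that only the first structure contains.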

Abstract Shape Analysis of RNA
Abstract shape analysis is a method to learn more about the complete Boltzmann ensemble of the secondary structures of a single RNA molecule. Abstract shapes classify competing secondary structures into classes that are defined by their arrangement of helices. It allows us to compute, in addition to the structure of minimal free energy, a set of structures that represents relevant and interesting structural alternatives. Furthermore, it allows us to compute probabilities of all structures within a shape class. This allows us to ensure that our representative subset covers t...
Source: Springer protocols feed by Bioinformatics - December 4, 2013 Category: Bioinformatics Source Type: news
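
As a rough illustration of classifying structures by their arrangement of helices, the sketch below maps a dot-bracket structure to a bracket string that records only helix nesting and adjacency. It assumes the most abstract convention, in which unpaired bases are ignored and helices interrupted only by bulges or internal loops are merged. The function names are hypothetical, the empty string returned for an unpaired structure is a simplification, and this is not the chapter's implementation.

```python
def pair_table(structure):
    """partner[i] is the index paired with i, or -1 if i is unpaired."""
    partner = [-1] * len(structure)
    stack = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError("unbalanced structure")
    return partner


def child_pairs(partner, start, end):
    """Outermost base pairs found in positions start .. end-1."""
    children = []
    k = start
    while k < end:
        if partner[k] > k:
            children.append((k, partner[k]))
            k = partner[k] + 1  # skip everything enclosed by this pair
        else:
            k += 1
    return children


def helix_shape(structure):
    """Bracket string describing helix nesting only (assumed convention)."""
    partner = pair_table(structure)

    def inner(i, j):
        children = child_pairs(partner, i + 1, j)
        if len(children) == 1:
            # Stacked pair, bulge, or internal loop: same helix, no new bracket.
            return inner(*children[0])
        # Hairpin (no children) or multiloop (several children).
        return "".join("[" + inner(a, b) + "]" for a, b in children)

    top = child_pairs(partner, 0, len(structure))
    return "".join("[" + inner(a, b) + "]" for a, b in top)
```

Under this convention, many distinct structures (for example a hairpin with or without an internal loop) fall into the same class `[]`, which is what makes a shape a compact label for a group of structures. The recursion depth grows with helix nesting, so the sketch is meant for short examples.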

Class-Specific Prediction of ncRNAs
Many RNA families, i.e., groups of homologous RNA genes, belong to RNA classes, such as tRNAs, snoRNAs, or microRNAs, that are characterized by common sequence motifs and/or common secondary structure features. The detection of new members of RNA classes, as well as the comprehensive annotation of genomes with members of RNA classes is a challenging task that goes beyond simple homology search. Computational methods addressing this problem typically use a three-tiered approach: In the first step an efficient and sensitive filter is employed. In the second step the candidate set is narrowed down using computationally expens...
Source: Springer protocols feed by Bioinformatics - December 4, 2013 Category: Bioinformatics Source Type: news

Using PLINK for Genome-Wide Association Studies (GWAS) and Data Analysis
Within this chapter we introduce the basic PLINK functions for reading in data, applying quality control, and running association analyses. Three worked examples are provided to illustrate: data management and assessment of population substructure, association analysis of a quantitative trait, and qualitative or case–control association analyses.
Source: Springer protocols feed by Bioinformatics - January 1, 2013 Category: Bioinformatics Source Type: news

Statistical Analysis of Genomic Data
In this chapter we describe methods for statistical analysis of GWAS data with the goal of quantifying evidence for genomic effects associated with trait variation, while avoiding spurious associations due to evidence not being well quantified or due to population structure.
Source: Springer protocols feed by Bioinformatics - January 1, 2013 Category: Bioinformatics Source Type: news

Overview of Statistical Methods for Genome-Wide Association Studies (GWAS)
This chapter provides an overview of statistical methods for genome-wide association studies (GWAS) in animals, plants, and humans. The simplest form of GWAS, a marker-by-marker analysis, is illustrated with a simple example. The problem of selecting a significance threshold that accounts for the large amount of multiple testing that occurs in GWAS is discussed. Population structure causes false positive associations in GWAS if not accounted for, and methods to deal with this are presented. Methodology for more complex models for GWAS, including haplotype-based approaches, accounting for identical by descent versus identic...
Source: Springer protocols feed by Bioinformatics - January 1, 2013 Category: Bioinformatics Source Type: news
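
A marker-by-marker analysis and a multiple-testing threshold can be sketched as follows. The sketch assumes genotypes coded as allele counts (0, 1, 2) and a quantitative phenotype, fits a simple least-squares regression per marker, and uses the Bonferroni correction as one common choice of threshold. The function names are hypothetical, and the chapter may well use a different model or correction.

```python
def marker_slope(genotypes, phenotypes):
    """Least-squares slope of phenotype on allele count for a single marker."""
    if len(genotypes) != len(phenotypes):
        raise ValueError("genotypes and phenotypes must have equal length")
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("marker is monomorphic; slope is undefined")
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotypes))
    return sxy / sxx


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests
```

With one million markers and a family-wise error rate of 0.05, `bonferroni_threshold(0.05, 1_000_000)` gives a per-marker threshold of about 5e-8. The sketch does not correct for population structure, which the abstract names as a source of false positives.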

Quality Control for Genome-Wide Association Studies
This chapter gives an overview of the quality control (QC) issues for SNP-based genotyping methods used in genome-wide association studies. The main metrics for evaluating the quality of the genotypes are discussed, followed by a worked-out example of a QC pipeline starting with raw data and finishing with a fully filtered dataset ready for downstream analysis. The emphasis is on automation of data storage, filtering, and manipulation to ensure data integrity throughout the process and on how to extract a global summary from these high-dimensional datasets to allow better-informed downstream analytical decisions. All examples will be ru...
Source: Springer protocols feed by Bioinformatics - January 1, 2013 Category: Bioinformatics Source Type: news
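
Two widely used per-SNP quality metrics, call rate and minor allele frequency, can be computed as in the sketch below. It assumes genotypes coded as allele counts (0, 1, 2) with `None` for a missing call. The function names and the filter thresholds in `passes_qc` are illustrative assumptions, not values taken from the chapter.

```python
def call_rate(genotypes):
    """Fraction of samples with a non-missing genotype call."""
    called = [g for g in genotypes if g is not None]
    return len(called) / len(genotypes)


def minor_allele_frequency(genotypes):
    """Frequency of the less common allele among called genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    # Each diploid sample carries two alleles.
    p = sum(called) / (2 * len(called))
    return min(p, 1 - p)


def passes_qc(genotypes, min_call_rate=0.95, min_maf=0.01):
    """Keep a SNP only if it clears both (assumed) thresholds."""
    return (call_rate(genotypes) >= min_call_rate
            and minor_allele_frequency(genotypes) >= min_maf)
```

Applying `passes_qc` to each SNP in turn is the kind of filtering step that is easy to automate, which fits the abstract's emphasis on automated filtering.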

Managing Large SNP Datasets with SNPpy
Using relational databases to manage SNP datasets is a very useful technique that has significant advantages over alternative methods, including the ability to leverage the power of relational databases to perform data validation, and the use of the powerful SQL query language to export data. SNPpy is a Python program which uses the PostgreSQL database and the SQLAlchemy Python library to automate SNP data management. This chapter shows how to use SNPpy to store and manage large datasets.
Source: Springer protocols feed by Bioinformatics - January 1, 2013 Category: Bioinformatics Source Type: news

Designing a GWAS: Power, Sample Size, and Data Structure
In this chapter we describe a novel Bayesian approach to designing GWAS with the goal of ensuring robust detection of effects of genomic loci associated with trait variation.
Source: Springer protocols feed by Bioinformatics - January 1, 2013 Category: Bioinformatics Source Type: news

Descriptive Statistics of Data: Understanding the Data Set and Phenotypes of Interest
A good understanding of the design of an experiment and the observational data that have been collected as part of the experiment is a key prerequisite for correct and meaningful preparation of field data for further analysis. In this chapter, I provide a guideline on how an understanding of the field data can be gained, preparation steps that arise as a consequence of the experimental or data structure, and how to fit a linear model to extract data for further analysis.
Source: Springer protocols feed by Bioinformatics - January 1, 2013 Category: Bioinformatics Source Type: news

Genomic Selection in Animal Breeding Programs
Genomic selection can have a major impact on animal breeding programs, especially where traits that are important in the breeding objective are hard to select for otherwise. Genomic selection provides more accurate estimates for breeding value earlier in the life of breeding animals, giving more selection accuracy and allowing lower generation intervals. From sheep to dairy cattle, the rates of genetic improvement could increase from 20 to 100 % and hard-to-measure traits can be improved more effectively.
Source: Springer protocols feed by Bioinformatics - January 1, 2013 Category: Bioinformatics Source Type: news