DNA Motif Databases and Their Uses.
Authors: Stormo GD Abstract Transcription factors (TFs) recognize and bind to specific DNA sequences. The specificity of a TF is usually represented as a position weight matrix (PWM). Several databases of DNA motifs exist and are used in biological research to address important biological questions. This overview describes PWMs and some of the most commonly used motif databases, as well as a few of their common applications. © 2015 by John Wiley & Sons, Inc. PMID: 26334922 [PubMed - in process]
Source: Current Protocols in Bioinformatics - September 6, 2015 Category: Bioinformatics Tags: Curr Protoc Bioinformatics Source Type: research

Protein Function Prediction: Problems and Pitfalls.
Authors: Pearson WR Abstract The characterization of new genomes based on their protein sets has been revolutionized by new sequencing technologies, but biologists seeking to exploit new sequence information are often frustrated by the challenges associated with accurately assigning biological functions to newly identified proteins. Here, we highlight some of the challenges in functional inference from sequence similarity. Investigators can improve the accuracy of function prediction by (1) being conservative about the evolutionary distance to a protein of known function; (2) considering the ambiguous mean...
Source: Current Protocols in Bioinformatics - September 6, 2015 Category: Bioinformatics Tags: Curr Protoc Bioinformatics Source Type: research

Using RAxML to Infer Phylogenies.
Authors: Stamatakis A Abstract Inference of phylogenetic trees under the maximum likelihood (ML) criterion represents a routine task in biological data analysis. In this unit we describe how to plan analyses and use Randomized Accelerated Maximum Likelihood (RAxML) for phylogenetic inferences under ML, how to infer support values using the standard bootstrap procedure as well as other statistical measures, and how to conduct post-analyses on collections/sets of phylogenetic trees including statistical significance tests and consensus tree methods. We also discuss what measures can be taken and what further...
Source: Current Protocols in Bioinformatics - September 6, 2015 Category: Bioinformatics Tags: Curr Protoc Bioinformatics Source Type: research

Unix Survival Guide.
Authors: Stein LD Abstract Most bioinformatics software has been designed to run on Linux and other Unix-like systems. Unix is different from most desktop operating systems because it makes extensive use of a text-only command-line interface. It can be a challenge to become familiar with the command line, but once a person becomes used to it, there are significant rewards, such as the ability to string a commonly used series of commands together with a script. This appendix will get you started with the command line and other Unix essentials. © 2015 by John Wiley & Sons, Inc. PMID: 26334925 [...
Source: Current Protocols in Bioinformatics - September 6, 2015 Category: Bioinformatics Tags: Curr Protoc Bioinformatics Source Type: research

Efficient Alignment of Illumina-Like High-Throughput Sequencing Reads with the GEnomic Multi-tool (GEM) Mapper.
Authors: Marco-Sola S, Ribeca P Abstract Modern Illumina-like high-throughput sequencing machines allow the cheap decoding of great amounts of DNA. The GEnomic Multi-tool (GEM) mapper is one of the fastest and most sensitive methods known to date to align such data to a known genomic reference. This unit explains how to use it effectively. © 2015 by John Wiley & Sons, Inc. PMID: 26094690 [PubMed - in process]
Source: Current Protocols in Bioinformatics - June 27, 2015 Category: Bioinformatics Tags: Curr Protoc Bioinformatics Source Type: research

The Importance of Biological Databases in Biological Discovery.
Authors: Baxevanis AD, Bateman A Abstract Biological databases play a central role in bioinformatics. They offer scientists the opportunity to access a wide variety of biologically relevant data, including the genomic sequences of an increasingly broad range of organisms. This unit provides a brief overview of major sequence databases and portals, such as GenBank, the UCSC Genome Browser, and Ensembl. Model organism databases, including WormBase, The Arabidopsis Information Resource (TAIR), and those made available through the Mouse Genome Informatics (MGI) resource, are also covered. Non-sequence-centric ...
Source: Current Protocols in Bioinformatics - June 27, 2015 Category: Bioinformatics Tags: Curr Protoc Bioinformatics Source Type: research

Installing a Local Copy of the Reactome Web Site and Knowledgebase.
Authors: McKay SJ, Weiser J Abstract The Reactome project builds, maintains, and publishes a knowledgebase of biological pathways. The information in the knowledgebase is gathered from the experts in the field, peer reviewed and edited by Reactome editorial staff, and then published to the Reactome Web site, http://www.reactome.org. The Reactome software is open source and builds on top of other open-source or freely available software. Reactome data and code can be freely downloaded in its entirety and the Web site installed locally. This allows for more flexible interrogation of the data and also makes i...
Source: Current Protocols in Bioinformatics - June 22, 2015 Category: Bioinformatics Tags: Curr Protoc Bioinformatics Source Type: research

Using CATH-Gene3D to Analyze the Sequence, Structure, and Function of Proteins.
Authors: Sillitoe I, Lewis T, Orengo C Abstract The CATH database is a classification of protein structures found in the Protein Data Bank (PDB). Protein structures are chopped into individual units of structural domains, and these domains are grouped together into superfamilies if there is sufficient evidence that they have diverged from a common ancestor during the process of evolution. A sister resource, Gene3D, extends this information by scanning sequence profiles of these CATH domain superfamilies against many millions of known proteins to identify related sequences. Thus the combined CATH-Gene3D res...
Source: Current Protocols in Bioinformatics - June 22, 2015 Category: Bioinformatics Tags: Curr Protoc Bioinformatics Source Type: research

Searching and Navigating UniProt Databases.
Authors: Pundir S, Magrane M, Martin MJ, O'Donovan C, UniProt Consortium Abstract The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data. The UniProt Web site receives ∼400,000 unique visitors per month and is the primary means to access UniProt. It provides ten searchable datasets and three main tools. The key UniProt datasets are the UniProt Knowledgebase (UniProtKB), the UniProt Reference Clusters (UniRef), the UniProt Archive (UniParc), and protein sets for completely sequenced genomes (Proteomes). Other supporting datasets include information ab...
Source: Current Protocols in Bioinformatics - June 22, 2015 Category: Bioinformatics Tags: Curr Protoc Bioinformatics Source Type: research

Investigating Protein Structure and Evolution with SCOP2.
Authors: Andreeva A, Howorth D, Chothia C, Kulesha E, Murzin AG Abstract SCOP2 is a successor to the Structural Classification of Proteins (SCOP) database that organizes proteins of known structure according to their structural and evolutionary relationships. It was designed to provide a more advanced framework for the classification of proteins. The SCOP2 classification is described in terms of a directed acyclic graph in which each node defines a relationship of particular type that is represented by a region of protein structure and sequence. The SCOP2 data are accessible via SCOP2-Browser and SCOP2-Gra...
Source: Current Protocols in Bioinformatics - March 12, 2015 Category: Bioinformatics Tags: Curr Protoc Bioinformatics Source Type: research

Using REDItools to Detect RNA Editing Events in NGS Datasets.
Authors: Picardi E, D'Erchia AM, Montalvo A, Pesole G Abstract RNA editing is a post-transcriptional/co-transcriptional molecular phenomenon whereby a genetic message is modified from the corresponding DNA template by means of substitutions, insertions, and/or deletions. It occurs in a variety of organisms and different cellular locations through evolutionarily and biochemically unrelated proteins. RNA editing has a plethora of biological effects including the modulation of alternative splicing and fine-tuning of gene expression. RNA editing events by base substitutions can be detected on a genomic scale by...
Source: Current Protocols in Bioinformatics - March 12, 2015 Category: Bioinformatics Tags: Curr Protoc Bioinformatics Source Type: research

Scoring Large-Scale Affinity Purification Mass Spectrometry Datasets with MiST.
We describe how to run the full MiST analysis pipeline in an R environment and discuss a number of configurable options that allow the lay user to convert any large-scale AP-MS data into an interpretable, biologically relevant protein-protein interaction network. © 2015 by John Wiley & Sons, Inc. PMID: 25754993 [PubMed - in process]
Source: Current Protocols in Bioinformatics - March 12, 2015 Category: Bioinformatics Tags: Curr Protoc Bioinformatics Source Type: research

Expression Data Analysis with Reactome.
Authors: Jupe S, Fabregat A, Hermjakob H Abstract The Reactome database of curated biological pathways provides a tool for visualizing user-supplied expression data as an overlay on pathway diagrams, thereby affording an effective means to examine expression of the constituents of the pathway and determine whether all that are necessary are present. Several experiments can be visualized in succession, to determine whether expression changes with experimental conditions, a useful feature for examining a time-course, dose-response, or disease progression. © 2015 by John Wiley & Sons, Inc. PMID:...
Source: Current Protocols in Bioinformatics - March 12, 2015 Category: Bioinformatics Tags: Curr Protoc Bioinformatics Source Type: research

Using pLink to Analyze Cross-Linked Peptides.
Authors: Fan SB, Meng JM, Lu S, Zhang K, Yang H, Chi H, Sun RX, Dong MQ, He SM Abstract pLink is a search engine for high-throughput identification of cross-linked peptides from their tandem mass spectra, which is the data-analysis step in chemical cross-linking of proteins coupled with mass spectrometry analysis. pLink has accumulated more than 200 registered users from all over the world since its first release in 2012. After 2 years of continual development, a new version of pLink has been released, which is at least 40 times faster, more versatile, and more user-friendly. Also, the function of the new ...
Source: Current Protocols in Bioinformatics - March 12, 2015 Category: Bioinformatics Tags: Curr Protoc Bioinformatics Source Type: research

Using VarScan 2 for Germline Variant Calling and Somatic Mutation Detection.
Authors: Koboldt DC, Larson DE, Wilson RK Abstract The identification of small sequence variants remains a challenging but critical step in the analysis of next-generation sequencing data. Our variant calling tool, VarScan 2, employs heuristic and statistic thresholds based on user-defined criteria to call variants using SAMtools mpileup data as input. Here, we provide guidelines for generating that input, and describe protocols for using VarScan 2 to (1) identify germline variants in individual samples; (2) call somatic mutations, copy number alterations, and LOH events in tumor-normal pairs; and (3) iden...
Source: Current Protocols in Bioinformatics - January 3, 2015 Category: Bioinformatics Tags: Curr Protoc Bioinformatics Source Type: research