Using the Saccharomyces Genome Database (SGD) for analysis of genomic information.
Authors: Skrzypek MS, Hirschman J
Abstract: Analysis of genomic data requires access to software tools that place the sequence-derived information in the context of biology. The Saccharomyces Genome Database (SGD) integrates functional information about budding yeast genes and their products with a set of analysis tools that facilitate exploring their biological details. This unit describes how the various types of functional data available at SGD can be searched, retrieved, and analyzed. Starting with a guided tour of the SGD Home page and Locus Summary page, this unit highlights how to retrieve data usi...
Source: Current Protocols in Bioinformatics - November 12, 2014 Category: Bioinformatics Tags: Curr Protoc Bioinformatics Source Type: research

Biological sequence motif discovery using motif-x.
Authors: Chou MF, Schwartz D
Abstract: The Web-based motif-x program provides a simple interface to extract statistically significant motifs from large data sets, such as MS/MS post-translational modification data and groups of proteins that share a common biological function. Users upload data files and download results using common Web browsers on essentially any Web-compatible computer. Once submitted, data analyses are performed rapidly on an associated high-speed computer cluster, producing both syntactic and image-based motif results and statistics. The protocols presented demonstrate the use of...
Source: Current Protocols in Bioinformatics - November 12, 2014 Category: Bioinformatics Tags: Curr Protoc Bioinformatics Source Type: research
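
As a rough illustration of what "statistically significant motifs" can mean, the sketch below tests whether a residue is over-represented at one position of aligned foreground windows relative to a background set, using a one-sided binomial tail probability. It is a simplified, hypothetical sketch and not motif-x's implementation: the function names, the aligned-window input format, and the 1e-6 threshold are assumptions chosen for illustration.

```python
from collections import Counter
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """One-sided P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def enriched_residues(foreground, background, threshold=1e-6):
    """Return (position, residue, p_value) for residues over-represented at a position.

    foreground, background: lists of aligned, equal-length sequence windows
    (hypothetical input format, e.g. windows centred on a modified residue).
    """
    width = len(foreground[0])
    hits = []
    for pos in range(width):
        fg_counts = Counter(seq[pos] for seq in foreground)
        bg_counts = Counter(seq[pos] for seq in background)
        for residue, k in sorted(fg_counts.items()):
            # Background frequency of this residue at this position.
            p_bg = bg_counts[residue] / len(background)
            if p_bg == 0:
                # No background estimate for this residue; skip it in this sketch.
                continue
            p_value = binomial_tail(k, len(foreground), p_bg)
            if p_value < threshold:
                hits.append((pos, residue, p_value))
    return hits
```

For example, `enriched_residues(["ASP"] * 10, ["ASP"] + ["AST"] * 9)` reports proline at position 2 with p ≈ 1e-10, while residues that are equally common in both sets are not reported.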

Gene identification in prokaryotic genomes, phages, metagenomes, and EST sequences with GeneMarkS suite.
Authors: Borodovsky M, Lomsadze A
Abstract: This unit describes how to use several gene-finding programs from the GeneMark line developed for finding protein-coding ORFs in genomic DNA of prokaryotic species, in genomic DNA of eukaryotic species with intronless genes, in genomes of viruses and phages, and in prokaryotic metagenomic sequences, as well as in EST sequences with spliced-out introns. These bioinformatics tools were demonstrated to have state-of-the-art accuracy and have been frequently used for gene annotation in novel nucleotide sequences. An additional advantage of these sequence-analysis tool...
Source: Current Protocols in Bioinformatics - November 12, 2014 Category: Bioinformatics Tags: Curr Protoc Bioinformatics Source Type: research
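
GeneMark programs score candidate coding regions with statistical (Markov chain) models of coding sequence; the sketch below shows only the much simpler step of enumerating ATG-to-stop ORFs on the forward strand of a DNA sequence. It is a naive scan, not GeneMark's algorithm, and the function name and minimum-length cutoff are assumptions for illustration. Only ATG starts and TAA/TAG/TGA stops are considered; the reverse strand and alternative start codons are ignored.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 100):
    """Return sorted (start, end, frame) tuples for forward-strand ATG-initiated ORFs.

    start is 0-based; end is exclusive and includes the stop codon.
    Within a frame, the first ATG after a stop is used, giving the longest ORF.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Length in codons, counting the stop codon.
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return sorted(orfs)
```

For instance, `find_orfs("CCATG" + "AAA" * 3 + "TAAGG", min_codons=5)` returns `[(2, 17, 2)]`: a five-codon ORF in frame 2.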

Eukaryotic gene prediction using GeneMark.hmm-E and GeneMark-ES.
Authors: Borodovsky M, Lomsadze A
Abstract: This unit describes how to use the gene-finding programs GeneMark.hmm-E and GeneMark-ES for finding protein-coding genes in the genomic DNA of eukaryotic organisms. These bioinformatics tools have been demonstrated to have state-of-the-art accuracy for many fungal, plant, and animal genomes, and have frequently been used for gene annotation in novel genomic sequences. An additional advantage of GeneMark-ES is that the problem of algorithm parameterization is solved automatically, with parameters estimated by iterative self-training (unsupervised training). ...
Source: Current Protocols in Bioinformatics - November 12, 2014 Category: Bioinformatics Tags: Curr Protoc Bioinformatics Source Type: research

Using OrthoMCL to assign proteins to OrthoMCL-DB groups or to cluster proteomes into new ortholog groups.
Authors: Fischer S, Brunk BP, Chen F, Gao X, Harb OS, Iodice JB, Shanmugam D, Roos DS, Stoeckert CJ
Abstract: OrthoMCL is an algorithm for grouping proteins into ortholog groups based on their sequence similarity. OrthoMCL-DB is a public database that allows users to browse and view ortholog groups that were pre-computed using the OrthoMCL algorithm. Version 4 of this database contained 116,536 ortholog groups clustered from 1,270,853 proteins obtained from 88 eukaryotic genomes, 16 archaeal genomes, and 34 bacterial genomes. Future versions of OrthoMCL-DB will include more proteomes as more genomes are seq...
Source: Current Protocols in Bioinformatics - November 12, 2014 Category: Bioinformatics Tags: Curr Protoc Bioinformatics Source Type: research
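
OrthoMCL starts from sequence similarity scores between proteins of different genomes. As a much-simplified illustration of how similarity can suggest orthology, the sketch below finds reciprocal best hits: pairs of proteins from different genomes that are each other's highest-scoring match. This is a simpler, related technique and not the OrthoMCL algorithm itself, which goes on to cluster proteins with the Markov Cluster (MCL) algorithm; the input dictionaries and protein names are hypothetical.

```python
def reciprocal_best_hits(hits, species_of):
    """Return the set of protein pairs that are reciprocal best hits across genomes.

    hits: dict mapping (query, subject) -> similarity score (higher is better).
    species_of: dict mapping protein id -> genome name.
    """
    # Best-scoring subject for each (query, other genome); ties keep the first seen.
    best = {}
    for (query, subject), score in hits.items():
        if species_of[query] == species_of[subject]:
            continue  # only between-genome hits are considered here
        key = (query, species_of[subject])
        if key not in best or score > best[key][0]:
            best[key] = (score, subject)

    pairs = set()
    for (query, _genome), (_score, subject) in best.items():
        back = best.get((subject, species_of[query]))
        if back is not None and back[1] == query:
            pairs.add(frozenset((query, subject)))
    return pairs
```

With two hypothetical genomes A and B, `a1` and `b1` form a pair only if `b1` is `a1`'s best hit in B and `a1` is `b1`'s best hit in A.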

Using QIIME to analyze 16S rRNA gene sequences from microbial communities.
Authors: Kuczynski J, Stombaugh J, Walters WA, González A, Caporaso JG, Knight R
Abstract: QIIME (canonically pronounced "chime") is a software application that performs microbial community analysis. It is an acronym for Quantitative Insights Into Microbial Ecology, and has been used to analyze and interpret nucleic acid sequence data from fungal, viral, bacterial, and archaeal communities. The following protocols describe how to install QIIME on a single computer and use it to analyze microbial 16S sequence data from nine distinct microbial communities. PMID: 22161565 [PubMed - indexed for MEDLINE...
Source: Current Protocols in Bioinformatics - November 12, 2014 Category: Bioinformatics Tags: Curr Protoc Bioinformatics Source Type: research
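
Community analyses of this kind typically summarize each sample with diversity metrics computed from per-taxon read counts. The sketch below computes observed richness and the Shannon index, H = -Σ p_i log p_i, for one sample. It is a generic illustration rather than QIIME's code, and the log base (2 here) is an assumption, since tools differ in the base they use.

```python
import math


def observed_richness(counts):
    """Number of taxa with at least one read in the sample."""
    return sum(1 for c in counts if c > 0)


def shannon(counts, base=2.0):
    """Shannon diversity H = -sum(p_i * log(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p, base)
    return h
```

Two equally abundant taxa give H = 1 bit and four give H = 2 bits; a sample dominated by a single taxon gives H = 0.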

An introduction to the informatics of "next-generation" sequencing.
Authors: Stein LD
Abstract: Next-generation sequencing (NGS) packs the sequencing throughput of a 2000s-era genome center into a single affordable machine. However, software developed for conventional sequencing technologies is often inadequate for NGS data, which consist of short reads produced in massively parallel runs. This unit surveys the software packages that are available for managing and analyzing NGS data. PMID: 22161566 [PubMed - indexed for MEDLINE]
Source: Current Protocols in Bioinformatics - November 12, 2014 Category: Bioinformatics Tags: Curr Protoc Bioinformatics Source Type: research
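
Much NGS data handling starts with reads in FASTQ format, where each base carries a Phred quality score Q related to the estimated error probability by P = 10^(-Q/10). The sketch below parses four-line FASTQ records and decodes the qualities, assuming Phred+33 encoding (Sanger / Illumina 1.8+) and unwrapped sequence lines; the function names are illustrative.

```python
def parse_fastq(lines):
    """Yield (read_id, sequence, phred_scores) from 4-line FASTQ records.

    Assumes Phred+33 quality encoding and one line each for sequence and quality.
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        try:
            seq = next(it).rstrip("\n")
            plus = next(it)
            qual = next(it).rstrip("\n")
        except StopIteration:
            raise ValueError("truncated FASTQ record") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, [ord(ch) - 33 for ch in qual]


def error_probability(q):
    """Estimated base-call error probability for Phred score q."""
    return 10 ** (-q / 10)
```

The quality string `II#5` decodes to scores 40, 40, 2, and 20; a score of 20 corresponds to a 1% estimated error rate.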

Identification of novel and known miRNAs in deep-sequencing data with miRDeep2.
Authors: Mackowiak SD
Abstract: miRNAs comprise an abundant class of small non-coding RNAs that play important roles in a wide range of biological processes by post-transcriptional regulation of a large fraction of animal genes. High-throughput sequencing machines and the availability of completely sequenced genomes make it possible to reliably identify miRNAs with computational methods. This unit documents how to use the miRDeep2 software package to identify novel and known microRNAs in small RNA deep-sequencing data. Moreover, the usage of miRDeep2 to profile miRNA expression across samples is illustrated...
Source: Current Protocols in Bioinformatics - November 12, 2014 Category: Bioinformatics Tags: Curr Protoc Bioinformatics Source Type: research
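
Small RNA deep-sequencing data contain many identical reads, so a common preprocessing step before mapping is to collapse them into unique sequences with copy counts. The sketch below does this and discards reads shorter than a minimum length. It is a hypothetical illustration, not miRDeep2's own preprocessing script: the header layout (`seq_<index>_x<count>`) and the 18-nt default cutoff are assumptions.

```python
from collections import Counter


def collapse_reads(reads, min_length=18, prefix="seq"):
    """Collapse identical reads into (header, sequence) pairs, most abundant first.

    Reads shorter than min_length are discarded. The header format is illustrative.
    """
    counts = Counter(r for r in reads if len(r) >= min_length)
    # Sort by descending count, then by sequence, so the output order is deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(f"{prefix}_{i}_x{n}", seq) for i, (seq, n) in enumerate(ordered)]
```

Keeping the copy number in each header lets later steps recover read abundance, which matters for expression profiling, while mapping each unique sequence only once.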

Using the scan-x Web site to predict protein post-translational modifications.
We describe a protocol to use the scan-x Web site to view predicted acetylation sites in the human proteome and predicted phosphorylation sites in the human, mouse, fly, and yeast proteomes with high specificity. This tool is accessible from virtually any computer with a Web browser. The only requirement is a means of searching for a protein of interest in one of the represented organisms. PMID: 22161568 [PubMed - indexed for MEDLINE]
Source: Current Protocols in Bioinformatics - November 12, 2014 Category: Bioinformatics Tags: Curr Protoc Bioinformatics Source Type: research
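
To show the basic idea of locating candidate modification sites in a protein sequence, the sketch below scans for serine or threonine residues immediately followed by proline (the proline-directed "[ST]P" pattern). This fixed pattern is a hypothetical stand-in chosen for illustration, not scan-x's prediction method, and it carries no specificity estimate.

```python
import re

# Match S or T only when the next residue is P; the lookahead does not consume the P.
ST_P_PATTERN = re.compile(r"[ST](?=P)")


def candidate_sites(protein: str):
    """Return (1-based position, residue) for each S/T immediately followed by P."""
    return [(m.start() + 1, m.group(0)) for m in ST_P_PATTERN.finditer(protein.upper())]
```

For example, `candidate_sites("MSPTPKSA")` returns `[(2, "S"), (4, "T")]`; the serine at position 7 is skipped because it is followed by alanine.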

Installation and use of LabKey Server for proteomics.
Authors: Eckels J, Hussey P, Nelson EK, Myers T, Rauch A, Bellew M, Connolly B, Law W, Eng JK, Katz J, McIntosh M, Mallick P, Igra M
Abstract: LabKey Server (formerly CPAS, the Computational Proteomics Analysis System) provides a Web-based platform for mining data from liquid chromatography-tandem mass spectrometry (LC-MS/MS) proteomic experiments. This open source platform supports systematic proteomic analyses and secure data management, integration, and sharing. LabKey Server incorporates several tools currently used in proteomic analysis, including the X! Tandem search engine, the ProteoWizard toolkit, ...
Source: Current Protocols in Bioinformatics - November 12, 2014 Category: Bioinformatics Tags: Curr Protoc Bioinformatics Source Type: research

Analyzing molecular interactions.
Authors: Petsko GA, Yates JR
Abstract: Molecular interactions are key processes that drive the functions of molecules. Large protein interaction data sets are being assembled that will help analyze structural and functional aspects of interactions. This chapter introduces techniques for analyzing both the structural aspects of molecular interactions and methods to use this information to understand and define biological pathways. PMID: 22161570 [PubMed - indexed for MEDLINE]
Source: Current Protocols in Bioinformatics - November 12, 2014 Category: Bioinformatics Tags: Curr Protoc Bioinformatics Source Type: research
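
Pathway-level analysis of interaction data often begins by treating pairwise interactions as an undirected network. The sketch below builds such a network from a list of pairs, reports each protein's degree (number of partners), and finds connected components as a crude first grouping. The protein names and input format are hypothetical, and self-interactions are simply ignored.

```python
from collections import defaultdict, deque


def build_network(pairs):
    """Undirected adjacency sets from (protein_a, protein_b) pairs; self-pairs are ignored."""
    adj = defaultdict(set)
    for a, b in pairs:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def degrees(adj):
    """Number of distinct interaction partners per protein."""
    return {node: len(partners) for node, partners in adj.items()}


def connected_components(adj):
    """Breadth-first search over the network.

    Each component is a sorted list; components are ordered by their smallest member.
    """
    seen, components = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:
            node = queue.popleft()
            members.append(node)
            for partner in adj[node]:
                if partner not in seen:
                    seen.add(partner)
                    queue.append(partner)
        components.append(sorted(members))
    return components
```

For the pairs A–B, B–C, and D–E, protein B has degree 2 and the network splits into the components `["A", "B", "C"]` and `["D", "E"]`.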

Using BLAT to find sequence similarity in closely related genomes.
Authors: Bhagwat M, Young L, Robison RR
Abstract: The BLAST-Like Alignment Tool (BLAT) is used to find genomic sequences that match a protein or DNA sequence submitted by the user. BLAT is typically used for searching similar sequences within the same or closely related species. It was developed to align millions of expressed sequence tags and mouse whole-genome random reads to the human genome at a higher speed. It is freely available either on the Web or as a downloadable stand-alone program. BLAT search results provide a link for visualization in the University of California, Santa Cruz (UCSC) Genome Bro...
Source: Current Protocols in Bioinformatics - November 12, 2014 Category: Bioinformatics Tags: Curr Protoc Bioinformatics Source Type: research
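
Much of BLAT's speed comes from indexing the target genome rather than the query: the index holds non-overlapping k-mers ("tiles") of the target, every overlapping k-mer of the query is looked up in it, and hits that fall on the same diagonal are then clustered and extended into alignments. The sketch below shows only the indexing and lookup steps, with k as a parameter (BLAT's DNA default tile size is 11); the function names are illustrative.

```python
from collections import defaultdict


def build_index(target: str, k: int = 11):
    """Map each non-overlapping k-mer of the target to its start positions."""
    index = defaultdict(list)
    for pos in range(0, len(target) - k + 1, k):
        index[target[pos:pos + k]].append(pos)
    return index


def seed_hits(query: str, index, k: int = 11):
    """Look up every overlapping query k-mer; return (query_pos, target_pos, diagonal)."""
    hits = []
    for qpos in range(len(query) - k + 1):
        for tpos in index.get(query[qpos:qpos + k], ()):
            # Hits sharing a diagonal (target_pos - query_pos) suggest one alignment.
            hits.append((qpos, tpos, tpos - qpos))
    return hits
```

Indexing only non-overlapping tiles keeps the genome index about k times smaller, while scanning every query offset still guarantees that any exact match of length 2k - 1 or more contains at least one indexed tile.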

The Bluejay genome browser.
Authors: Soh J, Gordon PM, Sensen CW
Abstract: The Bluejay genome browser is a stand-alone visualization tool for the multi-scale viewing of annotated genomes and other genomic elements. Bluejay allows users to customize display features to suit their needs, and produces publication-quality graphics. Bluejay provides a multitude of ways to interrelate biological data at the genome scale. Users can load gene expression data into a genome display for expression visualization in context. Multiple genomes can be compared concurrently, including time series expression data, based on Gene Ontology labels. Externa...
Source: Current Protocols in Bioinformatics - November 12, 2014 Category: Bioinformatics Tags: Curr Protoc Bioinformatics Source Type: research

Identifying proteomic LC-MS/MS data sets with Bumbershoot and IDPicker.
Authors: Holman JD, Ma ZQ, Tabb DL
Abstract: The identification of peptides and proteins by LC-MS/MS requires the use of bioinformatics. Tools developed in the Tabb Laboratory contribute significant flexibility and discrimination to this process. The Bumbershoot tools (MyriMatch, DirecTag, TagRecon, and Pepitome) enable the identification of peptides represented by MS/MS scans. All of these tools can work directly from instrument capture files of multiple vendors, such as Thermo RAW format, or from standard XML-based formats, such as mzML or mzXML. Peptide identifications are written to mzIdentML or pepXML ...
Source: Current Protocols in Bioinformatics - November 12, 2014 Category: Bioinformatics Tags: Curr Protoc Bioinformatics Source Type: research
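
Peptide identification from MS/MS data relies in part on comparing observed precursor masses with theoretical peptide masses. The sketch below computes the monoisotopic mass of an unmodified peptide from standard residue masses and converts it to m/z for a given charge. It is the underlying arithmetic only, not code from the Bumbershoot tools, and it ignores post-translational and chemical modifications.

```python
# Monoisotopic residue masses in daltons (standard amino acids, unmodified).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O added for the free peptide termini
PROTON = 1.007276  # mass of a proton


def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER


def precursor_mz(sequence: str, charge: int) -> float:
    """m/z of the [M + zH]z+ ion."""
    return (peptide_mass(sequence) + charge * PROTON) / charge
```

For the peptide PEPTIDE this gives a neutral mass of about 799.360 Da and a doubly charged precursor at about m/z 400.687.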

Identification of peptide features in precursor spectra using Hardklör and Krönik.
Authors: Hoopmann MR, MacCoss MJ, Moritz RL
Abstract: Hardklör and Krönik are software tools for feature detection and data reduction of high-resolution mass spectra. Hardklör is used to reduce peptide isotope distributions to a single monoisotopic mass and charge state, and can deconvolve overlapping peptide isotope distributions. Krönik filters, validates, and summarizes peptide features identified with Hardklör from data obtained during liquid chromatography mass spectrometry (LC-MS). Both software tools contain a simple user interface and can be run from nearly any desktop computer. These tools are...
Source: Current Protocols in Bioinformatics - November 12, 2014 Category: Bioinformatics Tags: Curr Protoc Bioinformatics Source Type: research
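
The basic arithmetic behind reducing an isotope distribution to a monoisotopic mass and charge can be sketched simply: peaks in a peptide's isotope envelope are spaced by roughly 1.00335/z on the m/z axis, so the spacing gives the charge z, and the neutral monoisotopic mass follows from the first (monoisotopic) peak. Hardklör works with whole isotope distributions and can deconvolve overlapping ones; this sketch does neither, and its tolerance, charge limit, and function names are assumptions.

```python
C13_SPACING = 1.003355  # mass difference between 13C and 12C, in daltons
PROTON = 1.007276       # mass of a proton, in daltons


def charge_from_spacing(peak_mzs, max_charge=6, tol=0.01):
    """Infer charge from the mean m/z spacing of consecutive isotope peaks.

    peak_mzs: ascending m/z values of one envelope, monoisotopic peak first.
    Returns None if no charge up to max_charge matches within tol.
    """
    if len(peak_mzs) < 2:
        raise ValueError("need at least two isotope peaks")
    spacings = [b - a for a, b in zip(peak_mzs, peak_mzs[1:])]
    mean_spacing = sum(spacings) / len(spacings)
    for z in range(1, max_charge + 1):
        if abs(mean_spacing - C13_SPACING / z) <= tol:
            return z
    return None


def monoisotopic_mass(mono_mz, charge):
    """Neutral monoisotopic mass from the monoisotopic peak's m/z and charge."""
    return charge * (mono_mz - PROTON)
```

An envelope at m/z 400.687, 401.189, and 401.690 has a spacing of about 0.5017, which matches charge 2 and a neutral monoisotopic mass of about 799.359 Da.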