PANTHER version 11: expanded annotation data from Gene Ontology and Reactome pathways, and data analysis tool enhancements
The PANTHER database (Protein ANalysis THrough Evolutionary Relationships, http://pantherdb.org) contains comprehensive information on the evolution and function of protein-coding genes from 104 completely sequenced genomes. PANTHER software tools allow users to classify new protein sequences, and to analyze gene lists obtained from large-scale genomics experiments. In the past year, major improvements include a large expansion of classification information available in PANTHER, as well as significant enhancements to the analysis tools. Protein subfamily functional classifications have more than doubled due to progress of ...
Source: Nucleic Acids Research - January 2, 2017 Category: Research Authors: Mi, H., Huang, X., Muruganujan, A., Tang, H., Mills, C., Kang, D., Thomas, P. D. Tags: Database Issue Source Type: research

The neXtProt knowledgebase on human proteins: 2017 update
The neXtProt human protein knowledgebase (https://www.nextprot.org) continues to add new content and tools, with a focus on proteomics and genetic variation data. neXtProt now has proteomics data for over 85% of the human proteins, as well as new tools tailored to the proteomics community. Moreover, the neXtProt release 2016-08-25 includes over 8000 phenotypic observations for over 4000 variations in a number of genes involved in hereditary cancers and channelopathies. These changes are presented in the current neXtProt update. All of the neXtProt data are available via our user interface and FTP site. We also provide an A...
Source: Nucleic Acids Research - January 2, 2017 Category: Research Authors: Gaudet, P., Michel, P.-A., Zahn-Zabal, M., Britan, A., Cusin, I., Domagalski, M., Duek, P. D., Gateau, A., Gleizes, A., Hinard, V., Rech de Laval, V., Lin, J., Nikitin, F., Schaeffer, M., Teixeira, D., Lane, L., Bairoch, A. Tags: Database Issue Source Type: research

Uniclust databases of clustered and deeply annotated protein sequences and alignments
We present three clustered protein sequence databases, Uniclust90, Uniclust50 and Uniclust30, and three databases of multiple sequence alignments (MSAs), Uniboost10, Uniboost20 and Uniboost30, as a resource for protein sequence analysis, function prediction and sequence searches. The Uniclust databases cluster UniProtKB sequences at the level of 90%, 50% and 30% pairwise sequence identity. Uniclust90 and Uniclust50 clusters showed better consistency of functional annotation than those of UniRef90 and UniRef50, owing to an optimised clustering pipeline that runs with our MMseqs2 software for fast and sensitive protein sequence ...
Source: Nucleic Acids Research - January 2, 2017 Category: Research Authors: Mirdita, M., von den Driesch, L., Galiez, C., Martin, M. J., Söding, J., Steinegger, M. Tags: Database Issue Source Type: research

UniProt: the universal protein knowledgebase
The UniProt knowledgebase is a large resource of protein sequences and associated detailed annotation. The database contains over 60 million sequences, of which over half a million sequences have been curated by experts who critically review experimental and predicted data for each protein. The remainder are automatically annotated based on rule systems that rely on the expert curated knowledge. Since our last update in 2014, we have more than doubled the number of reference proteomes to 5631, giving a greater coverage of taxonomic diversity. We implemented a pipeline to remove redundant highly similar proteomes that were ...
Source: Nucleic Acids Research - January 2, 2017 Category: Research Authors: The UniProt Consortium Tags: Database Issue Source Type: research

TFBSbank: a platform to dissect the big data of protein-DNA interaction in human and model species
Genome-wide transcription factor (TF) binding data have been generated extensively in the past few years, which poses a great challenge to data interpretation. Therefore, comprehensive and dedicated functional annotation databases for TF–DNA interaction are in great demand to manage, explore and utilize those invaluable data resources. Here, we constructed a platform ‘TFBSbank’ which houses the annotation of 1870 chromatin immunoprecipitation (ChIP) datasets of 585 TFs in five species (human, mouse, fly, worm and yeast). TFBSbank has five main functional modules aimed at characterizing ChIP p...
Source: Nucleic Acids Research - January 2, 2017 Category: Research Authors: Chen, D., Jiang, S., Ma, X., Li, F. Tags: Database Issue Source Type: research

TcoF-DB v2: update of the database of human and mouse transcription co-factors and transcription factor interactions
Transcription factors (TFs) play a pivotal role in transcriptional regulation, making them crucial for cell survival and important biological functions. For the regulation of transcription, interactions between TFs and other regulatory proteins known as transcription co-factors (TcoFs) are essential in forming the necessary protein complexes. Although TcoFs themselves do not bind DNA directly, their indirect influence on transcriptional regulation and initiation has been shown to be significant, with the functionality of TFs strongly influenced by the presence of TcoFs. In the TcoF-DB v2 database, we collect informat...
Source: Nucleic Acids Research - January 2, 2017 Category: Research Authors: Schmeier, S., Alam, T., Essack, M., Bajic, V. B. Tags: Database Issue Source Type: research

SNP2TFBS - a database of regulatory SNPs affecting predicted transcription factor binding site affinity
SNP2TFBS is a computational resource intended to support researchers investigating the molecular mechanisms underlying regulatory variation in the human genome. The database essentially consists of a collection of text files providing specific annotations for human single nucleotide polymorphisms (SNPs), namely whether they are predicted to abolish, create or change the affinity of one or several transcription factor (TF) binding sites. A SNP's effect on TF binding is estimated based on a position weight matrix (PWM) model for the binding specificity of the corresponding factor. These data files are regenerated at regular ...
Source: Nucleic Acids Research - January 2, 2017 Category: Research Authors: Kumar, S., Ambrosini, G., Bucher, P. Tags: Database Issue Source Type: research
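
As an illustration of the PWM-based estimate mentioned in the SNP2TFBS abstract above, the following is a minimal sketch of how a log-odds PWM can score a reference and an alternate allele. It is not the SNP2TFBS pipeline: the count matrix, the uniform background, the pseudocount and the function names are all assumptions made up for this example.

```python
import math

# Hypothetical count matrix for a 4-bp motif (one dict per motif position).
# These numbers are invented for illustration, not taken from any database.
COUNTS = [
    {"A": 8, "C": 1, "G": 1, "T": 0},
    {"A": 0, "C": 9, "G": 0, "T": 1},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2 odds against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append(
            {
                base: math.log2((column[base] + pseudocount) / total / background)
                for base in "ACGT"
            }
        )
    return matrix


def score(pwm, site):
    """Sum the per-position log-odds scores of a site as long as the motif."""
    if len(site) != len(pwm):
        raise ValueError("site length must match motif length")
    return sum(column[base] for column, base in zip(pwm, site))


def snp_effect(pwm, ref_site, offset, alt_base):
    """Return score(alt) - score(ref) for a substitution at `offset` within the site."""
    alt_site = ref_site[:offset] + alt_base + ref_site[offset + 1:]
    return score(pwm, alt_site) - score(pwm, ref_site)


if __name__ == "__main__":
    pwm = log_odds_matrix(COUNTS)
    delta = snp_effect(pwm, "ACGT", 0, "T")
    print(f"score change for A>T at motif position 1: {delta:.3f}")
```

A negative score change suggests the alternate allele weakens the predicted site and a positive one suggests it strengthens it. Deciding whether a site is abolished or created would also need a score threshold, which this sketch leaves out.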

RNALocate: a resource for RNA subcellular localizations
In conclusion, RNALocate will be of help in elucidating the entirety of RNA subcellular localization, and developing new prediction methods. The database is available at http://www.rna-society.org/rnalocate/.
Source: Nucleic Acids Research - January 2, 2017 Category: Research Authors: Zhang, T., Tan, P., Wang, L., Jin, N., Li, Y., Zhang, L., Yang, H., Hu, Z., Zhang, L., Hu, C., Li, C., Qian, K., Zhang, C., Huang, Y., Li, K., Lin, H., Wang, D. Tags: Database Issue Source Type: research

RNAcentral: a comprehensive database of non-coding RNA sequences
RNAcentral is a database of non-coding RNA (ncRNA) sequences that aggregates data from specialised ncRNA resources and provides a single entry point for accessing ncRNA sequences of all ncRNA types from all organisms. Since its launch in 2014, RNAcentral has integrated twelve new resources, taking the total number of collaborating databases to 22, and began importing new types of data, such as modified nucleotides from MODOMICS and PDB. We created new species-specific identifiers that refer to unique RNA sequences within the context of a single species. The website has been subject to continuous improvements focusing on text an...
Source: Nucleic Acids Research - January 2, 2017 Category: Research Authors: The RNAcentral Consortium Tags: Database Issue Source Type: research

R-loopDB: a database for R-loop forming sequences (RLFS) and R-loops
R-loopDB (http://rloop.bii.a-star.edu.sg) was originally constructed as a collection of computationally predicted R-loop forming sequences (RLFSs) in the human genic regions. The renewed R-loopDB provides updates, improvements and new options, including access to recent experimental data. It includes genome-scale prediction of RLFSs for humans, six other animals and yeast. Using the extended quantitative model of RLFSs (QmRLFS), we significantly increased the number of RLFSs predicted in the human genes and identified RLFSs in other organism genomes. R-loopDB allows searching of RLFSs in the genes and in the 2 kb upstream ...
Source: Nucleic Acids Research - January 2, 2017 Category: Research Authors: Jenjaroenpun, P., Wongsurawat, T., Sutheeworapong, S., Kuznetsov, V. A. Tags: Database Issue Source Type: research

RAID v2.0: an updated resource of RNA-associated interactions across organisms
With the development of biotechnologies and computational prediction algorithms, the number of experimentally determined and computationally predicted RNA-associated interactions has grown rapidly in recent years. However, diverse RNA-associated interactions are scattered over a wide variety of resources and organisms, and a fully comprehensive view of diverse RNA-associated interactions is still not available for any species. Hence, we have updated the RAID database to version 2.0 (RAID v2.0, www.rna-society.org/raid/) by integrating experimentally determined and computationally predicted interactions from manually reading literature and other d...
Source: Nucleic Acids Research - January 2, 2017 Category: Research Authors: Yi, Y., Zhao, Y., Li, C., Zhang, L., Huang, H., Li, Y., Liu, L., Hou, P., Cui, T., Tan, P., Hu, Y., Zhang, T., Huang, Y., Li, X., Yu, J., Wang, D. Tags: Database Issue Source Type: research

POSTAR: a platform for exploring post-transcriptional regulation coordinated by RNA-binding proteins
We present POSTAR (http://POSTAR.ncrnalab.org), a resource of POST-trAnscriptional Regulation coordinated by RNA-binding proteins (RBPs). Precise characterization of post-transcriptional regulatory maps has accelerated dramatically in the past few years. Based on new studies and resources, POSTAR supplies the largest collection of experimentally probed (~23 million) and computationally predicted (~117 million) RBP binding sites in the human and mouse transcriptomes. POSTAR annotates every transcript and its RBP binding sites using extensive information regarding various molecular regulatory events (e.g., splic...
Source: Nucleic Acids Research - January 2, 2017 Category: Research Authors: Hu, B., Yang, Y.-C. T., Huang, Y., Zhu, Y., Lu, Z. J. Tags: Database Issue Source Type: research

NGSmethDB 2017: enhanced methylomes and differential methylation
The 2017 update of NGSmethDB stores whole genome methylomes generated from short-read data sets obtained by whole genome bisulfite sequencing (WGBS) technology. To generate high-quality methylomes, stringent quality controls were integrated with third-party software, and a two-step mapping process was added to exploit the advantages of the new genome assembly models. The samples were all profiled under constant parameter settings, thus enabling comparative downstream analyses. Besides a significant increase in the number of samples, NGSmethDB now includes two additional data-types, which are a valuable resource for the discovery of methyl...
Source: Nucleic Acids Research - January 2, 2017 Category: Research Authors: Lebron, R., Gomez-Martin, C., Carpena, P., Bernaola-Galvan, P., Barturen, G., Hackenberg, M., Oliver, J. L. Tags: Database Issue Source Type: research

miRPathDB: a new dictionary on microRNAs and target pathways
In the last decade, miRNAs and their regulatory mechanisms have been intensively studied and many tools for the analysis of miRNAs and their targets have been developed. We previously presented a dictionary on single miRNAs and their putative target pathways. Since then, the number of miRNAs has tripled and the knowledge on miRNAs and targets has grown substantially. This, along with changes in pathway resources such as KEGG, leads to an improved understanding of miRNAs, their target genes and related pathways. Here, we introduce the miRNA Pathway Dictionary Database (miRPathDB), freely accessible at https://mpd.bioinf.uni...
Source: Nucleic Acids Research - January 2, 2017 Category: Research Authors: Backes, C., Kehl, T., Stöckel, D., Fehlmann, T., Schneider, L., Meese, E., Lenhof, H.-P., Keller, A. Tags: Database Issue Source Type: research

MethSMRT: an integrative database for DNA N6-methyladenine and N4-methylcytosine generated by single-molecular real-time sequencing
DNA methylation is an important type of epigenetic modification, of which 5-methylcytosine (5mC), 6-methyladenine (6mA) and 4-methylcytosine (4mC) are the most common types. Previous efforts have been largely focused on 5mC, providing invaluable insights into epigenetic regulation through DNA methylation. Recently developed single-molecule real-time (SMRT) sequencing technology provides a unique opportunity to detect the less studied DNA 6mA and 4mC modifications at single-nucleotide resolution. With the rapidly increasing amount of SMRT sequencing data generated, there is an emerging demand to systematically explore DNA 6mA an...
Source: Nucleic Acids Research - January 2, 2017 Category: Research Authors: Ye, P., Luan, Y., Chen, K., Liu, Y., Xiao, C., Xie, Z. Tags: Database Issue Source Type: research