Optimizing R with SparkR on a commodity cluster for biomedical research

• R is a popular environment for clinical data analysis. Computationally demanding tasks can often be parallelized across a computing cluster, either with the Message Passing Interface (MPI) on traditional clusters or with the relatively new SparkR, part of the Hadoop family.
• SparkR supports big data analysis in R, even on non-dedicated resources, with minimal changes to the original code. It offers elastic resources and tight integration with Hadoop distributed services for huge files.
• Computation in SparkR scales better than with MPI due to optimized data communication.
Source: Computer Methods and Programs in Biomedicine. Category: Bioinformatics. Source Type: research.