Multivariate two-part statistics for analysis of correlated mass spectrometry data from multiple biological specimens

Motivation: High-throughput mass spectrometry (MS) is now used to profile small-molecule compounds across multiple biological sample types from the same subjects, with the goal of leveraging information across biospecimens. Multivariate statistical methods that combine information from all biospecimens could be more powerful than the usual univariate analyses. However, missing values are common in MS data, and imputation can affect between-biospecimen correlations and multivariate analysis results.

Results: We propose two multivariate two-part statistics that accommodate missing values and combine data from all biospecimens to identify differentially regulated compounds. Statistical significance is determined using a multivariate permutation null distribution. Relative to univariate tests, the multivariate procedures detected more significant compounds in three biological datasets. In a simulation study, multi-biospecimen testing procedures were more powerful than single-biospecimen methods when compounds were differentially regulated in multiple biospecimens, but univariate methods could be more powerful when compounds were differentially regulated in only one biospecimen.

Availability and Implementation: We provide R functions that implement and illustrate our method as supplementary information.

Contact: sltaylor@ucdavis.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
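The abstract does not define the two multivariate statistics, so the following is only a minimal sketch under stated assumptions, not the authors' method or their R functions. It assumes the classical two-part construction for one biospecimen: a squared two-proportion Z statistic comparing detection (non-missing) rates, plus a squared Welch t statistic comparing the detected values. It then assumes one simple multivariate combination, summing the per-biospecimen two-part statistics, and obtains a permutation p-value by shuffling group labels across subjects. Because each subject's whole multi-biospecimen profile moves with its label, the between-biospecimen correlation is preserved under the null. All function names and the toy data are hypothetical.

```python
import math
import random
from statistics import mean, variance
from typing import List, Optional, Sequence, Tuple

Value = Optional[float]  # None marks a missing (undetected) measurement


def two_proportion_z2(det_a: int, n_a: int, det_b: int, n_b: int) -> float:
    """Squared pooled two-proportion Z statistic for detection rates."""
    p_pool = (det_a + det_b) / (n_a + n_b)
    var = p_pool * (1.0 - p_pool) * (1.0 / n_a + 1.0 / n_b)
    if var == 0.0:  # all detected or all missing: this part contributes 0
        return 0.0
    z = (det_a / n_a - det_b / n_b) / math.sqrt(var)
    return z * z


def welch_t2(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Squared Welch t statistic on observed values (0 if not estimable)."""
    if len(xs) < 2 or len(ys) < 2:
        return 0.0
    se2 = variance(xs) / len(xs) + variance(ys) / len(ys)
    if se2 == 0.0:  # both groups constant: treat as no evidence (sketch choice)
        return 0.0
    t = (mean(xs) - mean(ys)) / math.sqrt(se2)
    return t * t


def two_part(a: Sequence[Value], b: Sequence[Value]) -> float:
    """Assumed univariate two-part statistic for one biospecimen: Z^2 + T^2."""
    obs_a = [v for v in a if v is not None]
    obs_b = [v for v in b if v is not None]
    return (two_proportion_z2(len(obs_a), len(a), len(obs_b), len(b))
            + welch_t2(obs_a, obs_b))


def summed_two_part(data: List[List[Value]], labels: Sequence[int]) -> float:
    """Hypothetical multivariate statistic: sum of per-biospecimen two-part stats.

    data[i][k] is subject i's value in biospecimen k; labels[i] is 1 or 0.
    """
    total = 0.0
    for k in range(len(data[0])):
        a = [row[k] for row, g in zip(data, labels) if g == 1]
        b = [row[k] for row, g in zip(data, labels) if g == 0]
        total += two_part(a, b)
    return total


def permutation_pvalue(data: List[List[Value]], labels: Sequence[int],
                       n_perm: int = 999, seed: int = 0) -> Tuple[float, float]:
    """Permutation p-value; labels are shuffled across whole subjects."""
    rng = random.Random(seed)
    observed = summed_two_part(data, labels)
    perm = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(perm)  # each subject's biospecimen values stay together
        if summed_two_part(data, perm) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Hypothetical toy data: 10 subjects x 2 biospecimens; None = not detected.
# Subjects 0-4 are cases (label 1), subjects 5-9 are controls (label 0).
data = [
    [10.0, 8.0], [11.0, 9.5], [12.0, 8.5], [10.5, 9.0], [11.5, 10.0],
    [1.0, None], [2.0, 1.0], [None, 1.5], [1.5, None], [None, 2.0],
]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
stat, p = permutation_pvalue(data, labels, n_perm=999, seed=1)
print(f"summed two-part statistic = {stat:.2f}, permutation p = {p:.4f}")
```

Summing is just one way to pool evidence across biospecimens; a quadratic form that uses the estimated between-biospecimen covariance is another plausible choice, and either would pair with the same subject-level permutation scheme.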
Source: Bioinformatics. Category: Genome Analysis. Type: research article.