Northeastern University

College of Science
College of Computer and Information Science
360 Huntington Ave.
Boston, Massachusetts 02115

Research interests

We develop statistical and computational methods for systems-wide molecular investigations of biological organisms.

Our group works with high-throughput large-scale investigations in quantitative genomics, proteomics, metabolomics and ionomics, which rely on mass spectrometry and other complementary technologies to characterize the components of the biological systems, their functional interactions, and their relevance to disease.

Our goal is to provide statistical and computational methods and open-source software for the design of these experiments, and for accurate and objective interpretation of the resulting large and complex datasets.

The methods build on the insight that biological systems, and large-scale measurements of them, contain redundancy. We therefore use groups of measurements known to share sources of variation in single or multiple datasets, or discover these groups empirically from the data, to best represent their stochastic structure.

Examples of projects

Inferential relative quantification of proteins for mass spectrometry-based proteomics

Liquid chromatography coupled with global, targeted or data-independent mass spectrometry is widely used for quantitative proteomic investigations. While a typical output is a list of identified and quantified spectral features, the biological and clinical questions often focus on the protein level.

We propose a general statistical approach for protein quantification in arbitrarily complex experimental designs. It enables both protein significance analysis between conditions and protein quantification in individual samples or conditions.
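The step from spectral features to protein-level conclusions can be illustrated with a minimal sketch. It is not the group's model: it assumes a simple stand-in in which each feature is centered, the protein abundance in a sample is the median over that protein's features, and conditions are compared with a Welch t-statistic. The data, the names `protein_summary` and `compare`, and the condition labels are all hypothetical.

```python
from math import sqrt
from statistics import mean, median, variance

# Hypothetical toy data: log2 intensities of three features of one protein,
# measured in six samples (three control, three disease).
groups = ["control", "control", "control", "disease", "disease", "disease"]
log2_intensity = {
    ("P1", "pepA"): [20.0, 20.2, 19.8, 21.0, 21.2, 20.8],
    ("P1", "pepB"): [18.0, 18.2, 17.8, 19.0, 19.2, 18.8],
    ("P1", "pepC"): [22.0, 22.2, 21.8, 23.0, 23.2, 22.8],
}

def protein_summary(protein, data):
    """Relative protein abundance per sample (assumed stand-in summary)."""
    rows = [vals for (prot, _), vals in data.items() if prot == protein]
    # Center each feature to remove feature-specific intensity offsets.
    centered = [[v - mean(row) for v in row] for row in rows]
    # Median across features, sample by sample.
    return [median(col) for col in zip(*centered)]

def compare(summary, labels, a, b):
    """Log2 fold change (b minus a) and Welch t-statistic between two conditions."""
    xa = [s for s, g in zip(summary, labels) if g == a]
    xb = [s for s, g in zip(summary, labels) if g == b]
    diff = mean(xb) - mean(xa)
    se = sqrt(variance(xa) / len(xa) + variance(xb) / len(xb))
    return diff, diff / se

summary = protein_summary("P1", log2_intensity)
log2_fc, t_stat = compare(summary, groups, "control", "disease")
print(f"log2 fold change: {log2_fc:.2f}, Welch t: {t_stat:.2f}")
```

The per-sample `summary` corresponds to quantification in individual samples, and the fold change with its test statistic corresponds to significance analysis between conditions. A model-based approach would estimate both from all features jointly rather than in two steps.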

Experimental designs for quantitative mass spectrometry-based proteomics

Targeted proteomics is a method of choice for accurate and high-throughput quantification of predefined sets of proteins. Many workflows use isotope-labeled reference peptides for every target protein, which is time consuming and costly.

We report a statistical approach for quantifying full protein panels with a reduced set of reference peptides. This label-sparse approach achieves accurate quantification while reducing experimental cost and time. It is implemented in the software tool sparseQuant.

Cardinal

Statistical analysis of mass spectrometry-based images

Mass spectrometry-based imaging characterizes the chemical composition of biological samples with spatial resolution, but produces highly complex datasets.

We developed statistical methods for (1) image segmentation, which partitions a tissue into homogeneous regions, selects the informative ions, and characterizes the associated uncertainty, and (2) image classification, which assigns locations on the tissue to pre-defined classes, selects the informative ions, and estimates the resulting classification error.
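As a rough illustration of what image segmentation means here, the sketch below groups pixels by the similarity of their spectra. It uses plain k-means as a swapped-in technique, not the methods described above: it ignores spatial structure, does not select ions, and does not quantify uncertainty. The toy image, the starting centroids, and the function names are hypothetical.

```python
# Hypothetical toy image: 2 rows x 4 columns, two ion intensities per pixel,
# stored in row-major order. The left half is rich in ion 1, the right in ion 2.
n_rows, n_cols = 2, 4
spectra = [
    [10, 1], [9, 2], [1, 10], [2, 9],
    [11, 1], [10, 2], [2, 11], [1, 9],
]

def sq_dist(u, v):
    """Squared Euclidean distance between two spectra."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(points, init_centroids, n_iter=10):
    """Plain k-means with fixed starting centroids (deterministic)."""
    centroids = [list(c) for c in init_centroids]
    labels = []
    for _ in range(n_iter):
        # Assign each pixel to the nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        # Recompute each centroid as the mean spectrum of its pixels.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

labels, centroids = kmeans(spectra, [spectra[0], spectra[2]])

# Print the segmentation as an image: one label per pixel.
for r in range(n_rows):
    print(labels[r * n_cols:(r + 1) * n_cols])
```

Each centroid is the mean spectrum of a segment, so ions whose intensities differ most between centroids are the natural candidates for informative ions.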