Methodological Research

The methodological research of the faculty of the Department of Biostatistics and Bioinformatics involves developing and applying statistical methodology in search of answers to medical and public health questions. BIOS faculty members, post-docs, and students are involved in a variety of traditional (such as clinical trials and Bayesian methodology) and contemporary (such as imaging statistics and bioinformatics) research areas. Details on these methodological research areas and examples of specific research projects in each area can be found below.

View a list of BIOS graduate faculty and their research interests

Learn about our collaborative research 

Agreement studies have wide and important applications in biomedical research and clinical practice, since data from multiple observers or measurement methods commonly occur in such settings. For example, when a new assay or instrument is developed, it is important to assess whether the new method can reproduce the results of a traditional method or of a gold standard. Our department has established a strong research program in developing agreement methodology for complex outcomes in biomedical studies.

Research conducted by our faculty includes:

  • Development of new statistical methods to extend the existing agreement paradigm to handle multiple scale (continuous/ordinal) measurements
  • Development of nonparametric as well as parametric approaches to assess agreement for survival outcomes which involve censored or truncated observations
  • Development of agreement methods to investigate the alignment between traditional behavior/clinical outcomes and the emerging high-dimensional neuroimaging data
  • Development of agreement methods for assessing comparability among brain images acquired from multi-center neuroimaging studies
  • Agreement methods for data resulting from studies where each observer makes replicated or repeated readings on each study subject
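As a small illustration of what assessing agreement between a new method and a reference method can look like for continuous readings, the sketch below computes Lin's concordance correlation coefficient. It is one standard agreement index, chosen here only as an example and not as a description of the faculty's own methods. The function name and the readings are hypothetical.

```python
from statistics import fmean


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for paired continuous readings."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two readings")
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    # Divide-by-n moments, as in the usual sample version of the coefficient.
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # The squared mean difference penalizes systematic shifts between methods.
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# Hypothetical data: the new assay reads exactly one unit higher than the reference.
reference = [10.0, 12.0, 14.0, 16.0, 18.0]
new_assay = [11.0, 13.0, 15.0, 17.0, 19.0]
ccc = concordance_correlation(reference, new_assay)
print(round(ccc, 3))
```

In this made-up example the two methods are perfectly correlated, yet the coefficient is below 1 because of the constant shift, which is the distinction between correlation and agreement.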

Faculty: Ying Guo, Amita Manatunga, Limin Peng

Bayesian methods have become increasingly popular due to computational advances. Bayesian inference focuses on probability distributions characterizing unknown parameters through both prior knowledge and the data. One key advantage of Bayesian methods is the ability to integrate various sources of information in a model-based framework that can address complicated dependence structures in data often encountered in today's real-world problems.

Bayesian approaches are particularly useful in settings where: (1) the statistical model is highly complex, (2) the sample size is small, (3) prior knowledge needs to be incorporated in the analysis, (4) a comprehensive approach to quantify different sources of uncertainty is needed, and (5) the analysis of multimodal data involves structured priors.
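A minimal sketch of how prior knowledge and data combine, assuming the simplest textbook case: a Beta prior on a response probability updated with binomial data. The prior parameters and counts are hypothetical, and the faculty's models are far more complex than this conjugate example.

```python
def beta_binomial_update(prior_a, prior_b, successes, failures):
    """Posterior Beta parameters after observing binomial data under a Beta prior."""
    return prior_a + successes, prior_b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical prior belief: response rate around 0.2 (Beta(2, 8)).
prior = (2.0, 8.0)
# Hypothetical small study: 3 responders out of 5 patients.
posterior = beta_binomial_update(*prior, successes=3, failures=2)

print(beta_mean(*prior))      # prior mean
print(beta_mean(*posterior))  # posterior mean, between the prior mean and 3/5
```

With a sample this small, the posterior mean sits between the prior mean and the observed proportion, which is the sense in which prior knowledge is incorporated.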

Our department has an active group of faculty working in the development and application of Bayesian methodology for a wide range of applications, including bioinformatics (Dr. Zhaohui (Steve) Qin), time series and wearable devices (Dr. Krafty), environmental statistics (Drs. Howard Chang and Lance Waller), and high-dimensional data analysis (Dr. Suprateek Kundu).

Faculty: Howard Chang, Robert Krafty, Suprateek Kundu, Zhaohui (Steve) Qin, Lance Waller

Bioinformatics is an interdisciplinary field developing methods and software tools to analyze high-dimensional data generated from biomedical experiments. With biomedical data sets becoming larger, more diverse and more complex, information science and biostatistics play a larger role in the biomedical sciences. Bioinformatics is a very diverse field, with applications ranging across DNA sequence analysis, protein structure assessment, models of molecular evolution, and imaging analysis. Research in our department concentrates primarily on omics: genomics, epigenomics, and metabolomics.

High throughput technologies, such as next generation sequencing and mass spectrometry, produce massive amounts of data compounded with substantial noise and uncertainties, and present a rich variety of fascinating and challenging scientific questions for biostatistics researchers. Our faculty's research has resulted in novel statistical methodologies, as well as powerful and efficient algorithms and software developed for biomedical researchers. In addition, our faculty members collaborate extensively with biologists and clinicians at Emory and elsewhere to assist efforts to identify novel biological insights from these rapidly expanding experimental data.

Faculty: Karen Conneely, Suprateek Kundu, Zhaohui (Steve) Qin, Hao Wu, Xiangqin Cui

Precision medicine hinges on the development of valid biomarkers for disease diagnosis, disease prognosis, and prediction of response to specific therapeutic interventions. Fueled by the rapid recent advances in the scientific knowledge of molecular biology and high-throughput omics technologies, a large number of candidate biomarkers have been or are being identified. Statistical and computational methods play a critical role in rigorously evaluating these biomarkers and further developing clinically relevant prediction rules to ultimately improve and advance disease treatment and patient management.

Research topics include:

  • Evaluation and comparison of biomarkers in terms of clinical utilities
  • Covariate adjustment in biomarker assessment
  • Optimal combination of biomarkers in the development of multiplex prediction rules
  • Prediction rule development with high dimensional data
  • Dynamic prediction with longitudinal biomarker measurements

Faculty: Yijian (Eugene) Huang, Mary Kelley, Yi-An Ko

The goal of causal inference is to make inference on causation and treatment effects using data often collected from observational studies (subject to complications such as selection bias and confounding) as well as data from complex clinical trials. Our faculty members are actively engaged in methodological and collaborative research in this area.

Examples include:

  • Development of a general class of hybrid trial designs that combine features of treatment randomization and patient choice of treatments. Such trials are useful in behavioral intervention studies where treatment assignments cannot be blinded and strong motivation is often required to maintain compliance
  • Extension of propensity score approaches to non-binary treatment regimens and other approaches related to generalized propensity scores
  • Semiparametric methods for estimating effect of non-binary treatment regimens
  • Development of efficiency theory for estimators of causal effects
  • Developing robust methods for drawing inference about causal effects
  • Incorporation of machine learning into estimation of causal effects

Faculty: David Benkeser, Yuan Liu

Clinical trials (prospective studies to evaluate the effect of interventions in humans under pre-specified conditions) represent the gold standard for quantifying the health impacts of given treatments and interventions. Our faculty members maintain active research in advancing the practice of clinical trials by enhancing experimental designs, developing novel analytic tools, and providing user-friendly statistical software. Examples include:

  • Improving the accuracy of maximum tolerated dose estimation and trial efficiency of Phase I clinical trials of toxicity by treating toxicity response as a quasi-continuous variable and fully utilizing all toxicities
  • Extending Phase I trials with dose escalation with overdose control designs to allow under-dose control, flexible patient administration, and full utilization of partially completed data
  • Improving the power and efficiency of Phase II trials by treating tumor clinical response as a continuous variable (percentage of tumor shrinkage or increase) instead of a categorical variable
  • Increasing the success rate of Phase III clinical trials in cancer by establishing a stronger relationship between tumor responses in Phase II trials and survival outcomes in Phase III trials
  • Proposing hybrid trial designs which combine features of randomization and patient choice in the assignment of treatments
  • Proposing Bayesian methods for monitoring and predicting patient accrual in ongoing clinical trials in order to incorporate multiple levels of uncertainty
  • Development and testing of a measure of treatment success entitled the Illness Density Index. This index uses the longitudinal area under the response curve and is particularly useful for medical device studies, where switching treatments is less feasible and long-term outcomes are of great interest.

Faculty: Mary Kelley, Michael Kutner, David Benkeser, Kirk Easley

Data science involves concepts from statistics, informatics, and computer science to draw inference from complicated, large (and often very messy) data sets. Methods involve bioinformatics, machine learning, causal inference, Bayesian inference, and spatial statistics, among many other areas.

Biostatistics and Bioinformatics faculty participate in methodologic research in data science relating to large scale data management for global health studies; high-throughput genomics, transcriptomics, and single-cell RNA metabolomics; geographic information systems and spatial statistics for environmental health; remote sensing satellites; immunology assays; and electronic medical records.

Faculty: Zhaohui (Steve) Qin, Hao Wu, Lance Waller, David Benkeser, Ben Risk, Traci Leong, Xiangqin Cui

Neuroimaging techniques have become an increasingly important tool in clinical research to help diagnose, treat and prevent brain diseases. In recent years, imaging statistics has emerged as one of the fastest growing research areas in biostatistics.

The main goal of imaging statistics is to develop and apply state-of-the-art statistical methods to help extract the most relevant and accurate information from neuroimaging data to advance scientific understanding of human brain function among healthy as well as diseased subjects. Our department hosts one of the first imaging statistics research centers in the country, the Center for Biomedical Imaging Statistics (CBIS). CBIS currently develops statistical methods for data acquired from various imaging modalities including functional and structural magnetic resonance imaging, magnetic resonance spectroscopic imaging, and positron emission tomography. CBIS faculty and students have conducted statistical methodological research in:

  • Brain network analysis using analytical tools such as independent component analysis and graphical models to understand brain architecture and neural circuits
  • Imaging-based predictive modeling that aims to extract features in imaging data to predict individual disease status and treatment response
  • Reproducibility of imaging studies using analytical tools such as agreement methodologies and meta-analysis
  • Imaging genetics that integrate neuroimaging and genetic data to investigate how genetic variations impact brain structure and function which further leads to alterations in subjects' behavioral and psychiatric outcomes
  • Impacts of image acquisition, reconstruction, and preprocessing methods with applications to experimental design

In addition to methodological research, CBIS has collaborated with imaging researchers from Emory's Departments of Psychiatric and Behavioral Sciences, Radiology, and Biomedical Engineering, as well as the Winship Cancer Institute.

Faculty: Ying Guo, Robert Krafty, Suprateek Kundu, Benjamin Risk

Latent class analysis is a powerful approach to elucidate the subtypes of a disease, or more generally, to explore the heterogeneity underlying a population. This methodology is rooted in a rigorous, fully parametric framework and thus allows more efficient inferences than heuristic clustering procedures, but this approach also faces a number of challenges. Our faculty are developing semiparametric methods of latent class analysis to make the approach more robust and scalable. We are also developing joint models to incorporate information from both a high-dimensional collection of longitudinal trajectories and survival outcomes subject to competing risks. In addition, we are investigating methods to flexibly incorporate high-dimensional covariates into these latent class models.

Faculty: John Hanfelt, Amita Manatunga, Limin Peng, Mary Kelley, Suprateek Kundu

Machine learning and artificial intelligence describe techniques for automated recognition of patterns in data. From image recognition to prediction of clinical disease, these techniques have myriad applications in modern society. Biostatisticians play an important role in developing and understanding the theoretical properties of such techniques. Examples include:

  • Development of new machine learning algorithms for prediction of clinical disease and optimal allocation of treatments
  • Methods for estimating the predictive performance of machine learning algorithms
  • Methods for optimal selection of tuning parameters for machine learning algorithms
  • Methods for scalable machine learning for online data collection

Faculty: David Benkeser, Donald Lee 

Mental disorders are the leading cause of disability in the USA and mental health problems constitute a large part of the burden of disease worldwide. To address the burden of mental illness, the National Institute of Mental Health encourages the development of computational approaches that may provide novel ways to understand relationships among complex, large datasets to further the understanding of the underlying pathophysiology of mental diseases. 

Our faculty members in the department are engaged in methodological and collaborative research motivated by both cross-sectional and longitudinal mental health studies (depression, post-traumatic stress disorder) conducted at Emory University. These studies include multi-dimensional data with clinical assessments, behavioral symptoms, and biological measurements such as neuroimaging and psychophysiological data. Our faculty aim to advance methodology for analyzing such data in order to extract more effectively the information that is predictive of disease, to improve the understanding of individual variability in clinical and neurobiological phenotypes, and to provide the capacity to handle both cross-sectional and longitudinal data.

Areas of methodological interest include but are not limited to:

  • Methods for studying relationships of outcomes measured from different modalities that are diverse and complex, often in different scales (continuous, ordinal) and of different data representations (scalar, vector, matrix) 
  • Latent class analyses for understanding the unobserved heterogeneity in mental health diseases.
  • Methods for identifying relevant features in high-dimensional neuroimaging data that align with specific symptom clusters; and assessing agreement and calibrating images from multi-center studies.
  • Methods for designing and evaluating self-administered instruments consisting of several items in measuring the disease phenotype.
  • Tensor regression methods and tensor quantile regression methods that can simultaneously achieve accurate prediction of clinical outcomes and efficient feature extraction from high dimensional neuroimaging biomarkers
  • Tensor response quantile regression methods and global inference that can achieve a robust understanding of the heterogeneity in high-dimensional neuroimaging phenotypes in terms of environmental factors.

Faculty: Ying Guo, Limin Peng, Amita Manatunga

The human microbiome is the community of microbes in and on the human body. Recent developments in high-throughput sequencing have allowed all the microbes in a community to be identified in a single, simple experiment such as 16S rRNA gene sequencing or metagenome shotgun sequencing. Many of these 16S studies have produced headlines in the popular press and tantalizing hints in the scientific literature that conditions as widely varied as obesity, rheumatoid arthritis, autism, preterm birth, and Alzheimer's disease may be related to the microbiome. Further, interventions to change the microbiome and so affect human health are easy to imagine. For example, fecal microbial transplant has recently been shown to be a highly effective, low-cost treatment for persistent Clostridium difficile infection.

Although headlines are being generated, the basic statistical science required to fully understand and analyze data has not kept pace to answer even basic questions. Here we outline only a few of the many data complexities and statistical challenges.

  • Because there are typically hundreds or thousands of species cohabiting at a site, the data that represent the microbial community are high-dimensional.
  • Most operational taxonomic units (OTUs) are rare (that is, absent from a large number of samples), due to both biological and technical phenomena: some organisms are found in only a small percentage of samples, whereas others are simply not detected owing to insufficient sequencing depth.
  • Human microbiome studies frequently adopt complex study designs such as paired, clustered, or longitudinal schemes.
  • The presence of confounding variables (e.g., gender and ancestry) and more sophisticated outcomes (e.g., possibly censored survival time) are inherent issues in many observational studies of the human microbiome.
  • The complicated process of recruiting samples in medical contexts typically results in samples being sequenced in different batches, leading to strong batch effects.
  • Due to the high dimensionality of microbiome data, it is important to adjust findings for multiple comparisons.
  • Complex issues such as causal inference and mediation analysis (e.g., how much of the effect of baby aspirin on hazard of myocardial infarction is due to a change in the gut microbiome) have not been addressed in the context of analysis of 16S microbiome data.
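The multiple-comparisons adjustment mentioned in the list above can be illustrated with the Benjamini-Hochberg step-up procedure, a widely used way to control the false discovery rate when many taxa are tested at once. This is a generic sketch, not a method attributed to the faculty; the function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected at FDR level alpha."""
    m = len(p_values)
    # Indices of the hypotheses ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical per-taxon p-values from an association test.
p_values = [0.60, 0.001, 0.041, 0.008, 0.039]
decisions = benjamini_hochberg(p_values, alpha=0.05)
print(decisions)
```

Note that the taxa with p-values 0.039 and 0.041 would pass an unadjusted 0.05 threshold but are not rejected after adjustment.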

Faculty: Yijuan Hu, Glen Satten

Missing data are ubiquitous in medical and epidemiologic research. Specific examples include survey nonresponse, missed clinical visits by study subjects, patient drop out, respondents refusing to answer certain items on a questionnaire, or data lost in transcription. Inadequate handling of missing data is known to lead to biased and less precise results.

Consequently, the development and application of methods dealing with missing data draws substantial interest and remains a very active area of research. To this end, many statistical methods have been developed for conducting appropriate statistical inference in the presence of missing data. Four common methodologies for handling missing data are likelihood-based approaches, multiple imputation, Bayesian methods, and semi-parametric methods including those based on inverse-probability weighted estimating equations.
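To make the idea of inverse-probability weighting concrete, the sketch below estimates a mean when some outcomes are missing, weighting each observed value by the inverse of its probability of being observed. It assumes those probabilities are known, which in practice they are not (they would be estimated from a model). The function name and the data are hypothetical.

```python
def ipw_mean(values, observed, response_prob):
    """Normalized inverse-probability-weighted mean of the observed values."""
    weighted_sum = 0.0
    weight_total = 0.0
    for y, is_observed, prob in zip(values, observed, response_prob):
        if is_observed:
            weight = 1.0 / prob  # rarely observed kinds of subjects count for more
            weighted_sum += weight * y
            weight_total += weight
    return weighted_sum / weight_total


# Hypothetical data: two subjects who always respond (outcome 2.0) and two who
# respond half the time (outcome 6.0), one of whom is missing.
values = [2.0, 2.0, 6.0, None]
observed = [True, True, True, False]
response_prob = [1.0, 1.0, 0.5, 0.5]

complete_case = sum(v for v, r in zip(values, observed) if r) / sum(observed)
weighted = ipw_mean(values, observed, response_prob)
print(complete_case, weighted)
```

In this toy example the complete-case mean is pulled toward the always-responding group, while the weighted mean recovers the value that would have been obtained had every outcome been observed.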

Measurement error in covariates, or misclassification of categorical data, is also a common issue in medical and epidemiologic research. Examples include measures of CD4 count and viral load in HIV/AIDS studies, blood pressure in cardiovascular disease research, and dietary intake in cancer prevention. Ignoring this measurement error can result in substantial estimation bias in analyses and misinterpretation of results.

Statistical methods to account for covariate measurement error have been under active development, particularly functional modeling methods for nonlinear models. Such methods do not impose distributional assumptions on the unobserved true covariates and are thus appealing for their robustness. Structural measurement error models also permit flexibility in such settings, despite the need for further modeling or distributional assumptions. Similar methods have been extended to handle complex missing data mechanisms based on supplemental sampling designs.

Our faculty is particularly interested in advancing statistical methodology in:

  • Robust imputation methods
  • Imputation methods for big data
  • Functional modeling methods for covariate measurement error
  • Structural modeling methods to handle missing data, misclassification, and measurement error via efficient validation or reassessment study designs
  • Methods to adjust for preferential sampling in observational studies

Faculty: Yijian (Eugene) Huang, Bob Lyles, Suprateek Kundu

Public health data are increasingly being collected with geospatial information. The analysis of spatially referenced data provides opportunities for a wide variety of methodological and applied statistical research. These approaches often involve the use of spatially correlated random effects within generalized linear mixed models to accurately estimate fixed effects accounting for the presence of spatial correlation.

Research typically uses geographic information systems to manage and visualize data, and Bayesian hierarchical models to examine associations between outcomes and possible explanatory variables. The department's faculty members are involved in numerous research projects developing spatio-temporal models for a wide range of applications. Examples include:

  • Infectious disease (spatial dynamics of raccoon rabies, malaria, and schistosomiasis)
  • Ecology (spatial patterns in sea turtle nesting)
  • Epidemiology (measuring and mapping disparities in disease burdens and accessibility to health care/sanitation)
  • Exposure assessment (data assimilation of satellite imagery and ground monitor exposure data)
  • Environmental health (estimating the health impacts of air quality, extreme heat, and climate change)

Faculty: Howard Chang, Lance Waller, Ben Risk

Statistical genetics focuses on disease gene mapping, that is, linking inherited genetic variations to disease in order to draw inference on genetic drivers of disease risk. Statistical genetics is mostly centered on human genetics, where results of the Human Genome Project are changing the practice of medicine and public health, allowing human genetics to play a more central role in all the biomedical sciences.

Our faculty members develop new statistical methodology to explore these issues as well as new algorithms and popular software. Specific research foci include methods for design and analysis of large-scale data sets from genome-wide association studies and next-generation sequencing studies, using both unrelated and related individuals. These developments present remarkable opportunities for the prevention and cure of human diseases, allowing investigators to work at the interface between human genetics and the mathematical sciences. In addition, our faculty members collaborate closely with geneticists, molecular biologists, clinicians, and bioinformaticians to address real-world questions of human health and disease.

Faculty: Karen Conneely, Michael Epstein, Yijuan Hu, Glen Satten

Our work in the area of biostatistics and bioinformatics in cancer research continues to grow with the support of faculty members like Drs. Yuan Liu and Jeffrey Switchenko from the Biostatistics and Bioinformatics Shared Resource (at the Winship Cancer Institute). Faculty research is motivated by state-of-the-art cancer data arising from ongoing retrospective and prospective study designs.

Drs. Liu and Switchenko have experience with the analysis of large cancer databases, such as the National Cancer Data Base and the Surveillance, Epidemiology, and End Results (SEER) Program. Dr. Kutner leads the coordination of Biostatistics Cores for Center Grants and Program Project Grants at the Winship Cancer Institute. Dr. Kutner has developed stopping rules for Phase III clinical trials.

Faculty: Michael Kutner, Yuan Liu, Jeffrey Switchenko

Biostatisticians have made and continue to make important contributions with respect to the study of infectious diseases. Statistical methods evaluate how well vaccines and other interventions prevent infections, morbidity, and mortality. They are also used in predicting the incidence of infectious diseases in different locations over time. 

Statistical analyses of infectious disease data are challenging since standard assumptions of independence among individuals often do not apply. Complicating issues even further, infection and illness status can be misclassified easily. Moreover, exposure to an infectious agent is often difficult to quantify (we usually do not know who infected whom). Faculty at the Department of Biostatistics and Bioinformatics use statistical models and methods to address these and related issues. Several faculty are involved in collaborative research activities associated with infectious diseases (for example, the Biostatistics and Bioinformatics Core for the Emory Center for AIDS Research).

Faculty: Michael Haber, Christina Mehta, Lance Waller, David Benkeser, Bob Lyles, Howard Chang, Kirk Easley, Max Lau

In addition to methodological work directly related to the health sciences, our faculty members also engage in fundamental research in statistical theory.

One area of theoretical research addresses the problem of "many nuisance parameters", which arises when there is substantial heterogeneity in the population that is not of main scientific interest but that must be accounted for in order to arrive at valid inference and robust conclusions. The presence of many nuisance parameters is pathological: it invalidates standard methods of statistical inference. Department faculty members have developed approaches to reduce or eliminate the harmful effects of nuisance parameters in either the full likelihood context (e.g., relaxed conditional likelihood under a rectangular array asymptotic setting) or the estimating function context (e.g., composite conditional score functions, orthogonal second-order locally ancillary estimating functions, and G-ancillary estimating functions). These methods are designed to be computationally feasible and robust while avoiding unnecessary modeling assumptions.

Another area of research in statistical theory concerns the application and adaptation of empirical process methods to provide accurate and reliable inference for complex data structures. Examples include development of classification strategies, study of global quantile regression in high-dimensional settings, and efficient estimation and robust inference for causal effects.

Faculty: John Hanfelt, Yijian (Eugene) Huang, Limin Peng, David Benkeser, Suprateek Kundu

Epidemiology and environmental health represent two highly important traditional disciplines in the broad field of public health. Our faculty members are engaged in methodological and collaborative research motivated by studies such as: air pollution and health, HIV and cancer epidemiology, and investigations of associations between biomarker levels (e.g., in blood or urine) and reproductive health outcomes.

Key ongoing public health problems provide the motivation for much of the methodological research conducted by department faculty in this area. Areas of particular interest include but are not limited to:

  • Methods for causal inference in observational studies
  • Methods for handling missing and mismeasured data
  • Methods for assessing agreement between multiple biomarkers of exposure and/or disease
  • Spatial analysis and geographic information systems
  • Modeling the dynamics of outbreaks of infectious disease in space and time
  • Survival analysis and quantile regression relating to limit-of-detection of environmental exposures
  • High-dimensional measures of lifetime exposures (the exposome)
  • Statistical genetics and estimation of gene-environment interactions
  • Imaging statistics, including remote sensing images as markers of environmental exposure
  • Methods to account for pooled laboratory specimens and non-detectable measurements

Collaborative ties with faculty in Rollins' Department of Epidemiology and Department of Environmental Health enhance the opportunities and breadth of the impact associated with ongoing research in these areas.  

Faculty: Howard Chang, Julie Clennon, Bob Lyles, Amita Manatunga, Limin Peng, Lance Waller

Survival analysis addresses time-to-event data, which arise routinely in clinical trials and observational follow-up studies. One distinguishing focus of survival analysis is the ability to draw information from incomplete observations of time-to-event responses in real data settings, addressing complications known as censoring, competing risks, and truncation.

Methodology has been well established for traditional types of survival data, where assumptions (such as independent censoring and independent truncation) are deemed reasonable. Techniques such as the Kaplan-Meier curve, log-rank test, and Cox's proportional hazards regression model have been well accepted and are widely used in many areas across biomedical research. Despite the success of these standard survival analysis techniques, there has been increasing attention to their limitations in practical scenarios where their underlying assumptions are considered unrealistic.
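As a reminder of how the Kaplan-Meier estimator mentioned above uses censored observations, here is a minimal sketch under the standard assumption of independent right-censoring. The function name and the follow-up data are hypothetical, and real analyses would rely on an established survival analysis package.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimates as (time, S(time)) at each distinct event time.

    events[i] is True for an observed event and False for a right-censored time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths > 0:
            # Subjects censored at t still count as at risk at t.
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


# Hypothetical follow-up times in months; False marks a censored subject.
times = [2, 3, 3, 5, 8]
events = [True, True, False, True, False]
curve = kaplan_meier(times, events)
print(curve)
```

Censored subjects contribute to the risk set up to their censoring time and then drop out, which is how incomplete observations still inform the estimate.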

There are also many interesting research problems arising from the rapid development of new, high-dimensional data structures applied to new investigative goals. Examples of such problems include assessment of dynamic survival processes, screening and selection of high-dimensional survival predictors, and delineation of fine-tuned or personalized treatment effects on survival. These challenges provide an exciting outlook for survival analysis methodological research in the future, requiring creative integration with other modern developments of statistical techniques.

Dynamic regression provides another research direction currently under active development by department faculty. Classical models, including the proportional hazards model and accelerated failure time model, presume constant effects of covariates. Such a constancy assumption, however, is not realistic in many applications where effects of covariates may actually evolve over time. For instance, the effectiveness of an AIDS drug is typically eroded over time due to drug resistance. To address this issue, quantile regression provides a popular and flexible means, allowing covariate effects to vary across data quantiles. Department faculty members are actively involved in developing quantile regression methods that can appropriately handle special features of survival data.
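The building block of quantile regression is the check (pinball) loss, whose minimizer is a quantile rather than a mean. The sketch below shows this for the intercept-only case with fully observed data; it deliberately ignores censoring and covariates, which are exactly the complications the methods described above are designed to handle. Function names and data are hypothetical.

```python
def check_loss(residual, tau):
    """Check (pinball) loss of a single residual at quantile level tau."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def best_constant(values, tau):
    """Observed value minimizing the total check loss, i.e. a tau-th sample quantile."""
    return min(values, key=lambda q: sum(check_loss(v - q, tau) for v in values))


# Hypothetical fully observed outcomes with one extreme value.
sample = [1.0, 2.0, 3.0, 4.0, 100.0]
median_fit = best_constant(sample, tau=0.5)
upper_fit = best_constant(sample, tau=0.9)
print(median_fit, upper_fit)
```

Changing tau moves the fitted value to a different part of the distribution, which is what allows covariate effects to differ across quantiles once covariates are added.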

Faculty: Ying Guo, Yijian (Eugene) Huang, Amita Manatunga, Limin Peng, Yuan Liu, Donald Lee