Analyzing the overall effects of the microbiome abundance data with a Bayesian predictive value approach
Abstract
Microbiome abundance data are known to be over-dispersed and sparse count data. Among the various zero-inflated models, the zero-inflated negative binomial (ZINB) model and the zero-inflated beta-binomial (ZIBB) model are two methods used to analyze such data. Both ZINB and ZIBB have two sets of parameters, one modeling the zero-inflation part and one modeling the count part. Most previous methods have focused on inference about separate case-control effects for the zero-inflation part and the count part. In a case-control microbiome study, however, the primary interest usually lies in inference on, and a single interpretation of, the overall unconditional mean of the microbiome abundance (also known as the overall effect). Here, we propose a Bayesian predictive value (BPV) approach to estimate the overall effect of the microbiome abundance. The approach is implemented with the R package brms, so the model parameters are estimated with the two Markov chain Monte Carlo (MCMC) algorithms used in Stan. We performed simulations and real data applications to compare the proposed approach with the R package glmmTMB combined with a simulation method, in terms of estimation and inference for the ratio of the overall effects of the two groups in a case-control study. The results show that the BPV approach performs better than glmmTMB with the simulation method: it yields lower absolute biases and relative absolute biases, and coverage probabilities closer to the nominal level, especially when the sample size is small and the zero-inflation rate is high.
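To make the notion of an overall effect concrete, the following is a minimal sketch, not the BPV procedure itself. It relies only on the standard identity that the unconditional mean of a zero-inflated count model is (1 - zero-inflation probability) times the mean of the count part. The function names, the parameter values, and the three "posterior draws" are hypothetical and chosen purely for illustration.

```python
from statistics import median


def overall_mean(zero_prob, count_mean):
    """Unconditional mean of a zero-inflated count model.

    E[Y] = (1 - zero_prob) * count_mean, where zero_prob is the
    zero-inflation probability and count_mean is the mean of the count part.
    """
    return (1.0 - zero_prob) * count_mean


def overall_ratio(case, control):
    """Ratio of overall means; each argument is a (zero_prob, count_mean) pair."""
    return overall_mean(*case) / overall_mean(*control)


# Hypothetical parameter values (assumptions, not taken from the paper).
control = (0.5, 20.0)  # 50% structural zeros, count-part mean 20
case = (0.6, 40.0)     # 60% structural zeros, count-part mean 40
ratio = overall_ratio(case, control)

# Hypothetical posterior draws of (case, control) parameters. In practice
# these would come from an MCMC fit; here they are made up for illustration.
draws = [
    ((0.58, 38.0), (0.52, 21.0)),
    ((0.60, 40.0), (0.50, 20.0)),
    ((0.62, 43.0), (0.49, 19.0)),
]
ratio_draws = [overall_ratio(c, k) for c, k in draws]
posterior_median = median(ratio_draws)

print(f"ratio of overall means: {ratio:.3f}")
print(f"median over hypothetical draws: {posterior_median:.3f}")
```

In this toy setting the count-part mean doubles in the case group, but the higher zero-inflation probability pulls the ratio of overall means down to about 1.6. Neither of the two separate effects conveys this on its own, which is why a single summary of the overall effect is of interest.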