This function performs common preprocessing steps for mass spectrometry (MS)-like omics datasets, including QC sample removal, zero-to-NA conversion, feature prevalence filtering, transformation, and feature-wise value imputation.
Arguments
- X
A numeric data frame or matrix (samples in rows, features in columns).
- remove_ids
A regex or character vector used to filter out rows in X (e.g. QCs). Set to NULL to skip.
- min_prev
Numeric between 0 and 1. Minimum non-missing prevalence threshold. Zeros are first converted to NA.
- rename_feat
Logical. If TRUE, features will be renamed as "feat_n" and the original labels stored.
- transform
One of "none", "log", or "sqrt".
- log_base_num
Numeric logarithm base. Required if transform = "log".
- impute
One of "none", "min_val", or "QRILC". Note: imputeLCMD::impute.QRILC() requires log-transformed data, so a log transform is forced internally regardless of the transform setting.
- min_val_factor
Numeric >= 1. Scaling factor for min value imputation.
- platform
Whether the data were generated by mass spectrometry ("ms") or nuclear magnetic resonance spectroscopy ("nmr"); the latter allows negative values in the matrix.
- seed
Optional integer. If provided, sets the random seed for reproducible imputeLCMD::impute.QRILC() imputation results.
- verbose
Logical. Show messages about the processing steps.
- ...
Extra arguments passed to imputeLCMD::impute.QRILC().
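The steps controlled by the arguments above can be sketched in base R. This is only an illustration under stated assumptions, not the function's actual implementation: the toy matrix, the variable names, and in particular the rule that minimum-value imputation divides the (transformed) feature minimum by min_val_factor are assumptions made for the example.

```r
# Illustrative base-R sketch of the documented steps (not the package's code).
# Toy intensity matrix: samples in rows, features in columns (hypothetical data).
X <- matrix(
  c(10,  0, 5, 0,
    20,  4, 0, 0,
    40,  8, 5, 0,
    80, 16, 5, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("S1", "S2", "QC_1", "S3"), c("a", "b", "c", "d"))
)

# remove_ids: drop rows whose names match a regex (here, QC samples)
remove_ids <- "^QC"
X <- X[!grepl(remove_ids, rownames(X)), , drop = FALSE]

# Zeros are converted to NA before prevalence is computed
X[X == 0] <- NA

# min_prev: keep features whose non-missing fraction reaches the threshold
min_prev <- 0.5
prevalence <- colMeans(!is.na(X))
X <- X[, prevalence >= min_prev, drop = FALSE]

# transform / log_base_num: apply the chosen transformation
transform <- "log"
log_base_num <- 2
X_t <- switch(transform,
  none = X,
  log  = log(X, base = log_base_num),
  sqrt = sqrt(X)
)

# impute = "min_val" (assumed rule): replace each NA with the feature's
# minimum observed value divided by min_val_factor
min_val_factor <- 2
for (j in seq_len(ncol(X_t))) {
  is_missing <- is.na(X_t[, j])
  X_t[is_missing, j] <- min(X_t[, j], na.rm = TRUE) / min_val_factor
}

X_t
```

In this toy example feature "d" is dropped because it is observed in only one of the three remaining samples, which is below the 0.5 prevalence threshold.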
References
Lazar, C., Gatto, L., Ferro, M., Bruley, C., & Burger, T. (2016). Accounting for the multiple natures of missing values in label-free quantitative proteomics data sets to compare imputation strategies. Journal of Proteome Research, 15(4), 1116–1125. doi:10.1021/acs.jproteome.5b00981
Wei, R., Wang, J., Su, M., Jia, E., Chen, S., Chen, T., & Ni, Y. (2018). Missing value imputation approach for mass spectrometry-based metabolomics data. Scientific Reports, 8, 663. doi:10.1038/s41598-017-19120-0
See also
imputeLCMD::impute.QRILC() for imputing missing values.