This function imputes missing values using a user-specified imputation method.
Usage
impute_na(
  df,
  method = "minProb",
  tune_sigma = 1,
  q = 0.01,
  maxiter = 10,
  ntree = 20,
  n_pcs = 2,
  seed = NULL
)
Arguments
- df
  A raw_df object (output of create_df) containing missing values, or a norm_df object after performing normalization.
- method
  Imputation method to use. Default is "minProb". Other available methods: "minDet", "RF", "kNN", and "SVD".
- tune_sigma
  A scalar used in the "minProb" method to control the standard deviation of the Gaussian distribution from which random values are drawn for imputation. Default is 1.
- q
  A scalar used in the "minProb" and "minDet" methods to obtain a low intensity value for imputation. q should be set to a very low value. Default is 0.01.
- maxiter
  Maximum number of iterations to perform when using the "RF" method. Default is 10.
- ntree
  Number of trees to grow in each forest when using the "RF" method. Default is 20.
- n_pcs
  Number of principal components to calculate when using the "SVD" method. Default is 2.
- seed
  Numerical. Random number seed. Default is NULL.
Details
The impute_na function imputes missing values using a user-specified imputation method from the available options: minProb, minDet, kNN, RF, and SVD.
Note: Some imputation methods may require that the data be normalized prior to imputation.
Make sure to fix the random number seed with seed for reproducibility.
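To make the roles of q, tune_sigma, and seed more concrete, the base R sketch below illustrates the general idea of minimum-value imputation on a toy matrix. It is a simplified illustration under stated assumptions, not the code that impute_na or imputeLCMD actually runs: the helper names impute_min_det_sketch and impute_min_prob_sketch, the toy data, and the use of the median row-wise standard deviation as the spread are all choices made for this example.

```r
# Toy intensity matrix: rows are proteins, columns are samples (hypothetical data).
x <- matrix(
  c(20.1, 21.3, NA,   22.8,
    19.5, NA,   23.0, 24.1,
    18.7, 20.9, 22.2, NA),
  nrow = 4,
  dimnames = list(paste0("prot", 1:4), paste0("s", 1:3))
)

# Hypothetical helper: replace each NA with a low quantile of its own sample.
impute_min_det_sketch <- function(x, q = 0.01) {
  for (j in seq_len(ncol(x))) {
    low <- quantile(x[, j], probs = q, na.rm = TRUE, names = FALSE)
    x[is.na(x[, j]), j] <- low
  }
  x
}

# Hypothetical helper: replace each NA with a random draw centred on that low value.
impute_min_prob_sketch <- function(x, q = 0.01, tune_sigma = 1, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  # Assumed spread: median of the row-wise standard deviations, scaled by tune_sigma.
  spread <- median(apply(x, 1, sd, na.rm = TRUE), na.rm = TRUE) * tune_sigma
  for (j in seq_len(ncol(x))) {
    miss <- is.na(x[, j])
    low <- quantile(x[, j], probs = q, na.rm = TRUE, names = FALSE)
    x[miss, j] <- rnorm(sum(miss), mean = low, sd = spread)
  }
  x
}

det <- impute_min_det_sketch(x, q = 0.01)
prob_a <- impute_min_prob_sketch(x, seed = 3312)
prob_b <- impute_min_prob_sketch(x, seed = 3312)

anyNA(det)                 # FALSE once every NA has been replaced
identical(prob_a, prob_b)  # TRUE: the same seed gives the same imputed values
```

In this sketch, a smaller q pulls the imputed values toward the lowest observed intensities, and a larger tune_sigma widens the random draws. Without a fixed seed, repeated runs of the random variant would generally differ.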
References
Lazar, Cosmin, et al. "Accounting for the multiple natures of missing values in label-free quantitative proteomics data sets to compare imputation strategies." Journal of Proteome Research 15.4 (2016): 1116-1125.
See also
More information on the available imputation methods can be found in their respective packages.
For the minProb and minDet methods, see the imputeLCMD package.
For the Random Forest (RF) method, see missForest.
For the SVD method, see pca from the pcaMethods package.
Examples
## Generate a raw_df object with default settings. No technical replicates.
raw_df <- create_df(
  prot_groups = "https://raw.githubusercontent.com/caranathunge/promor_example_data/main/pg1.txt",
  exp_design = "https://raw.githubusercontent.com/caranathunge/promor_example_data/main/ed1.txt"
)
#> 0 empty row(s) removed.
#> 0 empty column(s) removed.
#> 80 protein(s) (rows) only identified by site removed.
#> 65 reverse protein(s) (rows) removed.
#> 42 protein potential contaminant(s) (rows) removed.
#> 1923 protein(s) identified by 2 or fewer unique peptides removed.
#> Zeros have been replaced with NAs.
#> Data have been log-transformed.
## Impute missing values in the data frame using the default minProb
## method.
imp_df1 <- impute_na(raw_df, seed = 3312)
# \donttest{
## Impute using the RF method with the number of iterations set at 5
## and number of trees set at 100.
imp_df2 <- impute_na(raw_df,
  method = "RF",
  maxiter = 5, ntree = 100,
  seed = 3312
)
#> missForest iteration 1 in progress...done!
#> estimated error(s): 0.136547
#> difference(s): 0.001001133
#> time: 27.744 seconds
#>
#> missForest iteration 2 in progress...done!
#> estimated error(s): 0.1323305
#> difference(s): 2.465832e-06
#> time: 25.382 seconds
#>
#> missForest iteration 3 in progress...done!
#> estimated error(s): 0.1320162
#> difference(s): 9.145074e-07
#> time: 24.98 seconds
#>
#> missForest iteration 4 in progress...done!
#> estimated error(s): 0.1319143
#> difference(s): 7.914833e-07
#> time: 29.267 seconds
#>
#> missForest iteration 5 in progress...done!
#> estimated error(s): 0.1322113
#> difference(s): 6.409383e-07
#> time: 25.618 seconds
#>
## Using the kNN method.
imp_df3 <- impute_na(raw_df, method = "kNN", seed = 3312)
# }
## Using the SVD method with n_pcs set to 3.
imp_df4 <- impute_na(raw_df, method = "SVD", n_pcs = 3, seed = 3312)
#> change in estimate: 0.007642841
## Using the minDet method with q set at 0.001.
imp_df5 <- impute_na(raw_df, method = "minDet", q = 0.001, seed = 3312)
## Impute a normalized data set using the kNN method
imp_df6 <- impute_na(ecoli_norm_df, method = "kNN")
#> Warning: Nothing to impute, because no NA are present (also after using makeNA)
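The warning in the last example indicates that the input contained no missing values. As a minimal sketch, one way to check for missing values before imputing is shown below in base R. The matrix here is a hypothetical toy stand-in, not a promor object, and the variable names are chosen for illustration only.

```r
# Hypothetical toy matrix standing in for an intensity table.
intensities <- matrix(c(21.4, NA, 19.8, 22.1, 20.5, NA), nrow = 3)

# Count missing cells overall and per column.
n_missing <- sum(is.na(intensities))
missing_per_col <- colSums(is.na(intensities))

if (n_missing == 0) {
  message("No missing values; imputation is not needed.")
} else {
  message(n_missing, " missing value(s) found.")
}
```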