Impute missing values

This function imputes missing values using a user-specified imputation method.

Usage

impute_na(
  df,
  method = "minProb",
  tune_sigma = 1,
  q = 0.01,
  maxiter = 10,
  ntree = 20,
  n_pcs = 2,
  seed = NULL
)

Arguments

df: A raw_df object (output of create_df) containing missing values or a norm_df object after performing normalization.
method: Imputation method to use. Default is "minProb". Available methods: "minProb", "minDet", "RF", "kNN", and "SVD".
tune_sigma: A scalar used in the "minProb" method to control the standard deviation of the Gaussian distribution from which random values are drawn for imputation. Default is 1.
q: A scalar used in "minProb" and "minDet" methods to obtain a low intensity value for imputation. q should be set to a very low value. Default is 0.01.
maxiter: Maximum number of iterations to be performed when using the "RF" method. Default is 10.
ntree: Number of trees to grow in each forest when using the "RF" method. Default is 20.
n_pcs: Number of principal components to calculate when using the "SVD" method. Default is 2.
seed: Numerical. Random number seed. Default is NULL.
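
To make the roles of q and tune_sigma more concrete, here is a minimal base R sketch of the general idea behind "minDet"-style and "minProb"-style imputation. It is not promor's implementation: the toy matrix, the per-sample quantile, and the standard deviation estimate (median of protein-wise standard deviations) are assumptions made for illustration only.

```r
set.seed(3312)

# Toy log-intensity matrix (assumed data): 6 proteins (rows) x 3 samples (columns)
x <- matrix(rnorm(18, mean = 25, sd = 2), nrow = 6)
x[c(2, 9, 16)] <- NA # one missing value per sample

q <- 0.01
tune_sigma <- 1

# Low-intensity value per sample: the q-th quantile of the observed values
low_vals <- apply(x, 2, quantile, probs = q, na.rm = TRUE)

# minDet-style: replace each NA with the sample's low-intensity value
x_det <- x
for (j in seq_len(ncol(x))) {
  x_det[is.na(x[, j]), j] <- low_vals[j]
}

# minProb-style: draw from a Gaussian centred on the low-intensity value.
# The spread is tune_sigma times an assumed estimate of protein-wise variability.
sd_est <- median(apply(x, 1, sd, na.rm = TRUE), na.rm = TRUE)
x_prob <- x
for (j in seq_len(ncol(x))) {
  miss <- is.na(x[, j])
  x_prob[miss, j] <- rnorm(sum(miss), mean = low_vals[j], sd = tune_sigma * sd_est)
}
```

In this sketch, a smaller q pushes imputed values toward the lowest observed intensities, and a larger tune_sigma widens the spread of the random draws around that value.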

Value

An imp_df object, which is a data frame of protein intensities with no missing values.

Details

The impute_na function imputes missing values using a user-specified method chosen from the available options: minProb, minDet, kNN, RF, and SVD.
Note: Some imputation methods may require that the data be normalized prior to imputation.
For reproducible results, fix the random number seed using the seed argument.
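
The effect of fixing a seed can be illustrated with a small base R sketch. The helper draw_low_values below is hypothetical and not part of promor; it only mimics a function with a seed argument that defaults to NULL, and how impute_na handles the seed internally is not shown here.

```r
# Hypothetical helper (not part of promor): draws n random values,
# setting the random number seed only when one is supplied.
draw_low_values <- function(n, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  rnorm(n, mean = 20, sd = 1)
}

# Two calls with the same seed produce identical draws
a <- draw_low_values(5, seed = 3312)
b <- draw_low_values(5, seed = 3312)
same_seed_matches <- identical(a, b)

# A call without a seed continues from the current random number state,
# so its result depends on whatever was run before it
d <- draw_low_values(5)
```

Under the same reasoning, methods that involve random draws can give different imputed values from run to run unless a seed is supplied.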

References

Lazar, Cosmin, et al. "Accounting for the multiple natures of missing values in label-free quantitative proteomics data sets to compare imputation strategies." Journal of Proteome Research 15.4 (2016): 1116-1125.

Author

Chathurani Ranathunge

Examples

## Generate a raw_df object with default settings. No technical replicates.
raw_df <- create_df(
  prot_groups = "https://raw.githubusercontent.com/caranathunge/promor_example_data/main/pg1.txt",
  exp_design = "https://raw.githubusercontent.com/caranathunge/promor_example_data/main/ed1.txt"
)
#> 0 empty row(s) removed.
#> 0 empty column(s) removed.
#> 80 protein(s) (rows) only identified by site removed.
#> 65 reverse protein(s) (rows) removed.
#> 42 protein potential contaminant(s) (rows) removed.
#> 1923 protein(s) identified by 2 or fewer unique peptides removed.
#> Zeros have been replaced with NAs.
#> Data have been log-transformed.

## Impute missing values in the data frame using the default minProb
## method.
imp_df1 <- impute_na(raw_df, seed = 3312)

# \donttest{
## Impute using the RF method with the number of iterations set at 5
## and number of trees set at 100.
imp_df2 <- impute_na(raw_df,
  method = "RF",
  maxiter = 5, ntree = 100,
  seed = 3312
)
#>   missForest iteration 1 in progress...done!
#>     estimated error(s): 0.136547 
#>     difference(s): 0.001001133 
#>     time: 27.744 seconds
#> 
#>   missForest iteration 2 in progress...done!
#>     estimated error(s): 0.1323305 
#>     difference(s): 2.465832e-06 
#>     time: 25.382 seconds
#> 
#>   missForest iteration 3 in progress...done!
#>     estimated error(s): 0.1320162 
#>     difference(s): 9.145074e-07 
#>     time: 24.98 seconds
#> 
#>   missForest iteration 4 in progress...done!
#>     estimated error(s): 0.1319143 
#>     difference(s): 7.914833e-07 
#>     time: 29.267 seconds
#> 
#>   missForest iteration 5 in progress...done!
#>     estimated error(s): 0.1322113 
#>     difference(s): 6.409383e-07 
#>     time: 25.618 seconds
#> 


## Using the kNN method.
imp_df3 <- impute_na(raw_df, method = "kNN", seed = 3312)
# }


## Using the SVD method with n_pcs set to 3.
imp_df4 <- impute_na(raw_df, method = "SVD", n_pcs = 3, seed = 3312)
#> change in estimate:  0.007642841 

## Using the minDet method with q set at 0.001.
imp_df5 <- impute_na(raw_df, method = "minDet", q = 0.001, seed = 3312)

## Impute a normalized data set using the kNN method
imp_df6 <- impute_na(ecoli_norm_df, method = "kNN")
#> Warning: Nothing to impute, because no NA are present (also after using makeNA)