
This function imputes missing values using a user-specified imputation method.

Usage

impute_na(
  df,
  method = "minProb",
  tune_sigma = 1,
  q = 0.01,
  maxiter = 10,
  ntree = 20,
  n_pcs = 2,
  seed = NULL
)

Arguments

df

A raw_df object (output of create_df) containing missing values or a norm_df object after performing normalization.

method

Imputation method to use. Default is "minProb". Other available methods: "minDet", "RF", "kNN", and "SVD".

tune_sigma

A scalar used in the "minProb" method to control the standard deviation of the Gaussian distribution from which imputed values are drawn. Default is 1.

q

A scalar used in "minProb" and "minDet" methods to obtain a low intensity value for imputation. q should be set to a very low value. Default is 0.01.

maxiter

Maximum number of iterations to be performed when using the "RF" method. Default is 10.

ntree

Number of trees to grow in each forest when using the "RF" method. Default is 20.

n_pcs

Number of principal components to calculate when using the "SVD" method. Default is 2.

seed

Numerical. Random number seed. Default is NULL.
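
To make the roles of q and tune_sigma more concrete, here is a minimal base R sketch of the two low-intensity approaches. It assumes that "minDet" replaces each sample's missing values with the q-th quantile of that sample's observed values, and that "minProb" draws replacements from a Gaussian centred on that quantile, with a standard deviation (here taken as the median of protein-wise standard deviations) scaled by tune_sigma. The helper names sketch_min_det and sketch_min_prob and the toy matrix are hypothetical, and this is not the code that impute_na or imputeLCMD actually runs.

```r
# Toy log-intensity matrix: rows are proteins, columns are samples.
toy <- matrix(
  c(20.1, 21.3, NA,   22.0,
    19.8, NA,   20.5, 21.7,
    23.2, 22.9, 23.0, NA,
    18.5, 18.9, 19.1, 18.7,
    NA,   24.1, 24.4, 24.0),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("prot", 1:5), paste0("s", 1:4))
)

# Hypothetical helper: deterministic low-value imputation, per sample (column).
sketch_min_det <- function(x, q = 0.01) {
  for (j in seq_len(ncol(x))) {
    # q-th quantile of the values observed in this sample
    low <- quantile(x[, j], probs = q, na.rm = TRUE, names = FALSE)
    x[is.na(x[, j]), j] <- low
  }
  x
}

# Hypothetical helper: probabilistic low-value imputation, per sample.
sketch_min_prob <- function(x, q = 0.01, tune_sigma = 1, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  # Assumed spread: median of protein-wise standard deviations, scaled
  sd_est <- median(apply(x, 1, sd, na.rm = TRUE), na.rm = TRUE) * tune_sigma
  for (j in seq_len(ncol(x))) {
    miss <- is.na(x[, j])
    low <- quantile(x[, j], probs = q, na.rm = TRUE, names = FALSE)
    # Random draws centred on the low quantile of this sample
    x[miss, j] <- rnorm(sum(miss), mean = low, sd = sd_est)
  }
  x
}

det_out <- sketch_min_det(toy, q = 0.01)
prob_out <- sketch_min_prob(toy, q = 0.01, tune_sigma = 1, seed = 3312)
```

In this sketch, a smaller q pushes the imputed values toward the lowest observed intensities in each sample, while a larger tune_sigma widens the spread of the random draws around that value.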

Value

An imp_df object, which is a data frame of protein intensities with no missing values.

Details

  • The impute_na function imputes missing values using one of the available methods: minProb, minDet, kNN, RF, or SVD.

  • Note: Some imputation methods may require that the data be normalized prior to imputation.

  • Set the seed argument to fix the random number seed for reproducibility.
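
The effect of fixing the seed can be illustrated with a small base R sketch. The function random_fill below is a hypothetical stand-in for any stochastic imputation step (it is not part of promor); it simply fills missing values with random draws around the observed minimum.

```r
# Hypothetical stand-in for a stochastic imputation step.
random_fill <- function(x, seed = NULL) {
  # Only fix the random number generator state when a seed is supplied
  if (!is.null(seed)) set.seed(seed)
  miss <- is.na(x)
  x[miss] <- rnorm(sum(miss), mean = min(x, na.rm = TRUE), sd = 0.5)
  x
}

x <- c(21.4, NA, 19.8, NA, 22.1)

# Two runs with the same seed give identical imputed values.
run1 <- random_fill(x, seed = 3312)
run2 <- random_fill(x, seed = 3312)
same_seed_matches <- identical(run1, run2)

# A run with a different seed is expected to give different imputed values.
run3 <- random_fill(x, seed = 1)
```

With seed left at NULL, each call depends on the current state of the random number generator, so repeated runs would generally not match.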

References

Lazar, Cosmin, et al. "Accounting for the multiple natures of missing values in label-free quantitative proteomics data sets to compare imputation strategies." Journal of Proteome Research 15.4 (2016): 1116-1125.

See also

More information on the available imputation methods can be found in their respective packages.

  • For minProb and minDet methods, see the imputeLCMD package.

  • For Random Forest (RF) method, see missForest from the missForest package.

  • For kNN method, see kNN from the VIM package.

  • For SVD method, see pca from the pcaMethods package.

Author

Chathurani Ranathunge

Examples

## Generate a raw_df object with default settings. No technical replicates.
raw_df <- create_df(
  prot_groups = "https://raw.githubusercontent.com/caranathunge/promor_example_data/main/pg1.txt",
  exp_design = "https://raw.githubusercontent.com/caranathunge/promor_example_data/main/ed1.txt"
)
#> 0 empty row(s) removed.
#> 0 empty column(s) removed.
#> 80 protein(s) (rows) only identified by site removed.
#> 65 reverse protein(s) (rows) removed.
#> 42 protein potential contaminant(s) (rows) removed.
#> 1923 protein(s) identified by 2 or fewer unique peptides removed.
#> Zeros have been replaced with NAs.
#> Data have been log-transformed.

## Impute missing values in the data frame using the default minProb
## method.
imp_df1 <- impute_na(raw_df, seed = 3312)

# \donttest{
## Impute using the RF method with the number of iterations set at 5
## and number of trees set at 100.
imp_df2 <- impute_na(raw_df,
  method = "RF",
  maxiter = 5, ntree = 100,
  seed = 3312
)
#>   missForest iteration 1 in progress...done!
#>     estimated error(s): 0.136547 
#>     difference(s): 0.001001133 
#>     time: 27.744 seconds
#> 
#>   missForest iteration 2 in progress...done!
#>     estimated error(s): 0.1323305 
#>     difference(s): 2.465832e-06 
#>     time: 25.382 seconds
#> 
#>   missForest iteration 3 in progress...done!
#>     estimated error(s): 0.1320162 
#>     difference(s): 9.145074e-07 
#>     time: 24.98 seconds
#> 
#>   missForest iteration 4 in progress...done!
#>     estimated error(s): 0.1319143 
#>     difference(s): 7.914833e-07 
#>     time: 29.267 seconds
#> 
#>   missForest iteration 5 in progress...done!
#>     estimated error(s): 0.1322113 
#>     difference(s): 6.409383e-07 
#>     time: 25.618 seconds
#> 


## Using the kNN method.
imp_df3 <- impute_na(raw_df, method = "kNN", seed = 3312)
# }


## Using the SVD method with n_pcs set to 3.
imp_df4 <- impute_na(raw_df, method = "SVD", n_pcs = 3, seed = 3312)
#> change in estimate:  0.007642841 

## Using the minDet method with q set at 0.001.
imp_df5 <- impute_na(raw_df, method = "minDet", q = 0.001, seed = 3312)

## Impute a normalized data set using the kNN method
imp_df6 <- impute_na(ecoli_norm_df, method = "kNN")
#> Warning: Nothing to impute, because no NA are present (also after using makeNA)
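
After imputation, it can be useful to confirm that no missing values remain and that the originally observed intensities were left untouched. The sketch below does this in base R on toy data; check_imputed is a hypothetical helper, not a promor function, and the two small data frames only stand in for a raw_df and imp_df pair.

```r
# Hypothetical helper: compares a data set before and after imputation.
check_imputed <- function(before, after) {
  before <- as.matrix(before)
  after <- as.matrix(after)
  observed <- !is.na(before)
  list(
    n_missing_before = sum(!observed),
    n_missing_after = sum(is.na(after)),
    # Observed values should be identical before and after imputation
    observed_unchanged = isTRUE(all.equal(before[observed], after[observed]))
  )
}

# Toy stand-ins for a data set before and after imputation.
before <- data.frame(s1 = c(20.1, NA, 23.2), s2 = c(NA, 21.0, 22.9))
after <- data.frame(s1 = c(20.1, 18.3, 23.2), s2 = c(19.0, 21.0, 22.9))

res <- check_imputed(before, after)
```

Assuming raw_df and imp_df1 behave as plain data frames of numeric intensities, the same check could be applied to them as check_imputed(raw_df, imp_df1).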