This function creates a data frame of protein intensities.
Usage
create_df(
  prot_groups,
  exp_design,
  input_type = "MaxQuant",
  data_type = "LFQ",
  filter_na = TRUE,
  filter_prot = TRUE,
  uniq_pep = 2,
  tech_reps = FALSE,
  zero_na = TRUE,
  log_tr = TRUE,
  base = 2
)
Arguments
- prot_groups
File path to a proteinGroups.txt file produced by MaxQuant or a standard input file containing a quantitative matrix where the proteins or protein groups are indicated by rows and the samples by columns.
- exp_design
File path to a text file containing the experimental design.
- input_type
Type of input file indicated by prot_groups. Available options are "MaxQuant", if a proteinGroups.txt file is used, or "standard", if a standard input file is used. Default is "MaxQuant".
- data_type
Type of sample protein intensity data columns to use from the proteinGroups.txt file. Some available options are "LFQ", "iBAQ", and "Intensity". Default is "LFQ". User-defined prefixes in the proteinGroups.txt file are also allowed. The data_type argument is case-sensitive and only applies when input_type = "MaxQuant".
- filter_na
Logical. If TRUE (default), filters out empty rows and columns from the data frame.
- filter_prot
Logical. If TRUE (default), filters out reverse proteins, proteins only identified by site, potential contaminants, and proteins identified by uniq_pep or fewer unique peptides. Only applies when input_type = "MaxQuant".
- uniq_pep
Numerical. Proteins identified by this number of unique peptides or fewer are filtered out. Default is 2. Only applies when input_type = "MaxQuant".
- tech_reps
Logical. Set to TRUE if technical replicates are present in the data. Default is FALSE.
- zero_na
Logical. If TRUE (default), zeros are considered missing values and replaced with NAs.
- log_tr
Logical. If TRUE (default), intensity values are log-transformed to the base indicated by base.
- base
Numerical. Logarithm base. Default is 2.
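As an illustration of why data_type is case-sensitive, the sketch below selects intensity columns by prefix from a set of made-up column names written in the style of a proteinGroups.txt header. The helper select_cols() is hypothetical and is not part of promor. How create_df() matches the prefix internally is not stated here, so treat the matching rule as an assumption.

```r
# Made-up column names in the style of a MaxQuant proteinGroups.txt header.
cols <- c(
  "Protein IDs", "Unique peptides",
  "Intensity H_1", "Intensity H_2",
  "iBAQ H_1", "iBAQ H_2",
  "LFQ intensity H_1", "LFQ intensity H_2"
)

# Hypothetical helper: keep the columns whose names start with data_type.
# startsWith() is case-sensitive, mirroring the documented behaviour.
select_cols <- function(cols, data_type) {
  cols[startsWith(cols, data_type)]
}

lfq_cols <- select_cols(cols, "LFQ")       # the two "LFQ intensity" columns
int_cols <- select_cols(cols, "Intensity") # the two "Intensity" columns only
no_cols  <- select_cols(cols, "lfq")       # no match: wrong case
```

Under this assumption, a data_type value written in the wrong case would select no columns at all.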
Value
A raw_df object, which is a data frame containing protein intensities. Proteins or protein groups are indicated by rows and samples by columns.
Details
- The function first reads in the file provided as prot_groups.
- It then reads in the expDesign.txt file provided as exp_design and extracts relevant information from it to add to the data frame. An example of an expDesign.txt file is provided here: https://raw.githubusercontent.com/caranathunge/promor_example_data/main/ed1.txt
- Next, empty rows and columns are removed from the data frame.
- If a proteinGroups.txt file is used, it then filters out reverse proteins, proteins that were only identified by site, and potential contaminants.
- Then it removes proteins identified by uniq_pep or fewer unique peptides from the data frame.
- Next, it extracts the intensity columns indicated by data_type and the selected protein rows from the data frame.
- Zeros are then converted to NAs so that they are treated as missing values.
- Finally, the function log-transforms the intensity values.
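The filtering and transformation steps described above can be sketched on a toy table in base R. This is a minimal illustration and not promor's implementation. The values are invented, the column names imitate what read.delim() would produce from a MaxQuant header, and using the protein IDs as row names is an assumption. Removal of empty rows and columns is left out for brevity.

```r
# Toy stand-in for a proteinGroups.txt table (invented values).
pg <- data.frame(
  Majority.protein.IDs = c("P1", "P2", "P3", "P4", "P5", "P6"),
  Reverse = c("", "+", "", "", "", ""),
  Only.identified.by.site = c("", "", "+", "", "", ""),
  Potential.contaminant = c("", "", "", "+", "", ""),
  Unique.peptides = c(5, 4, 6, 3, 2, 7),
  LFQ.intensity.A_1 = c(1024, 512, 256, 128, 64, 0),
  LFQ.intensity.A_2 = c(2048, 0, 512, 256, 128, 4096),
  stringsAsFactors = FALSE
)

uniq_pep <- 2
base <- 2

# Drop reverse hits, site-only identifications, and potential contaminants
# (marked with "+" in this toy table).
flagged <- pg$Reverse == "+" |
  pg$Only.identified.by.site == "+" |
  pg$Potential.contaminant == "+"
pg <- pg[!flagged, ]

# Drop proteins identified by uniq_pep or fewer unique peptides.
pg <- pg[pg$Unique.peptides > uniq_pep, ]

# Keep only the LFQ intensity columns; proteins in rows, samples in columns.
int_cols <- grep("^LFQ", names(pg), value = TRUE)
mat <- as.matrix(pg[, int_cols])
rownames(mat) <- pg$Majority.protein.IDs

# Treat zeros as missing values, then log-transform.
mat[mat == 0] <- NA
toy_df <- as.data.frame(log(mat, base = base))

# toy_df keeps P1 and P6; the zero intensity of P6 in A_1 is now NA.
```

Converting zeros to NA before the log transform matters in this sketch: log(0) would otherwise yield -Inf instead of a missing value.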
Examples
# \donttest{
### Using a proteinGroups.txt file produced by MaxQuant as input.
## Generate a raw_df object with default settings. No technical replicates.
raw_df <- create_df(
  prot_groups = "https://raw.githubusercontent.com/caranathunge/promor_example_data/main/pg1.txt",
  exp_design = "https://raw.githubusercontent.com/caranathunge/promor_example_data/main/ed1.txt",
  input_type = "MaxQuant"
)
#> 0 empty row(s) removed.
#> 0 empty column(s) removed.
#> 80 protein(s) (rows) only identified by site removed.
#> 65 reverse protein(s) (rows) removed.
#> 42 protein potential contaminant(s) (rows) removed.
#> 1923 protein(s) identified by 2 or fewer unique peptides removed.
#> Zeros have been replaced with NAs.
#> Data have been log-transformed.
## Data containing technical replicates
raw_df <- create_df(
  prot_groups = "https://raw.githubusercontent.com/caranathunge/promor_example_data/main/pg2.txt",
  exp_design = "https://raw.githubusercontent.com/caranathunge/promor_example_data/main/ed2.txt",
  input_type = "MaxQuant",
  tech_reps = TRUE
)
#> 0 empty row(s) removed.
#> 1 empty column(s) removed.
#> 12 reverse protein(s) (rows) removed.
#> 29 protein contaminant(s) (rows) removed.
#> 188 protein(s) identified by 2 or fewer unique peptides removed.
#> Zeros have been replaced with NAs.
#> Data have been log-transformed.
## Alter the number of unique peptides needed to retain a protein
raw_df <- create_df(
  prot_groups = "https://raw.githubusercontent.com/caranathunge/promor_example_data/main/pg1.txt",
  exp_design = "https://raw.githubusercontent.com/caranathunge/promor_example_data/main/ed1.txt",
  input_type = "MaxQuant",
  uniq_pep = 1
)
#> 0 empty row(s) removed.
#> 0 empty column(s) removed.
#> 80 protein(s) (rows) only identified by site removed.
#> 65 reverse protein(s) (rows) removed.
#> 42 protein potential contaminant(s) (rows) removed.
#> 961 protein(s) identified by 1 or fewer unique peptides removed.
#> Zeros have been replaced with NAs.
#> Data have been log-transformed.
## Use "iBAQ" values instead of "LFQ" values
raw_df <- create_df(
  prot_groups = "https://raw.githubusercontent.com/caranathunge/promor_example_data/main/pg1.txt",
  exp_design = "https://raw.githubusercontent.com/caranathunge/promor_example_data/main/ed1.txt",
  input_type = "MaxQuant",
  data_type = "iBAQ"
)
#> 0 empty row(s) removed.
#> 0 empty column(s) removed.
#> 80 protein(s) (rows) only identified by site removed.
#> 65 reverse protein(s) (rows) removed.
#> 42 protein potential contaminant(s) (rows) removed.
#> 1923 protein(s) identified by 2 or fewer unique peptides removed.
#> Zeros have been replaced with NAs.
#> Data have been log-transformed.
### Using a universal standard input file instead of MaxQuant output.
raw_df <- create_df(
  prot_groups = "https://raw.githubusercontent.com/caranathunge/promor_example_data/main/st.txt",
  exp_design = "https://raw.githubusercontent.com/caranathunge/promor_example_data/main/ed1.txt",
  input_type = "standard"
)
#> 0 empty row(s) removed.
#> 0 empty column(s) removed.
#> Zeros have been replaced with NAs.
#> Data have been log-transformed.
# }