
Normalizes protein abundance data with a TPM-like (Transcripts Per Million) approach, adapting this RNA-seq normalization method for use with proteomics data.

Usage

convert_to_tpm(data)

Arguments

data

A numeric matrix containing protein abundance data with identifiers as row names and samples as columns. Values should be in linear scale, not log-transformed.

Value

A numeric matrix with the same dimensions as the input, with values normalized to a TPM-like scale (each column sums to 1 million).

Details

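This function applies a TPM-like normalization to proteomics data: each protein abundance value is divided by the total abundance of its sample (column) and then multiplied by 1 million. Unlike RNA-seq TPM, no feature-length correction is involved; only the per-sample total is used.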
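Written as a formula, using x_ij for the abundance of protein i in sample j and n for the number of proteins (notation introduced here for illustration), the scaling described above is:

```latex
\mathrm{TPM}_{ij} = \frac{x_{ij}}{\sum_{k=1}^{n} x_{kj}} \times 10^{6},
\qquad
\sum_{i=1}^{n} \mathrm{TPM}_{ij} = 10^{6} \quad \text{for every sample } j
```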

If your input data are log-transformed, convert them to linear scale with unlog2_data before applying this normalization.
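The exact behaviour of unlog2_data is not documented on this page. As a rough sketch only, assuming the data were transformed as log2(x), optionally with a pseudocount added before taking the log, the back-transformation is 2^y minus that pseudocount. `unlog2_sketch()` below is a hypothetical helper for illustration, not the package function.

```r
# Hypothetical stand-in for unlog2_data(); assumes values are log2(x + pseudocount).
unlog2_sketch <- function(log_mat, pseudocount = 0) {
  # Invert log2(x + pseudocount): raise 2 to each value, then remove the pseudocount
  2^log_mat - pseudocount
}

# Round-trip check on a small linear-scale matrix
lin_orig <- matrix(c(100, 200, 300, 400), nrow = 2)
log_mat  <- log2(lin_orig)
lin_mat  <- unlog2_sketch(log_mat)
```

If a pseudocount was used when log-transforming (for example log2(x + 1)), pass the same value as `pseudocount`; otherwise the recovered values will be shifted.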

Examples

# Create example protein abundance data matrix (random values: no seed is set, so printed output will differ between runs)
prot_mat <- matrix(abs(rnorm(12, mean = 500, sd = 200)), nrow = 4, ncol = 3)
rownames(prot_mat) <- paste0("Protein", 1:4)
colnames(prot_mat) <- paste0("Sample", 1:3)

# View original values and column sums
print(prot_mat)
#>            Sample1  Sample2  Sample3
#> Protein1 551.06341 729.6823 443.4589
#> Protein2  12.54728 135.6365 389.2601
#> Protein3 498.88574 450.5349 625.7964
#> Protein4 624.31054 451.1601 913.0050
print(colSums(prot_mat))
#>  Sample1  Sample2  Sample3 
#> 1686.807 1767.014 2371.520 

# Convert to TPM-like normalization
tpm_mat <- convert_to_tpm(prot_mat)

# View normalized values, then verify that each column sums to 1 million
print(tpm_mat)
#>             Sample1   Sample2  Sample3
#> Protein1 326690.261 412946.59 186993.5
#> Protein2   7438.479  76760.28 164139.5
#> Protein3 295757.458 254969.68 263879.8
#> Protein4 370113.803 255323.46 384987.2
print(colSums(tpm_mat))
#> Sample1 Sample2 Sample3 
#>   1e+06   1e+06   1e+06
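To see where these numbers come from, the same per-column scaling can be reproduced in base R. `tpm_like()` is a hypothetical sketch, not necessarily how convert_to_tpm() is implemented. Applied to the printed Sample1 column above, it returns values matching the Sample1 column of `tpm_mat` (about 326690.26 for Protein1, since 551.06341 / 1686.807 × 10^6 ≈ 326690.3).

```r
# Hypothetical base-R sketch of a TPM-like (per-million) column scaling.
tpm_like <- function(x) {
  # Divide every value by its own column total, then rescale to 1e6
  sweep(x, 2, colSums(x), FUN = "/") * 1e6
}

# Sample1 column copied from the printed example above
sample1 <- matrix(c(551.06341, 12.54728, 498.88574, 624.31054), ncol = 1,
                  dimnames = list(paste0("Protein", 1:4), "Sample1"))
sample1_tpm <- tpm_like(sample1)
print(sample1_tpm)
```

In this sketch a column whose total is zero would produce `NaN` values; how convert_to_tpm() handles that case is not described on this page.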