SmulTCan

The SmulTCan app lets users analyze groups of genes, miRNAs, gene-level CNVs, or gene-level methylation levels against four types of TCGA (https://www.cancer.gov/tcga) survival data: overall, disease-specific, disease-free, and progression-free. Users first download the values and sample information from UCSC Xena (http://xena.ucsc.edu). The TSV file obtained from Xena should have the following structure: “sample” and “samples” as the first two columns, followed by expression columns for the genes of interest, and “cancer type abbreviation” and “sample_type” as the last two columns. Xena provides gene and miRNA expressions in normalized form as \(\log_2(norm\_value+1)\). Ideally, the TSV file containing gene or miRNA expressions, gene-level CNVs, or methylation \(\beta\)-values should be generated by selecting TCGA-PANCAN as the study in Xena. When using gene-level methylation \(\beta\)-values, users should multiply the values in the downloaded TSV file by 100 before uploading it to the app, which makes the results easier to interpret. For each TCGA dataset, the app analyzes only primary tumor samples (primary blood-derived cancer samples for LAML) and their corresponding survival information.
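
The expected file layout and the preprocessing steps described above can be sketched in Python using only the standard library. This is a minimal illustration, not the app's own code (the app is written in R). Several details are assumptions: the function name `read_xena_tsv`, the treatment of empty or `NA` cells as missing, and the exact `sample_type` labels used to recognize primary samples (`Primary Tumor`, plus `Primary Blood Derived Cancer - Peripheral Blood` for LAML).

```python
import csv
import io

# Column layout described in the text; the sample_type labels are assumptions.
LEADING = ["sample", "samples"]
TRAILING = ["cancer type abbreviation", "sample_type"]
PRIMARY_TYPES = {
    "Primary Tumor",
    "Primary Blood Derived Cancer - Peripheral Blood",  # assumed LAML label
}
MISSING = {"", "NA"}  # assumed markers for missing values


def read_xena_tsv(text, methylation=False):
    """Parse a Xena-style TSV string, keeping only primary tumor samples.

    Returns the gene column names and one dict per retained sample.
    If methylation is True, beta-values are multiplied by 100, mirroring
    the manual rescaling users are asked to do before upload.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    if header[:2] != LEADING or header[-2:] != TRAILING:
        raise ValueError("TSV does not follow the expected Xena column layout")
    genes = header[2:-2]
    rows = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        if fields[-1] not in PRIMARY_TYPES:
            continue  # drop non-primary samples (normal, metastatic, etc.)
        values = {}
        for gene, raw in zip(genes, fields[2:-2]):
            if raw in MISSING:
                values[gene] = None
            else:
                value = float(raw)
                values[gene] = value * 100 if methylation else value
        rows.append({"sample": fields[0], "cancer": fields[-2], "values": values})
    return genes, rows
```

For example, a row whose `sample_type` is `Solid Tissue Normal` is skipped, and a file whose first two or last two columns differ from the layout above is rejected.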

The app uses the Cox proportional hazards (CPH) model to analyze the effects of the input genes on survival in a multivariable setting. Once an input TSV file in the required Xena format is uploaded, the user can select or deselect the genes it contains from the sidebar panel. The app incorporates several best subset selection methods. Selection can be made directly from the CPH model by keeping the genes whose coefficients have \(p < 0.05\), or by applying additional algorithms such as “glmnet” or “BeSS”. The “glmnet” tab provides options for elastic net regularization: lasso (\(\alpha = 1\)) is the default, and users can move the slider all the way to ridge regression (\(\alpha = 0\)). By default, “glmnet” uses 10-fold cross-validation. Alternatively, users can set the number of folds equal to the sample size of each dataset, which gives leave-one-out cross-validation. The app also includes the “BeSS” package for best subset selection using ridge regression. The Kaplan-Meier (K-M) plots in the app show the results of prognostic risk analyses. In these analyses, samples are split at the median of the prognostic index (PI) calculated from the corresponding best subset selection method.
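
The quantities behind the “glmnet” slider and the K-M risk groups can be written out in a few lines. The sketch below makes two modeling assumptions. First, it uses glmnet's documented elastic net penalty, \(\lambda\left[\tfrac{1-\alpha}{2}\lVert\beta\rVert_2^2 + \alpha\lVert\beta\rVert_1\right]\). Second, it takes the PI of a sample to be the linear predictor \(\sum_j \beta_j x_j\) over the selected genes. The function names and dictionary inputs are hypothetical, as is the rule that samples lying exactly at the median fall into the low-risk group. These are illustrative choices and not necessarily what SmulTCan does.

```python
from statistics import median


def elastic_net_penalty(beta, lam, alpha):
    """glmnet-style penalty: lam * ((1 - alpha)/2 * ||beta||_2^2 + alpha * ||beta||_1).

    alpha = 1 gives the lasso penalty, alpha = 0 the ridge penalty.
    """
    l2_sq = sum(b * b for b in beta)
    l1 = sum(abs(b) for b in beta)
    return lam * ((1 - alpha) / 2 * l2_sq + alpha * l1)


def prognostic_index(expr, coefs):
    """Linear predictor of a fitted CPH model: sum of coefficient * expression."""
    return sum(coefs[gene] * expr[gene] for gene in coefs)


def median_risk_groups(samples, coefs):
    """Label each sample 'high' or 'low' risk by splitting at the median PI.

    Samples whose PI equals the median are put in 'low' (an assumption).
    """
    pis = {sample: prognostic_index(expr, coefs) for sample, expr in samples.items()}
    cut = median(pis.values())
    return {sample: ("high" if pi > cut else "low") for sample, pi in pis.items()}
```

Centering the PI (subtracting a constant, as some Cox software does) shifts every score and the median by the same amount, so the median split is unchanged.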

Help

Users can click the Walk-through button at the top left of the app interface for a guided tour of the app. The tour covers uploading the TSV file, selecting a dataset, determining the number of input genes to be analyzed, and selecting the input genes.

Warnings

The app displays additional warnings intended to protect users from erroneous results that might arise from non-convergence of the CPH models or from infinite coefficients in the CPH models. Users are also warned when a selected dataset lacks information for a certain type of survival, or when one or more input genes have missing expressions in the TSV file. The app also indicates when the selected best subset selection method cannot find coefficients for the input genes in the selected dataset.
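
The kinds of checks listed above can be illustrated with a small, hypothetical helper. It sketches the idea rather than the app's implementation. The function name, its inputs, and the message texts are all assumptions. The inputs are a coefficient table, a convergence flag, per-gene value lists with `None` for missing values, and per-endpoint sample counts. Treating a coefficient set with no finite, non-zero entry as "no genes selected" is likewise an illustrative choice.

```python
import math


def collect_warnings(coefs, converged, expressions, endpoints):
    """Return human-readable warnings for one fitted model (hypothetical checks).

    coefs       -- gene -> fitted coefficient (may contain inf or nan)
    converged   -- whether the Cox fit reported convergence
    expressions -- gene -> list of values, with None marking a missing value
    endpoints   -- survival type (e.g. "OS") -> number of usable samples
    """
    warnings = []
    if not converged:
        warnings.append("Cox model did not converge")
    infinite = sorted(g for g, b in coefs.items() if not math.isfinite(b))
    if infinite:
        warnings.append("non-finite coefficients: " + ", ".join(infinite))
    # No usable coefficient at all: the selection method found nothing.
    if not any(math.isfinite(b) and b != 0 for b in coefs.values()):
        warnings.append("no genes selected by the subset selection method")
    missing = sorted(g for g, vals in expressions.items() if any(v is None for v in vals))
    if missing:
        warnings.append("missing expressions: " + ", ".join(missing))
    for endpoint, n in sorted(endpoints.items()):
        if n == 0:
            warnings.append(f"no {endpoint} data in this dataset")
    return warnings
```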

Downloads

The app currently supports the TSV format for data downloads and high-resolution PNG and PDF formats for plots.

Packages

SmulTCan is based on “shiny” (Chang et al. (2021)). The following packages are used for the computations in the app: “survival” (Therneau and Grambsch (2000)), “survminer” (Kassambara, Kosinski, and Biecek (2020)), “riskRegression” (Gerds and Ozenne (2020)), “rms” (Harrell Jr (2021)), “caret” (Kuhn (2021)), “glmnet” (Simon et al. (2011)), and “BeSS” (Wen et al. (2020)).

The “readr” package (Wickham, Hester, and Bryan (2021)) is used to read in the survival TXT files. The “ggplot2” package (Wickham (2016)) is used to plot best subset coefficients, and “ggrepel” (Slowikowski (2021)) is additionally used to plot the ANOVA coefficients of the CPH models. The “rintrojs” package (Ganz (2016)) is used to create the app's walk-through.

Citation

Please cite SmulTCan if you use figures or data generated with the app in your research. The original SmulTCan article is given below in APA format:

Ozhan, A., Tombaz, M., & Konu, O. (2021). SmulTCan: A Shiny application for multivariable survival analysis of TCGA data with gene sets. Computers in Biology and Medicine, 137, 104793. https://doi.org/10.1016/j.compbiomed.2021.104793

References

Chang, Winston, Joe Cheng, Joseph J. Allaire, Carson Sievert, Barret Schloerke, Yihui Xie, Jeff Allen, Jonathan McPherson, Alan Dipert, and Barbara Borges. 2021. shiny: Web Application Framework for R. https://CRAN.R-project.org/package=shiny.
Ganz, Carl. 2016. “rintrojs: A Wrapper for the Intro.js Library.” Journal of Open Source Software 1. https://doi.org/10.21105/joss.00063.
Gerds, Thomas A., and Brice Ozenne. 2020. riskRegression: Risk Regression Models and Prediction Scores for Survival Analysis with Competing Risks. https://CRAN.R-project.org/package=riskRegression.
Harrell Jr, Frank E. 2021. rms: Regression Modeling Strategies. https://CRAN.R-project.org/package=rms.
Kassambara, Alboukadel, Marcin Kosinski, and Przemyslaw Biecek. 2020. survminer: Drawing Survival Curves Using ‘ggplot2’. https://CRAN.R-project.org/package=survminer.
Kuhn, Max. 2021. caret: Classification and Regression Training. https://CRAN.R-project.org/package=caret.
Simon, Noah, Jerome Friedman, Trevor Hastie, and Rob Tibshirani. 2011. “Regularization Paths for Cox’s Proportional Hazards Model via Coordinate Descent.” Journal of Statistical Software 39 (5): 1–13.
Slowikowski, Kamil. 2021. ggrepel: Automatically Position Non-Overlapping Text Labels with ‘ggplot2’. https://CRAN.R-project.org/package=ggrepel.
Therneau, Terry M., and Patricia M. Grambsch. 2000. Modeling Survival Data: Extending the Cox Model. 1st ed. New York: Springer.
Wen, Canhong, Aijun Zhang, Shijie Quan, and Xueqin Wang. 2020. “BeSS: An R Package for Best Subset Selection in Linear, Logistic and Cox Proportional Hazards Models.” Journal of Statistical Software 94 (4). https://doi.org/10.18637/jss.v094.i04.
Wickham, Hadley. 2016. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.
Wickham, Hadley, Jim Hester, and Jennifer Bryan. 2021. readr: Read Rectangular Text Data. https://CRAN.R-project.org/package=readr.