Demo Datasets
Demo1: GSE136868
C: Control; TrkB.FL: Full length splice variant; TrkB.T1: Truncated splice variant
Demo2: GSE190998
siCtrl_P: siRNA control of proliferating cells; siCtrl_S: siRNA control of senescent cells; siNTRK2: NTRK2 knockdown senescent cells
Demo3: GSE201085
RDR: residual disease (RD) with recurrence; RDnR: RD without recurrence; pCR: pathologic complete response; NoNAC: did not receive neoadjuvant chemotherapy
Sample Input File Formats

We recommend uploading filtered data that excludes genes with zero or low counts. Alternatively, you can filter your data during upload by retaining only genes whose counts per million (CPM) exceed a specified threshold in a minimum number of samples.
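The CPM filter described above can be sketched in Python as follows. This is a minimal illustration under assumptions: the input is a dictionary mapping gene symbols to raw counts in a fixed sample order, and the function names and default threshold values are hypothetical, not the app's actual parameters.

```python
def library_sizes(counts):
    """Total raw counts per sample (column sums over all genes)."""
    n_samples = len(next(iter(counts.values())))
    return [sum(row[j] for row in counts.values()) for j in range(n_samples)]


def filter_by_cpm(counts, cpm_threshold=1.0, min_samples=2):
    """Keep genes whose CPM exceeds cpm_threshold in at least min_samples samples.

    Hypothetical helper; the thresholds are illustrative defaults only.
    """
    lib = library_sizes(counts)
    kept = {}
    for gene, row in counts.items():
        cpm = [count / size * 1e6 for count, size in zip(row, lib)]
        if sum(value > cpm_threshold for value in cpm) >= min_samples:
            kept[gene] = row
    return kept


counts = {
    "GENE_A": [500, 620, 480, 700],
    "GENE_B": [0, 0, 0, 0],    # all-zero gene: always removed
    "GENE_C": [0, 1, 0, 0],    # detected in only one sample
    "GENE_D": [300, 10, 250, 5],
}
filtered = filter_by_cpm(counts, cpm_threshold=1.0, min_samples=2)
print(sorted(filtered))  # ['GENE_A', 'GENE_D']
```

Library sizes are computed from the unfiltered table here, so removing a gene does not change the CPM values of the genes that are kept.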
Demo
Suppose you want to analyze the sample dataset Demo 1. You can either keep the data as raw counts or convert the counts into logCPM values, and you can choose to visualize the biological replicates of each group in your downstream cluster analyses.
Once you have made your selections, proceed to the Clustering tab.
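A minimal sketch of the logCPM conversion and of averaging biological replicates by group, assuming log2(CPM + 1) as the transform. This is a simplification: the app's exact formula is not stated here and may differ (for example, edgeR's cpm() with log = TRUE adds a prior count scaled by library size). The function names are hypothetical.

```python
import math


def log_cpm(counts, pseudocount=1.0):
    """log2(CPM + pseudocount) for every gene and sample (simple stand-in transform)."""
    n_samples = len(next(iter(counts.values())))
    lib = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [math.log2(c / size * 1e6 + pseudocount) for c, size in zip(row, lib)]
        for gene, row in counts.items()
    }


def group_means(values, conditions):
    """Average the replicates that share a condition label.

    Groups are returned in order of first appearance in `conditions`.
    """
    groups = list(dict.fromkeys(conditions))
    means = {}
    for gene, row in values.items():
        means[gene] = [
            sum(v for v, c in zip(row, conditions) if c == group) / conditions.count(group)
            for group in groups
        ]
    return groups, means


counts = {"G1": [10, 30, 50, 50], "G2": [90, 70, 50, 50]}
conditions = ["C", "C", "T", "T"]  # e.g. the labels read from a condition file
logcpm = log_cpm(counts)
groups, means = group_means(logcpm, conditions)
print(groups)  # ['C', 'T']
```

Keeping `logcpm` corresponds to plotting every replicate, while `means` corresponds to clustering on group means.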
Demo
You can inspect the branches of the cluster dendrogram to choose a cluster number at first glance. For instance, the dendrogram suggests that a relatively large number of clusters, such as 8, is suitable for analyzing gene expression patterns in the selected dataset.
Hierarchical clustering results can help select the number of clusters to be used in k-means clustering.
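To illustrate how a dendrogram can be cut into a chosen number of clusters, here is a minimal agglomerative (average-linkage) clustering sketch that uses 1 - Pearson correlation as the distance between gene profiles. The linkage method and distance measure are assumptions for illustration; the app's own settings are not stated in this section.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_linkage(profiles, k):
    """Merge the closest clusters (average linkage, 1 - r distance) until k remain."""
    genes = list(profiles)
    dist = {
        (a, b): 1 - pearson(profiles[a], profiles[b])
        for a in genes for b in genes if a != b
    }
    clusters = [[g] for g in genes]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                d = sum(dist[p] for p in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


profiles = {
    "up1": [1, 2, 3, 4], "up2": [2, 3, 4, 6],
    "down1": [4, 3, 2, 1], "down2": [5, 4, 2, 1],
}
clusters = average_linkage(profiles, k=2)
```

Running it for several values of k mimics cutting the dendrogram at different heights. This naive version recomputes all cluster distances at every merge and is only suitable for small gene sets.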
Alternatively, you can visualize the generated clusters as boxplots, which may be easier to interpret for datasets with many groups or with low variation among groups.
Boxplots may also be preferable for k-means clusters generated from group means of logCPM values.
The average silhouette widths for increasing numbers of clusters can be viewed with the Cluster analysis button. Comparing these widths across cluster numbers helps identify the optimal number of clusters for your gene set of interest.
For instance, when k is evaluated over the range 4-16, the highest number of well-clustered clusters is shared by k = 8, 13 and 16. Meanwhile, the average silhouette width decreases further for k > 8, so generating 8 clusters can be preferred for further analysis.
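The silhouette-based choice of k can be sketched as below: run k-means for each candidate k, compute the average silhouette width, and keep the k with the highest value. This is a minimal stdlib illustration that assumes Euclidean distance and a deterministic farthest-point initialization, chosen here for reproducibility. The app's k-means settings (for example, random starts) may differ, and all function names are hypothetical.

```python
import math


def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then add the farthest point."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return [list(c) for c in centers]


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm; returns one cluster label per point."""
    centers = farthest_point_init(points, k)
    labels = None
    for _ in range(n_iter):
        new = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new == labels:
            break
        labels = new
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def mean_silhouette(points, labels):
    """Average silhouette width; singleton clusters score 0 by convention."""
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def best_k(points, k_values):
    """Return the k with the highest average silhouette width, plus all widths."""
    widths = {k: mean_silhouette(points, kmeans(points, k)) for k in k_values}
    return max(widths, key=widths.get), widths


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
k, widths = best_k(points, range(2, 6))
print(k)  # 3
```

With three well-separated groups of points, splitting any group (k > 3) or merging two groups (k = 2) lowers the average width, which is the same reasoning used above to prefer 8 clusters.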
Demo
Evaluating clusters with completely opposite expression profiles can provide better insight into the molecular mechanisms induced by a certain condition. For instance, Clusters 4 and 7 are negatively correlated with a score of -0.98, so it may be worthwhile to investigate them together.
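A minimal sketch of how mirror clusters could be flagged: average the gene profiles of each cluster into a centroid, compute Pearson's correlation between every pair of centroids, and report the pairs below a strongly negative cutoff. The cutoff of -0.9 and the function names are hypothetical choices for illustration; the app reports the correlation scores themselves.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def centroids(profiles, labels):
    """Mean expression profile of each cluster."""
    out = {}
    for cluster in set(labels.values()):
        rows = [profiles[g] for g, lab in labels.items() if lab == cluster]
        out[cluster] = [sum(col) / len(rows) for col in zip(*rows)]
    return out


def mirror_pairs(cents, cutoff=-0.9):
    """Cluster pairs whose centroids are strongly anti-correlated, most negative first."""
    pairs = []
    for a, b in combinations(sorted(cents), 2):
        r = pearson(cents[a], cents[b])
        if r <= cutoff:
            pairs.append((a, b, round(r, 2)))
    return sorted(pairs, key=lambda t: t[2])


profiles = {
    "g1": [0, 1, 2, 3], "g2": [1, 2, 3, 4],
    "g3": [3, 2, 1, 0], "g4": [4, 3, 2, 1],
    "g5": [0, 3, 0, 3],
}
labels = {"g1": 1, "g2": 1, "g3": 2, "g4": 2, "g5": 3}
pairs = mirror_pairs(centroids(profiles, labels))
print(pairs)  # [(1, 2, -1.0)]
```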
Demo

Silhouette information is also provided for each gene in your gene set of interest, together with its assigned cluster and its neighboring cluster.
Note that you can also search for the silhouette information of a specific gene of interest.
Silhouette scores are based on the average distance between a gene and the other genes in its assigned cluster, and the average distance between the gene and the genes in the nearest cluster to which it is not assigned. Silhouette scores range from −1 to +1, where a high value indicates that the gene is well matched to its own cluster and poorly matched to neighboring clusters. See the silhouette function of the cluster package.
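The per-gene silhouette values described above can be sketched as follows: for gene i, a(i) is the mean distance to the other genes in its cluster, b(i) is the smallest mean distance to the genes of any other cluster (that cluster is the neighbor), and s(i) = (b(i) - a(i)) / max(a(i), b(i)). The output fields mirror the columns returned by the silhouette function of the R cluster package (cluster, neighbor, sil_width); the Euclidean distance and the helper name are assumptions for illustration.

```python
import math


def silhouette_table(points, labels):
    """Return {gene: (cluster, neighbor, sil_width)} using Euclidean distance."""
    clusters = sorted(set(labels.values()))
    table = {}
    for gene, p in points.items():
        own = labels[gene]
        mean_dist = {}
        for c in clusters:
            others = [math.dist(p, q) for g, q in points.items() if labels[g] == c and g != gene]
            if others:
                mean_dist[c] = sum(others) / len(others)
        neighbor = min((c for c in mean_dist if c != own), key=mean_dist.get)
        if own not in mean_dist:  # singleton cluster: silhouette defined as 0
            table[gene] = (own, neighbor, 0.0)
            continue
        a, b = mean_dist[own], mean_dist[neighbor]
        table[gene] = (own, neighbor, (b - a) / max(a, b))
    return table


points = {"A1": (0, 0), "A2": (0, 2), "B1": (10, 0), "B2": (10, 2)}
labels = {"A1": 1, "A2": 1, "B1": 2, "B2": 2}
table = silhouette_table(points, labels)
print(table["A1"][:2])  # (1, 2): assigned cluster and neighboring cluster
```

Looking up a single entry, as in table["A1"], corresponds to searching for the silhouette information of one gene of interest.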
Demo
You can select one or more MSigDB collections and examine the enriched gene signatures and their gene ratios within each cluster. The gene signatures can be filtered by gene count, p-value and q-value.
MSigDB Collections are retrieved by the msigdbr package.
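One common way to test whether a cluster is enriched for a gene signature is a one-sided hypergeometric (over-representation) test, reporting the gene ratio as overlap/cluster size and adjusting p-values across signatures. The sketch below assumes this approach and uses the Benjamini-Hochberg adjustment as a stand-in for q values; the app's exact test and q-value definition are not described in this section, and the gene sets are toy placeholders rather than MSigDB data.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    top = min(successes, draws)
    hits = sum(comb(successes, i) * comb(population - successes, draws - i) for i in range(k, top + 1))
    return hits / comb(population, draws)


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running = min(running, pvalues[i] * m / rank)
        adjusted[i] = running
    return adjusted


def enrich(cluster_genes, gene_sets, universe):
    """Over-representation rows (term, count, gene_ratio, p, q), sorted by p."""
    cluster = set(cluster_genes) & universe
    rows = []
    for term, members in gene_sets.items():
        members = set(members) & universe
        overlap = cluster & members
        if not overlap:
            continue
        p = hypergeom_sf(len(overlap), len(universe), len(members), len(cluster))
        rows.append({"term": term, "count": len(overlap),
                     "gene_ratio": f"{len(overlap)}/{len(cluster)}", "p": p})
    for row, q in zip(rows, bh_adjust([r["p"] for r in rows])):
        row["q"] = q
    return sorted(rows, key=lambda r: r["p"])


universe = {f"G{i}" for i in range(1, 21)}
cluster = [f"G{i}" for i in range(1, 6)]
gene_sets = {
    "SET_A": ["G1", "G2", "G3", "G4", "G10"],
    "SET_B": ["G5", "G15", "G16", "G17", "G18"],
}
rows = enrich(cluster, gene_sets, universe)
print(rows[0]["term"], rows[0]["gene_ratio"])  # SET_A 4/5
```

Filtering by gene count, p-value or q-value then amounts to dropping rows that fall outside the chosen thresholds.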
The gene clusters can also be divided into essential and non-essential genes to be evaluated separately.
Enrichment analysis of essential genes only may provide better insight into critical pathways induced by a condition. Separate functional annotation of non-essential genes can facilitate biological interpretations that are unrelated to cell survival.
The Cancer Dependency Map (DepMap) data was used for categorizing genes by essentiality.
The Network panel allows for the use of available gene sets from MSigDB Collections for conducting a network analysis on the clusters of interest.
For instance, you can view the network plot of the mirror clusters 4 and 7 based on the selected control and treatment groups. Upregulated gene sets are shown in red and downregulated ones in blue. By default, the top 10 enriched terms are shown, but you can increase or decrease the number of terms visualized in the network. Edge widths are directly proportional to the gene ratios of the clusters enriched for the presented terms. Similarly, edge lengths are proportional to normalized gene ratios.
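The mapping from enrichment results to network edges can be sketched as follows, assuming each enrichment row carries the cluster, term, overlap count, cluster size, p-value and the cluster's direction of change. Selecting the top terms by their smallest p-value and normalizing gene ratios by the largest ratio are assumptions made for this illustration; the app's exact ranking and normalization are not specified here.

```python
def network_edges(rows, top_n=10):
    """Build cluster-term edges for the top_n terms (hypothetical scheme)."""
    best_p = {}
    for r in rows:
        best_p[r["term"]] = min(r["p"], best_p.get(r["term"], 1.0))
    top_terms = set(sorted(best_p, key=best_p.get)[:top_n])
    edges = []
    for r in rows:
        if r["term"] not in top_terms:
            continue
        edges.append({
            "cluster": r["cluster"],
            "term": r["term"],
            "width": r["count"] / r["cluster_size"],  # gene ratio drives edge width
            "color": "red" if r["direction"] == "up" else "blue",
        })
    max_ratio = max((e["width"] for e in edges), default=1.0)
    for e in edges:
        e["norm_ratio"] = e["width"] / max_ratio  # normalized ratio used for edge length
    return edges


rows = [
    {"cluster": 4, "term": "TERM_1", "count": 10, "cluster_size": 50, "p": 1e-5, "direction": "up"},
    {"cluster": 7, "term": "TERM_1", "count": 5, "cluster_size": 40, "p": 1e-3, "direction": "down"},
    {"cluster": 4, "term": "TERM_2", "count": 2, "cluster_size": 50, "p": 0.2, "direction": "up"},
]
edges = network_edges(rows, top_n=1)
```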
You can search for any of the clusters and terms shown in the generated network plot. This is especially helpful if you have preliminary hypotheses about the clustered gene sets.
Visualizing all of the generated clusters at once provides insight into shared gene signatures and thus a better interpretation of the clusters. For instance, Clusters 1 and 7 share a greater proportion of terms, and both are upregulated.
Note that you can check the gene expression patterns of your clusters from the View Clusters tab as a reminder.
The priority scores are retrieved from The Project Score Database, part of the DepMap Portal.
Demo
You can sort and filter the clusters by the odds ratios and p-values of priority genes for the given cancer types. For instance, Clusters 1 and 7 have the highest odds ratios with p < 0.05 for pan-cancer. You can also check specific cancer types to see how they are associated with your clusters.
The interactive scatter plot (x, y, z) in the top-left corner of the section lets you explore regression relationships among efficacy, selectivity, expression range (CPM), average expression (CPM), coefficient of variation and maximum log fold change.
For instance, the regression plot of log2 average CPM vs. log2 range CPM can help compare the variation of essential and non-essential genes. At the same time, using maximum logFC as the point size helps compare the magnitude of expression changes. You can replot simply by changing the parameters and selecting a cluster of interest. Note that you can exclude either essential or non-essential genes from the plot to visualize them separately.
In the top-right corner of the section, Kolmogorov-Smirnov test, density and Mann-Whitney U test plots can be generated for the genes in the selected cluster and the selected parameter. Genes can be compared by either their efficacy scores or their essentiality scores in each test.
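Both tests compare the distribution of a parameter (for example, coefficient of variation) between essential and non-essential genes in a cluster. The sketch below computes the Mann-Whitney U statistic with average ranks for ties and a two-sided normal-approximation p-value (no tie or continuity correction), and the two-sample Kolmogorov-Smirnov D statistic as the largest gap between the empirical CDFs. It is a simplified stand-in for R's wilcox.test and ks.test, which can use exact p-values for small samples; the input values are hypothetical.

```python
import math
from bisect import bisect_right


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """U statistic of x and a two-sided normal-approximation p-value."""
    ranks = average_ranks(list(x) + list(y))
    n1, n2 = len(x), len(y)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, math.erfc(abs(z) / math.sqrt(2))


def ks_statistic(x, y):
    """Largest absolute difference between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    return max(
        abs(bisect_right(xs, v) / len(xs) - bisect_right(ys, v) / len(ys))
        for v in xs + ys
    )


essential_cv = [0.10, 0.20, 0.15, 0.12]       # hypothetical values
nonessential_cv = [0.50, 0.60, 0.55, 0.70]
u, p = mann_whitney_u(essential_cv, nonessential_cv)
d = ks_statistic(essential_cv, nonessential_cv)
```

Evaluating the empirical CDFs only at observed values is enough for D, because both step functions change only at data points.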
Demo
For instance, both the density and Mann-Whitney U test plots show that non-essential genes have higher coefficients of variation, whereas essential genes have slightly higher expression levels overall.
The bar plot in the bottom-left corner of the section shows the number of essential genes that are either common essentials (genes with shared essentiality across all lineages) or lineage-dependent essentials, which affect efficacy scores.
The line graph in the bottom-right corner is provided as a reminder of the selected cluster’s pattern.
Human MSigDB Collections from the Molecular Signatures Database (GSEA) are retrieved by the msigdbr package.
The results table includes the distance correlation, differential gene expression results, gene essentiality information (efficacy and selectivity values) and linear regression statistics for the mRNA-protein correlation: correlation coefficient (r), R squared (R2), adjusted R squared (Adj.R2), intercept, slope and p-value.
A distance correlation is calculated between the expression profile of each gene in a selected cluster and that of the cluster centroid. In other words, it shows how closely the expression pattern of each gene in a chosen cluster matches the pattern of the cluster’s center.
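Distance correlation can be computed from double-centered distance matrices (the sample distance correlation of Székely and colleagues), as in the sketch below. The inputs are assumed to be the gene's profile and the centroid profile over the same samples or groups. Unlike Pearson's r, distance correlation lies between 0 and 1 and does not encode direction, so a perfectly mirrored profile also scores 1. Whether the app uses exactly this estimator is not stated in this section.

```python
import math


def _double_centered(values):
    """Pairwise |a - b| distances, double-centered by row, column and grand means."""
    n = len(values)
    d = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in d]  # equal to column means (symmetric matrix)
    grand = sum(row_means) / n
    return [[d[j][k] - row_means[j] - row_means[k] + grand for k in range(n)] for j in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation between two equal-length profiles."""
    n = len(x)
    a, b = _double_centered(x), _double_centered(y)
    dcov2 = sum(a[j][k] * b[j][k] for j in range(n) for k in range(n)) / n ** 2
    dvar_x = sum(v * v for row in a for v in row) / n ** 2
    dvar_y = sum(v * v for row in b for v in row) / n ** 2
    if dvar_x == 0 or dvar_y == 0:
        return 0.0  # a constant profile carries no pattern to compare
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar_x * dvar_y))


centroid = [1.0, 2.0, 3.0, 4.0]   # hypothetical cluster centroid
gene = [2.0, 4.0, 6.0, 8.0]       # same shape, larger scale
mirrored = [4.0, 3.0, 2.0, 1.0]   # opposite shape
```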
Demo
Genes within a selected cluster can be investigated individually under the specified sample condition. In the Gene-Specific Analysis panel, clicking a gene in the results table plots the normalized expression levels of that gene in each replicate on the left, and the alignment of the gene’s expression pattern with that of the cluster centroid on the right.
You can check the deviations among the replicates of a group for your gene of interest and see the distance of your gene from the cluster centroid.
The selected gene can also be evaluated at the protein level. After selecting cancer types, you can plot the proportion of protein expression in cancer cells and healthy tissues.
Note that there might not be any data for some of the selected genes in the HPA database.
For instance, PDE4D has low to medium expression in both healthy glial or neuronal cells and gliomas, but high expression in several other cancer types, such as melanoma and colorectal cancer.
You can also search for another gene of interest by name. For instance, searching for MCM4 shows that the protein has high expression in 75% of glioma patients.
Note that you can search for multiple genes of interest, separated by commas or spaces. Gene name search is not case-sensitive.
The RNA expression levels of the gene can be visualized along with its normalized quantitative protein profile from mass spectrometry in distinct tissues, and optionally in the specified diseases or cell lines.
The data is extracted from 22Q1 DepMap public release that includes 55825 genes, 1165 cell lines, 33 primary diseases, and 32 lineages, retrieved by the depmap R package.
You can run a linear regression analysis for the gene selected from the results table to check its mRNA-protein correlation in different lineages, and optionally in specified diseases or cell lines. Robust regression results are displayed as in the Cluster Prioritization tab.
The Search Gene/Geneset tab shows the cluster in which a specific gene or gene set of interest resides. If you search for multiple genes, separate the gene names with commas or spaces. Gene name search is not case-sensitive. The tab presents two tables: one listing the genes and their cluster IDs in the current dataset, and a frequency table with both observed and expected counts. Additionally, it computes the p-value of a chi-squared test comparing the observed and expected counts.
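A minimal sketch of the search logic: parse a comma- or space-separated query case-insensitively, look up each gene's cluster, and compare observed counts per cluster with expected counts. Here the expected count for a cluster is assumed to be the number of matched genes times that cluster's share of all clustered genes, and the p-value uses a chi-squared distribution with (number of clusters - 1) degrees of freedom, computed with a series expansion that is adequate for moderate statistics. Function names are hypothetical.

```python
import math
import re
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail chi-squared probability via the lower incomplete gamma series."""
    if x <= 0:
        return 1.0
    a, z = df / 2, x / 2
    term = total = 1.0 / a
    n = 1
    while term > 1e-16 * total and n < 10_000:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def search_genes(query, labels):
    """labels: gene -> cluster ID for every clustered gene."""
    lookup = {g.upper(): g for g in labels}
    names = [q for q in re.split(r"[,\s]+", query.strip()) if q]
    hits = {lookup[q.upper()]: labels[lookup[q.upper()]] for q in names if q.upper() in lookup}
    if not hits:
        return hits, [], None
    sizes = Counter(labels.values())
    observed = Counter(hits.values())
    table, stat = [], 0.0
    for c in sorted(sizes):
        expected = len(hits) * sizes[c] / len(labels)
        table.append((c, observed.get(c, 0), expected))
        stat += (observed.get(c, 0) - expected) ** 2 / expected
    return hits, table, chi2_sf(stat, len(sizes) - 1)


labels = {f"G{i}": 1 if i <= 5 else 2 for i in range(1, 11)}
hits, table, p = search_genes("g1, G2 g3", labels)
```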

The red lines represent the gene’s exons; blue (forward) and orange (reverse) dots indicate the positions where each primer binds to the gene.
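The forward and reverse binding positions shown in the plot can be located as in the sketch below: a forward primer matches the gene sequence directly, while a reverse primer binds where its reverse complement occurs. Exact string matching on a toy sequence is an assumption for illustration; real primer checks should also consider mismatches and off-target sites.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_all(text, pattern):
    """0-based start positions of every (possibly overlapping) exact match."""
    positions, i = [], text.find(pattern)
    while i != -1:
        positions.append(i)
        i = text.find(pattern, i + 1)
    return positions


def binding_sites(template, forward_primer, reverse_primer):
    """Positions on the template where each primer binds (exact matches only)."""
    t = template.upper()
    return {
        "forward": find_all(t, forward_primer.upper()),
        "reverse": find_all(t, reverse_complement(reverse_primer.upper())),
    }


template = "ATGCGTACGTTAGCCGATCGGATCCA"   # hypothetical gene sequence (5'->3')
sites = binding_sites(template, "GCGTACG", "GATCCGATC")
print(sites)  # {'forward': [2], 'reverse': [15]}
```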
We highly recommend validating primers designed in CAP-RNAseq with another tool.
Demo
Please cite our app:
Raw File
CSV file with gene identifiers (gene symbols) in the first column and raw count values in the other columns
Do not use duplicated gene names! If you have duplicated gene names, they will be removed.
Condition File
TXT file without a header, containing the condition labels as a single row or a single column
When you click 'Apply vst+ANOVA', variance-stabilizing transformation (vst) normalization is performed before ANOVA to stabilize the variance along the mean. ANOVA is then used to remove genes whose expression does not change between samples.
This step is optional and can be skipped.
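The ANOVA step can be sketched as a one-way ANOVA per gene across the groups in the condition file, keeping genes whose p-value falls below a cutoff. This sketch assumes the values are already variance-stabilized (DESeq2's vst is not reimplemented here), uses a hypothetical cutoff of 0.05, and computes the F-distribution tail through the regularized incomplete beta function (continued-fraction method).

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the regularized incomplete beta function."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            return h
    return h


def betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def f_sf(f, d1, d2):
    """P(F > f) for an F distribution with (d1, d2) degrees of freedom."""
    if f <= 0:
        return 1.0
    return betai(d2 / 2, d1 / 2, d2 / (d2 + d1 * f))


def anova_p(values, groups):
    """One-way ANOVA p-value for one gene; groups holds one label per sample."""
    labels = list(dict.fromkeys(groups))
    grand = sum(values) / len(values)
    ss_between = ss_within = 0.0
    for lab in labels:
        vs = [v for v, g in zip(values, groups) if g == lab]
        m = sum(vs) / len(vs)
        ss_between += len(vs) * (m - grand) ** 2
        ss_within += sum((v - m) ** 2 for v in vs)
    d1, d2 = len(labels) - 1, len(values) - len(labels)
    if ss_within == 0:
        return 1.0 if ss_between == 0 else 0.0
    return f_sf((ss_between / d1) / (ss_within / d2), d1, d2)


def anova_filter(expr, groups, alpha=0.05):
    """Keep genes whose expression differs between groups (p < alpha)."""
    return {gene: v for gene, v in expr.items() if anova_p(v, groups) < alpha}


groups = ["C", "C", "C", "T", "T", "T"]
expr = {
    "changed": [1.0, 1.1, 0.9, 5.0, 5.2, 4.8],
    "flat": [2.0, 2.1, 1.9, 2.05, 1.95, 2.0],
}
kept = anova_filter(expr, groups)
print(list(kept))  # ['changed']
```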
You can visualize your data with Hierarchical Clustering. This can suggest the number of clusters to use in K-means Clustering.
Dissimilarity between clusters is calculated by applying Pearson's correlation to the cluster centroids in order to identify mirror clusters.
The target priority score is a metric used to prioritize potential therapeutic targets based on their relevance and importance. For more information, you can visit the Project Score database.
The table below displays the count of genes that have a priority score in any cancer type, as well as the odds ratio and p-value calculated for each cluster by comparing the number of these genes within the cluster to the total number of genes across all clusters.
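The per-cluster statistics in this table can be sketched with a 2x2 table per cluster (priority vs. non-priority genes, inside vs. outside the cluster). This sketch reports the sample odds ratio and a one-sided hypergeometric (Fisher) p-value for enrichment; the app may instead report the output of R's fisher.test, whose conditional maximum-likelihood odds ratio and default two-sided p-value differ slightly. The gene lists are hypothetical.

```python
from math import comb, inf


def cluster_priority_stats(labels, priority_genes):
    """labels: gene -> cluster ID. Returns count, odds ratio and p-value per cluster."""
    priority = set(priority_genes)
    total = len(labels)
    total_priority = sum(g in priority for g in labels)
    results = {}
    for cluster in sorted(set(labels.values())):
        members = [g for g, lab in labels.items() if lab == cluster]
        size = len(members)
        a = sum(g in priority for g in members)    # priority genes in the cluster
        b = size - a                                # other genes in the cluster
        c = total_priority - a                      # priority genes elsewhere
        d = total - size - c                        # other genes elsewhere
        odds = (a * d) / (b * c) if b * c else inf  # b or c is zero: ratio is unbounded
        top = min(total_priority, size)
        p = sum(comb(total_priority, i) * comb(total - total_priority, size - i)
                for i in range(a, top + 1)) / comb(total, size)
        results[cluster] = {"count": a, "odds_ratio": odds, "p": p}
    return results


labels = {f"G{i}": 1 if i <= 5 else 2 for i in range(1, 21)}
stats = cluster_priority_stats(labels, ["G1", "G2", "G3", "G6"])
print(stats[1]["count"], stats[1]["odds_ratio"])  # 3 21.0
```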
Please select a cluster from the drop-down menu to visualize the correlation between mRNA and protein expression values of this cluster’s genes.
This tab visualizes the protein expression levels of genes selected from the table above, or entered manually, based on the Human Protein Atlas (HPA) database.
The Primer Design section becomes active once the table in the 'Gene-Specific Analysis' tab has been generated.