t-distributed Stochastic Neighbor Embedding (tSNE) is an algorithm for non-linear dimensionality reduction. A tSNE plot displays complex, high-dimensional data on a two-dimensional plane. By making the Euclidean distances between samples more pronounced, tSNE can overcome the poor clustering that linear dimensionality reduction sometimes produces. This tool is not currently available for the single-cell database.
Function:
This tool displays complex data on a two-dimensional plane using a non-linear dimensionality reduction algorithm. Samples with high similarity are placed close to each other in the coordinate system.
Input:
Abundance matrix: A file containing a two-dimensional matrix. The first row usually holds gene IDs or OTUs, and the first column usually holds sample names.
Group file: In this file, sample names are in the first column and group names are in the second column. Samples with the same group name are assigned to the same group, and the points in the output figure are colored by group. If no group file is uploaded, the points are colored by sample.
Parameter:
(1) Linear dimensional reduction: A linear dimensionality reduction algorithm preprocesses the data, which speeds up the subsequent computation. The partial_PCA algorithm gives the largest speed-up but the lowest accuracy, so we suggest using partial_PCA only for complex data. PCA and partial_PCA cannot be selected at the same time.
(2) Data scaled: The data are scaled with the Z-score, a dimensionless quantity obtained by subtracting the population mean from an individual raw score and then dividing the difference by the population standard deviation. Standardizing minimizes the effect of large differences in magnitude between the data. We recommend standardizing the data.
(3) Perplexity: This parameter sets the largest number of neighboring points considered around each point. You can choose a larger value when there are few samples and a smaller value when there are many samples to make the figure clearer.
(4) row_col: Specifies whether the rows or the columns are treated as the samples.
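Parameters (2) to (4) can be made more concrete with a minimal Python sketch. It is not the tool's actual code: the function names and the "row"/"column" values are hypothetical, and scaling each feature column separately is an assumption. The perplexity function uses the standard tSNE definition, 2 to the power of the entropy of the Gaussian neighbor distribution around one point, to show why perplexity behaves like an effective number of neighbors.

```python
import math
from statistics import mean, pstdev


def orient(matrix, samples_in="row"):
    """Return the matrix with one sample per row (hypothetical row_col handling)."""
    if samples_in == "row":
        return [list(r) for r in matrix]
    if samples_in == "column":
        return [list(c) for c in zip(*matrix)]  # transpose
    raise ValueError("samples_in must be 'row' or 'column'")


def zscore_columns(rows):
    """Z-score each feature column: (x - population mean) / population SD."""
    scaled_cols = []
    for col in zip(*rows):
        mu, sd = mean(col), pstdev(col)
        # A constant column has SD 0; map it to zeros instead of dividing by 0.
        scaled_cols.append([0.0 if sd == 0 else (x - mu) / sd for x in col])
    return [list(r) for r in zip(*scaled_cols)]


def perplexity(sq_dists, sigma):
    """Perplexity 2**H of the Gaussian neighbor distribution around one point.

    sq_dists holds the squared distances from the point to every other point.
    """
    weights = [math.exp(-d / (2 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    probs = [w / total for w in weights]
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2 ** entropy
```

With four equally distant neighbors, `perplexity([1, 1, 1, 1], 1.0)` evaluates to 4 up to floating-point rounding. A wider kernel (larger `sigma`) spreads the weight over more neighbors and raises the perplexity. tSNE works in the other direction: it picks a kernel width for each point so that this quantity matches the perplexity the user chose.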
Output:
TSNE.xls: This file lists the coordinates of each sample in the two-dimensional plane.
TSNE.png(pdf): The tSNE plot of the samples. *.png is a bitmap file and *.pdf is a vector graphics file. The axes of the figure are tSNE-1 and tSNE-2.
Input files must be in txt format, and the values must be separated by a delimiter.
Input:
1. Abundance matrix:
Samples | Otu000005 | Otu000003 | Otu000011 | Otu000001 | Otu000002
A-1 | 28341 | 8364 | 15894 | 247 | 85
A-2 | 18513 | 9906 | 22261 | 390 | 133
A-3 | 22002 | 10373 | 22096 | 528 | 208
A-4 | 22164 | 12623 | 13696 | 459 | 211
A-6 | 24334 | 13724 | 3705 | 543 | 232
B-2 | 982 | 9484 | 88 | 15798 | 12886
B-3 | 918 | 9253 | 391 | 15569 | 13742
B-4 | 945 | 10204 | 300 | 16832 | 16816
B-5 | 802 | 9254 | 75 | 13295 | 14057
B-6 | 833 | 8786 | 200 | 12798 | 10805
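As a rough illustration of this layout, the sketch below reads an abundance matrix into feature IDs, sample names and numeric rows. The function name `read_abundance_matrix` is hypothetical, and the tab delimiter and the samples-in-rows orientation are assumptions made for the example, not something the tool documents.

```python
import csv
import io


def read_abundance_matrix(text, delimiter="\t"):
    """Parse a delimited abundance matrix with samples in rows.

    Returns (feature_ids, sample_names, values), where values[i][j] is the
    abundance of feature j in sample i.
    """
    rows = [r for r in csv.reader(io.StringIO(text), delimiter=delimiter) if r]
    header, body = rows[0], rows[1:]
    feature_ids = header[1:]  # first row: gene IDs or OTUs
    sample_names, values = [], []
    for row in body:
        if len(row) != len(header):
            raise ValueError(
                f"row {row[0]!r} has {len(row) - 1} values, "
                f"expected {len(feature_ids)}"
            )
        sample_names.append(row[0])  # first column: sample name
        values.append([float(x) for x in row[1:]])
    return feature_ids, sample_names, values


# Two rows and two OTUs from the example matrix, tab-separated (assumed).
example = (
    "Samples\tOtu000005\tOtu000003\n"
    "A-1\t28341\t8364\n"
    "B-2\t982\t9484\n"
)
features, samples, values = read_abundance_matrix(example)
```

Rows whose length differs from the header are rejected early, because a ragged matrix would otherwise shift abundances into the wrong feature.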
2. Group file:
A-1 | A
A-2 | A
A-3 | A
A-4 | A
A-6 | A
B-2 | B
B-3 | B
B-4 | B
B-5 | B
B-6 | B
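The grouping rule, including the fallback when no group file is uploaded, might be sketched as follows. The function name `read_groups` and the tab delimiter are assumptions for illustration. Raising an error for samples that have no group is a design choice of this sketch, not documented behavior of the tool.

```python
import csv
import io


def read_groups(text, sample_names, delimiter="\t"):
    """Map each sample name to its group name.

    If no group file content is given, every sample becomes its own group,
    so each sample is colored separately.
    """
    if not text:
        return {name: name for name in sample_names}
    groups = {}
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if len(row) >= 2:
            groups[row[0]] = row[1]  # column 1: sample, column 2: group
    missing = [s for s in sample_names if s not in groups]
    if missing:
        raise ValueError(f"samples without a group: {missing}")
    return groups


group_text = "A-1\tA\nA-2\tA\nB-2\tB\n"
with_file = read_groups(group_text, ["A-1", "A-2", "B-2"])
without_file = read_groups(None, ["A-1", "A-2", "B-2"])
```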
Output:
1. Coordinate table for samples
samples | groups | tSNE_1 | tSNE_2
A-1 | A | -21.65079567 | 84.29810986
A-2 | A | -24.276575 | 89.32886812
A-3 | A | -21.90623727 | 87.98581002
A-4 | A | -24.30854873 | 84.6331113
A-6 | A | -25.55152121 | 81.11511
B-2 | B | 23.36802676 | -83.87482244
B-3 | B | 22.39161677 | -86.26611549
B-4 | B | 18.7772739 | -85.64608858
B-5 | B | 25.3749806 | -87.170141
B-6 | B | 27.78177984 | -84.4038418
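One way to use the coordinate table downstream is to average the coordinates per group and check that the groups separate. The sketch below does this for four rows of the example output. It assumes the table is available as tab-separated text with the column names shown above, and `group_centroids` is a hypothetical helper, not part of the tool.

```python
import csv
import io
from collections import defaultdict


def group_centroids(text, delimiter="\t"):
    """Return {group: (mean tSNE_1, mean tSNE_2)} from a coordinate table."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # running sum x, sum y, count
    for row in reader:
        acc = sums[row["groups"]]
        acc[0] += float(row["tSNE_1"])
        acc[1] += float(row["tSNE_2"])
        acc[2] += 1
    return {g: (sx / n, sy / n) for g, (sx, sy, n) in sums.items()}


# Four rows from the example output, tab-separated (assumed).
table = (
    "samples\tgroups\ttSNE_1\ttSNE_2\n"
    "A-1\tA\t-21.65079567\t84.29810986\n"
    "A-2\tA\t-24.276575\t89.32886812\n"
    "B-2\tB\t23.36802676\t-83.87482244\n"
    "B-3\tB\t22.39161677\t-86.26611549\n"
)
centroids = group_centroids(table)
```

For these rows the group A centroid has a negative tSNE_1 and a positive tSNE_2, while group B has the opposite signs, which matches the clear separation of the two groups in the table.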
2. tSNE plots of samples
