tSNE





























t-distributed Stochastic Neighbor Embedding (tSNE) is an algorithm for non-linear dimensionality reduction. A tSNE plot displays complex, high-dimensional data on a two-dimensional plane. The tSNE algorithm accentuates differences in the Euclidean distance between samples, which helps overcome the poor clustering that linear dimensionality reduction can produce. This tool is not currently available for single-cell databases.


Function:

This tool displays complex data on a two-dimensional plane using a non-linear dimensionality reduction algorithm. Samples with high similarity are placed close together in the coordinate system.


Input:

Abundance matrix: A file containing a two-dimensional matrix. The first row usually holds gene IDs or OTUs, and the first column usually holds sample names.

Group file: Sample names are in the first column and group names are in the second column. Samples with the same group name are assigned to the same group, and the points in the output figure are colored by group. If no group file is uploaded, the points are colored by sample.
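The two input layouts described above can be read with a few lines of Python. This is only a minimal sketch, not the tool's own loader: the function names `read_abundance` and `read_groups` are hypothetical, the tab delimiter is an assumption, and the sample values are copied from the example tables on this page.

```python
import csv
import io

# Sample data taken from the example tables on this page.
# Assumption: fields are separated by tabs.
ABUNDANCE_TXT = (
    "Samples\tOtu000005\tOtu000003\tOtu000011\n"
    "A-1\t28341\t8364\t15894\n"
    "B-2\t982\t9484\t88\n"
)
GROUP_TXT = "A-1\tA\nB-2\tB\n"


def read_abundance(handle, delimiter="\t"):
    """Return (feature_names, {sample_name: [values]}) from an abundance matrix."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    features = header[1:]  # the first cell is the corner label, e.g. "Samples"
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        table[row[0]] = [float(value) for value in row[1:]]
    return features, table


def read_groups(handle, delimiter="\t"):
    """Return {sample_name: group_name}; assumes the group file has no header row."""
    reader = csv.reader(handle, delimiter=delimiter)
    return {row[0]: row[1] for row in reader if len(row) >= 2}


features, abundance = read_abundance(io.StringIO(ABUNDANCE_TXT))
groups = read_groups(io.StringIO(GROUP_TXT))

# Without a group entry, a sample falls back to being its own colour key,
# mirroring the "colored by sample" behaviour described above.
colour_key = {sample: groups.get(sample, sample) for sample in abundance}
print(features)
print(colour_key)
```

Passing an empty group mapping to the last step gives each sample its own key, which corresponds to the case where no group file is uploaded.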


Parameter:

(1) Linear dimensional reduction: A linear dimensionality reduction algorithm preprocesses the data, which speeds up the computation. The partial_PCA algorithm gives the largest speed-up but the lowest accuracy, so we suggest using partial_PCA only for complex data. PCA and partial_PCA cannot be selected at the same time.

(2) Data scaled: The data are scaled using the Z-score, a dimensionless quantity obtained by subtracting the population mean from an individual raw score and then dividing the difference by the population standard deviation. Standardizing minimizes the effect of large differences in magnitude between the data. We recommend standardizing the data.

(3) Perplexity: This parameter sets the largest number of neighboring points considered around each point. Choose a larger value when there are few samples and a smaller value when there are many samples to make the figure more distinct.

(4) row_col: Specifies whether the rows or the columns are treated as the “samples”.
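Parameters (2), (3), and (4) can be illustrated with a short standard-library sketch. It is not the tool's implementation, and the helper names `zscore_columns`, `transpose`, and `perplexity` are hypothetical. The Z-score follows the definition in (2), using the population standard deviation. The perplexity function uses the usual tSNE definition, 2 raised to the Shannon entropy of a point's neighbor distribution, which is why it can be read as an effective number of neighbors.

```python
import math
import statistics


def zscore_columns(rows):
    """Z-score each column (feature) of a samples-by-features matrix."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Population standard deviation, matching the description in (2).
    sds = [statistics.pstdev(col) for col in columns]
    return [
        # A constant column has zero spread; map it to 0.0 instead of dividing by zero.
        [(value - m) / s if s > 0 else 0.0 for value, m, s in zip(row, means, sds)]
        for row in rows
    ]


def transpose(rows):
    """Swap rows and columns, e.g. when the samples are stored in columns (row_col)."""
    return [list(col) for col in zip(*rows)]


def perplexity(probabilities):
    """Perplexity 2**H(P) of a neighbour distribution, with H measured in bits."""
    entropy = -sum(p * math.log2(p) for p in probabilities if p > 0)
    return 2 ** entropy


# Two samples (rows) by two OTUs (columns), values from the example matrix.
samples_by_features = [[28341, 8364], [982, 9484]]
scaled = zscore_columns(samples_by_features)
print(scaled)

# If the file stores samples in columns instead, transpose before scaling.
features_by_samples = transpose(samples_by_features)
print(zscore_columns(transpose(features_by_samples)) == scaled)

# A point that weights four neighbours equally has a perplexity of 4.
print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

After scaling, each column has mean 0 and standard deviation 1, so OTUs with counts in the tens of thousands no longer dominate OTUs with counts in the hundreds.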


Output:

TSNE.xls: This file lists the coordinates of the samples in the two-dimensional plane.

TSNE.png(pdf): The tSNE plot of the samples. *.png is a bitmap file; *.pdf is a vector graphics file. The axes of the figure are tSNE-1 and tSNE-2.

Input files must be in txt format, and the values must be separated by a consistent separator character.


Input:

1. Abundance matrix:

Samples  Otu000005  Otu000003  Otu000011  Otu000001  Otu000002
A-1      28341      8364       15894      247        85
A-2      18513      9906       22261      390        133
A-3      22002      10373      22096      528        208
A-4      22164      12623      13696      459        211
A-6      24334      13724      3705       543        232
B-2      982        9484       88         15798      12886
B-3      918        9253       391        15569      13742
B-4      945        10204      300        16832      16816
B-5      802        9254       75         13295      14057
B-6      833        8786       200        12798      10805


2. Group file:

A-1  A
A-2  A
A-3  A
A-4  A
A-6  A
B-2  B
B-3  B
B-4  B
B-5  B
B-6  B


Output:

1. Coordinate table for samples

samples  groups  tSNE_1        tSNE_2
A-1      A       -21.65079567  84.29810986
A-2      A       -24.276575    89.32886812
A-3      A       -21.90623727  87.98581002
A-4      A       -24.30854873  84.6331113
A-6      A       -25.55152121  81.11511
B-2      B       23.36802676   -83.87482244
B-3      B       22.39161677   -86.26611549
B-4      B       18.7772739    -85.64608858
B-5      B       25.3749806    -87.170141
B-6      B       27.78177984   -84.4038418


2. tSNE plots of samples