Embedding to Reference t-SNE Space Addresses Batch Effects in Single-Cell Classification
Poličar, Pavlin Gregor (Author), Zupan, Blaž (Mentor)

Abstract
Dimensionality reduction techniques, such as t-SNE, can construct informative visualizations of high-dimensional data. When working with multiple data sets, a straightforward application of these methods often fails; instead of revealing underlying classes, the resulting visualizations expose data set-specific clusters. To circumvent these batch effects, we propose a principled embedding procedure that enables the addition of new data points into existing t-SNE embeddings. We provide an open-source implementation of the proposed method and demonstrate the utility of our approach with an analysis of six recently published single-cell gene expression data sets containing up to tens of thousands of cells and thousands of genes. We present surprising evidence that our computationally more direct procedure solves the batch effect problem, one of the core challenges in the analysis of gene expression data, and enables the reuse of t-SNE embeddings, paving the way for interpretable visualizations of high-dimensional data sets.
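
The idea of adding new points to an existing embedding can be illustrated with a minimal sketch. It is not the thesis's implementation: instead of optimizing the t-SNE objective with the reference points held fixed, it only places each new point at the affinity-weighted mean of the embedding positions of its nearest reference neighbors. The function name `place_in_reference` and the parameters `k` and `sigma` are hypothetical and chosen for illustration.

```python
import numpy as np


def place_in_reference(X_ref, Y_ref, X_new, k=5, sigma=1.0):
    """Place new points into a fixed reference embedding (illustrative only).

    X_ref: (n, d) reference data in the original high-dimensional space.
    Y_ref: (n, 2) existing embedding of X_ref; it is never modified.
    X_new: (m, d) new data, e.g. cells from a different batch.
    k, sigma: assumed neighborhood size and Gaussian bandwidth.
    """
    X_ref = np.asarray(X_ref, dtype=float)
    Y_ref = np.asarray(Y_ref, dtype=float)
    X_new = np.asarray(X_new, dtype=float)

    # Squared Euclidean distances from every new point to every reference point.
    d2 = ((X_new[:, None, :] - X_ref[None, :, :]) ** 2).sum(axis=2)

    # Indices and distances of the k nearest reference points, nearest first.
    k = min(k, X_ref.shape[0])
    nn = np.argsort(d2, axis=1)[:, :k]
    nn_d2 = np.take_along_axis(d2, nn, axis=1)

    # Gaussian affinities, shifted by the smallest distance for stability,
    # then normalized so each row sums to one.
    w = np.exp(-(nn_d2 - nn_d2[:, :1]) / (2.0 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)

    # Weighted mean of the neighbors' positions in the reference embedding.
    return (w[:, :, None] * Y_ref[nn]).sum(axis=1)
```

Because the reference coordinates stay fixed and only reference points act as neighbors, new points cannot form a separate cluster of their own based on similarity to each other; this is the intuition behind using a reference embedding against batch effects, though the sketch omits the optimization step that a full t-SNE-based procedure would apply.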

Language: English
Keywords: batch effects, embedding, t-SNE, visualization, single-cell transcriptomics, data integration, domain adaptation
Work type: Master's thesis/paper (mb22)
Organization: FRI - Faculty of computer and information science
Year: 2019
COBISS.SI-ID: 1538306243

Secondary language

Language: Slovenian
Title: Dodajanje primerov v referenčno vložitev t-SNE odstrani razlike med različnimi podatkovnimi viri (Adding examples to a reference t-SNE embedding removes differences between data sources)
Abstract:
Dimensionality reduction techniques, such as t-SNE, enable us to build informative visualizations of high-dimensional data sets. When several data sets are analyzed together, these methods often fail to uncover meaningful groups and instead expose unwanted differences between data sources. To remove the influence of individual data sources and reveal structure common to all the data, we propose a theoretically grounded method for adding new examples to an existing t-SNE embedding. We include the method in our open-source implementation of t-SNE and demonstrate its usefulness in an analysis of six recently published single-cell gene expression data sets. The results are surprising: the proposed method completely removes the effects of different data sources, which are one of the fundamental challenges in analyzing data from molecular biology. The proposed technique also enables the use of pre-built t-SNE embeddings, which opens new possibilities for using interpretable visualizations of high-dimensional data sets.

Keywords: differences between data sources, embedding, t-SNE, visualization, single-cell transcriptomics, data integration, domain adaptation
