by Nikolay Oskolkov
Lund University and National Bioinformatics Infrastructure Sweden (NBIS)
Dimensionality reduction is an Exploratory Data Analysis (EDA) approach that allows fast visualization of high-dimensional data and can reveal hidden systematic patterns within a data set. While linear dimensionality reduction techniques, such as Principal Component Analysis (PCA), are considered the gold standard in many areas of data science, they can be inadequate for analyzing non-linear high-dimensional data (e.g. images, text, gene expression). In such cases, non-linear dimensionality reduction methods, such as t-distributed Stochastic Neighbor Embedding (tSNE) and Uniform Manifold Approximation and Projection (UMAP), have been widely used and are regarded as state-of-the-art for exploring high-dimensional data. This chapter gives an overview of dimensionality reduction techniques, with a particular focus on PCA, tSNE, and UMAP and their applications in data science and computational biology.