Application of HDBSCAN Method for Clustering scRNA-seq Data
https://doi.org/10.15514/ISPRAS-2020-32(5)-8
Abstract
One of the main tasks in the analysis of single-cell RNA sequencing (scRNA-seq) data is the identification of cell types and subtypes, which is usually based on some method of clustering. There are a number of generally accepted approaches to the clustering problem, one of which is implemented in the Seurat package. The quality of clustering is also influenced by preprocessing algorithms such as imputation, dimensionality reduction, and feature selection. In this article, the HDBSCAN hierarchical clustering method is used to cluster scRNA-seq data. For a more complete comparison, experiments were carried out on two labeled datasets: Zeisel (3005 cells) and Romanov (2881 cells). To compare the quality of clustering, two external metrics were used: the Adjusted Rand index and the V-measure. The experiments demonstrated a higher quality of clustering by the HDBSCAN method on the Zeisel dataset and a poorer quality on the Romanov dataset.
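The two external metrics named above compare a predicted clustering against known cell-type labels. The following is a minimal sketch in plain Python, not the authors' evaluation code; the function names and the toy label lists are illustrative assumptions. It computes the Adjusted Rand index from pair counts in the contingency table and the V-measure as the harmonic mean of homogeneity and completeness.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand index computed from contingency-table pair counts."""
    n = len(true_labels)
    if n < 2:
        return 1.0  # fewer than two points: no pairs to disagree on
    pairs = Counter(zip(true_labels, pred_labels))
    classes = Counter(true_labels)
    clusters = Counter(pred_labels)
    index = sum(comb(c, 2) for c in pairs.values())
    sum_classes = sum(comb(c, 2) for c in classes.values())
    sum_clusters = sum(comb(c, 2) for c in clusters.values())
    expected = sum_classes * sum_clusters / comb(n, 2)
    maximum = (sum_classes + sum_clusters) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions are one cluster
    return (index - expected) / (maximum - expected)


def _entropy(counts, n):
    """Shannon entropy (natural log) of a partition given its group sizes."""
    return -sum(c / n * log(c / n) for c in counts)


def v_measure(true_labels, pred_labels):
    """V-measure: harmonic mean of homogeneity and completeness."""
    n = len(true_labels)
    pairs = Counter(zip(true_labels, pred_labels))
    classes = Counter(true_labels)
    clusters = Counter(pred_labels)
    h_class = _entropy(classes.values(), n)
    h_cluster = _entropy(clusters.values(), n)
    # Conditional entropies H(class | cluster) and H(cluster | class).
    h_class_given_cluster = -sum(
        c / n * log(c / clusters[k]) for (_, k), c in pairs.items()
    )
    h_cluster_given_class = -sum(
        c / n * log(c / classes[t]) for (t, _), c in pairs.items()
    )
    homogeneity = 1.0 if h_class == 0 else 1 - h_class_given_cluster / h_class
    completeness = 1.0 if h_cluster == 0 else 1 - h_cluster_given_class / h_cluster
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


# Toy labels (illustrative only, not taken from the Zeisel or Romanov data).
reference = [0, 0, 1, 2]
predicted = [0, 0, 1, 1]
ari = adjusted_rand_index(reference, predicted)
vm = v_measure(reference, predicted)
```

Both metrics are invariant to renaming of cluster labels, so only the grouping of cells matters. The hdbscan library marks noise points with the label -1; this sketch would treat them as one additional cluster, which is an assumption and may differ from how the article handled them.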
About the Authors
Maria Andreevna AKIMENKOVA
Russian Federation
Laboratory assistant, Information Systems Department
Anna Anatolyevna MAZNINA
Russian Federation
Laboratory assistant at the Genomic Engineering Laboratory
Anton Yurievich NAUMOV
Russian Federation
Research assistant, Information Systems Department
Evgeny Andreevich KARPULEVICH
Russian Federation
Researcher, Information Systems Department
For citations:
AKIMENKOVA M.A., MAZNINA A.A., NAUMOV A.Yu., KARPULEVICH E.A. Application of HDBSCAN Method for Clustering scRNA-seq Data. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2020;32(5):111-120. https://doi.org/10.15514/ISPRAS-2020-32(5)-8