Math Biology Seminar: Joshua Scurll
Topic
Speakers
Details
Identifying and visualizing meaningful clusters within high-dimensional (HD) data is an important but challenging problem with applications to molecular biology and biomedicine. For example, mass cytometry (CyTOF) is a high-throughput single-cell technology that can quantify the abundance of >30 proteins simultaneously in each cell. CyTOF is commonly used to investigate phenotypic heterogeneity in tumours, but this requires identifying biologically meaningful clusters in CyTOF data.
The "curse of dimensionality" poses computational challenges when analyzing HD data. Another major difficulty common to most clustering algorithms is choosing optimal values for input parameters such as the number of clusters, K. I will present a new measure of similarity between clusters in HD data that uses local dimensionality reduction followed by triangulation of alpha shapes. With this new approach, which I call ASTRICS, HD data can initially be over-clustered using an existing clustering algorithm with a deliberately large (conservative) choice of K. ASTRICS then generates a K × K similarity matrix for the K clusters, which can also be interpreted as a weighted graph. In turn, this matrix can be used as input for any similarity- or graph-based clustering algorithm, and a force-directed layout of the graph can be used to visualize the initial K clusters in two or three dimensions.
The introduction of ASTRICS as a fully automated, intermediate step in clustering or visualization of HD data alleviates some of the difficulties of parameter selection. Some community detection (i.e., clustering) algorithms for graphs do not require the user to specify any input parameters. For algorithms that do, the visualization afforded by ASTRICS can be used to guide parameter selection for the final clustering step. In this talk, I will demonstrate the application of ASTRICS to clustering and visualizing CyTOF data. I will also illustrate the broader utility of ASTRICS beyond biology by applying it to the popular MNIST handwritten-digit image dataset.
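The general workflow outlined in the abstract (over-cluster, build a K × K similarity matrix, then cluster the resulting graph) can be illustrated with a minimal Python sketch. This is not ASTRICS: the similarity used here is a stand-in Gaussian kernel on centroid distances, not the alpha-shape-based measure, and the final step is a simple thresholded connected-components merge rather than any particular community detection algorithm. The function names, the `scale` and `threshold` parameters, and the toy data are all hypothetical and chosen only for illustration.

```python
import math


def centroid(points):
    """Return the coordinate-wise mean of a list of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def similarity_matrix(clusters, scale=1.0):
    """Build a K x K similarity matrix for K initial clusters.

    Stand-in similarity (an assumption, NOT the ASTRICS measure):
    a Gaussian kernel on the distance between cluster centroids.
    """
    centres = [centroid(c) for c in clusters]
    k = len(centres)
    sim = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            d = math.dist(centres[i], centres[j])
            sim[i][j] = math.exp(-((d / scale) ** 2))
    return sim


def merge_clusters(sim, threshold=0.5):
    """Treat the matrix as a weighted graph, keep edges with weight >= threshold,
    and label each initial cluster by its connected component."""
    k = len(sim)
    labels = [-1] * k
    current = 0
    for start in range(k):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(k):
                if labels[j] == -1 and sim[i][j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Toy over-clustering: four initial clusters drawn from two well-separated groups.
initial_clusters = [
    [(0.0, 0.0), (0.2, 0.1)],
    [(0.5, 0.4), (0.6, 0.6)],
    [(10.0, 10.0), (10.2, 9.9)],
    [(10.5, 10.4), (10.4, 10.6)],
]

sim = similarity_matrix(initial_clusters, scale=1.0)
final_labels = merge_clusters(sim, threshold=0.5)
print(final_labels)  # [0, 0, 1, 1]
```

In this toy example the four initial clusters collapse into two final groups. The same matrix could instead be passed to any similarity- or graph-based clustering method, or laid out with a force-directed algorithm for visualization.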
Additional Information
Joshua Scurll, UBC