Population structure prediction using support vector machines
Usman Roshan
Computer Science, NJIT
Clustering individuals into groups of similar geographical ancestry is a key problem in population and medical genetics. Two techniques have been shown to accurately separate admixed populations: the model-based Bayesian clustering approach implemented in STRUCTURE, and k-means clustering on the principal component projection of the genotype data. In this talk I will present a support vector machine clustering algorithm for this problem. Support vector machines are discriminative classification methods with strong theoretical guarantees and strong empirical performance. Through empirical studies on real datasets, I will illustrate the algorithm's performance and its improvement over these standard approaches.