Summer 2012 REU Project

      Unsupervised Learning by Data Clustering



    Student Participants
    • Mindy Hong, Emory University
    • Robert Pearce, NC State University
    • Kevin Valakuzhy, University of North Carolina at Chapel Hill

    Advisors
    • Carl D. Meyer (Primary Faculty Advisor, NC State)
    • Shaina Race (Graduate Student Advisor, NC State)

    Project Description
    • Data mining is one of the fastest growing disciplines in mathematics and computer science today. Advances in data collection and storage have allowed companies and scientific researchers to create huge stores of data in the hope that data miners will be able to extract valuable information from them. The vast majority of data mining models are examples of supervised learning: a model is created using training and test data for which the variable to be predicted is known, and the goal is to minimize the prediction error. We will focus on unsupervised data mining techniques, which aim to detect patterns and structure in unlabeled data, where no error or accuracy value can be assigned to the final result. Emphasis will be placed on clustering algorithms. Many existing clustering algorithms are inadequate in two ways: they require knowledge of the number k of clusters that exist in the data, and their underlying assumptions make them ineffective in certain situations. The work revolves around the method of consensus clustering, which seeks to rectify the latter problem by combining the results of multiple clustering algorithms into one final grouping. The goal is to investigate a novel method of iterative consensus clustering (ICC) that both determines the best value of k and improves cluster determination.
    • The project begins by learning and understanding some state-of-the-art clustering techniques.
    • The main part of the research will involve exploring and experimenting with some new methodologies and algorithms.
    • The mathematics employed involves linear algebra, probability and statistics, networks and graphs, numerical analysis, and scientific computing principles. Computer programming is required.
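
    The consensus idea in the project description can be illustrated with a minimal sketch. It is not the ICC method investigated in the project; it only assumes that several k-means runs (with different values of k and different random starts) stand in for "multiple clustering algorithms", and it records how often each pair of points lands in the same cluster. The function names kmeans and consensus_matrix, the toy data, and all parameters are hypothetical and chosen for illustration.

```python
import random


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means on 2-D tuples; returns one cluster label per point."""
    centers = rng.sample(points, k)  # random initial centers drawn from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(
                range(k),
                key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each center to the mean of its members
        # (an empty cluster keeps its previous center).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels


def consensus_matrix(points, ks, runs_per_k, seed=0):
    """Fraction of clusterings in which points i and j share a cluster."""
    rng = random.Random(seed)
    n = len(points)
    counts = [[0] * n for _ in range(n)]
    total = 0
    for k in ks:
        for _ in range(runs_per_k):
            labels = kmeans(points, k, rng)
            total += 1
            for i in range(n):
                for j in range(n):
                    if labels[i] == labels[j]:
                        counts[i][j] += 1
    return [[c / total for c in row] for row in counts]


# Toy data (assumed for illustration): two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
M = consensus_matrix(points, ks=[2, 3], runs_per_k=5)
for row in M:
    print(" ".join(f"{v:.2f}" for v in row))
```

    Entries of M near 1 mark pairs that almost every run groups together, and entries near 0 mark pairs that are almost always separated. A final grouping could then be obtained by clustering the rows of M or by treating M as a weighted graph; which step is used is a design choice that this sketch leaves open.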

    Presentations
    • Poster presentation, Tenth Annual North Carolina State University Undergraduate Summer Research Symposium, Talley Center, NC State University, August 1, 2012.
    • Download The Poster (pdf).

    Papers

    Photos From The Poster Presentation