Computer Science 8751
Machine Learning
Ideas for Projects
As part of your class grade, you must complete a class project that involves
some significant application of machine learning.
I envision three types of possible projects:
- Implementing an ML algorithm that is not already part of WEKA. In this type of project you would select an algorithm that WEKA does not provide, implement it within WEKA, and then test it on a set of datasets to verify your work. We would need to discuss
- Significant tests on a large dataset related to your thesis. This would involve significant ML work on a dataset created for your thesis. We would need to discuss task representations to explore, algorithms to try, and testing.
- (Group project) Participate in a large-scale empirical testing project employing WEKA. In this project, the students as a group would perform a large-scale empirical analysis of the classification (and, if possible, regression) algorithms in WEKA. Key aspects:
- Dataset selection / feature design - students involved in this aspect would work through all of the UCI datasets, identifying which are appropriate for WEKA and which are not. The students would then set up appropriate feature representations (each dataset may have several). The datasets would be converted to WEKA format, and the feature representations documented and annotated.
- N-fold cross validation control - one student would investigate how WEKA randomly generates folds, to determine whether we can control the folds from inside WEKA or only from outside it. If control must come from outside, the student would generate a fold structure that allows the same set of folds to be reused.
- Experimental script generation - students would be responsible for generating a set of scripts that perform the various tests called for in this work. This would include scripts that can be attached to the web site so that users at remote sites can repeat the tests (downloading the appropriate test data and information).
- Bias-variance estimation - several students would be involved in writing test code for estimating bias/variance values using Bauer and Kohavi's method.
- Web site implementation - at least one student would be involved in creating a web site for distributing (and graphing) the information.
- Experiment performance - all students would be responsible for some of the testing to be performed.
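The fold-control aspect above can be illustrated outside WEKA. The following is a minimal sketch, assuming folds are assigned by a seeded shuffle and then stored as one fold index per instance; the names `make_folds` and `train_test_split` and the seed value are hypothetical, and the sketch does not stratify by class.

```python
import random


def make_folds(n_instances, n_folds, seed):
    """Return a list giving the fold index (0..n_folds-1) of each instance."""
    rng = random.Random(seed)  # private generator: the seed alone determines the folds
    indices = list(range(n_instances))
    rng.shuffle(indices)
    assignment = [0] * n_instances
    for position, instance in enumerate(indices):
        # Deal the shuffled instances round-robin into the folds.
        assignment[instance] = position % n_folds
    return assignment


def train_test_split(assignment, test_fold):
    """Split instance indices into (train, test) for one held-out fold."""
    train = [i for i, f in enumerate(assignment) if f != test_fold]
    test = [i for i, f in enumerate(assignment) if f == test_fold]
    return train, test


# Hypothetical usage: 20 instances, 10 folds, arbitrary seed.
folds = make_folds(n_instances=20, n_folds=10, seed=8751)
train, test = train_test_split(folds, test_fold=0)
```

Writing the `folds` list to a file alongside each dataset would let every experiment, including those run at remote sites, reuse exactly the same partition.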
Desired experiments:
- Five 2-fold cross validation experiments
- Twenty 10-fold cross validation experiments (with error bars)
- One hundred 10-fold cross validation experiments (with error bars)
- Bias/variance estimates using the Bauer/Kohavi method
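The repeated cross validation experiments could be driven by a loop like the one below. This is only a sketch under stated assumptions: the error bar is taken to be the standard error of the mean over the repeated runs (the list above does not say which definition is intended), the function names are hypothetical, and the majority-class "learner" and synthetic labels are toy stand-ins for a real WEKA classifier and dataset.

```python
import random
import statistics


def repeated_cv_errors(n_instances, n_folds, n_repeats, evaluate, base_seed=0):
    """Run n_repeats rounds of n_folds-fold CV; return one error rate per round."""
    run_errors = []
    for repeat in range(n_repeats):
        rng = random.Random(base_seed + repeat)  # distinct, reproducible shuffle per round
        indices = list(range(n_instances))
        rng.shuffle(indices)
        n_wrong = 0
        for fold in range(n_folds):
            test = indices[fold::n_folds]  # every n_folds-th shuffled instance
            held_out = set(test)
            train = [i for i in indices if i not in held_out]
            n_wrong += evaluate(train, test)  # count of misclassified test instances
        run_errors.append(n_wrong / n_instances)
    return run_errors


def mean_and_standard_error(values):
    """Mean of the per-run error rates and the standard error of that mean."""
    mean = statistics.mean(values)
    stderr = statistics.stdev(values) / len(values) ** 0.5
    return mean, stderr


# Synthetic labels, for illustration only: 60 of class 0 and 40 of class 1.
labels = [0] * 60 + [1] * 40


def majority_class_errors(train, test):
    """Toy learner: predict the majority training label, return the test error count."""
    ones = sum(labels[i] for i in train)
    prediction = 1 if ones * 2 > len(train) else 0
    return sum(1 for i in test if labels[i] != prediction)


errors = repeated_cv_errors(len(labels), n_folds=10, n_repeats=20,
                            evaluate=majority_class_errors)
mean_error, stderr = mean_and_standard_error(errors)
print(f"error = {mean_error:.3f} +/- {stderr:.3f} over {len(errors)} runs")
```

The toy learner always predicts class 0 here, so every run gives the same error rate and the error bar collapses to zero; a real learner would vary from run to run. Because repeated runs reuse the same data, the standard error computed this way should be read as a rough indication rather than an exact confidence interval.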