The final for 8751 will be comprehensive and worth 300 points. The first two pages contain 5 definitions, each worth 12 points; these are followed by 8 pages, each with a single question, to give you plenty of room to write. Exam questions will be drawn from material related to your presentations, material presented after the second midterm, and some questions from the material covered in midterms 1 and 2. Below I give one question for each of the nine presentations made in class; three of these questions will be repeated exactly on the final.
There will be questions covering material from the first two midterms (sample questions can be found at these links):
Additional sample questions for the remaining material covered:

Sample questions from class lecture:

1. Briefly define the following terms:
   - Linear programming
   - Slack variable
   - Margin (of an SVM decision surface)
   - Support vector
   - Domain theory
   - Bagging
   - Boosting
   - Stacking
   - Market basket
   - Itemset
   - The Apriori properties

2. Explain the fundamental difference between the Bagging and AdaBoost ensemble learning methods. How does this difference relate to the goal of generating a good ensemble? What are the advantages and disadvantages of each method? (A sketch contrasting the two methods appears after the question lists.)

3. How is a problem phrased as a linear program in a support vector machine? Give an example. What are slack variables used for, and how are they represented in the linear program? (See the formulation after the lists.)

4. Explain the concept of a kernel function in Support Vector Machines. Why are kernels so useful? What properties should a kernel have to be used in an SVM? (See the kernel sketch after the lists.)

5. How does the Apriori algorithm learn an association rule (give the algorithm)? Give two examples of ways to speed up this algorithm. Show an example of how the algorithm works. (See the Apriori sketch after the lists.)

Questions from student presentations (the questions regarding student presentations will be limited to this set):

1. In the paper on Netflix Prize prediction, Singular Value Decomposition was used to perform a feature transformation. What does SVD do, and why is it so useful in cases such as these? (See the SVD sketch after the lists.)

2. Give the DIET algorithm for performing feature weighting in nearest-neighbor algorithms. What are the advantages and disadvantages of this algorithm?

3. The Osmot system is a search engine that allows researchers to gather data about users' online behavior. In the paper presented in class, the researchers attempted to infer behavior and feedback from the searches users performed. Explain how they proposed to do this and indicate any potential problems you see with this approach.

4. SVMTool proposes to learn a Part-Of-Speech tagger from a dictionary and samples of statements in that language. Give five examples of the types of features SVMTool considers and samples of such features.

5. Explain how wrapper and filter algorithms work for variable elimination. How do Stracuzzi and Utgoff propose to use random samples of sets of variables to set key parameters for selecting a good sample of variables?

6. How do Semi-Supervised Support Vector Machines propose to make use of unlabeled data? How might this lead to better generalization?

7. What is a "hard" learning problem for skewing? How does skewing make it possible to effectively learn a decision tree for such a problem?

8. Why does Caruana use multiple metrics in evaluating supervised learning algorithms? Define four of the metrics Caruana used in his work and explain why these metrics might be of interest.

9. What is meant by saliency in Optimal Brain Damage? How is it defined? What are the potential problems with this approach? (See the saliency formula after the lists.)
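Sketches for selected questions above. These are study aids only; the data, names, and constants in them are illustrative assumptions, not course code.

For lecture question 2, a minimal sketch of the core difference, assuming scikit-learn decision stumps as the base learner: Bagging varies the training data by bootstrap resampling and lets the members vote equally, while AdaBoost trains members in sequence on reweighted data that concentrates on past mistakes and weights each member's vote by its accuracy.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))                  # toy data, made up
    y = (X[:, 0] + X[:, 1] > 0).astype(int)

    # Bagging: each member sees an independent bootstrap resample,
    # so diversity comes from varying the data; members vote equally.
    bag = []
    for _ in range(10):
        idx = rng.integers(0, len(X), len(X))      # sample with replacement
        bag.append(DecisionTreeClassifier(max_depth=1).fit(X[idx], y[idx]))

    # AdaBoost: diversity comes from reweighting examples toward past
    # mistakes; members vote with learned weights alpha_t.
    w = np.ones(len(X)) / len(X)
    boost = []
    for _ in range(10):
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        miss = h.predict(X) != y
        err = w[miss].sum()
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))
        w *= np.exp(alpha * np.where(miss, 1.0, -1.0))   # up-weight mistakes
        w /= w.sum()
        boost.append((alpha, h))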
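For lecture question 3, one common 1-norm soft-margin formulation that yields a linear program (an illustration; the lecture may have used a different variant). The s_i are the slack variables, measuring how far each example falls on the wrong side of its margin; the absolute values |w_j| are made linear by splitting w into nonnegative parts w = w^+ - w^-.

    \min_{w,\,b,\,s}\ \sum_{j=1}^{d} |w_j| \;+\; C \sum_{i=1}^{n} s_i
    \quad\text{subject to}\quad
    y_i\,(w \cdot x_i + b) \ \ge\ 1 - s_i, \qquad s_i \ge 0, \quad i = 1, \dots, n.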
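For lecture question 4, a minimal numpy sketch of a kernel (the Gaussian/RBF kernel, chosen here as an illustration) and the key property a kernel needs: for any data set, the Gram matrix it induces must be symmetric and positive semi-definite (Mercer's condition), so that it corresponds to an inner product in some feature space.

    import numpy as np

    def rbf_kernel(x, z, gamma=1.0):
        """Gaussian (RBF) kernel: K(x, z) = exp(-gamma * ||x - z||^2)."""
        diff = x - z
        return np.exp(-gamma * np.dot(diff, diff))

    # Build the Gram (kernel) matrix for a small made-up data set.
    X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    K = np.array([[rbf_kernel(a, b) for b in X] for a in X])

    # Check symmetry and positive semi-definiteness via eigenvalues.
    print(np.allclose(K, K.T), np.linalg.eigvalsh(K).min() >= -1e-10)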
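For lecture question 5, a minimal Python sketch of the level-wise Apriori search (the toy baskets are invented for illustration). Association rules are then read off the frequent itemsets in a separate pass; the pruning step uses the Apriori property that every subset of a frequent itemset must itself be frequent.

    from itertools import combinations

    def apriori(transactions, min_support):
        """Return all itemsets whose support count meets min_support."""
        transactions = [frozenset(t) for t in transactions]
        # Level 1 candidates: all single items seen in the data.
        items = {i for t in transactions for i in t}
        current = {frozenset([i]) for i in items}
        frequent = {}
        k = 1
        while current:
            # Count support of each candidate in one pass over the data.
            counts = {c: sum(1 for t in transactions if c <= t) for c in current}
            survivors = {c: n for c, n in counts.items() if n >= min_support}
            frequent.update(survivors)
            # Join frequent k-itemsets into (k+1)-candidates, then prune
            # any candidate with an infrequent k-subset (Apriori property).
            current = set()
            for a, b in combinations(list(survivors), 2):
                cand = a | b
                if len(cand) == k + 1 and all(
                        frozenset(s) in survivors for s in combinations(cand, k)):
                    current.add(cand)
            k += 1
        return frequent

    baskets = [{"milk", "bread"}, {"milk", "diapers", "beer"},
               {"milk", "bread", "diapers"}, {"bread", "diapers"}]
    print(apriori(baskets, min_support=2))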
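For presentation question 1, a minimal numpy sketch of SVD as a feature transformation (the rating matrix is invented for illustration): truncating to the top k singular values gives the best rank-k approximation of the matrix, and the retained factors act as learned "taste" features for users and movies, which is why SVD is attractive for sparse, high-dimensional problems like rating prediction.

    import numpy as np

    # Toy user-by-movie rating matrix (rows: users, cols: movies).
    R = np.array([[5., 4., 1., 1.],
                  [4., 5., 1., 2.],
                  [1., 1., 5., 4.],
                  [2., 1., 4., 5.]])

    # SVD factors R into U * diag(s) * Vt, singular values in
    # decreasing order.
    U, s, Vt = np.linalg.svd(R, full_matrices=False)

    # Keep only the top-k singular values: the best rank-k
    # approximation; columns of U (rows of Vt) are the new features.
    k = 2
    R_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    print(np.round(R_k, 1))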
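For presentation question 9, the saliency used in Optimal Brain Damage, under its diagonal-Hessian approximation, is

    s_k = \tfrac{1}{2}\, h_{kk}\, u_k^2, \qquad h_{kk} = \frac{\partial^2 E}{\partial u_k^2},

where u_k is a network weight and E the training error; weights with the smallest saliency are pruned first. The potential problems to discuss stem from the approximations made in this estimate (e.g., ignoring off-diagonal Hessian terms).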