The final for 8751 will be comprehensive and worth 300 points. It will be 6 or 7 pages long, in a format similar to the midterms: the first page will have 5 definitions, and each of the remaining pages will have two questions.
There will be between 170 and 220 points of questions covering material from the first two midterms (sample questions can be found at these links):
Additional sample questions for the remaining material covered:

Sample questions from class lecture:

1. Briefly define the following terms:
   Market Basket
   Itemset
   The Apriori Properties
2. How does the Apriori algorithm learn an association rule (give the algorithm)? Give two examples of ways to speed up this algorithm. Show an example of how the algorithm works.

Questions from student presentations (the questions regarding student presentations will be limited to this set):

3. What is a loss function? Give examples of three loss functions to use in hierarchical classification. What are the strengths and weaknesses of these functions?
4. How do cancer prognosis and prediction learning differ from cancer diagnosis and detection learning? What aspects of the former problem(s) make these tasks harder than the latter tasks?
5. The NEAT system uses genetic algorithms to evolve a network. Explain how this works (especially how mutation might work). How does the Whiteson and Stone method alter the original NEAT system?
6. One approach to filtering spam involves compression models. Explain how this method works. How does the resulting system determine if a new message is spam or not? How does this method adapt over time?
7. Tao et al. (2007) propose a method for generating new GO terms for annotating genes based on a KNN approach. How does their method work (especially, how is similarity calculated in their method)?
8. What is a module network? Explain how a module network can be used to organize a network of variables. Give an example of how a module network might be used to model gene expression variables.
9. What is the schema matching approach? How does it apply to the general problem of question answering? Outline the Doan et al. approach to solving this problem.
10. How do Fumera et al. propose to recognize spam attached as images to email? What are the difficulties with this approach?
11. Lane and Brodley propose a method for detecting anomalous user behavior on a computer. What types of anomalies were they looking for? What behavior were they examining? What type of learning method was used in the detection process?
12. Explain how a ratio template works. How might such a template be used to recognize objects (like pedestrians)? How could such a template be used to capture motion information? How could that information be used to recognize objects?
13. How does the term version space apply in active learning as presented by Tong and Koller? How do they propose to choose queries? What are some of the difficulties of their approach?
14. What is the difference between primitive and non-primitive skills in ICARUS? Give examples of each. Explain what teleoreactive means and how it relates to the programs in ICARUS.
15. What is the difference between web content mining and web usage mining? Which of these notions is more closely related to machine learning (and why)?