- For the following dataset (in C4.5 format):
data.names:
no, yes. | classes
a: true, false.
b: true, false.
c: continuous.
d: true, false.
e: continuous.
data.data:
false,true,15,false,20,yes
true,false,1,true,5,no
false,false,10,true,10,no
false,false,8,false,15,yes
true,true,13,true,16,yes
false,true,9,false,8,no
false,false,1,true,5,no
true,false,12,false,13,no
true,true,15,false,6,yes
true,true,15,true,10,yes
false,true,13,true,7,no
false,true,3,false,5,yes
- What decision tree would be learned using ID3? [15 points]
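(Note: as a reference point, here is a minimal Python sketch of the information-gain computation at the heart of ID3. The DATA list simply re-encodes data.data above; handling the continuous features c and e with C4.5-style binary threshold splits is one common choice, not necessarily the exact adaptation used in class.)

    import math
    from collections import Counter

    # rows are (a, b, c, d, e, class) taken from data.data above
    DATA = [
        ("false","true",15,"false",20,"yes"), ("true","false",1,"true",5,"no"),
        ("false","false",10,"true",10,"no"), ("false","false",8,"false",15,"yes"),
        ("true","true",13,"true",16,"yes"),  ("false","true",9,"false",8,"no"),
        ("false","false",1,"true",5,"no"),   ("true","false",12,"false",13,"no"),
        ("true","true",15,"false",6,"yes"),  ("true","true",15,"true",10,"yes"),
        ("false","true",13,"true",7,"no"),   ("false","true",3,"false",5,"yes"),
    ]

    def entropy(rows):
        counts = Counter(r[-1] for r in rows)
        n = len(rows)
        return -sum((c / n) * math.log2(c / n) for c in counts.values())

    def gain_discrete(rows, i):
        # information gain of splitting on discrete feature i
        n = len(rows)
        rem = sum(len(sub) / n * entropy(sub)
                  for v in {r[i] for r in rows}
                  for sub in [[r for r in rows if r[i] == v]])
        return entropy(rows) - rem

    def gain_continuous(rows, i, t):
        # C4.5-style binary split: feature i <= t versus feature i > t
        left = [r for r in rows if r[i] <= t]
        right = [r for r in rows if r[i] > t]
        n = len(rows)
        return entropy(rows) - (len(left) / n * entropy(left)
                                + len(right) / n * entropy(right))

    for i, name in [(0, "a"), (1, "b"), (3, "d")]:
        print(name, round(gain_discrete(DATA, i), 3))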
- What rules would be learned using the sequential covering algorithm
assuming we start from the most general rules and we use the
adaptation discussed in the homework to allow the rules to incorporate
continuous features? [15 points]
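(Note: a sketch of the generic sequential-covering outer loop. Here learn_one_rule and the rule's covers method are placeholders for the inner search, which in this question starts from the most general rule and specializes using whatever continuous-feature adaptation the homework introduced.)

    def sequential_covering(examples, learn_one_rule, target="yes"):
        # Greedily learn one rule at a time, remove the examples it
        # covers, and repeat until no positives remain uncovered.
        rules, remaining = [], list(examples)
        while any(r[-1] == target for r in remaining):
            rule = learn_one_rule(remaining, target)
            if rule is None:        # inner search found no acceptable rule
                break
            rules.append(rule)
            remaining = [r for r in remaining if not rule.covers(r)]
        return rules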
- For each of the following data points, what class would be predicted
using the 3-Nearest-Neighbor algorithm with a Manhattan distance
measure, where the two continuous features are scaled to values
between 0 and 1? [15 points]
- true,true,10,true,15 class?
- false,true,3,true,9 class?
- false,false,15,false,3 class?
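(Note: a sketch of the distance computation this question assumes. Booleans are encoded 0/1, and the continuous features are min-max scaled using the training ranges c in [1,15] and e in [5,20]; applying the training min/max to queries, even when a query value then falls outside [0,1], is one common convention.)

    # training rows (a, b, c, d, e, class) with booleans encoded 0/1
    TRAIN = [
        (0,1,15,0,20,"yes"), (1,0,1,1,5,"no"),   (0,0,10,1,10,"no"),
        (0,0,8,0,15,"yes"),  (1,1,13,1,16,"yes"), (0,1,9,0,8,"no"),
        (0,0,1,1,5,"no"),    (1,0,12,0,13,"no"),  (1,1,15,0,6,"yes"),
        (1,1,15,1,10,"yes"), (0,1,13,1,7,"no"),   (0,1,3,0,5,"yes"),
    ]

    def to_vec(a, b, c, d, e):
        # min-max scale c (range 1..15) and e (range 5..20) to [0, 1]
        return (a, b, (c - 1) / 14.0, d, (e - 5) / 15.0)

    def manhattan(u, v):
        return sum(abs(x - y) for x, y in zip(u, v))

    def knn(query, k=3):
        dists = sorted((manhattan(to_vec(*r[:5]), to_vec(*query)), r[5])
                       for r in TRAIN)
        votes = [cls for _, cls in dists[:k]]
        return max(set(votes), key=votes.count)

    for q in [(1,1,10,1,15), (0,1,3,1,9), (0,0,15,0,3)]:
        print(q, "->", knn(q))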
- What predictions would be made using the Naive Bayes learning method
for the data points shown in the previous question? [15 points]
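(Note: a sketch of one way to apply Naive Bayes here, reusing the TRAIN encoding from the k-NN sketch above. Gaussian class-conditional densities for the continuous features and Laplace smoothing for the discrete ones are common choices, not necessarily the exact variant covered in class.)

    import math

    def gaussian(x, mu, var):
        var = max(var, 1e-6)              # guard against zero variance
        return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

    def nb_predict(query, cont=(2, 4)):
        best, best_score = None, float("-inf")
        for cls in {r[-1] for r in TRAIN}:
            rows = [r for r in TRAIN if r[-1] == cls]
            score = math.log(len(rows) / len(TRAIN))    # log prior
            for i, x in enumerate(query):
                vals = [r[i] for r in rows]
                if i in cont:             # Gaussian density for c and e
                    mu = sum(vals) / len(vals)
                    var = sum((v - mu) ** 2 for v in vals) / len(vals)
                    score += math.log(gaussian(x, mu, var))
                else:                     # Laplace-smoothed frequency
                    score += math.log((vals.count(x) + 1) / (len(vals) + 2))
            if score > best_score:
                best, best_score = cls, score
        return best

    for q in [(1,1,10,1,15), (0,1,3,1,9), (0,0,15,0,3)]:
        print(q, "->", nb_predict(q))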
- Using the agglomerative single-link clustering method, determine the
clusters that would be produced from the data points in data.data,
assuming we ignore the class (the no or yes value), that distance is
measured as in the nearest-neighbor question above, and that we have the
following threshold values (two points are considered to be connected if
their distance is *less* than these thresholds):
- Threshold=1.2 [5 points]
- Threshold=2.2 [5 points]
- Threshold=3.2 [5 points]
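(Note: a sketch of threshold-based single-link clustering, reusing to_vec and manhattan from the k-NN sketch. At a fixed threshold, the single-link clusters are exactly the connected components of the "distance < threshold" graph, found here with union-find.)

    from collections import defaultdict

    def single_link(points, threshold):
        # union-find over the "distance < threshold" graph
        parent = list(range(len(points)))
        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i
        for i in range(len(points)):
            for j in range(i + 1, len(points)):
                if manhattan(points[i], points[j]) < threshold:
                    parent[find(i)] = find(j)
        groups = defaultdict(list)
        for i in range(len(points)):
            groups[find(i)].append(i)
        return sorted(groups.values())

    pts = [to_vec(*r[:5]) for r in TRAIN]    # class values ignored
    for t in (1.2, 2.2, 3.2):
        print(t, single_link(pts, t))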
- Describe the inductive bias for each of the following methods:
- Version Spaces [5 points]
- Decision Trees produced using ID3 [5 points]
- Neural Networks using Backpropagation [5 points]
- One mechanism proposed for learning not only the weights of a neural
network but also its structure (e.g., number of hidden units, connection
patterns, etc.) involves genetic programming. A set of random neural
networks is generated and then trained using backpropagation on a set of
training data. The "fittest" of the networks then reproduce (as if the
networks were genetic programs), and the resulting new networks are
trained again. Fill in the details of this algorithm. Explain how
"fittest" might be determined. Also indicate how networks can reproduce
to generate new networks that may have different network structures. Can
you see any possible problems with this algorithm? [20 points]
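(Note: a sketch of the outer loop such an algorithm might use. Everything passed in is a hypothetical placeholder: make_random_net, the net's train_backprop method, fitness, crossover, and mutate; fitness would typically be validation-set accuracy, possibly penalized by network size.)

    import random

    def evolve(make_random_net, train_data, fitness, crossover, mutate,
               pop_size=20, generations=25):
        population = [make_random_net() for _ in range(pop_size)]
        for _ in range(generations):
            for net in population:
                net.train_backprop(train_data)   # weight learning
            population.sort(key=fitness, reverse=True)
            survivors = population[: pop_size // 2]
            children = [mutate(crossover(random.choice(survivors),
                                         random.choice(survivors)))
                        for _ in range(pop_size - len(survivors))]
            population = survivors + children    # structural search
        return max(population, key=fitness)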
- How does the Minimum Description Length principle relate to the notion
of Occam's razor? How could one incorporate MDL into the process of
selecting a particular hypothesis (e.g., choosing between a decision
tree, a neural network, etc.)? [15 points]
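(As a reminder of the usual formulation, MDL selects the hypothesis that minimizes the total code length

    h_{MDL} = \arg\min_{h \in H} \big[ L(h) + L(D \mid h) \big]

where L(h) is the number of bits needed to encode the hypothesis itself and L(D | h) the bits needed to encode the training data given the hypothesis; the choice of encodings for trees, networks, etc. is where the design freedom lies.)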
- Assume a problem has 9 states numbered 0 to 8. Further assume that in
each state one can take one of two actions, L or R. In states 1 to 7,
taking the action L in state I puts you in state I-1 (taking action L in
state 5 puts you in state 4). In state 0 taking action L leaves you in
state 0. In state 8, taking action L leaves you in state 8. For each
of the states 0 to 7, taking action R in state I puts you in state
(I + 3) mod 9 (e.g., taking R in state 2 puts you in state 5, taking R
in state 7 puts you in state 1, etc.). Taking action R in state 8 leaves
you in state 8. Any action that takes you from another state into
state 8 has a reward of 100. All other actions have a
reward of 0. Assume a discount factor of 0.8.
Draw the state diagram for this problem.
Show the V*(s) values for these states.
Show the optimal Q(s,a) values for this problem.
Show an optimal policy for this problem. [20 points]
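(Note: a sketch that checks the dynamics above via value iteration; the fixed iteration count is just an arbitrary convergence budget.)

    GAMMA = 0.8

    def step(s, a):
        # deterministic transitions from the problem statement
        if s == 8:
            return 8                     # state 8 is absorbing
        if a == "L":
            return max(s - 1, 0)         # L in state 0 stays in 0
        return (s + 3) % 9               # action R in states 0..7

    def reward(s, a):
        return 100 if s != 8 and step(s, a) == 8 else 0

    V = [0.0] * 9
    for _ in range(100):                 # value iteration
        V = [max(reward(s, a) + GAMMA * V[step(s, a)] for a in "LR")
             for s in range(9)]

    Q = {(s, a): reward(s, a) + GAMMA * V[step(s, a)]
         for s in range(9) for a in "LR"}
    policy = {s: max("LR", key=lambda a: Q[(s, a)]) for s in range(9)}
    print([round(v, 1) for v in V], policy)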
- For the following EBL domain theory:
A(?x,?y), B(?y,?x,?z) -> C(?x,?y,?z)
D(?x,?y), E(?x,?y) -> B(?x,1,?y)
F(?x,?x), F(?y,?y) -> D(?x,?y)
Assume the following facts are asserted:
A(1,2) A(3,1) E(2,3) F(3,2) F(2,2)
A(2,2) A(3,2) E(1,2) F(3,3) F(2,3)
Explain with a proof tree that C(1,2,3) is true. [3 points]
Assuming the predicates A, E, and F are operational, what rule would
EGGS learn? [12 points]
Assuming that predicate B is also operational, what rule would be learned?
[5 points]
- Explain the fundamental difference between the Bagging and Ada-Boosting
ensemble learning methods. How do these notions relate to the concept of
generating a good ensemble? What are the advantages and disadvantages of
each method? [15 points]
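(Note: a sketch of the bagging side of the contrast, where learn is any base learner and the models vote uniformly; boosting differs in that examples are reweighted between rounds and the models are weighted by their accuracy.)

    import random

    def bagging(train, learn, n_models=10):
        # each model is trained on a bootstrap resample of the data
        # (the same size as the training set, drawn with replacement)
        return [learn([random.choice(train) for _ in range(len(train))])
                for _ in range(n_models)]

    def vote(models, x):
        preds = [m(x) for m in models]
        return max(set(preds), key=preds.count)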
- What neural network would be generated by KBANN from the following rules
assuming the output predicate is J and the input predicates are A, B, C
and D? For each unit generated, also connect it with a small-weight link
to any input unit it is not already directly connected to.
[20 points]
A, B -> E
A, C, D -> E
E, A -> F
not B, D -> F
not E, not F -> G
A, F -> G
E, not F -> H
E, G, H -> J
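(Note: a sketch of the standard KBANN rule-to-unit translation: each conjunctive rule becomes a sigmoid unit with weight +W on positive antecedents, -W on negated ones, and a bias that makes the unit fire only when the whole conjunction holds. Predicates with several rules, such as E, F, and G here, additionally get an OR unit over their rule units; W = 4 is a typical choice, and the magnitude of the small cross-connection weights is a design parameter.)

    W = 4.0   # "large" KBANN weight; the exact value is a design choice

    def rule_to_unit(pos, neg):
        # pos/neg are the positive and negated antecedent names, e.g.
        # "A, B -> E" is rule_to_unit(["A", "B"], []) and
        # "not B, D -> F" is rule_to_unit(["D"], ["B"])
        weights = {a: W for a in pos}
        weights.update({a: -W for a in neg})
        bias = -(len(pos) - 0.5) * W   # fires iff all antecedents hold
        return weights, bias

    print(rule_to_unit(["A", "B"], []))    # unit for A, B -> E
    print(rule_to_unit([], ["E", "F"]))    # unit for not E, not F -> G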
- Given a neural network with 3 input units (A, B, C), two hidden units
(D, E), one output unit (F) and one unit that always has an activation
value of 1 (ONE) and the following weight connections:
ONE->D: 0.5
A->D: -0.5
B->D: 0.0
C->D: 1.0
ONE->E: -0.5
A->E: 0.0
B->E: 0.0
C->E: 0.5
ONE->F: 0.0
D->F: -0.5
E->F: 0.5
What would be the weights after each of the following points is presented
(in the sequence shown), assuming a learning rate of 0.25 and a momentum
term of 0.9? Assume the hidden and output units use a sigmoidal activation
function and that the weights are changed using backpropagation. [15 points]
A B C F
Point 1: 1 0 1 1
Point 2: 0 1 1 0
Point 3: 1 1 1 1
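(Note: a sketch of the update being asked for: one stochastic backprop step per point with sigmoid units, eta = 0.25, and momentum alpha = 0.9, where the momentum term adds 0.9 times the previous change to each weight; treating the ONE links as ordinary weights is how the bias updates fall out.)

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    # w[dst][src]; ONE is the always-on bias unit from the question
    w = {"D": {"ONE": 0.5, "A": -0.5, "B": 0.0, "C": 1.0},
         "E": {"ONE": -0.5, "A": 0.0, "B": 0.0, "C": 0.5},
         "F": {"ONE": 0.0, "D": -0.5, "E": 0.5}}
    prev = {d: {s: 0.0 for s in srcs} for d, srcs in w.items()}
    ETA, ALPHA = 0.25, 0.9               # learning rate, momentum

    def backprop_step(a, b, c, target):
        act = {"ONE": 1.0, "A": a, "B": b, "C": c}
        for u in ("D", "E", "F"):        # forward pass, hidden then output
            act[u] = sigmoid(sum(w[u][s] * act[s] for s in w[u]))
        # sigmoid deltas: output unit first, then the hidden units
        delta = {"F": act["F"] * (1 - act["F"]) * (target - act["F"])}
        for h in ("D", "E"):
            delta[h] = act[h] * (1 - act[h]) * w["F"][h] * delta["F"]
        for d in w:                      # weight update with momentum
            for s in w[d]:
                dw = ETA * delta[d] * act[s] + ALPHA * prev[d][s]
                w[d][s] += dw
                prev[d][s] = dw

    for a, b, c, t in [(1, 0, 1, 1), (0, 1, 1, 0), (1, 1, 1, 1)]:
        backprop_step(a, b, c, t)
        print(w)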
- Show the initial G and S sets and the G and S sets after each of the
data points shown below is presented to the Version Space (Candidate
Elimination) algorithm: [15 points]
1,2,1,3,1,+
2,1,2,2,2,-
1,2,1,1,2,-
1,1,2,3,1,-
1,2,2,3,2,+
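(Note: a sketch of the Candidate-Elimination updates for conjunctive hypotheses over these five attributes. Taking the attribute domains to be just the values that appear in the data is an assumption; "?" is the usual wildcard.)

    Q = "?"                              # wildcard value
    DOMAINS = [(1, 2), (1, 2), (1, 2), (1, 2, 3), (1, 2)]

    def covers(h, x):
        return all(hv == Q or hv == xv for hv, xv in zip(h, x))

    def generalize(s, x):
        # minimal generalization of S toward a positive example
        return tuple(sv if sv == xv else Q for sv, xv in zip(s, x))

    def specializations(g, x):
        # minimal specializations of g that exclude negative example x
        return [g[:i] + (v,) + g[i + 1:]
                for i, gv in enumerate(g) if gv == Q
                for v in DOMAINS[i] if v != x[i]]

    def run(examples):
        G = [tuple(Q for _ in DOMAINS)]
        S = None                         # covers nothing yet
        print("initial S =", S, "G =", G)
        for *x, label in examples:
            x = tuple(x)
            if label == "+":
                S = x if S is None else generalize(S, x)
                G = [g for g in G if covers(g, S)]
            else:
                G = ([h for g in G if covers(g, x)
                        for h in specializations(g, x)]
                     + [g for g in G if not covers(g, x)])
                if S is not None:        # keep only members covering S
                    G = [g for g in G if covers(g, S)]
                G = [g for g in dict.fromkeys(G)   # maximally general only
                     if not any(covers(h, g) and h != g for h in G)]
            print(label, "S =", S, "G =", G)

    run([(1,2,1,3,1,"+"), (2,1,2,2,2,"-"), (1,2,1,1,2,"-"),
         (1,1,2,3,1,"-"), (1,2,2,3,2,"+")])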
- What do you think is the most interesting algorithm we discussed in class
and why? [10 points]
- What do you think is the most important issue facing machine learning and
why? [10 points]