- Show the initial G and S sets and the G and S sets after each of the
data points shown below is presented to the Version Space (Candidate
Elimination) algorithm: [15 points]
1 2 1 3 1 +
2 1 2 2 2 -
1 2 1 1 2 -
1 1 2 3 1 -
1 2 2 3 2 +
- For a dataset with 5 features, A, B, C, D, and E where A, B, and D have
possible values of true and false, and C and E are continuously valued
and the following examples:
A B C D E Class
-----------------------------------------------------
false true 15 false 20 + positive
true false 1 true 5 - negative
false false 10 true 10 - negative
false false 8 false 15 + positive
true true 13 true 16 + positive
false true 9 false 8 - negative
false false 1 true 5 - negative
true false 12 false 13 - negative
true true 15 false 6 + positive
true true 15 true 10 + positive
false true 13 true 7 - negative
false true 3 false 5 + positive
NOTE:
- What decision tree would be learned using ID3? [20 points]
- For each of the following data points, what class would be predicted
using the 3-Nearest Neighbor algorithm using a Manhattan distance
measure where the two continuous features are scaled to be values
between 0 and 1? [20 points]
- true true 5 true 15
- false true 3 true 9
- false false 15 false 3
- What predictions would be made using the Naive Bayes learning method
for the data points shown in the previous question? [20 points]
- Using the agglomerative single link clustering method, determine the
clusters that would be produced from the data points above
assuming we ignore the class (the - or + value), that our distance is
measured as nearest neighbor question above, and where we have the following threshold
values (two points are considered to be connected if their distance is
*less* than these thresholds): (i) 0.9, (ii) 1.9, and (iii) 2.9. [20 points]
- Consider the use of ensembles in machine learning. [30 points]
- Explain how ensembles address the issue of overfitting avoidance.
- What is one strength of bagging compared to boosting? Justify your answer.
- Give a brief argument for the use of an ensemble consisting of one decision tree, one support-vector machine (SVM), and one neural network instead of using an ensemble of three models all produced by the same learning algorithm.
- What neural network would be generated by KBANN from the following rules
assuming the output predicate is J and the input predicates are A, B, C
and D? For each unit generated you should connect it to any input unit
that it is not already directly connected to with a small weight link.
[20 points]
A, C -> E
B, not C, D -> E
E, C -> F
not A, D -> F
E, F -> G
B, not E -> G
E, not F -> H
E, G, H -> J
- Given a neural network with 3 input units (A, B, C), two hidden units
(D, E), one output unit (F) and one unit that always has an activation
value of 1 (ONE) and the following weight connections:
ONE->D: 0.0
A->D: 0.5
B->D: 0.0
C->D: -1.0
ONE->E: 0.5
A->E: 0.0
B->E: 0.5
C->E: 0.5
ONE->F: 0.0
D->F: -0.5
E->F: 0.5
What would be the weights after each of the following points is presented
(in the sequence shown) assuming a learning rate of 0.25 and a momentum
term of 0.9. Assume the hidden and output units use a sigmoidal activation
function and that the weights are changed using backpropagation. [20 points]
A B C F
Point 1: 1 0 1 1
Point 2: 0 1 1 0
Point 3: 1 1 1 1
- A key concern in supervised learning is overfitting avoidance.
Define overfitting and explain its importance. [20 points]
Discuss one key technique (two in total) for addressing the problem
of overfitting in (i) decision trees and (ii) neural networks.
- For the maze world shown in the top of the three diagrams below with actions and rewards shown in the diagram calculate the corresponding
V*(s) and Q(s,a) values assuming a discount factor of 0.8.
Assume the agent stops moving when they reach the upper right hand
state. [20 points]
- Briefly define and explain the following terms and how they are used
in support vector machines: (i) margin, (ii) kernel, and (iii) slack
variables. [15 points]
- For the following points:
A B C D class
-------------------------------------
-1 1 -1 -1 -1
1 1 1 1 1
-1 1 1 1 1
-1 -1 1 1 -1
1 -1 -1 -1 -1
Assuming a linear kernel and the use of slack variables give the set
of constraint equations generated for these points. [20 points]
-
For the Bayes network and CPTs shown below calculate the following: [20 points]
- p(e=true|a=true,b=true,c=true)
- p(d=true|e=true,b=false)
- p(e=false|a=true,c=false)
-
Define the term association rule. Give the Apriori algorithm for learning
association rules. Show an example of how the algorithm works. Give two
examples of ways to speedup this algorithm. [20 points]
-
How would you represent the solution to a regression problem in genetic
algorithms? Give an example to demonstrate your solution. Discuss what
kind of fitness function you would use and how concepts would be selected
for reproduction. [20 points]