In this lab you will implement the Version Space algorithm (see the textbook and class notes for more details on this algorithm). You should do this using the DataSet class you implemented in program 1.
Your code should accept the prefix of a C4.5 data set (the part before the .names and .data extensions) and read that data set into your DataSet class. You should then run the version space algorithm on this data, showing the G and S sets before the algorithm starts and then the G and S sets after each data point is incorporated (assume that the data is presented in order from the first point in the file). Note that you may terminate early if the version space becomes empty.
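The update steps described above can be sketched in C++ as follows. This is only a minimal sketch, assuming conjunctive hypotheses over discrete attributes as in the textbook's formulation. The names `Hypothesis`, `covers`, `generalize`, `specialize`, and `moreGeneralOrEqual`, and the use of "0" as the "accept nothing" marker, are illustrative assumptions and not part of the assignment. The sketch does not use your DataSet class, and it leaves out the bookkeeping a full solution needs (dropping inconsistent members of G and S, and removing members of G that are less general than another member).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One constraint per attribute. "?" accepts any value;
// "0" (an assumed marker) accepts no value at all.
using Hypothesis = std::vector<std::string>;

// True if hypothesis h classifies example x as positive.
bool covers(const Hypothesis& h, const std::vector<std::string>& x) {
    assert(h.size() == x.size());
    for (std::size_t i = 0; i < h.size(); ++i) {
        if (h[i] != "?" && h[i] != x[i]) return false;
    }
    return true;
}

// Minimal generalization of s that covers the positive example x.
Hypothesis generalize(const Hypothesis& s, const std::vector<std::string>& x) {
    assert(s.size() == x.size());
    Hypothesis out = s;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == "0") out[i] = x[i];       // first positive example seen
        else if (s[i] != x[i]) out[i] = "?";  // values disagree: relax
    }
    return out;
}

// Minimal specializations of g that exclude the negative example x.
// domains[i] lists the legal values of attribute i.
std::vector<Hypothesis> specialize(
        const Hypothesis& g, const std::vector<std::string>& x,
        const std::vector<std::vector<std::string>>& domains) {
    assert(g.size() == x.size() && g.size() == domains.size());
    std::vector<Hypothesis> out;
    for (std::size_t i = 0; i < g.size(); ++i) {
        if (g[i] != "?") continue;  // only a "?" can be tightened
        for (const auto& v : domains[i]) {
            if (v == x[i]) continue;  // would still cover x
            Hypothesis h = g;
            h[i] = v;
            out.push_back(h);
        }
    }
    return out;
}

// True if g covers every instance that s covers.
bool moreGeneralOrEqual(const Hypothesis& g, const Hypothesis& s) {
    assert(g.size() == s.size());
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == "0") return true;  // s covers nothing
    }
    for (std::size_t i = 0; i < g.size(); ++i) {
        if (g[i] != "?" && g[i] != s[i]) return false;
    }
    return true;
}
```

In this sketch, a positive example replaces each member of S with `generalize(s, x)`, and a negative example replaces each member of G that covers it with those results of `specialize(g, x, domains)` that pass `moreGeneralOrEqual(h, s)` for some s in S.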
A sample run of your version space program should produce something like the following:
% vs_run table2.1

For positive example Example_0 Sky=Sunny AirTemp=Warm Humidity=Normal Wind=Strong Water=Warm Forecast=Same
G: { <?,?,?,?,?,?> }
S: { <Sunny,Warm,Normal,Strong,Warm,Same> }

For positive example Example_1 Sky=Sunny AirTemp=Warm Humidity=High Wind=Strong Water=Warm Forecast=Same
G: { <?,?,?,?,?,?> }
S: { <Sunny,Warm,?,Strong,Warm,Same> }

For negative example Example_2 Sky=Rainy AirTemp=Cold Humidity=High Wind=Strong Water=Warm Forecast=Change
G: { <Sunny,?,?,?,?,?> <?,Warm,?,?,?,?> <?,?,?,?,?,Same> }
S: { <Sunny,Warm,?,Strong,Warm,Same> }

For positive example Example_3 Sky=Sunny AirTemp=Warm Humidity=High Wind=Strong Water=Cool Forecast=Change
G: { <?,Warm,?,?,?,?> <Sunny,?,?,?,?,?> }
S: { <Sunny,Warm,?,Strong,?,?> }
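One possible way to produce the G and S lines in the format of the sample run above is sketched below. The helper names `formatHypothesis` and `formatBoundary` and the `Hypothesis` alias are hypothetical, chosen only for illustration. Your own code may print directly from whatever structure holds your hypotheses.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One constraint per attribute, with "?" meaning "any value".
using Hypothesis = std::vector<std::string>;

// Render one hypothesis as "<v1,v2,...,vn>".
std::string formatHypothesis(const Hypothesis& h) {
    std::string out = "<";
    for (std::size_t i = 0; i < h.size(); ++i) {
        if (i > 0) out += ",";
        out += h[i];
    }
    return out + ">";
}

// Render a boundary set as "label: { <...> <...> }".
std::string formatBoundary(const std::string& label,
                           const std::vector<Hypothesis>& boundary) {
    std::string out = label + ": {";
    for (const auto& h : boundary) out += " " + formatHypothesis(h);
    return out + " }";
}
```

Building the string first and printing it afterwards keeps the formatting easy to check separately from the algorithm itself.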
Print out a copy of all of your code files. You should hand in printouts demonstrating how your program works by running it on several data sets (including the one above). You should also run your program on the Table 2.1 data, the Table 3.2 data, and your own data set (after removing any continuous features). In all likelihood, your data set will cause the version space to become empty (hand in printouts demonstrating this).
You should also write up a short report (at least one page, no more than three) discussing your design decisions in implementing the Version Spaces code and how your version of the code works.
You must also submit your code electronically. To do this, go to https://webapps.d.umn.edu/service/webdrop/rmaclin/cs8751-1-s2003/upload.cgi and follow the directions for uploading a file (you can do this multiple times, though it would be helpful if you would tar your files and upload a single archive).
To make your code easier to check and grade please use the following procedure for collecting the code before uploading it:
rmaclin/prog02_cc
Note that the suffix of all C++ code files (not .h files) should be ".cc". Only code files (for example, in C++, only .cc and .h files) should be stored in this directory.
tar cf prog02.tar login/prog02_PLcode