In this part of the assignment you will build on your buffer manager and add two critical layers to your DBMS. In this assignment you will add a File Manager layer that creates B+ tree files. You will use this representation as the primary representation for each file (we will be using Alternative 1 representations for both). There will be no separate indexes. You will also be implementing a system catalog. This catalog will be implemented as a set of relations stored as relation files. You will have a relation describing your relations with at least the relation name, how many attributes the relation has and how many index attributes it has. You will also have a relation describing the relation attributes with the relation an attribute belongs to, the name of the attribute, the type of the attribute (int, float or string with a size), the number (in order for the tuple) of the attribute and the size if a string attribute. Finally, you will have a relation describing the index attributes with at least the relation each index attribute belongs to.
To test and assess your code you will be connecting your routines to a set of commands in a simple query language. To set this up you will need to download the the file fc.tar.Z. This is a tared archive file. To unpack this file you should do the following:
% uncompress fc.tar % tar xvf fc.tar
This will create a directory FileCat containing a set of provided code. You should add your implementation of buf.C and buf.h to this directory and then try compiling. The resulting code is called as follows:
querycmd [gory] [db] [bm] [hf] [qe] [dbfile]
The options gory, db, bm, hf and qe set debugging flags as in the BufMgr which you may take advantage of. If you include a filename at the end of the querycmd line the program will attempt to open a database (that we have previously created) with that name, and this file will then be updated and saved when your session ends (making it possible to use the file again in the future).
Note that if you start up a session without providing a db file name the code will create a file named __minidb_file. This is important because if your code unexpectedly terminates before completing the resulting file will likely be left in a state such that it can no longer be read in (and you will have to delete the file).
The query language is a very simple query language designed for this project. Note that there is currently no error checking (in large part because many errors can only be checked using the code you provide). The following commands are provided in the language (and should be completed in this project):
CREATE RELATION relID ( ID : type1 , ID : type2 , ... ) PRIMARYKEY ( PID1 , PID2 , ... ) ;
This requests the DB to create a relation named relID with attributes named ID1,ID2,... with types type1,type2,... where the type string can be INT, FLOAT, or STRING(num) where num is an integer indicating the size of the string. The names of the relation (and all other names in the query language must start with a letter and can be followed by an arbitrary number of letter, digit and underscore characters.
When creating a relation, you may also list a subset of the attributes that are to be used as a key.
Your code should check all appropriate errors (does the relation name already exist, is an attribute name inappropriate, is a string size inappropriate, does an attribute name occur more than once, does each primary key attribute correspond to an actual attribute, is any primary key attribute used twice, etc.) and then make appropriate entries into the system catalog and set up the initial header page for the file.
Delete the requested relation (get rid of the relation file and the entries in the system catalog).
Delete all of the tuples matching the condition cond from the relation named relname. In order to be able to refer to fields of the relation you will supply after the relation name the name of a variable that can be used in the condition to specify attributes.
The following are legal conditions:
For evaluation purposes, a condition corresponding to an integer is false if the value is 0 and true otherwise, for a float, false if the value is 0.0 and true otherwise and for a string, false if the string is the empty string "" and true otherwise. Note also that float literals can have an optional minus sign at the start and then follow that with one or more digit characters followed by a period followed by one or more digit characters.
Insert tuples into the relation relname from the file
Insert a tuple into the relation relname with values value1,value2,... Note that the values must appear in the order of the attributes (and their types should match).
Print out the three relations of the system catalog in a nice tabular format (as in SQL).
Print out the tuples of the relation relname in a nice tabular format.
Print out the contents of the system catalog relations in a format showing the details of the implementation of that relation by page (show each page of each relation, what data is contained on that page, etc.).
Same as DUMP CATALOG but for named relations.
The query language also contains hooks for PROJECT, SELECT and JOIN queries, we will be completing these in the next section.
In order to make it impossible for users to change the system catalog relations directly you should give these relations names that could not be generated by the user (the easiest way is to start the name with an underscore character). The system catalog relations should be stored as relation files and searched appropriately.
For the B+ tree files you should have a base page with simple information on it. This should include the page # of the root page, the page # of the first actual data page and the page # of the last actual data page (remember that the data pages are in some sense a sorted list of the data).
Each data page should devote the first X byes of data to header information. This would include the previous and next page numbers, the parent page (in the non-data pages) and a slot directory (you should assume tuples are of a fixed size). The slot directory should have a number indicating the number of slots followed by the actual slot directory with one bit for each record slot. You may also want to retain a count of the number of slots currently full (so you can decide when you need to split or merge as the case may be).
Each tree page should also have some header information. This would include a pointer (page #) of the parent page as well as a number indicating how many of the slots on the page are in use. Note that as with the data pages, you will need to calculate how many items can fit. Each item on a tree page requires enough bytes to hold the primary key values and enough bytes for a page pointer (a page number of a child). Note that you also need one extra child pointer.
One tactic you may employ is anticipatory splitting on insertion. Under this variation, if you are performing an insert, as you search the tree, you may split any node that is full into two half-full nodes as you move down the tree looking for the appropriate place to insert the new tuple. Under this approach all insertions should proceed strictly top down - you should not need to move back up the tree.
For deletion you may want to consider performing a similar operation (anticipatory merging), but you may also want to simply perform normal deletion.
When processing a command you should start by figuring out the relation or relations needed. To begin to work with these relations you should get the first (header) page of each relation (use the db command get_file_entry) to get the first page of the file corresponding to that relation. Then process the file as needed.
For example, to process an insert tuple command, you will first need to verify that the relation exists (either by looking it up in the db using get_file_entry or by looking it up in the relation relation). Then you need to go through the relation relation to find out how many attributes the relation has and the format of its file, then through the attribute relation to find the names of the attributes and then look through the index relation to find the names of the index attributes. Then you can check the validity of the insert fields.
You may want to create one (or two) scanning mechanisms to connect to each file (these will be useful in the next part of the DBMS). For a scan you would initialize the scan to return the first record from the file. Then everytime you tug on the scan you will be given the next record from the file.
Print out your versions of all the code you add. You should test your code by showing examples of the various query commands. Note that you should test your code extensively considering many possible combinations of commands. You should hand in copies of all of your tests. Next, write up a team report of how your code is implemented. This report should give an overview of how you completed the various sections of this project. It should also discuss the algorithms you used to solve the problem. This report should be at least two pages long but no longer than four pages. Each team member should also write up an individual report (at least half a page but no more than a page) discussing their contributions to the coding process and how the overall team interaction went.
You must also submit your code electronically (but only once for each team). To do this go to the link https://webapps.d.umn.edu/service/webdrop/rmaclin/cs4611-1-f2004/uploa d.cgi and follow the directions for uploading a file.
To make your code easier to check and grade please use the following procedure for collecting the code before uploading it:
rmaclin/prog04Note that the suffix of all C++ code files (not .h files) should be ".C". Only code files (only rules files, .c, .C, and .h files) should be stored in this directory.
tar cf prog04.tar login/prog04