UNIVERSITY OF MINNESOTA--DULUTH

COMPUTER SCIENCE DEPARTMENT

CS 4611: DATABASE MANAGEMENT SYSTEMS

Assignment 2: B+ Tree Assignment
Due: Tuesday, November 2, 1999
70 points
Instructor: Rich Maclin

Introduction

In this assignment, you will implement a B+ tree in which leaf level pages contain entries of the form <key, rid of a data record> (Alternative 2 for data entries, in terms of the textbook.) You must implement the full search and insert algorithms as discussed in class. In particular, your insert routine must be capable of dealing with overflows (at any level of the tree) by splitting pages; as per the algorithm discussed in class, you will not consider re-distribution. For this assignment, you can deal with deletes by simply marking the corresponding leaf entry as `deleted'; you do not have to implement merging.

Your page classes will be based on the structures HFPage and SortedPage. The structure files for these classes can be found in:

	/usr/local/minibase/minibase-2.0/include

SortedPage is derived from HFPage, and it augments the insertRecord method of HFPage by storing records on the HFPage in sorted order by a specified key value. The key value must be included as the initial part of each record, to enable easy comparison of the key value of a new record with the key values of existing records on a page. The documentation available in the header files is sufficient to understand what operation each function performs.

You need to implement two page-level classes, BTIndexPage and BTLeafPage, both of which are derived from SortedPage. These page classes are used to build the B+ tree index; you will write code to create, destroy, open and close a B+ tree index, and to open scans that return all data entries (from the leaf pages) which satisfy some range selection on the keys.

You will carry out this assignment in teams with the same partners as in the previous assignment.

Getting Started

You will need to copy files from the src directory for this assignment. To do this you need to follow the same steps as in the previous assignment:

  1. Make an appropriate directory to work in.
  2. Copy the file Makefile from the directory
    	/usr/local/minibase/mini_hwk/assign/BTree/src
           
    to your work directory.
  3. Execute the call:
           make setup
           
    which will copy the appropriate files.
  4. Implement the assignment, you can type make to attempt to compile your solution.

The files found in src include:

You can find other useful include files bt.h, hfpage.h, sorted_page.h, index.h,test_driver.h, btree_driver.h, minirel.h and new_error.h in /usr/local/minibase/minibase-2.0/include.

Design Overview

You should begin by (re-)reading the chapter Tree Structured Indexing of the textbook to get an overview of the B+ tree layer.

A Note on Keys for this Assignment

You should note that key values are passed to functions using void* pointers (pointing to the key values). The contents of a key should be interpreted using the AttrType variable. The key can be either a string(attrString) or an integer(attrInteger), as per the definition of AttrType in minirel.h. We just implement these two kinds of keys in this assignment. If the key is a string, it has a fixed maximum length, MAX_KEY_SIZE1, defined in bt.h.

Although the specifications for some methods (e.g., the constructor of BTreeFile) suggest that keys can be of (the more general enumerated) type AttrType, you can return an error message if the keys are not of type attrString or attrInteger.

The SortedPage class, which augments the insertRecord method of HFPage by storing records on a page in sorted order according to a specified key value, assumes that the key value is included as the initial part of each record, to enable easy comparison of the key value of a new record with the key values of existing records on a page.

B+ Tree Page-Level Classes


Figure 1. Page classes in this assignment.

These classes are summarized in Figure 1. Note again that you must not add any private data members to BTIndexPage or BTLeafPage.

For further details about the individual methods in these classes, look at the header pages for the class.

Other B+ Tree Classes


Figure 2. Layout of a BTree with one BTLeafPage.


Figure 3. Layout of a BTree with more than one BTLeafPage.

We will assume here that everyone understands the concept of B+ trees, and the basic algorithms, and concentrate on explaining the design of the C++ classes that you will implement.

A BTreeFile will contain a header page and a number of BTIndexPages and BTLeafPages. The header page is used to hold information about the tree as a whole, such as the page id of the root page, the type of the search key, the length of the key field(s) (which has a fixed maximum size in this assignment), etc. When a B+ tree index is opened, you should read the header page first, and keep it pinned until the file is closed. Given the name of the B+ tree index file, how can you locate the header page? The DB class has a method

Status add_file_entry(const char* fname, PageId header_page_num);
that lets you register this information when a file fname is created. There are methods for deleting and reading these `file entries' (<file name, header page> pairs) as well, which can be used when the file is destroyed or opened. The header page contains the page id of the root of the tree, and every other page in the tree is accessed through the root page.

Figure 2 shows what a BTreeFile with only one BTLeafPage looks like; the single leaf page is also the root. Note that there is no BTIndexPage in this case. Figure 3 shows a tree with a few BTLeafPages, and this can easily be extended to contain multiple levels of BTIndexPages as well.

IndexFile and IndexFileScan

A BTree is one particular type of index. There are other types, for example a Hash index. However, all index types have some basic functionality in common. We've taken this basic index functionality and created a virtual base class called IndexFile. You won't write any code for IndexFile. However, any class derived from an IndexFile should support ~IndexFile(), Delete(), and insert(). (IndexFile and IndexFileScan are defined in index.h).

Likewise, an IndexFileScan is a virtual base class that contains the basic functionality all index file scans should support.

BTreeFile

The main class to be implemented for this assignment is BTreeFile. BTreeFile is a derived class of the IndexFile class, which means a BTreeFile is a kind of IndexFile. However, since IndexFile is a virtual base class all of the methods associated with IndexFile must be implemented for BTreeFile. You should have copied btfile.h into your directory, as per the instructions in the "Getting Started" Section.

The methods to be implemented include:

BTreeFileScan

Finally, you will implement scans that return data entries from the leaf pages of the tree. You should create the scan through a member of BTreeFile, so that you can report an error if a BTreeFile is closed before a scan is completed.

Note that BTreeFileScans should support several kinds of range selections. These ranges are described in btfile.h.

Extra Credit

Extra credit of up to 15 points is available. However, the maximum number of points that you can score on this assignment is 80. (So the 15 extra points could be used partially to offset points you lose elsewhere on the assignment.) The main motivation for trying these additional challenges should be the opportunity to write more complete software and understand some of the finer points, rather than to score more points. Do not start on this until you have completed the basic assignment!

The tasks are listed below with the points they are assigned:

  1. 5 points: Support duplicate records by allowing records with the same key to exist on more than one page. (You should not use any overflow pages!)
  2. 10 points: Implement node redistribution and merging during deletion.