General Questions

Q: Since the index pages in the B+ tree don't store the recordID along with the key, I don't see how we can add duplicate support. I was thinking of using the (key, recordid) pair as a unique identifier for the inserts and deletes. are we allowed to modify the index page class to include storage for the recordID?
A: Yes, you could do that. Another way is to change your insertion/deletion/search functions to assume that the key values in the index nodes are not exact separators any more and that there might be keys with that value also to the left.

Q: It says that we are free to add any private methods we like. But, the BTLeaf.h file says in that no private members should be declared. Also, in btindex.h, the file says that we can add public functions? I was wondering if you could clarify.
A: You can add any private or public member functions, but not any member *variables*. The main idea is that member variables change the space available on the page. The text of the assignment has been changed to reflect this.

Q: I know this might seem trivial, but I was having trouble finding where the pointers to the child nodes are stored in the BTIndexNodes.
A: The pointers to the child nodes are stored in records that you are designing. Each record consists of a pair (key, rid). The leftmost pointer is stored in prevPage.

Q: In the B+ Tree assignment, it says we do not need to implement redistribution. Does this mean for insertion, deletion, or both? If it includes deletion, what happens when we delete an entry in a page and
the page goes below the minimum occupancy, but both adjacent/sibling nodes can not take part in a merge because there is not enough free space in those pages. Do we just allow that page to have fewer than the
"required" minimum occupancy?
A: In insertion there is never re-distribution. A node might split, and then its keys are distributed, but that is not referred to as "redistribution". In deletion, the answer to the question in your last sentence is: Yes, the page is allowed to have fewer than minimum occupancy entries

Q: Are we allowed to add new public methods in SortedPage and HeapPage?
A: No.

Handling (or Not) Duplicates

Q: Within the insert command from the command prompt, when you type "insert 10 20", it inserts 10 items in the range of 10 through 20, but it uses random keys between 10 and 20. This means there can be duplicate keys. Since you said we don't have to worry about duplicates, should we just use "insert 10 10" to ensure that there are no duplicates? In class you said we should just alter out test cases, and I wasn't sure if this is what you meant.
A: You can modify yourself the actual function insertHighLow to make sure that no duplicate key values are inserted. There are several possible ways of doing that (e.g., keeping them in some local static data structure).

Q: I was just wondering about the implementation (or lack thereof) of duplicates. if we decide not to implement duplicates initially, do we modify our code to resist duplicate entries or should we assume the test program does not implement duplicates.
A: You don't have to resist duplicates. Just assume that the test program does not produce duplicates. Of course, you will have to modify yourself the actual function insertHighLow (in the file btreetest.cpp) to make sure that no duplicate records are inserted.

Q: Since the index pages in the B+ tree don't store the recordID along with the key, I don't see how we can add duplicate support. I was thinking of using the (key,recordid) pair as a unique identifier for the inserts and deletes. are we allowed to modify the index page class to include storage for the recordID?
A: Yes, you could do that. Another way is to change your insertion/deletion/search functions to assume that the key values in the index nodes are not exact separators any more and that there might be keys with that value also to the left.

BTreeFile Questions

BTreeFile Constructor/Destructor

Q (10/25): For the BTreeFile constructor, when we are creating the root node, should the root node be a leaf node or an index node? If it's an index node, what should the key value be? If it's a leaf node, does that mean it's okay to have root->leaf, and having no index?
A: If there is only the root node, the node is of type leaf node.

Q (10/25): For the constructor of BTreeFile, we opened the filename that was passed in, and fclosed this in the destructor, we were hoping that one of the methods would write something to this file, but nothing happened. Is there a method somewhere that does this fopen for us?
A: Use MINIBASE->GetFileEntry() to get the id of the root page, and then pin the page into the buffer. Do not use fopen etc. The lower layer is taking care of this. Look at the methods in db.h.

Q: Regarding the destructor for BTreeFile, how do we close the index file?
A: Just unpin all pages. Be careful to update the entry for MINIBASE_DB when the root changes (does not have to be in the destructor, but you have to take care of it somewhere).

BTreeFile::Insert()

none yet...

BTreeFile::Delete()

Q: What do we do in the case when the index node has one entry left (19) and we delete (21). Then we need to merge the bottom 2 leaf nodes to the left, and then remove (19) according to the delete algorithm. This will cause the problem of having an index node with no key value; meaning we cannot insert/retrieve (20) afterwards

            ....
          /      \
        25      (full)
       /
     19
    /   \
(,,,)   (20,21,,)

*** becomes ***
              ...
           /      \
        25      (full)
        /
     (?)

A: In this case you have to leave one empty leaf node in the tree since otherwise the 20 can not be found. This is one of the few special cases that come up since you do not have to implement full re-distribution. Make reasonable decisions for these special cases, but keep in mind that the tree should balanced (e.g., there will never be a level with with leaf and index nodes).

Q: What is the minimum occupancy? Is it half the maximum capacity?
A: There is a function in BTLeafPage and BTIndexPage called "IsAtLeastHalfFull" that you should use.

Q: If, on delete, a page goes below minimum occupancy, don't we have to pull a record from one of the siblings (as opposed to just leaving it like the response to the FAQ said)?
A: No, you don't have to pull from the siblings. Just check whether you can merge with any of the two siblings that have the *same* parent. If that fails, leave the leaf page the way it is.

Q: I have a question concerning delete. When the case arises when we need to merge the current node with a sibling, we first get the sibling using GetPrevPage() or GetNextPage(). The next step is to make sure that sibling has the same parent. My question is about the best way to do this. Our implementation is similar to the books so if we are at the leaf level, we will have access to the PageID of the current leaf node, and that leaf's parent PageID (as well as the PageIDs for the left and right sibling). I guess we could just call "GetFirst" and "GetNext" to scan the parent and see if the left and right siblings are also children of that parent (by comparing pageIDs). Does this sound acceptable or is there something that I'm missing?
A: You could pass the parent pageid as a parameter to the node that you are deleting from. Then if a merge should occur, you can use that parent pointer to find a sibling.

BTreeFileScan Questions

OpenScan()

Q: We don't understand what OpenScan has to initialize.Since the high key and low key pointers are passed,what are we exactly supposed to do?
A: The function has to initialize some member variables to keep track of the current state (including the high key and the low key), so that you can actually perform the scan using GetNext().

Q (11/1): In the test program, if we type in "delete 4 4", the programs does Open Scan(4,4) and delete 4 from the tree, but it then tries to OpenScan(4,4) again. In our implementation, we return NULL if the scan will not work later. For example, the second call to OpenScan(4,4) returns NULL because there is no value in the tree between 4 and 4. However, the test program says there is an error deleting, whereas it does work correctly but our implementation does not return an IndexFileScan that returns DONE on its first call. Should we make it so our implementation does return an IndexFileScan that returns DONE on its first call or is it fine as it is, returning an invalid IndexFileScan because it won't work later?
A: The upper layer assumes that a scan object will always be created, so you have to return an object in any case.

Note: You have to return an object in BTreeFile::OpenScan, even if there are no entries in the index for the range specified.

GetNext()

Q: When we are implementing the FileScan class, we take in a lowKey and a highKey. Since the scan must start at lowKey for the first call to GetNext(), what should we do when there is no exact key match with lowKey? I would assume that we should just "start" at the next higher key, but I wanted to check :). Thanks!
A: That is correct.

Q: If our tree contains 4,5,7,9,13, what should be the values returned by GetNext if I call OpenScan(6,10)? What if I call OpenScan (10,12)?
A: First question: 7 and 9. Second question: The first call to GetNext returns DONE. From the user's point of view, it is better to return an object that returns DONE when GetNext is called, since nothing is wrong with the filescan; just there is nothing in that range. You have to return an object, even if there is nothing in the range specified.

DeleteCurrent()

Q: In BTreeFileScan::DeleteCurrent(), do we have to do a deletion with merging as in BTFile::Delete?
A: No, just delete the record from the page. You are welcome to implement merging as well to avoid some special cases, but you do not have to.

Note: BTreeFileScan::DeleteCurrent should always return OK, even if the record that has been deleted was the last record in the specified range. If DeleteCurrent is called the second time at the current position without a call to GetNext() in between, return FAIL.