Q: Since the index pages in the B+ tree don't store the recordID along
with the key, I don't see how we can add duplicate support. I was thinking of
using the (key, recordid) pair as a unique identifier for the inserts and
deletes. are we allowed to modify the index page class to include storage for
the recordID?
A: Yes, you could do that. Another way is to change your
insertion/deletion/search functions to assume that the key values in the index
nodes are not exact separators any more and that there might be keys with that
value also to the left.
Q: It says that we are free to add any private methods we like. But,
the BTLeaf.h file says in that no private members should be declared. Also, in
btindex.h, the file says that we can add public functions? I was wondering if
you could clarify.
A: You can add any private or public member functions, but
not any member *variables*. The main idea is that member variables change the
space available on the page. The text of the assignment has been changed
to reflect this.
Q: I know this might seem trivial, but I was having trouble finding
where the pointers to the child nodes are stored in the BTIndexNodes.
A: The
pointers to the child nodes are stored in records that you are designing. Each
record consists of a pair (key, rid). The leftmost pointer is stored in prevPage.
Q: In the B+ Tree assignment, it says we do not need to implement
redistribution. Does this mean for insertion, deletion, or both? If it includes
deletion, what happens when we delete an entry in a page and
the page goes
below the minimum occupancy, but both adjacent/sibling nodes can not take part
in a merge because there is not enough free space in those pages. Do we just
allow that page to have fewer than the
"required" minimum occupancy?
A: In
insertion there is never re-distribution. A node might split, and then its keys
are distributed, but that is not referred to as "redistribution". In deletion,
the answer to the question in your last sentence is: Yes, the page is allowed
to have fewer than minimum occupancy entries
Q: Are we allowed to add new public methods in SortedPage and
HeapPage?
A: No.
Q: Within the insert command from the command prompt, when you type
"insert 10 20", it inserts 10 items in the range of 10 through 20, but it uses
random keys between 10 and 20. This means there can be duplicate keys. Since you
said we don't have to worry about duplicates, should we just use "insert 10 10"
to ensure that there are no duplicates? In class you said we should just alter
out test cases, and I wasn't sure if this is what you meant.
A: You can
modify yourself the actual function insertHighLow to make sure that no duplicate key
values are inserted. There are several possible ways of doing that (e.g.,
keeping them in some local static data structure).
Q: I was just wondering about the implementation (or lack thereof) of
duplicates. if we decide not to implement duplicates initially, do we modify our
code to resist duplicate entries or should we assume the test program does not
implement duplicates.
A: You don't have to resist duplicates. Just assume
that the test program does not produce duplicates. Of course, you will have
to modify yourself the actual function insertHighLow (in the file
btreetest.cpp) to make sure that no duplicate records are inserted.
Q: Since the index pages in the B+ tree don't store the recordID along
with the key, I don't see how we can add duplicate support. I was thinking of
using the (key,recordid) pair as a unique identifier for the inserts and
deletes. are we allowed to modify the index page class to include storage for
the recordID?
A: Yes, you could do that. Another way is to change your
insertion/deletion/search functions to assume that the key values in the index
nodes are not exact separators any more and that there might be keys with that
value also to the left.
Q (10/25): For the BTreeFile constructor, when we are creating the root node,
should the root node be a leaf node or an index node? If it's an index node,
what should the key value be? If it's a leaf node, does that mean it's okay to
have root->leaf, and having no index?
A: If there is only the root node,
the node is of type leaf node.
Q (10/25): For the constructor of BTreeFile, we opened the filename that was
passed in, and fclosed this in the destructor, we were hoping that one of the
methods would write something to this file, but nothing happened. Is there a
method somewhere that does this fopen for us?
A: Use MINIBASE->GetFileEntry() to get the id of the root page, and then pin the
page into the buffer. Do not use fopen etc. The lower layer is taking care of
this. Look at the methods in db.h.
Q: Regarding the destructor for BTreeFile, how do we close the index
file?
A: Just unpin all pages. Be careful to update the entry for MINIBASE_DB
when the root changes (does not have to be in the destructor, but you have to
take care of it somewhere).
none yet...
Q: What do we do in the case when the index node has one entry left (19) and we delete (21). Then we need to merge the bottom 2 leaf nodes to the left, and then remove (19) according to the delete algorithm. This will cause the problem of having an index node with no key value; meaning we cannot insert/retrieve (20) afterwards
....
/ \
25 (full)
/
19
/
\
(,,,) (20,21,,)
*** becomes
***
...
/ \
25
(full)
/
(?)
A: In this case you have to leave one empty leaf node in the tree since otherwise the 20 can not be found. This is one of the few special cases that come up since you do not have to implement full re-distribution. Make reasonable decisions for these special cases, but keep in mind that the tree should balanced (e.g., there will never be a level with with leaf and index nodes).
Q: What is the minimum occupancy? Is it half the maximum
capacity?
A: There is a function in BTLeafPage and BTIndexPage called "IsAtLeastHalfFull" that you should
use.
Q: If, on delete, a page goes below minimum occupancy, don't we have to pull
a record from one of the siblings (as opposed to just leaving it like the
response to the FAQ said)?
A: No, you don't have to pull from the siblings.
Just check whether you can merge with any of the two siblings that have the
*same* parent. If that fails, leave the leaf page the way it is.
Q: I have a question concerning delete. When the case arises when we
need to merge the current node with a sibling, we first get the sibling using
GetPrevPage() or GetNextPage(). The next step is to make sure that sibling has
the same parent. My question is about the best way to do this. Our
implementation is similar to the books so if we are at the leaf level, we will
have access to the PageID of the current leaf node, and that leaf's parent
PageID (as well as the PageIDs for the left and right sibling). I guess we could
just call "GetFirst" and "GetNext" to scan the parent and see if the left and
right siblings are also children of that parent (by comparing pageIDs). Does
this sound acceptable or is there something that I'm missing?
A: You could
pass the parent pageid as a parameter to the node that you are deleting from.
Then if a merge should occur, you can use that parent pointer to find a
sibling.
Q: We don't understand what OpenScan has to initialize.Since the high
key and low key pointers are passed,what are we exactly supposed to do?
A:
The function has to initialize some member variables to keep track of the
current state (including the high key and the low key), so that you can actually
perform the scan using GetNext().
Q (11/1): In the test program, if we type in "delete 4 4", the programs does
Open Scan(4,4) and delete 4 from the tree, but it then tries to OpenScan(4,4)
again. In our implementation, we return NULL if the scan will not work later.
For example, the second call to OpenScan(4,4) returns NULL because there is no
value in the tree between 4 and 4. However, the test program says there is an
error deleting, whereas it does work correctly but our implementation does not
return an IndexFileScan that returns DONE on its first call. Should we make it
so our implementation does return an IndexFileScan that returns DONE on its
first call or is it fine as it is, returning an invalid IndexFileScan because it
won't work later?
A: The upper layer assumes that a scan object will always be
created, so you have to return an object in any case.
Note: You have to return an object in BTreeFile::OpenScan, even if there are no entries in the index for the range specified.
Q: When we are implementing the FileScan class, we take in a lowKey
and a highKey. Since the scan must start at lowKey for the first call to GetNext(), what should we do when there is no exact key match with
lowKey? I
would assume that we should just "start" at the next higher key, but I wanted to
check :). Thanks!
A: That is correct.
Q: If our tree contains 4,5,7,9,13, what should be the values returned
by GetNext if I call OpenScan(6,10)? What if I call OpenScan (10,12)?
A:
First question: 7 and 9. Second question: The first call to GetNext returns
DONE. From the user's point of view, it is better to return an object that
returns DONE when GetNext is called, since nothing is wrong with the filescan;
just there is nothing in that range. You have to return an object, even if
there is nothing in the range specified.
Q: In BTreeFileScan::DeleteCurrent(), do we have to do a deletion
with merging as in BTFile::Delete?
A: No, just delete the record from the
page. You are welcome to implement merging as well to avoid some special cases,
but you do not have to.
Note: BTreeFileScan::DeleteCurrent should always return OK, even if the record that has been deleted was the last record in the specified range. If DeleteCurrent is called the second time at the current position without a call to GetNext() in between, return FAIL.