Until now we have used different kinds of sets to store values, such as lists and trees. The following table shows the running time of the basic operations on these structures:
Set type | Insert | Delete | Member
---|---|---|---
Linked list | O(1) | O(n) | O(n)
Red-black tree | O(log n) | O(log n) | O(log n)
We would like to improve on these results. To do so, we introduce a structure that performs all of the above operations in O(1) time on average (though not in the worst case).
The basic idea is to define a map as a set of (key, value) pairs in which each key appears at most once. A map is thus nothing more than a partial function from keys to values. When the keys are strings, we call the map a dictionary.
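To make the "partial function" view concrete, here is a small sketch in SML (the language of the environment example later in these notes) of a map stored as a list of (key, value) pairs. The names `insert` and `lookup` are hypothetical choices for this illustration. `lookup` returns an option: `NONE` means the key is outside the map's domain, which is exactly what makes the function partial.

```sml
(* Sketch: a map as a list of (key, value) pairs, each key at most once.
   The names insert and lookup are hypothetical. *)

(* insert keeps keys unique: an existing binding for key is replaced. *)
fun insert [] (key, value) = [(key, value)]
  | insert ((k, v) :: rest) (key, value) =
      if k = key then (key, value) :: rest
      else (k, v) :: insert rest (key, value)

(* lookup reads the pairs as a partial function: NONE = undefined. *)
fun lookup [] _ = NONE
  | lookup ((k, v) :: rest) key =
      if k = key then SOME v else lookup rest key

val d = insert (insert (insert [] ("x", 2)) ("y", 3)) ("x", 5)
val vx = lookup d "x"   (* SOME 5: the later binding replaced the first *)
val vz = lookup d "z"   (* NONE: "z" is not in the domain *)
```

Because `lookup` may scan the whole list, this representation has the O(n) member cost of the linked-list row in the table above; the hash table is meant to remove that cost.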
A mutable map is a map in which an element (a key-value pair) can be removed or changed after it has been inserted. The structure we will use to implement this kind of map efficiently is the hash table.
The constant running time comes from the fact that arrays give O(1) access to any position. We define a bucket as a slot of this array in which we can store an element of the map.
Since we store the elements in an array, we would like to compute an index from the key of each element. This index tells us where in the array to store the element. The function that computes these indices is called a hash function.
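As an illustration, here is one possible hash function for string keys. It is a sketch, not the function these notes have in mind: it assumes a polynomial hash with multiplier 31 (a common but arbitrary choice), computed with SML's `word` type so that the arithmetic wraps around instead of raising `Overflow`, and then reduced to an index between 0 and m - 1. The names `hashString` and `bucketIndex` are hypothetical.

```sml
(* Hypothetical polynomial string hash:
     h(s) = s0*31^(n-1) + s1*31^(n-2) + ... + s(n-1)   (mod 2^wordSize)
   foldl walks the characters left to right; word arithmetic wraps. *)
fun hashString (s : string) : word =
  CharVector.foldl (fn (c, h) => h * 0w31 + Word.fromInt (Char.ord c)) 0w0 s

(* Reduce the hash to a bucket index in [0, m); m must be positive. *)
fun bucketIndex (m : int) (s : string) : int =
  Word.toInt (hashString s mod Word.fromInt m)

val i = bucketIndex 16 "ab"   (* 97 * 31 + 98 = 3105, and 3105 mod 16 = 1 *)
```

Hash values of long strings depend on the word size of the SML implementation; for short strings such as `"ab"` no wrap-around occurs, so the result is exactly the polynomial value.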
What if the hash function returns the same index for two different keys? This is called a conflict (or collision). Conflicts generally arise in one of the following two situations:
The first is a high load factor. Let m be the number of buckets in the hash table and n the number of elements in the set; the load factor lf = n/m is the average number of elements per bucket. A large load factor causes conflicts: whenever n > m, at least two elements must share a bucket.
The second is a poorly behaved hash function. Ideally, a well-behaved hash function assigns keys to buckets as if uniformly at random. For instance, if the keys are strings, using the length of the string as the hash is a poor choice: many strings share the same length, so it creates many conflicts.
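The difference between a badly and a well-behaved hash function shows up even on a handful of keys. The sketch below (keys, table size and function names are all made up for illustration) counts how many distinct buckets six three-letter keys occupy in a 16-bucket table under the string-length hash and under a polynomial hash with multiplier 31.

```sml
(* Hypothetical experiment: m buckets, six keys of equal length. *)
val m = 16
val keys = ["cat", "dog", "cow", "pig", "ant", "bee"]

(* Bad: every three-letter key gets the same index. *)
fun lengthHash (s : string) : int = String.size s mod m

(* Better: every character affects the result (multiplier 31). *)
fun polyHash (s : string) : int =
  let
    val h = CharVector.foldl
              (fn (c, acc) => acc * 0w31 + Word.fromInt (Char.ord c)) 0w0 s
  in
    Word.toInt (h mod Word.fromInt m)
  end

(* Number of distinct bucket indices that the keys map to. *)
fun distinctBuckets (h : string -> int) (ks : string list) : int =
  let
    fun add (i, seen) =
      if List.exists (fn j => j = i) seen then seen else i :: seen
  in
    length (foldl add [] (map h ks))
  end

val badCount = distinctBuckets lengthHash keys   (* 1: all six keys collide *)
val goodCount = distinctBuckets polyHash keys    (* 6: no collisions *)
```

With the length hash every key lands in bucket 3, so the table degenerates into a single list; the polynomial hash happens to place these six keys in six different buckets (indices 6, 12, 11, 14, 7 and 2).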
There are many ways to resolve conflicts. A simple approach, known as separate chaining, stores a list of elements in each bucket. If the load factor is too high, however, these lists grow long and the structure starts behaving like a linked list, losing the performance we were looking for.
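A minimal sketch of separate chaining in SML follows. It assumes string keys, a fixed number of buckets chosen at creation time (real implementations usually grow the array when the load factor gets too high) and a polynomial string hash with multiplier 31; none of the names come from the notes. Each bucket is a list of (key, value) pairs, `insert` replaces any old binding, and a counter tracks n so that the load factor n/m can be computed.

```sml
(* Sketch of separate chaining. Assumptions (not from the notes):
   string keys, fixed bucket count m, hypothetical names. *)

(* buckets: the array of m lists; size: the number n of stored pairs. *)
type 'a table = {buckets : (string * 'a) list array, size : int ref}

(* Polynomial string hash with multiplier 31; word arithmetic wraps. *)
fun hashString (s : string) : word =
  CharVector.foldl (fn (c, h) => h * 0w31 + Word.fromInt (Char.ord c)) 0w0 s

(* Bucket index in [0, m) for key k. *)
fun index (t : 'a table) (k : string) : int =
  Word.toInt (hashString k mod Word.fromInt (Array.length (#buckets t)))

(* A table with m empty buckets (m must be positive). *)
fun create (m : int) : 'a table =
  {buckets = Array.array (m, []), size = ref 0}

(* Member/lookup: scan only the bucket that k hashes to. *)
fun lookup (t : 'a table) (k : string) : 'a option =
  case List.find (fn (k', _) => k' = k) (Array.sub (#buckets t, index t k)) of
      SOME (_, v) => SOME v
    | NONE => NONE

(* Delete: drop any binding for k from its bucket and update n. *)
fun delete (t : 'a table) (k : string) : unit =
  let
    val i = index t k
    val old = Array.sub (#buckets t, i)
    val kept = List.filter (fn (k', _) => k' <> k) old
  in
    Array.update (#buckets t, i, kept);
    #size t := !(#size t) - (length old - length kept)
  end

(* Insert: remove any old binding for k, then cons (k, v) onto the bucket. *)
fun insert (t : 'a table) (k : string, v : 'a) : unit =
  let
    val i = index t k
  in
    delete t k;
    Array.update (#buckets t, i, (k, v) :: Array.sub (#buckets t, i));
    #size t := !(#size t) + 1
  end

(* Load factor lf = n / m. *)
fun loadFactor (t : 'a table) : real =
  real (!(#size t)) / real (Array.length (#buckets t))

(* Usage *)
val t : int table = create 8
val () = insert t ("x", 2)
val () = insert t ("y", 3)
val () = insert t ("x", 4)   (* replaces the old binding of "x" *)
val () = delete t "y"
val x = lookup t "x"         (* SOME 4 *)
val lf = loadFactor t        (* 1 pair in 8 buckets: 0.125 *)
```

Each operation touches only one bucket, so its cost is proportional to that bucket's length; with a well-behaved hash function that length is about lf = n/m on average, which is O(1) as long as the load factor stays bounded.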
Some hash functions are used frequently. For instance, modular hashing takes an integer key $k$ and computes $k \bmod m$, with $m = 2^p$. Multiplicative hashing, also with an integer key $k$, computes $\lfloor k \cdot m / 2^p \rfloor \bmod 2^q$, for an appropriate choice of $p$ and $q$. In general these functions are well behaved.
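The two formulas can be transcribed directly into SML as a sketch. The values of m, p and q used below are arbitrary illustrations, not the "appropriate choice" the notes refer to, and the function names are hypothetical. Both functions use SML's fixed-precision `int`, so a large product `k * m` raises `Overflow`.

```sml
(* 2^p by repeated doubling (p >= 0). *)
fun pow2 (p : int) : int = if p = 0 then 1 else 2 * pow2 (p - 1)

(* Modular hashing: k mod m with m = 2^p, i.e. the p low-order bits of k.
   SML's mod takes the sign of the divisor, so the result is in [0, 2^p)
   even for negative k. *)
fun modularHash (p : int) (k : int) : int = k mod pow2 p

(* Multiplicative hashing, as floor(k * m / 2^p) mod 2^q:
   multiply, drop the p low-order bits, keep the next q bits.
   The result indexes a table of 2^q buckets. *)
fun multiplicativeHash {m, p, q} (k : int) : int =
  ((k * m) div pow2 p) mod pow2 q

val a = modularHash 4 37                              (* 37 mod 16 = 5 *)
val b = multiplicativeHash {m = 13, p = 3, q = 4} 37  (* 481 div 8 = 60; 60 mod 16 = 12 *)
```

For example, with p = 4 the modular hash keeps the 4 low-order bits of the key, while the multiplicative hash with m = 13, p = 3 and q = 4 first computes 37 * 13 = 481 and then extracts bits 3 to 6 of the product.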
A hash table keyed by variable names could serve as a top-level environment, but how do we model nested environments? For instance, consider:
val x = 2; let val x = 4 in x end;
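One way to handle this is to keep a list of values in each bucket, inserting a binding when we enter a local scope and deleting it when we exit. If new bindings go at the front of the list, a lookup finds the innermost binding first: inside the let, x is bound to 4, and once the let ends, x is bound to 2 again.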
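The bucket-list scheme can be sketched as follows, assuming string variable names, integer values, a fixed 16-bucket array and a polynomial string hash; `bind`, `unbind` and `lookup` are hypothetical names. New bindings are pushed onto the front of a bucket's list, and leaving a scope removes only the most recent binding of that name, which uncovers any outer one. The last lines replay the nested-scope example above.

```sml
(* Hypothetical scoped environment: 16 buckets of (name, value) lists. *)
val m = 16
val env : (string * int) list array = Array.array (m, [])

(* Bucket index of a name (polynomial hash, multiplier 31). *)
fun index (x : string) : int =
  Word.toInt
    (CharVector.foldl (fn (c, h) => h * 0w31 + Word.fromInt (Char.ord c)) 0w0 x
     mod Word.fromInt m)

(* Entering a scope that binds x: push the binding onto x's bucket. *)
fun bind (x : string, v : int) : unit =
  let val i = index x
  in Array.update (env, i, (x, v) :: Array.sub (env, i)) end

(* Leaving that scope: remove only the most recent binding of x. *)
fun unbind (x : string) : unit =
  let
    val i = index x
    fun removeFirst [] = []
      | removeFirst ((y, w) :: rest) =
          if y = x then rest else (y, w) :: removeFirst rest
  in
    Array.update (env, i, removeFirst (Array.sub (env, i)))
  end

(* Lookup returns the first, i.e. innermost, binding of x. *)
fun lookup (x : string) : int option =
  case List.find (fn (y, _) => y = x) (Array.sub (env, index x)) of
      SOME (_, v) => SOME v
    | NONE => NONE

val () = bind ("x", 2)       (* top level: x = 2 *)
val () = bind ("x", 4)       (* enter the let: x = 4 *)
val inner = lookup "x"       (* SOME 4 inside the let body *)
val () = unbind "x"          (* exit the let *)
val outer = lookup "x"       (* SOME 2: the outer x is visible again *)
```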
In the next lecture we will look more closely at several implementations of this idea.
CS312 © 2002 Cornell University Computer Science