Next:
Miss Rate Up:
Performance Evaluation Previous:
Performance Evaluation
Before presenting the analysis, we describe the experimental setup. Our workloads are based on the OO7 benchmark [CDN94]; this benchmark is intended to match the characteristics of many different CAD/CAM/CASE applications, but does not model any specific application. The OO7 database contains a tree of assembly objects, with leaves pointing to three composite parts chosen randomly from among 500 such objects. Each composite part contains a graph of atomic parts linked by connection objects; each atomic part has 3 outgoing connections. The small database has 20 atomic parts per composite part; the medium has 200. In our implementation, the small database takes up 4.2 MB and the medium database takes up 37.8 MB.
The objects in the databases are clustered into 8 KB pages using time of creation as described in the OO7 specification [CDN94]. The databases were stored by a server on a Seagate ST-32171N disk, with a peak transfer rate of 15.2 MB/s, an average read seek time of 9.4 ms, and an average rotational latency of 4.17 ms [Sea97].
The databases were accessed by a single client. Both the server and the client ran on DEC 3000/400 Alpha workstations, each with a 133 MHz Alpha 21064 processor, 128 MB of memory and OSF/1 version 3.2. They were connected by a 10 Mb/s Ethernet and had DEC LANCE Ethernet interfaces. The server had a 36 MB cache (of which 6 MB were used for the modified object buffer); we experimented with various sizes for the client cache.
The experiments ran several database traversals that are described below. Both the C code generated by the Theta compiler for the traversals and the system code were compiled using GNU's gcc with optimization level 2.
The OO7 benchmark defines several database traversals; these perform a depth-first traversal of the assembly tree and execute an operation on the composite parts referenced by the leaves of this tree. Traversals T1 and T6 are read-only; T1 performs a depth-first traversal of the entire composite part graph, while T6 reads only its root atomic part. Traversals T2a and T2b are identical to T1 except that T2a modifies the root atomic part of the graph, while T2b modifies all the atomic parts. Note that T6 accesses many fewer objects than the other traversals.
In general, some traversals will match the database clustering well while others will not, and we believe that on average, one cannot expect traversals to use a large fraction of each page. For example, Tsangaris and Naughton [TN92] found it was possible to achieve good average use by means of impractical and expensive clustering algorithms; an O (n2.4) algorithm achieved average use between 17% and 91% depending on the workload, while an O (n log n) algorithm achieved average use between 15% and 41% on the same workloads. Chang and Katz [CK89] observed that real CAD applications had similar access patterns. Furthermore, it is also expensive to collect the statistics necessary to run good clustering algorithms and to reorganize the objects in the database according to the result of the algorithm [MK94][GKM96]. These high costs bound the achievable frequency of reclusterings and increase the likelihood of mismatches between the current workload and the workload used to train the clustering algorithm; these mismatches can significantly reduce the average fraction of a page that is used [TN92].
The OO7 database clustering matches traversal T6 poorly but matches traversals T1, T2a and T2b well; our results show that on average T6 uses only 3% of each page whereas the other traversals use 49%. We defined a new traversal T1- that uses an average of 27% of a page to represent a more likely clustering quality; we believe that in real workloads, average use would fall between T6 and T1-. We also defined a traversal T1+ that uses 91% of the objects in a page; it allows us to evaluate the impact of HAC in a very unlikely worst case. T1- and T1+ are similar to T1; T1+ visits all the sub-objects of atomic parts and connections, whereas T1- stops traversing a composite part graph after it visits half of its atomic parts. Our methodology fixes the way objects are physically clustered in the database and simulates different qualities of clustering by using different traversals.
To analyze the sensitivity of HAC to different workloads, we also designed our own dynamic traversals of the OO7 database, which are similar to the multi-user OO7 benchmark [C+94a], but execute a more mixed set of operations. The dynamic traversals perform a sequence of operations in two medium-sized databases; each operation selects one of the databases randomly, follows a random path down from the root of the assembly tree to a composite part, and randomly executes a T1-, T1 or T1+ traversal of the composite part graph. To model a working set, operations are preferentially executed in one of the two databases. At any given point, one of the databases is the hot database, to which 90% of the operations are directed; the remainder go to the cold database. Each traversal runs 7500 operations and we time only the last 5000. After the 5000-th operation, there is a shift of working set, and the roles of hot and cold database are reversed.
Table 1 shows the values we used for HAC's parameters in all the experiments described in the next sections. To choose these values, we ran various hot traversals including static traversals T1 and T1-, the dynamic traversal with 80% of object accesses performed by T1- operations and 20% by T1 operations, and another very dynamic shifting traversal described by Day [Day95]. The experiments varied the cache sizes and the values of the parameters across the studied range reported in the table. The stable range column shows the range of parameter values that (for each traversal and each cache size) resulted in an elapsed time within 10% of that obtained for the value we chose.
Description | Chosen Value | Studied Range | Stable Range |
Retention fraction (R) | 0.67 | 0.5 - 0.9 | 0.67 - 0.9 |
Candidate set epochs (E) | 20 | 1 - 500 | 10 - 500 |
Secondary scan pointers (N) | 2 | 0 - 20 | 2 |
Frames scanned (S) | 3 | 2 - 20 | 3 |
Table 1: Parameter Settings for HAC
The performance of HAC improves monotonically within the range studied for R and E, varying little towards the high end of the range. Its behavior relative to changes in N is also monotonic for traversals T1-, T1, and the dynamic traversal, but performance on the shifting traversal degrades for values of N larger than 2.
The only parameter that is not trivial to set is S, the number of frames scanned, because it controls both the speed at which object usage is decayed and the number of frames inserted in the candidate set in each epoch. We picked a value S=3 because it results in good performance for all the experimental points and because the replacement overhead increases quickly with S.
Next: Miss Rate Up: Performance Evaluation Previous: Performance Evaluation
Miguel Castro, Atul Adya, Barbara Liskov, and Andrew Myers