Recent research on column-oriented database management systems (DBMSs)
has shown that these systems can outperform existing row-oriented DBMSs
by one to two orders of magnitude on read-mostly query workloads like
those found
in data warehouses, decision support, and customer relationship
management systems. In this talk, I will discuss this exciting new class
of database systems and will provide an overview of the C-Store system
that we have developed over the past two years at MIT. I will then
focus on the design of the column-oriented query execution engine I have
developed. In particular, I will discuss how two factors affect query
performance: tuple construction (stitching together attributes from
multiple columns into a row-oriented "tuple") and operating directly on
compressed data.
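The stitching step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not C-Store's engine: the dictionary-of-lists column layout and the function names (`construct_tuples`, `early_materialization`, `late_materialization`) are hypothetical. The sketch contrasts building a tuple for every row before a selection with scanning only the predicate column first and stitching tuples just for the qualifying positions. That is one example of how the point in a plan where tuples are constructed changes how much work is done.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical in-memory column store: each attribute is its own list, and
# the i-th entry of every list belongs to the same logical row.
Columns = Dict[str, List[object]]


def construct_tuples(cols: Columns, names: Sequence[str],
                     positions: Sequence[int]) -> List[Tuple[object, ...]]:
    """Stitch the named columns into row-oriented tuples, but only for the
    requested row positions."""
    return [tuple(cols[n][p] for n in names) for p in positions]


def early_materialization(cols: Columns, names: List[str], pred_col: str,
                          pred: Callable[[object], bool]):
    """Construct a tuple for every row first, then apply the predicate."""
    all_rows = construct_tuples(cols, names, range(len(cols[pred_col])))
    idx = names.index(pred_col)
    return [row for row in all_rows if pred(row[idx])]


def late_materialization(cols: Columns, names: List[str], pred_col: str,
                         pred: Callable[[object], bool]):
    """Scan only the predicate column, then stitch tuples for the matches."""
    positions = [i for i, v in enumerate(cols[pred_col]) if pred(v)]
    return construct_tuples(cols, names, positions)


if __name__ == "__main__":
    cols = {
        "id": [1, 2, 3, 4],
        "price": [10, 250, 40, 300],
        "region": ["EU", "US", "EU", "APAC"],
    }
    names = ["id", "price", "region"]
    # Both strategies return the same rows: [(2, 250, 'US'), (4, 300, 'APAC')]
    print(early_materialization(cols, names, "price", lambda p: p > 100))
    print(late_materialization(cols, names, "price", lambda p: p > 100))
```

In this sketch the early version builds four tuples and the late version builds two; the gap widens as the predicate becomes more selective and as more columns are stitched into each tuple.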
Tuple construction allows column-oriented DBMSs to offer a
standards-compliant relational database interface (e.g., ODBC or JDBC);
however, if it is performed at the wrong point in a query plan, it
imposes a significant performance penalty. For its part, data compression
can improve query performance by an order of magnitude by trading cheap
CPU cycles for savings in expensive I/O bandwidth.
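Operating on compressed data can be illustrated with run-length encoding (RLE), chosen here as an assumption since the abstract does not name specific compression schemes. In this hypothetical Python sketch, a SUM and an equality COUNT are evaluated one run at a time, so each operator touches one entry per run rather than one per value and never decompresses the column.

```python
from itertools import groupby
from typing import List, Tuple

Run = Tuple[int, int]  # (value, run length)


def rle_encode(column: List[int]) -> List[Run]:
    """Run-length encode a column: consecutive equal values become one run."""
    return [(value, len(list(group))) for value, group in groupby(column)]


def rle_decode(runs: List[Run]) -> List[int]:
    """Expand runs back into individual values (used here only for checking)."""
    return [value for value, length in runs for _ in range(length)]


def sum_compressed(runs: List[Run]) -> int:
    """SUM over the column: one multiply-add per run, no decompression."""
    return sum(value * length for value, length in runs)


def count_where_equal(runs: List[Run], target: int) -> int:
    """COUNT(*) WHERE col = target, evaluated one run at a time."""
    return sum(length for value, length in runs if value == target)


if __name__ == "__main__":
    # A sorted column compresses well: 9 values collapse into 3 runs.
    quantity = [5, 5, 5, 5, 7, 7, 9, 9, 9]
    runs = rle_encode(quantity)
    print(runs)                        # [(5, 4), (7, 2), (9, 3)]
    assert rle_decode(runs) == quantity
    print(sum_compressed(runs))        # 20 + 14 + 27 = 61
    print(count_where_equal(runs, 9))  # 3
```

RLE pays off mainly on sorted or low-cardinality columns, where long runs form; on a column whose neighboring values rarely repeat, the (value, length) pairs can take more space than the raw data.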