Posted on February 10, 2015 by Mike Haas
Oracle is a pioneer in database compression technology. If you’ve been around Oracle technology for several decades, you are familiar with references to ‘Oracle 8i’ or ‘Oracle 9i’. Some of this blog’s discussion will refer to Oracle 9i, released more than 10 years ago! Back in those earlier days, the Oracle world was a little less complex: Oracle had not yet acquired Sun, and there were no Oracle-Sun hardware offerings; database compression was just that, database compression.
In more recent years, as Oracle has both acquired and evolved many new technologies, the capabilities around compression have become even more robust. As these capabilities have carried over from the database to the storage and Engineered Systems solutions Oracle now provides, it has become a little more challenging to understand and utilize the numerous compression capabilities so we can best realize their benefits.
Within this blog, we will concentrate specifically on compression from the Oracle database perspective, and we will minimize references to the more recent 12c database capabilities around Automatic Data Optimization (ADO), Heat Maps, and Information Lifecycle Management (ILM). These capabilities will be discussed in a subsequent blog, written by Erik Benner, Mythics Enterprise Architect.
The compression alternatives and targeted functionality can be reviewed using a historical timeline. In fact, more than a decade ago, Oracle Database 9i Release 2 introduced Basic Table Compression, which compresses data that is loaded using bulk (direct-path) load operations. Basic Table Compression has been carried forward, and is a feature of Oracle Database 12c Enterprise Edition (EE). Advanced Row Compression, by contrast, is part of the Oracle Advanced Compression Option. Note the distinction: a feature is a non-licensable technology, while an Option to the database is a licensable technology.
In 2007, Oracle Database 11g Release 1 introduced OLTP Table compression, now called Advanced Row Compression, which maintains compression during all types of data manipulation operations, including conventional DML such as INSERT and UPDATE. In addition, Advanced Row Compression minimizes the overhead of write operations on compressed data, making it suitable for transactional / OLTP environments as well as Data Warehouses. This extends the benefits of compression to all application workloads.
With standard (basic) compression, Oracle compresses data by eliminating duplicate values within a database block. Standard compression only works for direct path operations, such as CREATE TABLE AS SELECT (CTAS) or direct-path INSERT AS SELECT (IAS). If the data is modified using any kind of conventional DML operation (for example, an UPDATE), the data within that database block is uncompressed to make the modification and is written back to disk uncompressed.
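As a minimal sketch (table and column names here are illustrative, not from any particular schema), basic compression is enabled with the COMPRESS BASIC clause and only takes effect for direct-path loads:

```sql
-- Basic table compression: compression occurs only on direct-path loads
CREATE TABLE sales_hist COMPRESS BASIC
AS SELECT * FROM sales WHERE sale_date < DATE '2014-01-01';

-- The APPEND hint requests a direct-path insert, so new rows stay compressed;
-- a conventional INSERT into this table would be stored uncompressed
INSERT /*+ APPEND */ INTO sales_hist
SELECT * FROM sales WHERE sale_date < DATE '2015-01-01';
```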
With OLTP compression, just like standard compression, Oracle compresses data by eliminating duplicate values in a database block. But unlike standard compression, OLTP compression allows data to remain compressed during all types of data manipulation operations, including conventional DML such as INSERT and UPDATE. More information on the OLTP table compression features can be found in the Oracle Database Administrator's Guide for 12c.
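For illustration (the table and column names are hypothetical), OLTP compression is enabled with its own clause; 12c uses ROW STORE COMPRESS ADVANCED, while the earlier 11g syntax was COMPRESS FOR OLTP:

```sql
-- Advanced Row Compression (requires the Advanced Compression Option);
-- rows remain compressed through conventional INSERT and UPDATE activity
CREATE TABLE orders (
  order_id  NUMBER PRIMARY KEY,
  cust_id   NUMBER,
  status    VARCHAR2(10)
) ROW STORE COMPRESS ADVANCED;

-- An existing table can also be altered; newly written blocks
-- are compressed from that point forward
ALTER TABLE order_lines ROW STORE COMPRESS ADVANCED;
```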
In addition to compressing data stored inside the database, Advanced Compression also includes the capability to compress backup data. Recovery Manager (RMAN) and Data Pump are the two most commonly used tools to back up the data stored inside an Oracle Database. We will leave a more detailed discussion of RMAN and Data Pump compression for another day.
If you decide to use compression, consider sorting your data before loading it to achieve the best possible compression rate. The easiest way to sort incoming data is to load it using an ORDER BY clause on either your CTAS or IAS statement. You should ORDER BY a NOT NULL column (ideally non-numeric) that has a large number of distinct values (1,000 to 10,000).
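As a sketch of the above (assuming a hypothetical SALES table with a NOT NULL, reasonably selective PRODUCT_NAME column), the sort is simply folded into the CTAS:

```sql
-- Clustering similar values into the same blocks lets block-level
-- duplicate elimination compress the data further
CREATE TABLE sales_compressed COMPRESS BASIC
AS SELECT * FROM sales ORDER BY product_name;
```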
Oracle’s Hybrid Columnar Compression technology is a new method for organizing data within a database block. As the name implies, this technology utilizes a combination of both row and columnar methods for storing data. This hybrid approach achieves the compression benefits of columnar storage, while avoiding the performance shortfalls of a pure columnar format.
Released with the 11g version of the Oracle database, Hybrid Columnar Compression (HCC) enables the highest levels of data compression. Unlike many other compression technologies, HCC provides enterprises with tremendous cost savings and simultaneous performance improvements due to reduced I/O. While HCC is most commonly utilized in an Exadata environment, that association is largely due to the timing of the 11g database release, which coincided with the Exadata Engineered Systems release. The fact is, HCC is not limited to Engineered Systems (e.g. Exadata); it can also be utilized on several Oracle-only storage offerings (e.g. the ZFS Storage Appliance ZS3-2 and ZS4 models, the FS1 flash array, and the ZDLRA).
HCC is optimized to use both database and storage capabilities on Exadata to deliver tremendous space savings AND revolutionary performance. Average storage savings can range from 10x to 15x depending on which Hybrid Columnar Compression level is implemented.
Hybrid Columnar Compression achieves its results using a fundamentally different technique. A logical construct called the compression unit is used to store a set of hybrid columnar-compressed rows. When data is loaded, a set of rows is pivoted into a columnar representation and compressed. After the column data for a set of rows has been compressed, it is fit into the compression unit. If conventional DML is issued against a table with HCC, the necessary data is uncompressed in order to make the modification and is then written back to disk using a block-level compression algorithm. If your data set is frequently modified using conventional DML, HCC is not recommended; use OLTP compression instead.
HCC provides different levels of compression, focusing on query performance or compression ratio, respectively. With HCC optimized for query, lighter compression algorithms are applied to the data to achieve good compression with little to no performance impact. Compression for archive, on the other hand, optimizes the compression on disk, irrespective of its potential impact on query performance. See the Oracle Exadata documentation for further details.
It should be noted early in the HCC discussion, however, that this compression approach should be targeted almost exclusively at data stores that will not be updated. The compression ratios, and the optimizations that follow from them, will diminish if/when HCC-compressed data is modified with DML operations. For the right workloads, though, the combined cost savings and performance gains that can be realized with HCC are unparalleled, and cannot be matched by any non-Oracle solution.
In an Exadata environment (and/or a SPARC SuperCluster), queries on hybrid columnar compressed data often run in the Exadata storage cells with Smart Scans using a high performance query engine that utilizes special columnar processing techniques. Data sent back to the database server(s) is usually compressed, and is typically much less data than is read from disk. Note that data remains compressed not only on disk, but also remains compressed in the Exadata Smart Flash Cache, over InfiniBand, in the database server buffer cache, as well as when doing back-ups or log shipping to Data Guard. These combined optimizations are many of the reasons that Oracle Exadata repeatedly provides unprecedented scalability and optimization.
Despite the powerful benefits of HCC, it is quite easy to use. For new tables and partitions, enabling Hybrid Columnar Compression is as easy as specifying a compression level when creating the table.
See the example below:
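(A minimal sketch; the table and column names are illustrative.)

```sql
-- Hybrid Columnar Compression, optimized for query performance
CREATE TABLE sales_archive (
  sale_id    NUMBER,
  cust_id    NUMBER,
  sale_date  DATE,
  amount     NUMBER
) COMPRESS FOR QUERY HIGH;
```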
The COMPRESS FOR QUERY HIGH compression level is used as an example in this document. The additional compression levels available with Hybrid Columnar Compression include COMPRESS FOR QUERY LOW, COMPRESS FOR ARCHIVE LOW, and COMPRESS FOR ARCHIVE HIGH.
Typically compression is considered only as a space-saving mechanism, and more often than not, a space saving mechanism for permanently stored data. Data populated into the In-Memory column store is compressed using a new set of compression algorithms that not only help save space but also improve query performance. The new Oracle In-Memory compression format allows queries to execute directly against the compressed columns. This means all scanning and filtering operations will execute on a much smaller amount of data. Data is only decompressed when it is required for the result set. In-memory compression is specified using the keyword MEMCOMPRESS, a sub-clause of the INMEMORY attribute.
There are six levels, each of which provides a different level of compression and performance.
NO MEMCOMPRESS: Data is populated without any compression
MEMCOMPRESS FOR DML: Minimal compression, optimized for DML performance
MEMCOMPRESS FOR QUERY LOW: Optimized for query performance (the default)
MEMCOMPRESS FOR QUERY HIGH: Optimized for query performance as well as space saving
MEMCOMPRESS FOR CAPACITY LOW: Balanced, with a greater bias towards space saving
MEMCOMPRESS FOR CAPACITY HIGH: Optimized for space saving
In-Memory compression ratios can vary from 2X to 20X, depending on the compression option chosen, the datatype, and the contents of the table. The compression technique used can vary across columns, or across partitions within a single table. For example, you might optimize some columns in a table for scan speed, and others for space saving.
A CREATE TABLE command can specify different compression techniques for different columns.
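As a sketch (the table and column names are illustrative), column-level MEMCOMPRESS settings are given inside the INMEMORY clause:

```sql
-- In-Memory column store: the table defaults to MEMCOMPRESS FOR QUERY LOW,
-- but the rarely scanned NOTES column is compressed for capacity instead
CREATE TABLE trades (
  trade_id   NUMBER,
  symbol     VARCHAR2(10),
  trade_date DATE,
  notes      VARCHAR2(4000)
) INMEMORY MEMCOMPRESS FOR QUERY LOW
  INMEMORY MEMCOMPRESS FOR CAPACITY HIGH (notes);
```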
With the release of the 11g Oracle Database, came a new feature named the Oracle Compression Advisor. Many users became familiar with this package, as it was used to provide estimates of compression that would be realized via Hybrid Columnar Compression.
This DBMS_COMPRESSION package has been enhanced to support In-Memory compression. As it relates to In-Memory compression, the advisor provides an estimate of the compression ratio that can be realized through the use of MEMCOMPRESS. This estimate is based on analysis of a sample of the table data and provides a good estimate of the actual results obtained once the table is populated into the IM column store.
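As a sketch of invoking the advisor from PL/SQL (the schema, table, and scratch tablespace names are illustrative, and parameter defaults vary slightly by release):

```sql
-- Estimate the MEMCOMPRESS FOR QUERY LOW ratio for a sample of SCOTT.SALES
DECLARE
  l_blkcnt_cmp    PLS_INTEGER;
  l_blkcnt_uncmp  PLS_INTEGER;
  l_row_cmp       PLS_INTEGER;
  l_row_uncmp     PLS_INTEGER;
  l_ratio         NUMBER;
  l_comptype_str  VARCHAR2(100);
BEGIN
  DBMS_COMPRESSION.GET_COMPRESSION_RATIO(
    scratchtbsname => 'USERS',   -- scratch tablespace for the sample
    ownname        => 'SCOTT',
    objname        => 'SALES',
    subobjname     => NULL,
    comptype       => DBMS_COMPRESSION.COMP_INMEMORY_QUERY_LOW,
    blkcnt_cmp     => l_blkcnt_cmp,
    blkcnt_uncmp   => l_blkcnt_uncmp,
    row_cmp        => l_row_cmp,
    row_uncmp      => l_row_uncmp,
    cmp_ratio      => l_ratio,
    comptype_str   => l_comptype_str);
  DBMS_OUTPUT.PUT_LINE('Estimated compression ratio: ' || l_ratio);
END;
/
```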
So there you have it, a very brief review of some of the Oracle database compression capabilities, enabling DBAs to achieve both cost savings and performance gains, on disk and in memory. Come back starting next week for other articles in this series, including Automatic Data Optimization and Storage Compression / Storage Tiering.
Mike Haas, Director - Enterprise Solutions