Posted on February 18, 2015 by Erik Benner
In the first article of this series, we discussed the history of data compression as well as several methods used to compress data. In this section we will look at a new technology introduced in Oracle Database 12c that provides a simpler and more efficient way to manage data.
In the last few years, we have seen a new IT monster born. This monster was created out of the powerful combination of explosive data growth and the complexity required to manage that data. The beast has been fed by new storage capacities that allow us to store years of data in our systems, and by the need to analyze that data at speeds that were just dreams five years ago. Adding to the growth is the ability to capture data from new sources, often in near real time. The drive for this space is often the ability to commercialize the data through new markets that thrive on it.
As organizations see the increased value of their data, they continue to rapidly grow their stores of information. As this data grows, organizations start to leverage it, achieving tangible results that benefit the business, often in ways that provide significant impact. Everything from financial data to client interactions is stored and analyzed to achieve these goals. To manage this data, DBAs often build complex rules to organize and compress it, creating a never-ending process of analyzing how users interact with the systems in order to head off performance issues through complex exception rules.
One dirty secret: while our databases grow and new data is added, often only a fraction of the data is used on a frequent basis. Most users access only a small amount of data, and as data ages, it is accessed less frequently. A common rule of thumb is that only twenty percent of the data is accessed frequently; the other eighty percent sits unused most of the time. In the past, we could apply technologies like partitioning and compression to reduce disk space requirements and tier storage. But this often requires complex rules as we try to second-guess how the data will be accessed, often compressing data that users are still using and creating performance issues with the OLTP workload. To counter this, we have created ever more complex rules that require more time and effort to maintain.
In Oracle Database 12c, there is a new feature called Automatic Data Optimization (ADO), which builds on the functionality of both the Oracle Partitioning and Advanced Compression options. This technology simplifies not only how we apply compression to our data, but also what storage is used to hold it. Active data can live on high-performance storage like solid state technology, while older, inactive data lives on high-capacity drives. This works because the database includes a new technology called Heat Map. Heat Map tracks how each block of storage in the database is used, including the time of last modification and last access of tables and partitions. Row-level Heat Map tracks modification times for individual rows (aggregated to the block level). These statistics can be used to define compression and storage policies that are automatically maintained throughout the lifecycle of the data. By leveraging this data, we can create policies that match the compression to how the data is used, maximizing the compression of inactive data while at the same time improving performance across all data types.
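As a rough illustration of how this looks in practice (the schema name is hypothetical; the parameter and view are from Oracle's documentation), Heat Map tracking is switched on with a single instance parameter, and the segment-level statistics it gathers can then be queried:

```sql
-- Enable Heat Map tracking for the instance (dynamic parameter).
ALTER SYSTEM SET HEAT_MAP = ON;

-- Once tracking is active, ask when each segment was last
-- written, read, or full-scanned.
SELECT object_name,
       segment_write_time,
       segment_read_time,
       full_scan
FROM   dba_heat_map_segment
WHERE  owner = 'SALES_APP';   -- hypothetical application schema
```

These last-access and last-modification timestamps are exactly what the ADO policies described below evaluate against.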
Simple rules are established, based on how often the data is used. These rules are then used to apply compression algorithms and storage tiers.
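As a sketch of what such a rule looks like (the table name and the 30-day window are example choices, not recommendations), an ADO policy can tell the database to compress a segment once it has gone unmodified for a set period:

```sql
-- Apply Advanced Row Compression to segments of the orders table
-- after 30 days with no modifications (a segment-level ADO policy).
ALTER TABLE orders
  ILM ADD POLICY
    ROW STORE COMPRESS ADVANCED
    SEGMENT AFTER 30 DAYS OF NO MODIFICATION;
```

Once defined, the policy is evaluated automatically using the Heat Map statistics; no ongoing DBA intervention is needed.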
The ability to tier storage should also not be overlooked. While compression alone can produce cost savings, tiering to different storage can easily provide over 4x cost savings. ADO rules can not only specify the compression type, but also migrate the data to different physical disks, allowing the DBA to leverage different storage technologies for the same table: from high-performance solid state devices, to high-performance traditional hard disk drives, to lower-cost high-capacity drives. We can also use different striping technologies, with RAID 1+0 for performance and a parity RAID technology for higher usable capacity when performance is less critical.
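A tiering rule is declared much like a compression rule; in this sketch the target tablespace is hypothetical, standing in for storage built on high-capacity drives:

```sql
-- Move segments of the orders table to a tablespace on
-- high-capacity disk. By default a TIER TO policy fires when the
-- source tablespace crosses its fullness thresholds; a condition
-- can be attached to trigger it on inactivity instead.
ALTER TABLE orders
  ILM ADD POLICY
    TIER TO low_cost_ts;   -- hypothetical low-cost tablespace
```

Because the policy names only a destination tablespace, the same table can span storage tiers over its lifetime without any application change.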
This combination of compression and location lets the DBA keep active data on solid state devices with little or no compression, and then move the data, as its use declines, to higher-capacity storage while leveraging higher levels of compression. Not only does this reduce storage expense, but in many instances it can improve the performance of data warehouse workloads, since compressed data requires less I/O to read.
In the next article in this series, we will show how to configure ADO, and how to calculate the space savings by using the Advanced Compression database option.
Erik Benner, Enterprise Architect - Enterprise Solutions