IBM DB2 for Linux, UNIX, and Windows best practices: Optimizing analytic workloads using DB2 with BLU Acceleration

Jessica Rockwood, Senior Manager, DB2 LUW Performance Benchmarking, IBM Canada Lab
Roman B. Melnyk, Senior Information Developer, IBM Canada Lab
Michael Kwok, Senior Manager, DB2 LUW Warehouse Performance, IBM Canada Lab
Berni Schiefer, Distinguished Engineer, Information Management Performance and Benchmarks, IBM Canada Lab

Updated: May 2014

Contents

Executive summary
Introduction
Before you start
  Hardware and software
  Identifying optimal workloads for BLU Acceleration
Capacity planning
  Processor cores
  Memory
  Input/Output
Best practices for configuration
  Single setting for analytic workloads
  Database and database manager configuration
Adopting BLU Acceleration
  Guidelines for creating column-organized tables
  Synopsis tables and data skipping
  Loading data
  Query execution plans and the new CTQ operator
  Database maintenance
  Concurrency control (workload management)
  Monitoring
    Is the sort heap adequate for good performance?
    Is the table suitably organized?
    How well are queries performing?
    How is the buffer pool performing?
    How are the prefetchers performing?
Conclusion
Further reading
Contributors
Acknowledgements
Notices
Trademarks
Contacting IBM

Executive summary

DB2 with BLU Acceleration is a combination of complementary innovations from IBM that simplifies and speeds up analytic workloads. It is easy to implement and is self-optimizing. BLU Acceleration can typically eliminate the need for indexes, aggregates (for example, MQTs or materialized views), or time-consuming database tuning to achieve top performance and storage efficiency. In most cases, no SQL or schema changes are required to take advantage of this breakthrough technology.
These innovations, which were introduced in DB2 for Linux, UNIX, and Windows Version 10.5 (DB2 10.5), are designed to help you quickly find answers to more complex business questions while keeping costs down. DB2 with BLU Acceleration offers a rich set of features that help you to meet these goals, including column-organized storage, actionable compression, parallel vector processing, and data skipping. In combination, these features provide an in-memory, CPU-optimized, and I/O-optimized solution. This paper gives you an overview of these technologies, recommendations on hardware and software selection, guidelines for identifying the optimal workloads for BLU Acceleration, and information about capacity planning, memory, and I/O. A section on system configuration shows you how IBM's focus on simplicity enables you to set up DB2 so that it automatically makes optimal configuration choices for analytic workloads. Other sections describe how to implement and use DB2 with BLU Acceleration. The focus of this best practices paper is on ease of use. You will learn how BLU Acceleration works and what it is doing under the covers, which will give you a real appreciation of the simplicity that is built into BLU Acceleration and show you that it really does deliver super analytics, super easy.

Introduction

BLU Acceleration is a new collection of technologies for analytic queries that are introduced in DB2 for Linux, UNIX, and Windows Version 10.5 (DB2 10.5). At its heart, BLU Acceleration is about providing faster answers to more questions and analyzing more data at a lower cost. DB2 with BLU Acceleration is about providing order-of-magnitude benefits in performance, storage savings, and time to value. These goals are accomplished by using multiple complementary technologies, including:

- Column-organized storage: the data is in a column store, meaning that I/O is performed only on those columns and values that satisfy a particular query.
- Actionable compression: the column data is compressed with a scheme that preserves order, so that the data can be used without decompression, resulting in huge storage and CPU savings and a significantly higher density of useful data held in memory.
- Parallel vector processing: multi-core parallelism and single instruction, multiple data (SIMD) parallelism provide improved performance and better utilization of available CPU resources.
- Data skipping: unnecessary processing of irrelevant data is avoided, thereby further reducing the I/O that is required to complete a query.

These and other technologies combine to provide an in-memory, CPU-optimized, and I/O-optimized solution that is greater than the sum of its parts. BLU Acceleration is fully integrated into DB2 10.5, so that much of how you leverage DB2 in your analytics environment today still applies when you adopt BLU Acceleration. The simplicity of BLU Acceleration changes how you implement and manage a BLU-accelerated environment. Gone are the days of having to define secondary indexes or aggregates, or having to make SQL or schema changes to achieve adequate performance. This best practices paper focuses as much on what you no longer need to do as on what you should do. We direct your attention to a few key implementation items, and then you can let BLU Acceleration take over and provide you with optimal performance. It is analytics that is super fast and super easy: just load and go! (1)

(1) This document is current with respect to DB2 10.5 FP1 and later. The use of the most current fix pack is encouraged.

Before you start

Ensure that your system meets the necessary hardware and software requirements.

Hardware and software

BLU Acceleration is supported on POWER/AIX and x86/Linux platforms. Table 1 summarizes the supported operating systems, including both minimum and recommended versions, as well as recommended processors.
Note that the minimum version requirements on these platforms are the DB2 10.5 requirements, which are applicable to both row- and column-organized tables.

Table 1. Hardware and operating system recommendations

  AIX
    Minimum version requirements: AIX 6.1 TL7 SP6, or AIX 7.1 TL1 SP6
    Recommended versions: AIX 7.1 TL2 SP1 or higher
    Hardware recommendations: POWER7 or higher (2)

  Linux x86 (64-bit only)
    Minimum version requirements: Red Hat Enterprise Linux (RHEL) 6; SuSE Linux Enterprise Server (SLES) 10 SP4 or SLES 11 SP2
    Recommended versions: RHEL 6.3 or higher; SLES 11 SP2 or higher
    Hardware recommendations: Intel Nehalem (or equivalent) or higher (3)

For optimal performance, we recommend that you maintain your system firmware at the latest levels, particularly if you are using virtualization. BLU Acceleration is offered in DB2 Advanced Workgroup Server Edition, DB2 Advanced Enterprise Server Edition, and DB2 Developer Edition.

(2) Any POWER processor that supports DB2 10.5 is supported, but there are specific hardware optimizations on this processor.
(3) Any Intel processor that supports Linux x86 is supported, but there are specific hardware optimizations on this processor.

Identifying optimal workloads for BLU Acceleration

In general, column-organized tables significantly speed up analytic workloads or data mart types of workloads. A data mart or warehouse typically has queries that have grouping, aggregation, range scans, or joins; queries that access a subset of a table's columns; and database designs that often include a star or snowflake schema. DB2 10.5 supports using both row-organized and column-organized tables in the same database, even in the same table spaces and buffer pools. The benefit of having column organization and row organization in the same database is that you can choose to optimize the table layout based on the workload. Row-organized tables have long been optimized for online transactional processing (OLTP) environments.
Now with column-organized tables sitting right next to row-organized tables, you have the best of both worlds in the same database. In fact, IBM provides Optim Query Workload Tuner (4) to identify and validate tables that would benefit from column organization based on the application workload.

Capacity planning

This section provides some guidelines for the number of cores and the amount of database memory that are recommended to achieve optimal performance for a BLU-accelerated workload.

Processor cores

DB2 with BLU Acceleration exploits multi-core parallelism and leverages SIMD parallelism to accelerate the performance of analytic queries. The effectiveness of these optimizations is directly impacted by the number of cores that are available to DB2 with BLU Acceleration. A minimum of 8 processor cores is required for BLU-accelerated workloads. Today a single processor chip typically has 8-12 cores. If the DFT_DEGREE database configuration parameter has the default setting of ANY, queries use a degree of intrapartition parallelism that is equal to the number of cores. When several queries are running concurrently, the degree is automatically scaled back, based on the number of active agent threads on the system, to maintain efficient execution. Higher data volumes, higher degrees of query concurrency, and greater query complexity would benefit from a larger number of cores.

(4) Optim Query Workload Tuner is part of DB2 10.5 AESE and AWSE.

Memory

DB2 with BLU Acceleration is designed to effectively leverage large memory configurations. To realize the greatest benefit from the in-memory analytics that is built into DB2 10.5, consider systems that have at least 64 GB of RAM for production use. As you scale up the system, maintain a ratio of 8 GB of RAM per core. For information on how to allocate this memory within DB2, see "Best practices for configuration".
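As a rough planning aid, the sizing guidance in this section (a minimum of 8 cores, at least 64 GB of RAM for production, and about 8 GB of RAM per core as you scale up) can be sketched as a simple check. This Python sketch is illustrative only and is not an IBM sizing tool; the function name and warning messages are ours:

```python
# Illustrative capacity check based on the guidance in this section:
# at least 8 cores, at least 64 GB of RAM, and about 8 GB of RAM per core.
MIN_CORES = 8
MIN_RAM_GB = 64
RAM_PER_CORE_GB = 8

def check_capacity(cores, ram_gb):
    """Return a list of warnings for a proposed BLU Acceleration host."""
    warnings = []
    if cores < MIN_CORES:
        warnings.append(f"need at least {MIN_CORES} cores, have {cores}")
    if ram_gb < MIN_RAM_GB:
        warnings.append(f"recommend at least {MIN_RAM_GB} GB RAM, have {ram_gb}")
    if ram_gb < cores * RAM_PER_CORE_GB:
        warnings.append(
            f"recommend {cores * RAM_PER_CORE_GB} GB RAM for {cores} cores "
            f"({RAM_PER_CORE_GB} GB/core), have {ram_gb}"
        )
    return warnings
```

For example, a 16-core host with 128 GB of RAM passes all three checks, while a 4-core, 32 GB host triggers both the core-count and minimum-RAM warnings.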
Input/Output

BLU Acceleration generates more random I/O than row-organized data marts, which typically perform large sequential scans. Consider storing key column-organized tables on I/O subsystems and devices that provide good random I/O performance. Examples of storage with better random I/O characteristics are solid-state drives (SSD), flash-based storage, and enterprise SAN storage systems. BLU-accelerated tables that are frequently accessed benefit from improved I/O subsystem performance, particularly when the table exceeds the memory size that is available for use by BLU Acceleration. There are both internal and external SSD or flash solutions. Internal flash storage might not have any write cache, and that can affect write operations involving, for example, temporary tables that are created and populated during query processing. Common examples of storage that would provide good random I/O characteristics for use with DB2 with BLU Acceleration include IBM FlashSystem (810/820), Storwize V7000 (with SSD), and XIV, as well as enterprise SAN storage such as the DS8000 series.

Best practices for configuration

The simplicity of managing a BLU-accelerated environment is one of the key benefits of this technology. There is even a new option for the DB2_WORKLOAD registry variable to automatically set configuration parameters that are most relevant to BLU Acceleration and the performance of analytic workloads.

Single setting for analytic workloads

If you have a database that is dedicated to analytic workloads, you can use a simple setting to configure DB2 and the database for BLU Acceleration. By setting the DB2_WORKLOAD registry variable to ANALYTICS before you create the database, DB2 is aware that the system will be used for analytic workloads. The new database is automatically configured for optimal analytics performance. Remember not to disable AUTOCONFIGURE in the CREATE DATABASE command.
For example:

  db2set DB2_WORKLOAD=ANALYTICS
  db2start
  db2 create db mydb

Be sure to set this registry variable (when applicable) before creating your analytics database. DB2 will do the rest. Table 2 describes in detail what happens to a newly created database when DB2_WORKLOAD=ANALYTICS.

Table 2. Database behavior that is influenced by DB2_WORKLOAD=ANALYTICS

  General database configuration parameters:
    DFT_TABLE_ORG = COLUMN, PAGESIZE = 32 KB, DFT_EXTENT_SZ = 4, DFT_DEGREE = ANY

  Database memory configuration parameters:
    CATALOGCACHE_SZ, SORTHEAP, and SHEAPTHRES_SHR are set to values that are higher than the defaults and optimized for the hardware that you are using.

  Intraquery parallelism:
    Enabled for any workload (including SYSDEFAULTUSERWORKLOAD) that specifies MAXIMUM DEGREE DEFAULT, even if the database manager configuration parameter INTRA_PARALLEL = OFF.

  DB2 workload management:
    The default concurrency threshold on the SYSDEFAULTMANAGEDSUBCLASS service subclass is enabled to ensure maximum efficiency and utilization of the server.

  Automatic space reclamation:
    Sets AUTO_MAINT = ON and AUTO_REORG = ON; space reclamation is performed for column-organized tables by default.

The DB2_WORKLOAD registry variable is set at the instance level. It influences all new databases in that instance. If the instance is going to have multiple databases, not all of which are BLU-accelerated data marts, simply set the registry variable before you create the databases that you want optimized for analytic workloads, then unset the registry variable after these databases are created. Alternatively, follow the recommendations below for optimizing a pre-existing database, and apply only the database-level recommendations.
To optimize a pre-existing database by configuring the recommended settings, set the registry variable and then issue the AUTOCONFIGURE command. The following example shows you how to optimize the pre-existing database MYDB for analytics and BLU Acceleration:

  db2set DB2_WORKLOAD=ANALYTICS
  db2start
  db2 connect to mydb
  db2 autoconfigure apply db only

After auto-configuration completes (either as part of the CREATE DATABASE command or an AUTOCONFIGURE command), review the sizes of the buffer pools, sort heap, sort heap threshold, and utility heap. You can fine-tune the memory sizes based on your specific application needs and your expected data volumes. The following section provides more details about this type of review.

Database and database manager configuration

BLU Acceleration uses database shared sorts (the memory size for which is configured by the SHEAPTHRES_SHR database configuration parameter). In DB2 10.5 with BLU Acceleration, the self-tuning memory manager (STMM) does not adjust the values of the SHEAPTHRES_SHR and SORTHEAP database configuration parameters automatically. Instead, these parameters are set with static values by the AUTOCONFIGURE command when DB2_WORKLOAD=ANALYTICS. At the instance level, the SHEAPTHRES configuration parameter is a soft target value that DB2 tries to maintain for all memory that is consumed by both private and shared memory-intensive operations, such as hash join and hash aggregation. This value is not just about sort memory, but about all working memory. Keep the instance-level SHEAPTHRES configuration parameter at its default value of 0, which leaves the tracking of sort memory consumption at the database level only (SHEAPTHRES_SHR). However, at the database level, the value of the SHEAPTHRES_SHR parameter determines the limit for the allocation of sort memory.
When a certain percentage of the value of SHEAPTHRES_SHR is attained, the DB2 database manager begins to throttle the memory that is allocated to sort memory consumers to prevent memory overcommitment, and those consumers might therefore receive only their minimum allocation requirement. To ensure that aggregation-intensive operations, which are common in analytic workloads, have sufficient sort memory to perform optimally, take the following steps:

- Set the SHEAPTHRES_SHR parameter to 40-50% of the total memory that is available to the database, which is determined by the value of the DATABASE_MEMORY database configuration parameter.
- Set the SORTHEAP parameter to 5-20% of the value of the SHEAPTHRES_SHR parameter. With increasing query concurrency and complexity, decrease the SORTHEAP to SHEAPTHRES_SHR ratio.

In general, you can consider increasing the value of the SHEAPTHRES_SHR parameter, but beware of substantially increasing the value of the SORTHEAP parameter without understanding overall memory requirements across the entire workload. Table 3 summarizes what the memory distribution within the database should be from a memory configuration standpoint. Your specific application and workload characteristics might suggest different values.

Table 3. Recommended memory distribution for a BLU-accelerated database as a function of workload concurrency

  Low concurrency (up to 20 concurrent workloads):
    40% buffer pool
    40% SHEAPTHRES_SHR
    SORTHEAP = SHEAPTHRES_SHR / 5

  High concurrency (more than 20 concurrent workloads):
    25% buffer pool
    50% SHEAPTHRES_SHR
    SORTHEAP = SHEAPTHRES_SHR / 20

Because of their complexity, analytic queries are more likely to benefit from an increased value of the STMTHEAP database configuration parameter, which specifies the size of the statement heap (a work space for the SQL compiler during compilation of an SQL statement).
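The memory distribution in Table 3 can be expressed as a quick calculation. The following Python sketch is illustrative only; the function is ours, not a DB2 utility, and the dictionary keys simply mirror the configuration parameters discussed above. Review any computed values against your actual workload before applying them:

```python
# Illustrative calculation of the Table 3 memory split for a BLU-accelerated
# database, given total database memory (in any unit, e.g. 4 KB pages) and the
# expected number of concurrent workloads. Not an IBM tool.

def recommend_memory_split(database_memory, concurrent_workloads):
    """Apply the Table 3 percentages: 40/40 split at low concurrency,
    25/50 at high concurrency, with the matching SORTHEAP divisor."""
    high_concurrency = concurrent_workloads > 20
    bufferpool_pct = 0.25 if high_concurrency else 0.40
    sheapthres_pct = 0.50 if high_concurrency else 0.40
    sortheap_divisor = 20 if high_concurrency else 5

    sheapthres_shr = int(database_memory * sheapthres_pct)
    return {
        "bufferpool": int(database_memory * bufferpool_pct),
        "sheapthres_shr": sheapthres_shr,
        "sortheap": sheapthres_shr // sortheap_divisor,
    }
```

For example, with 1,000,000 pages of database memory and 10 concurrent workloads, the sketch yields a 400,000-page buffer pool, SHEAPTHRES_SHR of 400,000, and SORTHEAP of 80,000.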
If your query returns SQL0437W with reason code 1, consider increasing the STMTHEAP value. Start by increasing the value of STMTHEAP by 50%; if the warning persists, continue increasing the value by the same percentage increment, as needed. DB2 10.5 sets the value of the UTIL_HEAP_SZ database configuration parameter to AUTOMATIC for new databases. Consider setting the UTIL_HEAP_SZ parameter to AUTOMATIC for existing databases. This setting enables the automatic resizing of the utility heap by DB2 load and other utilities.

Adopting BLU Acceleration

This section is all about implementing and using BLU Acceleration. It provides recommendations on how to build and load column-organized tables, details about column-organized tables and query execution, and guidelines for monitoring your database. If you want to convert one or more row-organized tables in your database into column-organized tables, you can simply use the db2convert command to automate the process. This command calls the ADMIN_MOVE_TABLE stored procedure to perform the conversion, and shares its options. The db2convert command drops any dependent objects (such as secondary indexes or aggregates) as part of the conversion operation, because they are not required on column-organized tables, as shown in the following example:

  $ db2convert -d mydb -z db2inst1 -t mytable
  Conversion notes for exceptional table(s)
  Table: DB2INST1.MYTABLE:
    -Secondary indexes will be dropped from the table.
  Enter 1 to proceed with the conversion.
  Enter 2 to quit.

Guidelines for creating column-organized tables

Creating a column-organized table is very easy. All you need is a table name, column names, and their data types. If you set the DFT_TABLE_ORG database configuration parameter to COLUMN, you do not need to specify the ORGANIZE BY COLUMN clause.
Here is an example of the syntax to define a column-organized table:

  CREATE TABLE mytable (
    c1 INTEGER NOT NULL,
    c2 INTEGER,
    ...
  ) ORGANIZE BY COLUMN IN table_space_name

No additional specifications are required. The previous example includes the IN table_space_name clause. For improved manageability, it is recommended that you create fact tables in their own table space. For example, with a fact table in one table space and dimension tables in a different table space, you can have different storage groups and buffer pools for each table space (and its associated table type), as well as independent backup and restore of fact tables. If there are enforced primary key constraints or unique constraints, the unique i