An Oracle White Paper, November 2010
Enhancing Oracle Database Performance with Flash Storage

1. Introduction
2. Flash Memory: Performance and Price Characteristics
   Performance Implications
   Price-Performance Implications
3. Hardware Configuration
4. Software Environment and Configuration
5. Flash Storage in Decision Support Environments
6. Flash Storage in OLTP Environments
   OLTP Workloads
   Oracle Database Redo Logs on Flash Storage
   Read Only Workloads
   Read-Write Workloads
7. Flash Storage and Database Maintenance
   Database Loading
   Index Creation
   Database Backups
   Database Recovery
8. Conclusions
Acknowledgments
Appendix A
Appendix B
Endnotes

1. Introduction

In the last year or two, enterprise-class flash storage devices have entered the marketplace. One of the high-end examples of such devices is Oracle's Sun F5100 Flash Array (hereafter just F5100). The F5100 provides impressive raw I/O performance when compared to conventional disks. For example, Table 2.2 shows that small I/Os (8K) on an F5100 are about an order of magnitude faster than small I/Os on 15K SAS disks, and that large I/Os (1MB) are about 3 times faster on an F5100 than on SAS disks. Intuitively, these types of I/O advantages should translate into serious performance gains for Oracle database applications when conventional disks are replaced by F5100 devices. This paper shows how, and under what circumstances, this will occur.
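One way to see why a device-level gain need not carry over in full to an application is an Amdahl-style estimate, in which only the share of elapsed time spent waiting on I/O shrinks. The sketch below is an illustration added here, not a model taken from the paper; the function name `overall_speedup` and the sample I/O fractions are hypothetical, and the factor of 10 simply mirrors the "order of magnitude" figure quoted for small I/Os.

```python
def overall_speedup(io_fraction, io_speedup):
    """Amdahl-style estimate of end-to-end speedup.

    io_fraction: share of elapsed time spent waiting on I/O (0.0 to 1.0).
    io_speedup:  factor by which that I/O time shrinks on faster storage.
    Both inputs are assumptions supplied by the caller, not measurements.
    """
    if not 0.0 <= io_fraction <= 1.0:
        raise ValueError("io_fraction must be between 0 and 1")
    if io_speedup <= 0:
        raise ValueError("io_speedup must be positive")
    # The non-I/O share is unchanged; the I/O share is divided by io_speedup.
    return 1.0 / ((1.0 - io_fraction) + io_fraction / io_speedup)


# Hypothetical workloads that spend 5%, 50% and 90% of their time on I/O.
for io_fraction in (0.05, 0.50, 0.90):
    gain = overall_speedup(io_fraction, 10.0)
    print(f"{io_fraction:.0%} of time in I/O -> {gain:.2f}x overall")
```

Under these assumptions, a workload that spends little of its time waiting on storage gains only a few percent, while an I/O-dominated workload gains several-fold.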
The focus of the paper is on three Oracle application areas:

- on-line transaction processing (OLTP), characterized by transactions which retrieve and update just a handful of rows
- decision support (DSS), characterized by queries that scan very large numbers of rows
- database maintenance tasks, which include such diverse activities as database backup and recovery, index creation, and database loads

It will be seen that in each of these areas, the use of F5100 flash devices for database storage can improve overall performance by as much as a factor of 4 or 5, or by as little as just a few percent. The wide variations show that flash may not speed up all applications. The degree of improvement seen with any particular application depends upon certain characteristics of that application. For example, if an application consumes almost all of the I/O channel bandwidth in a disk-based configuration, replacing disk with flash will provide little or no performance benefit.

In addition to investigating the use of flash for database storage, the paper also considers the use of flash for redo logs. The guidelines for optimizing performance with flash-based redo logs will turn out to be somewhat different from those for database files.

As stated above, the approach of this paper is to investigate the performance implications of replacing all conventional disks in an Oracle database with flash devices. Prior studies of flash in the Oracle database environment have looked at performance improvements resulting from mixed flash and conventional disk configurations. For example, one such study examines the use of flash for Oracle indexes, with tables stored on conventional disks. Oracle also has the capability of using F5100s as a cache for the Oracle SGA. More information on this approach can be found at: [URL not preserved in this copy]

The organization of the rest of the paper is as follows:

- Section 2 compares some of the underlying characteristics of flash and disks.
- Sections 3 and 4 describe the configurations used for the tests that were performed.
- Section 5 discusses how the use of flash storage can improve query performance in decision support environments.
- Section 6 does the same for OLTP environments.
- Section 7 describes how the use of flash can speed up database maintenance operations.
- Section 8 summarizes the conclusions.

In addition, there are two appendices which show the SQL statements used for all the tests.

2. Flash Memory: Performance and Price Characteristics

Flash memory is solid-state (i.e. all electronic), non-volatile storage with performance and price characteristics lying somewhere between DRAM and conventional spinning (magnetic) disks. It should be noted, however, that not all flash is the same. Bulk storage flash devices display wide performance and price differences, based on the specifics of their implementations. At the very low end is consumer flash, found in digital cameras, MP3 players, memory sticks, and the like. In the mid range are flash devices which have form factors resembling disks or HBA cards. These are suitable for high-performance PCs and even some datacenters. At the very high end are multi-terabyte units with performance as much as an order of magnitude greater than the best conventional disks. These units are designed for enterprise-level computing systems and have much higher reliability specifications than the lower-end devices.

This paper is concerned only with this latter type of flash, and specifically focuses on Oracle's Sun F5100 Flash Array. The F5100 has a capacity of 1920 GB, or almost 2 TB, in a 1U enclosure. It comprises up to 80 DIMM-like cards, called flash modules. Table 2.1 summarizes general performance and price-performance differences between F5100 flash modules and SAS 15K disks. Table 2.2 shows more detailed performance comparisons between the two types of devices.
Many of the measures listed in these tables are elaborated upon in Sections 2.1 and 2.2.

TABLE 2.1: DISK AND FLASH PERFORMANCE AND PRICE/PERFORMANCE COMPARISONS
Columns: PROPERTY, DISK (15K SAS), F5100 FMOD. The numeric cells are missing from this copy apart from the fragments shown.

 1  Capacity (GB)
 2  Random read IOPS (8K)          fragments: "7 ms avg response time", ",600"
 3  Random write IOPS (8K)         fragments: "8 ms avg response time", ",000"
 4  MB/s sustained sequential read
 5  MB/s sustained sequential write
 6  $/IOPS (8K reads)
 7  $/MB/sec (sequential read)
 8  Watts (W), amortizing enclosure wattage
 9  W/reads/sec (8K reads)
10  $/GB (price/capacity)

TABLE 2.2: RANDOM I/O CHARACTERISTICS FOR F5100 FLASH MODULES VS. SAS DISKS
Two panels, "Average service times (ms) for random I/Os" and "Sustained throughput: random reads (writes) per sec". Each panel compares a single F5100 FMOD (with D10R) against a SAS 143 GB 15K disk at 1, 2, 4, 8, and 16 users, with rows for 8K read, 8K write, 1MB read, and 1MB write. The numeric cells are missing from this copy.

2.1 Performance Implications

Table 2.2 shows some of the performance advantages of the F5100 compared to SAS 15K disks:

- small I/Os are about an order of magnitude faster on the F5100 than on a SAS disk
- large reads are about 3 times faster on the F5100 than on a SAS disk
- large writes are about 1.5 times faster on the F5100 than on a SAS disk

Table 2.2 also shows that:

- the F5100 has similar latencies for small random reads and small random writes
- the latency of large random (sequential) writes is about twice that of large random (sequential) reads

These results suggest that F5100-based storage should easily outperform disk for most Oracle workloads.

One of the generally held perceptions about flash storage devices is that writes are much slower than reads. This has led many to believe that flash is not an appropriate technology for write-intensive applications.
While this perception is certainly true of writes to the underlying NAND chips from which many flash storage devices are constructed, enterprise-level flash devices have been designed, through various mechanisms, to compensate for the inherent NAND write performance limitations so as to achieve excellent write, as well as read, performance. This is seen at the pure I/O level in Table 2.2. It will also be seen at the Oracle level throughout this paper, in that many of the example workloads which benefit from the use of flash do both reads and writes. Thus one should not automatically avoid using flash with write-intensive workloads.

2.2 Price-Performance Implications

The disk industry has traditionally pitched the cost of storage in terms of $/GB (Table 2.1, line 10). This made sense in the past, when disk capacities were much smaller than they are today, but other metrics are now more meaningful. For example, $/IOPS or $/MB/sec (depending on workload) are much better metrics for making price/performance comparisons than $/GB. This is because disk capacities have grown so dramatically, compared to disk performance, that many customers use only 5-10% of the capacity of each disk in order to get the number of spindles needed to meet performance objectives. These alternate measures show that conventional storage costs are much higher than they initially seem and, in some cases, even higher than flash storage costs (Table 2.1, lines 6 and 7) [1]. The low power consumption of the F5100 (Table 2.1, line 8) implies that the use of F5100-based storage will also yield very significant operational cost savings, coupled with performance improvements.

3. Hardware Configuration

All of the reported measurements were made on two identically configured (except for storage) Sun SPARC Enterprise M5000 servers (hereafter just M5000). Each M5000 had 128 GB of main memory and was running Oracle Solaris 10 9/10.
One of the servers was configured with 6 x Oracle's Sun Storage J4200 Arrays (each with 12 x 146 GB 15K RPM SAS disks), referred to as the disk configuration in the rest of the paper. The other was configured with 1 x F5100 Flash Array (60 x 24 GB flash modules), referred to as the flash configuration in the rest of the paper.

Each M5000 had a single I/O unit with a total I/O channel capacity of approximately 2 GB/sec. In each configuration, Oracle's Sun StorageTek 2540 Array (hereafter just SS2540) was used for the Oracle database redo log. The SS2540 was also used as the Oracle backup device for the backup tests. A number of tests were also done with flash-based redo logs.

The M5000 is an 8-socket server with 4 x 2.5GHz SPARC VI cores per socket. Since this much CPU power would totally overwhelm the available I/O subsystem, only 2 of the 8 sockets were enabled. This resulted in much more balanced systems from the standpoint of CPU power and I/O capability. However, the conclusions derived from these smaller configurations should also apply to larger CPU and storage configurations, as long as the I/O and CPU subsystems are balanced.

4. Software Environment and Configuration

All experiments were performed using Oracle 11g R2. Raw devices, as opposed to UFS or ZFS files, were used for all database tablespaces and redo logs. The only exceptions were the ZFS-based redo log experiments performed with the flash configuration.

The same database schema, based on a subset of the TPC-H schema [2], was used for both the disk and flash configurations. The TPC-H database contains historical sales data of a hypothetical enterprise that ships orders, of various kinds of parts, on a worldwide basis. Although the TPC-H schema was developed for a decision support benchmark, OLTP operations can be performed on the schema as well.
The queries and transactions used for the various workloads reference the following tables [3]:

- orders with columns (o_orderkey, o_custkey, o_orderstatus, o_totalprice, ...)
- lineitem with columns (l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, ...)
- customer with columns (c_custkey, c_name, c_address, c_nationkey, c_acctbal, ...)
- nation with columns (n_nationkey, n_name, n_comment, ...)

A complete description of the tables can be found in the TPC-H Benchmark Specification. The total size of the raw data loaded into the database tables was approximately 90 GB. To keep the comparison fair, each workload was carefully tuned to take full advantage of its respective storage configuration.

5. Flash Storage in Decision Support Environments

Decision support queries typically scan large amounts of data and usually consume significant CPU resources and I/O channel bandwidth. In addition, such queries generally return aggregate, as opposed to detailed, data. The following questions, whose answers can be found in the subset TPC-H database described above, are typical decision support queries (the SQL statements for these are shown in Appendix A):

R1: What is the maximum discount anyone has ever received?
R2: What are the total revenue breakdowns for each country?
R3: How many orders were placed in each month?
R4: How many urgent orders were there in each country?
R5: What was the total revenue for all orders requesting airmail delivery on Dec [date not preserved]?
R6: How many orders, of each priority type, were received on July [date not preserved]?
R7: How many hazardous items have ever been shipped?

Each of these queries was executed in parallel mode. Response time was used as the metric to judge performance. The response times are shown in Table 5.1. As is readily seen, the use of flash resulted in response time improvements for all 7 queries. However, the degree of improvement varied over a wide range. In some cases (e.g. R5 and R6), flash provided a substantial benefit, whereas in others (e.g. R3 and R7), flash provided a very small benefit.

The reasons for the variations become apparent after studying the resource consumption patterns of each query. For example, the disk version of R7 consumed an average of 1900 MBPS (megabytes/sec) of I/O bandwidth. Since the total I/O channel capacity for the system is about 2000 MBPS, there is at most another 5% of bandwidth that can be squeezed out of the system. The flash version of R7 did manage to consume a little bit of the slack, which accounts for the roughly 2% observed performance improvement.

With query R3, the CPU was severely bottlenecked in the disk configuration; average CPU utilization was almost 90%. Thus the maximum benefit that could occur with the substitution of faster storage devices is 10%. This is indeed what did occur.

Queries R3 and R7 illustrate a fundamental theorem from queuing network theory: if at least one resource in a network of queues is saturated, i.e. operating at nearly 100% utilization, the throughput of the system cannot be increased by reducing the service times of non-bottleneck resources [4].

Queries R5 and R6 experienced the most improvement with the use of flash. Both of these queries had an average read size of about 32K bytes. Thus I/O channel capacity was nowhere near saturation. Instead, read access time was the most important consideration. On the disk configuration, the average service time for the reads was about 28 ms; on the flash configuration, the average service time for the reads was only 4 ms. Under these circumstances one would expect a large performance gain from replacing disk with flash, which is indeed what occurred.

Similar considerations can be used to explain the flash improvements seen, not only with the remaining queries, but with virtually all kinds of other decision support queries.

TABLE 5.1: FLASH VS. DISK PERFORMANCE FOR SINGLE USER DECISION SUPPORT QUERIES
Columns: QUERY, DISK CONFIGURATION: QUERY RESPONSE TIME (SECONDS), FLASH CONFIGURATION: QUERY RESPONSE TIME (SECONDS), FLASH IMPROVEMENT (PERCENT). One row for each of R1 to R7. The numeric cells are missing from this copy.

6. Flash Storage in OLTP Environments

OLTP transactions are characterized by lightweight client-server interactions which use minimal CPU resources and access small numbers of rows. Single-SQL-statement transactions with known I/O characteristics were used to simulate various OLTP environments. This methodology is sound in that the SQL statements which were used are, in effect, the building blocks of more realistic workloads. For each storage configuration, each of the simple transactions was run repeatedly, by an increasing number of users, until the throughput reached a peak. The throughputs and response times were then tabulated and compared.

6.1 OLTP Workloads

To simulate read-only environments, three different read-only transactions were used. Each was implemented via a PL/SQL stored procedure whose source is shown in Appendix B.

1. oltp_readonly1 randomly selects a single row from the orders table using an index. The Oracle caches are set up so that the index is fully cached but the orders table is not. The table uniqueorders is a fully cached table that translates integers in the range 0 to 150,000,000 to actual order_keys (which have gaps). uniqueorders is not part of the TPC-H database; it was invented to avoid dealing with non-existent order keys. As a result of the described caching mechanisms, oltp_readonly1 is guaranteed to always perform exactly one physical read once the workload achieves its steady state.
2. oltp_readonly2 is just like oltp_readonly1, except that it randomly selects 2 rows from orders. In the process, it always performs exactly 2 physical reads.
3. oltp_readonly4 is exactly like the previous two, except that it selects 4 rows from orders and does exactly 4 physical reads.
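The role of uniqueorders in the list above can be illustrated with a small simulation. The sketch below is not the PL/SQL from Appendix B (which is not reproduced here); it is a Python stand-in in which the helper names `build_unique_orders` and `pick_order_keys`, and the toy key values, are hypothetical. It shows only the idea of drawing a random dense integer and translating it to an order key that is known to exist.

```python
import random


def build_unique_orders(order_keys):
    """Stand-in for the uniqueorders table: dense position -> existing key."""
    return sorted(order_keys)


def pick_order_keys(unique_orders, n_rows, rng):
    """Pick n_rows existing order keys by drawing random dense positions.

    Because every position maps to a real key, no lookup can target a gap.
    """
    return [unique_orders[rng.randrange(len(unique_orders))]
            for _ in range(n_rows)]


# Toy order keys with gaps (hypothetical values, far smaller than the paper's).
existing_keys = {1, 2, 3, 4, 5, 6, 7, 32, 33, 34, 35, 36, 37, 38, 39, 64, 65, 66}
unique_orders = build_unique_orders(existing_keys)

rng = random.Random(0)
# 1, 2 and 4 rows per transaction, mirroring the three read-only variants.
for n_rows in (1, 2, 4):
    keys = pick_order_keys(unique_orders, n_rows, rng)
    assert all(key in existing_keys for key in keys)
    print(f"{n_rows} row(s): order keys {keys}")
```

Drawing keys directly from the raw integer range would sometimes hit a gap and return no row, which would make the number of physical reads per transaction unpredictable; the translation step avoids that.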
These three transaction types are representative of many actual read-only OLTP environments seen in production applications.

To simulate the behavior of read-write environments, a single update transaction was used. It was implemented by a PL/SQL stored procedure, whose code is shown in Appendix B.

oltp_readwrite updates a single randomly selected row in the orders table. Using the same caching mechanisms described above, it was arranged to do exactly one physical read and one physical write (plus whatever fraction of a redo write occurs as a result of group commits). Since typical OLTP applications tend to perform as many reads as writes, the flash benefits shown for oltp_readwrite are likely to be representative of actual customer OLTP workloads.

6.2 Oracle Database Redo Logs on Flash Storage

One of the more interesting questions concerning the use of flash storage is its value for Oracle redo logs. It turns out that the answer is not black and white, but rather depends upon the characteristics of the workload under consideration. For the flash storage configuration, four sets of update transaction results are provided. One set uses an SS2540 as the redo log device and the other three use various flash-based options for the redo log.

The main complication with using the F5100 for redo logs arises from an inherent feature of the F5100, namely the performance penalty for writes which are not aligned on 4K boundaries. The penalty is illustrated in Table 6.1, which shows both sequential and random write latencies for various write sizes.

There are two details from Table 6.1 that should be emphasized. First, the penalties for misaligned random writes are much greater than for misaligned sequential writes of the same size. Second, as write sizes increase, the misalignment penalties (in percentage terms) decrease dramatically, especially in the sequential case.
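The alignment condition behind this penalty can be made concrete with a short check. This is a minimal sketch under a stated assumption: a write is treated as 4K-aligned when both its starting byte offset and its length are multiples of 4096. The paper does not spell out the exact rule the device applies, and the function names and sample writes below are hypothetical.

```python
ALIGNMENT = 4096  # 4K boundary, in bytes


def is_4k_aligned(offset, size, alignment=ALIGNMENT):
    """Return True if a write starts and ends on an alignment boundary.

    Assumption of this sketch: both the offset and the length must be
    multiples of the alignment for the write to count as aligned.
    """
    return offset % alignment == 0 and size % alignment == 0


def misaligned_writes(writes, alignment=ALIGNMENT):
    """Filter a list of (offset, size) pairs down to the misaligned ones."""
    return [(offset, size) for offset, size in writes
            if not is_4k_aligned(offset, size, alignment)]


# Hypothetical (offset, size) pairs in bytes.
sample_writes = [
    (0, 8192),      # aligned start, 4K-multiple length
    (4096, 4096),   # aligned start, 4K-multiple length
    (512, 4096),    # start falls inside a 4K block
    (8192, 1536),   # length is not a multiple of 4K
]

for offset, size in misaligned_writes(sample_writes):
    print(f"misaligned write: offset={offset}, size={size}")
```

A check of this kind could be run over an I/O trace to estimate what share of a workload's writes would be exposed to the penalty.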
Since all database writes in Oracle are aligned on 4K boundaries (as long as the default block size is at least 4K), using flash for database tablespaces should never result in slower performance due to misaligned writes. Redo writes, however, are only guaranteed to fall on 512-byte boundaries [5]. Redo writes also have a quasi-random access pattern when async I/O is employed. These two properties contribute to performance degradations for some workloads. Data illustrating this is shown in Table [number not preserved].

TABLE 6.1: 4K ALIGNED AND MISALIGNED WRITES TO F5100 FLASH MODULES
Columns: WRITE SIZE (BYTES), AVG. SERVICE TIME SEQUENTIAL CASE (MS), AVG. SERVICE TIME RANDOM CASE (MS). Values surrounded by ** are multiples of 4K. Only the 4K-multiple write sizes survive in this copy (**4096**, **8192**, **16384**, **32768**, **65536**); the other write sizes and all service times are missing.

6.3 Read Only Workloads

The following 3 tables compare flash and disk performance for read-only OLTP workloads. Table 6.2 shows the disk-flash throughput and response time comparisons for oltp_readonly1. Tables 6.3 and 6.4 show the analogous comparisons for oltp_readonly2 and oltp_readonly4. All 3 tables show that at lightly CPU utilized levels the use of flash storage provides significant performance benefits up to 8 [...]