Enterprise Manager 12c Cloud Control: Monitoring and Managing Oracle Coherence for High Performance

An Oracle White Paper, September 2012

Contents

Executive Overview
Introduction
Abstract Data Layer
Managing Oracle Coherence with Oracle Enterprise Manager
Oracle Coherence Topology and Health Dashboard
Log Alerts
Monitoring and Diagnostics
Cluster Stability
Node Memory Performance
Planning Storage Capacity
Network Performance Bottlenecks
Optimize Query Performance
Cache Load and Performance
Cache Data Management
User Defined Metrics
JVM Diagnostics for Deep JVM Runtime Visibility
Threads Monitoring
Differential Heap Analysis
Configuration and Change Management
Automating Discovery and Tracking Assets
Detecting Configuration Changes
Lifecycle Management and Provisioning Automation
Monitoring Oracle Coherence Clusters on Exalogic Elastic Cloud
Conclusion

Executive Overview

Oracle Enterprise Manager is Oracle's integrated enterprise IT management product line and provides the industry's first complete cloud lifecycle management solution. Oracle Enterprise Manager's Business-Driven IT Management capabilities allow you to quickly set up, manage, and support enterprise clouds and traditional Oracle IT environments from applications to disk.
Oracle Enterprise Manager allows customers to achieve:

- Best service levels for traditional and cloud applications through management from a business perspective, including Oracle Fusion Applications
- Maximum return on IT management investment through the best solutions for intelligent management of the Oracle stack and engineered systems
- Unmatched customer support experience through real-time integration of Oracle's knowledgebase with each customer environment

Introduction

Oracle Enterprise Manager's Fusion Middleware Management solutions provide full-lifecycle management for Oracle WebLogic, SOA Suite, Coherence, Identity Management, WebCenter Suite, and Business Intelligence Enterprise Edition. Oracle Enterprise Manager provides a single console to manage these assets from a business and service perspective, including user experience management, change and configuration management, patching, provisioning, testing, performance management, business transaction management, and automatic tuning for these diverse environments.

Abstract Data Layer

Oracle Coherence is an in-memory data grid solution that provides linear scalability, reliability, and high performance to applications. Enterprises are increasingly designing mission-critical applications around Oracle Coherence as an abstract data layer. Oracle Fusion Middleware provides the runtime engine and platform for mission-critical Java EE, SOA, and middleware applications. Oracle Coherence is designed to work seamlessly with Oracle WebLogic Server and Oracle SOA Suite to support mission-critical business applications. In some cases, enterprises share a single Oracle Coherence cluster across a host of applications in a business unit, while in other cases they prefer a one-to-one relationship between an application and a cluster.

Because business applications routinely affect direct revenue, monitoring and managing the service levels of these applications has become increasingly critical.
Traditionally, administrators have had to rely on various scripts to manage and monitor the middleware environment. Some use different point solutions to manage different tiers of the application stack. Given all the investment, the risk involved in managing IT systems using scripts and point solutions is too high. Therefore, for effective application performance monitoring, enterprises need a solution that provides performance visibility into all tiers of the application stack, proactive notifications of potential issues, an ability to perform deep diagnostics in each tier, and an ability to perform cross-tier diagnostics. These facets are necessary across all stages of the application lifecycle.

Managing Oracle Coherence with Oracle Enterprise Manager

Oracle Coherence provides a JMX interface that exposes valuable information about the runtime health of the cluster via several MBeans. These MBeans can be accessed using JConsole or any JMX-based point solution. However, just showing the MBean values doesn't provide all the useful information. The MBeans are organized by node (or JVM) for different Oracle Coherence resources. It is important to intelligently collate these MBeans, and aggregate where necessary, to provide meaningful monitoring data. For example, JMX exposes cache metrics per node at a single point in time. The real value comes when an administrator is able to see an aggregation of cache metrics across all the nodes over a period of time (e.g. 2 hours, 24 hours, etc.). There are several cases where such aggregation and trending is necessary to identify a performance hotspot.

Oracle Coherence is a multi-threaded environment. Although JMX provides plenty of valuable information about the health of the node, and sophisticated tools can extract more value from it, there are some cases where deep visibility into the JVM runtime is necessary. Similarly, JMX doesn't provide much help for developers to optimize Oracle Coherence queries ahead of time.
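
The cross-node aggregation described above can be illustrated with a minimal Java sketch. It assumes the per-node counters have already been read from JMX (for instance, from cache MBean attributes such as TotalGets and CacheHits); the NodeSample holder and the CacheRollup class are hypothetical names introduced only for this illustration and are not part of Oracle Enterprise Manager or Coherence. The sketch sums the raw counters before dividing, rather than averaging per-node percentages, so that busy nodes are weighted correctly.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: roll up per-node cache counters into one cluster-wide figure. */
class CacheRollup {

    /** One node's cumulative counters for a cache (hypothetical holder type). */
    static final class NodeSample {
        final int nodeId;
        final long totalGets;
        final long cacheHits;

        NodeSample(int nodeId, long totalGets, long cacheHits) {
            this.nodeId = nodeId;
            this.totalGets = totalGets;
            this.cacheHits = cacheHits;
        }
    }

    /** Cluster-wide hits-to-gets ratio in percent; 0 if there were no gets. */
    static double hitRatioPercent(List<NodeSample> samples) {
        long gets = 0;
        long hits = 0;
        for (NodeSample s : samples) {
            gets += s.totalGets;
            hits += s.cacheHits;
        }
        return gets == 0 ? 0.0 : 100.0 * hits / gets;
    }

    public static void main(String[] args) {
        List<NodeSample> samples = new ArrayList<>();
        samples.add(new NodeSample(1, 1000, 900));   // 90% on a lightly used node
        samples.add(new NodeSample(2, 3000, 1500));  // 50% on a busy node
        // Summed counters: 2400 hits / 4000 gets = 60.0%
        System.out.printf("cluster hit ratio: %.1f%%%n", hitRatioPercent(samples));
    }
}
```

In this example a plain average of the two per-node ratios would report 70%, while the counter-based rollup reports 60%, which is why aggregation has to work from the underlying counts.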
Apart from monitoring and diagnostics, several other aspects of administration are key to the success of a production deployment, namely configuration management and cache data or operations management. The sections below provide details on Oracle Enterprise Manager's comprehensive monitoring and management solution for Oracle Coherence.

Oracle Coherence Topology and Health Dashboard

After discovering your Oracle Coherence clusters, Oracle Enterprise Manager immediately starts monitoring the environment with a predefined set of status and performance metrics. Oracle Enterprise Manager performs several aggregations and correlations on the raw JMX statistics. Based on the relationships between the components, Oracle Enterprise Manager creates a topology view. The topology shows the cluster, caches, nodes, the Hosts where the nodes are running, and Oracle WebLogic servers (for HTTP session caching). Administrators can drill down to get a topology view of any particular cache or node. The topology view can show alerts, incidents, and key performance indicators for any entity in the topology, as shown in Figure 1. These alerts are automatically generated by Oracle Enterprise Manager based on the performance thresholds.

In production, administrators can leverage several framework features of Oracle Enterprise Manager. For instance, a custom monitoring template can be created to set thresholds on key metrics. The template can then be applied to all nodes in a cluster. This simplifies the setting of thresholds in a large deployment. Similarly, incident rules can be defined to generate notifications, such as SNMP or email, in response to an alert. Some availability and performance thresholds are defined out of the box. For example, an alert is generated when a cache server node goes down or when a service becomes ENDANGERED.

Figure 1: Topology view provides high-level visibility into the cluster and alerts across different resources.
Oracle Enterprise Manager provides an out-of-the-box alert setting for ENDANGERED services. This alert indicates that a crash of any of the storage-enabled nodes could result in loss of data. Ideally the service status should be MACHINE-SAFE. This status indicates that data loss will not happen even if a machine crashes. The NODE-SAFE status indicates that data will not be lost if a node crashes, but data loss can take place if a machine crashes. As Figure 2 shows, the Services table lists all the services along with their type (e.g. Invocation, Distributed, Replicated, etc.), status, number of storage-enabled nodes, number of endangered nodes, and number of active transactions, if any.

The cluster home page dashboard shows several vital details about the overall health of the cluster. Administrators get a complete picture of bad caches, node uptime, top hosts based on CPU or memory, etc. Oracle Enterprise Manager highlights the caches in the cluster that have the lowest Hits to Gets Ratio (%). Such data is not available from raw JMX. Each miss on a get operation adds overhead on the cache to communicate with the backend Database and leads to high response time for the application. Administrators can drill down to such a cache for more detailed diagnostics.

Figure 2: Oracle Coherence cluster home page provides a dashboard view into the overall health of the cluster, such as top caches, service status, hosts, etc.

Log Alerts

Not all events are exposed via JMX. Some events are registered in the log files. By default, Oracle Coherence uses stdout as the log destination. If Oracle Coherence is running on an Oracle WebLogic server, the container log is used for Oracle Coherence log messages. However, if Oracle Coherence is running as a standalone Java server, administrators need to specify the log file path using the tangosol.coherence.log system property. The log level can be set using the tangosol.coherence.log.level system property.
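
As a small illustration of the two system properties named above, the following Java sketch builds the JVM arguments that a launcher script might pass to a standalone node. Only the property names come from the text; the LogArgs class, the log file path, and the level value 5 are hypothetical examples chosen for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: build the -D arguments that set the Coherence log destination and level. */
class LogArgs {

    /** Returns JVM arguments for the given log file path and log level. */
    static List<String> jvmArgs(String logPath, int logLevel) {
        List<String> args = new ArrayList<>();
        args.add("-Dtangosol.coherence.log=" + logPath);
        args.add("-Dtangosol.coherence.log.level=" + logLevel);
        return args;
    }

    public static void main(String[] args) {
        // Hypothetical path and level, used only to show the argument shape.
        for (String arg : jvmArgs("/var/log/coherence/node1.log", 5)) {
            System.out.println(arg);
        }
    }
}
```

The same properties can equally be written directly on the java command line; the point of the sketch is only that each node needs its own log file path if several nodes share a host.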
Oracle Enterprise Manager provides a log alert feature that enables administrators to define a pattern on a log file and get an alert when the pattern is found in the log. Administrators can also define the number of times the pattern has to match before an alert is generated. Moreover, as with other alerts, administrators can define notification rules to get an email, generate an SNMP trap, etc.

Monitoring and Diagnostics

From an operational point of view there is a difference between monitoring and diagnostics. Monitoring involves checking the cluster status and health using alerts and performance trends for a set of important health metrics. Oracle Enterprise Manager allows organizations to create a custom performance view with such metrics and use it as the default page for the console. Metrics such as the number of storage-enabled nodes and the total number of nodes (including process nodes) indicate how stable the cluster is. The aggregate memory consumed and aggregate memory available metrics can be used to make sure the cluster is not running out of storage capacity. The send/receive success rate per minute metrics show the network performance based on every sample.

Figure 3: Cluster-level monitoring view showing performance trends of metrics indicating overall health.

A monitoring solution should be able to separate one-time events from potential performance issues. Also, administrators should not have to watch the console 24x7. Oracle Enterprise Manager provides the ability to set thresholds in such a way that an alert is generated only if the metric value crosses the threshold repeatedly. In some cases, however, administrators may choose to be notified at the first occurrence of an event. Historical monitoring of the metrics allows administrators to observe the performance charts over a period of time with respect to the thresholds. The data collected from the trend analysis can be used to fine-tune configuration and plan capacity.
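
The idea of alerting only when a metric crosses its threshold repeatedly can be sketched in a few lines of Java. This is an illustrative model, not Oracle Enterprise Manager's implementation: the RepeatThreshold class and its parameters are hypothetical, and "repeatedly" is assumed here to mean a fixed number of consecutive samples above the threshold.

```java
/** Sketch: fire an alert only after N consecutive samples exceed a threshold. */
class RepeatThreshold {
    private final double threshold;
    private final int requiredOccurrences;
    private int consecutive;

    RepeatThreshold(double threshold, int requiredOccurrences) {
        if (requiredOccurrences < 1) {
            throw new IllegalArgumentException("requiredOccurrences must be >= 1");
        }
        this.threshold = threshold;
        this.requiredOccurrences = requiredOccurrences;
    }

    /** Feeds one sample; returns true while the alert condition holds. */
    boolean offer(double value) {
        if (value > threshold) {
            consecutive++;
        } else {
            consecutive = 0; // a single good sample resets the count
        }
        return consecutive >= requiredOccurrences;
    }

    public static void main(String[] args) {
        RepeatThreshold heapAlert = new RepeatThreshold(70.0, 3);
        double[] samples = {80, 60, 75, 90, 71};
        for (double s : samples) {
            System.out.println(s + " -> alert=" + heapAlert.offer(s));
        }
    }
}
```

Setting the occurrence count to 1 gives the other behavior mentioned above, where the administrator is notified at the first occurrence.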
Figure 3 shows the trend of the selected metrics, which can be used to get high-level visibility into the health of the cluster. You can overlay related metrics on the same charts for better analysis.

Diagnostics, on the other hand, typically involves finding the root cause of application slowness or of performance bottlenecks in a cache, set of nodes, hosts, etc. Diagnostics is typically triggered by alerts. In some cases the Oracle Coherence queries used in an application may not be efficient and may require tuning. Database slowness can severely impact the persistence of Oracle Coherence. Often, Oracle Coherence administrators don't have visibility into the Database tier. Diagnosing cross-tier issues in such cases is a real challenge. While diagnosing performance issues, administrators may also need to check the configuration of Oracle Coherence as well as the Host. Bad configuration of JVMs, cluster services, and Hosts can lead to availability and performance issues for applications. It is imperative for administrators to consider these aspects when managing mission-critical Oracle Coherence applications.

Unlike point solutions, Oracle Enterprise Manager provides end-to-end management, monitoring, and diagnostics for Oracle Coherence. Administrators can slice and dice the performance metrics across various resources such as JVMs, caches, services, Hosts, and Database. JVM Diagnostics is particularly useful in real-time diagnostics because it collects JVM runtime information at a high sampling rate (even at a 2-second interval).

Cluster Stability

Oracle Coherence can be thought of as a data cloud that enables nodes (JVMs) and even machines to dynamically join or leave the cluster. Oracle Coherence treats each node that joins the cluster as a new node. Oracle Coherence doesn't keep an identity for a departed or dead node. That means that even if a crashed node is started again with exactly the same configuration and name, Oracle Coherence treats it as a completely new node.
Although this gives Oracle Coherence flexibility, it poses a big challenge from a management point of view. If a management tool doesn't recognize the node across its lifecycle, administrators will not be able to find the performance and configuration of the node across that lifecycle. Administrators need to know about the nodes that are leaving the cluster or crashing, their performance before the crash, changes in their configuration, etc. Without knowing the performance and configuration of the node across the lifecycle, administrators can't effectively address cluster stability issues. Oracle Enterprise Manager provides a unique ability to track the historical performance and configuration of nodes across lifecycles. Administrators can track every node, the time it went down, the time it came up again, and all the performance metrics across the selected timeline. JVM Diagnostics also maintains the history of the JVM's runtime performance across lifecycles. Most tools don't provide such an ability to track the performance of a node across lifecycles.

Node Memory Performance

One of the critical parameters that affects every Java program is the heap. Heap utilization and garbage collection (GC) directly impact the performance of the nodes. Typically, the heap utilization chart should look like a sawtooth. The fall from a high to a low level of heap utilization in such a chart indicates a GC cycle. The heap utilization chart may remain stable if the cache data is not changing (e.g. read-only caches). However, the JVM may eventually crash with an out-of-memory error if the heap utilization continues to grow and GC is unable to clean up the heap. There are two types of GC, minor and major (sometimes called "full GC"). Minor GC is not a stop-all GC and has very low overhead; it clears the garbage objects from the new generation section of the heap. Major or full GC, on the other hand, is a stop-all GC and inflicts very high overhead.
No other request is served by the JVM when the major GC kicks in, which means that the node is not able to communicate with the rest of the cluster while the GC is running. If major GC occurs frequently on a node, Oracle Coherence will most likely drop that node from the cluster. Oracle Enterprise Manager collects GC performance metrics such as GC overhead (%), Minor GC (invocations/min), and Major GC (invocations/min) for each JVM. Administrators can also compare the trends of these metrics across several nodes. This helps in distinguishing heap issues that are common across a cluster from issues that are specific to a particular node.

Figure 4: Node heap performance charts along with CPU usage (%) and number of active threads.

Planning Storage Capacity

A node can host multiple caches. Administrators can use the Units metric to find out the memory consumed by each cache on a node. If the UnitCalculator is set to BINARY, the Units metric indicates the number of bytes consumed by cache entries. Administrators can use the HighUnits value as a performance threshold for the Units metric. HighUnits is configured per caching scheme to limit the number of units that can be placed in the cache before pruning occurs. Please note that the Units metric doesn't indicate the memory consumed by the indexes. It is extremely important to consider the following parameters while planning storage capacity:

- Total size of the cache entries * 2 (assuming one backup copy)
- Size of the indexes
- JVM footprint
- Max heap size of the JVM
- Number of nodes (n) on a machine
- Available memory on a machine
- Number of machines (m)

To ensure high availability and performance service levels, there should be one extra node (n + 1) on every machine to absorb the impact of a node crash and one extra machine (m + 1) to handle a machine crash. The max JVM heap size is defined by the -Xmx command line parameter. To avoid the overhead of incremental heap expansion, the initial heap size (-Xms) should be set equal to the max heap size (-Xmx).
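
The capacity parameters listed above can be combined into a rough sizing calculation. The following Java sketch is one possible reading of that list, not a formula given in the paper: it assumes the data to hold is twice the entry size plus the index size, that each node should only fill a target fraction of its max heap (the paper recommends staying under 70%), and it leaves out the JVM footprint, which would need to be budgeted separately against the machine's available memory. The CapacityPlan class, its method names, and all input figures are hypothetical.

```java
/** Sketch: estimate storage node and machine counts from the planning parameters. */
class CapacityPlan {
    static final long GIB = 1L << 30;

    /** Nodes needed to hold entries (with one backup copy) plus indexes. */
    static long storageNodes(long entryBytes, long indexBytes,
                             long maxHeapBytes, double targetUtilization) {
        if (targetUtilization <= 0.0 || targetUtilization > 1.0) {
            throw new IllegalArgumentException("targetUtilization must be in (0, 1]");
        }
        long required = 2 * entryBytes + indexBytes;          // primary + one backup + indexes
        long usablePerNode = (long) (maxHeapBytes * targetUtilization);
        if (usablePerNode <= 0) {
            throw new IllegalArgumentException("usable heap per node must be positive");
        }
        return (required + usablePerNode - 1) / usablePerNode; // round up
    }

    /** Machines needed to host the given nodes at n nodes per machine (before spares). */
    static long machines(long nodes, int nodesPerMachine) {
        return (nodes + nodesPerMachine - 1) / nodesPerMachine; // round up
    }

    public static void main(String[] args) {
        // Hypothetical inputs: 50 GiB of entries, 5 GiB of indexes, 4 GiB heaps, 70% target.
        long nodes = storageNodes(50 * GIB, 5 * GIB, 4 * GIB, 0.70);
        int n = 4;                                             // assumed nodes per machine
        long m = machines(nodes, n);
        System.out.println("storage nodes needed: " + nodes);  // 38
        System.out.println("machines needed: " + m + ", provision " + (m + 1));
        System.out.println("nodes per machine: " + n + ", run " + (n + 1));
    }
}
```

The spare node and spare machine in the last two lines follow the n + 1 and m + 1 guidance above; how the spare capacity is distributed in practice depends on the deployment.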
In general, the total heap utilization of a node should remain under 70%. Beyond this level, GC starts to have an adverse impact on performance. Also, enough free memory should be available on the machine after taking into account all running processes. It is important to note that swapping significantly impacts Oracle Coherence performance. If a node gets swapped out of memory, it could be removed from the cluster for unresponsiveness. Oracle Enterprise Manager's host monitoring provides visibility into memory performance, including swapping and paging metrics. Administrators can use performance thresholds to be proactive and avoid cluster stability issues that can occur due to host resource constraints. Additionally, Oracle Enterprise Manager shows the real-time and historical top ten processes (ordered by memory and CPU) along with their resource utilization on the Host.

Figure 5: Host memory performance and top 10 processes ordered by memory consumption.

Network Performance Bottlenecks

Monitoring the network performance of the cluster is crucial. The publisher/receiver success rate since the start of the node is exposed by JMX out of the box, and administrators can monitor it. More useful, however, are the packet send/receive success rates per minute, which are based on the delta between samples. Such metrics, provided by Oracle Enterprise Manager, add real value from an operations point of view. Ideally the send/receive success rate should be close to 100%. A drop in the values of these metrics indicates a performance bottleneck. Network bottlenecks could be caused by high network latency or a high packet drop rate. Prolonged GC on a node can also cause the send/receive success rate to go down. Oracle Enterprise Manager collects several such network performance metrics for each node. Send Queue Size indicates the number of packets in the queue, including those for which the acknowledgment has not yet been received.
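
The difference between a since-start success rate and a delta-based rate can be shown with a short Java sketch. It assumes two samples of cumulative counters for packets sent and packets resent (the kind of counters a node exposes over JMX) and assumes the success rate is the share of sent packets that did not need a resend; both the SendSuccessRate class and that formula are illustrative assumptions, not the exact definition used by Oracle Enterprise Manager.

```java
/** Sketch: compare lifetime and per-interval send success rates. */
class SendSuccessRate {

    /** Success rate in percent over the node's lifetime, from cumulative counters. */
    static double lifetimePercent(long sent, long resent) {
        return sent <= 0 ? 100.0 : 100.0 * (sent - resent) / sent;
    }

    /** Success rate in percent for the interval between two cumulative samples. */
    static double intervalPercent(long sentPrev, long resentPrev,
                                  long sentNow, long resentNow) {
        long sent = sentNow - sentPrev;
        long resent = resentNow - resentPrev;
        if (sent <= 0) {
            return 100.0; // nothing was sent in the interval
        }
        return 100.0 * (sent - resent) / sent;
    }

    public static void main(String[] args) {
        // Hypothetical samples taken one minute apart.
        long sentPrev = 1_000_000, resentPrev = 1_000;
        long sentNow = 1_010_000, resentNow = 3_000;
        System.out.printf("lifetime: %.2f%%%n", lifetimePercent(sentNow, resentNow));
        // Interval: 10,000 sent, 2,000 resent, so 80.00%
        System.out.printf("last minute: %.2f%%%n",
                intervalPercent(sentPrev, resentPrev, sentNow, resentNow));
    }
}
```

With these sample figures the lifetime rate still looks healthy while the last-minute rate has dropped sharply, which is the situation a delta-based metric is meant to reveal.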
Packets that do not receive an acknowledgment within the ResendDelay interval are automatically resent. Set ResendDelay only after careful observation of the network performance. Setting this value too low can flood the network with unnecessary repetitions. Setting the value too high can increase the overall latency by delaying the resends of dropped packets. Use the datagram test to measure the network performance and fine-tune the configuration based on the results. Observe the PacketDeliveryEfficiency metric; a low value indicates a high rate of unnecessary packet retransmissions.

Network Performance of the Host

Network performance issues can also be related to the configuration and performance of the Host. Two important parameters to watch out for are the buffer size of the OS and the Maximum Transmission Unit (MTU) of the network card. The OS buffers must be large enough to handle incoming network traffic while the node JVM is paused during