Friday, December 11, 2009

Deciding to use DomainHealth

DomainHealth (DH) is an open source "zero-config" monitoring tool for WebLogic. It collects important server metrics over time, archives these into CSV files and provides a simple web interface for viewing graphs of current and historical statistics.

So when should you consider using DomainHealth?


When you don't have a full monitoring solution in place already for your WebLogic environment. Ideally an organisation will have standardised on using Oracle Enterprise Manager or a 3rd party management tool, for its comprehensive monitoring needs. Oracle Enterprise Manager also caters for various management requirements in addition to monitoring, which DH does not. The scope of DH's monitoring capabilities is purely focussed on tracking current and historic usage of certain key server resources. DH is not applicable for profiling your deployed applications (instead see Oracle AD4J) or intensively monitoring your JVM (instead see Oracle JRockit Mission Control). However, some organisations using Enterprise Manager may find that DH acts as a convenient and complementary addition to their monitoring solution.

The most comparable tool to DomainHealth is the WebLogic Diagnostic Framework Console Extension which ships as standard with WebLogic. I use both of these tools in my day to day work, where different situations and requirements dictate the use of one over the other.

I use DomainHealth (with its built-in WLDF harvesting mechanism) rather than the WLDF Console Extension, in situations where some of the following factors are advantageous:
  1. Zero-configuration. An administrator does not have to first work out and configure the server objects to monitor. An administrator does not have to spend time creating a complex WLDF module with the correct object types, names and attributes in it. An administrator does not have to work out what graphs to plot and then configure specific graphs for each server in the domain for every new environment (eg. Test, Pre-Prod, Prod1, Prod2).
  2. Minimal performance impact on Managed Servers. Obtains a set of statistics once and ONLY once, regardless of how many times you come back to view the same statistics in the graphs. The background statistics collection work is driven from the admin server, once per minute, lasting far less than a second.
  3. Tools friendly storage of statistics in CSV files. Administrators can open the CSVs in MS Excel or Open Office for off-line analysis and graphing. Using CSV files rather than WebLogic Persistent File Stores on the admin server has no detrimental performance impact. It doesn't matter if it takes 10 microseconds or 100 milliseconds to persist the set of statistics - timeliness only has to be to the nearest minute. The file I/O for writing data to CSV files on the admin server is not in the 'flight-path' of transactions that happen to be racing through the managed servers.
  4. Minimal administrator workstation pre-requisites. Doesn't require Java Applet support on the administrator's workstation; it's browser-friendly and just uses very simple HTML and PNG images to display graphs.
  5. Hot deployable. Deployable to an already running domain for diagnosis of currently occurring problems, without needing to restart the admin server.
  6. Statistics don't constantly scroll whilst trying to analyse them. Administrators can focus in on the current window of time or an historic window of time, in a graph, without it continuously refreshing and moving to a later time. A simple set of navigation buttons is provided to move backwards or forwards in time or just go to the most current time.
  7. Statistics can be viewed for non-running Managed Servers. If a managed server has has just died, graphs of its recent statistics can still be viewed to help diagnose the failure cause, without first requiring the managed server to be recovered and re-started.
I use the WLDF Console Extension rather than DomainHealth when some of the following factors are advantageous:
  1. Infinitely configurable. Administrators get to choose exactly what server resources they want to monitor.
  2. Fine-grained statistics capture. Statistics are gathered and displayed at a much higher frequency than just once every minute.
  3. Shipped as part of WebLogic. No need for an administrator to seek corporate approval to download and provision a 3rd party open source application into the organisation's WebLogic environment.
  4. Statistics can be retrieved for the periods of time when the Admin Server was down. As long as an administrator has previously configured a WLDF module with the right harvested objects and attributes, statistics can still be retrieved retrospectively, by the Console's graphs, following a period of time when the admin server was down and unable to contact the managed servers.
So if DomainHealth sounds like it would be useful, give it a try and let me know your feedback in the forums provided on the project's site (you have to first click the Develop menu option for some reason that only SourceForge knows!).

DomainHealth project home page: http://sourceforge.net/projects/domainhealth

DomainHealth help documentation: http://sourceforge.net/apps/mediawiki/domainhealth


Song for today: Gimme Shelter by The Rolling Stones

Thursday, December 10, 2009

New DomainHealth WebLogic monitoring tool version

I've just released the latest version of DomainHealth - version 0.8 (well actually 0.8.1 because of a fix for a last minute bug spotted by Kris).

DomainHealth (DH) is an open source "zero-config" monitoring tool for WebLogic. It collects important server metrics over time, archives these into CSV files and provides a simple web interface for viewing graphs of current and historical statistics.

You can download it from the project home page at http://sourceforge.net/projects/domainhealth.

The help docs for DomainHealth are at: http://sourceforge.net/apps/mediawiki/domainhealth.


This release includes many minor fixes and enhancements (see the Release Notes document listed alongside the DH download file), plus the following major additions:
  • Now provides the ability to harvest and retrieve server statistics using a WLDF module (configured on the fly by DH), rather than using JMX to poll each server for statistics. This is now the default behaviour when running on WLS version 10.3.x or greater. For WLS versions 9.0 to 10.0.x, it still uses JMX Polling. If you prefer to use JMX Polling for the recent WLS versions, you can force this behaviour with a special parameter (see the Help docs). It is worth noting that, although I don't believe the load that the periodic JMX Polling puts on the managed servers (once a minute), is noticeable, I was still keen for DH to move to use WLDF by default. This way, DH acts as a WebLogic 'good citizen' and is also able to better cope with the increased number of MBean statistics that inevitably come with each new DH release.
  • Now shows a lot more interesting Thread Pool statistics on the main page (including Throughput and QueueLength).
  • Previously, for domains with many servers, it was difficult to drill into the statistics for just one specific server at a time, in the graphical web pages. Now you have the option to select which server you want to see on the web page, in isolation, by selecting the server's name from a drop down list.
  • When using the WLDF based mechanism for collecting metrics, statistics for all Work Managers and all Server Channels (protocol server sockets) are now also retrieved and displayed. I have not added this capability for the JMX Polling based mechanism because I'm wary of putting too much load on each managed server during the polling process (I may revisit this decision at a later date).

Song for today: Touched by VAST