Friday, December 11, 2009

Deciding to use DomainHealth

DomainHealth (DH) is an open source "zero-config" monitoring tool for WebLogic. It collects important server metrics over time, archives these into CSV files and provides a simple web interface for viewing graphs of current and historical statistics.

So when should you consider using DomainHealth?


When you don't have a full monitoring solution in place already for your WebLogic environment. Ideally an organisation will have standardised on using Oracle Enterprise Manager or a 3rd party management tool, for its comprehensive monitoring needs. Oracle Enterprise Manager also caters for various management requirements in addition to monitoring, which DH does not. The scope of DH's monitoring capabilities is purely focussed on tracking current and historic usage of certain key server resources. DH is not applicable for profiling your deployed applications (instead see Oracle AD4J) or intensively monitoring your JVM (instead see Oracle JRockit Mission Control). However, some organisations using Enterprise Manager may find that DH acts as a convenient and complementary addition to their monitoring solution.

The most comparable tool to DomainHealth is the WebLogic Diagnostic Framework Console Extension which ships as standard with WebLogic. I use both of these tools in my day to day work, where different situations and requirements dictate the use of one over the other.

I use DomainHealth (with its built-in WLDF harvesting mechanism) rather than the WLDF Console Extension, in situations where some of the following factors are advantageous:
  1. Zero-configuration. An administrator does not have to first work out and configure the server objects to monitor. An administrator does not have to spend time creating a complex WLDF module with the correct object types, names and attributes in it. An administrator does not have to work out what graphs to plot and then configure specific graphs for each server in the domain for every new environment (eg. Test, Pre-Prod, Prod1, Prod2).
  2. Minimal performance impact on Managed Servers. Obtains a set of statistics once and ONLY once, regardless of how many times you come back to view the same statistics in the graphs. The background statistics collection work is driven from the admin server, once per minute, lasting far less than a second.
  3. Tools friendly storage of statistics in CSV files. Administrators can open the CSVs in MS Excel or Open Office for off-line analysis and graphing. Using CSV files rather than WebLogic Persistent File Stores on the admin server has no detrimental performance impact. It doesn't matter if it takes 10 microseconds or 100 milliseconds to persist the set of statistics - timeliness only has to be to the nearest minute. The file I/O for writing data to CSV files on the admin server is not in the 'flight-path' of transactions that happen to be racing through the managed servers.
  4. Minimal administrator workstation pre-requisites. Doesn't require Java Applet support on the administrator's workstation; it's browser-friendly and just uses very simple HTML and PNG images to display graphs.
  5. Hot deployable. Deployable to an already running domain for diagnosis of currently occurring problems, without needing to restart the admin server.
  6. Statistics don't constantly scroll whilst trying to analyse them. Administrators can focus in on the current window of time or an historic window of time, in a graph, without it continuously refreshing and moving to a later time. A simple set of navigation buttons is provided to move backwards or forwards in time or just go to the most current time.
  7. Statistics can be viewed for non-running Managed Servers. If a managed server has has just died, graphs of its recent statistics can still be viewed to help diagnose the failure cause, without first requiring the managed server to be recovered and re-started.
I use the WLDF Console Extension rather than DomainHealth when some of the following factors are advantageous:
  1. Infinitely configurable. Administrators get to choose exactly what server resources they want to monitor.
  2. Fine-grained statistics capture. Statistics are gathered and displayed at a much higher frequency than just once every minute.
  3. Shipped as part of WebLogic. No need for an administrator to seek corporate approval to download and provision a 3rd party open source application into the organisation's WebLogic environment.
  4. Statistics can be retrieved for the periods of time when the Admin Server was down. As long as an administrator has previously configured a WLDF module with the right harvested objects and attributes, statistics can still be retrieved retrospectively, by the Console's graphs, following a period of time when the admin server was down and unable to contact the managed servers.
So if DomainHealth sounds like it would be useful, give it a try and let me know your feedback in the forums provided on the project's site (you have to first click the Develop menu option for some reason that only SourceForge knows!).

DomainHealth project home page: http://sourceforge.net/projects/domainhealth

DomainHealth help documentation: http://sourceforge.net/apps/mediawiki/domainhealth


Song for today: Gimme Shelter by The Rolling Stones

2 comments:

AdamD said...

Paul,

Just found DH today and will have a play tomorrow and see if it meets our fairly high level needs.

We have also set up WLDF which seems a bit intermittent! It doesnt display properly in IE or Opera. The panes are very small and cant be re-sized. Any ideas?

Seems to work most of the time in Firefox.

Cheers

Adam

Anil said...

Any plans for adding windows on your WLHostMachineStats?
Rgds
Anil