Thursday, March 3, 2011

Exalogic DCLI - run commands on all compute nodes at once

Exalogic includes a tool called DCLI (Distributed Command Line Interface) that can be used to run the same commands on all, or a subset of, the compute nodes in parallel. This saves a lot of time and helps avoid the sorts of silly errors that often occur when typing the same command over and over again on different machines. DCLI originally came with Exadata (as documented in the Oracle Exadata Storage Server Software User's Guide - E13861-05 chapter 9), and is now incorporated into the new Exalogic product too. It is worth noting that if you are ever involved in performing the initial configuration of a new Exalogic rack, using OneCommand to configure the Exalogic's networking, then under the covers OneCommand will be using DCLI to perform a lot of its work.
Introduction to Exalogic's DCLI
The Oracle Enterprise Linux 5.5 based factory image running on each Exalogic compute node has the exalogic.tools RPM package installed. This contains the DCLI tool in addition to other useful Exalogic command line utilities. Running 'rpm -qi exalogic.tools' on a compute node shows the following package information:
Name : exalogic.tools
Version : 1.0.0.0
Release : 1.0
When you run 'rpm -ql exalogic.tools' you will see that the command line utilities are all placed in the directory '/opt/exalogic.tools'. Specifically, the DCLI tool is located at '/opt/exalogic.tools/tools/dcli'.

Running DCLI from the command line with the '-h' argument will present you with a short help summary of DCLI and the parameters it can be given:

# /opt/exalogic.tools/tools/dcli -h

If you look at the contents of the '/opt/exalogic.tools/tools/dcli' file you will see that it is actually a Python script that, essentially, determines the list of compute nodes that a supplied command should be applied to and then runs the supplied command on each compute node using SSH under the covers. Conveniently, the Python script also captures the output from each compute node and prints it out in the shell that DCLI was run from. The output from each individual compute node is prefixed by that particular compute node's name so that it is easy for the administrator to see if something untoward occurred on one of the compute nodes only.
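
To make the behaviour described above concrete, here is a minimal sketch of the same idea. It is not the code from the DCLI script: the names 'ssh_runner' and 'run_on_nodes' are my own, and it assumes a modern Python 3 interpreter rather than the Python shipped on the factory image. It runs one command on every node in parallel and prefixes each line of output with the node's name:

```python
from concurrent.futures import ThreadPoolExecutor
import subprocess


def ssh_runner(node, command):
    """Run `command` on `node` over SSH and return its combined output.

    Hypothetical helper: relies on the local 'ssh' client being on the PATH.
    """
    result = subprocess.run(
        ["ssh", node, command],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout


def run_on_nodes(nodes, command, runner=ssh_runner):
    """Run `command` on all nodes in parallel; return 'node: line' strings."""
    # One worker thread per node, so no node waits for another to finish.
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        # pool.map returns results in the same order as `nodes`.
        outputs = list(pool.map(lambda node: runner(node, command), nodes))
    lines = []
    for node, output in zip(nodes, outputs):
        for line in output.splitlines():
            lines.append("%s: %s" % (node, line))
    return lines
```

The 'runner' parameter is only there so the fan-out logic can be tried without a real rack, by passing in a function that returns canned output instead of calling SSH.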

A good way of testing DCLI is to SSH to your nominated 'master' compute node in the Exalogic rack (eg. the 1st one), as root user, and create a file (eg. called 'nodelist') which contains the hostnames of all the compute nodes in the rack, one per line. For example, my nodelist file has the following entries in the first 3 lines:

el01cn01
el01cn02
el01cn03
....

Note: You can comment out one or more hostnames with a hash ('#') if you want DCLI to ignore particular hostnames.
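
As an illustration of the nodelist format, the following sketch reads such a file's contents and skips blank lines and hash-commented hostnames. The function name 'parse_nodelist' is hypothetical, and the exact parsing rules DCLI applies may differ (for instance in how it treats trailing comments), so treat this as an approximation rather than DCLI's own logic:

```python
def parse_nodelist(text):
    """Return the hostnames in a nodelist, ignoring blanks and '#' lines."""
    nodes = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip empty lines and hostnames that have been commented out.
        if not line or line.startswith("#"):
            continue
        nodes.append(line)
    return nodes
```

For example, given the three hostnames above with the second one commented out, this would return only 'el01cn01' and 'el01cn03'.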

As a reminder on Exalogic compute node naming conventions, 'el01' is the Exalogic rack's default name and 'cn01' contains the number of the specific compute node in that rack.

Once you've created the list of target compute nodes for DCLI to distribute commands to, a nice test is to run a DCLI command that just prints the date-time of each compute node to the shell output of your master compute node (using the /bin/date Linux command). For example:

# /opt/exalogic.tools/tools/dcli -t -g nodelist /bin/date
Example output:

Target nodes: ['el01cn01', 'el01cn02', 'el01cn03',....]
el01cn01: Mon Feb 21 21:11:42 UTC 2011
el01cn02: Mon Feb 21 21:11:42 UTC 2011
el01cn03: Mon Feb 21 21:11:42 UTC 2011
....

When this runs, you will be prompted for the password of each compute node that DCLI contacts using SSH. The '-t' option tells DCLI to first print out the names of all the nodes it will run the operation on, which is useful for double-checking that you are hitting the compute nodes you intended. The '-g' option provides the name of the file that contains the list of nodes to operate on (in this case, 'nodelist' in the current directory).


SSH Trust and User Equivalence

To use DCLI without being prompted for a password for each compute node that is contacted, it is preferable to first set up SSH Trust between the master compute node and all the other compute nodes. DCLI calls this "user equivalence"; a named user on one compute node will then be assumed to have the same identity as the same named user on all other compute nodes. On your nominated 'master' compute node (eg. 'el01cn01'), as root user, first generate an SSH public-private key pair for the root user. For example:

# ssh-keygen -N '' -f ~/.ssh/id_dsa -t dsa

This places the generated public and private key files in the '.ssh' sub-directory of the root user's home directory (note, '' in the command is two single quotes).

Now run the DCLI command with the '-k' option, as shown below, which pushes the current user's SSH public key to each other compute node's '.ssh/authorized_keys' file to establish SSH Trust. You will again be prompted to enter the password for each compute node, but this will be the last time you need to. With the '-k' option, each compute node is contacted sequentially rather than in parallel, to give you the chance to enter the password for each node in turn.

# /opt/exalogic.tools/tools/dcli -t -g nodelist -k -s "\-o StrictHostKeyChecking=no"

In my example above, I also pass the SSH option 'StrictHostKeyChecking=no' so that you avoid being prompted with the standard SSH question "Are you sure you want to continue connecting (yes/no)" for each compute node that is contacted. Each contacted compute node's host key is then added to the list of SSH known hosts on the master compute node, so this yes/no question will not be asked again.
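
To show where such an SSH option ends up, here is a small sketch that assembles an 'ssh' command line for one node. The function 'build_ssh_argv' and its parameters are hypothetical and are not taken from the DCLI script; the only real interface assumed is the standard OpenSSH client syntax 'ssh -o Option=value user@host command':

```python
def build_ssh_argv(node, command, user="root", ssh_options=()):
    """Build the argument list for running `command` on `node` via ssh."""
    argv = ["ssh"]
    # Each option is passed to the ssh client as '-o Name=value'.
    for option in ssh_options:
        argv += ["-o", option]
    argv.append("%s@%s" % (user, node))
    argv.append(command)
    return argv
```

Calling this with node 'el01cn02', command '/bin/date' and the option 'StrictHostKeyChecking=no' yields the equivalent of typing 'ssh -o StrictHostKeyChecking=no root@el01cn02 /bin/date' by hand.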

Once the DCLI command completes, you have established SSH Trust and User Equivalence. Any subsequent DCLI commands that you issue will run without you being prompted for passwords.

You can then run the original date-time test again, to satisfy yourself that SSH Trust and User Equivalence is indeed established between the master compute node and each other compute node and that no passwords are prompted for.

# /opt/exalogic.tools/tools/dcli -t -g nodelist /bin/date

Useful Examples

Now let's have a look at some examples of common DCLI commands you might need to issue for your new Exalogic system.

Example 1 - Add a new OS group to each compute node called 'oracle' with group id 500:

# /opt/exalogic.tools/tools/dcli -t -g nodelist groupadd -g 500 oracle

Example 2 - Add a new OS user to each compute node called 'oracle' with user id 500 as a member of the new 'oracle' group:

# /opt/exalogic.tools/tools/dcli -t -g nodelist useradd -g oracle -u 500 oracle

Example 3 - Set the password to 'welcome1' for the OS 'root' user and the new 'oracle' user on each compute node. This uses another feature of DCLI: if multiple commands need to be run in one go, they can be added to a file, and the '-x' parameter tells DCLI to run the commands from the named file. I tend to suffix such files with '.scl' in my examples ('scl' is the convention for 'source command line'):

# vi setpasswds.scl
echo welcome1 | passwd root --stdin
echo welcome1 | passwd oracle --stdin
# chmod u+x setpasswds.scl
# /opt/exalogic.tools/tools/dcli -t -g nodelist -x setpasswds.scl

Example 4 - Create a new mount point directory and an '/etc/fstab' definition on each compute node for the common/general NFS share which exists on Exalogic's ZFS Shared Storage appliance (the hostname of the HA shared storage on Exalogic's internal InfiniBand network in my example is 'el01sn-priv'), and then, from each compute node, permanently mount the NFS share:

# /opt/exalogic.tools/tools/dcli -t -g nodelist mkdir -p /u01/common/general
# /opt/exalogic.tools/tools/dcli -t -g nodelist chown -R oracle:oracle /u01/common/general
# vi addmount.scl
cat >> /etc/fstab << EOF
el01sn-priv:/export/common/general /u01/common/general nfs rw,bg,hard,nointr,rsize=131072,wsize=131072,tcp,vers=3 0 0
EOF
# chmod u+x addmount.scl
# /opt/exalogic.tools/tools/dcli -t -g nodelist -x addmount.scl
# /opt/exalogic.tools/tools/dcli -t -g nodelist mount /u01/common/general
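
One thing to be aware of with Example 4 is that appending to '/etc/fstab' is not repeatable: running the '.scl' file a second time would add a duplicate line. As a rough sketch of how a guard against that might look (the function 'add_fstab_entry' is hypothetical and works on the file's text rather than on the real file), the entry is only appended when no existing line already uses the same mount point:

```python
def add_fstab_entry(fstab_text, entry):
    """Return fstab text with `entry` appended unless its mount point exists."""
    # The second whitespace-separated field of an fstab line is the mount point.
    mount_point = entry.split()[1]
    for line in fstab_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        parts = stripped.split()
        if len(parts) >= 2 and parts[1] == mount_point:
            return fstab_text  # Already defined, so leave the text unchanged.
    if fstab_text and not fstab_text.endswith("\n"):
        fstab_text += "\n"
    return fstab_text + entry + "\n"
```

Applying the function twice with the same entry gives the same result as applying it once, which is the property you would want before re-running a mount definition across every compute node.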


Running DCLI As Non-Root User

In the default Exalogic set-up, DCLI executes as root user when issuing all of its commands, regardless of which OS user's shell you enter the DCLI command from. Although root access is often necessary for creating things like OS users, groups and mount points, it is not desirable if you just want to use DCLI to execute non-privileged commands under a specific OS user on all compute nodes. For example, as a new 'coherence' OS user, you may want the ability to run a script that starts a Coherence Cache Server instance on every one of the compute nodes in the Exalogic rack, in one go, to automatically join the same Coherence cluster.

To enable DCLI to be used under any OS user and to run all its distributed commands on all compute nodes, as that OS user, we just need to make a few simple one-off changes on our master compute node where DCLI is being run from...

1. As root user, allow all OS users to access the Exalogic tools directory that contains the DCLI tool:

# chmod a+x /opt/exalogic.tools/tools

2. As root user, change the permissions of the DCLI tool to be executable by all users:

# chmod a+x /opt/exalogic.tools/tools/dcli

3. As root user, modify the DCLI Python script (/opt/exalogic.tools/tools/dcli) using 'vi' and replace the line....

USER_ID="root"

...with the line...

USER_ID=pwd.getpwuid(os.getuid())[0]

This script line uses some Python functions to set the DCLI user id to the name of the current OS user running the DCLI command, rather than the hard-coded 'root' username.

4. Whilst still editing the file using vi, add the following Python library import command near the top of the DCLI Python script to enable the 'pwd' Python library to be referenced by the code in step 3.

import pwd
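
If you want to convince yourself of what the replacement line in step 3 evaluates to before editing the DCLI script, the following standalone sketch uses the same standard library calls ('pwd.getpwuid' and 'os.getuid'). The helper names 'username_for' and 'current_username' are my own and do not appear in the DCLI script:

```python
import os
import pwd


def username_for(uid):
    """Return the login name for a numeric user id.

    pwd.getpwuid() returns a password-database entry whose first
    field (index 0) is the user's login name.
    """
    return pwd.getpwuid(uid)[0]


def current_username():
    """Return the login name of the user running this process."""
    return username_for(os.getuid())
```

Run from the 'coherence' user's shell, 'current_username()' would be expected to return 'coherence', whereas from a root shell it returns 'root', which is why the modified script behaves exactly as before when root runs it.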

Now log on to your master compute node as your new non-root OS user (eg. 'coherence' user). Once you've done the one-off setup of your nodelist file and SSH-Trust/User-Equivalence (as described earlier), you will be able to run DCLI commands across all compute nodes as your new OS user.

For example, for a test Coherence project I've been playing with recently, I have a Cache Server 'start in-background' script in a Coherence project located on my Exalogic's ZFS Shared Storage. When I run this script using the DCLI command below, from my 'coherence' OS user shell on my master compute node, 30 Coherence cache server instances are started, almost instantly forming a cluster across the compute nodes in the rack.

# /opt/exalogic.tools/tools/dcli -t -g nodelist /u01/common/general/my-coh-proj/start-cache-server.sh

Just for fun I can run this again to allow 30 more Coherence servers to start-up and join the same Coherence cluster, now containing 60 members.


Summary

As you can see, DCLI is pretty powerful yet very simple in both concept and execution!


Song for today: Death Rays by Mogwai