This section provides release notes that are specific to the TruCluster Server software patches in this kit. See “Cluster-Specific Installation and Removal Release Notes” for important information about installing or removing Version 5.1B-4.
Select Option to Check Tagged Files |
 |
During the preinstall stage of a rolling upgrade, you have the option of checking tagged files. You should override the default setting and select the check tag option. The reason for selecting this option is described in “Check for Tagged Files if Messages Are Displayed”.
Check for Tagged Files if Messages Are Displayed |
 |
When installing this patch kit during a rolling upgrade, you may see the following error and warning messages during the setup stage:
Creating tagged files.
*** Error ***
The tar commands used to create tagged files in the '/usr' file system have
reported the following errors and warnings:
tar: lib/nls/msg/en_US.88591/ladebug.cat : No such file or directory
*** Warning ***
The above errors were detected during the cluster upgrade. If you believe that
the errors are not critical to system operation, you can choose to continue.
If you are unsure, you should check the cluster upgrade log and refer
to clu_upgrade(8) before continuing with the upgrade. |
If you see these messages during the setup stage, you should verify that the tagged files were properly created when you execute the preinstall stage.
In cases where the tagged files are not created, you can repeat the setup stage.
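When you review these messages, it can help to pull out just the paths that tar could not find. The following is a minimal sketch, not part of the patch kit: it assumes you have the setup-stage output (or a saved copy of it) available on standard input and that the tar lines have the form shown above. The function name is hypothetical.

```shell
# Print each path that tar reported as missing while creating tagged files.
# Reads clu_upgrade setup-stage output (or a saved copy of it) on stdin.
# Assumes lines of the form "tar: <path> : No such file or directory".
list_missing_tagged_sources() {
    sed -n 's/^tar: \(.*[^ ]\) *: No such file or directory$/\1/p'
}
```

The paths it prints are relative to the file system named in the error text (/usr in the example above), so they can be compared against the tagged files you check during the preinstall stage.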
Noncritical Errors |
 |
During a rolling upgrade to install this patch kit, you may encounter the following noncritical situations:
The tagged file for ifaccess.conf (.Old..ifaccess.conf) may disappear. This error will not cause any problems with the rolling upgrade procedure or the installation of the kit. A message would alert you to this condition if you use the clu_upgrade undo command. Running the clu_upgrade -v check setup command at the start of the procedure will fix this error.
When the worldwide language subset is installed, an attempt to tag the file wwinstall is made and fails. This error will not affect the operational status of the cluster.
Unrecoverable Failure Procedure |
 |
The procedure to follow if you encounter unrecoverable failures while running dupatch during a rolling upgrade has changed. The new procedure calls for you to run the clu_upgrade undo install command and then set the system baseline. The procedure is explained in the Patch Kit Installation Instructions as notes in Section 5.3 and Section 5.6.
Do Not Add or Delete OSF, TCR, IOS, or OSH Subsets During Roll |
 |
During a rolling upgrade, do not use the /usr/sbin/setld command to add or delete any of the following subsets:
Base Operating System subsets (those with the prefix OSF).
TruCluster Server subsets (those with the prefix TCR).
Worldwide Language Support (WLS) subsets (those with the prefix IOS).
New Hardware Delivery (NHD) subsets (those with the prefix OSH).
Adding or deleting these subsets during a roll creates inconsistencies in the tagged files.
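As a guard in local administration scripts, the prefix rule above can be expressed as a simple pattern match. This is a sketch only; the function name is hypothetical and nothing like it ships with the kit.

```shell
# Succeeds (exit status 0) if the subset name carries one of the prefixes
# that must not be added or deleted with setld during a rolling upgrade.
is_roll_protected_subset() {
    case "$1" in
        OSF*|TCR*|IOS*|OSH*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, is_roll_protected_subset TCRBASE540 succeeds, so a wrapper script could refuse to pass that subset to /usr/sbin/setld while a roll is in progress.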
Undo Stages in Correct Order |
 |
If you need to undo the install stage, because the lead member is in an unrecoverable state, be sure to undo the stages in the correct order.
During the install stage, clu_upgrade cannot tell whether the roll is going forward or backward. This ambiguity incorrectly allows the clu_upgrade undo preinstall command to be run before clu_upgrade undo install; always run clu_upgrade undo install first. Refer to the Patch Kit Installation Instructions for additional information on undoing a rolling patch.
clu_upgrade undo of Install Stage Can Result in Incorrect File Permissions |
 |
This note applies only when both of the following are true:
You are using installupdate, dupatch, or nhd_install to perform a rolling upgrade.
You need to undo the install stage; that is, to use the clu_upgrade undo install command.
In this situation, incorrect file permissions can be set for files on the lead member. This can result in the failure of rsh, rlogin, and other commands that assume user IDs or identities by means of setuid.
The clu_upgrade undo install command must be run from a nonlead member that has access to the lead member's boot disk. After the command completes, follow these steps:
Boot the lead member to single-user mode.
Run the following script:
#!/usr/bin/ksh -p
#
# Script for restoring installed permissions
#
cd /
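# Visit the status file of every OSF, TCR, IOS, and OSH subset
for i in /usr/.smdb./@(OSF|TCR|IOS|OSH)*.sts
do
# For installed subsets only, reset permissions from the subset inventory
grep -q "_INSTALLED" "$i" 2>/dev/null && /usr/lbin/fverify -y <"${i%.sts}.inv"
done |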
Rerun installupdate, dupatch, or nhd_install, whichever is appropriate, and complete the rolling upgrade.
For information about rolling upgrades, see the Patch Kit Installation Instructions and the installupdate(8) and clu_upgrade(8) reference pages.
Missing Entry Messages Can Be Ignored During Rolling Patch |
 |
During the setup stage of a rolling patch, you might see a message like the following:
Creating tagged files.
............................................................
clubase: Entry not found in /cluster/admin/tmp/stanza.stdin.597530
clubase: Entry not found in /cluster/admin/tmp/stanza.stdin.597568 |
An Entry not found message will appear once for each member in the cluster. The number in the message corresponds to a PID.
You can safely ignore this Entry not found message.
Relocating AutoFS During a Rolling Upgrade on a Cluster |
 |
This note applies only to performing rolling upgrades on cluster systems that use AutoFS.
During a cluster rolling upgrade, each cluster member is singly halted and rebooted several times. The Patch Kit Installation Instructions direct you to manually relocate applications under the control of Cluster Application Availability (CAA) prior to halting a member on which CAA applications run.
Depending on the amount of NFS traffic, the manual relocation of AutoFS may sometimes fail. Failure is most likely to occur when NFS traffic is heavy. The following procedure avoids that problem.
At the start of the rolling upgrade procedure, use the caa_stat command to learn which member is running AutoFS. For example:
# caa_stat -t
Name Type Target State Host
------------------------------------------------------------
autofs application ONLINE ONLINE rye
cluster_lockd application ONLINE ONLINE rye
clustercron application ONLINE ONLINE swiss
dhcp application ONLINE ONLINE swiss
named application ONLINE ONLINE rye |
To minimize your effort in the following procedure, perform the roll stage last on the member where AutoFS runs.
When it is time to perform a manual relocation on a member where AutoFS is running, follow these steps:
Stop AutoFS by entering the following command on the member where AutoFS runs:
# /usr/sbin/caa_stop -f autofs |
Perform the manual relocation of other applications running on that member:
# /usr/sbin/caa_relocate -s current_member -c target_member |
After the member that had been running AutoFS has been halted as part of the rolling upgrade procedure, restart AutoFS on a member that is still up. (If this is the roll stage and the halted member is not the last member to be rolled, you can minimize your effort by restarting AutoFS on the member you plan to roll last.)
On a member that is up, enter the following command to restart AutoFS. (The member where AutoFS is to run, target_member, must be up and running in multi-user mode.)
# /usr/sbin/caa_start autofs -c target_member |
Continue with the rolling upgrade procedure.
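The lookup in the first step of this procedure can be scripted. The sketch below assumes the caa_stat -t table layout shown in the example above (Name, Type, Target, State, Host); the function name is hypothetical.

```shell
# Print the host that caa_stat -t reports for the given resource name.
# Reads the caa_stat -t table on stdin; the fifth column is the host.
caa_host_of() {
    awk -v name="$1" '$1 == name { print $5 }'
}
```

Used as caa_stat -t | caa_host_of autofs, it prints the member currently running AutoFS, which is the member to roll last.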
Messages Displayed During Rolling Upgrade Can Be Ignored |
 |
You can ignore the following messages if you see them displayed during a rolling upgrade:
kill: 1048674: no such process
This message may be displayed after the roll stage. For example:
# clu_upgrade roll
This is the cluster upgrade program.
⋮
The 'roll' stage has completed successfully. This
member must be rebooted in order to run with the newly
installed software.
Do you want to reboot this member at this time? []:y
You indicated that you want to reboot this member at this time.
Is that correct? [yes]:
The 'roll' stage of the upgrade has completed successfully.
kill: 1048674: no such process
# |
rmdir: /var/.clu_upgrade: File exists
This message may be displayed after the clean stage. For example:
# clu_upgrade clean
This is the cluster upgrade program.
You have indicated that you want to perform the 'clean' stage
of the upgrade.
Do you want to continue to upgrade the cluster? [yes]:
⋮
Deleting tagged files.
.................................................................
.................................................................
.................................................................
.................................................................
...................................
Removing back-up and kit files
rmdir: /var/.clu_upgrade: File exists
The 'clean' stage of the upgrade has completed successfully.
# |
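If you capture rolling upgrade output for later review, you may want to filter out these two messages so that anything unexpected stands out. The following is a sketch under that assumption; the function name is hypothetical, and the patterns are taken only from the two messages listed in this note.

```shell
# Copy stdin to stdout, dropping the two messages this note says can be
# ignored. The process ID in the kill message varies, so any number matches.
drop_ignorable_roll_messages() {
    grep -v -E '^(kill: ?[0-9]+: no such process|rmdir: /var/\.clu_upgrade: File exists)$' \
        || [ "$?" -eq 1 ]   # grep exits 1 when every line was dropped; not an error here
}
```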
Error on Cluster Creation |
 |
When you attempt to create a cluster after having deleted patches, you may see the following error messages:
*** Error ***
This system has only Tru64 UNIX patches installed.
Please install the latest TruCluster Server patches on your system.
You can obtain the most recent patch kit from:
http://www.support.compaq.com/patches/
*** Error ***
The system is not configured properly for cluster creation.
Please fix the previously reported problems, and then rerun the
'clu_create' command. |
If you see these messages, enter the following command:
# ls -tlr /usr/.smdb./*PAT*.sts |
If this command returns a file with 000000 in its name, you will have to run the clu_create command with the -f option to force the creation of your cluster. The problem is caused by the cluster software misinterpreting the existence of some patches and will be corrected in a future patch kit.
If the command does not return a file with 000000 in its name, you will need to contact HP support to determine the cause of the problem.
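The check on the ls output can be automated. This sketch assumes the listing produced by the ls command above is supplied on standard input and that the file name is the last field on each line; the function name is hypothetical.

```shell
# Succeeds if any patch status file name read from stdin contains 000000.
# Only the last field of each line (the file name) is examined, so sizes
# and dates in a long listing cannot cause a false match.
has_zero_patch_status_file() {
    awk '{ print $NF }' | grep -q '000000'
}
```

For example, ls -tlr /usr/.smdb./*PAT*.sts | has_zero_patch_status_file succeeds when the -f option to clu_create is called for, and fails when the case should go to HP support instead.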
When Taking a Cluster Member to Single-User Mode, First Halt the Member |
 |
To take a cluster member from multiuser mode to single-user mode, first halt the member and then boot it to single-user mode. For example:
# shutdown -h now
>>> boot -fl s |
Halting and booting the system ensures that it provides the minimal set of services to the cluster and that the running cluster has a minimal reliance on the member running in single-user mode.
When the system reaches single-user mode, enter the following commands:
# /sbin/init s
# /sbin/bcheckrc
# /usr/sbin/lmf reset |
Login Failure Possible with C2 Security Enabled |
 |
Login failures may occur as a result of a rolling upgrade on systems with Enhanced Security (C2) enabled. The failures may be exhibited in two ways:
With the following error message:
Can't rewrite protected password entry for user |
With the following set of error messages:
login: Ignoring log file: /var/tcb/files/dblogs/log.00001: magic number 0, not 8
login: log_get: read: I/O error
Can't rewrite protected password entry for user |
The problem may occur after the initial reboot of the lead cluster member or after the rolling upgrade is completed and the clu_upgrade switch procedure has been run. The following sections describe the steps you can take to prevent the problem or correct it after it occurs.
You can prevent this problem by performing the following steps before beginning the rolling upgrade:
Disable the prpasswdd daemon from running on the cluster:
# rcmgr -c set PRPASSWDD_ARGS \
"`rcmgr get PRPASSWDD_ARGS` -disable" |
Stop the prpasswdd daemon on every node in the cluster:
# /sbin/init.d/prpasswd stop |
Perform the rolling upgrade procedure through the clu_upgrade switch step and reboot all the cluster members.
Perform one of the following actions:
If PRPASSWDD_ARGS did not exist before this upgrade (that is, if rcmgr get PRPASSWDD_ARGS at this point shows only -disable), then delete PRPASSWDD_ARGS:
# rcmgr -c delete PRPASSWDD_ARGS |
If PRPASSWDD_ARGS existed before this upgrade, then reset PRPASSWDD_ARGS to the original string:
# rcmgr -c set PRPASSWDD_ARGS \
"`rcmgr get PRPASSWDD_ARGS | sed 's/ -disable//'`" |
Check that PRPASSWDD_ARGS is now set to what you expect:
# rcmgr get PRPASSWDD_ARGS |
Start the prpasswdd daemon on every node in the cluster:
# /sbin/init.d/prpasswd start |
Complete the rolling upgrade.
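The choice between deleting and resetting PRPASSWDD_ARGS in the steps above can be sketched as a small helper. This is an illustration only: the function name is hypothetical, it takes the current value as its argument, and it applies the same sed substitution that the steps use.

```shell
# Given the current PRPASSWDD_ARGS value, print the action that restores the
# pre-upgrade state: "delete" if only -disable remains, otherwise
# "set <original value>".
plan_prpasswdd_restore() {
    # Same substitution the steps above use, then trim surrounding blanks.
    remaining=$(printf '%s\n' "$1" |
        sed -e 's/ -disable//' -e 's/^ *//' -e 's/ *$//')
    if [ -z "$remaining" ] || [ "$remaining" = "-disable" ]; then
        echo "delete"
    else
        echo "set $remaining"
    fi
}
```

Calling plan_prpasswdd_restore "`rcmgr get PRPASSWDD_ARGS`" prints the action to take; the rcmgr -c delete or rcmgr -c set command itself is still entered as shown in the steps.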
If you have already encountered the problem, perform the following steps to clear it:
Restart the prpasswdd daemon on every node in the cluster:
# /sbin/init.d/prpasswd restart |
Reboot the lead cluster member.
Check to see if the problem has been resolved. If it has been resolved, you are finished. If you still see the problem, continue to step 4.
Try to force a change to the auth database by performing the following steps:
Use edauth to add a harmless field to an account; the exact commands depend on your editor. For example, pick an account that does not have a vacation set and add u_vacation_end:
# edauth
s/:u_lock@:/u_vacation_end#0:u_lock@:/
w
q |
Check to see that the u_vacation_end#0 field was added to the account:
Use edauth to remove the u_vacation_end#0 field from the account.
If the edauth commands fail, do not stop. Continue with the following instructions.
Check to see if the problem has been resolved. If it has been resolved, you are finished.
If you still see the problem, observe the following warning and continue to step 6.
Disable logins on the cluster by creating the file /etc/nologin:
Disable the prpasswdd daemon from running on the cluster:
# rcmgr -c set PRPASSWDD_ARGS \
"`rcmgr get PRPASSWDD_ARGS` -disable" |
Stop the prpasswdd daemon on every node in the cluster:
# /sbin/init.d/prpasswd stop |
Force a checkpoint by using the db_checkpoint command with the -1 (number 1) option:
# /usr/tcb/bin/db_checkpoint -1 -h /var/tcb/files |
Continue with the instructions even if this command fails.
Delete the files in the dblogs directory:
# rm -f /var/tcb/files/dblogs/* |
Force a change to the auth database, as follows:
Use the edauth command to add a harmless field to an account; the exact commands depend on your editor. For example, pick an account that does not have a vacation set and enter the following:
# edauth
s/:u_lock@:/u_vacation_end#0:u_lock@:/
w
q |
Check to see that the u_vacation_end#0 field was added to the account:
Use the edauth command to remove the u_vacation_end#0 field from the account.
If the edauth command was successful, perform one of the following actions:
If PRPASSWDD_ARGS did not exist before this upgrade (that is, if rcmgr get PRPASSWDD_ARGS at this point shows only -disable), then delete PRPASSWDD_ARGS:
# rcmgr -c delete PRPASSWDD_ARGS |
If PRPASSWDD_ARGS existed before this upgrade, then reset PRPASSWDD_ARGS to the original string:
# rcmgr -c set PRPASSWDD_ARGS \
"`rcmgr get PRPASSWDD_ARGS | sed 's/ -disable//'`" |
Check that PRPASSWDD_ARGS is now set to what you expect:
# rcmgr get PRPASSWDD_ARGS |
Start the prpasswdd daemon on every node in the cluster:
# /sbin/init.d/prpasswd start |
Re-enable logins on the cluster by deleting the file /etc/nologin:
Check to see if the problem has been resolved. If it has not, contact HP support.
File System Unmount Recommended if Message Is Displayed |
 |
Under certain error conditions, the following message may be seen during a relocation or failover, or during the boot of a member:
WARNING: Unable to failover /mnt: pfs and cfs fsids differ |
The result is that the fileset in question is now unserved in the cluster. For example:
# cfsmgr /mnt
Domain or filesystem name = /mnt
Server Status : Not Served |
If this occurs, we recommend that you immediately do the following:
Use the following command to unmount the file system:
# cfsmgr -u -p [mountpoint] |
If other mounted filesets exist in the same domain, unmount them (they should also be in the "Not Served" state):
For steps on checking an AdvFS domain, see the AdvFS Administration Guide, Section 6.3.1, steps 3-7.
Run diagnostics on the domain prior to remounting its file systems.
To verify the domain, you can use the AdvFS verify utility or the fixfdmn utility. If using fixfdmn, we recommend first running it with the -n option to see what errors are found prior to allowing fixfdmn to fix them.
Once you have successfully verified the domain, remounting the domain's file systems in the cluster should succeed.
If the domain cannot be immediately verified, we recommend that you do not remount the original fileset until this can be done.
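To detect the unserved state from a script, you can test the cfsmgr output for the status line shown in the example above. This is a sketch that assumes the output format in that example; the function name is hypothetical.

```shell
# Succeeds if cfsmgr output read from stdin reports
# "Server Status : Not Served" for the file system.
is_not_served() {
    grep -q -E '^[[:space:]]*Server Status[[:space:]]*:[[:space:]]*Not Served[[:space:]]*$'
}
```

For example, cfsmgr /mnt | is_not_served succeeds for the fileset in the example, which signals that the unmount and verification steps above apply.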
Tunable Attribute May Help Performance Problem |
 |
The tunable attribute cfs_clone_noccr, included in this patch kit, may correct a problem in which cluster fileset writes that occur simultaneously with reads of the fileset's clone on a cluster client (for example, during a backup) may result in performance degradation. This occurs most often when the clone file being read consists of many thousands of extents (for example, 20,000 or more).
If a degradation during cluster clone reads is noticeable (for example, the clone read appears to be hanging and requires a long time to complete), set the value of cfs_clone_noccr to 1 on the server of the given fileset. This sysconfig tunable attribute is set to 0 by default and should be changed only when the degradation is noticeable.
Note that all filesets with clones that are served by the node on which the attribute is set will also see this change. It may be advisable (though not required) to have those filesets whose clone files have fewer extents be served by a different node during the time the tunable attribute is set.
AlphaServer ES47 or AlphaServer GS1280 Hangs When Added to Cluster |
 |
If, after you run clu_add_member to add an AlphaServer ES47 or AlphaServer GS1280 as a member of a TruCluster, the AlphaServer hangs during its first boot, try rebooting it with the original V5.1B generic cluster kernel, clu_genvmunix.
Use the following instructions to extract and copy the V5.1B cluster genvmunix from your original Tru64 UNIX kit to your AlphaServer ES47 or AlphaServer GS1280 system. In these instructions, the AlphaServer ES47 or AlphaServer GS1280 is designated as member 5. Substitute the appropriate member number for your cluster.
Insert the Tru64 UNIX Associated Products Disk 2 into the CD-ROM drive of an active member.
Mount the CD-ROM to /mnt. For example:
# mount -r /dev/disk/cdrom0c /mnt |
Mount the boot disk of the AlphaServer ES47 or AlphaServer GS1280 on its specific mount point; for example:
# mount root5_domain#root /cluster/members/member5/boot_partition |
Extract the original clu_genvmunix from the CD-ROM and copy it to the boot disk of the AlphaServer ES47 or AlphaServer GS1280 member.
# zcat < TCRBASE540 | ( cd /cluster/admin/tmp; \
tar -xf - ./usr/opt/TruCluster/clu_genvmunix)
# cp /cluster/admin/tmp/usr/opt/TruCluster/clu_genvmunix \
/cluster/members/member5/boot_partition/genvmunix
# rm /cluster/admin/tmp/usr/opt/TruCluster/clu_genvmunix |
Unmount the CD-ROM and the boot disk:
# umount /mnt
# umount /cluster/members/member5/boot_partition |
Reboot the AlphaServer ES47 or AlphaServer GS1280.
Problems with clu_upgrade Switch Stage |
 |
If the clu_upgrade switch stage does not complete successfully, you may see a message like the following:
versw: No switch due to inconsistent versions |
The problem can be due to one or more members running genvmunix, a generic kernel.
Use the command clu_get_info -full and note each member's version number, as reported in the line beginning
If a member has a version number different from that of the other members, shut down the member and reboot it from vmunix, the custom kernel. If multiple members have different version numbers, reboot them one at a time from vmunix.
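Once you have noted each member's version number, spotting the odd one out can be scripted. The sketch below does not parse clu_get_info output directly; it assumes you supply one "member version" pair per line, which is a hypothetical input format chosen for illustration, and the function name is likewise hypothetical.

```shell
# Reads "member version" pairs, one per line, and prints the members whose
# version differs from the version reported by the most members.
find_version_outliers() {
    awk '
        NF >= 2 { idx++; name[idx] = $1; ver[idx] = $2; count[$2]++ }
        END {
            best = ""
            for (v in count)
                if (best == "" || count[v] > count[best]) best = v
            for (i = 1; i <= idx; i++)
                if (ver[i] != best) print name[i]
        }'
}
```

Each member it prints is a candidate for the shutdown and reboot from vmunix described above. If no single version is in the majority, the result is not meaningful and the versions should be compared by hand.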
Data Protector Issues and Restrictions |
 |
The following sections describe issues and restrictions for Version 5.1 of the HP OpenView Storage Data Protector backup and recovery product when configuring it on a Tru64 UNIX cluster.
Possible Error Backing Up Cluster Mount Points
When backing up cluster mount points using the cluster alias as the client name, you may encounter an error in which the directory is reported as a mount point to a different file system and is backed up as an empty directory.
To correct this problem, create TruCluster Server clients as follows:
Create a client for each host name node in the cluster.
Create another client using the cluster alias name, selecting it as a virtual host.
You can then create backups using the alias as the client name.
You may also need to define your mount points to back up using the manual add function of the Add Backup wizard. Under some circumstances, backups that are created using the default device discovery encounter the “backed up as an empty directory” problem.
Configuring Data Protector for Oracle Integration
When configuring Data Protector for Oracle integration, libobk.so should be linked with /usr/omni/lib/libob2oracle8_64bit.so.
The Data Protector UNIX Integration Guide incorrectly states that it should be linked with /usr/omni/lib/libob2oracle8_64.so.
Set ipport_userreserved Attribute on Large Systems |
 |
Larger systems can encounter portmapper problems in a local area network (LAN) cluster if the value of the ipport_userreserved attribute has not been tuned. The recommended value is 65535, and the value should be the same on all cluster members. Set the value before adding the first member.
If this value is not set for a LAN cluster with larger machines, the machines may run out of ports for interconnect services. For more information, see the manual Tuning Tru64 UNIX for Internet Servers.
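To confirm the setting on each member, you can check the value that sysconfig reports. The sketch below makes two assumptions that are not stated in these notes: that the attribute belongs to the inet subsystem, and that sysconfig -q prints it as "ipport_userreserved = value". The function name is hypothetical.

```shell
# Reads the output of "sysconfig -q inet ipport_userreserved" on stdin and
# succeeds only if the reported value is the recommended 65535.
# Assumes a line of the form "ipport_userreserved = <value>".
ipport_userreserved_is_tuned() {
    value=$(awk -F= '/ipport_userreserved/ { gsub(/[ \t]/, "", $2); print $2; exit }')
    [ "$value" = "65535" ]
}
```

Running sysconfig -q inet ipport_userreserved | ipport_userreserved_is_tuned on every member gives a quick way to confirm that the value is the same across the cluster.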