High Availability Considerations and Limitations

  • CloudControl keeps the data on the two nodes in sync. The default sync interval is 10 minutes, and only data that has changed since the previous sync is transferred. CloudControl uses rsync and PostgreSQL checkpoints to determine what has changed. You can change the sync interval with the asc ha --interval <minutes> command. See CloudControl HA CLI Commands.

  • The primary and secondary nodes must run the same version of CloudControl. When you run the HA upgrade process, upgrade both nodes to the same version. See HA Pair Upgrade Using vSphere and HA Pair Upgrade Using ISO Upload.

  • PostgreSQL WAL (write-ahead log) files are applied sequentially. If the secondary node of an HA pair goes offline and is restored to an earlier point in time, the next WAL file it receives from the primary node will not sync properly. Therefore, if you restore a secondary node from an earlier restore point, disband the HA pair and then run the hasetup command on the secondary node to re-establish it.

  • In HyTrust CloudControl, there is no concept of “failback”, even though that term may be part of your operational procedures. After a failover occurs, you need to set up a new HA pair with the surviving node as the new primary.

  • Do not configure a VIP on the secondary node during setup. If you try to add a secondary node that has a VIP to an HA pair, the operation fails with the following message: "Error: Failover must be initiated from the secondary node". To recover:
    • Run asc restore --systemrestore on the secondary node. This resets the node to an initialized state and removes the VIP and all other network configuration. The restore takes a few minutes, and CloudControl prompts you to reboot when it completes.

      Note: Once HA is running, the primary CloudControl node will sync the current configuration over to the secondary node, so no information is lost.

    • In the vSphere console, log in to the secondary node and run the setup command to re-enter the network configuration. This takes a few minutes to complete.

    • Run the hasetup command to join this node to the primary node.

  • You can configure dual-site HA only on the primary CloudControl node. On the secondary node, you can view the configuration with the asc ha --dualSiteHaConf --list command, but CloudControl prevents any changes. Create and edit the dual-site configuration on the primary node only; HA sync copies it to the secondary node.

  • After a failover, always rejoin the HA pair with the surviving node as primary. If both nodes are online after a failover, it is possible to rejoin the HA pair with the surviving node as secondary and the failed node (the original primary) as primary again. This is not supported and can cause the following problems:

    • HTCC configuration is out of date

    • In dual-site mode, VIP and PIP provisioning will be incorrect

    Always make the surviving node the primary node of the next HA pair. To return to the original configuration, first rebuild the HA pair with the surviving node (the original secondary) as primary, and then perform a manual failover back to the original primary.

  • After a failover, do not manage CloudControl or access vSphere resources through the failed node. Depending on the network state, a failed node in dual-site HA may come back online assuming that it is still the primary and retaining its VIP and PIPs. Resume HA with this node as secondary as soon as possible. Until then, the node does not have an up-to-date configuration, and any of the following that you perform through the failed node are lost when HA resumes:

    • configuration changes (will be overwritten by sync from the new primary node)

    • audit logs for vSphere operations (overwritten for the same reason)

  • To resume HA after a manual or automatic failover, run the following commands on the node that will become secondary in the HA pair:

    1. Disband: asc ha --disband

    2. Restore: asc restore --systemrestore --keep-net

    3. Join: asc ha --join node-B-IP --mode=secondary (replace node-B-IP with the IP address of the surviving node, which becomes the primary)

  • Important: After the failover, this node may appear not to need step 1 or 2 (disband and restore). However, the tested and supported procedure is to perform all three steps so that the configuration is consistent.

  • If neither node is accessible from a web browser for CloudControl management, for example:

    • the secondary CloudControl VIP is reachable, but returns an HTTP 503 error (“503 Service Temporarily Unavailable”)

    • the primary CloudControl VIP redirects to the secondary CloudControl VIP

    then you need to initiate a manual failover to the secondary CloudControl node. See Initiating a Manual Failover.
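
The incremental sync described in the first item of this list (by default every 10 minutes, copying only what changed, using rsync) can be pictured with rsync's default "quick check": a file is transferred only when it is missing on the destination or its size or modification time differs. The sketch below models only that selection step over hypothetical file-metadata snapshots. It is not CloudControl's implementation, and the PostgreSQL checkpoint side of the sync is not modeled; the file paths are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical per-file metadata: the two fields rsync's quick check compares."""
    size: int     # bytes
    mtime: float  # modification time, seconds since the epoch


def files_to_sync(source: Dict[str, FileMeta], dest: Dict[str, FileMeta]) -> List[str]:
    """Paths a quick-check pass would copy from source (primary) to dest (secondary)."""
    changed = []
    for path, meta in source.items():
        other = dest.get(path)
        # Missing on the destination, or size/mtime differ -> transfer it.
        if other is None or (other.size, other.mtime) != (meta.size, meta.mtime):
            changed.append(path)
    # Files present only on dest are ignored here (rsync removes them only with --delete).
    return sorted(changed)


DEFAULT_INTERVAL_MIN = 10  # documented default; change it with asc ha --interval <minutes>
print("sync passes per day at the default interval:", 24 * 60 // DEFAULT_INTERVAL_MIN)

primary = {
    "/data/a.conf": FileMeta(120, 1000.0),
    "/data/b.conf": FileMeta(80, 2000.0),
    "/data/c.conf": FileMeta(10, 3000.0),
}
secondary = {
    "/data/a.conf": FileMeta(120, 1000.0),
    "/data/b.conf": FileMeta(80, 1500.0),
}
print(files_to_sync(primary, secondary))  # ['/data/b.conf', '/data/c.conf']
```

Unchanged files cost only a metadata comparison, which is why a short interval stays cheap when little has changed.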
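
The WAL limitation in this list (a secondary restored to an earlier point cannot apply the next WAL file from the primary) follows from strictly sequential replay. The toy model below assumes WAL segments are simple consecutive integers; real PostgreSQL tracks log sequence numbers (LSNs) and timelines, so treat this as an illustration of the ordering rule only. The class and exception names are hypothetical.

```python
class WalGapError(Exception):
    """Raised when a WAL segment arrives out of sequence."""


class Replica:
    """Toy standby that applies numbered WAL segments strictly in order."""

    def __init__(self, last_applied: int) -> None:
        self.last_applied = last_applied  # highest segment number applied so far

    def apply(self, segment: int) -> None:
        expected = self.last_applied + 1
        if segment != expected:
            # A gap (or a rewind) breaks the sequence; replay cannot continue.
            raise WalGapError(f"expected segment {expected}, received {segment}")
        self.last_applied = segment


# Normal operation: the standby is at 41 and the primary ships 42.
standby = Replica(last_applied=41)
standby.apply(42)

# The standby is restored from an older snapshot (segment 30) while the
# primary has moved on and ships segment 43 next.
restored = Replica(last_applied=30)
try:
    restored.apply(43)
except WalGapError as err:
    print("cannot sync:", err)  # cannot sync: expected segment 31, received 43
```

Nothing in the stream from the primary can fill the gap, which is why the documented fix is to disband the pair and run hasetup again rather than wait for the next sync.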
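
The resume-HA procedure in this list is always the same three commands in the same order, run on the node that becomes secondary. The hypothetical helper below is not a HyTrust tool. It only encodes that order and, by default, prints the commands (dry run). With dry_run=False it runs them through subprocess and stops at the first nonzero exit. The asc commands may be interactive: the system restore described earlier prompts for a reboot, so if the restore step does the same, run the join step by hand after the node comes back. 192.0.2.10 is a documentation placeholder for node-B-IP.

```python
import shlex
import subprocess
from typing import List


def resume_ha_commands(primary_ip: str) -> List[List[str]]:
    """The three documented steps, in order, for the node that becomes secondary."""
    return [
        ["asc", "ha", "--disband"],                               # 1. Disband
        ["asc", "restore", "--systemrestore", "--keep-net"],      # 2. Restore
        ["asc", "ha", "--join", primary_ip, "--mode=secondary"],  # 3. Join
    ]


def resume_ha(primary_ip: str, dry_run: bool = True) -> List[str]:
    """Print each step; unless dry_run, also run it and stop at the first failure."""
    done = []
    for cmd in resume_ha_commands(primary_ip):
        line = shlex.join(cmd)
        print(("would run: " if dry_run else "running: ") + line)
        if not dry_run:
            # check=True raises CalledProcessError on a nonzero exit,
            # so later steps never run after a failed one.
            subprocess.run(cmd, check=True)
        done.append(line)
    return done


resume_ha("192.0.2.10")  # placeholder for the surviving (new primary) node's IP
```

Keeping all three steps in one fixed list mirrors the supported procedure: no step is skipped even when the node looks as if it does not need it.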
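
The last item of this list describes the symptoms that call for a manual failover: the secondary VIP answers with HTTP 503, and the primary VIP redirects to the secondary VIP. The sketch below is one hypothetical way to check for that combination with Python's standard library. probe() fetches a URL once without following redirects; needs_manual_failover() is a pure function over the probe results so it can be tried offline. The VIP addresses are placeholders, and certificate errors on HTTPS (for example, a self-signed certificate) surface as urllib.error.URLError, which this sketch does not handle.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple
from urllib.parse import urlsplit

Probe = Tuple[int, Optional[str]]  # (HTTP status, Location header if any)


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # report 3xx responses instead of following them


def probe(url: str, timeout: float = 5.0) -> Probe:
    """Fetch url once, without following redirects."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status, None
    except urllib.error.HTTPError as err:  # unfollowed 3xx, 4xx and 5xx land here
        return err.code, err.headers.get("Location")


def needs_manual_failover(primary: Probe, secondary: Probe, secondary_host: str) -> bool:
    """True when the secondary VIP answers 503 and the primary VIP
    redirects to the secondary VIP."""
    p_status, p_location = primary
    s_status, _ = secondary
    redirects_to_secondary = (
        300 <= p_status < 400
        and p_location is not None
        and urlsplit(p_location).hostname == secondary_host
    )
    return s_status == 503 and redirects_to_secondary


# Offline examples with hypothetical VIPs (no network needed):
print(needs_manual_failover((302, "https://192.0.2.20/"), (503, None), "192.0.2.20"))  # True
print(needs_manual_failover((200, None), (503, None), "192.0.2.20"))  # False
```

A relative Location header (for example, "/login") has no hostname, so it is not treated as a redirect to the secondary VIP.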