action | AZ:ElastiCache

fail_az

Forces a failover for ElastiCache.

Activity as code

Below are the details and signature of the activity Python module.

Type: action
Module: azchaosaws.elasticache.actions
Name: fail_az
Return: mapping

This function forces a failover for ElastiCache. In cluster mode, it forces a failover for every primary node specified (at most 5 in any 24-hour period). In non-cluster mode, it forces a failover only if the primary node is in the target AZ.
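The non-cluster-mode rule above (fail over only when the primary sits in the target AZ) can be sketched as a small filter over a replication group description. This is an illustration, not the module's actual code: `primaries_in_az` is a hypothetical helper, and the input is assumed to have the shape of one entry from boto3's `describe_replication_groups` response, where `CurrentRole` is reported for groups with cluster mode disabled.

```python
from typing import Any, Dict, List


def primaries_in_az(replication_group: Dict[str, Any], az: str) -> List[Dict[str, str]]:
    """Return the primary nodes of a replication group located in `az`.

    Hypothetical helper: `replication_group` is assumed to look like one
    entry of describe_replication_groups()["ReplicationGroups"].
    """
    targets = []
    for node_group in replication_group.get("NodeGroups", []):
        for member in node_group.get("NodeGroupMembers", []):
            is_primary = member.get("CurrentRole") == "primary"
            in_target_az = member.get("PreferredAvailabilityZone") == az
            if is_primary and in_target_az:
                targets.append(
                    {
                        "ReplicationGroupId": replication_group["ReplicationGroupId"],
                        "NodeGroupId": node_group["NodeGroupId"],
                        "CacheClusterId": member["CacheClusterId"],
                    }
                )
    return targets
```

A group whose primary lives in another AZ yields an empty list, so no failover would be requested for it.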

Note: For clusters with cluster mode enabled, you must provide the replication group IDs (through the replication_groups argument); otherwise those clusters are not affected. If several shards in the same Redis cluster (cluster mode enabled) need to fail over because their primary nodes are in the same AZ, the first node replacement must complete before a subsequent test_failover call can be made. The function therefore uses describe_events to wait for the first primary node's failover to complete before moving on.

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/elasticache.html#ElastiCache.Client.test_failover

The cache_cluster_ids provided must all be primary nodes. The primary cache cluster ID is a key in the event that ElastiCache generates, so without it the program cannot tell when the node replacement has completed. Incorrect cache_cluster_ids cause the program to time out after the failover for that particular node group.

Usage

JSON

{
  "name": "fail_az",
  "type": "action",
  "provider": {
    "type": "python",
    "module": "azchaosaws.elasticache.actions",
    "func": "fail_az",
    "arguments": {
      "az": "",
      "dry_run": true
    }
  }
}

YAML

name: fail_az
provider:
  arguments:
    az: ""
    dry_run: true
  func: fail_az
  module: azchaosaws.elasticache.actions
  type: python
type: action

Arguments

| Name | Type | Default | Required | Title | Description |
| --- | --- | --- | --- | --- | --- |
| az | string | | Yes | Availability Zone | AZ to target |
| tags | List[Dict[str, str]] | [{"Key": "AZ_FAILURE", "Value": "True"}] | No | Tags | Match only resources with these tags |
| replication_groups | List[Dict[str, Any]] | null | No | Replication Groups | |
| dry_run | bool | false | No | Dry Run | Only perform a dry run for it |
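To make the tags argument concrete, here is one way tag matching could work. This is a sketch only: `has_all_tags` is a hypothetical helper, and it assumes a resource matches when it carries every requested Key/Value pair, which the page does not state. The tag list shape follows the `TagList` entries returned by boto3's `list_tags_for_resource`.

```python
from typing import Dict, List

# Default filter taken from the arguments of fail_az.
DEFAULT_TAGS = [{"Key": "AZ_FAILURE", "Value": "True"}]


def has_all_tags(
    resource_tags: List[Dict[str, str]], required_tags: List[Dict[str, str]]
) -> bool:
    """Return True if every required Key/Value pair is present on the resource.

    Assumption: "match" means all requested tags are present (not any of them).
    """
    present = {(tag["Key"], tag["Value"]) for tag in resource_tags}
    return all((tag["Key"], tag["Value"]) in present for tag in required_tags)
```

Under this reading, a replication group tagged AZ_FAILURE=True plus any other tags would be selected by the default filter, while one tagged AZ_FAILURE=False would not.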

Return structure

{
  "AvailabilityZone": str,
  "DryRun": bool,
  "Shards": {
    "Success": [
      {
        "CacheClusterId": str,
        "ReplicationGroupId": str,
        "NodeGroupId": str,
        "ClusterEnabled": bool
      },
      ...
    ],
    "Failed": [
      {
        "CacheClusterId": str,
        "ReplicationGroupId": str,
        "NodeGroupId": str,
        "ClusterEnabled": bool
      },
      ...
    ]
  }
}
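A caller might want to condense this mapping into a short report, for example to log which node groups did not fail over. The helper below is a hypothetical sketch that relies only on the keys shown in the return structure above.

```python
from typing import Any, Dict


def summarize_result(result: Dict[str, Any]) -> Dict[str, Any]:
    """Condense the fail_az return mapping into counts and failed node groups."""
    shards = result.get("Shards", {})
    succeeded = shards.get("Success", [])
    failed = shards.get("Failed", [])
    return {
        "az": result.get("AvailabilityZone"),
        "dry_run": result.get("DryRun"),
        "succeeded": len(succeeded),
        # (ReplicationGroupId, NodeGroupId) pairs that did not fail over
        "failed_node_groups": [
            (shard["ReplicationGroupId"], shard["NodeGroupId"]) for shard in failed
        ],
    }
```

An empty `failed_node_groups` list means every targeted shard appears under "Success".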

Signature

def fail_az(
    az: str = None,
    dry_run: bool = None,
    replication_groups: List[Dict[str, Any]] = None,
    tags: List[Dict[str, str]] = [{"Key": "AZ_FAILURE", "Value": "True"}],
    configuration: Configuration = None,
) -> Dict[str, Any]:
    pass