Setting up an AWS CloudWatch alarm when ElasticsearchRequests are too high - aws-cloudformation

I am trying to set up a CloudWatch alarm, using CloudFormation, that fires when more than, say, 5000 HTTP requests are sent to an AWS ES cluster. I see there is an ElasticsearchRequests metric I can use, and this is what I have so far:
ClusterElasticsearchRequestsTooHighAlarm:
  Condition: HasAlertTopic
  Type: 'AWS::CloudWatch::Alarm'
  Properties:
    AlarmActions:
    - {'Fn::ImportValue': !Sub '${ParentAlertStack}-TopicARN'}
    AlarmDescription: 'ElasticsearchRequests are too high.'
    ComparisonOperator: GreaterThanThreshold
    Dimensions:
    - Name: ClientId
      Value: !Ref 'AWS::AccountId'
    - Name: DomainName
      Value: !Ref ElasticsearchDomain
    EvaluationPeriods: 1
    MetricName: 'ElasticsearchRequests'
    Namespace: 'AWS/ES'
    OKActions:
    - {'Fn::ImportValue': !Sub '${ParentAlertStack}-TopicARN'}
    Period: 60
    Statistic: Maximum
    Threshold: 5000
Does this look correct?
Should I use SampleCount instead of Maximum for the Statistic?
Any advice is much appreciated

According to the AWS documentation on monitoring Elasticsearch/OpenSearch clusters, the relevant statistic for the ElasticsearchRequests metric is Sum.
Here is what the docs say:
OpenSearchRequests
The number of requests made to the Elasticsearch/OpenSearch cluster.
Relevant statistics: Sum
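Applied to the alarm in the question, that would mean switching the statistic. A minimal sketch, assuming the rest of the template (the HasAlertTopic condition, the ParentAlertStack parameter, and the ElasticsearchDomain resource) stays as posted:

```yaml
ClusterElasticsearchRequestsTooHighAlarm:
  Condition: HasAlertTopic
  Type: 'AWS::CloudWatch::Alarm'
  Properties:
    AlarmActions:
    - {'Fn::ImportValue': !Sub '${ParentAlertStack}-TopicARN'}
    AlarmDescription: 'ElasticsearchRequests are too high.'
    ComparisonOperator: GreaterThanThreshold
    Dimensions:
    - Name: ClientId
      Value: !Ref 'AWS::AccountId'
    - Name: DomainName
      Value: !Ref ElasticsearchDomain
    EvaluationPeriods: 1
    MetricName: 'ElasticsearchRequests'
    Namespace: 'AWS/ES'
    OKActions:
    - {'Fn::ImportValue': !Sub '${ParentAlertStack}-TopicARN'}
    Period: 60
    # Sum adds up all requests reported within each 60-second period,
    # so the alarm fires when more than 5000 requests arrive in one minute.
    Statistic: Sum
    Threshold: 5000
```

Maximum would compare only the largest single data point in the period against the threshold, and SampleCount would count data points rather than requests, so neither matches "more than 5000 requests per minute".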

Related

Query Cloud Watch Metrics on basis of Dimensions using Cloud Formation

I have installed the CloudWatch agent on my EC2 Linux machine and am receiving the disk_used_percent metric for each partition. I want to create a CloudWatch alarm on only one partition. I'm getting the following dimensions for each metric:
Instance name, InstanceId, ImageId, device, fstype, path, Metric name
Now I want to create an alarm using CW where,
Namespace: CWAgent
Metric name: disk_used_percent
InstanceId: X
device: xvda1
I'm using the following CF code,
CloudWatchAlarm:
  Type: "AWS::CloudWatch::Alarm"
  Properties:
    AlarmName: "disk-space-threshold"
    AlarmDescription: "A Cloudwatch Alarm that triggers when disk space of EBS is less than 50%"
    MetricName: "disk_used_percent"
    Namespace: "CWAgent"
    Statistic: "Average"
    Period: "60"
    EvaluationPeriods: "1"
    Threshold: "75"
    ComparisonOperator: "GreaterThanOrEqualToThreshold"
    TreatMissingData: "missing"
    Dimensions:
      - Name: InstanceId
        Value: !Ref InstanceID
      - Name: ImageId
        Value: !Ref ImageID
      - Name: device
        Value: !Ref Device
When the alarm is created, it shows insufficient data. What could be the issue?
You can't filter by only 3 of the dimensions. CloudWatch treats each unique combination of dimensions as a separate metric, so you always have to use the full set of dimensions to identify a metric.
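As a rough sketch of what that could look like for the alarm above: the fstype and path dimensions from the question's list are added, with FsType and Path as hypothetical template parameters that do not appear in the original template. The exact set must match whatever the CloudWatch console shows for the metric, so any other dimensions listed there would need to be added as well.

```yaml
CloudWatchAlarm:
  Type: "AWS::CloudWatch::Alarm"
  Properties:
    AlarmName: "disk-space-threshold"
    MetricName: "disk_used_percent"
    Namespace: "CWAgent"
    Statistic: "Average"
    Period: 60
    EvaluationPeriods: 1
    Threshold: 75
    ComparisonOperator: "GreaterThanOrEqualToThreshold"
    TreatMissingData: "missing"
    Dimensions:
      - Name: InstanceId
        Value: !Ref InstanceID
      - Name: ImageId
        Value: !Ref ImageID
      - Name: device
        Value: !Ref Device
      # Assumed additions: FsType and Path are hypothetical parameters
      # (for example "ext4" and "/") that must match the published metric.
      - Name: fstype
        Value: !Ref FsType
      - Name: path
        Value: !Ref Path
```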

creating an alarm for sagemaker endpoint in cloudformation

I am trying to create an alarm for a SageMaker endpoint using CloudFormation. My endpoint has two variants. My CloudFormation file looks similar to the one below:
MySagemakerAlarmCPUUtilization:
  Type: 'AWS::CloudWatch::Alarm'
  Properties:
    AlarmName: MySagemakerAlarmCPUUtilization
    AlarmDescription: Monitor the CPU levels of the endpoint
    MetricName: CPUUtilization
    ComparisonOperator: GreaterThanThreshold
    Dimension:
      - Name: EndpointName
        Value: my-endpoint
      - Name: VariantName
        Value: variant1
    Namespace: AWS/SageMaker/Endpoints
    EvaluationPeriods: 1
    Period: 600
    Statistic: Average
    Threshold: 50
I am having an issue though with the dimension part. I get an invalid property error here. Does anyone know the correct syntax to look at a particular variant of an endpoint in cloud formation?
Realised I just had a typo in this. It should read Dimensions. So:
    Dimensions:
      - Name: EndpointName
        Value: my-endpoint
      - Name: VariantName
        Value: variant1
Apart from that, the code is correct, in case anyone else wants to use it.

gke cluster deployment with custom network

I am trying to create a YAML file to deploy a GKE cluster in a custom network I created. I get this error:
JSON payload received. Unknown name "network": Cannot find field.
I have tried a few names for the resources, but I am still seeing the same issue.
resources:
- name: myclus
  type: container.v1.cluster
  properties:
    network: projects/project-251012/global/networks/dev-cloud
    zone: "us-east4-a"
    cluster:
      initialClusterVersion: "1.12.9-gke.13"
      currentMasterVersion: "1.12.9-gke.13"
      ## Initial NodePool config.
      nodePools:
      - name: "myclus-pool1"
        initialNodeCount: 3
        version: "1.12.9-gke.13"
        config:
          machineType: "n1-standard-1"
          oauthScopes:
            - https://www.googleapis.com/auth/logging.write
            - https://www.googleapis.com/auth/monitoring
            - https://www.googleapis.com/auth/ndev.clouddns.readwrite
          preemptible: true
## Duplicates node pool config from v1.cluster section, to get it explicitly managed.
- name: myclus-pool1
  type: container.v1.nodePool
  properties:
    zone: us-east4-a
    clusterId: $(ref.myclus.name)
    nodePool:
      name: "myclus-pool1"
I expect it to place the cluster nodes in this network.
The network field needs to be part of the cluster spec. The top level of properties should contain just zone and cluster; network should be at the same indentation as initialClusterVersion. See the container.v1.cluster API reference page for more.
EDIT: there is some confusion in the API reference docs concerning deprecated fields. I originally offered a YAML that applies to the newer API, not the one you are using. I've updated it with the correct syntax for the basic v1 API, and further down I've added the newer API (which currently relies on gcp-types to deploy).
Your manifest should look more like:
resources:
- name: myclus
  type: container.v1.cluster
  properties:
    projectId: [project]
    zone: us-central1-f
    cluster:
      name: my-clus
      zone: us-central1-f
      network: [network_name]
      subnetwork: [subnet] ### leave this field blank if using the default network
      initialClusterVersion: "1.13"
      nodePools:
      - name: my-clus-pool1
        initialNodeCount: 0
        config:
          imageType: cos
- name: my-pool-1
  type: container.v1.nodePool
  properties:
    projectId: [project]
    zone: us-central1-f
    clusterId: $(ref.myclus.name)
    nodePool:
      name: my-clus-pool2
      initialNodeCount: 0
      version: "1.13"
      config:
        imageType: ubuntu
The newer API (which provides more functionality and allows you to use more features including the v1beta1 API and beta features) would look something like this:
resources:
- name: myclus
  type: gcp-types/container-v1:projects.locations.clusters
  properties:
    parent: projects/shared-vpc-231717/locations/us-central1-f
    cluster:
      name: my-clus
      zone: us-central1-f
      network: shared-vpc
      subnetwork: local-only ### leave this field blank if using the default network
      initialClusterVersion: "1.13"
      nodePools:
      - name: my-clus-pool1
        initialNodeCount: 0
        config:
          imageType: cos
- name: my-pool-2
  type: gcp-types/container-v1:projects.locations.clusters.nodePools
  properties:
    parent: projects/shared-vpc-231717/locations/us-central1-f/clusters/$(ref.myclus.name)
    nodePool:
      name: my-clus-separate-pool
      initialNodeCount: 0
      version: "1.13"
      config:
        imageType: ubuntu
Another note: you may want to modify your scopes. The current scopes will not allow you to pull images from gcr.io, so some system pods may not spin up properly, and if you are using Google's repository you will be unable to pull those images.
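One possible adjustment, sketched as a fragment of the node pool's config block rather than a complete manifest. It assumes that read-only Cloud Storage access (the storage behind gcr.io) is the missing piece; the devstorage.read_only line is the only addition to the scopes from the question.

```yaml
config:
  machineType: "n1-standard-1"
  oauthScopes:
    - https://www.googleapis.com/auth/logging.write
    - https://www.googleapis.com/auth/monitoring
    - https://www.googleapis.com/auth/ndev.clouddns.readwrite
    # Assumed addition: read-only Cloud Storage scope so nodes can pull from gcr.io.
    - https://www.googleapis.com/auth/devstorage.read_only
  preemptible: true
```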
Finally, you don't want to define the same node pool both in the cluster spec and as a separate resource. Instead, create the cluster with a basic (default) node pool, and create all additional node pools as separate resources so you can manage them without going through the cluster. There are very few updates you can perform on a node pool, aside from resizing.

504 Gateway Timeout using Application Load Balancer in ECS

I am deploying a Laravel web application on ECS and use an Application Load Balancer to enable autoscaling. The application worked (and scaled) perfectly until I introduced a heavyweight page, at which point I started getting 504 Gateway Timeout errors after a minute or so.
I am pretty sure the web server itself has a higher timeout (this never happens when the application is tested locally), so the problem must be related to the AWS environment (ECS / ALB).
Below is a snippet of the ALB settings:
AdminLoadBalancer:
  Type: AWS::ElasticLoadBalancingV2::LoadBalancer
  Properties:
    SecurityGroups:
      - !Ref 'AlbSecurityGroup'
    Subnets:
      - !Ref 'PublicSubnetAz1'
      - !Ref 'PublicSubnetAz2'
    Scheme: internet-facing
    Name: !Join ['-', [!Ref 'AWS::StackName', 'lb']]
After some attempts, I solved the issue by setting the idle timeout attribute of the load balancer, as explained here; nothing was wrong with the individual ECS tasks. In CloudFormation, it was enough to add the attribute and double its default value (60 seconds):
AdminLoadBalancer:
  Type: AWS::ElasticLoadBalancingV2::LoadBalancer
  Properties:
    LoadBalancerAttributes:
      - Key: 'idle_timeout.timeout_seconds'
        Value: 120
    SecurityGroups:
      - !Ref 'AlbSecurityGroup'
    Subnets:
      - !Ref 'PublicSubnetAz1'
      - !Ref 'PublicSubnetAz2'
    Scheme: internet-facing
    Name: !Join ['-', [!Ref 'AWS::StackName', 'lb']]

Cannot access restApiId & restApiRootResourceId for cross stack reference in serverless yml

Because I ran into the 200-resource limit error, I split my service into several services connected by cross-stack references. The issue is that I cannot supply restApiId and restApiRootResourceId dynamically. Right now I am setting the IDs statically in service-2.
Basically the service-1 looks like,
provider:
  name: aws
  runtime: nodejs8.10
  apiGateway:
    restApiId:
      Ref: ApiGatewayRestApi
    restApiResources:
      Fn::GetAtt:
        - ApiGatewayRestApi
        - RootResourceId
custom:
  stage: "${opt:stage, self:provider.stage}"
resources:
  Resources:
    ApiGatewayRestApi:
      Type: AWS::ApiGateway::RestApi
      Properties:
        Name: ${self:service}-${self:custom.stage}-1
  Outputs:
    ApiGatewayRestApiId:
      Value:
        Ref: ApiGatewayRestApi
      Export:
        Name: ApiGatewayRestApi-restApiId
    ApiGatewayRestApiRootResourceId:
      Value:
        Fn::GetAtt:
          - ApiGatewayRestApi
          - RootResourceId
      Export:
        Name: ApiGatewayRestApi-rootResourceId
And the service-2 looks like this,
provider:
  name: aws
  runtime: nodejs8.10
  apiGateway-shared:
    restApiId:
      'Fn::ImportValue': ApiGatewayRestApi-restApiId
    restApiRootResourceId:
      'Fn::ImportValue': ApiGatewayRestApi-rootResourceId
With the service-2 config above, I cannot reference the IDs.
FYI: the two services are in different files.
So what's wrong with this approach?
Serverless has a special syntax for accessing stack output variables: ${cf:stackName.outputKey}.
Note that using the Fn::ImportValue would work inside the resources section.
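A minimal sketch of how service-2 might use that syntax. It assumes service-1 is deployed as a stack named service-1-<stage> (a hypothetical name; substitute the real stack name) and refers to the output keys ApiGatewayRestApiId and ApiGatewayRestApiRootResourceId from service-1's Outputs section rather than the export names. The default stage 'dev' is also an assumption.

```yaml
# service-2/serverless.yml (sketch)
provider:
  name: aws
  runtime: nodejs8.10
  apiGateway:
    # ${cf:<stackName>.<outputKey>} reads an output of an already deployed stack.
    restApiId: ${cf:service-1-${self:custom.stage}.ApiGatewayRestApiId}
    restApiRootResourceId: ${cf:service-1-${self:custom.stage}.ApiGatewayRestApiRootResourceId}
custom:
  stage: "${opt:stage, 'dev'}"
```

Because the values are resolved at deploy time, service-1 has to be deployed before service-2.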