I provision AWS Elasticsearch service with Terraform and want to setup CloudWatch alarms for some metrics like CPU Usage etc. also by using Terraform.
In order to do it I have to put NodeId to aws_cloudwatch_metric_alarm resource:
The problem is that aws_elasticsearch_domain resource doesn't have suitable Attributes Reference
And I also haven't found anything suitable in aws es cli
https://docs.aws.amazon.com/cli/latest/reference/es/index.html
Any ideas how to get this NodeId to use in Terraform?
You can get the nodeId from elasticsearch api instead of relying on aws sdk/cli.
Specifically, you can query the cat/nodes api.
Link for reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/cat-nodes.html
Related
I have a specific namespace that I am trying to output logs into Amazon S3 using Banzai cloud Logging operator.I Am currently using ClusterFlow and ClusterOutput but it is not working for me and not able to see the logs in S3 bucket.
Here are the images of the code that I have:
https://imgur.com/a/s6MBpML
I have tried reaching out to the Banzai Cloud Slack to no avail. Can someone help me check the yaml files and validate them?
You should be using Flow object and not ClusterFlow.
The Flow is a namespaced resource, which means logs will only be collected from the namespace that the Flow is deployed in. Where as ClusterFlow is scoped at the cluster level and can configure log collection across all namespaces.
I have a simple spring boot application deployed on Kubernetes on GCP. The service is exposed to an external IP address. I am load testing this application using JMeter. It is just a http GET request which returns True or False.
I want to get the latency metrics with time to feed it to HorizontalPodAutoscaler to implement custom auto-scaler. How do I implement this?
Since you mentioned Custom Auto Scaler. I would suggest this simple solution which makes use of some of tools which you already might have.
First Part: Is to Create a service or cron or any time-based trigger which will on a regular interval make requests to your deployed application. This application will then store the resultant metrics to persistence storage or file or Database etc.
For example, if you use a simple Apache Benchmark CLI tool(you can also use Jmeter or any other load testing tool which generates structured o/p), You will get a detailed result for a single query. Use this link to get around the result for your reference.
Second Part Is that this same script can also trigger another event which will check for the latency or response time limit configured as per your requirement. If the response time is above the configured value scale if it is below scale down.
The logic for scaling down can be more trivial, But I will leave that to you.
Now for actually scaling the deployment, you can use the Kubernetes API. You can refer to the official doc or this answer for details. Here's a simple flow diagram.
There are two ways to auto scale with custom metrics:
1.You can export a custom metric from every Pod in the Deployment and target the average value per Pod.
2.You can export a custom metric from a single Pod outside of the Deployment and target the total value.
So follow these-
1. To grant GKE objects access to metrics stored in Stackdriver, you need to deploy the Custom Metrics Stackdriver Adapter. To run Custom Metrics Adapter, you must grant your user the ability to create required authorization roles by running the following command:
kubectl create clusterrolebinding cluster-admin-binding \
--clusterrole cluster-admin --user "$(gcloud config get-value account)"
To deploy adapter-
kubectl create -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter.yaml
You can export your metrics to Stackdriver either directly from your application, or by exposing them in Prometheus format and adding the Prometheus-to-Stackdriver adapter to your Pod's containers.
You can view the exported metrics from the Metrics Explorer by searching for custom/[METRIC_NAME]
Your metric needs to meet the following requirements:
Metric kind must be GAUGE
Metric type can be either DOUBLE or INT64
Metric name must start with custom.googleapis.com/ prefix, followed by a simple name
Resource type must be "gke_container"
Resource labels must include:
pod_id set to Pod UID, which can be obtained via the Downward API
container_name = ""
project_id, zone, cluster_name, which can be obtained by your application from the metadata server. To get values, you can use Google Cloud's compute metadata client.
namespace_id, instance_id, which can be set to any value.
3.Once you have exported metrics to Stackdriver, you can deploy a HPA to scale your Deployment based on the metrics.
Vie this on GitHub for additional codes
My objective is to fetch the time series of a metric for a pod running on a kubernetes cluster on GKE using the Stackdriver TimeSeries REST API.
I have ensured that Stackdriver monitoring and logging are enabled on the kubernetes cluster.
Currently, I am able to fetch the time series of all the resources available in a cluster using the following filter:
metric.type="container.googleapis.com/container/cpu/usage_time" AND resource.labels.cluster_name="<MY_CLUSTER_NAME>"
In order to fetch the time series of a given pod id, I am using the following filter:
metric.type="container.googleapis.com/container/cpu/usage_time" AND resource.labels.cluster_name="<MY_CLUSTER_NAME>" AND resource.labels.pod_id="<POD_ID>"
This filter returns an HTTP 200 OK with an empty response body. I have found the pod ID from the metadata.uid field received in the response of the following kubectl command:
kubectl get deploy -n default <SERVICE_NAME> -o yaml
However, when I use the Pod ID of a background container spawned by GKE/Stackdriver, I do get the time series values.
Since I am able to see Stackdriver metrics of my pod on the GKE UI, I believe I should also get the metric values using the REST API.
My doubts/questions are:
Am I fetching the Pod ID of my pod correctly using kubectl?
Could there be some issue with my cluster setup/service deployment due to which I'm unable to fetch the metrics?
Is there some other way in which I can get the time series of my pod using the REST APIs?
I wouldn't rely on kubectl get deploy for pod ids. I would get them with something like kubectl -n default get pods | grep <prefix-for-your-pod> | awk '{print $1}'
I don't think so, but the best way to find out is opening a support ticket with GCP if you have any doubts.
Not that I'm aware of, Stackdriver is the monitoring solution in GCP. Again, you can check with GCP support. There are other tools that you can use to get metrics from Kubernetes like Prometheus. There are multiple guides on the web on how to set it up with Grafana on k8s. This is one for example.
Hope it helps!
Am I fetching the Pod ID of my pod correctly using kubectl?
You could use JSONpath as output with kubectl, in this case iterating over the Pods and fetching the metadata.name and metadata.uid fields:
kubectl get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.uid}{"\n"}{end}'
which will output something like this:
nginx-65899c769f-2j775 d4fr5t6-bc2f-11e8-81e8-42010a84011f
nginx2-77b5c9d48c-7qlps 4f5gh6r-bc37-11e8-81e8-42010a84011f
Could there be some issue with my cluster setup/service deployment due to which I'm unable to fetch the metrics?
As #Rico mentioned in his answer, contacting the GCP support could be a way forward if you don't get further with the troubleshooting, see below.
Is there some other way in which I can get the time series of my pod using the REST APIs?
You could use the APIs Explorer or the Metrics Explorer from within the Stackdriver portal. There's some good troubleshooting tips here with a link to the APIs Explorer. In the Stackdriver Metrics Explorer it's fairly easy to reassemble the filter you've used using dropdown lists to choose e.g. a particular pod_id.
Taken from the Troubleshooting the Monitoring guide (linked above) regarding an empty HTTP 200 response on filtered queries:
If your API call returns status code 200 and an empty response, there
are several possibilities:
If your call uses a filter, then the filter might not have matched anything. The filter match is case-sensitive. To resolve filter
problems, start by specifying only one filter component, such as
metric.type, and see if you get results. Add the other filter
components one-by-one.
If you are working with a custom metric, you might not have specified the project where your custom metric is defined.*
I found this link when reading through the documentation of the Monitoring API. That link will get you to the APIs Explorer with some pre-filled fields, change these accordingly and add your own filter.
I have not tested more using the REST API at the moment but hopefully this could get you forward.
I've followed the instructions to create an EKS cluster in AWS using Terraform.
https://www.terraform.io/docs/providers/aws/guides/eks-getting-started.html
I've also copied the output for connecting to the cluster to ~/.kube/config-eks. I've verified this successfully works as I've been able to connect to the cluster and manually deploy containers. However, now i'm trying to use the Terraform Kubernetes provider to connect to the cluster but cannot seem to be able to configure the provider properly.
I've configured the provider to use my kubectl configuration but when attempting to push a simple configmap, i get an error stating the following:
configmaps is forbidden: User "system:anonymous" cannot create configmaps in the namespace "kube-system"
I know that the provider is picking up part of the configuration but I cannot seem to get it to authenticate. I suspect this is because EKS uses heptio for authentication and i'm not sure if the K8s Go client used by Terraform can support heptio. However, given that Terraform released their AWS EKS support when EKS went GA, I'd doubt that they wouldn't also update their Terraform provider to work with it.
Is it possible to even do this now? Are there alternatives?
Exec auth was added here: https://github.com/kubernetes/client-go/commit/19c591bac28a94ca793a2f18a0cf0f2e800fad04
This is what is utilized for custom authentication plugins and was published Feb 7th.
Right now, Terraform doesn't support the new exec-based authentication provider, but there is an issue open with a workaround: https://github.com/terraform-providers/terraform-provider-kubernetes/issues/161
That said, if I get some free time I will work on a PR.
I am creating a grafana dashboard to show metrices.
Is there any keyword which i can use in my query to check my service is running or not.
I am using prometheus to retrive data from my api for the metrics creation.
You can create a query like so
count_scalar(container_last_seen{name=<container_name>})
That will give you the count of how many containers are running with that name