We are running nearly 100 instances in Production for kubernetes cluster and using prometheus server to create Grafana dashboard. To monitor the disk usage , below query is used
(sum(node_filesystem_size_bytes{instance=~"$Instance"}) - sum(node_filesystem_free_bytes{instance=~"$Instance"})) / sum(node_filesystem_size_bytes{instance=~"$Instance"})
As Instance ip is getting replaced and we are using nearly 80 instances, I am getting error as "Request URI too large".Can someone help to fix this issue
You only need to specify the instances once and use the on matching operator to get their matching series:
(sum(node_filesystem_size_bytes{instance=~"$Instance"})
- on(instance) sum(node_filesystem_free_bytes))
/ on(instance) sum(node_filesystem_size_bytes)
Consider also adding a unifying label to your time series so you can do something like ...{instance_type="group-A"} instead of explicitly specifying instances.
Related
I am trying to do MongoDB on AWS following the AWS deployment guide. It is defaulted to boot up m5.large EC2s. However, I am only experimenting so I want to use a free tier EC2. When I add t2.micro to the allowed values and set it as default I get an error as pictured below.
Is there anyway I can get MongoDB running on AWS with 3 replications using the cloudformation method with free tier t2.micro instances.? If not, any better methods?
The MongoDB on AWS - Quick Start has multiple templates that are deployed.
I notice that the NodeInstanceType is used and defined in multiple templates, presumably with the values passed from the master template to the node templates. Therefore, your changes will probably need to be made on any template that defines the NodeInstanceType parameter. I recommend you check all of the templates for such references.
While maintaining a simple set of a Grafana and a Prometheus pod, plus a few others, within a cluster on Azure Kubernetes Services (AKS)—I ran into an issue where a "instance" variable query set up in Grafana only retrieves any data from several, out of 52, VM's/instances in a Prometheus scrape job.
There are a couple other Grafana variables that can arbitrarily nest / be nested inside of the "instance" variable, but switching the order of them did nothing.
To elaborate, this is bizarre because I am getting perfect data for 11 out of the 52 VM's/instances while the others are not being populated in the "instance" Grafana variable as they should be. Maybe this is a backend issue, but have not found any oddities while probing around with kubectl.
Thank you!
All the 52 instances were not sending the windows_iis_requests_total metric.Hence the label_values gave the incomplete response.
I am creating a redshift cluster using CF and then I need to output the cluster status (basically if its available or not). There are ways to output the endpoints and port but I could not find any possible way of outputting the status.
How can I get that, or it is not possible ?
You are correct. According to AWS::Redshift::Cluster - AWS CloudFormation, the only available outputs are Endpoint.Address and Endpoint.Port.
Status is not something that you'd normally want to output from CloudFormation because the value changes.
If you really want to wait until the cluster is available, you could create a WaitCondition and then have something monitor the status and the signal for the Wait Condition to continue. This would probably need to be an Amazon EC2 instance with some User Data. Linux instances are charged per-second, so this would be quite feasible.
I run kubernetes cluster in AWS, CoreOS-stable-1745.6.0-hvm (ami-401f5e38), all deployed by kops 1.9.1 / terraform.
etcd_version = "3.2.17"
k8s_version = "1.10.2"
This Prometheus alert method=QGET alertname=HighNumberOfFailedHTTPRequests is coming from coreos kube-prometheus monitoring bundle. The alert started to fire from the very beginning of the cluster lifetime and now exists for ~3 weeks without visible impact.
^ QGET fails - 33% requests.
NOTE: I have the 2nd cluster in other region built from scratch on the same versions and it has exact same behavior. So it's reproducible.
Anyone knows what might be the root cause, and what's the impact if ignored further?
EDIT:
Later I found this GH issue which describes my case precisely: https://github.com/coreos/etcd/issues/9596
From CoreOS documentation:
For alerts to not appear on arbitrary events it is typically better not to alert directly on a raw value that was sampled, but rather by aggregating and defining a relative threshold rather than a hardcoded value. For example: send a warning if 1% of the HTTP requests fail, instead of sending a warning if 300 requests failed within the last five minutes. A static value would also require a change whenever your traffic volume changes.
Here you can find detailed information on how to Develop Prometheus alerts for etcd.
I got the explanation in GitHub issue thread.
HTTP metrics/alerts should be replaced with GRPC.
I want to track Akka actor's metrics and for that I am using Kamon a JVM monitoring tool, which requires a backend service to post it's stats data so for this purpose I've decided to use open source StatsD with the combination of Grafana & Graphite. Here is the Grafana image which I ran in the docker (with the help of docker tool since I am on Mac), everything thing is working fine. I am able to see Grafana UI screen but its showing some random data in the graphs, may be these are example graphs. Now I am struggling on how to configure it with my own datasource. If anybody here had same experience in the past, can help me? Any kind of help would be appreciated.
The random graphs you are seeing are the default grafana test datasource.
You first need to configure a new datasource that points at the Graphite metrics. The important thing to realise here is that the URL to the Graphite datasource from Grafana is located within the same Docker container i.e. the localhost.
If you set up a new datasource with the following properties:
Name: graphite
Default: checked
Type: Graphite
URL: http://localhost:8000
Access: proxy
You should then have a datasource that points to the Graphite metric data within the Docker container.
Note - the default username/password for the Grafana UI is admin/admin.
Hope this helps.