Alerting/Azure Monitor: “tsdb.HandleRequest() response error &{Request failed status: 400 Bad Request A 0xc001403600 [] [] []}” - grafana

I am using a Grafana dashboard for Azure Monitor for Containers - Metrics, but while creating the alert for CPU utilization in the Kubernetes cluster, I get the error mentioned above.
Here is the graph for the CPU utilization :
I am also attaching the condition I am using to create the alert:
I am not sure what I am doing wrong here. Please advise!
Thank you

The graph of CPU utilization and the configuration to create the alert look correct. Going by the error message (request failed status: 400 Bad Request), the alert notification settings might have an issue (most likely the server expects the data in a different format and is rejecting the request with a 400).
Grafana documentation for alerts - link
Which notification mode are you using: a predefined one or a custom webhook-based notification?
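If it is a webhook, one way to narrow this down is to send a test payload directly to the webhook endpoint and see whether it is the one returning the 400. This is only a sketch: the field names below follow the legacy Grafana webhook notification format, and the payload contents are assumptions for illustration.

```python
import json
import urllib.request

def build_test_payload():
    # Minimal alert-shaped payload; field names mimic Grafana's legacy
    # webhook format (title/state/message/evalMatches) -- assumed here.
    return {
        "title": "[Alerting] CPU utilization alert",
        "state": "alerting",
        "message": "Test notification",
        "evalMatches": [{"metric": "cpu", "value": 95.0, "tags": {}}],
    }

def send_test(webhook_url):
    """POST the test payload; urlopen raises HTTPError if the server returns 400."""
    body = json.dumps(build_test_payload()).encode()
    req = urllib.request.Request(
        webhook_url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

If the webhook endpoint rejects this payload with a 400, the notification channel (rather than the Azure Monitor query) is the likely culprit.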

Related

Swagger call fails - net::ERR_EMPTY_RESPONSE

I have defined Swagger for my REST APIs, and the Swagger explorer is working fine. One of the APIs does a bulk update of DynamoDB records, which takes around 3-4 minutes to complete. For this API, the Swagger call runs for some time and then reports:
Response Code
0
Response Headers
{
"error": "no response from server"
}
When I checked the logs, there was no exception from Swagger. The update operation runs in the background and completes processing in 3-4 minutes; I have verified this using logs and a metric datapoint. I want Swagger to keep waiting (or show continuous activity) instead of timing out.
After inspecting the page in the browser dev tools, it says
net::ERR_EMPTY_RESPONSE
Is there any way to update the Swagger timeout, or anything else I can do?
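Rather than stretching browser and proxy timeouts to cover a 3-4 minute request, a common workaround is to make the endpoint asynchronous: return immediately with a job id (the HTTP 202 Accepted pattern) and let the client poll a status endpoint. A minimal framework-agnostic sketch of that pattern, with hypothetical names (`start_bulk_update`, `job_status`) and a short sleep standing in for the real DynamoDB work:

```python
import threading
import time
import uuid

jobs = {}  # job_id -> status; a real service would use a shared store

def start_bulk_update(records):
    """Kick off the long-running update and return a job id immediately.

    The HTTP handler would wrap this and respond 202 Accepted with the id.
    """
    job_id = str(uuid.uuid4())
    jobs[job_id] = "running"

    def worker():
        time.sleep(0.1)  # stand-in for the 3-4 minute DynamoDB bulk update
        jobs[job_id] = "done"

    threading.Thread(target=worker, daemon=True).start()
    return job_id

def job_status(job_id):
    """Backs a polling endpoint such as GET /jobs/{id}."""
    return jobs.get(job_id, "unknown")
```

With this shape, the Swagger call finishes in milliseconds, and the UI (or any client) polls the status endpoint until the job reports "done", so no timeout is ever hit.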

Is there an endpoint in the HAProxy Data Plane API that gives the current file version?

I am using the Data Plane API to start a transaction. I notice the top of my haproxy file looks like this.
# _version=130
When I start my app to consume this API, I read that value to base my transaction version on. However, sometimes it gets stuck and returns the following error:
{
status: 409,
text: '{"code":409,"message":"15: Version mismatch, transaction version: 129, configured version: 130"}\n',
method: 'PUT',
path: '/v1/services/haproxy/transactions/5d0298aa-038e-44d1-9381-f8db0612d9ea'
}
It seems that the Data Plane API sidecar process does not stay in sync with the current values inside the active haproxy.cfg file on the system. However, after scouring the API's Swagger (OpenAPI) file, I am unable to locate any method to get the actual version to use when starting and committing a transaction.
Has anybody else run into this issue?
Have you noticed that it only happens when a transaction rollback is issued? I have... so far...
I believe performing a GET request on any configuration endpoint will return the current version in the _version field. For example:
# curl --user <user>:<password> http://localhost:10000/v1/services/haproxy/configuration/frontends
{"_version":2,"data":[{"name":"fe_main"},{"http-use-htx":"enabled","name":"fe_stats"},{"http-use-htx":"enabled","name":"stats"}]}

Exclude path from Google Cloud Endpoints

Main question:
Can we exclude a path from the cloud endpoint statistics/monitoring while still allowing traffic to our actual backend?
Explanation:
We have a backend running on Kubernetes and are now trying out Google Cloud Endpoints. We added the EPS container to the pod in front of the backend container. As we do everywhere else, we also use health checks in Kubernetes and from the Google (L7) LoadBalancer in front. In order to have the health check reach our backend, it has to be defined in the openapi yaml file used by the EPS container, e.g.:
...
paths:
  "/_ah/health":
    get:
      operationId: "OkStatus"
      security: []
      responses:
        200:
          description: "Ok message"
...
The issue with this is that these requests muddle the monitoring/tracing/statistics for our actual API. The latency numbers registered by the Cloud Endpoint are useless: they show a 50th percentile of 2ms and a 95th percentile of 20s because of the high fraction of health-check traffic. The actual requests taking 20+ seconds show up as a marginal fraction, since the health checks fire multiple times each second and take 2ms each. Because these health checks are steady traffic making up 90% of all requests, the actually relevant requests appear as the 'exceptions' in the margin.
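The skew can be reproduced with a quick back-of-the-envelope calculation (a sketch; the 90% / 2ms / 20s figures are taken from the description above):

```python
# 90% health checks at 2ms drown out the 10% of real requests at 20s.
latencies_ms = [2] * 900 + [20_000] * 100

def percentile(data, p):
    """Nearest-rank percentile over a sorted copy of the data."""
    s = sorted(data)
    return s[min(len(s) - 1, int(p / 100 * len(s)))]

p50 = percentile(latencies_ms, 50)  # dominated by health checks
p95 = percentile(latencies_ms, 95)  # jumps straight to the slow real requests
```

With this mix, p50 lands at 2ms and p95 at 20s, exactly the useless spread described above: the distribution says nothing about the real API's typical latency.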
Therefore, we'd like to exclude this health traffic from the endpoint statistics, but keep the health check functional.
I have not found anything for this in the documentation, nor any solution on the web somewhere else.
Possible alternate solution
We can add an extra service to our Kubernetes setup reaching directly our backend only used for the health check. Problems with this are:
Extra k8s service, configuration, firewall rules ... required
We do not health check the actual setup. If the EPS container fails to direct traffic to our backend, this will go unnoticed.
We encrypt traffic between the loadbalancer and backends with SSL, but our actual backend should now need an extra ssl-aware webserver in between for this. For this health check without actual data, this is a minor issue, but still would mean an exception to the rule.
We could add an additional health check for the EPS container as well. But since this should not show up in the stats, it would amount to doing a request for a non-defined path and checking that the response is the EPS response for that case:
{"code": 5,
"message": "Method does not exist.",
"details": [{
"#type": "type.googleapis.com/google.rpc.DebugInfo",
"stackEntries": [],
"detail": "service_control"
}]
}
This is not ideal either. It does check that the container is running at the very least, but it's more of an 'it's not down' than an 'it's working' approach, so a lot of other issues will go unnoticed.
Google Cloud Endpoints doesn't support excluding a path from reporting statistics/monitoring yet. It's something that is on the radar and being actively looked at.
In the meantime, your alternate solution would work as a stop-gap, but with the downsides that you posted.

Error requests are not logged in the server for load test in Jmeter with 1000 users

I have done JMeter load testing for APIs and got error percentages above 1000 users. When the developer checks whether the JMeter run hits the API, he reports that only successful requests are logged on the server and error hits are not. Please let me know the answer.
Try setting these properties in your jmeter.properties or user.properties file and enabling them; they are false by default. Hope this helps:
jmeter.save.saveservice.response_data.on_error=true
jmeter.save.saveservice.response_message=true
jmeter.save.saveservice.assertion_results_failure_message=true

Fiware response 503 - Service unavailable

We have been getting this error since 11:30 GMT+2 when trying to request information from this URL: http://orion.lab.fi-ware.org:1026/ngsi10/queryContext?limit='+str(limit)+'&details=on
503 - Service Unavailable
Error in IDM communication
Later this morning it worked fine.
Thank you
If the service got back up during the morning, it doesn't seem to be a major problem. These kinds of glitches happen from time to time.