How does Istio's sampling rate work with errors? - kubernetes

My question is about Istio in Kubernetes. I have an Istio sampling rate of 1%, and an error occurs in a request that is not included in that 1%. Would I see a trace for this error in Jaeger?
I am kind of new to Kubernetes and Istio, which is why I haven't been able to test this on my own. I have been playing with Istio's Bookinfo example application, and I wonder whether I would see a trace for an error that is not included in the 1% sampling rate.
I configured Istio at install time with:
pilot.traceSampling=1
In short, I want to know whether I can see an error that is not included in the sampled traces. If not, how can I configure Istio so that I can see it, if that is possible?

If the sampling rate is set to 1%, only about 1 in 100 requests is traced, so you should expect an error to show up in Jaeger only after it has occurred roughly 100 times.
This is mentioned in Distributed Tracing - Jaeger:
To see trace data, you must send requests to your service. The number of requests depends on Istio’s sampling rate. You set this rate when you install Istio. The default sampling rate is 1%. You need to send at least 100 requests before the first trace is visible. To send a 100 requests to the productpage service, use the following command:
$ for i in `seq 1 100`; do curl -s -o /dev/null http://$GATEWAY_URL/productpage; done
If you are not seeing the error with the current sample, I would advise raising the sampling rate.
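The same value can be raised at install time. As a rough sketch for the Helm-based install the question refers to (the chart path and flags vary between Istio and Helm versions, so treat this as an assumption to verify against your setup):
$ helm template install/kubernetes/helm/istio --name istio --namespace istio-system --set pilot.traceSampling=100 | kubectl apply -f -
With pilot.traceSampling=100 every request is traced, so an error should produce a trace the first time it occurs.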
You can read about Tracing context propagation, which is handled by Envoy.
Envoy automatically sends spans to tracing collectors
Alternatively the trace context can be manually propagated by the service:
When using the LightStep tracer, Envoy relies on the service to propagate the x-ot-span-context HTTP header while sending HTTP requests to other services.
When using the Zipkin tracer, Envoy relies on the service to propagate the B3 HTTP headers ( x-b3-traceid, x-b3-spanid, x-b3-parentspanid, x-b3-sampled, and x-b3-flags). The x-b3-sampled header can also be supplied by an external client to either enable or disable tracing for a particular request. In addition, the single b3 header propagation format is supported, which is a more compressed format.
When using the Datadog tracer, Envoy relies on the service to propagate the Datadog-specific HTTP headers ( x-datadog-trace-id, x-datadog-parent-id, x-datadog-sampling-priority).
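As the Zipkin note above points out, an external client can also opt a single request into tracing by supplying x-b3-sampled itself. For example, against the Bookinfo gateway used earlier (this relies on the behaviour described in the quote, so verify it with your tracer):
$ curl -s -o /dev/null -H "x-b3-sampled: 1" http://$GATEWAY_URL/productpage
This is a handy way to capture a trace for a request you expect to fail without raising the global sampling rate.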

Related

Monitor smart power plug with Prometheus / Grafana

I barely managed to set up Prometheus & Grafana on my new Raspberry Pi (running Raspbian). Now I would like to monitor a smart power plug with a REST API. That means I could send a curl command and receive some data back:
$ curl --location --request GET '[Switch IP]/report'
{
  "power": 35.804927825927734,
  "relay": true,
  "temperature": 21.369983673095703
}
However I am at a loss as to how to get this data automagically queried and parsed by Prometheus. My Google Fu is failing me as all the results explain how to query Prometheus. Any hints would be greatly appreciated.
It's non-trivial, unfortunately.
Prometheus "scrapes" HTTP endpoints and expects these to publish metrics using Prometheus' exposition format. This is a simple text format that lists metrics with their values. I was unable to find a good example.
You would need an "exporter" that interacts with your device, creates metrics (in the Prometheus format), and publishes them on an HTTP endpoint (not REST, just a simple text page).
Then, you'd point the Prometheus server at this exporter's endpoint and Prometheus would periodically read the metrics representing your device and enable you to interact with the results.
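On the Prometheus side this is just one more scrape job in prometheus.yml pointing at the exporter; the port (9101) is an arbitrary choice for the hypothetical exporter:
scrape_configs:
  - job_name: 'smartplug'
    static_configs:
      - targets: ['localhost:9101']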
There are a few possible approaches to make this a bit more straightforward:
https://github.com/ricoberger/script_exporter
https://github.com/grafana/agent/issues/1371 — discussing a possible script_exporter integration
https://github.com/prometheus/pushgateway — Prometheus’ push gateway
https://github.com/prometheus/blackbox_exporter — Prometheus’ blackbox exporter
https://medium.com/avmconsulting-blog/pushing-bash-script-result-to-prometheus-using-pushgateway-a0760cd261e — this post shows something similar; a condensed sketch of the same idea follows below
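Condensing the Pushgateway idea from the links above into a shell sketch: a small script run from cron could poll the plug with curl, pull the fields out with jq, and push them in the exposition format. The plug address, Pushgateway address, and metric names are all assumptions:
#!/bin/sh
# Poll the plug's /report endpoint and push the readings to a Pushgateway.
PLUG=http://192.168.1.50       # assumed smart plug address
PUSHGW=http://localhost:9091   # assumed Pushgateway address

report=$(curl -s "$PLUG/report")
power=$(echo "$report" | jq '.power')
temp=$(echo "$report" | jq '.temperature')

# The Pushgateway accepts the plain exposition format on POST/PUT.
cat <<EOF | curl -s --data-binary @- "$PUSHGW/metrics/job/smartplug"
smartplug_power_watts $power
smartplug_temperature_celsius $temp
EOF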

Configuring wait time of SOAP request node and SOAP input node in IIB

I am using IIB V10. Can we increase the max client wait time of the SOAP Input node beyond 180 seconds (the default) in IIB? Also, can we configure the request timeout of the SOAP Request node from 120 seconds (the default) to a higher number?
The IIB documentation describes these timeouts in detail here:
maxClientWaitTime of SOAP Input node.
requestTimeout of SOAP Request node.
You can configure these values either directly in the flow as properties of the nodes or via BAR overrides before the deployment.
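As a sketch of the BAR override route, the values are typically changed with mqsiapplybaroverride before deployment; the BAR, flow, and node names below are invented, and the exact override syntax should be checked against the IIB V10 documentation:
$ mqsiapplybaroverride -b MyService.bar -o MyService.overridden.bar -m "MyFlow#SOAP Input.maxClientWaitTime=300"
The requestTimeout of the SOAP Request node can be overridden in the same way.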
There is also a general chapter called Configuring message flows to process timeouts, which describes the timeout handling of these synchronous nodes.

Distributed tracing in Istio - expected behavior when the application does NOT propagate headers

My application (hosted in a Kubernetes cluster with Istio installed) does NOT propagate distributed tracing headers (as described here). My expectation is that istio-proxy should still generate a trace (consisting of a single call) that would be visible in Jaeger, even though of course the entire chain of calls would not be stitched together. However, that doesn't appear to be the case, as I'm not seeing any calls to my application in Jaeger.
In an attempt to troubleshoot, I have tried the following:
The logs for the istio-proxy container deployed as a sidecar to my application's container look good; I can see incoming requests to the application being registered by Envoy:
kubectl logs -f helloworld-69b7f5b6f8-chp9n -c istio-proxy
[2019-01-29T21:29:18.925Z] - 444 289 45 "127.0.0.1:80" inbound|81||helloworld.default.svc.cluster.local 127.0.0.1:45930 10.244.0.54:80 10.244.0.1:33733
[2019-01-29T21:29:29.922Z] - 444 289 25065 "127.0.0.1:80" inbound|81||helloworld.default.svc.cluster.local 127.0.0.1:46014 10.244.0.54:80 10.240.0.5:56166
[2019-01-29T21:30:05.922Z] - 444 289 15051 "127.0.0.1:80" inbound|81||helloworld.default.svc.cluster.local 127.0.0.1:46240 10.244.0.54:80 10.240.0.6:48053
[2019-01-29T21:30:31.922Z] - 444 289 36 "127.0.0.1:80" inbound|81||helloworld.default.svc.cluster.local 127.0.0.1:46392 10.244.0.54:80 10.240.0.6:47009
I have enabled tracing in Mixer's configuration, and I can now see Mixer's activity in the Jaeger UI (but still no traces of calls to my application).
I'm new to Istio, and it appears I have run out of options.
First off, is my expectation correct? Am I supposed to be seeing traces - each consisting of a single call - in Jaeger UI when the application doesn't propagate distributed tracing headers?
If my expectation is correct, how can I troubleshoot further? Can I somehow verify the Envoy configuration and check that it is indeed sending tracing data to Mixer?
If my expectation is incorrect, can Istio's behavior be overridden so that I get what I need?
Thank you.

Kubernetes etcd HighNumberOfFailedHTTPRequests QGET

I run a Kubernetes cluster in AWS on CoreOS-stable-1745.6.0-hvm (ami-401f5e38), all deployed by kops 1.9.1 / Terraform.
etcd_version = "3.2.17"
k8s_version = "1.10.2"
The Prometheus alert method=QGET alertname=HighNumberOfFailedHTTPRequests comes from the CoreOS kube-prometheus monitoring bundle. The alert started firing at the very beginning of the cluster's lifetime and has now been active for ~3 weeks without any visible impact.
In the alert above, QGET fails for ~33% of requests.
NOTE: I have a second cluster in another region, built from scratch on the same versions, and it shows exactly the same behavior, so this is reproducible.
Does anyone know what the root cause might be, and what the impact is if the alert is ignored?
EDIT:
Later I found this GitHub issue, which describes my case precisely: https://github.com/coreos/etcd/issues/9596
From CoreOS documentation:
For alerts to not appear on arbitrary events it is typically better not to alert directly on a raw value that was sampled, but rather by aggregating and defining a relative threshold rather than a hardcoded value. For example: send a warning if 1% of the HTTP requests fail, instead of sending a warning if 300 requests failed within the last five minutes. A static value would also require a change whenever your traffic volume changes.
Here you can find detailed information on how to Develop Prometheus alerts for etcd.
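As an illustration of the quoted advice, a relative-threshold alert rule generally looks something like this; the metric names are placeholders rather than the actual etcd metric names:
groups:
  - name: example
    rules:
      - alert: HighRequestFailureRatio
        expr: sum(rate(http_requests_failed_total[5m])) / sum(rate(http_requests_total[5m])) > 0.01
        for: 10m
        annotations:
          summary: More than 1% of HTTP requests are failing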
In the end I got the explanation in the GitHub issue thread:
The HTTP metrics/alerts should be replaced with their gRPC equivalents.

Istio Tracing Issues

I have made a trivial 3-tier set of services, similar to the bookinfo app on the Istio site. Everything seems to work fine, except for tracing with Zipkin or Jaeger.
To clarify, I have 3 services S1, S2, and S3, all pretty similar and trivial, passing requests downstream and doing some work. I can see S1 and S2 in the trace, but not S3. I have narrowed this down a bit further: when I use Istio version 0.5.0, I can see S3 in the trace as well, but only after some time; however, with Istio version 0.5.1, I can only see S1 and S2 in the trace, even though the services are working properly and the calls propagate all the way down to S3.
The only difference I can see (and I am not sure whether it is even relevant) is this output in istio-proxy for S3 with Istio version 0.5.0, but not with 0.5.1:
"GET /readiness HTTP/1.1" 200 - 0 39 1 1 "-" "kube-probe/1.9+" "0969a5a3-f6c0-9f8e-a449-d8617c3a5f9f" "10.X.X.18:8080" "127.0.0.1:8080"
I can add the exact YAML files if needed. Also, I am not sure whether the tracing is supposed to be coming from istio-proxy as shown in the Istio docs, but in my case I do not see istio-proxy, only istio-ingress.
Trace context propagation might be missing: the services themselves have to forward the tracing headers on their downstream calls (a sketch of what that means in practice is at the end of this answer).
https://istio.io/docs/tasks/observability/distributed-tracing/overview/#trace-context-propagation
Although Istio proxies are able to automatically send spans, they need some hints to tie together the entire trace. Applications need to propagate the appropriate HTTP headers so that when the proxies send span information, the spans can be correlated correctly into a single trace.
To do this, an application needs to collect and propagate the following headers from the incoming request to any outgoing requests:
x-request-id
x-b3-traceid
x-b3-spanid
x-b3-parentspanid
x-b3-sampled
x-b3-flags
x-ot-span-context
Additionally, tracing integrations based on OpenCensus (e.g. Stackdriver) propagate the following headers:
x-cloud-trace-context
traceparent
grpc-trace-bin
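Whatever HTTP client your services use, the fix is the same: read whichever of these headers arrive on the incoming request and set them unchanged on every outgoing call. Expressed as a curl sketch for the S2-to-S3 hop (the hostname, port, and variable names are made up), it amounts to:
$ curl -s http://s3:8080/work \
    -H "x-request-id: $IN_X_REQUEST_ID" \
    -H "x-b3-traceid: $IN_X_B3_TRACEID" \
    -H "x-b3-spanid: $IN_X_B3_SPANID" \
    -H "x-b3-parentspanid: $IN_X_B3_PARENTSPANID" \
    -H "x-b3-sampled: $IN_X_B3_SAMPLED" \
    -H "x-b3-flags: $IN_X_B3_FLAGS" \
    -H "x-ot-span-context: $IN_X_OT_SPAN_CONTEXT"
Once each service forwards these headers on its downstream call, all three services should appear in a single trace.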