How to add tag to span generated by Istio, sent to Datadog backend? - kubernetes

I'm building an Istio-Datadog integration for distributed tracing.
I can see traces showing in Datadog, but I cannot associate them with the running pods, because there's no pod_name in the span. I would like to add that pod_name to spans generated by Istio, and see them in traces in Datadog. Is this possible?
When using manual tracing instrumentation (dd-tracer) in the microservices, adding a tag to span is a simple one-liner. I'd like to find an equivalent/central workaround for doing tracing via Istio.

Related

Separated logs sources for different namespaces for same application name

There is a AWS EKS cluster. datadog-agent is installed as in the guideline
I have a deployment "myapp" in two namespaces - "dev" and "qa".
But datalogs merges logs from these two apps and I can see only "myapp" in sources https://datadoghq.eu/logs.
Is there way to get logs for every namespace separately?
You can simply try to change the name of the application in the namespaces (since you are separating them). This will make them appear as different sources.
If that is not an option, you can use tags to filter the incoming logs for the namespaces. You can read more about tags here.
This is actually a very good feature, for situations where you deploy an application spanning multiple namespaces and you want aggregated logs. So, this is very useful as well.

Show more logs in kubernetes dashboard

I am using the official kubernetes dashboard in Version kubernetesui/dashboard:v2.4.0 to manage my cluster and I've noticed that, when I select a pod and look into the logs, the length of the displayed logs is quite short. It's like 50 lines or something?
If an exception occurs, the logs are pretty much useless because the original cause is hidden by lots of other lines. I would have to download the logs or shell to the kubernetes server and use kubectl logs in order to see whats going on.
Is there any way to configure the dashboard in a way so that more lines of logs get displayed?
AFAIK, it is not possible with kubernetesui/dashboard:v2.4.0. On the list of dashboard arguments that allow for customization, there is no option to change the amount of logs displayed.
As a workaround you can use Prometheus + Grafana combination or ELK kibana as separate dashboards with logs/metrics, however depending on the size and scope of your k8s cluster it might be overkill. There are also alternative k8s opensource dashboards such as skooner (formerly known as k8dash), however I am not sure if it offers more workload logs visibility.
If anyone is interested: As the feature that i was looking for does not exist yet, i have submitted a feature request in GitHub. You can see it here: https://github.com/kubernetes/dashboard/issues/6700

Fluency with forward plugin: how to add kubernetes metadata to logs

Hey i have a question.
Im using logback-more-appenders(fluency plugin) to send logs to EFK stack (fluent-bit) which is working in kubernetes cluster, but it lacks kubernetes metadata ( like node/pod names).
I know i can use <additionalField></additionalField> in logbck.xml to add Service name (because this is static), but i cannot do it to dynamic parts like node or pod name.
I tried to do it on fluent-bit side using kubernetes filter, but this works only with tail/systemd inputs not a forward one (it parses tag with filename which contains namespce and pod name). Im using forward plugin to send logs from java software to elasticsearch, and in logback.xml i cannot enter dynamic pod name (or i don't know if i can).
Any tips how i can do it? I prefer to send logs using fluency instead of sniffing host container logs.
In my case, the best i could think of was to change from forward to tail plugin with structured logging (in json).
Have you tried to Pass POD ID and NODE NAME as environment variables in logback.xml as additional fields, that you can attribute the metadata to the logevents?

What is the difference between the core os projects kube-prometheus and prometheus operator?

The github repo of Prometheus Operator https://github.com/coreos/prometheus-operator/ project says that
The Prometheus Operator makes the Prometheus configuration Kubernetes native and manages and operates Prometheus and Alertmanager clusters. It is a piece of the puzzle regarding full end-to-end monitoring.
kube-prometheus combines the Prometheus Operator with a collection of manifests to help getting started with monitoring Kubernetes itself and applications running on top of it.
Can someone elaborate this?
I've always had this exact same question/repeatedly bumped into both, but tbh reading the above answer didn't clarify it for me/I needed a short explanation. I found this github issue that just made it crystal clear to me.
https://github.com/coreos/prometheus-operator/issues/2619
Quoting nicgirault of GitHub:
At last I realized that prometheus-operator chart was packaging
kube-prometheus stack but it took me around 10 hours playing around to
realize this.
**Here's my summarized explanation:
"kube-prometheus" and "Prometheus Operator Helm Chart" both do the same thing:
Basically the Ingress/Ingress Controller Concept, applied to Metrics/Prometheus Operator.
Both are a means of easily configuring, installing, and managing a huge distributed application (Kubernetes Prometheus Stack) on Kubernetes:**
What is the Entire Kube Prometheus Stack you ask? Prometheus, Grafana, AlertManager, CRDs (Custom Resource Definitions), Prometheus Operator(software bot app), IaC Alert Rules, IaC Grafana Dashboards, IaC ServiceMonitor CRDs (which auto-generate Prometheus Metric Collection Configuration and auto hot import it into Prometheus Server)
(Also when I say easily configuring I mean 1,000-10,000++ lines of easy for humans to understand config that generates and auto manage 10,000-100,000 lines of machine config + stuff with sensible defaults + monitoring configuration self-service, distributed configuration sharding with an operator/controller to combine config + generate verbose boilerplate machine-readable config from nice human-readable config.
If they achieve the same end goal, you might ask what's the difference between them?
https://github.com/coreos/kube-prometheus
https://github.com/helm/charts/tree/master/stable/prometheus-operator
Basically, CoreOS's kube-prometheus deploys the Prometheus Stack using Ksonnet.
Prometheus Operator Helm Chart wraps kube-prometheus / achieves the same end result but with Helm.
So which one to use?
Doesn't matter + they achieve the same end result + shouldn't be crazy difficult to start with 1 and switch to the other.
Helm tends to be faster to learn/develop basic mastery of.
Ksonnet is harder to learn/develop basic mastery of, but:
it's more idempotent (better for CICD automation) (but it's only a difference of 99% idempotent vs 99.99% idempotent.)
has built-in templating which means that if you have multiple clusters you need to manage / that you want to always keep consistent with each other. Then you can leverage ksonnet's templating to manage multiple instances of the Kube Prometheus Stack (for multiple envs) using a DRY code base with lots of code reuse. (If you only have a few envs and Prometheus doesn't need to change often it's not completely unreasonable to keep 4 helm values files in sync by hand. I've also seen Jinja2 templating used to template out helm values files, but if you're going to bother with that you may as well just consider ksonnet.)
Kubernetes operator are kubernetes specific application(pods) that configure, manage and optimize other Kubernetes deployments automatically. They are implemented as a custom controller.
According to official coreOS website:
Operators were introduced by CoreOS as a class of software that operates other software, putting operational knowledge collected by humans into software.
The prometheus operator provides the easy way to deploy configure and monitor your prometheus instances on kubernetes cluster. To do so, prometheus operator introduces three types of custom resource definition(CRD) in kubernetes.
Prometheus
Alertmanager
ServiceMonitor
Now, with the help of above CRD's, you can directly create a prometheus instance by providing kind: Prometheus and the prometheus instance is ready to serve, likewise you can do for AlertManager. Without this you would have to setup the deployment for prometheus with its image, configuration and many more things.
The Prometheus Operator serves to make running Prometheus on top of Kubernetes as easy as possible, while preserving Kubernetes-native configuration options.
Now, kube-prometheus implemented the prometheus operator and provides you minimum yaml files to create your basic setup of prometheus, alertmanager and grafana by running a single command.
git clone https://github.com/coreos/prometheus-operator.git
kubectl apply -f prometheus-operator/contrib/kube-prometheus/manifests/
By running above command in kube-prometheus directory, you will get a monitoring namespace which will have an instance of alertmanager, prometheus and grafana for UI. This is enough setup for most of the basic implementation and if you need any more specifics according to your application, you can add more yamls of exporter you need.
Kube-prometheus is more of a contribution to prometheus-operator project, which implements the prometheus operator functionality very well and provide you a complete monitoring setup for your kubernetes cluster. You can start with kube-prometheus and extend the functionality of your monitoring setup according to your application from there.
You can learn more about prometheus-operator here
As of today, 28-09-2020, this is the way to install Prometheus in a Kubernetes cluster
https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack#kube-prometheus-stack
According to official documentation, kube-prometheus-stack is a rename of prometheus-operator.
As I understood, kube-prometheus-stack also has preinstalled grafana dashboards and prometheus rules.
Note: This chart was formerly named prometheus-operator chart, now
renamed to more clearly reflect that it installs the kube-prometheus
project stack, within which Prometheus Operator is only one component.
Taken from https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack
Architecturally the container runs docker
The default container logs are managed by Docker, and the default log driver uses JSON-file
log-driver": "json-file
https://docs.docker.com/config/containers/logging/configure/
If the default jSON-file is used to manage container logs, log rotation is not performed by default. Therefore, the default JSON-file log driver the log files stored by the log driver can result in a large amount of disk space for containers that generate a large amount of output, which can cause disk space to run out.
In this case, save the log to ES, store it separately, and periodically delete the index using curator kubernetes
And run a scheduled task in K8S to delete the index periodically
Another solution for disk space is to periodically delete old logs from jSON-files
Typically we set the size and number of logs
This will set up a maximum of 10 log files, each with a maximum size of 20 Mb. Therefore, the container has a maximum of 200 Mb of logs
"log-driver": "json-file", "log-opts": { "max-size": "20m", "max-file": "10" },
Note: In general, the default Docker log is placed
/var/lib/docker/containers/
But in the same case kubernetes also saves logs and creates a directory structure to help you find pods-based logs, so you can find container logs for each Pod running on a node
/var/log/pods/<namespace>_<pod_name>_<pod_id>/<container_name>/
When removing pod, / var/lib/container under the docker/containers/log and k8s created under/var/log/pods/pod log will be deleted
For example, if the POD is restarted during production, the pod log will be deleted whether it is on the original node or jumped to another node
Therefore, this log needs to be saved in ES for centralized management. Many R&D projects will check the log for troubleshooting in most cases

Fetching Stackdriver Monitoring TimeSeries data for a pod running on a k8s cluster on GKE using the REST API

My objective is to fetch the time series of a metric for a pod running on a kubernetes cluster on GKE using the Stackdriver TimeSeries REST API.
I have ensured that Stackdriver monitoring and logging are enabled on the kubernetes cluster.
Currently, I am able to fetch the time series of all the resources available in a cluster using the following filter:
metric.type="container.googleapis.com/container/cpu/usage_time" AND resource.labels.cluster_name="<MY_CLUSTER_NAME>"
In order to fetch the time series of a given pod id, I am using the following filter:
metric.type="container.googleapis.com/container/cpu/usage_time" AND resource.labels.cluster_name="<MY_CLUSTER_NAME>" AND resource.labels.pod_id="<POD_ID>"
This filter returns an HTTP 200 OK with an empty response body. I have found the pod ID from the metadata.uid field received in the response of the following kubectl command:
kubectl get deploy -n default <SERVICE_NAME> -o yaml
However, when I use the Pod ID of a background container spawned by GKE/Stackdriver, I do get the time series values.
Since I am able to see Stackdriver metrics of my pod on the GKE UI, I believe I should also get the metric values using the REST API.
My doubts/questions are:
Am I fetching the Pod ID of my pod correctly using kubectl?
Could there be some issue with my cluster setup/service deployment due to which I'm unable to fetch the metrics?
Is there some other way in which I can get the time series of my pod using the REST APIs?
I wouldn't rely on kubectl get deploy for pod ids. I would get them with something like kubectl -n default get pods | grep <prefix-for-your-pod> | awk '{print $1}'
I don't think so, but the best way to find out is opening a support ticket with GCP if you have any doubts.
Not that I'm aware of, Stackdriver is the monitoring solution in GCP. Again, you can check with GCP support. There are other tools that you can use to get metrics from Kubernetes like Prometheus. There are multiple guides on the web on how to set it up with Grafana on k8s. This is one for example.
Hope it helps!
Am I fetching the Pod ID of my pod correctly using kubectl?
You could use JSONpath as output with kubectl, in this case iterating over the Pods and fetching the metadata.name and metadata.uid fields:
kubectl get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.uid}{"\n"}{end}'
which will output something like this:
nginx-65899c769f-2j775 d4fr5t6-bc2f-11e8-81e8-42010a84011f
nginx2-77b5c9d48c-7qlps 4f5gh6r-bc37-11e8-81e8-42010a84011f
Could there be some issue with my cluster setup/service deployment due to which I'm unable to fetch the metrics?
As #Rico mentioned in his answer, contacting the GCP support could be a way forward if you don't get further with the troubleshooting, see below.
Is there some other way in which I can get the time series of my pod using the REST APIs?
You could use the APIs Explorer or the Metrics Explorer from within the Stackdriver portal. There's some good troubleshooting tips here with a link to the APIs Explorer. In the Stackdriver Metrics Explorer it's fairly easy to reassemble the filter you've used using dropdown lists to choose e.g. a particular pod_id.
Taken from the Troubleshooting the Monitoring guide (linked above) regarding an empty HTTP 200 response on filtered queries:
If your API call returns status code 200 and an empty response, there
are several possibilities:
If your call uses a filter, then the filter might not have matched anything. The filter match is case-sensitive. To resolve filter
problems, start by specifying only one filter component, such as
metric.type, and see if you get results. Add the other filter
components one-by-one.
If you are working with a custom metric, you might not have specified the project where your custom metric is defined.*
I found this link when reading through the documentation of the Monitoring API. That link will get you to the APIs Explorer with some pre-filled fields, change these accordingly and add your own filter.
I have not tested more using the REST API at the moment but hopefully this could get you forward.