I'm trying to get ECS Service Discovery working with Prometheus.
Currently my ECS container gets added to Route 53 like so:
+-----------------------------------------------+------+--------------------------------------------------------+
| Name                                          | Type | Value                                                  |
+-----------------------------------------------+------+--------------------------------------------------------+
| my-service.local.                             | SRV  | 1 1 8080 123456-7890-1234-5678-12345.my-service.local. |
| 123456-7890-1234-5678-12345.my-service.local. | A    | 10.0.11.111                                            |
+-----------------------------------------------+------+--------------------------------------------------------+
I assume that if I added more running containers to ECS, I would get more A records in Route 53 under the name 123456-7890-1234-5678-12345.my-service.local.
In my Prometheus configuration file, I have supplied the following under scrape_configs:
- job_name: 'cadvisor'
  scrape_interval: 5s
  dns_sd_configs:
    - names:
        - 'my-service.local'
      type: 'SRV'
However, when I check the target status in Prometheus, I see the following:
Endpoint: http://123456-7890-1234-5678-12345.my-service.local:8080/metrics
State: Down
Error: context deadline exceeded
I'm not familiar with how DNS service discovery works with SRV records, so I'm not sure where the problem lies exactly. Looking at how AWS ECS Service Discovery added the records, it looks like my-service.local maps to 123456-7890-1234-5678-12345.my-service.local:8080.
However, it looks like Prometheus doesn't then try to find the list of IPs mapped to 123456-7890-1234-5678-12345.my-service.local and just tries to scrape from that name directly.
Is there some configuration option that I'm missing to make this work or have I misunderstood something at a fundamental level?
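For reference, one way to check what Prometheus' DNS service discovery should see is to resolve the records by hand (a hedged sketch; it assumes you run it from a host inside the VPC that can resolve the private my-service.local zone):

# Resolve the SRV record that Prometheus is configured to query:
dig +short SRV my-service.local
# Expected (per the Route 53 table above):
#   1 1 8080 123456-7890-1234-5678-12345.my-service.local.

# At scrape time the SRV target's hostname still has to resolve to an IP,
# which can be checked the same way:
dig +short A 123456-7890-1234-5678-12345.my-service.local
# Expected: 10.0.11.111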
It turns out the issue was that I needed to add a security group rule to allow my Prometheus instance to talk to my ECS cluster, since both were in a public subnet.
Also, scaling up the desired count in the ECS cluster creates both another SRV record and an associated A record in Route 53 (not just one additional A record, as I previously thought).
Everything seems to work now.
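For anyone hitting the same thing, the fix was along these lines (a hedged sketch; the security group IDs and the scrape port are placeholders):

# Allow the ECS tasks' security group to accept traffic on the scrape port
# from the Prometheus instance's security group.
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 8080 \
  --source-group sg-0fedcba9876543210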
A fairly good alternative to using a "proper" service discovery like Consul or ECS SD with Route 53 is relying on the AWS API. This is appropriate as long as the total number of containers / tasks stays below a few thousand, since you are limited by the AWS API request cap.
A number of tools provide this functionality in combination with Prometheus file-based service discovery, for example https://pypi.org/project/prometheus-ecs-discoverer/ or https://github.com/teralytics/prometheus-ecs-discovery
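Those tools typically write the discovered ECS tasks to a targets file that Prometheus reads via file-based service discovery. A minimal sketch of the Prometheus side (the file path is an assumption; check the tool's docs for the exact output location and format):

- job_name: 'ecs-tasks'
  file_sd_configs:
    - files:
        - /etc/prometheus/ecs-targets/*.json
      refresh_interval: 1m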
I am trying to run some installation instructions for a software development environment built on top of K3S.
I am getting the error "no nodes available to schedule pods", which, when Googled, takes me to the question "no nodes available to schedule pods - Running Kubernetes Locally with No VM".
The answer to that question tells me to run kubectl get nodes.
When I do that, it shows me, perhaps not surprisingly, that I don't have any nodes running.
Without having to learn how Kubernetes actually works, how can I start some nodes and get past this error?
This is a local environment running on a single VM (just like the linked question).
It depends on how your K8s was installed. Kubernetes is a complex system requiring multiple nodes, all configured correctly, in order to function.
If there are no nodes found for scheduling, my first thought would be that you only have a single node and it's a master node (which runs the control plane services but not workloads), and that you have not attached any worker nodes. You would need to add another node to the cluster running as a worker for it to schedule workloads.
If you want to get up and running without understanding it, there are distributions such as minikube or k3s, which will set it up out of the box and are designed to run on a single machine.
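With K3s specifically, the single built-in node should register itself as both control plane and worker. A minimal sketch of getting it up and checking it, assuming the standard install script on a single VM:

# Install K3s as a single-node server; the server node also runs workloads.
curl -sfL https://get.k3s.io | sh -

# If K3s is already installed but no nodes show up, check/restart the service:
sudo systemctl status k3s
sudo systemctl restart k3s

# K3s bundles kubectl and its own kubeconfig:
sudo k3s kubectl get nodes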
So far, in our legacy deployments of web services to VM clusters, we have effectively been using Log4j2-based multi-file logging onto a persistent volume, where the log files are rolled over each day. We need to maintain logs for about 3 months before they can be purged.
We are migrating to a Kubernetes infrastructure and have been struggling with what would be the best logging strategy to adopt for Kubernetes clusters. We don't quite like the strategies that involve spitting out all logging to STDOUT/STDERR and using some centralized tool like Datadog to manage the logs.
Our Design requirements for the Kubernetes Logging Solution are:
Using Log4j2 with multiple file appenders.
We want to maintain the multi-file log appender structure.
We want to preserve the rolled-over logs in archives for about 3 months.
We need a way to have easy access to the logs for searching, filtering, etc.
The kubectl approach to viewing logs may be a bit too cumbersome for our needs.
Ideally we would like to use the Datadog dashboard approach BUT using multi-file appenders.
The serious limitation of Datadog we run into is the need for having everything pumped to STDOUT.
Starting to use container platforms, or building containers, means that as a first step we must change our mindset. Creating log files inside your containers is not a best practice, for two reasons:
Your containers should be stateless, so they should not save anything inside themselves; when a container is deleted and recreated, those files disappear.
When you send your output using passive logging (STDOUT/STDERR), Kubernetes creates the log files for you; these files can then be picked up by platforms like Fluentd or Logstash, which collect the logs and send them to a log aggregation tool.
I recommend using passive logging, which is the approach recommended by Kubernetes and the standard for cloud-native applications. You may also need to run your app on a cloud service in the future, and those services likewise rely on passive logging to surface application errors.
The following links give some references on why Kubernetes recommends passive logging:
k8s checklist best practices
Twelve Factor Applications Logging
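If you do go the passive-logging route, the Log4j2 change itself is small: swap the file appenders for a Console appender writing to STDOUT. A minimal sketch (appender name and pattern are illustrative, not prescriptive):

<!-- log4j2.xml: send everything to STDOUT so the container runtime and the
     node-level log collector (Fluentd, Logstash, ...) can pick it up -->
<Configuration status="WARN">
  <Appenders>
    <Console name="Stdout" target="SYSTEM_OUT">
      <PatternLayout pattern="%d{ISO8601} %-5level [%t] %logger{36} - %msg%n"/>
    </Console>
  </Appenders>
  <Loggers>
    <Root level="info">
      <AppenderRef ref="Stdout"/>
    </Root>
  </Loggers>
</Configuration>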
I am having some issues with log entries in Stackdriver using GKE: when a log entry is greater than 20 KB, it is split into several chunks. According to the GCP documentation, the size limit for log entries is 256 KB (https://cloud.google.com/logging/quotas). I have tested several configurations and found out something very curious: when the cluster is set up using Ubuntu nodes, the issue is seen. When I use the default node type, Container-Optimized OS (cos), Stackdriver captures the log entries correctly.
Can somebody explain the cause of this behavior? I have checked "Logging with Docker and Kubernetes. Logs more than 16k split up", and I think it could be related.
Additional information:
GKE static version: v1.14.10-gke.50
Kernel version (nodes): 4.15.0-1069-gke
OS image (nodes): Ubuntu 18.04.5 LTS
Docker version (nodes): 18.9.7
Cloud Operations for GKE: Legacy Logging and Monitoring
New feedback: I have created more clusters using different GKE versions and another "Cloud Operations for GKE" implementation (System and Workload Logging and Monitoring), and the issue is the same. Current steps to reproduce the issue:
Create a GKE cluster using Ubuntu as the node image (the GKE version does not matter).
Deploy an application that writes a log entry greater than 16 KB. I am using a Spring Boot application + Log4j 1.x.
Look for the log entry in the Stackdriver web console. The log entry is split into multiple chunks.
I see something similar happening in my GCP project when the output of one log entry is large (~17 KB). The difference is that the first log entry contains 0-40% of the full log output, the second 0-70%, and the third/last 0-100%. My app is a Spring Boot reactive application, and I use a reactive log filter.
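One way to narrow down where the splitting happens (a hedged diagnostic, not a fix): on an affected Ubuntu node, look at the raw file written by Docker's json-file log driver. If the ~16 KB chunks are already present there, the split happens in Docker before the Stackdriver agent ever sees the entry. The container ID below is a placeholder:

# Find the container on the node, then inspect its raw JSON log file.
sudo docker ps | grep <your-app>
sudo tail -n 5 /var/lib/docker/containers/<container-id>/<container-id>-json.log
# Each line is one JSON record; an over-long application log line shows up
# as several consecutive records instead of one.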
I used to be able to do iptables debugging on a Debian 9 host with specific rules in the PREROUTING and OUTPUT chains (both in the raw table) and the TRACE target, as described here. Messages showed up in /var/log/kern.log when such rules fired.
The host had the following relevant entries in its boot config file. Things apparently worked without either CONFIG_IP_NF_TARGET_LOG or CONFIG_IP6_NF_TARGET_LOG. (I am interested in IPv4 traffic.)
CONFIG_NETFILTER_XT_TARGET_TRACE=m
CONFIG_IP_NF_RAW=m
CONFIG_IP6_NF_RAW=m
# CONFIG_IP_NF_TARGET_LOG missing
# CONFIG_IP6_NF_TARGET_LOG missing
CONFIG_NETFILTER_XT_TARGET_LOG=m
I have by now upgraded the same host to Debian 10 (Buster). It uses iptables-legacy (not the default iptables-nft), for this is in the context of a Kubernetes cluster.
What I am observing is that the same rules (e.g. iptables -t raw -A PREROUTING -d $service_ip -p tcp -j TRACE; also the same with $pod_ip) apparently no longer work, in the sense that I do not see any resulting messages in /var/log/kern.log.
What could be the reason, and how can I diagnose this further? Perhaps the TRACE capability requires a different boot config (different modules) on Debian 10, or does iptables-legacy now somehow get in the way?
Now it looks as if this kind of iptables debugging does in fact still work under Debian 10 as it did previously for me under Debian 9.
Apparently I had made a mistake by installing rules for iptables debugging before recreating the targeted Kubernetes services, etc. That way the iptables rules and Kubernetes resources were out of sync with respect to cluster IPs, node ports, pod IPs, etc., and so the rules never fired for traffic to those services.
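For anyone debugging the same thing, a quick sanity check that the TRACE rules actually match (a hedged sketch; the address is a placeholder) is to watch the per-rule packet counters after recreating the Kubernetes resources:

# Add the rule against the *current* cluster IP / pod IP ...
iptables -t raw -A PREROUTING -d 10.96.0.10 -p tcp -j TRACE

# ... then confirm its packet counter increases while traffic flows:
iptables -t raw -L PREROUTING -n -v --line-numbers

# When the rule fires, the trace messages show up via the kernel log, e.g.:
grep TRACE /var/log/kern.log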
So I am exploring Jaeger for tracing, and I saw that we can send spans directly from the client to the collector over HTTP (port 14268). If so, what is the advantage of using the Jaeger agent?
When should I go with the Jaeger agent approach, and when with the direct HTTP approach? What is the disadvantage of sending directly to the collector?
From the official FAQ (https://www.jaegertracing.io/docs/latest/faq/#do-i-need-to-run-jaeger-agent):
jaeger-agent is not always necessary. Jaeger client libraries can be configured to export trace data directly to jaeger-collector. However, the following are the reasons why running jaeger-agent is recommended:
If we want SDK libraries to send trace data directly to collectors, we must provide them with a URL of the HTTP endpoint. It means that our applications require additional configuration containing this parameter, especially if we are running multiple Jaeger installations (e.g. in different availability zones or regions) and want the data sent to a nearby installation. In contrast, when using the agent, the libraries require no additional configuration because the agent is always accessible via localhost. It acts as a sidecar and proxies the requests to the appropriate collectors.
The agent can be configured to enrich the tracing data with infrastructure-specific metadata by adding extra tags to the spans, such as the current zone, region, etc. If the agent is running as a host daemon, it will be shared by all applications running on the same host. If the agent is running as a true sidecar, i.e. one per application, it can provide additional functionality such as strong authentication, multi-tenancy (see this blog post), pod name, etc.
Agents allow implementing traffic control to the collectors. If we have thousands of hosts in the data center, each running many applications, and each application sending data directly to the collectors, there may be too many open connections for each collector to handle. The agents can load balance this traffic with fewer connections.
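As an illustration of the "additional configuration" the FAQ mentions, the choice usually comes down to which standard jaeger-client environment variables you set on the application (a hedged sketch; the collector hostname is a placeholder):

# Direct to collector: the app must be given the collector's HTTP endpoint.
export JAEGER_ENDPOINT="http://jaeger-collector.tracing.svc:14268/api/traces"

# Via agent: the app only needs localhost; the agent forwards to collectors.
# export JAEGER_AGENT_HOST="localhost"
# export JAEGER_AGENT_PORT="6831"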