Need to set up a customized Kubernetes logging strategy [closed] - kubernetes

So far, in our legacy deployments of web services to VM clusters, we have effectively been using Log4j2-based multi-file logging onto a persistent volume, with the log files rolled over each day. We need to retain logs for about three months before they can be purged.
We are migrating to a Kubernetes infrastructure and have been struggling with what the best logging strategy to adopt for Kubernetes clusters would be. We don't particularly like the strategies that involve sending all logging to STDOUT/STDERR and using centralized tools like Datadog to manage the logs.
Our design requirements for the Kubernetes logging solution are:
Using Log4j2 with multiple file appenders.
We want to maintain the multi-file log appender structure.
We want to preserve the rolling logs in archives for about three months.
We need easy access to the logs for searching, filtering, etc.
The kubectl setup for viewing logs may be a bit too cumbersome for our needs.
Ideally we would like to use the Datadog dashboard approach, but with multi-file appenders.
The serious limitation we run into with Datadog is the need to have everything pumped to STDOUT.

Moving to container platforms, or building containers at all, means that the first step is a change of mindset. Creating log files inside your containers is not a best practice, for two reasons:
Your containers should be stateless, so they should not store anything inside themselves; when a container is deleted and recreated, your files disappear with it.
When you send your output using passive logging (STDOUT/STDERR), Kubernetes writes the log files for you on the node, and those files can be picked up by collectors like Fluentd or Logstash and shipped to a log aggregation tool.
I recommend passive logging: it is the approach recommended by Kubernetes and the standard for cloud-native applications. If you later need to run your app on a managed cloud service, those services also rely on passive logging to surface application errors.
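As a rough sketch of what passive logging looks like day to day (the pod and deployment names below are hypothetical), the application simply writes to STDOUT/STDERR and the platform handles the files:
# View the captured output of a (hypothetical) pod
kubectl logs my-webservice-7d4f9c6b8-abcde
# Stream the logs of whichever pod backs a deployment
kubectl logs -f deployment/my-webservice
# On each node, the kubelet/container runtime writes the files that a log
# collector (Fluentd, Logstash, the Datadog agent, ...) typically tails
ls /var/log/containers/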
The following links give some references on why Kubernetes recommends passive logging:
k8s checklist best practices
Twelve Factor Applications Logging

Related

Kubernetes Helm Concern [closed]

You have a collection of pods that all run the same application, but with slightly different configuration. Applying the configuration at run time would also be desirable. What's the best way to achieve this?
a: Create a separate container image for each version of the application, each with a different configuration
b: Use a single container image, but create ConfigMap objects for the different configurations and apply them to the different pods
c: Create PersistentVolumes that contain the config files and mount them to the different pods
In my opinion, the best solution for this would be "b", using ConfigMaps.
Apply changes to the ConfigMap and redeploy the image/pod. I use this approach very often at work.
Edit from @AndD: [...] ConfigMaps can contain small files and as such can be mounted as read-only volumes instead of being exposed as environment variables, so they can also be used in place of option c when files are required.
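For reference, a minimal sketch of option "b" using kubectl (the names app-config, application.properties, and my-app are hypothetical):
# Create a ConfigMap from a local file (use --from-literal for single key/value pairs)
kubectl create configmap app-config --from-file=application.properties
# Expose the ConfigMap's keys to a deployment as environment variables
kubectl set env deployment/my-app --from=configmap/app-config
# After editing the ConfigMap, restart the pods so they pick up the change
kubectl rollout restart deployment/my-app
Note that kubectl rollout restart is the newer equivalent of the label-patching trick shown below.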
My favorite command for this:
kubectl patch deployment <DEPLOYMENT> -p "{\"spec\": {\"template\": {\"metadata\": { \"labels\": { \"redeploy\": \"$(date +%s)\"}}}}}"
This patches spec.template.metadata.labels.redeploy with the current Unix timestamp (date +%s); because the pod template changes, the deployment rolls out new pods that pick up the updated configuration.
Option "a" can work as well; the downside is that you have to build the new image, push it, and deploy it for every configuration change. You could build a Jenkins/GitLab/whatever pipeline to automate that, but I think ConfigMaps are easier and more manageable. Option "a" becomes more attractive if you have very many config files that would be too much work to maintain as ConfigMaps.
Option "c" feels a bit too clunky just for config files. Best practice is not to store config files in volumes, as you (usually) want your applications to be as stateless as possible. Just my two cents.

AWS MSK vs Confluent for hosting Kafka? [closed]

In terms of getting the most value for the investment, how does AWS MSK compare to Confluent for hosting an end-to-end Kafka event-sourcing setup?
The main criteria to be used for comparing are:
Monitoring
Ease of deployment and configuring
Security
I have used open-source Kafka, on-prem Cloudera, and MSK. Comparing them, they have all had their quirks.
If you judge purely by the speed of provisioning a secure Kafka cluster, I think MSK wins hands down. Someone comfortable with Kafka, AWS Certificate Manager, and Terraform can get it all done very quickly. There are a few issues around Terraform, TLS, and the AWS CLI, but there are workarounds.
If you are planning to use Kafka Connect, then Confluent makes a lot of sense.
If you have Kafka developers experienced in writing Kafka Connect sinks and sources, you may not need a subscription-based model from Confluent. You may not save a lot of money either way, though: you either spend on development or on subscription costs.
If you like a serverless-style experience, MSK is quite good. However, there is no SSH access to the Kafka brokers, and you cannot tune the JVM.
Monitoring comes out of the box with MSK via open monitoring, which exposes JMX metrics to Prometheus; you also have CloudWatch. Open monitoring gives pretty much all the metrics you need. With open-source Kafka you can deploy the same kind of monitoring yourself; MSK essentially does it for you.
MSK provides client authentication using either TLS or IAM, though there are some issues around enabling IAM-based security for MSK via Terraform. Two-way TLS client authentication is quite easy to set up.
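As a small illustration with the AWS CLI (the cluster ARN is a placeholder), the same call returns the TLS and/or SASL/IAM bootstrap broker strings depending on which authentication modes are enabled on the cluster:
# List MSK clusters and note the ARN of the one you care about
aws kafka list-clusters
# Fetch the bootstrap broker strings for the enabled authentication modes
aws kafka get-bootstrap-brokers --cluster-arn <CLUSTER_ARN>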
MSK also provides auto-scaling, but again, if you are planning to use Terraform there may be some interoperability issues.
I am sure folks here can add a lot more on Confluent.

Log entries are split in StackDriver using GKE [closed]

I am having some issues with log entries in Stackdriver on GKE: when a log entry is larger than about 20 KB, it is split into several chunks. According to the GCP documentation, the size limit for a log entry is 256 KB (https://cloud.google.com/logging/quotas). I have tested several configurations and found something curious: the issue appears when the cluster is set up with Ubuntu nodes. When I use the default node image, Container-Optimized OS (cos), Stackdriver captures the log entries correctly.
Can somebody explain the cause of this behavior? I have checked this question: Logging with Docker and Kubernetes. Logs more than 16k split up; I think it could be related.
Additional information:
GKE static version: v1.14.10-gke.50
Kernel version (nodes): 4.15.0-1069-gke
OS image (nodes): Ubuntu 18.04.5 LTS
Docker version (nodes): 18.9.7
Cloud Operations for GKE: Legacy Logging and Monitoring
New feedback: I have created more clusters using different GKE versions and the other "Cloud Operations for GKE" implementation (System and Workload Logging and Monitoring), and the issue is the same. Current steps to reproduce the issue:
Create a GKE cluster using Ubuntu as the node image (the GKE version does not matter)
Deploy an application that writes a log entry larger than 16 KB. I am using a Spring Boot application + Log4j 1.x
Look for the log entry in the Stackdriver web console. The log entry is split into multiple chunks.
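One way to narrow down where the split happens (a rough sketch; the pod name is hypothetical and the path assumes the Docker json-file log driver used by these node images) is to compare the entry in Stackdriver with the raw container log on the node:
# SSH to the node and inspect the raw container log that the logging agent tails
sudo tail -n 5 /var/log/containers/my-app-*_default_*.log
# With the json-file driver, Docker splits any line longer than 16 KB into
# multiple JSON records; the logging agent is then expected to reassemble them.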
I see something similar happening in my GCP project when the output of one log entry is large (17 KB). The difference is that the first log entry contains roughly 0-40% of the full log output, the second contains 0-70%, and the third/last contains 0-100%. My app is a Spring Boot reactive application, and I use a reactive log filter.

Advantages of using Jaeger Agent [closed]

So I am exploring Jaeger for tracing, and I saw that we can send spans directly from the client to the collector over HTTP (port 14268). If so, what is the advantage of using the Jaeger agent?
When should we go with the Jaeger agent approach, and when with the direct HTTP approach? What is the disadvantage of sending directly to the collector?
From the official FAQ (https://www.jaegertracing.io/docs/latest/faq/#do-i-need-to-run-jaeger-agent):
jaeger-agent is not always necessary. Jaeger client libraries can be configured to export trace data directly to jaeger-collector. However, the following are the reasons why running jaeger-agent is recommended:
If we want SDK libraries to send trace data directly to collectors, we must provide them with a URL of the HTTP endpoint. It means that our applications require additional configuration containing this parameter, especially if we are running multiple Jaeger installations (e.g. in different availability zones or regions) and want the data sent to a nearby installation. In contrast, when using the agent, the libraries require no additional configuration because the agent is always accessible via localhost. It acts as a sidecar and proxies the requests to the appropriate collectors.
The agent can be configured to enrich the tracing data with infrastructure-specific metadata by adding extra tags to the spans, such as the current zone, region, etc. If the agent is running as a host daemon, it will be shared by all applications running on the same host. If the agent is running as a true sidecar, i.e. one per application, it can provide additional functionality such as strong authentication, multi-tenancy (see this blog post), pod name, etc.
Agents allow implementing traffic control to the collectors. If we have thousands of hosts in the data center, each running many applications, and each application sending data directly to the collectors, there may be too many open connections for each collector to handle. The agents can load balance this traffic with fewer connections.
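For a concrete picture, the two setups usually differ only in client configuration. With the standard Jaeger client environment variables it looks roughly like this (the collector service name is hypothetical):
# Direct to collector: point the client at the collector's HTTP endpoint
export JAEGER_ENDPOINT=http://jaeger-collector.tracing.svc:14268/api/traces
# Via the agent: the client sends UDP to a local agent (sidecar or host daemon),
# which forwards spans to the collectors
export JAEGER_AGENT_HOST=localhost
export JAEGER_AGENT_PORT=6831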

Spring Batch and Pivotal Cloud Foundry [closed]

We are evaluating the Spring Batch framework to replace our home-grown batch framework, and we need to be able to deploy the batch jobs on Pivotal Cloud Foundry (PCF). In this regard, can you share your thoughts on the questions below:
Say we use the remote partitioning strategy to process a large volume of records. Can the batch job auto-scale slave nodes in the cloud based on the amount of data the batch job processes, or do we have to provision an appropriate number of slave nodes and keep them in place before the batch job kicks off?
How does the "grid size" parameter come into play in the scenario above?
You have a few questions here. Before getting into them, let me take a minute to walk through where batch processing on PCF stands right now, and then get to your questions.
Current state of CF
As of PCF 1.6, Diego (the dynamic runtime within CF) introduced a new primitive called Tasks. Traditionally, all applications running on CF were expected to be long-running processes. Because of this, in order to run a batch job on CF, you'd need to package it up as a long-running process (usually a web app) and then deploy that. If you wanted to use remote partitioning, you'd need to deploy and scale slaves as you saw fit, but it was all external to CF. With Tasks, Diego now supports short-lived processes, i.e. processes that won't be restarted when they complete. This means that you can run a batch job as a Spring Boot über jar and, once it completes, CF won't try to restart it (that's a good thing). The issue with 1.6 is that an API exposing Tasks was not available, so it was only an internal construct.
With PCF 1.7, a new API is being released to expose Tasks for general use. As part of the v3 API, you'll be able to deploy your own apps as Tasks. This allows you to launch a batch job as a task, knowing it will execute and then be cleaned up by PCF.
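To make that concrete, here is a rough sketch of what launching a Spring Boot batch jar as a one-off Task looks like with the cf CLI once the Tasks API is available (the app name and command are hypothetical, and the exact syntax depends on your cf CLI version and buildpack):
# Push the batch app without starting it as a long-running process
cf push my-batch-app --no-start
# Kick off the job as a short-lived Task; CF cleans it up when the process exits
cf run-task my-batch-app "java -jar batch-job.jar"
# Check the task's state (PENDING / RUNNING / SUCCEEDED / FAILED)
cf tasks my-batch-app
With that in mind, on to your questions...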
Can the batch job auto-scale slave nodes in the cloud based on the amount of data the batch job processes?
When using Spring Batch's partitioning capabilities, there are two key components. The Partitioner and the PartitionHandler. The Partitioner is responsible for understanding the data and how it can be divided up. The PartitionHandler is responsible for understanding the fabric in which to distribute the partitions to the slaves.
For Spring Cloud Data Flow, we plan on creating a PartitionHandler implementation that will allow users to execute slave partitions as Tasks on CF. Essentially, what we'd expect is that the PartitionHandler would launch the slaves as tasks and once they are complete, they would be cleaned up.
This approach allows the number of slaves to be dynamically launched based on the number of partitions (configurable to a max).
We plan on doing this work for Spring Cloud Data Flow but the PartitionHandler should be available for users outside of that workflow as well.
How does the "grid size" parameter come into play in the scenario above?
The grid size parameter is really used by the Partitioner and not the PartitionHandler and is intended to be a hint on how many workers there may be. In this case, it could be used to configure how many partitions you want to create, but that really is up to the Partitioner implementation.
Conclusion
This is a description of what a batch workflow on CF will look like. It's important to note that CF 1.7 is not out as of the writing of this answer. It is scheduled for Q1 of 2016, and this functionality will follow shortly afterwards.