OpenShift deployment - pod console logs are truncated

We are using OpenShift Container Platform (v3.11) to host our Java application. The application writes its logs to the pod's standard console. However, when I try to view the pod logs or save them to a file, I do not get the complete log, only a partial one (the logs appear to be truncated). I have tried different options while viewing the logs (like --since=48h, etc.), but none of them worked.
Is there any way I can increase the pod console buffer size or write the complete log contents to a file?

The better approach is to configure log aggregation via Fluentd/Elasticsearch (see elk_logging); however, there is also an option to change the Docker log driver settings on the node running the container (see managing_docker_container_logs or docker_logging_configure).
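For example, a minimal /etc/docker/daemon.json sketch for the json-file driver on that node might look like this (the size limits are illustrative, not values from the question):
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "5"
  }
}
Note that Docker normally has to be restarted for the change to take effect, and the new limits only apply to containers created afterwards.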

Related

Is it possible for a container to send a Kafka event when it finishes?

We just migrated to a Kubernetes cluster, and I was wondering if it is possible to automatically send a Kafka event when a container/pod finishes, with its stdout as the message. Right now we are using Fluentd with Elasticsearch, but the output of one pod is used as input for the next one, so we have to constantly poll Elasticsearch to find out when the output is ready, and that causes performance issues in the overall execution.
I'm not sure of your current setup but my first thought would jump to:
Use something such as Fluentd or Logstash in its own pod on each node
Configure volume access to Kubernetes log folder /var/log/containers/*
Use the Kafka output for either fluentd or Logstash with file input (tail) on the logging folder
This approach requires the configuration above on each node, but needs only minimal configuration of logging locations, etc.
It's not something I've personally configured, but I have considered it for the future; a rough Fluentd sketch of the idea is shown at the end of this answer.
More info here
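For illustration only, a minimal Fluentd sketch of the tail-to-Kafka idea, assuming the fluent-plugin-kafka output plugin and Docker's json-file log format (the broker address, topic, and file paths are placeholders):
<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.pos
  tag kube.*
  read_from_head true
  <parse>
    @type json
  </parse>
</source>

<match kube.**>
  @type kafka2
  # placeholder broker and topic
  brokers kafka-broker:9092
  default_topic pod-logs
  <format>
    @type json
  </format>
  <buffer topic>
    flush_interval 5s
  </buffer>
</match>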

Can't submit new job via GUI on standalone Kubernetes Flink deployment (session mode)

After deploying Flink in standalone Kubernetes mode (session cluster), I can't upload any new job using the Flink GUI. After clicking the +Add New button and choosing a jar file, the progress bar finishes and nothing happens.
There is no information/error about this in the Job Manager logs.
When I try to upload any other kind of file (e.g. a text file), I get an error, and there is a message in the log:
"Exception occured in REST handler: Only Jar files are allowed."
I've also tried to upload a fake jar (an empty file called .jar) and it works; I can upload this kind of file.
I have a brand new, clean Apache Flink cluster running on a Kubernetes cluster.
I have used the Docker Hub image and tried two different versions:
1.13.2-scala_2.12-java8, and
1.13-scala_2.11-java8
The result was the same with both versions.
My deployment is based on this how-to:
https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/resource-providers/standalone/kubernetes/
and I've used the yaml files provided in its Appendix, "Common cluster resource definitions":
flink-configuration-configmap.yaml
jobmanager-service.yaml
taskmanager-session-deployment.yaml
jobmanager-session-deployment-non-ha.yaml
I've also used an ingress controller to publish the GUI running on port 8081 on the jobmanager.
I have three pods (1 jobmanager, 2 task managers) and can't see any errors in the Flink logs.
Any suggestions on what I'm missing, or where to find any errors?
Problem solved. It was caused by the nginx upload limit (the default is 1024 KB). The Flink GUI is published outside Kubernetes using an ingress controller and nginx.
When we tried to upload job files bigger than 1 MB (1024 KB), the nginx limit prevented it. Jobs below this limit (for example, the fake jar with a size of 0 KB) were uploaded successfully.
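For reference, on the community ingress-nginx controller the limit can be raised with an annotation on the Ingress in front of the JobManager UI; the names, host, and the 100m value below are illustrative, not taken from the question:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: flink-jobmanager-ui
  annotations:
    # allow jar uploads larger than the 1 MB default
    nginx.ingress.kubernetes.io/proxy-body-size: "100m"
spec:
  rules:
  - host: flink.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: flink-jobmanager
            port:
              number: 8081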

GCP Stackdriver Logging log format changed in bucket from folder per container to stdout/stderr

I have a question similar to the one described here: GKE kubernetes container stdout logs format changed
In the old version of Stackdriver I had one sink with a filter like this:
resource.type=container,
resource.namespace_id=[NAMESPACE_NAME]
resource.pod_id=[POD_NAME]
and the logs were stored in the bucket nicely, like this:
logName=projects/[PROJECT-NAME]/logs/[CONTAINER-NAME]
...so I had a folder with logs for each container.
But now I have updated my Stackdriver logging+monitoring to the latest version, and I have two folders, stdout and stderr, which contain all logs for all containers!
logName=projects/[PROJECT-NAME]/logs/stdout
logName=projects/[PROJECT-NAME]/logs/stderr
All logs from many containers are stored in these two folders! This is pretty inconvenient =(
I've read about this in the docs: https://cloud.google.com/monitoring/kubernetes-engine/migration#changes_in_log_entry_contents
The logName field might change. Stackdriver Kubernetes Engine Monitoring log entries use stdout or stderr in their log names whereas Legacy Stackdriver used a wider variety of names, including the container name. The container name is still available as a resource label.
...but I can't find a solution! Please help me: how can I get a folder per container for logging, like it was in the old version of Stackdriver?
Here is a workaround that has been suggested:
Create a different sink for each of your containers, filtered by resource.labels.container_name.
Export each sink to a different bucket (an example gcloud command is sketched below).
Note: if you configure each separate sink to the same bucket, the logs will be combined.
More details at Google Issue Tracker
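For illustration, creating one such sink with the gcloud CLI might look roughly like this (the sink, bucket, and container names are placeholders); the sink's writer identity then needs write access to the destination bucket before entries start flowing:
gcloud logging sinks create my-app-sink \
  storage.googleapis.com/my-per-container-logs-bucket \
  --log-filter='resource.type="k8s_container" AND resource.labels.container_name="my-app"'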

How can I get log rotation working inside a kubernetes container/pod?

Our setup:
We are using kubernetes in GCP.
We have pods that write logs to a shared volume, with a sidecar container that sucks up our logs for our logging system.
We cannot just use stdout instead for this process.
Some of these pods are long-lived and are filling up disk space because there is no log rotation.
Question:
What is the easiest way to prevent the disk space from filling up here (without scheduling pod restarts)?
I have been attempting to install logrotate using RUN apt-get install -y logrotate in our Dockerfile and placing a logrotate config file in /etc/logrotate.d/dynamicproxy, but it doesn't seem to get run; /var/lib/logrotate/status never gets generated.
I feel like I am barking up the wrong tree or missing something integral to getting this working. Any help would be appreciated.
We ended up writing our own DaemonSet to properly collect the logs from the nodes instead of at the container level. We then stopped writing to shared volumes from the containers and logged to stdout only.
We used fluentd to ship the logs around.
https://github.com/splunk/splunk-connect-for-kubernetes/tree/master/helm-chart/splunk-kubernetes-logging
In general, you should write logs to stdout and configure a log collection tool like the ELK stack. This is the best practice.
However, if you want to run logrotate as a separate process in your container, you may use Supervisor, which serves as a very simple init system and allows you to run as many parallel processes in a container as you want.
A simple example of using Supervisor to rotate Nginx logs can be found here: https://github.com/misho-kr/docker-appliances/tree/master/nginx-nodejs
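As a rough illustration only, a minimal supervisord.conf along those lines, assuming a Debian-based image where the logrotate package ships an /etc/cron.daily/logrotate job (the application command and paths are made up):
[supervisord]
nodaemon=true

[program:app]
; hypothetical application command that writes its own log files under /var/log/app
command=java -jar /app/app.jar

[program:cron]
; cron periodically runs /etc/cron.daily/logrotate, which reads the configs in /etc/logrotate.d/
command=cron -f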
If you write to the filesystem, the application creating the logs should be responsible for rotation. If you are running a Java application with Logback or Log4j, it is a simple configuration change. For other languages/frameworks it is usually similar.
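For example, here is a minimal Logback sketch with a size- and time-based rolling policy (the paths and limits are illustrative, not taken from the question):
<configuration>
  <appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>/var/log/app/app.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
      <!-- roll daily or when the file reaches 50 MB, keep 7 days, cap total size -->
      <fileNamePattern>/var/log/app/app.%d{yyyy-MM-dd}.%i.log.gz</fileNamePattern>
      <maxFileSize>50MB</maxFileSize>
      <maxHistory>7</maxHistory>
      <totalSizeCap>500MB</totalSizeCap>
    </rollingPolicy>
    <encoder>
      <pattern>%d{ISO8601} %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>
  <root level="INFO">
    <appender-ref ref="FILE"/>
  </root>
</configuration>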
If that is not an option, you could use a specialized tool to handle the rotation and pipe the output to it. One example would be http://cr.yp.to/daemontools/multilog.html
As a method of last resort, you could investigate logging to a named pipe (FIFO) instead of a real file and have some other process handle the retrieval and writing of the data, including the rotation.
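A rough sketch of the named-pipe approach combined with multilog, assuming daemontools is installed in the image (all paths and limits are made up):
# create the pipe the application will treat as its "log file"
mkfifo /var/log/app/app.pipe
mkdir -p /var/log/app/main
# reader side: rotate at ~1 MB per file and keep 10 rotated files
multilog s1048576 n10 /var/log/app/main < /var/log/app/app.pipe &
# the application is then configured to write its log output to /var/log/app/app.pipe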

openshift pod fails and restarts frequently

I am creating an app in Origin 3.1 using my Docker image.
Whenever I create the image, a new pod gets created, but it restarts again and again and finally ends up with the status "CrashLoopBackOff".
I analysed the pod logs but they show no errors; all log data is as expected for a successfully running app. Hence I am not able to determine the cause.
I came across the link below today, which says "running an application inside of a container as root still has risks, OpenShift doesn't allow you to do that by default and will instead run as an arbitrary assigned user ID."
What is CrashLoopBackOff status for openshift pods?
My image uses the root user only; what should I do to make this work? The logs show no error, but the pod keeps restarting.
Could anyone please help me with this?
You are seeing this because whatever process your image starts isn't a long-running process; it finds no TTY, so the container just exits and gets restarted repeatedly, which is a "crash loop" as far as OpenShift is concerned.
Your Dockerfile contains the following:
ENTRYPOINT ["container-entrypoint"]
You need to check what this "container-entrypoint" is actually doing.
Did you use the -p or --previous flag with oc logs to see if the logs from the previous attempt to start the pod show anything?
Red Hat's recommendation is to make files group-owned by GID 0, since the user in the container is always in the root group. You won't be able to chown, but you can selectively expose which files to write to.
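A hedged Dockerfile sketch of that recommendation (the /deployments path is hypothetical):
# make the application directory group-owned by GID 0 and group-writable,
# so the arbitrarily assigned UID (which is always in the root group) can write to it
RUN chgrp -R 0 /deployments && \
    chmod -R g=u /deployments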
A second option:
In order to allow images that use either named users or the root (0) user to build in OpenShift, you can add the project's builder service account (system:serviceaccount:<project>:builder) to the privileged security context constraint (SCC). Alternatively, you can allow all images to run as any user.
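For the second option, a sketch of the relevant commands on a recent oc client (the project name is a placeholder):
# add the project's builder service account to the privileged SCC
oc adm policy add-scc-to-user privileged system:serviceaccount:my-project:builder
# or allow pods that use the default service account to run as any user ID
oc adm policy add-scc-to-user anyuid -z default -n my-project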
Can you see the logs using
kubectl logs <podname> -p
This should show you the errors explaining why the pod failed.
I was able to resolve this by creating a script called "run.sh" with the following content at the end:
while :; do
sleep 300
done
and in Dockerfile:
ADD run.sh /run.sh
RUN chmod +x /*.sh
CMD ["/run.sh"]
This way it works. Thanks everybody for pointing out the reason, which helped me find the resolution. But I still have one doubt: why does the process exit in OpenShift only in this case? I have tried running a Tomcat server in the same way, and it works fine without having the sleep in the script.