How to show timestamps for each line in Argo Workflows pods? - argo-workflows

I'm trying to figure out how to show a timestamp for each line of STDOUT of an Argo Workflows pod. The init and wait containers by default show a timestamp, but never the main container.
The Argo CLI has a --timestamps flag when viewing logs.
The argo-java-client also has a logOptionsTimestamps property that enables timestamps.
However I cannot find a similar option when defining a Workflow in YAML. I've gone through the field reference guide but haven't been able to find something to enable timestamps in the main container.
Does anyone know if this is possible, or how to enable them?
Thanks,
Weldon

The reason init and wait log statements have timestamps is that the Argo executable's logger writes timestamps by default.
The --timestamps option does not cause the containers themselves to log timestamps. It just decorates each log line with a timestamp (kubectl has a similar option).
As far as I know, there's no way to declaratively cause code running in the main container to log timestamps. You'd have to modify the code itself to use a logger which inserts timestamps.
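If changing the application's logger is not practical, one workaround is to wrap the main container's command so that each stdout line is prefixed before it reaches the log. The following is a minimal sketch, not an Argo feature: with_timestamps is a hypothetical helper name, and it assumes the image ships a POSIX shell and date.

```shell
#!/bin/sh
# Hypothetical helper: prefix every line read from stdin with a UTC timestamp.
with_timestamps() {
  # The "|| [ -n "$line" ]" part keeps a final line that lacks a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$line"
  done
}

# Example: any command's stdout can be piped through the helper.
printf 'first line\nsecond line\n' | with_timestamps
```

In a Workflow template this would mean running the program through a shell in the container's command and piping it into the helper. Note that in plain sh the exit status of a pipeline is that of the last command, so a failure of the wrapped program can be masked unless it is handled separately.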

Related

Kubernetes Log Splitting (Stdout/Stderr)

When I call kubectl logs pod_name, I get both the stdout/err combined. Is it possible to specify that I only want stdout or stderr? Likewise I am wondering if it is possible to do so through the k8s rest interface. I've searched for several hours and read through the repository but could not find anything.
Thanks!
No, this is not possible. To my knowledge, at the time of writing, Kubernetes supports only one logs API endpoint, which returns all logs (stdout and stderr combined).
If you want to access them separately, you should consider using a different logging driver or querying the logs directly from Docker.

How can I get log rotation working inside a kubernetes container/pod?

Our setup:
We are using kubernetes in GCP.
We have pods that write logs to a shared volume, with a sidecar container that sucks up our logs for our logging system.
We cannot just use stdout instead for this process.
Some of these pods are long lived and are filling up disk space because of no log rotation.
Question:
What is the easiest way to prevent the disk space from filling up here (without scheduling pod restarts)?
I have been attempting to install logrotate using: RUN apt-get install -y logrotate in our Dockerfile and placing a logrotate config file in /etc/logrotate.d/dynamicproxy, but it doesn't seem to get run. /var/lib/logrotate/status never gets generated.
I feel like I am barking up the wrong tree or missing something integral to getting this working. Any help would be appreciated.
We ended up writing our own daemonset to properly collect the logs from the nodes instead of the container level. We then stopped writing to shared volumes from the containers and logged to stdout only.
We used fluentd to ship the logs around.
https://github.com/splunk/splunk-connect-for-kubernetes/tree/master/helm-chart/splunk-kubernetes-logging
In general, you should write logs to stdout and configure a log collection tool such as the ELK stack. This is the best practice.
However, if you want to run logrotate as a separate process in your container, you may use Supervisor, which serves as a very simple init system and allows you to run as many parallel processes in a container as you want.
Simple example for using Supervisor for rotating Nginx logs can be found here: https://github.com/misho-kr/docker-appliances/tree/master/nginx-nodejs
If you write to the filesystem, the application creating the logs should be responsible for rotation. If you are running a Java application with logback or log4j, it is a simple configuration change. For other languages/frameworks it is usually similar.
If that is not an option, you could use a specialized tool to handle the rotation and pipe the output to it. One example would be http://cr.yp.to/daemontools/multilog.html
As a method of last resort, you could investigate logging into a named pipe (FIFO) instead of a real file and have some other process handle the retrieval and writing of the data, including the rotation.
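When the application cannot rotate its own files and no extra tooling is available, a small loop in the sidecar can do a copy-and-truncate rotation, similar in spirit to logrotate's copytruncate mode. This is only a sketch under assumptions: rotate_if_large, the path, and the size limit are hypothetical, only one old generation is kept, and lines written between the copy and the truncate can be lost.

```shell
#!/bin/sh
# Hypothetical helper: rotate FILE to FILE.1 once it grows beyond MAX_BYTES.
rotate_if_large() {
  file=$1
  max_bytes=$2
  [ -f "$file" ] || return 0
  size=$(wc -c < "$file" | tr -d ' ')
  if [ "$size" -gt "$max_bytes" ]; then
    cp "$file" "$file.1"   # keep one previous generation
    : > "$file"            # truncate in place so the writer's open handle stays valid
  fi
}

# Example sidecar loop (path and limit are placeholders; not run here):
#   while true; do rotate_if_large /shared/logs/app.log 104857600; sleep 60; done
```

Truncating in place works best when the writer opens the file in append mode; otherwise it may keep writing at its old offset.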

Is it possible to stop a job in Kubernetes without deleting it

Because Kubernetes handles situations where there's a typo in the job spec, and therefore a container image can't be found, by leaving the job in a running state forever, I've got a process that monitors job events to detect cases like this and deletes the job when one occurs.
I'd prefer to just stop the job so there's a record of it. Is there a way to stop a job?
1) According to the K8S documentation here.
Finished Jobs are usually no longer needed in the system. Keeping them around in the system will put pressure on the API server. If the Jobs are managed directly by a higher level controller, such as CronJobs, the Jobs can be cleaned up by CronJobs based on the specified capacity-based cleanup policy.
Here are the details for the failedJobsHistoryLimit property in the CronJobSpec.
This is another way of retaining the details of the failed job for a specific duration. The failedJobsHistoryLimit property can be set based on the approximate number of jobs run per day and the number of days the logs have to be retained. Agree that the Jobs will be still there and put pressure on the API server.
This is interesting. Once the Job fails, as in the case of a typo in the image name, the pod is deleted and its resources are no longer blocked or consumed. I'm not sure exactly what a kubectl job stop would achieve in this case. However, when a Job with a proper image runs successfully, I can still see the pod in kubectl get pods.
2) Another approach without using the CronJob is to specify the ttlSecondsAfterFinished as mentioned here.
Another way to clean up finished Jobs (either Complete or Failed) automatically is to use a TTL mechanism provided by a TTL controller for finished resources, by specifying the .spec.ttlSecondsAfterFinished field of the Job.
Not really, no such mechanism exists in Kubernetes yet afaik.
A workaround is to ssh into the machine and run the following (if you are using Docker):
# Save the logs (stdout and stderr)
$ docker logs <container-id-that-is-running-your-job> > save.log 2>&1
$ docker stop <main-container-id-for-your-job>
It's better to stream logs with something like Fluentd, logspout, or Filebeat and forward them to an ELK or EFK stack.
In any case, I've opened this
You can suspend cronjobs by using the suspend attribute. From the Kubernetes documentation:
https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#suspend
Documentation says:
The .spec.suspend field is also optional. If it is set to true, all
subsequent executions are suspended. This setting does not apply to
already started executions. Defaults to false.
So, to pause a cron you could:
run kubectl edit cronjob CRON_NAME (if not in the default namespace, add "-n NAMESPACE_NAME" at the end) and change "suspend" from False to True.
You could potentially create a loop using "for" or whatever you like, and have them all changed at once.
You could also just save the YAML file locally and then run:
kubectl create -f cron_YAML
and this would recreate the cron.
The other answers hint around the .spec.suspend solution for the CronJob API, which works, but since the OP asked specifically about Jobs it is worth noting the solution that does not require a CronJob.
As of Kubernetes 1.21, there is alpha support for the .spec.suspend field in the Job API as well (see docs here). The feature is behind the SuspendJob feature gate.
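Setting .spec.suspend without opening an editor can be done with a merge patch. The sketch below only builds the patch body; suspend_patch is a hypothetical helper name and JOB_NAME is a placeholder. The kubectl lines are left as comments because they need a cluster where the field is available.

```shell
#!/bin/sh
# Hypothetical helper: print the JSON merge patch that sets .spec.suspend.
suspend_patch() {
  case $1 in
    true|false) printf '{"spec":{"suspend":%s}}\n' "$1" ;;
    *) echo "usage: suspend_patch true|false" >&2; return 1 ;;
  esac
}

# Usage against a cluster (JOB_NAME is a placeholder; not run here):
#   kubectl patch job JOB_NAME --type=merge -p "$(suspend_patch true)"    # suspend
#   kubectl patch job JOB_NAME --type=merge -p "$(suspend_patch false)"   # resume
suspend_patch true
```

The same patch body applies to a CronJob by patching cronjob CRON_NAME instead, since both resources expose .spec.suspend.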

Passing long configuration file to Kubernetes

I like the work methodology of Kubernetes: use a self-contained image and pass the configuration in a ConfigMap, as a volume.
This worked great until I tried to do this with a Liquibase container. The SQL is very long (~1.5K lines), and Kubernetes rejects it as too long.
Error from Kubernetes:
The ConfigMap "liquibase-test-content" is invalid: metadata.annotations: Too long: must have at most 262144 characters
I thought of passing the .sql files as a hostPath, but as I understand it, the hostPath's content is probably not going to be there.
Is there any other way to pass configuration from the K8s directory to pods? Thanks.
The error you are seeing is not about the size of the actual ConfigMap contents, but about the size of the last-applied-configuration annotation that kubectl apply automatically creates on each apply. If you use kubectl create -f foo.yaml instead of kubectl apply -f foo.yaml, it should work.
Please note that in doing this you will lose the ability to use kubectl diff and do incremental updates (without replacing the whole object) with kubectl apply.
Since 1.18 you can use server-side apply to circumvent the problem.
kubectl apply --server-side=true -f foo.yml
where server-side=true runs the apply command on the server instead of the client.
This will properly show conflicts with other actors, including client-side apply and thus fail:
Apply failed with 4 conflicts: conflicts with "kubectl-client-side-apply" using apiextensions.k8s.io/v1:
- .status.conditions
- .status.storedVersions
- .status.acceptedNames.kind
- .status.acceptedNames.plural
Please review the fields above--they currently have other managers. Here
are the ways you can resolve this warning:
* If you intend to manage all of these fields, please re-run the apply
command with the `--force-conflicts` flag.
* If you do not intend to manage all of the fields, please edit your
manifest to remove references to the fields that should keep their
current managers.
* You may co-own fields by updating your manifest to match the existing
value; in this case, you'll become the manager if the other manager(s)
stop managing the field (remove it from their configuration).
See http://k8s.io/docs/reference/using-api/api-concepts/#conflicts
If the changes are intended you can simply use the first option:
kubectl apply --server-side=true --force-conflicts -f foo.yml
You can use an init container for this. Essentially, put the .sql files on GitHub or S3 or really any location you can read from and populate a directory with it. The semantics of the init container guarantee that the Liquibase container will only be launched after the config files have been downloaded.

Statefulset - Possible to Skip creation of pod 0 when it fails and proceed with the next one?

I currently have a problem with a StatefulSet under the following conditions:
I have a Percona SQL cluster running with persistent storage and 2 nodes.
Now I force both pods to fail:
First I force pod-0 to fail.
Afterwards I force pod-1 to fail.
Now the cluster is not able to recover without manual intervention and possible data loss.
Why:
The statefulset is trying to bring pod-0 up first, however this one will not be brought online because of the following message:
[ERROR] WSREP: It may not be safe to bootstrap the cluster from this node. It was not the last one to leave the cluster and may not contain all the updates. To force cluster bootstrap with this node, edit the grastate.dat file manually and set safe_to_bootstrap to 1
What I could do alternatively, but don't really like:
I could change ".spec.podManagementPolicy" to "Parallel", but this could lead to race conditions when forming the cluster. I would therefore like to avoid that; I basically like the idea of starting the nodes one after another.
What I would like to have:
The possibility to keep ".spec.podManagementPolicy":"OrderedReady" activated, but with the possibility to adjust the order somehow.
The ability to put specific pods into "inactive" mode so they are ignored until I enable them again.
Is something like that available? Does someone have any other ideas?
Unfortunately, nothing like that is available in standard functions of Kubernetes.
I see only 2 options here:
Use InitContainers to somehow check the current state on relaunch.
That will allow you to run any code before the primary container is started so you can try to use a custom script in order to resolve the problem etc.
Modify the database startup script to allow it to wait for some Environment Variable or any flag file and use PostStart hook to check the state before running a database.
But in both options, you have to write your own logic of startup order.
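As a starting point for that custom logic, an init container script could read the safe_to_bootstrap flag from grastate.dat and decide how to proceed. This is a rough sketch: safe_to_bootstrap is a hypothetical helper, the sample file contents are illustrative, and the path mentioned in the comments is an assumption about where the data volume is mounted.

```shell
#!/bin/sh
# Hypothetical helper: print the safe_to_bootstrap value from a grastate.dat file.
# Exits non-zero if the key is not present.
safe_to_bootstrap() {
  awk -F': *' '$1 == "safe_to_bootstrap" { print $2; found = 1 }
               END { if (!found) exit 1 }' "$1"
}

# Demo with an illustrative file; in a pod the path might be
# /var/lib/mysql/grastate.dat (an assumption, adjust to your volume mount).
sample=$(mktemp)
printf '# GALERA saved state\nversion: 2.1\nseqno:   -1\nsafe_to_bootstrap: 0\n' > "$sample"
if [ "$(safe_to_bootstrap "$sample")" = "1" ]; then
  echo "this node may bootstrap the cluster"
else
  echo "this node should wait for another node to bootstrap"
fi
rm -f "$sample"
```

What the script does in the "wait" branch (sleep and retry, probe peers, or fail the pod) is the startup-order logic you still have to design yourself.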