How to wait for Tekton PipelineRun conditions - kubernetes

I have the following code within a gitlab pipeline which results in some kind of race condition:
kubectl apply -f pipelineRun.yaml
tkn pipelinerun logs -f pipeline-run
The tkn command exits immediately, since the PipelineRun object is not yet created. There would be one very nice solution for this problem:
kubectl apply -f pipelineRun.yaml
kubectl wait --for=condition=Running --timeout=60s pipelinerun/pipeline-run
tkn pipelinerun logs -f pipeline-run
Unfortunately this does not work as expected, since Running does not seem to be a valid condition for a PipelineRun object. So my question is: what are the valid conditions of a PipelineRun object?

I didn't search too far and wide, but it looks like they only have two condition types imported from the knative.dev project?
https://github.com/tektoncd/pipeline/blob/main/vendor/knative.dev/pkg/apis/condition_types.go#L32
The link above points to the condition types vendored into the pipeline source code; of these, it looks like Tekton only uses "Ready" and "Succeeded".
const (
// ConditionReady specifies that the resource is ready.
// For long-running resources.
ConditionReady ConditionType = "Ready"
// ConditionSucceeded specifies that the resource has finished.
// For resource which run to completion.
ConditionSucceeded ConditionType = "Succeeded"
)
But there may be other imports of this nature elsewhere in the project.

Tekton TaskRuns and PipelineRuns only use a condition of type Succeeded.
Example:
conditions:
- lastTransitionTime: "2020-05-04T02:19:14Z"
  message: "Tasks Completed: 4, Skipped: 0"
  reason: Succeeded
  status: "True"
  type: Succeeded
The different statuses and messages available for the Succeeded condition are described in the documentation:
TaskRun: https://tekton.dev/docs/pipelines/taskruns/#monitoring-execution-status
PipelineRun: https://tekton.dev/docs/pipelines/pipelineruns/#monitoring-execution-status
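Given that Succeeded is the only condition type, one possible workaround for the original race is sketched below; it is not taken from the Tekton docs, and it simply reuses the pipeline-run name from the question. The idea is to poll until the Succeeded condition appears at all (its status is Unknown while the run is in progress) before following the logs:
kubectl apply -f pipelineRun.yaml
# poll until the PipelineRun reports a Succeeded condition
until kubectl get pipelinerun/pipeline-run -o jsonpath='{.status.conditions[?(@.type=="Succeeded")].status}' | grep -qE 'True|False|Unknown'; do
  sleep 2
done
tkn pipelinerun logs -f pipeline-run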
As a side note, there is an activity timeout available in the API. That timeout is not surfaced in the CLI options, though. You could create a tkn feature request for that.

Related

IBM Cloud Code Engine: How to overwrite environment variable in cron-invoked job run?

When working with jobs in IBM Cloud Code Engine, I would submit a jobrun for the actual invocation. For the jobrun I can specify environment variables to be passed into the runtime environment (--env FOO=BAR).
How can I do the same when using a cron subscription to trigger the job, i.e., to set FOO=BAR?
I believe the correct flag to pass to the CLI is --ext (or --extension).
% ibmcloud ce subscription cron create --
NAME:
create - Create a cron event subscription.
USAGE:
create --name CRON_SOURCE_NAME --destination DESTINATION_REF [options...]
OPTIONS:
--destination, -d value Required. The name of the resource that will receive events.
--name, -n value Required. The name of the cron event subscription. Use a name that is unique within the project.
--content-type, --ct value The media type of the 'data' or 'data-base64' option. Examples include 'application/json',
'application/x-www-form-urlencoded', 'text/html', and 'text/plain'.
--data, --da value The data to send to the destination.
--data-base64, --db value The base64-encoded data to send to the destination.
--destination-type, --dt value The type of the 'destination'. Valid values are 'app' and 'job'. (default: "app")
--extension, --ext value Set CloudEvents extensions to send to the destination. Must be in NAME=VALUE format. This option can be
specified multiple times. (accepts multiple inputs)
For example:
% ibmcloud ce subscription cron create -d application-5c -n sample-cron-sub --ext FOO=BAR
Creating cron event subscription 'sample-cron-sub'...
Run 'ibmcloud ce subscription cron get -n sample-cron-sub' to check the cron event subscription status.
% ibmcloud ce subscription cron get -n sample-cron-sub
Getting cron event subscription 'sample-cron-sub'...
OK
Name: sample-cron-sub
ID: xxxx
Project Name: susan-project
Project ID: xxxx
Age: 3m16s
Created: 2022-06-06T11:17:17-07:00
Destination Type: app
Destination: application-5c
Schedule: * * * * *
Time Zone: UTC
Ready: true
CloudEvents Extensions:
Name Value
FOO BAR
Events:
Type Reason Age Source Messages
Normal PingSourceSkipped 3m17s pingsource-controller PingSource is not ready
Normal PingSourceDeploymentUpdated 3m17s (x2 over 3m17s) pingsource-controller PingSource adapter deployment updated
Normal PingSourceSynchronized 3m17s pingsource-controller PingSource adapter is synchronized
After looking at this a bit more, it appears that the NAME=VALUE pairs you pass to an event subscription get the string 'CE_' prepended to the name.
Therefore, to allow for this in the running job, you would need to prefix the environment variable in the job with CE_. For example:
When I create the job definition, I add the environment variable like this:
CE_FOO=BAR
Then, when I create the event subscription, for the --ext flag, I use the original suggestion: --ext FOO=BAR
I believe that since the FOO extension in the event subscription automatically gets CE_ prepended to its name, this should work.
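As a quick way to verify this, here is a minimal sketch (only the CE_ prefixing described above is assumed; everything else is made up): a job whose script simply echoes the variable should print the value set through the --ext flag.
#!/bin/bash
# Runs inside the job container; the cron subscription's extension FOO=BAR
# is expected to arrive as the environment variable CE_FOO.
echo "CE_FOO=${CE_FOO:-<not set>}"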
Please let me know if this does not work or I misunderstood you.

Kubernetes pod marked as `Completed` despite the exit code `255`

Situation:
I've got a CronJob that often fails (this is expected at the moment). Because the container performing the job has a sidecar, the dependencies between the containers are expressed through bash scripts and a shared emptyDir mount at /etc/liveness:
spec:
  containers:
  - args:
    - -c
    - set -x;
      ...
      ./process; # execute the main process
      rc=$?;
      rm /etc/liveness; # clean-up
      exit $rc;
    command:
    - /bin/bash
Problem:
In the scenarios where the job fails, I see the following in the logs:
+ rc=255
+ rm /etc/liveness
+ exit 255
With restartPolicy set to Never, the failed pod enters the Completed status, which is misleading:
scheduler-1594015200-wl9xc 0/2 Completed 0 24m
According to the official docs,
A Job creates one or more Pods and ensures that a specified number of
them successfully terminate.
And containers enter the terminated state when
it has successfully completed execution or when it has failed for some
reason.
So if you set restartPolicy to Never, this is what will happen.
A Pod's status field is a PodStatus object, which has a phase field.
Ref: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase
Status and phase are not the same. So I learned that what happens above is that my pods end up with status Completed and phase Failed.
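If you want to see the real outcome rather than the STATUS column, one option (a sketch using the pod name from the listing above; the container name in the comment is just an example) is to read the phase and the per-container exit codes directly:
kubectl get pod scheduler-1594015200-wl9xc -o jsonpath='{.status.phase}{"\n"}'
# per-container exit codes, e.g. "scheduler: 255"
kubectl get pod scheduler-1594015200-wl9xc -o jsonpath='{range .status.containerStatuses[*]}{.name}{": "}{.state.terminated.exitCode}{"\n"}{end}'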

How to determine if a job is failed

How can I programmatically determine if a job has failed for good and will not retry any more? I've seen the following on failed jobs:
status:
  conditions:
  - lastProbeTime: 2018-04-25T22:38:34Z
    lastTransitionTime: 2018-04-25T22:38:34Z
    message: Job has reach the specified backoff limit
    reason: BackoffLimitExceeded
    status: "True"
    type: Failed
However, the documentation doesn't explain why conditions is a list. Can there be multiple conditions? If so, which one do I rely on? Is it a guarantee that there will only be one with status: "True"?
JobConditions are similar to PodConditions. You may read about PodConditions in the official docs.
Anyway, to determine whether a job has succeeded or failed, I follow another way. Let's look at it.
There are two fields in the Job spec.
One is spec.completions (default value 1), which says,
Specifies the desired number of successfully finished pods the
job should be run with.
Another is spec.backoffLimit (default value 6), which says,
Specifies the number of retries before marking this job failed.
Now, in JobStatus
There are two fields in JobStatus too: Succeeded and Failed. Succeeded is the number of pods which completed successfully, and Failed is the number of pods which reached phase Failed.
Once Succeeded is equal to or greater than spec.completions, the job becomes completed.
Once Failed is equal to or greater than spec.backoffLimit, the job becomes failed.
So, the logic will be:
if job.Status.Succeeded >= *job.Spec.Completions {
    return "completed"
} else if job.Status.Failed >= *job.Spec.BackoffLimit {
    return "failed"
}
If so, which one do I rely on?
You might not have to choose, considering commit dd84bba64
When a job is complete, the controller will indefinitely update its conditions
with a Complete condition.
This change makes the controller exit the
reconciliation as soon as the job is already found to be marked as complete.
As https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.26/#jobstatus-v1-batch says:
The latest available observations of an object's current state. When a
Job fails, one of the conditions will have type "Failed" and status
true. When a Job is suspended, one of the conditions will have type
"Suspended" and status true; when the Job is resumed, the status of
this condition will become false. When a Job is completed, one of the
conditions will have type "Complete" and status true. More info:
https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/
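Building on the quote above, a condition-based check from the command line could look like this sketch (the job name my-job is made up):
# prints the condition types whose status is True, e.g. Complete or Failed for a finished Job
kubectl get job my-job -o jsonpath='{.status.conditions[?(@.status=="True")].type}{"\n"}'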

What is the difference between a resourceVersion and a generation?

In Kubernetes object metadata, there are the concepts of resourceVersion and generation. I understand the notion of resourceVersion: it is an optimistic concurrency control mechanism—it will change with every update. What, then, is generation for?
resourceVersion changes on every write, and is used for optimistic concurrency control
in some objects, generation is incremented by the server as part of persisting writes affecting the spec of an object.
some objects' status fields have an observedGeneration subfield for controllers to persist the generation that was last acted on.
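To see these fields side by side, a small sketch (the Deployment name my-deployment is made up):
kubectl get deployment my-deployment -o jsonpath='resourceVersion={.metadata.resourceVersion} generation={.metadata.generation} observedGeneration={.status.observedGeneration}{"\n"}'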
In a Deployment context:
In Short
resourceVersion is the version of any k8s resource, while generation is the version of the Deployment rollout, which you can use to undo, pause, and so on via the kubectl CLI.
Source code for kubectl rollout: https://github.com/kubernetes/kubectl/blob/master/pkg/cmd/rollout/rollout.go#L50
The Long Version
resourceVersion
The k8s API server saves every modification to any k8s resource. Each modification has a version, which is called resourceVersion.
The k8s client libraries provide a way to receive, in real time, ADDED, DELETED, and MODIFIED events for any resource. There is also a BOOKMARK event, but let's leave that aside for the moment.
On any modification operation, you receive the new k8s resource with an updated resourceVersion. You can start a watch from this resourceVersion, so you won't miss any events between the time the k8s server sent you back the first response and the time the watch started.
K8s doesn't preserve the history of every resource forever. I think it is kept for about 5 minutes, but I'm not sure exactly.
resourceVersion will change after any modification of the object.
The reason for its existence is to avoid concurrency problems where multiple clients try to modify the same k8s resource. This pattern is also pretty common in databases, and you can find more info about it here:
Optimistic concurrency control (https://en.wikipedia.org/wiki/Optimistic_concurrency_control)
https://www.programmersought.com/article/1104647506/
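To make the difference between resourceVersion and generation concrete, a hedged illustration (the Deployment and container names are made up): a metadata-only change bumps resourceVersion but leaves generation alone, while a spec change increments both.
kubectl annotate deployment my-deployment example-note=hello --overwrite   # resourceVersion changes, generation does not
kubectl set image deployment/my-deployment app=nginx:1.25                  # spec change: resourceVersion and generation both change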
observedGeneration
You didn't talk about it in your question, but it's an important piece of information we need to clarify before moving on to generation.
It is the version of the ReplicaSet that this Deployment is currently tracking.
When the deployment is still being created for the first time, this value won't exist (a good discussion on this can be found here: https://github.com/kubernetes/kubernetes/issues/47871).
This value can be found under status:
....
apiVersion: apps/v1
kind: Deployment
.....
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2021-02-07T19:04:17Z"
    lastUpdateTime: "2021-02-07T19:04:17Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2021-02-07T19:04:15Z"
    lastUpdateTime: "2021-02-07T19:17:09Z"
    message: ReplicaSet "deployment-bcb437a4-59bb9f6f69" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 3  <<<--------------------
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
P.S. From this article: https://thenewstack.io/kubernetes-deployments-work/
observedGeneration is equal to the deployment.kubernetes.io/revision annotation.
This looks correct, because deployment.kubernetes.io/revision does not exist when the deployment is first created and not yet ready, and it has the same value as observedGeneration when the deployment is updated.
generation
It represents the version of the "new" ReplicaSet that this Deployment should track.
When a deployment is created for the first time, this value is equal to 1. When observedGeneration is then set to 1, it means the ReplicaSet is "ready" (this question is not about how to know whether a deployment was successful or not, so I'm not getting into what "ready" means; it's terminology I made up for this answer, and there are additional conditions to check whether a deployment was successful or not).
The same goes for any change in the Deployment resource that triggers a re-deployment: the generation value is incremented by 1, and it then takes some time until observedGeneration becomes equal to the generation value.
More info on observedGeneration and generation in the context of kubectl rollout status (to check if a deployment has "finished"), from the kubectl source code:
https://github.com/kubernetes/kubectl/blob/a2d36ec6d62f756e72fb3a5f49ed0f720ad0fe83/pkg/polymorphichelpers/rollout_status.go#L75
if deployment.Generation <= deployment.Status.ObservedGeneration {
    cond := deploymentutil.GetDeploymentCondition(deployment.Status, appsv1.DeploymentProgressing)
    if cond != nil && cond.Reason == deploymentutil.TimedOutReason {
        return "", false, fmt.Errorf("deployment %q exceeded its progress deadline", deployment.Name)
    }
    if deployment.Spec.Replicas != nil && deployment.Status.UpdatedReplicas < *deployment.Spec.Replicas {
        return fmt.Sprintf("Waiting for deployment %q rollout to finish: %d out of %d new replicas have been updated...\n", deployment.Name, deployment.Status.UpdatedReplicas, *deployment.Spec.Replicas), false, nil
    }
    if deployment.Status.Replicas > deployment.Status.UpdatedReplicas {
        return fmt.Sprintf("Waiting for deployment %q rollout to finish: %d old replicas are pending termination...\n", deployment.Name, deployment.Status.Replicas-deployment.Status.UpdatedReplicas), false, nil
    }
    if deployment.Status.AvailableReplicas < deployment.Status.UpdatedReplicas {
        return fmt.Sprintf("Waiting for deployment %q rollout to finish: %d of %d updated replicas are available...\n", deployment.Name, deployment.Status.AvailableReplicas, deployment.Status.UpdatedReplicas), false, nil
    }
    return fmt.Sprintf("deployment %q successfully rolled out\n", deployment.Name), true, nil
}
return fmt.Sprintf("Waiting for deployment spec update to be observed...\n"), false, nil
I must say that I'm not sure when observedGeneration can be higher than generation. Maybe folks can help me out in the comments.
To sum it all up, there is a good illustration in this article: https://thenewstack.io/kubernetes-deployments-work/
More Info:
https://kubernetes.slack.com/archives/C2GL57FJ4/p1612651711106700?thread_ts=1612650049.105300&cid=C2GL57FJ4
Some more information about Bookmark events (which are related to resourceVersion): What k8s bookmark solves?
How do you roll back deployments in Kubernetes: https://learnk8s.io/kubernetes-rollbacks

msdeploy - stop deploy in postsync if presync fails

I am using msdeploy -preSync to back up the current deployment of a website in IIS before the -postSync deploys it. However, I recently had a situation where the -preSync failed (it raised a warning due to a missing DLL) and the -postSync continued and overwrote the code.
Both the preSync and postSync run batch files.
Obviously this is bad, as the backup failed, so there is no backout route if the deployment has bugs or fails.
Is there any way to stop the postSync if the preSync raises warnings with msdeploy?
Perhaps the issue here is that the preSync failure was raised as a warning, not an error.
Supply the successReturnCodes parameter, set to 0 (the conventional success return code), in the preSync option, such as:
-preSync:runCommand="your script",successReturnCodes=0
More info at: http://technet.microsoft.com/en-us/library/ee619740(v=ws.10).aspx
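For context, a fuller invocation might look like the following sketch; the paths, site name, and backup script are made up, and only the successReturnCodes setting comes from the answer above. The intent is that a non-zero exit code from the backup script is then treated as an error rather than a warning, so the deployment should stop before the postSync overwrites the site.
msdeploy -verb:sync -source:contentPath="C:\build\MySite" -dest:contentPath="Default Web Site" -preSync:runCommand="C:\deploy\backup.bat",successReturnCodes=0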