Cert-Manager: renewing certificate not working - kubernetes

Folks, I am trying to renew certificates for a wildcard domain, and I am seeing the following errors in the logs of the cert-manager pod and in the CertificateRequest:
Message: Waiting on certificate issuance from order
production/certmanager-xxxxxxxxx-pp9n2-3392968554: "pending"
production/cert-manager-877fd747c-4nf2f[cert-manager]: E0817 21:32:34.447585 1
controller.go:166] cert-manager/challenges "msg"="re-queuing item due to error
processing" "error"="failed to change Route 53 record set: InvalidChangeBatch: [RRSet
with DNS name _acme-challenge.xxxxxx.com., type TXT, SetIdentifier
\"xxxxxxx\" cannot be created because a non
multivalue answer rrset exists with the same name and type.]"
"key"="production/certmanager-xxxx-pp9n2-3392968554-1376642102"
Do I need to update the TXT record in DNS? It is currently set to a value different from the SetIdentifier value in the output above.
I am also noticing a strange error in the log. The pod name it mentions is incorrect; a pod with a different name is running:
production/cert-manager-877fd747c-4nf2f[cert-manager]: E0817 21:45:46.379332 1
controller.go:208] cert-manager/challenges "msg"="challenge in work queue no longer
exists" "error"="challenge.acme.cert-manager.io \"certmanager-idrive-ssl-srvw4-
3392968554-1376642102\" not found"
Thanks!

Related

InvalidIdentityToken: Couldn't retrieve verification key from your identity provider

I am new to AWS and kubectl, and I need to deploy an app to AWS. After deploying to the EKS cluster, I edited the ingress with kubectl, but unfortunately it returned 404 Not Found. (I am pretty sure the new service container works fine.)
After checking with kubectl describe ingress, I see these events:
Warning FailedBuildModel 40m ingress Failed build model due to WebIdentityErr: failed to retrieve credentials
caused by: InvalidIdentityToken: Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements
status code: 400, request id: xxxxxxxx-4a93-4e27-9d6b-xxxxxxxx
Warning FailedBuildModel 22m ingress Failed build model due to WebIdentityErr: failed to retrieve credentials
caused by: InvalidIdentityToken: Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements
status code: 400, request id: xxxxxxxx-5368-41e1-8a4d-xxxxxxxx
Warning FailedBuildModel 5m8s ingress Failed build model due to WebIdentityErr: failed to retrieve credentials
caused by: InvalidIdentityToken: Couldn't retrieve verification key from your identity provider, please reference AssumeRoleWithWebIdentity documentation for requirements
status code: 400, request id: xxxxxxxx-20ea-4bd0-b1cb-xxxxxxxx
Does anyone have ideas about this issue?

Kubernetes DSE Cassandra CommitLogReplayer$CommitLogReplayException

I have installed Cassandra on Kubernetes (9 pods). All the pods are up and running except for one pod, which shows the error below.
org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: Encountered bad header at position 47137 of commit log /var/lib/cassandra/commitlog/CommitLog-600-1630582314923.log, with bad position but valid CRC
at org.apache.cassandra.db.commitlog.CommitLogReplayer.shouldSkipSegmentOnError(CommitLogReplayer.java:438)
at org.apache.cassandra.db.commitlog.CommitLogReplayer.handleUnrecoverableError(CommitLogReplayer.java:452)
at org.apache.cassandra.db.commitlog.CommitLogSegmentReader$SegmentIterator.computeNext(CommitLogSegmentReader.java:109)
at org.apache.cassandra.db.commitlog.CommitLogSegmentReader$SegmentIterator.computeNext(CommitLogSegmentReader.java:84)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:236)
at org.apache.cassandra.db.commitlog.CommitLogReader.readAllFiles(CommitLogReader.java:134)
at org.apache.cassandra.db.commitlog.CommitLogReplayer.replayFiles(CommitLogReplayer.java:154)
at org.apache.cassandra.db.commitlog.CommitLog.recoverFiles(CommitLog.java:213)
at org.apache.cassandra.db.commitlog.CommitLog.recoverSegmentsOnDisk(CommitLog.java:194)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:338)
at com.datastax.bdp.server.DseDaemon.setup(DseDaemon.java:527)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:702)
at com.datastax.bdp.DseModule.main(DseModule.java:96)
Caused by: org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException: Encountered bad header at position 47137 of commit log /var/lib/cassandra/commitlog/CommitLog-600-1630582314923.log, with bad position but valid CRC
at org.apache.cassandra.db.commitlog.CommitLogSegmentReader$SegmentIterator.computeNext(CommitLogSegmentReader.java:111)
... 12 more
ERROR [main] 2021-09-06 06:19:08,990 JVMStabilityInspector.java:251 - JVM state determined to be unstable. Exiting forcefully due to:
org.apache.cassandra.db.commitlog.CommitLogReplayer$CommitLogReplayException: Encountered bad header at position 47137 of commit log /var/lib/cassandra/commitlog/CommitLog-600-1630582314923.log, with bad position but valid CRC
at org.apache.cassandra.db.commitlog.CommitLogReplayer.shouldSkipSegmentOnError(CommitLogReplayer.java:438)
at org.apache.cassandra.db.commitlog.CommitLogReplayer.handleUnrecoverableError(CommitLogReplayer.java:452)
at org.apache.cassandra.db.commitlog.CommitLogSegmentReader$SegmentIterator.computeNext(CommitLogSegmentReader.java:109)
at org.apache.cassandra.db.commitlog.CommitLogSegmentReader$SegmentIterator.computeNext(CommitLogSegmentReader.java:84)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:236)
at org.apache.cassandra.db.commitlog.CommitLogReader.readAllFiles(CommitLogReader.java:134)
at org.apache.cassandra.db.commitlog.CommitLogReplayer.replayFiles(CommitLogReplayer.java:154)
at org.apache.cassandra.db.commitlog.CommitLog.recoverFiles(CommitLog.java:213)
at org.apache.cassandra.db.commitlog.CommitLog.recoverSegmentsOnDisk(CommitLog.java:194)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:338)
at com.datastax.bdp.server.DseDaemon.setup(DseDaemon.java:527)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:702)
at com.datastax.bdp.DseModule.main(DseModule.java:96)
Can someone help me out, please?
For whatever reason, one of the commit log segments got corrupted on the node.
You can work around the issue by manually deleting this file on the pod:
/var/lib/cassandra/commitlog/CommitLog-600-1630582314923.log
Interestingly, that commit log segment was created on September 2 (1630582314923) but the log entry you posted was from September 6. This indicates something happened to the pod which resulted in the corrupted file.
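The date comparison above can be reproduced with a few lines of Python. This is a minimal sketch that assumes, as the answer does, that the trailing number in the segment file name is a Unix timestamp in milliseconds; the helper name commitlog_created_at is hypothetical.

```python
from datetime import datetime, timezone


def commitlog_created_at(path: str) -> datetime:
    """Parse the trailing number of a commit log file name as epoch milliseconds (UTC).

    Assumes names shaped like CommitLog-<version>-<millis>.log.
    """
    name = path.rsplit("/", 1)[-1]                     # drop the directory part
    millis = int(name[: -len(".log")].split("-")[-1])  # last dash-separated field
    return datetime.fromtimestamp(millis // 1000, tz=timezone.utc)


created = commitlog_created_at(
    "/var/lib/cassandra/commitlog/CommitLog-600-1630582314923.log"
)
print(created.date().isoformat())  # 2021-09-02
```

The printed date is four days before the 2021-09-06 timestamp on the ERROR line in the question.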
You'll need to review the Cassandra logs on the pod (not the pod logs themselves) to determine the root cause and address it. Cheers!

Enable custom kubernetes scheduler for a namespace

I have a Kubernetes Job that brings up multiple pods. The Job is used for load testing, so all the pods need to come up at the same time. The Job shouldn't start until nodes are available for all pods to be scheduled.
I came across kube-batch (https://github.com/kubernetes-sigs/kube-batch) to do this scheduling. I have a couple of questions:
1. How do I enable kube-batch for only one namespace in a cluster?
2. I installed kube-batch by following the tutorial, but the pods fail on startup with the error below. How do I resolve it?
I1204 20:07:55.911393 1 allocate.go:96] Queue <default> is overused, ignore it.
I1204 20:07:55.911399 1 allocate.go:194] Leaving Allocate ...
I1204 20:07:55.911407 1 backfill.go:41] Enter Backfill ...
I1204 20:07:55.911413 1 backfill.go:71] Leaving Backfill ...
E1204 20:07:55.911521 1 runtime.go:69] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
/home/root1/servicecomb/go/src/github.com/kubernetes-sigs/kube-batch/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:76
/home/root1/servicecomb/go/src/github.com/kubernetes-sigs/kube-batch/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/home/root1/servicecomb/go/src/github.com/kubernetes-sigs/kube-batch/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/usr/local/go/src/runtime/asm_amd64.s:522
/usr/local/go/src/runtime/panic.go:513
/usr/local/go/src/runtime/panic.go:82
/usr/local/go/src/runtime/signal_unix.go:390
/home/root1/servicecomb/go/src/github.com/kubernetes-sigs/kube-batch/pkg/scheduler/framework/session.go:368
/home/root1/servicecomb/go/src/github.com/kubernetes-sigs/kube-batch/pkg/scheduler/plugins/gang/gang.go:154
/home/root1/servicecomb/go/src/github.com/kubernetes-sigs/kube-batch/pkg/scheduler/framework/framework.go:58
/home/root1/servicecomb/go/src/github.com/kubernetes-sigs/kube-batch/pkg/scheduler/scheduler.go:102
/home/root1/servicecomb/go/src/github.com/kubernetes-sigs/kube-batch/pkg/scheduler/scheduler.go:85
/home/root1/servicecomb/go/src/github.com/kubernetes-sigs/kube-batch/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
/home/root1/servicecomb/go/src/github.com/kubernetes-sigs/kube-batch/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
/home/root1/servicecomb/go/src/github.com/kubernetes-sigs/kube-batch/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/usr/local/go/src/runtime/asm_amd64.s:1333
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x148 pc=0x10ab979]
I'm not sure that what you are trying to achieve is doable. In my opinion, what you can do is modify the pods' Dockerfile to include Supervisord. Then, in the Supervisord configuration, specify the commands you want to run when the pods reach the running state, using the priority setting to control the start order.
Example
[program:api]
directory=/usr/local
command=go run main.go
priority=100
autostart=true
autorestart=true
; the log settings must point to files, not directories; these file names are placeholders
stderr_logfile=/var/log/api.err.log
stdout_logfile=/var/log/api.out.log
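The all-or-nothing requirement in the question (start nothing unless every pod can be placed) is what the gang plugin in the stack trace is named for. As a rough illustration of the idea only, here is a minimal Python sketch. It is not kube-batch's algorithm: it assumes each pod needs a fixed CPU amount, uses a simple first-fit heuristic that can miss a placement that does exist, and the function and parameter names are hypothetical.

```python
from typing import Dict, List, Optional


def gang_place(pod_cpu: List[int], node_free_cpu: Dict[str, int]) -> Optional[Dict[int, str]]:
    """Return a pod-index -> node-name placement only if every pod fits, else None."""
    free = dict(node_free_cpu)  # work on a copy so the caller's view is untouched
    placement: Dict[int, str] = {}
    for idx, cpu in enumerate(pod_cpu):
        for node, avail in free.items():
            if avail >= cpu:
                free[node] = avail - cpu
                placement[idx] = node
                break
        else:
            # One pod could not be placed, so the whole group is rejected.
            return None
    return placement


print(gang_place([2, 2, 2], {"a": 4, "b": 2}))  # {0: 'a', 1: 'a', 2: 'b'}
print(gang_place([2, 2, 2], {"a": 4}))          # None
```

The point of the design is that a partial result is never returned: either all pods get a node or the caller is told to wait.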

How to fix NetworkUnavailable:True error in kubernetes node

Node:
Status NetworkUnavailable: True
Message error:
RouteController failed to create a route
Try deleting the node and creating a new one. After I created a new node, the status returned to normal.
How to troubleshoot or check the problem
To get more information about the problem, retrieve the logs from Stackdriver --> Logging --> Logs Viewer, use the "Advanced Filter", and search for "Status NetworkUnavailable: True Message error" or "RouteController failed to create a route".

Spinnaker pipeline failing when deployment strategy is Recreate

When the deployment strategy is changed from "Rolling update" to "Recreate", I am facing the below error
Failure executing: PATCH at: https://3x.xxx.2x1.xxx/apis/extensions/v1beta1/namespaces/default/deployments/xxxxxx. Message: Deployment.apps "xxxxxx" is invalid: spec.strategy.rollingUpdate: Forbidden: may not be specified when strategy type is 'Recreate'. Received status: Status(apiVersion=v1, code=422, details=StatusDetails(causes=[StatusCause(field=spec.strategy.rollingUpdate, message=Forbidden: may not be specified when strategy type is 'Recreate', reason=FieldValueForbidden, additionalProperties={})], group=apps, kind=Deployment, name=xxxxxx, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=Deployment.apps "xxxxxx" is invalid: spec.strategy.rollingUpdate: Forbidden: may not be specified when strategy type is 'Recreate', metadata=ListMeta(resourceVersion=null, selfLink=null, additionalProperties={}), reason=Invalid, status=Failure, additionalProperties={}).
Any help on this? I am using Spinnaker 1.6.0
There are many tickets on GitHub related to that problem: Kubernetes, Cert-manager, Spinnaker. In each one you can find the same answer: it is not possible to switch the update strategy of already created resources.
So the only option is to create a new Deployment with the new strategy, because of how the update process is implemented in Kubernetes.
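The constraint stated in the 422 response in the question can be sketched as a small check: a strategy of type Recreate must not also carry a rollingUpdate block. This is an illustration in Python of the rule quoted in the error message, not the API server's actual code; the function name and the dict layout are hypothetical.

```python
from typing import Any, Dict, List


def validate_strategy(strategy: Dict[str, Any]) -> List[str]:
    """Return validation errors for a Deployment-style strategy dict (empty list if valid)."""
    errors: List[str] = []
    if strategy.get("type") == "Recreate" and strategy.get("rollingUpdate") is not None:
        errors.append(
            "spec.strategy.rollingUpdate: Forbidden: "
            "may not be specified when strategy type is 'Recreate'"
        )
    return errors


# A strategy that switched type but still has the old rollingUpdate settings attached.
print(validate_strategy({"type": "Recreate", "rollingUpdate": {"maxSurge": 1}}))
# A strategy with only the type set passes the check.
print(validate_strategy({"type": "Recreate"}))
```

Running a check like this against the manifest a pipeline is about to send shows whether leftover rollingUpdate settings are what triggers the rejection.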