CreateContainerError with microk8s, ghcr.io image - kubernetes

The error message is CreateContainerError
Error: failed to create containerd container: error unpacking image: failed to extract layer sha256:b9b5285004b8a3: failed to get stream processor for application/vnd.in-toto+json: no processor for media-type: unknown
Image pull was successful with the token I supplied (jmtoken)
I am testing on an AWS EC2 t2.medium; the Docker image was tested on my local machine.
Has anybody experienced this issue? How did you solve it?
deployment yaml file

I found a bug in my yaml file.
I was supplying both command in the K8s manifest and CMD in the Dockerfile. Setting command in the manifest causes the CMD in the Dockerfile, which is the actual command, to be ignored, and that caused side effects including this issue.
Another tip: adding a sleep 3000 command in the K8s manifest sometimes helps to work around other issues such as crashes.
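For illustration, a minimal sketch of the conflict (container name and image are hypothetical): when the pod spec sets command, Kubernetes replaces the image's ENTRYPOINT and ignores the Dockerfile's CMD entirely; the sleep value here matches the debugging tip above.
containers:
- name: myapp                          # hypothetical name
  image: ghcr.io/myorg/myapp:latest    # hypothetical image
  # 'command' replaces the image's ENTRYPOINT and causes its CMD to be ignored;
  # drop this field (or move the value to 'args') so the Dockerfile's CMD runs as intended
  command: ["sleep", "3000"]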

Related

Spiffe error while deploying client-agent pods

I am using this guide for deploying SPIFFE on a K8s cluster: "https://spiffe.io/docs/latest/try/getting-started-k8s/"
One of the steps in this process is running the command "kubectl apply -f client-deployment.yaml", which deploys the SPIFFE client agent.
But the pods keep ending up in the error state:
Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "sleep": executable file not found in $PATH: unknown
Image used: ghcr.io/spiffe/spire-agent:1.5.1
It seems connected to this PR from 3 days ago (there is no longer a "sleep" executable in the image).
SPIRE is moving away from the alpine Docker release images in favor of scratch images that contain only the release binary to minimize the size of the images and include only the software that is necessary to run in the container.
You should report the issue and, in the meantime, use
gcr.io/spiffe-io/spire-agent:1.2.3
(the last image they used).
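As a sketch of the workaround (adapt to the guide's client-deployment.yaml; the keep-alive command shown is an assumption based on the error message), pin the container to the older image, which still ships a shell and sleep:
containers:
- name: client
  image: gcr.io/spiffe-io/spire-agent:1.2.3   # older, non-scratch image
  command: ["sleep", "1000000"]               # assumed keep-alive command from the guide's manifest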

Kubeflow missing .kube/config files on local setup (Laptop/Desktop)

I have installed Kubeflow via MiniKF on my laptop.
I am trying to run a CNN on some MNIST data that uses TensorFlow 1.4.0. Currently, I am following this tutorial: https://codelabs.arrikto.com/codelabs/minikf-kale-katib-kfserving/index.html#0
The code is in a Jupyter Notebook server, and it runs completely fine. When I build a pipeline, it completes successfully. But at the "Serving" step, when I run the kfserver command on my model, I get strange behavior: it gets stuck at "Waiting for InferenceService."
For reference, a successful run ends with the inference service being created (screenshot omitted).
Getting stuck at "Waiting for InferenceService" means that the pipeline doesn't get the hostname of the servable. You can validate this by running kubectl get inferenceservices; your new inference service should have its READY state set to True. If that is not the case, the deployment has failed.
There can be many different reasons why the inference service is not in a ready state; there is a nice troubleshooting guide in KFServing's repo.
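A quick way to check, as a sketch (namespace and service name are placeholders):
kubectl get inferenceservices -n <namespace>
kubectl describe inferenceservice <name> -n <namespace>   # the status conditions usually explain why READY is not True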

Google cloud build: No space left on device

I have been using Google's Cloud Build to build my artifacts/Docker images for deployment. But I am suddenly getting the following error when submitting a build:
Creating temporary tarball archive of 1103 file(s) totalling 99.5 MiB before compression.
ERROR: gcloud crashed (IOError): [Errno 28] No space left on device
I have increased diskSizeGB as well, but I am still getting this error. Where does Cloud Build run in the cloud, and on which VM? How do I get rid of this error?
Cloud Build is a service. While its builds run on GCE VMs, these are VMs managed by the service and opaque to you. You cannot access the build service's resources directly.
What value did you try for diskSizeGB?
Please update your question to include the (salient parts of the) cloudbuild.yaml and the gcloud command that you're using to submit the job.
I'm wondering whether the error corresponds to a lack of space locally (on your host) rather than on the service's VM.
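For reference, diskSizeGB is set under options in cloudbuild.yaml and only affects the build VM's disk, not your local machine (a sketch; the value is illustrative):
options:
  diskSizeGB: 200
Since the crash happens while gcloud is still "Creating temporary tarball archive" (i.e. before anything is uploaded), it is worth checking free space in your local temp directory and adding a .gcloudignore to shrink the archive.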

ERROR: (gcloud.compute.instance-templates.create) Could not fetch image resource:

The cluster was running fine for 255 days. I brought the cluster down, and after that I was unable to bring it back up. It gives the following error while bringing the cluster up.
Creating minions.
Attempt 1 to create kubernetes-minion-template
ERROR: (gcloud.compute.instance-templates.create) Could not fetch image resource:
- The resource 'projects/google-containers/global/images/container-vm-v20170627' was not found
Attempt 1 failed to create instance template kubernetes-minion-template. Retrying.
This attempt keeps retrying and always fails. Am I missing something?
The kubernetes version is v1.7.2.
It looks like the image you are trying to use to create the machines has been deprecated and/or is no longer available.
You should try specifying an alternative image for these machines from Google's current public images.
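For example, you could list Google's current Container-Optimized OS images (the successor to the deprecated container-vm images) and pick a replacement; how you wire it in depends on your setup, e.g. the image variables in cluster/gce/config-default.sh if you use kube-up.sh:
gcloud compute images list --project cos-cloud --no-standard-images
# then point the instance template / kube-up configuration at one of the listed images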

openshift pod fails and restarts frequently

I am creating an app in Origin 3.1 using my Docker image.
Whenever I create the app, a new pod gets created, but it restarts again and again and finally ends up with the status "CrashLoopBackOff".
I analysed the pod's logs but found no error; all log data is as expected for a successfully running app. Hence I am not able to determine the cause.
I came across the link below today, which says "running an application inside of a container as root still has risks, OpenShift doesn't allow you to do that by default and will instead run as an arbitrary assigned user ID."
What is CrashLoopBackOff status for openshift pods?
My image runs as the root user only; what do I need to do to make this work? The logs show no error, but the pod keeps restarting.
Could anyone please help me with this?
You are seeing this because whatever process your image starts isn't a long-running process: it finds no TTY, so the container just exits and gets restarted repeatedly, which is a "crash loop" as far as OpenShift is concerned.
Your Dockerfile contains the following:
ENTRYPOINT ["container-entrypoint"]
What is this "container-entrypoint" actually doing? You need to check.
Did you use the -p or --previous flag to oc logs to see if the logs from the previous attempt to start the pod show anything?
Red Hat's recommendation is to make files group-owned by GID 0, since the user in the container is always in the root group. You won't be able to chown, but you can selectively choose which files are writable.
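A minimal Dockerfile sketch of that approach (the /opt/app path is just an example):
RUN chgrp -R 0 /opt/app && \
    chmod -R g=u /opt/app   # group gets the same permissions as the owner, so GID 0 can write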
A second option:
In order to allow images that use either named users or the root (0) user to build in OpenShift, you can add the project's builder service account (system:serviceaccount:<project>:builder) to the privileged security context constraint (SCC). Alternatively, you can allow all images to run as any user.
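As a sketch, the corresponding admin commands look roughly like this (project name illustrative; older Origin releases use oadm policy instead of oc adm policy):
oc adm policy add-scc-to-user privileged system:serviceaccount:myproject:builder
oc adm policy add-scc-to-group anyuid system:authenticated   # allow images to run as any user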
Can you see the logs using the following?
kubectl logs <podname> -p
This should give you the errors explaining why the pod failed.
I was able to resolve this by creating a script "run.sh" with the following content:
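# keep a long-running foreground process alive so the container does not exit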
while :; do
sleep 300
done
and in Dockerfile:
ADD run.sh /run.sh
RUN chmod +x /*.sh
CMD ["/run.sh"]
This way it works. Thanks everybody for pointing out the reason, which helped me find the resolution. One doubt I still have: why does the process exit in OpenShift only in this case? I have tried running a Tomcat server in the same way, and it works fine without the sleep in the script.