kube-apiserver pod sticks in the CreateContainerError status

kube-apiserver pod sticks in the CreateContainerError status - kubernetes

I bootstrap a kubernetes cluster using kubeadm. After a few month of inactivity, when I get our running pods, I realize that the kube-apiserver sticks in the CreatecontainerError!
kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-576cbf47c7-bcv8m 1/1 Running 435 175d
coredns-576cbf47c7-dwvmv 1/1 Running 435 175d
etcd-master 1/1 Running 23 175d
kube-apiserver-master 0/1 CreateContainerError 23 143m
kube-controller-manager-master 1/1 Running 27 175d
kube-proxy-2s9sx 1/1 Running 23 175d
kube-proxy-rrp7m 1/1 Running 20 127d
kube-scheduler-master 1/1 Running 24 175d
kubernetes-dashboard-65c76f6c97-7cwwp 1/1 Running 34 169d
tiller-deploy-779784fbd6-cwrqn 1/1 Running 0 152m
weave-net-2g8s5 2/2 Running 62 170d
weave-net-9r6cp 2/2 Running 44 127d
I delete this pod to restart it, but still goes same problem.
More info :
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
master Ready master 175d v1.12.1
worker Ready worker 175d v1.12.1
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.1", GitCommit:"4ed3216f3ec431b140b1d899130a69fc671678f4", GitTreeState:"clean", BuildDate:"2018-10-05T16:46:06Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.1", GitCommit:"4ed3216f3ec431b140b1d899130a69fc671678f4", GitTreeState:"clean", BuildDate:"2018-10-05T16:36:14Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
$ kubectl describe pod kube-apiserver-master -n kube-system
Name: kube-apiserver-master
Namespace: kube-system
Priority: 2000000000
PriorityClassName: system-cluster-critical
Node: master/192.168.88.205
Start Time: Wed, 07 Aug 2019 17:58:29 +0430
Labels: component=kube-apiserver
tier=control-plane
Annotations: kubernetes.io/config.hash: ce0f74ad5fcbf28c940c111df265f4c8
kubernetes.io/config.mirror: ce0f74ad5fcbf28c940c111df265f4c8
kubernetes.io/config.seen: 2019-08-07T17:58:28.178339939+04:30
kubernetes.io/config.source: file
scheduler.alpha.kubernetes.io/critical-pod:
Status: Running
IP: 192.168.88.205
Containers:
kube-apiserver:
Container ID: docker://3328849ad82745341717616f4ef6e951116fde376d19990610f670c30eb1e26f
Image: k8s.gcr.io/kube-apiserver:v1.12.1
Image ID: docker-pullable://k8s.gcr.io/kube-apiserver#sha256:52b9dae126b5a99675afb56416e9ae69239e012028668f7274e30ae16112bb1f
Port: <none>
Host Port: <none>
Command:
kube-apiserver
--authorization-mode=Node,RBAC
--advertise-address=192.168.88.205
--allow-privileged=true
--client-ca-file=/etc/kubernetes/pki/ca.crt
--enable-admission-plugins=NodeRestriction
--enable-bootstrap-token-auth=true
--etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
--etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
--etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
--etcd-servers=https://127.0.0.1:2379
--insecure-port=0
--kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt
--kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key
--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
--proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt
--proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key
--requestheader-allowed-names=front-proxy-client
--requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
--requestheader-extra-headers-prefix=X-Remote-Extra-
--requestheader-group-headers=X-Remote-Group
--requestheader-username-headers=X-Remote-User
--secure-port=6443
--service-account-key-file=/etc/kubernetes/pki/sa.pub
--service-cluster-ip-range=10.96.0.0/12
--tls-cert-file=/etc/kubernetes/pki/apiserver.crt
--tls-private-key-file=/etc/kubernetes/pki/apiserver.key
State: Waiting
Reason: CreateContainerError
Last State: Terminated
Reason: Error
Exit Code: 255
Started: Wed, 07 Aug 2019 17:58:30 +0430
Finished: Wed, 07 Aug 2019 13:28:11 +0430
Ready: False
Restart Count: 23
Requests:
cpu: 250m
Liveness: http-get https://192.168.88.205:6443/healthz delay=15s timeout=15s period=10s #success=1 #failure=8
Environment: <none>
Mounts:
/etc/ca-certificates from etc-ca-certificates (ro)
/etc/kubernetes/pki from k8s-certs (ro)
/etc/ssl/certs from ca-certs (ro)
/usr/local/share/ca-certificates from usr-local-share-ca-certificates (ro)
/usr/share/ca-certificates from usr-share-ca-certificates (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
k8s-certs:
Type: HostPath (bare host directory volume)
Path: /etc/kubernetes/pki
HostPathType: DirectoryOrCreate
ca-certs:
Type: HostPath (bare host directory volume)
Path: /etc/ssl/certs
HostPathType: DirectoryOrCreate
usr-share-ca-certificates:
Type: HostPath (bare host directory volume)
Path: /usr/share/ca-certificates
HostPathType: DirectoryOrCreate
usr-local-share-ca-certificates:
Type: HostPath (bare host directory volume)
Path: /usr/local/share/ca-certificates
HostPathType: DirectoryOrCreate
etc-ca-certificates:
Type: HostPath (bare host directory volume)
Path: /etc/ca-certificates
HostPathType: DirectoryOrCreate
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: :NoExecute
Events: <none>
$ kubectl get pods kube-apiserver-master -n kube-system -o yaml
apiVersion: v1
kind: Pod
metadata:
annotations:
kubernetes.io/config.hash: ce0f74ad5fcbf28c940c111df265f4c8
kubernetes.io/config.mirror: ce0f74ad5fcbf28c940c111df265f4c8
kubernetes.io/config.seen: 2019-08-07T17:58:28.178339939+04:30
kubernetes.io/config.source: file
scheduler.alpha.kubernetes.io/critical-pod: ""
creationTimestamp: 2019-08-13T08:33:18Z
labels:
component: kube-apiserver
tier: control-plane
name: kube-apiserver-master
namespace: kube-system
resourceVersion: "19613877"
selfLink: /api/v1/namespaces/kube-system/pods/kube-apiserver-master
uid: 0032d68b-bda5-11e9-860c-000c292f9c9e
spec:
containers:
- command:
- kube-apiserver
- --authorization-mode=Node,RBAC
- --advertise-address=192.168.88.205
- --allow-privileged=true
- --client-ca-file=/etc/kubernetes/pki/ca.crt
- --enable-admission-plugins=NodeRestriction
- --enable-bootstrap-token-auth=true
- --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
- --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
- --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
- --etcd-servers=https://127.0.0.1:2379
- --insecure-port=0
- --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt
- --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key
- --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
- --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt
- --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key
- --requestheader-allowed-names=front-proxy-client
- --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
- --requestheader-extra-headers-prefix=X-Remote-Extra-
- --requestheader-group-headers=X-Remote-Group
- --requestheader-username-headers=X-Remote-User
- --secure-port=6443
- --service-account-key-file=/etc/kubernetes/pki/sa.pub
- --service-cluster-ip-range=10.96.0.0/12
- --tls-cert-file=/etc/kubernetes/pki/apiserver.crt
- --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
image: k8s.gcr.io/kube-apiserver:v1.12.1
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 8
httpGet:
host: 192.168.88.205
path: /healthz
port: 6443
scheme: HTTPS
initialDelaySeconds: 15
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 15
name: kube-apiserver
resources:
requests:
cpu: 250m
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/ca-certificates
name: etc-ca-certificates
readOnly: true
- mountPath: /etc/kubernetes/pki
name: k8s-certs
readOnly: true
- mountPath: /etc/ssl/certs
name: ca-certs
readOnly: true
- mountPath: /usr/share/ca-certificates
name: usr-share-ca-certificates
readOnly: true
- mountPath: /usr/local/share/ca-certificates
name: usr-local-share-ca-certificates
readOnly: true
dnsPolicy: ClusterFirst
hostNetwork: true
nodeName: master
priority: 2000000000
priorityClassName: system-cluster-critical
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoExecute
operator: Exists
volumes:
- hostPath:
path: /etc/kubernetes/pki
type: DirectoryOrCreate
name: k8s-certs
- hostPath:
path: /etc/ssl/certs
type: DirectoryOrCreate
name: ca-certs
- hostPath:
path: /usr/share/ca-certificates
type: DirectoryOrCreate
name: usr-share-ca-certificates
- hostPath:
path: /usr/local/share/ca-certificates
type: DirectoryOrCreate
name: usr-local-share-ca-certificates
- hostPath:
path: /etc/ca-certificates
type: DirectoryOrCreate
name: etc-ca-certificates
status:
conditions:
- lastProbeTime: null
lastTransitionTime: 2019-08-07T13:28:29Z
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: 2019-08-07T08:58:11Z
message: 'containers with unready status: [kube-apiserver]'
reason: ContainersNotReady
status: "False"
type: Ready
- lastProbeTime: null
lastTransitionTime: 2019-08-07T08:58:11Z
message: 'containers with unready status: [kube-apiserver]'
reason: ContainersNotReady
status: "False"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: 2019-08-07T13:28:29Z
status: "True"
type: PodScheduled
containerStatuses:
- containerID: docker://3328849ad82745341717616f4ef6e951116fde376d19990610f670c30eb1e26f
image: k8s.gcr.io/kube-apiserver:v1.12.1
imageID: docker-pullable://k8s.gcr.io/kube-apiserver#sha256:52b9dae126b5a99675afb56416e9ae69239e012028668f7274e30ae16112bb1f
lastState:
terminated:
containerID: docker://3328849ad82745341717616f4ef6e951116fde376d19990610f670c30eb1e26f
exitCode: 255
finishedAt: 2019-08-07T08:58:11Z
reason: Error
startedAt: 2019-08-07T13:28:30Z
name: kube-apiserver
ready: false
restartCount: 23
state:
waiting:
message: 'Error response from daemon: Conflict. The container name "/k8s_kube-apiserver_kube-apiserver-master_kube-system_ce0f74ad5fcbf28c940c111df265f4c8_24"
is already in use by container 14935b714aee924aa42295fa5d252c760264d24ee63ea74e67092ccc3fb2b530.
You have to remove (or rename) that container to be able to reuse that name.'
reason: CreateContainerError
hostIP: 192.168.88.205
phase: Running
podIP: 192.168.88.205
qosClass: Burstable
startTime: 2019-08-07T13:28:29Z
If any other information is needed let me know.
How can I make it run properly?

The issue is explained by this error message from docker daemon:
message: 'Error response from daemon: Conflict. The container name
"/k8s_kube-apiserver_kube-apiserver-master_kube-system_ce0f74ad5fcbf28c940c111df265f4c8_24"
is already in use by container 14935b714aee924aa42295fa5d252c760264d24ee63ea74e67092ccc3fb2b530.
You have to remove (or rename) that container to be able to reuse that name.'
reason: CreateContainerError
List all containers using:
docker ps -a
You should be able to find on the list container with following name:
/k8s_kube-apiserver_kube-apiserver-master_kube-system_ce0f74ad5fcbf28c940c111df265f4c8_24
or ID:
14935b714aee924aa42295fa5d252c760264d24ee63ea74e67092ccc3fb2b530
Then you can try to delete it by running:
docker rm "/k8s_kube-apiserver_kube-apiserver-master_kube-system_ce0f74ad5fcbf28c940c111df265f4c8_24"
or by providing its ID:
docker rm 14935b714aee924aa42295fa5d252c760264d24ee63ea74e67092ccc3fb2b530
If there is still any problem with removing it, add the -f flag to delete it forcefully:
docker rm -f 14935b714aee924aa42295fa5d252c760264d24ee63ea74e67092ccc3fb2b530
Once done that, you can try to delete kube-apiserver-master pod, so it can be recreated.

Related

Is there any way to inject environment variables to initializing/running init-containers when it is pre-loaded with your inferenceservice/deployment?

I am creating a inferenceservice instance using the given yaml file-
apiVersion: "serving.kubeflow.org/v1alpha2"
kind: "InferenceService"
metadata:
minReplicas: 1
name: "sklearn-iris"
spec:
default:
predictor:
sklearn:
storageUri: "gs://kfserving-samples/models/sklearn/iris"
Now this will create a deployment, and since I am working from behind the proxy I am injecting the env variables for proxies as:
kubectl set env deployment/sklearn-iris-predictor-default-dclkq-deployment -n kfserving-test http_proxy={http_proxy value}
kubectl set env deployment/sklearn-iris-predictor-default-dclkq-deployment -n kfserving-test https_proxy={https_proxy value}
kubectl set env deployment/sklearn-iris-predictor-default-dclkq-deployment -n kfserving-test no_proxy={no_proxy value}
Since I have fixed minimum replicas as 1 so it make sure that one pod is there even without the traffic, now when this pod is being made it runs 1 init-container and 2 containers, so doing the kubectl set env thing, the proxies are being set in the container variables but not in the init-container and it is failing, so overall things are failing.
So in crux is there any way to set proxy/env details in init-container, without having the availibility of whole deployment yaml file to configure the env?
Edit:
kubectl edit deploy/deployment_name -o yaml -n namespace gives
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
autoscaling.knative.dev/class: kpa.autoscaling.knative.dev
autoscaling.knative.dev/minScale: "1"
deployment.kubernetes.io/revision: "1"
internal.serving.kubeflow.org/storage-initializer-sourceuri: gs://kfserving-samples/models/sklearn/iris
serving.knative.dev/creator: system:serviceaccount:kfserving-system:default
creationTimestamp: "2021-02-03T06:51:29Z"
generation: 3
labels:
app: sklearn-iris-predictor-default-6xcgj
component: predictor
service.istio.io/canonical-name: sklearn-iris-predictor-default
service.istio.io/canonical-revision: sklearn-iris-predictor-default-6xcgj
serving.knative.dev/configuration: sklearn-iris-predictor-default
serving.knative.dev/configurationGeneration: "1"
serving.knative.dev/revision: sklearn-iris-predictor-default-6xcgj
serving.knative.dev/revisionUID: 470195f7-db41-4e9c-ac6b-c96c79a1218f
serving.knative.dev/service: sklearn-iris-predictor-default
serving.kubeflow.org/inferenceservice: sklearn-iris
name: sklearn-iris-predictor-default-6xcgj-deployment
namespace: kfserving-test
ownerReferences:
- apiVersion: serving.knative.dev/v1
blockOwnerDeletion: true
controller: true
kind: Revision
name: sklearn-iris-predictor-default-6xcgj
uid: 470195f7-db41-4e9c-ac6b-c96c79a1218f
resourceVersion: "633491"
selfLink: /apis/apps/v1/namespaces/kfserving-test/deployments/sklearn-iris-predictor-default-6xcgj-deployment
uid: 2fc10485-ba59-4eaf-b62a-480ecf4ab078
spec:
progressDeadlineSeconds: 120
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
serving.knative.dev/revisionUID: 470195f7-db41-4e9c-ac6b-c96c79a1218f
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
annotations:
autoscaling.knative.dev/class: kpa.autoscaling.knative.dev
autoscaling.knative.dev/minScale: "1"
internal.serving.kubeflow.org/storage-initializer-sourceuri: gs://kfserving-samples/models/sklearn/iris
serving.knative.dev/creator: system:serviceaccount:kfserving-system:default
creationTimestamp: null
labels:
app: sklearn-iris-predictor-default-6xcgj
component: predictor
service.istio.io/canonical-name: sklearn-iris-predictor-default
service.istio.io/canonical-revision: sklearn-iris-predictor-default-6xcgj
serving.knative.dev/configuration: sklearn-iris-predictor-default
serving.knative.dev/configurationGeneration: "1"
serving.knative.dev/revision: sklearn-iris-predictor-default-6xcgj
serving.knative.dev/revisionUID: 470195f7-db41-4e9c-ac6b-c96c79a1218f
serving.knative.dev/service: sklearn-iris-predictor-default
serving.kubeflow.org/inferenceservice: sklearn-iris
spec:
containers:
- args:
- --model_name=sklearn-iris
- --model_dir=/mnt/models
- --http_port=8080
- --workers=0
env:
- name: http_proxy
value: {proxy data}
- name: https_proxy
value: {proxy data}
- name: no_proxy
value: {no proxy data}
- name: PORT
value: "8080"
- name: K_REVISION
value: sklearn-iris-predictor-default-6xcgj
- name: K_CONFIGURATION
value: sklearn-iris-predictor-default
- name: K_SERVICE
value: sklearn-iris-predictor-default
- name: K_INTERNAL_POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: K_INTERNAL_POD_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
image: gcr.io/kfserving/sklearnserver#sha256:fd87e984a6092aae6efd28a2d596aac16d83d207a0269a503a221cb24cfd2f39
imagePullPolicy: IfNotPresent
lifecycle:
preStop:
httpGet:
path: /wait-for-drain
port: 8022
scheme: HTTP
name: kfserving-container
ports:
- containerPort: 8080
name: user-port
protocol: TCP
resources:
limits:
cpu: "1"
memory: 2Gi
requests:
cpu: "1"
memory: 2Gi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: FallbackToLogsOnError
volumeMounts:
- mountPath: /var/log
name: knative-var-log
subPathExpr: $(K_INTERNAL_POD_NAMESPACE)_$(K_INTERNAL_POD_NAME)_kfserving-container
- env:
- name: SERVING_NAMESPACE
value: kfserving-test
- name: SERVING_SERVICE
value: sklearn-iris-predictor-default
- name: SERVING_CONFIGURATION
value: sklearn-iris-predictor-default
- name: SERVING_REVISION
value: sklearn-iris-predictor-default-6xcgj
- name: QUEUE_SERVING_PORT
value: "8012"
- name: CONTAINER_CONCURRENCY
value: "0"
- name: REVISION_TIMEOUT_SECONDS
value: "300"
- name: SERVING_POD
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: SERVING_POD_IP
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: status.podIP
- name: SERVING_LOGGING_CONFIG
value: |-
{
"level": "info",
"development": false,
"outputPaths": ["stdout"],
"errorOutputPaths": ["stderr"],
"encoding": "json",
"encoderConfig": {
"timeKey": "ts",
"levelKey": "level",
"nameKey": "logger",
"callerKey": "caller",
"messageKey": "msg",
"stacktraceKey": "stacktrace",
"lineEnding": "",
"levelEncoder": "",
"timeEncoder": "iso8601",
"durationEncoder": "",
"callerEncoder": ""
}
}
- name: SERVING_LOGGING_LEVEL
- name: SERVING_REQUEST_LOG_TEMPLATE
value: '{"httpRequest": {"requestMethod": "{{.Request.Method}}", "requestUrl":
"{{js .Request.RequestURI}}", "requestSize": "{{.Request.ContentLength}}",
"status": {{.Response.Code}}, "responseSize": "{{.Response.Size}}", "userAgent":
"{{js .Request.UserAgent}}", "remoteIp": "{{js .Request.RemoteAddr}}",
"serverIp": "{{.Revision.PodIP}}", "referer": "{{js .Request.Referer}}",
"latency": "{{.Response.Latency}}s", "protocol": "{{.Request.Proto}}"},
"traceId": "{{index .Request.Header "X-B3-Traceid"}}"}'
- name: SERVING_ENABLE_REQUEST_LOG
value: "false"
- name: SERVING_REQUEST_METRICS_BACKEND
value: prometheus
- name: TRACING_CONFIG_BACKEND
value: none
- name: TRACING_CONFIG_ZIPKIN_ENDPOINT
- name: TRACING_CONFIG_STACKDRIVER_PROJECT_ID
- name: TRACING_CONFIG_DEBUG
value: "false"
- name: TRACING_CONFIG_SAMPLE_RATE
value: "0.1"
- name: USER_PORT
value: "8080"
- name: SYSTEM_NAMESPACE
value: knative-serving
- name: METRICS_DOMAIN
value: knative.dev/internal/serving
- name: SERVING_READINESS_PROBE
value: '{"tcpSocket":{"port":8080,"host":"127.0.0.1"},"successThreshold":1}'
- name: ENABLE_PROFILING
value: "false"
- name: SERVING_ENABLE_PROBE_REQUEST_LOG
value: "false"
- name: METRICS_COLLECTOR_ADDRESS
image: gcr.io/knative-releases/knative.dev/serving/cmd/queue#sha256:0db974f58b48b219ab8047e11b481c2bbda52b7a2d54db5ed58e8659748ec125
imagePullPolicy: IfNotPresent
name: queue-proxy
ports:
- containerPort: 8022
name: http-queueadm
protocol: TCP
- containerPort: 9090
name: http-autometric
protocol: TCP
- containerPort: 9091
name: http-usermetric
protocol: TCP
- containerPort: 8012
name: queue-port
protocol: TCP
readinessProbe:
exec:
command:
- /ko-app/queue
- -probe-period
- "0"
failureThreshold: 3
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 10
resources:
requests:
cpu: 25m
securityContext:
allowPrivilegeEscalation: false
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 300
volumes:
- emptyDir: {}
name: knative-var-log
status:
conditions:
- lastTransitionTime: "2021-02-03T06:51:29Z"
lastUpdateTime: "2021-02-03T06:51:29Z"
message: Deployment does not have minimum availability.
reason: MinimumReplicasUnavailable
status: "False"
type: Available
- lastTransitionTime: "2021-02-03T07:38:08Z"
lastUpdateTime: "2021-02-03T07:38:08Z"
message: ReplicaSet "sklearn-iris-predictor-default-6xcgj-deployment-7c97895d96"
has timed out progressing.
reason: ProgressDeadlineExceeded
status: "False"
type: Progressing
observedGeneration: 2
replicas: 1
unavailableReplicas: 1
updatedReplicas: 1
kubectl describe pod podname -n namespace gives
Name: sklearn-iris-predictor-default-6xcgj-deployment-7c97895d96vqbgr
Namespace: kfserving-test
Priority: 0
Node: minikube/192.168.99.109
Start Time: Wed, 03 Feb 2021 13:50:33 +0530
Labels: app=sklearn-iris-predictor-default-6xcgj
component=predictor
pod-template-hash=7c97895d96
service.istio.io/canonical-name=sklearn-iris-predictor-default
service.istio.io/canonical-revision=sklearn-iris-predictor-default-6xcgj
serving.knative.dev/configuration=sklearn-iris-predictor-default
serving.knative.dev/configurationGeneration=1
serving.knative.dev/revision=sklearn-iris-predictor-default-6xcgj
serving.knative.dev/revisionUID=470195f7-db41-4e9c-ac6b-c96c79a1218f
serving.knative.dev/service=sklearn-iris-predictor-default
serving.kubeflow.org/inferenceservice=sklearn-iris
Annotations: autoscaling.knative.dev/class: kpa.autoscaling.knative.dev
autoscaling.knative.dev/minScale: 1
internal.serving.kubeflow.org/storage-initializer-sourceuri: gs://kfserving-samples/models/sklearn/iris
serving.knative.dev/creator: system:serviceaccount:kfserving-system:default
Status: Pending
IP: 172.17.0.22
Controlled By: ReplicaSet/sklearn-iris-predictor-default-6xcgj-deployment-7c97895d96
Init Containers:
storage-initializer:
Container ID: docker://262a195f39fad7dfc62b494d9c9bbda8c7cdeee2f4b903b2948b809c5e00fb0c
Image: gcr.io/kfserving/storage-initializer:v0.5.0-rc2
Image ID: docker-pullable://gcr.io/kfserving/storage-initializer#sha256:9a16e6af385412bb62fd7e09f6d749e107e3ad92c488039acd20361fb5dd68cc
Port: <none>
Host Port: <none>
Args:
gs://kfserving-samples/models/sklearn/iris
/mnt/models
State: Running
Started: Wed, 03 Feb 2021 13:58:00 +0530
Last State: Terminated
Reason: Error
Message: ownload(src_uri, dest_path)
File "/usr/local/lib/python3.7/site-packages/kfserving/storage.py", line 58, in download
Storage._download_gcs(uri, out_dir)
File "/usr/local/lib/python3.7/site-packages/kfserving/storage.py", line 116, in _download_gcs
for blob in blobs:
File "/usr/local/lib/python3.7/site-packages/google/api_core/page_iterator.py", line 212, in _items_iter
for page in self._page_iter(increment=False):
File "/usr/local/lib/python3.7/site-packages/google/api_core/page_iterator.py", line 243, in _page_iter
page = self._next_page()
File "/usr/local/lib/python3.7/site-packages/google/api_core/page_iterator.py", line 369, in _next_page
response = self._get_next_page_response()
File "/usr/local/lib/python3.7/site-packages/google/api_core/page_iterator.py", line 419, in _get_next_page_response
method=self._HTTP_METHOD, path=self.path, query_params=params
File "/usr/local/lib/python3.7/site-packages/google/cloud/storage/_http.py", line 63, in api_request
return call()
File "/usr/local/lib/python3.7/site-packages/google/api_core/retry.py", line 286, in retry_wrapped_func
on_error=on_error,
File "/usr/local/lib/python3.7/site-packages/google/api_core/retry.py", line 206, in retry_target
last_exc,
File "<string>", line 3, in raise_from
google.api_core.exceptions.RetryError: Deadline of 120.0s exceeded while calling functools.partial(functools.partial(<bound method JSONConnection.api_request of <google.cloud.storage._http.Connection object at 0x7fd57c954c50>>, timeout=60, method='GET', path='/b/kfserving-samples/o', query_params={'projection': 'noAcl', 'prefix': 'models/sklearn/iris/'})), last exception: HTTPSConnectionPool(host='storage.googleapis.com', port=443): Max retries exceeded with url: /storage/v1/b/kfserving-samples/o?projection=noAcl&prefix=models%2Fsklearn%2Firis%2F&prettyPrint=false (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fd57c91bb90>: Failed to establish a new connection: [Errno 113] No route to host'))
Exit Code: 1
Started: Wed, 03 Feb 2021 13:53:53 +0530
Finished: Wed, 03 Feb 2021 13:57:45 +0530
Ready: False
Restart Count: 2
Limits:
cpu: 1
memory: 1Gi
Requests:
cpu: 100m
memory: 100Mi
Environment: <none>
Mounts:
/mnt/models from kfserving-provision-location (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-hw2rw (ro)
Containers:
kfserving-container:
Container ID:
Image: gcr.io/kfserving/sklearnserver#sha256:fd87e984a6092aae6efd28a2d596aac16d83d207a0269a503a221cb24cfd2f39
Image ID:
Port: 8080/TCP
Host Port: 0/TCP
Args:
--model_name=sklearn-iris
--model_dir=/mnt/models
--http_port=8080
--workers=0
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Limits:
cpu: 1
memory: 2Gi
Requests:
cpu: 1
memory: 2Gi
Environment:
PORT: 8080
K_REVISION: sklearn-iris-predictor-default-6xcgj
K_CONFIGURATION: sklearn-iris-predictor-default
K_SERVICE: sklearn-iris-predictor-default
K_INTERNAL_POD_NAME: sklearn-iris-predictor-default-6xcgj-deployment-7c97895d96vqbgr (v1:metadata.name)
K_INTERNAL_POD_NAMESPACE: kfserving-test (v1:metadata.namespace)
Mounts:
/mnt/models from kfserving-provision-location (ro)
/var/log from knative-var-log (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-hw2rw (ro)
queue-proxy:
Container ID:
Image: gcr.io/knative-releases/knative.dev/serving/cmd/queue#sha256:0db974f58b48b219ab8047e11b481c2bbda52b7a2d54db5ed58e8659748ec125
Image ID:
Ports: 8022/TCP, 9090/TCP, 9091/TCP, 8012/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP, 0/TCP
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Requests:
cpu: 25m
Readiness: exec [/ko-app/queue -probe-period 0] delay=0s timeout=10s period=10s #success=1 #failure=3
Environment:
SERVING_NAMESPACE: kfserving-test
SERVING_SERVICE: sklearn-iris-predictor-default
SERVING_CONFIGURATION: sklearn-iris-predictor-default
SERVING_REVISION: sklearn-iris-predictor-default-6xcgj
QUEUE_SERVING_PORT: 8012
CONTAINER_CONCURRENCY: 0
REVISION_TIMEOUT_SECONDS: 300
SERVING_POD: sklearn-iris-predictor-default-6xcgj-deployment-7c97895d96vqbgr (v1:metadata.name)
SERVING_POD_IP: (v1:status.podIP)
SERVING_LOGGING_CONFIG: {
"level": "info",
"development": false,
"outputPaths": ["stdout"],
"errorOutputPaths": ["stderr"],
"encoding": "json",
"encoderConfig": {
"timeKey": "ts",
"levelKey": "level",
"nameKey": "logger",
"callerKey": "caller",
"messageKey": "msg",
"stacktraceKey": "stacktrace",
"lineEnding": "",
"levelEncoder": "",
"timeEncoder": "iso8601",
"durationEncoder": "",
"callerEncoder": ""
}
}
SERVING_LOGGING_LEVEL:
SERVING_REQUEST_LOG_TEMPLATE: {"httpRequest": {"requestMethod": "{{.Request.Method}}", "requestUrl": "{{js .Request.RequestURI}}", "requestSize": "{{.Request.ContentLength}}", "status": {{.Response.Code}}, "responseSize": "{{.Response.Size}}", "userAgent": "{{js .Request.UserAgent}}", "remoteIp": "{{js .Request.RemoteAddr}}", "serverIp": "{{.Revision.PodIP}}", "referer": "{{js .Request.Referer}}", "latency": "{{.Response.Latency}}s", "protocol": "{{.Request.Proto}}"}, "traceId": "{{index .Request.Header "X-B3-Traceid"}}"}
SERVING_ENABLE_REQUEST_LOG: false
SERVING_REQUEST_METRICS_BACKEND: prometheus
TRACING_CONFIG_BACKEND: none
TRACING_CONFIG_ZIPKIN_ENDPOINT:
TRACING_CONFIG_STACKDRIVER_PROJECT_ID:
TRACING_CONFIG_DEBUG: false
TRACING_CONFIG_SAMPLE_RATE: 0.1
USER_PORT: 8080
SYSTEM_NAMESPACE: knative-serving
METRICS_DOMAIN: knative.dev/internal/serving
SERVING_READINESS_PROBE: {"tcpSocket":{"port":8080,"host":"127.0.0.1"},"successThreshold":1}
ENABLE_PROFILING: false
SERVING_ENABLE_PROBE_REQUEST_LOG: false
METRICS_COLLECTOR_ADDRESS:
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-hw2rw (ro)
Conditions:
Type Status
Initialized False
Ready False
ContainersReady False
PodScheduled True
Volumes:
knative-var-log:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
default-token-hw2rw:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-hw2rw
Optional: false
kfserving-provision-location:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 7m50s default-scheduler Successfully assigned kfserving-test/sklearn-iris-predictor-default-6xcgj-deployment-7c97895d96vqbgr to minikube
Warning BackOff 36s kubelet, minikube Back-off restarting failed container
Normal Pulled 24s (x3 over 7m47s) kubelet, minikube Container image "gcr.io/kfserving/storage-initializer:v0.5.0-rc2" already present on machine
Normal Created 23s (x3 over 7m47s) kubelet, minikube Created container storage-initializer
Normal Started 23s (x3 over 7m46s) kubelet, minikube Started container storage-initializer

I don't think it is possible to do the same via kubectl set env command.
When I tried to run the updated command on my local setup passing the initcontainer name, it returns with the message:
warning: Deployment/dummy does not have any containers matching "dummy-init"
Command used:
kubectl set env -n dummy-ns deploy/dummy -c "dummy-init" dummy_env="true"
You can however use the kubectl edit command which will open the full yaml in edit mode and you can add the required environment variable to whichever container you need and save the spec. This will create a new pod with the new spec.
kubectl edit -n dummy-ns deploy/dummy -o yaml

Unable to deploy mongodb community operator in openshift

I'm trying to deploy the mongodb community operator in openshift 3.11, using the following commands:
git clone https://github.com/mongodb/mongodb-kubernetes-operator.git
cd mongodb-kubernetes-operator
oc new-project mongodb
oc create -f deploy/crds/mongodb.com_mongodb_crd.yaml -n mongodb
oc create -f deploy/operator/role.yaml -n mongodb
oc create -f deploy/operator/role_binding.yaml -n mongodb
oc create -f deploy/operator/service_account.yaml -n mongodb
oc apply -f deploy/openshift/operator_openshift.yaml -n mongodb
oc apply -f deploy/crds/mongodb.com_v1_mongodb_openshift_cr.yaml -n mongodb
Operator pod is successfully running, but the mongodb replicaset pods do not spin up. Error is as follows:
[kubenode#master mongodb-kubernetes-operator]$ oc get pods
NAME READY STATUS RESTARTS AGE
example-openshift-mongodb-0 1/2 CrashLoopBackOff 4 2m
mongodb-kubernetes-operator-66bfcbcf44-9xvj7 1/1 Running 0 2m
[kubenode#master mongodb-kubernetes-operator]$ oc logs -f example-openshift-mongodb-0 -c mongodb-agent
panic: Failed to get current user: user: unknown userid 1000510000
goroutine 1 [running]:
com.tengen/cm/util.init.3()
/data/mci/2f46ec94982c5440960d2b2bf2b6ae15/mms-automation/build/go-dependencies/src/com.tengen/cm/util/user.go:14 +0xe5
I have gone through all the issues raised on the mongodb-kubernetes-operator repository which are related to this issue (reference), and found a suggestion to set the MANAGED_SECURITY_CONTEXT environment variable to true in the operator, mongodb and mongodb-agent containers.
I have done so for all of these containers, but am still facing the same issue.
Here is the confirmation that the environment variables are correctly set:
[kubenode#master mongodb-kubernetes-operator]$ oc set env statefulset.apps/example-openshift-mongodb --list
# statefulsets/example-openshift-mongodb, container mongodb-agent
AGENT_STATUS_FILEPATH=/var/log/mongodb-mms-automation/healthstatus/agent-health-status.json
AUTOMATION_CONFIG_MAP=example-openshift-mongodb-config
HEADLESS_AGENT=true
MANAGED_SECURITY_CONTEXT=true
# POD_NAMESPACE from field path metadata.namespace
# statefulsets/example-openshift-mongodb, container mongod
AGENT_STATUS_FILEPATH=/healthstatus/agent-health-status.json
MANAGED_SECURITY_CONTEXT=true
[kubenode#master mongodb-kubernetes-operator]$ oc set env deployment.apps/mongodb-kubernetes-operator --list
# deployments/mongodb-kubernetes-operator, container mongodb-kubernetes-operator
# WATCH_NAMESPACE from field path metadata.namespace
# POD_NAME from field path metadata.name
MANAGED_SECURITY_CONTEXT=true
OPERATOR_NAME=mongodb-kubernetes-operator
AGENT_IMAGE=quay.io/mongodb/mongodb-agent:10.19.0.6562-1
VERSION_UPGRADE_HOOK_IMAGE=quay.io/mongodb/mongodb-kubernetes-operator-version-upgrade-post-start-hook:1.0.2
Operator Information
Operator Version: 0.3.0
MongoDB Image used: 4.2.6
Cluster Information
[kubenode#master mongodb-kubernetes-operator]$ openshift version
openshift v3.11.0+62803d0-1
[kubenode#master mongodb-kubernetes-operator]$ kubectl version
Client Version: version.Info{Major:"1", Minor:"11+", GitVersion:"v1.11.0+d4cacc0", GitCommit:"d4cacc0", GitTreeState:"clean", BuildDate:"2018-10-15T09:45:30Z", GoVersion:"go1.10.2", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"11+", GitVersion:"v1.11.0+d4cacc0", GitCommit:"d4cacc0", GitTreeState:"clean", BuildDate:"2020-12-07T17:59:40Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}
Update
When I check the replica pod yaml (see below), I see three occurrences of runAsUser security context set as 1000510000. I'm not sure how, but this is being set even though I'm not setting it manually.
[kubenode#master mongodb-kubernetes-operator]$ oc get -o yaml pod example-openshift-mongodb-0
apiVersion: v1
kind: Pod
metadata:
annotations:
openshift.io/scc: restricted
creationTimestamp: 2021-01-19T07:45:05Z
generateName: example-openshift-mongodb-
labels:
app: example-openshift-mongodb-svc
controller-revision-hash: example-openshift-mongodb-6549495b
statefulset.kubernetes.io/pod-name: example-openshift-mongodb-0
name: example-openshift-mongodb-0
namespace: mongodb
ownerReferences:
- apiVersion: apps/v1
blockOwnerDeletion: true
controller: true
kind: StatefulSet
name: example-openshift-mongodb
uid: 3e91eb40-5a2a-11eb-a5e0-0050569b1f59
resourceVersion: "15616863"
selfLink: /api/v1/namespaces/mongodb/pods/example-openshift-mongodb-0
uid: 3ea17a28-5a2a-11eb-a5e0-0050569b1f59
spec:
containers:
- command:
- agent/mongodb-agent
- -cluster=/var/lib/automation/config/cluster-config.json
- -skipMongoStart
- -noDaemonize
- -healthCheckFilePath=/var/log/mongodb-mms-automation/healthstatus/agent-health-status.json
- -serveStatusPort=5000
- -useLocalMongoDbTools
env:
- name: AGENT_STATUS_FILEPATH
value: /var/log/mongodb-mms-automation/healthstatus/agent-health-status.json
- name: AUTOMATION_CONFIG_MAP
value: example-openshift-mongodb-config
- name: HEADLESS_AGENT
value: "true"
- name: MANAGED_SECURITY_CONTEXT
value: "true"
- name: POD_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
image: quay.io/mongodb/mongodb-agent:10.19.0.6562-1
imagePullPolicy: Always
name: mongodb-agent
readinessProbe:
exec:
command:
- /var/lib/mongodb-mms-automation/probes/readinessprobe
failureThreshold: 60
initialDelaySeconds: 5
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
resources: {}
securityContext:
capabilities:
drop:
- KILL
- MKNOD
- SETGID
- SETUID
runAsUser: 1000510000
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/lib/automation/config
name: automation-config
readOnly: true
- mountPath: /data
name: data-volume
- mountPath: /var/lib/mongodb-mms-automation/authentication
name: example-openshift-mongodb-agent-scram-credentials
- mountPath: /var/log/mongodb-mms-automation/healthstatus
name: healthstatus
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: mongodb-kubernetes-operator-token-lr9l4
readOnly: true
- command:
- /bin/sh
- -c
- |2
# run post-start hook to handle version changes
/hooks/version-upgrade
# wait for config to be created by the agent
while [ ! -f /data/automation-mongod.conf ]; do sleep 3 ; done ; sleep 2 ;
# start mongod with this configuration
exec mongod -f /data/automation-mongod.conf ;
env:
- name: AGENT_STATUS_FILEPATH
value: /healthstatus/agent-health-status.json
- name: MANAGED_SECURITY_CONTEXT
value: "true"
image: mongo:4.2.6
imagePullPolicy: IfNotPresent
name: mongod
resources: {}
securityContext:
capabilities:
drop:
- KILL
- MKNOD
- SETGID
- SETUID
runAsUser: 1000510000
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /data
name: data-volume
- mountPath: /var/lib/mongodb-mms-automation/authentication
name: example-openshift-mongodb-agent-scram-credentials
- mountPath: /healthstatus
name: healthstatus
- mountPath: /hooks
name: hooks
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: mongodb-kubernetes-operator-token-lr9l4
readOnly: true
dnsPolicy: ClusterFirst
hostname: example-openshift-mongodb-0
imagePullSecrets:
- name: mongodb-kubernetes-operator-dockercfg-jhplw
initContainers:
- command:
- cp
- version-upgrade-hook
- /hooks/version-upgrade
image: quay.io/mongodb/mongodb-kubernetes-operator-version-upgrade-post-start-hook:1.0.2
imagePullPolicy: Always
name: mongod-posthook
resources: {}
securityContext:
capabilities:
drop:
- KILL
- MKNOD
- SETGID
- SETUID
runAsUser: 1000510000
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /hooks
name: hooks
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: mongodb-kubernetes-operator-token-lr9l4
readOnly: true
nodeName: node1.192.168.27.116.nip.io
nodeSelector:
node-role.kubernetes.io/compute: "true"
priority: 0
restartPolicy: Always
schedulerName: default-scheduler
securityContext:
fsGroup: 1000510000
seLinuxOptions:
level: s0:c23,c2
serviceAccount: mongodb-kubernetes-operator
serviceAccountName: mongodb-kubernetes-operator
subdomain: example-openshift-mongodb-svc
terminationGracePeriodSeconds: 30
volumes:
- name: data-volume
persistentVolumeClaim:
claimName: data-volume-example-openshift-mongodb-0
- name: automation-config
secret:
defaultMode: 416
secretName: example-openshift-mongodb-config
- name: example-openshift-mongodb-agent-scram-credentials
secret:
defaultMode: 384
secretName: example-openshift-mongodb-agent-scram-credentials
- emptyDir: {}
name: healthstatus
- emptyDir: {}
name: hooks
- name: mongodb-kubernetes-operator-token-lr9l4
secret:
defaultMode: 420
secretName: mongodb-kubernetes-operator-token-lr9l4
status:
conditions:
- lastProbeTime: null
lastTransitionTime: 2021-01-19T07:46:45Z
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: 2021-01-19T07:46:39Z
message: 'containers with unready status: [mongodb-agent]'
reason: ContainersNotReady
status: "False"
type: Ready
- lastProbeTime: null
lastTransitionTime: null
message: 'containers with unready status: [mongodb-agent]'
reason: ContainersNotReady
status: "False"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: 2021-01-19T07:45:05Z
status: "True"
type: PodScheduled
containerStatuses:
- containerID: docker://bd3ede9178bb78267bc19d1b5da0915d3bcd1d4dcee3e142c7583424bd2aa777
image: docker.io/mongo:4.2.6
imageID: docker-pullable://docker.io/mongo#sha256:c880f6b56f443bb4d01baa759883228cd84fa8d78fa1a36001d1c0a0712b5a07
lastState: {}
name: mongod
ready: true
restartCount: 0
state:
running:
startedAt: 2021-01-19T07:46:55Z
- containerID: docker://5e39c0b6269b8231bbf9cabb4ff3457d9f91e878eff23953e318a9475fb8a90e
image: quay.io/mongodb/mongodb-agent:10.19.0.6562-1
imageID: docker-pullable://quay.io/mongodb/mongodb-agent#sha256:790c2670ef7cefd61cfaabaf739de16dbd2e07dc3b539add0da21ab7d5ac7626
lastState:
terminated:
containerID: docker://5e39c0b6269b8231bbf9cabb4ff3457d9f91e878eff23953e318a9475fb8a90e
exitCode: 2
finishedAt: 2021-01-19T19:39:58Z
reason: Error
startedAt: 2021-01-19T19:39:58Z
name: mongodb-agent
ready: false
restartCount: 144
state:
waiting:
message: Back-off 5m0s restarting failed container=mongodb-agent pod=example-openshift-mongodb-0_mongodb(3ea17a28-5a2a-11eb-a5e0-0050569b1f59)
reason: CrashLoopBackOff
hostIP: 192.168.27.116
initContainerStatuses:
- containerID: docker://7c31cef2a68e3e6100c2cc9c83e3780313f1e8ab43bebca79ad4d48613f124bd
image: quay.io/mongodb/mongodb-kubernetes-operator-version-upgrade-post-start-hook:1.0.2
imageID: docker-pullable://quay.io/mongodb/mongodb-kubernetes-operator-version-upgrade-post-start-hook#sha256:e99105b1c54e12913ddaf470af8025111a6e6e4c8917fc61be71d1bc0328e7d7
lastState: {}
name: mongod-posthook
ready: true
restartCount: 0
state:
terminated:
containerID: docker://7c31cef2a68e3e6100c2cc9c83e3780313f1e8ab43bebca79ad4d48613f124bd
exitCode: 0
finishedAt: 2021-01-19T07:46:45Z
reason: Completed
startedAt: 2021-01-19T07:46:44Z
phase: Running
podIP: 10.129.0.119
qosClass: BestEffort
startTime: 2021-01-19T07:46:39Z

ISTIO fails when host specified and HTTP redirect is enabled

I am using ISTIO and hostnames to load balance and direct traffic. I have the following Virtual Service enabled:
kind: VirtualService
metadata:
name: app-lab-app
namespace: my-namespace
spec:
gateways:
- istio-system/ingressgateway
hosts:
- hostname1.lab
http:
- match:
route:
- destination:
host: search-head-service
port:
number: 8000
When I try to reach this service via cURL, I receive the following error (32271 is the hostport which is mapped to port 80 on ingressgateway):
curl -Hhost:hostname1.lab http://10.20.1.108:32271/ -L
curl: (7) Failed to connect to hostname1.lab port 80: Connection refused
The issue is this..the endpoint does a redirect. I can reach the first website, but once the redirect happens, it fails
I can make this work by removing the hostname in the spec and changing to '*' but this won't help me do the host-based load balancing.
EDIT: ingress-gateway config (kubectl describe pod/ingress-gateway-xxxx)
Name: istio-ingressgateway-657df8bc75-cmghw
Namespace: istio-system
Priority: 0
Node: ip-10-20-1-108.us-west-2.compute.internal/10.20.1.108
Start Time: Tue, 21 Apr 2020 13:22:48 -0500
Labels: app=istio-ingressgateway
chart=gateways
heritage=Tiller
istio=ingressgateway
pod-template-hash=657df8bc75
release=istio
service.istio.io/canonical-name=istio-ingressgateway
service.istio.io/canonical-revision=1.5
Annotations: cni.projectcalico.org/podIP: 10.192.1.36/32
kubernetes.io/psp: 00-privileged
sidecar.istio.io/inject: false
Status: Running
IP: 10.192.1.36
IPs:
IP: 10.192.1.36
Controlled By: ReplicaSet/istio-ingressgateway-657df8bc75
Containers:
istio-proxy:
Container ID: docker://bfa29df838cd1e42a24674838bbf8454c8d56ec898b1833563f1b89a19a38030
Image: docker.io/istio/proxyv2:1.5.0
Image ID: docker-pullable://docker.io/istio/proxyv2#sha256:89b5fe2df96920189a193dd5f7dbd776e00024e4c1fd1b59bb53867278e9645a
Ports: 15020/TCP, 80/TCP, 443/TCP, 15029/TCP, 15030/TCP, 15031/TCP, 15032/TCP, 31400/TCP, 15443/TCP, 15011/TCP, 8060/TCP, 853/TCP, 15090/TCP
Host Ports: 0/TCP, 80/TCP, 443/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP
Args:
proxy
router
--domain
$(POD_NAMESPACE).svc.cluster.local
--proxyLogLevel=warning
--proxyComponentLogLevel=misc:error
--log_output_level=default:info
--drainDuration
45s
--parentShutdownDuration
1m0s
--connectTimeout
10s
--serviceCluster
istio-ingressgateway
--zipkinAddress
zipkin.istio-system:9411
--proxyAdminPort
15000
--statusPort
15020
--controlPlaneAuthPolicy
NONE
--discoveryAddress
istio-pilot.istio-system.svc:15012
--trust-domain=cluster.local
State: Running
Started: Tue, 21 Apr 2020 13:22:50 -0500
Ready: True
Restart Count: 0
Limits:
cpu: 2
memory: 1Gi
Requests:
cpu: 10m
memory: 40Mi
Readiness: http-get http://:15020/healthz/ready delay=1s timeout=1s period=2s #success=1 #failure=30
Environment:
JWT_POLICY: first-party-jwt
PILOT_CERT_PROVIDER: istiod
ISTIO_META_USER_SDS: true
CA_ADDR: istio-pilot.istio-system.svc:15012
NODE_NAME: (v1:spec.nodeName)
POD_NAME: istio-ingressgateway-657df8bc75-cmghw (v1:metadata.name)
POD_NAMESPACE: istio-system (v1:metadata.namespace)
INSTANCE_IP: (v1:status.podIP)
HOST_IP: (v1:status.hostIP)
SERVICE_ACCOUNT: (v1:spec.serviceAccountName)
ISTIO_META_WORKLOAD_NAME: istio-ingressgateway
ISTIO_META_OWNER: kubernetes://apis/apps/v1/namespaces/istio-system/deployments/istio-ingressgateway
ISTIO_META_MESH_ID: cluster.local
ISTIO_AUTO_MTLS_ENABLED: true
ISTIO_META_POD_NAME: istio-ingressgateway-657df8bc75-cmghw (v1:metadata.name)
ISTIO_META_CONFIG_NAMESPACE: istio-system (v1:metadata.namespace)
ISTIO_META_ROUTER_MODE: sni-dnat
ISTIO_META_CLUSTER_ID: Kubernetes
Mounts:
/etc/istio/ingressgateway-ca-certs from ingressgateway-ca-certs (ro)
/etc/istio/ingressgateway-certs from ingressgateway-certs (ro)
/etc/istio/pod from podinfo (rw)
/var/run/ingress_gateway from ingressgatewaysdsudspath (rw)
/var/run/secrets/istio from istiod-ca-cert (rw)
/var/run/secrets/kubernetes.io/serviceaccount from istio-ingressgateway-service-account-token-7ssdg (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
istiod-ca-cert:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: istio-ca-root-cert
Optional: false
podinfo:
Type: DownwardAPI (a volume populated by information about the pod)
Items:
metadata.labels -> labels
metadata.annotations -> annotations
ingressgatewaysdsudspath:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
ingressgateway-certs:
Type: Secret (a volume populated by a Secret)
SecretName: istio-ingressgateway-certs
Optional: true
ingressgateway-ca-certs:
Type: Secret (a volume populated by a Secret)
SecretName: istio-ingressgateway-ca-certs
Optional: true
istio-ingressgateway-service-account-token-7ssdg:
Type: Secret (a volume populated by a Secret)
SecretName: istio-ingressgateway-service-account-token-7ssdg
Optional: false
QoS Class: Burstable
Node-Selectors: istio-ingressgateway=true
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events: <none>

While I'd still like to understand what was happening originally, an ISTIO guru had me apply the patch below. These steps create an ISTIO Gateway (not an ingress gateway) on all nodes with the appropriate label:
Step 1 - Label certain nodes:
kubectl label nodes <hostname> istio-ingressgateway=true
kubectl label nodes <hostname> istio-ingressgateway=true
Step 2 - Save patch to a file like patch.json:
"spec": {
"replicas": 2,
"template": {
"spec": {
"nodeSelector": {"istio-ingressgateway" : "true"},
"containers": [
{"name" : "istio-proxy", "ports": [{"containerPort" : 80, "hostPort" : 80, "protocol": "TCP"}, {"containerPort":443, "hostPort": 443, "protocol" : "TCP"}]}
]
}
}
}
}
Step 3 - Apply the patch:
kubectl -n istio-system patch deployment/istio-ingressgateway --patch "$(cat patch.json)"

PriorityClass doesn't populate its value to podSpec

env: vagrant + virtualbox
kubernetes: 1.14
docker 18.06.3~ce~3-0~debian
os: debian stretch
I have priority classes:
root#k8s-master:/# kubectl get priorityclass
NAME VALUE GLOBAL-DEFAULT AGE
cluster-health-priority 1000000000 false 33m < -- created by me
default-priority 100 true 33m < -- created by me
system-cluster-critical 2000000000 false 33m < -- system
system-node-critical 2000001000 false 33m < -- system
default-priority - has been set as globalDefault
root#k8s-master:/# kubectl get priorityclass default-priority -o yaml
apiVersion: scheduling.k8s.io/v1
description: Used for all Pods without priorityClassName
globalDefault: true <------------------
kind: PriorityClass
metadata:
annotations:
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"scheduling.k8s.io/v1","description":"Used for all Pods without priorityClassName","globalDefault":true,"kind":"PriorityClass","metadata":{"annotations":{},"labels":{"addonmanager.kubernetes.io/mode":"Reconcile"},"name":"default-priority"},"value":100}
creationTimestamp: "2019-07-15T16:48:23Z"
generation: 1
labels:
addonmanager.kubernetes.io/mode: Reconcile
name: default-priority
resourceVersion: "304"
selfLink: /apis/scheduling.k8s.io/v1/priorityclasses/default-priority
uid: 5bea6f73-a720-11e9-8343-0800278dc04d
value: 100
I have some pods, which were created after policy classes creation
This
kube-state-metrics-874ccb958-b5spd 1/1 Running 0 9m18s 10.20.59.67 k8s-master <none> <none>
And this
tmp-shell-one-59fb949cb5-b8khc 1/1 Running 1 47s 10.20.59.73 k8s-master <none> <none>
kube-state-metrics pod is using priorityClass cluster-health-priority
root#k8s-master:/etc/kubernetes/addons# kubectl -n kube-system get pod kube-state-metrics-874ccb958-b5spd -o yaml
apiVersion: v1
kind: Pod
metadata:
creationTimestamp: "2019-07-15T16:48:24Z"
generateName: kube-state-metrics-874ccb958-
labels:
k8s-app: kube-state-metrics
pod-template-hash: 874ccb958
name: kube-state-metrics-874ccb958-b5spd
namespace: kube-system
ownerReferences:
- apiVersion: apps/v1
blockOwnerDeletion: true
controller: true
kind: ReplicaSet
name: kube-state-metrics-874ccb958
uid: 5c64bf85-a720-11e9-8343-0800278dc04d
resourceVersion: "548"
selfLink: /api/v1/namespaces/kube-system/pods/kube-state-metrics-874ccb958-b5spd
uid: 5c88143e-a720-11e9-8343-0800278dc04d
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: kube-role
operator: In
values:
- master
containers:
- image: gcr.io/google_containers/kube-state-metrics:v1.6.0
imagePullPolicy: Always
name: kube-state-metrics
ports:
- containerPort: 8080
name: http-metrics
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /healthz
port: 8080
scheme: HTTP
initialDelaySeconds: 5
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-state-metrics-token-jvz5b
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
nodeName: k8s-master
nodeSelector:
namespaces/default: "true"
priorityClassName: cluster-health-priority <------------------------
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: kube-state-metrics
serviceAccountName: kube-state-metrics
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoSchedule
key: dedicated
operator: Equal
value: master
- key: CriticalAddonsOnly
operator: Exists
volumes:
- name: kube-state-metrics-token-jvz5b
secret:
defaultMode: 420
secretName: kube-state-metrics-token-jvz5b
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2019-07-15T16:48:24Z"
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: "2019-07-15T16:48:58Z"
status: "True"
type: Ready
- lastProbeTime: null
lastTransitionTime: "2019-07-15T16:48:58Z"
status: "True"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: "2019-07-15T16:48:24Z"
status: "True"
type: PodScheduled
containerStatuses:
- containerID: docker://a736dce98492b7d746079728b683a2c62f6adb1068075ccc521c5e57ba1e02d1
image: gcr.io/google_containers/kube-state-metrics:v1.6.0
imageID: docker-pullable://gcr.io/google_containers/kube-state-metrics#sha256:c98991f50115fe6188d7b4213690628f0149cf160ac47daf9f21366d7cc62740
lastState: {}
name: kube-state-metrics
ready: true
restartCount: 0
state:
running:
startedAt: "2019-07-15T16:48:46Z"
hostIP: 10.0.2.15
phase: Running
podIP: 10.20.59.67
qosClass: BestEffort
startTime: "2019-07-15T16:48:24Z"
tmp-shell pod has nothing about priority classes at all:
apiVersion: v1
kind: Pod
metadata:
creationTimestamp: "2019-07-15T16:56:49Z"
generateName: tmp-shell-one-59fb949cb5-
labels:
pod-template-hash: 59fb949cb5
run: tmp-shell-one
name: tmp-shell-one-59fb949cb5-b8khc
namespace: monitoring
ownerReferences:
- apiVersion: apps/v1
blockOwnerDeletion: true
controller: true
kind: ReplicaSet
name: tmp-shell-one-59fb949cb5
uid: 89c3caa3-a721-11e9-8343-0800278dc04d
resourceVersion: "1350"
selfLink: /api/v1/namespaces/monitoring/pods/tmp-shell-one-59fb949cb5-b8khc
uid: 89c71bad-a721-11e9-8343-0800278dc04d
spec:
containers:
- args:
- /bin/bash
image: nicolaka/netshoot
imagePullPolicy: Always
name: tmp-shell-one
resources: {}
stdin: true
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
tty: true
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-g9lnc
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
nodeName: k8s-master
nodeSelector:
namespaces/default: "true"
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 30
volumes:
- name: default-token-g9lnc
secret:
defaultMode: 420
secretName: default-token-g9lnc
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2019-07-15T16:56:49Z"
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: "2019-07-15T16:57:20Z"
status: "True"
type: Ready
- lastProbeTime: null
lastTransitionTime: "2019-07-15T16:57:20Z"
status: "True"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: "2019-07-15T16:56:49Z"
status: "True"
type: PodScheduled
containerStatuses:
- containerID: docker://545d4d029b440ebb694386abb09e0377164c87d1170ac79704f39d3167748bf5
image: nicolaka/netshoot:latest
imageID: docker-pullable://nicolaka/netshoot#sha256:b3e662a8730ee51c6b877b6043c5b2fa61862e15d535e9f90cf667267407753f
lastState:
terminated:
containerID: docker://dfdfd0d991151e94411029f2d5a1a81d67b5b55d43dcda017aec28320bafc7d3
exitCode: 130
finishedAt: "2019-07-15T16:57:17Z"
reason: Error
startedAt: "2019-07-15T16:57:03Z"
name: tmp-shell-one
ready: true
restartCount: 1
state:
running:
startedAt: "2019-07-15T16:57:19Z"
hostIP: 10.0.2.15
phase: Running
podIP: 10.20.59.73
qosClass: BestEffort
startTime: "2019-07-15T16:56:49Z"
According to the docs:
The globalDefault field indicates that the value of this PriorityClass
should be used for Pods without a priorityClassName
and
Pod priority is specified by setting the priorityClassName field of
podSpec. The integer value of priority is then resolved and populated
to the priority field of podSpec
So, the questions are:
Why tmp-shell pod is not using priorityClass default-priority, even it created after priority class with globalDefault to true?
Why kube-state-metrics pod does not have field priority with parsed value from the priority class cluster-health-priority in podSpec?(look at .yaml above)
What am I doing wrong?

The only way I can reproduce it is by disabling the Priority Admission Controller by adding this argument --disable-admission-plugins=Priority to the kube-api-server definition which is under /etc/kubernetes/manifests/kube-apiserver.yaml of the Host running the API Server.
According to the documentation in v1.14 this is enabled by default. Please make sure that it is enabled in your cluster as well.

Openshift/Kubernetes volumes_from

I'm trying to mimic volumes_from using Openshift/Kubernetes. I have container A packaging the app and container B server the packaged app.
I cannot use init containers since I'm stuck on kubernetes 1.2.
I've tried the postStart lifecycle hook, detailed here:
How to mimic '--volumes-from' in Kubernetes
But, Openshift/Kubernetes is always complaining that container A is contantly crashing because once it's done packaging, it exits.
How do I get Openshift/Kubernetes to stop complaining about container A crashing and just accept that it finished it's job?
Or is there another way of having one container build a package for another container to run?
Thanks in advance for your time.
Update 1:
I don't have kubectl, but using oc describe pod myapp-2-prehook:
me:~/Projects/myapp (master) $ oc describe pod myapp-2-prehook
Name: myapp-2-prehook
Namespace: myproject
Node: my.host/my.ip
Start Time: Tue, 01 Nov 2016 15:30:55 -1000
Labels: openshift.io/deployer-pod-for.name=myapp-2
Status: Failed
IP:
Controllers: <none>
Containers:
lifecycle:
Container ID: docker://97a5272ebfa56f0c40fdc95094f13da06dba889049f2cc964fe3e89f61bd7792
Image: my.ip:5000/myproject/myapp#sha256:cde5739c5f2bdc8c25b1dd514f698c543cfb6c8b68c3f1afbc7760e11597fde9
Image ID: docker://3be476fec505e5b979bac69d327d4ffb53b3f568e85547c5b66c229948435f44
Port:
Command:
scripts/build.sh
QoS Tier:
cpu: BestEffort
memory: BestEffort
State: Terminated
Reason: Error
Exit Code: 1
Started: Tue, 01 Nov 2016 15:31:21 -1000
Finished: Tue, 01 Nov 2016 15:31:42 -1000
Ready: False
Restart Count: 0
Environment Variables:
CUSTOM_VAR1: custom_value1
OPENSHIFT_DEPLOYMENT_NAME: myapp
OPENSHIFT_DEPLOYMENT_NAMESPACE: myproject
Conditions:
Type Status
Ready False
Volumes:
default-token-goe98:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-goe98
No events.
Output of oc get pod assessor-2-prehook -o yaml:
apiVersion: v1
kind: Pod
metadata:
annotations:
openshift.io/deployment.name: myapp-2
openshift.io/scc: restricted
creationTimestamp: 2016-11-02T01:30:55Z
labels:
openshift.io/deployer-pod-for.name: myapp-2
name: myapp-2-prehook
namespace: myproject
resourceVersion: "21512896"
selfLink: /api/v1/namespaces/myproject/pods/myapp-2-prehook
uid: ffcb7766-a09b-11e6-9053-005056a65cf8
spec:
activeDeadlineSeconds: 21600
containers:
- command:
- scripts/build.sh
env:
- name: CUSTOM_VAR1
value: custom_value1
- name: OPENSHIFT_DEPLOYMENT_NAME
value: myapp-2
- name: OPENSHIFT_DEPLOYMENT_NAMESPACE
value: myproject
image: my.ip:5000/myproject/myapp#sha256:cde5739c5f2bdc8c25b1dd514f698c543cfb6c8b68c3f1afbc7760e11597fde9
imagePullPolicy: IfNotPresent
name: lifecycle
resources: {}
securityContext:
privileged: false
seLinuxOptions:
level: s0:c21,c0
terminationMessagePath: /dev/termination-log
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: default-token-goe98
readOnly: true
dnsPolicy: ClusterFirst
host: my.host
imagePullSecrets:
- name: default-dockercfg-srrog
nodeName: my.host
restartPolicy: Never
securityContext:
seLinuxOptions:
level: s0:c21,c0
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 30
volumes:
- name: default-token-goe98
secret:
secretName: default-token-goe98
status:
conditions:
- lastProbeTime: null
lastTransitionTime: 2016-11-02T01:31:49Z
message: 'containers with unready status: [lifecycle]'
reason: ContainersNotReady
status: "False"
type: Ready
containerStatuses:
- containerID: docker://97a5272ebfa56f0c40fdc95094f13da06dba889049f2cc964fe3e89f61bd7792
image: my.ip:5000/myproject/myapp#sha256:cde5739c5f2bdc8c25b1dd514f698c543cfb6c8b68c3f1afbc7760e11597fde9
imageID: docker://3be476fec505e5b979bac69d327d4ffb53b3f568e85547c5b66c229948435f44
lastState: {}
name: lifecycle
ready: false
restartCount: 0
state:
terminated:
containerID: docker://97a5272ebfa56f0c40fdc95094f13da06dba889049f2cc964fe3e89f61bd7792
exitCode: 1
finishedAt: 2016-11-02T01:31:42Z
reason: Error
startedAt: 2016-11-02T01:31:21Z
hostIP: 128.49.90.62
phase: Failed
startTime: 2016-11-02T01:30:55Z

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

kube-apiserver pod sticks in the CreateContainerError status - kubernetes

Related

Is there any way to inject environment variables to initializing/running init-containers when it is pre-loaded with your inferenceservice/deployment?

Unable to deploy mongodb community operator in openshift

ISTIO fails when host specified and HTTP redirect is enabled

PriorityClass doesn't populate its value to podSpec

Openshift/Kubernetes volumes_from

Categories

Resources