The Docker container in my Kubernetes pod mounts a host directory, but fails to write its log files to the host. Can you tell me the reason?
The Kubernetes YAML looks like this:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: db
  namespace: test
spec:
  replicas: 1
  template:
    spec:
      containers:
        - name: db
          image: postgres:11.0-alpine
          command:
            - "docker-entrypoint.sh"
            - "postgres"
            - "-c"
            - "logging_collector=on"
            - "-c"
            - "log_directory=/var/lib/postgresql/log"
          ports:
            - containerPort: 5432
              protocol: TCP
          volumeMounts:
            - name: log-fs
              mountPath: /var/lib/postgresql/log
      volumes:
        - name: log-fs
          hostPath:
            path: /var/log
kubectl logs nfs-685944f556-r2pjr
Serving /exports
Serving /
rpcinfo: can't contact rpcbind: : RPC: Unable to receive; errno = Connection refused
Starting rpcbind
exportfs: / does not support NFS export
NFS started
nfs.deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs
  labels:
    app: nfs
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nfs
  template:
    metadata:
      labels:
        app: nfs
    spec:
      containers:
        - name: nfs-server
          image: gcr.io/google_containers/volume-nfs:0.8
          ports:
            - name: nfs
              containerPort: 2049
            - name: mountd
              containerPort: 20048
            - name: rpcbind
              containerPort: 111
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: /exports
              name: mypvc
      volumes:
        - name: mypvc
          persistentVolumeClaim:
            claimName: nfs-data
What does exportfs refer to? How can I diagnose this further?
Within the nfs pod, I'm not sure why it's exporting /:
[root@nfs-685944f556-r2pjr /]# cat /etc/exports
/exports *(rw,fsid=0,insecure,no_root_squash)
/ *(rw,fsid=0,insecure,no_root_squash)
not too sure why it's exporting /
This is done by the run_nfs.sh script, which is run with two arguments:
/bin/bash /usr/local/bin/run_nfs.sh /exports /
There's an issue with the image gcr.io/google_containers/volume-nfs, so it is suggested to use the jsafrane/nfs-data image instead.
See the corresponding GitHub discussion.
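The two entries in /etc/exports line up with the two arguments passed to the script. A minimal sketch of that behaviour, assuming one export line is written per argument; this is an illustration only, not the actual run_nfs.sh source, and the function name `print_exports` is hypothetical:

```shell
#!/bin/bash
# Hypothetical illustration (not the real run_nfs.sh):
# print one /etc/exports entry for every directory argument.
print_exports() {
  local dir
  for dir in "$@"; do
    printf '%s *(rw,fsid=0,insecure,no_root_squash)\n' "$dir"
  done
}

# Called with the same two arguments the pod uses.
print_exports /exports /
```

With the arguments `/exports` and `/`, this prints the same two lines that appear in the pod's /etc/exports, which would explain why `/` is being exported as well.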
I get
All host(s) tried for query failed (tried: 10.244.0.72/10.244.0.72:9042 (com.datastax.driver.core.exceptions.TransportException: [10.244.0.72/10.244.0.72:9042] Channel has been closed))
when trying to access Cassandra from within the same namespace. When I forward the port, however, it works fine from localhost, and the keyspace is created successfully.
kubectl port-forward cassandra1-0 9042:9042
My YAML:
apiVersion: v1
kind: Service
metadata:
  name: cassandra1
  labels:
    app: cassandra1
spec:
  ports:
    - name: "cql"
      protocol: "TCP"
      port: 9042
      targetPort: 9042
    - name: "thrift"
      protocol: "TCP"
      port: 9160
      targetPort: 9160
  selector:
    app: cassandra1
  type: NodePort
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cassandra1
  labels:
    app: cassandra1
spec:
  serviceName: cassandra1
  replicas: 1
  selector:
    matchLabels:
      app: cassandra1
  template:
    metadata:
      labels:
        app: cassandra1
    spec:
      terminationGracePeriodSeconds: 1800
      containers:
        - name: cassandra1
          image: gcr.io/google-samples/cassandra:v13
          imagePullPolicy: Always
          ports:
            - containerPort: 7000
              name: intra-node
            - containerPort: 7001
              name: tls-intra-node
            - containerPort: 7199
              name: jmx
            - containerPort: 9042
              name: cql
            - containerPort: 9160
              name: thrift
          resources:
            limits:
              cpu: "500m"
              memory: 1Gi
            requests:
              cpu: "500m"
              memory: 1Gi
          securityContext:
            capabilities:
              add:
                - IPC_LOCK
          lifecycle:
            preStop:
              exec:
                command:
                  - /bin/sh
                  - -c
                  - nodetool drain
          env:
            - name: MAX_HEAP_SIZE
              value: 512M
            - name: HEAP_NEWSIZE
              value: 100M
            - name: CASSANDRA_SEEDS
              value: "cassandra1-0.cassandra1.default.svc.cluster.local"
            - name: CASSANDRA_CLUSTER_NAME
              value: "cassandra1"
            - name: CASSANDRA_DC
              value: "DC1-cassandra1"
            - name: CASSANDRA_RACK
              value: "Rack1-cassandra1"
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
          readinessProbe:
            exec:
              command:
                - /bin/bash
                - -c
                - /ready-probe.sh
            initialDelaySeconds: 15
            timeoutSeconds: 5
          # These volume mounts are persistent. They are like inline claims,
          # but not exactly because the names need to match exactly one of
          # the stateful pod volumes.
          volumeMounts:
            - name: cassandra1-data
              mountPath: /cassandra1_data
  volumeClaimTemplates:
    - metadata:
        name: cassandra1-data
        namespace: default
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 1Gi
Cassandra starts with the following properties:
Starting Cassandra on 10.244.0.72
CASSANDRA_CONF_DIR /etc/cassandra
CASSANDRA_CFG /etc/cassandra/cassandra.yaml
CASSANDRA_AUTO_BOOTSTRAP true
CASSANDRA_BROADCAST_ADDRESS 10.244.0.72
CASSANDRA_BROADCAST_RPC_ADDRESS 10.244.0.72
CASSANDRA_CLUSTER_NAME cassandra1
CASSANDRA_COMPACTION_THROUGHPUT_MB_PER_SEC
CASSANDRA_CONCURRENT_COMPACTORS
CASSANDRA_CONCURRENT_READS
CASSANDRA_CONCURRENT_WRITES
CASSANDRA_COUNTER_CACHE_SIZE_IN_MB
CASSANDRA_DC DC1-cassandra1
CASSANDRA_DISK_OPTIMIZATION_STRATEGY ssd
CASSANDRA_ENDPOINT_SNITCH SimpleSnitch
CASSANDRA_GC_WARN_THRESHOLD_IN_MS
CASSANDRA_INTERNODE_COMPRESSION
CASSANDRA_KEY_CACHE_SIZE_IN_MB
CASSANDRA_LISTEN_ADDRESS 10.244.0.72
CASSANDRA_LISTEN_INTERFACE
CASSANDRA_MEMTABLE_ALLOCATION_TYPE
CASSANDRA_MEMTABLE_CLEANUP_THRESHOLD
CASSANDRA_MEMTABLE_FLUSH_WRITERS
CASSANDRA_MIGRATION_WAIT 1
CASSANDRA_NUM_TOKENS 32
CASSANDRA_RACK Rack1-cassandra1
CASSANDRA_RING_DELAY 30000
CASSANDRA_RPC_ADDRESS 0.0.0.0
CASSANDRA_RPC_INTERFACE
CASSANDRA_SEEDS cassandra1-0.cassandra1.default.svc.cluster.local
CASSANDRA_SEED_PROVIDER org.apache.cassandra.locator.SimpleSeedProvider
changed ownership of '/cassandra_data/data' from root to cassandra
changed ownership of '/cassandra_data' from root to cassandra
In my application, which runs in the same namespace, I tried setting cassandraport to 9042 and host to each of the following:
10.240.0.4 (hostIP)
10.244.0.72 (podIP)
cassandra1 (name of the service)
cassandra1.default
cassandra1.default.svc.cluster.local
cassandra1-0.cassandra1.default.svc.cluster.local
_cql._tcp.cassandra1.default.svc.cluster.local
I also tried different Service types:
headless, ClusterIP, NodePort
Does anybody have any idea what is wrong, or what else I can try to get this to work?
I have an installer that spins up two pods in my CI flow; let's call them web and activemq. When the web pod starts, it tries to communicate with the activemq pod using the k8s-assigned amq-deployment-0.activemq pod name.
Randomly, the web pod gets an unknown host exception when trying to access amq-deployment1.activemq. If I restart the web pod in this situation, it has no problem communicating with the activemq pod.
I've logged into the web pod when this happens, and the /etc/resolv.conf and /etc/hosts files look fine. The host machine's /etc/resolv.conf and /etc/hosts are sparse, with nothing that looks questionable.
Information:
There is only 1 worker node.
kubectl --version
Kubernetes v1.8.3+icp+ee
Any ideas on how to go about debugging this issue? I can't think of a good reason for it to happen randomly, nor for it to resolve itself on a pod restart.
If other information would be useful, I can get it. Thanks in advance.
For ActiveMQ, we have this Service file:
apiVersion: v1
kind: Service
metadata:
  name: activemq
  labels:
    app: myapp
    env: dev
spec:
  ports:
    - port: 8161
      protocol: TCP
      targetPort: 8161
      name: http
    - port: 61616
      protocol: TCP
      targetPort: 61616
      name: amq
  selector:
    component: analytics-amq
    app: myapp
    environment: dev
    type: fa-core
  clusterIP: None
And this ActiveMQ StatefulSet (this is the template):
kind: StatefulSet
apiVersion: apps/v1beta1
metadata:
  name: pa-amq-deployment
spec:
  replicas: {{ activemqs }}
  updateStrategy:
    type: RollingUpdate
  serviceName: "activemq"
  template:
    metadata:
      labels:
        component: analytics-amq
        app: myapp
        environment: dev
        type: fa-core
    spec:
      containers:
        - name: pa-amq
          image: default/myco/activemq:latest
          imagePullPolicy: Always
          resources:
            limits:
              cpu: 150m
              memory: 1Gi
          livenessProbe:
            exec:
              command:
                - /etc/init.d/activemq
                - status
            initialDelaySeconds: 10
            periodSeconds: 15
            failureThreshold: 16
          ports:
            - containerPort: 8161
              protocol: TCP
              name: http
            - containerPort: 61616
              protocol: TCP
              name: amq
          envFrom:
            - configMapRef:
                name: pa-activemq-conf-all
            - secretRef:
                name: pa-activemq-secret
          volumeMounts:
            - name: timezone
              mountPath: /etc/localtime
      volumes:
        - name: timezone
          hostPath:
            path: /usr/share/zoneinfo/UTC
The web StatefulSet:
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: pa-web-deployment
spec:
  replicas: 1
  updateStrategy:
    type: RollingUpdate
  serviceName: "pa-web"
  template:
    metadata:
      labels:
        component: analytics-web
        app: myapp
        environment: dev
        type: fa-core
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: component
                      operator: In
                      values:
                        - analytics-web
                topologyKey: kubernetes.io/hostname
      containers:
        - name: pa-web
          image: default/myco/web:latest
          imagePullPolicy: Always
          resources:
            limits:
              cpu: 1
              memory: 2Gi
          readinessProbe:
            httpGet:
              path: /versions
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 15
            failureThreshold: 76
          livenessProbe:
            httpGet:
              path: /versions
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 15
            failureThreshold: 80
          securityContext:
            privileged: true
          ports:
            - containerPort: 8080
              name: http
              protocol: TCP
          envFrom:
            - configMapRef:
                name: pa-web-conf-all
            - secretRef:
                name: pa-web-secret
          volumeMounts:
            - name: shared-volume
              mountPath: /MySharedPath
            - name: timezone
              mountPath: /etc/localtime
      volumes:
        - nfs:
            server: 10.100.10.23
            path: /MySharedPath
          name: shared-volume
        - name: timezone
          hostPath:
            path: /usr/share/zoneinfo/UTC
This web pod also has a similar "unknown host" problem finding an external database we have configured. That issue is likewise resolved by restarting the pod. Here is the configuration of that external service; maybe it is easier to tackle the problem from this angle? ActiveMQ has no problem using the database service name to find the DB and start up.
apiVersion: v1
kind: Service
metadata:
  name: dbhost
  labels:
    app: myapp
    env: dev
spec:
  type: ExternalName
  externalName: mydb.host.com
Is it possible that it is a question of which pod, and the app in its container, is started up first and which second?
In any case, connecting through a Service rather than the pod name is recommended, as a pod name assigned by Kubernetes can change between pod restarts.
One way to test connectivity is to use telnet (or curl, for the protocols it supports), if it is available in the image:
telnet <host/pod/Service> <port>
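If neither telnet nor curl is present in the image, bash itself can open a TCP connection through its /dev/tcp redirection. A small sketch, assuming the image has bash and the coreutils `timeout` command; the function name `can_connect` and the host and port in the usage comment are only examples:

```shell
#!/bin/bash
# can_connect HOST PORT
# Returns 0 if a TCP connection to HOST:PORT opens within 3 seconds,
# non-zero otherwise (name not resolved, connection refused, or timeout).
can_connect() {
  local host=$1 port=$2
  # Inside "bash -c", $0 is the host and $1 is the port.
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Usage from a shell inside the pod (example values):
#   can_connect activemq 61616 && echo "reachable" || echo "NOT reachable"
```

This only shows that the name resolves and the port accepts connections; it does not speak the application protocol.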
Unable to find a solution, I created a workaround. I set up the entrypoint.sh in my image to look up the domain I need to access and write the result to the log, exiting on error:
#!/bin/bash
# disable command echo (-x) and exit-on-error (-e)
set +ex
#####################################
# verify that the db service can be found or exit container
#####################################
# we do not want to install nslookup to determine if the db_host_name is a valid name
# we have ping available though
# 0-success, 1-error pinging but lookup worked (services can not be pinged), 2-unreachable host
ping -W 2 -c 1 "${db_host_name}" &> /dev/null
if [ $? -le 1 ]
then
  echo "service ${db_host_name} is known"
else
  echo "${db_host_name} service is NOT recognized. Exiting container..."
  exit 1
fi
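As a possible alternative to interpreting ping's exit status, the name lookup can be checked directly with `getent hosts`, which goes through the system resolver without sending any ICMP packets. This is a sketch under the assumption that `getent` is already present in the image (it usually ships with the C library, but that is not confirmed for this image); the function name `name_resolves` is hypothetical:

```shell
#!/bin/bash
# name_resolves HOST
# Returns 0 if the system resolver can map HOST to an address, non-zero otherwise.
name_resolves() {
  getent hosts "$1" > /dev/null
}

# Usage in an entrypoint (db_host_name is expected in the pod's environment):
#   if name_resolves "${db_host_name}"; then
#     echo "service ${db_host_name} is known"
#   else
#     echo "${db_host_name} service is NOT recognized. Exiting container..."
#     exit 1
#   fi
```

The log messages are kept identical to the ones above, so anything that searches the log for them keeps working.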
Next, since only a pod restart fixed the issue, my Ansible deploy does a rollout check that queries the log to see whether a pod restart is needed. For example:
rollout-check.yml
- name: "Rollout status for {{rollout_item.statefulset}}"
  shell: timeout 4m kubectl rollout status -n {{fa_namespace}} -f {{ rollout_item.statefulset }}
  ignore_errors: yes
# assuming that the first pod will be the one that would have an issue
- name: "Get {{rollout_item.pod_name}} log to check for issue with dns lookup"
  shell: kubectl logs {{rollout_item.pod_name}} --tail=1 -n {{fa_namespace}}
  register: log_line
# the entrypoint will write dbhost service is NOT recognized. Exiting container... to the log
# if there is a problem getting to the dbhost
- name: "Try removing {{rollout_item.component}} pod if unable to deploy"
  shell: kubectl delete pods -l component={{rollout_item.component}} --force --grace-period=0 --ignore-not-found=true -n {{fa_namespace}}
  when: log_line.stdout.find('service is NOT recognized') > 0
I repeat this rollout check 6 times as sometimes even after a pod restart the service cannot be found. The additional checks are instant once the pod is successfully up.
- name: "Web rollout"
  include_tasks: rollout-check.yml
  loop:
    - { c: 1, statefulset: "{{ dest_deploy }}/web.statefulset.yml", pod_name: "pa-web-deployment-0", component: "analytics-web" }
    - { c: 2, statefulset: "{{ dest_deploy }}/web.statefulset.yml", pod_name: "pa-web-deployment-0", component: "analytics-web" }
    - { c: 3, statefulset: "{{ dest_deploy }}/web.statefulset.yml", pod_name: "pa-web-deployment-0", component: "analytics-web" }
    - { c: 4, statefulset: "{{ dest_deploy }}/web.statefulset.yml", pod_name: "pa-web-deployment-0", component: "analytics-web" }
    - { c: 5, statefulset: "{{ dest_deploy }}/web.statefulset.yml", pod_name: "pa-web-deployment-0", component: "analytics-web" }
    - { c: 6, statefulset: "{{ dest_deploy }}/web.statefulset.yml", pod_name: "pa-web-deployment-0", component: "analytics-web" }
  loop_control:
    loop_var: rollout_item
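The six identical loop entries amount to a bounded retry. For comparison, the same idea can be sketched as a small shell helper; `retry` and the `RETRY_DELAY` variable are hypothetical names, and this is not part of the Ansible deploy described above:

```shell
#!/bin/bash
# retry MAX_ATTEMPTS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS runs have failed.
# Waits RETRY_DELAY seconds (default 5) between attempts.
retry() {
  local max=$1 attempt=1
  shift
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "${RETRY_DELAY:-5}"
  done
}

# Usage (example values, mirroring the rollout check):
#   retry 6 timeout 4m kubectl rollout status -n "$fa_namespace" -f web.statefulset.yml
```

Because the helper returns as soon as the command succeeds, the extra attempts cost nothing once the pod is up, which matches the behaviour described for the repeated checks.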
Is there a way to deploy the Docker image to our Kubernetes cluster?
I have been trying to add it with the YAML file below,
but when I run status it says the environment is not set up.
What I basically tried to do is convert the docker run command into a Kubernetes Deployment file:
docker run -d -it --privileged=true --net=host --name=Db2wh -v /mnt/clusterfs:/mnt/bludata0 -v /mnt/clusterfs:/mnt/blumeta0 store/ibmcorp/db2wh_ee:v2.10.0-db2wh-ppcle
Can you please help me?
#testreplicaset.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    run: db2whce
  name: db2whce
spec:
  # modify replicas according to your case
  replicas: 1
  selector:
    matchLabels:
      app: db2whce
  template:
    metadata:
      labels:
        app: db2whce
    spec:
      containers:
        - name: db2whce
          image: store/ibmcorp/db2wh_ee:v2.10.0-db2wh-linux
          ports:
            - containerPort: 8443
            - containerPort: 389
            - containerPort: 50022
            - containerPort: 50001
            - containerPort: 50000
            - containerPort: 9929
            - containerPort: 9300
            - containerPort: 8998
            - containerPort: 5000
            - containerPort: 22
          args:
            - "--privileged=true"
            - "--net=host"
            - "--name=Db2wh"
            - "-v /mnt/clusterfs:/mnt/bludata0"
            - "-v /mnt/clusterfs:/mnt/blumeta0"
          resources:
            requests:
              cpu: 3
              memory: 14Gi
          volumeMounts:
            - mountPath: /mnt/bludata0
              name: db2wh-pvc
            - mountPath: /mnt/clusterfs
              name: db2wh-pvc
      volumes:
        - name: db2wh-pvc
          persistentVolumeClaim:
            claimName: db2wh-pvc
      imagePullSecrets:
        - name: dockerstore