Istio blocking connection to MySQL - kubernetes

I would like to deploy the java petstore for kubernetes. In order to achieve this I have 2 simple deployments. The first one is the java web app and the second one is a MySQL database.
When Istio is disabled, the connection between the app and the DB works well.
Unfortunately, when the Istio sidecar is injected, the communication between the two stops working.
Here is the deployment file of the web app:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: jpetstoreweb
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: jpetstoreweb
      annotations:
        sidecar.istio.io/inject: "true"
    spec:
      containers:
      - name: jpetstoreweb
        image: wingardiumleviosa/petstore:v7
        env:
        - name: VERSION
          value: "1"
        - name: DB_URL
          value: "jpetstoredb-service"
        - name: DB_PORT
          value: "3306"
        - name: DB_NAME
          value: "jpetstore"
        - name: DB_USERNAME
          value: "jpetstore"
        - name: DB_PASSWORD
          value: "foobar"
        ports:
        - containerPort: 9080
        readinessProbe:
          httpGet:
            path: /
            port: 9080
          initialDelaySeconds: 10
          periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: jpetstoreweb-service
spec:
  selector:
    app: jpetstoreweb
  ports:
  - port: 80
    targetPort: 9080
---
And next, the deployment file of the MySQL database:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: jpetstoredb
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: jpetstoredb
      annotations:
        sidecar.istio.io/inject: "true"
    spec:
      containers:
      - name: jpetstoredb
        image: wingardiumleviosa/petstoredb:v1
        ports:
        - containerPort: 3306
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: "foobar"
        - name: MYSQL_DATABASE
          value: "jpetstore"
        - name: MYSQL_USER
          value: "jpetstore"
        - name: MYSQL_PASSWORD
          value: "foobar"
---
apiVersion: v1
kind: Service
metadata:
  name: jpetstoredb-service
spec:
  selector:
    app: jpetstoredb
  ports:
  - port: 3306
    targetPort: 3306
Finally, the error logs from the web app trying to connect to the DB:
Exception thrown by application class 'org.springframework.web.servlet.FrameworkServlet.processRequest:488'
org.springframework.web.util.NestedServletException: Request processing failed; nested exception is org.springframework.transaction.CannotCreateTransactionException: Could not open JDBC Connection for transaction; nested exception is java.sql.SQLException: Communication link failure: java.io.EOFException, underlying cause: null ** BEGIN NESTED EXCEPTION ** java.io.EOFException STACKTRACE: java.io.EOFException at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:1395) at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:1539) at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:1930) at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1168) at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:1279) at com.mysql.jdbc.MysqlIO.sqlQuery(MysqlIO.java:1225) at com.mysql.jdbc.Connection.execSQL(Connection.java:2278) at com.mysql.jdbc.Connection.execSQL(Connection.java:2237) at com.mysql.jdbc.Connection.execSQL(Connection.java:2218) at com.mysql.jdbc.Connection.setAutoCommit(Connection.java:548) at org.apache.commons.dbcp.DelegatingConnection.setAutoCommit(DelegatingConnection.java:331) at org.apache.commons.dbcp.PoolingDataSource$PoolGuardConnectionWrapper.setAutoCommit(PoolingDataSource.java:317) at org.springframework.jdbc.datasource.DataSourceTransactionManager.doBegin(DataSourceTransactionManager.java:221) at org.springframework.transaction.support.AbstractPlatformTransactionManager.getTransaction(AbstractPlatformTransactionManager.java:350) at org.springframework.transaction.interceptor.TransactionAspectSupport.createTransactionIfNecessary(TransactionAspectSupport.java:261) at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:101) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171) at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:89) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171) at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204) at com.sun.proxy.$Proxy28.getCategory(Unknown Source) at org.springframework.samples.jpetstore.web.spring.ViewCategoryController.handleRequest(ViewCategoryController.java:31) at org.springframework.web.servlet.mvc.SimpleControllerHandlerAdapter.handle(SimpleControllerHandlerAdapter.java:48) at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:874) at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:808) at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:476) at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:431) at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at com.ibm.ws.webcontainer.servlet.ServletWrapper.service(ServletWrapper.java:1255) at com.ibm.ws.webcontainer.servlet.ServletWrapper.handleRequest(ServletWrapper.java:743) at com.ibm.ws.webcontainer.servlet.ServletWrapper.handleRequest(ServletWrapper.java:440) at com.ibm.ws.webcontainer.filter.WebAppFilterChain.invokeTarget(WebAppFilterChain.java:182) at com.ibm.ws.webcontainer.filter.WebAppFilterChain.doFilter(WebAppFilterChain.java:93) at com.ibm.ws.security.jaspi.JaspiServletFilter.doFilter(JaspiServletFilter.java:56) at com.ibm.ws.webcontainer.filter.FilterInstanceWrapper.doFilter(FilterInstanceWrapper.java:201) at 
com.ibm.ws.webcontainer.filter.WebAppFilterChain.doFilter(WebAppFilterChain.java:90) at com.ibm.ws.webcontainer.filter.WebAppFilterManager.doFilter(WebAppFilterManager.java:996) at com.ibm.ws.webcontainer.filter.WebAppFilterManager.invokeFilters(WebAppFilterManager.java:1134) at com.ibm.ws.webcontainer.filter.WebAppFilterManager.invokeFilters(WebAppFilterManager.java:1005) at com.ibm.ws.webcontainer.servlet.CacheServletWrapper.handleRequest(CacheServletWrapper.java:75) at com.ibm.ws.webcontainer.WebContainer.handleRequest(WebContainer.java:927) at com.ibm.ws.webcontainer.osgi.DynamicVirtualHost$2.run(DynamicVirtualHost.java:279) at com.ibm.ws.http.dispatcher.internal.channel.HttpDispatcherLink$TaskWrapper.run(HttpDispatcherLink.java:1023) at com.ibm.ws.http.dispatcher.internal.channel.HttpDispatcherLink.wrapHandlerAndExecute(HttpDispatcherLink.java:417) at com.ibm.ws.http.dispatcher.internal.channel.HttpDispatcherLink.ready(HttpDispatcherLink.java:376) at com.ibm.ws.http.channel.internal.inbound.HttpInboundLink.handleDiscrimination(HttpInboundLink.java:532) at com.ibm.ws.http.channel.internal.inbound.HttpInboundLink.handleNewRequest(HttpInboundLink.java:466) at com.ibm.ws.http.channel.internal.inbound.HttpInboundLink.processRequest(HttpInboundLink.java:331) at com.ibm.ws.http.channel.internal.inbound.HttpICLReadCallback.complete(HttpICLReadCallback.java:70) at com.ibm.ws.tcpchannel.internal.WorkQueueManager.requestComplete(WorkQueueManager.java:501) at com.ibm.ws.tcpchannel.internal.WorkQueueManager.attemptIO(WorkQueueManager.java:571) at com.ibm.ws.tcpchannel.internal.WorkQueueManager.workerRun(WorkQueueManager.java:926) at com.ibm.ws.tcpchannel.internal.WorkQueueManager$Worker.run(WorkQueueManager.java:1015) at com.ibm.ws.threading.internal.ExecutorServiceImpl$RunnableWrapper.run(ExecutorServiceImpl.java:232) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) at java.lang.Thread.run(Thread.java:812) ** END NESTED EXCEPTION **
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:488)
at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:431)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
Extract: Could not open JDBC Connection for transaction
Additional info:
1) I can reach the DB from the web app container using curl and it answers correctly.
2) I use Cilium instead of Calico
3) I installed Istio using HELM
4) Kubernetes is installed on bare metal (no cloud provider)
5) kubectl get pods -n istio-system all istio pods are running
6) kubectl get pods -n kube-system all cilium pods are running
7) Istio is injected using kubectl apply -f <(~/istio-1.0.5/bin/istioctl kube-inject -f ~/jpetstore.yaml) -n foo. If I use any other method, Istio does not inject itself into the web pod (but it works for the DB pod, God knows why)
8) The DB pod is always happy and working well
9) Logs of the istio-proxy container inside the WebApp pod : kubectl logs jpetstoreweb-84c7d8964-s642k istio-proxy -n myns
2018-12-28T03:52:30.610101Z info Version root#6f6ea1061f2b-docker.io/istio-1.0.5-c1707e45e71c75d74bf3a5dec8c7086f32f32fad-Clean
2018-12-28T03:52:30.610167Z info Proxy role: model.Proxy{ClusterID:"", Type:"sidecar", IPAddress:"10.233.72.142", ID:"jpetstoreweb-84c7d8964-s642k.myns", Domain:"myns.svc.cluster.local", Metadata:map[string]string(nil)}
2018-12-28T03:52:30.611217Z info Effective config: binaryPath: /usr/local/bin/envoy
configPath: /etc/istio/proxy
connectTimeout: 10s
discoveryAddress: istio-pilot.istio-system:15007
discoveryRefreshDelay: 1s
drainDuration: 45s
parentShutdownDuration: 60s
proxyAdminPort: 15000
serviceCluster: jpetstoreweb
zipkinAddress: zipkin.istio-system:9411
2018-12-28T03:52:30.611249Z info Monitored certs: []envoy.CertSource{envoy.CertSource{Directory:"/etc/certs/", Files:[]string{"cert-chain.pem", "key.pem", "root-cert.pem"}}}
2018-12-28T03:52:30.611829Z info Starting proxy agent
2018-12-28T03:52:30.611902Z info Received new config, resetting budget
2018-12-28T03:52:30.611912Z info Reconciling configuration (budget 10)
2018-12-28T03:52:30.611926Z info Epoch 0 starting
2018-12-28T03:52:30.613236Z info Envoy command: [-c /etc/istio/proxy/envoy-rev0.json --restart-epoch 0 --drain-time-s 45 --parent-shutdown-time-s 60 --service-cluster jpetstoreweb --service-node sidecar~10.233.72.142~jpetstoreweb-84c7d8964-s642k.myns~myns.svc.cluster.local --max-obj-name-len 189 --allow-unknown-fields -l warn --v2-config-only]
[2018-12-28 03:52:30.630][20][info][main] external/envoy/source/server/server.cc:190] initializing epoch 0 (hot restart version=10.200.16384.256.options=capacity=16384, num_slots=8209 hash=228984379728933363 size=4882536)
[2018-12-28 03:52:30.631][20][info][main] external/envoy/source/server/server.cc:192] statically linked extensions:
[2018-12-28 03:52:30.631][20][info][main] external/envoy/source/server/server.cc:194] access_loggers: envoy.file_access_log,envoy.http_grpc_access_log
[2018-12-28 03:52:30.631][20][info][main] external/envoy/source/server/server.cc:197] filters.http: envoy.buffer,envoy.cors,envoy.ext_authz,envoy.fault,envoy.filters.http.header_to_metadata,envoy.filters.http.jwt_authn,envoy.filters.http.rbac,envoy.grpc_http1_bridge,envoy.grpc_json_transcoder,envoy.grpc_web,envoy.gzip,envoy.health_check,envoy.http_dynamo_filter,envoy.ip_tagging,envoy.lua,envoy.rate_limit,envoy.router,envoy.squash,istio_authn,jwt-auth,mixer
[2018-12-28 03:52:30.631][20][info][main] external/envoy/source/server/server.cc:200] filters.listener: envoy.listener.original_dst,envoy.listener.proxy_protocol,envoy.listener.tls_inspector
[2018-12-28 03:52:30.631][20][info][main] external/envoy/source/server/server.cc:203] filters.network: envoy.client_ssl_auth,envoy.echo,envoy.ext_authz,envoy.filters.network.rbac,envoy.filters.network.thrift_proxy,envoy.http_connection_manager,envoy.mongo_proxy,envoy.ratelimit,envoy.redis_proxy,envoy.tcp_proxy,mixer
[2018-12-28 03:52:30.631][20][info][main] external/envoy/source/server/server.cc:205] stat_sinks: envoy.dog_statsd,envoy.metrics_service,envoy.stat_sinks.hystrix,envoy.statsd
[2018-12-28 03:52:30.631][20][info][main] external/envoy/source/server/server.cc:207] tracers: envoy.dynamic.ot,envoy.lightstep,envoy.zipkin
[2018-12-28 03:52:30.631][20][info][main] external/envoy/source/server/server.cc:210] transport_sockets.downstream: alts,envoy.transport_sockets.capture,raw_buffer,tls
[2018-12-28 03:52:30.631][20][info][main] external/envoy/source/server/server.cc:213] transport_sockets.upstream: alts,envoy.transport_sockets.capture,raw_buffer,tls
[2018-12-28 03:52:30.634][20][info][config] external/envoy/source/server/configuration_impl.cc:50] loading 0 static secret(s)
[2018-12-28 03:52:30.638][20][warning][upstream] external/envoy/source/common/config/grpc_mux_impl.cc:240] gRPC config stream closed: 14, no healthy upstream
[2018-12-28 03:52:30.638][20][warning][upstream] external/envoy/source/common/config/grpc_mux_impl.cc:41] Unable to establish new stream
[2018-12-28 03:52:30.638][20][info][config] external/envoy/source/server/configuration_impl.cc:60] loading 1 listener(s)
[2018-12-28 03:52:30.640][20][info][config] external/envoy/source/server/configuration_impl.cc:94] loading tracing configuration
[2018-12-28 03:52:30.640][20][info][config] external/envoy/source/server/configuration_impl.cc:103] loading tracing driver: envoy.zipkin
[2018-12-28 03:52:30.640][20][info][config] external/envoy/source/server/configuration_impl.cc:116] loading stats sink configuration
[2018-12-28 03:52:30.640][20][info][main] external/envoy/source/server/server.cc:432] starting main dispatch loop
[2018-12-28 03:52:32.010][20][warning][upstream] external/envoy/source/common/config/grpc_mux_impl.cc:240] gRPC config stream closed: 14, no healthy upstream
[2018-12-28 03:52:32.011][20][warning][upstream] external/envoy/source/common/config/grpc_mux_impl.cc:41] Unable to establish new stream
[2018-12-28 03:52:34.691][20][warning][upstream] external/envoy/source/common/config/grpc_mux_impl.cc:240] gRPC config stream closed: 14, no healthy upstream
[2018-12-28 03:52:34.691][20][warning][upstream] external/envoy/source/common/config/grpc_mux_impl.cc:41] Unable to establish new stream
[2018-12-28 03:52:38.483][20][info][upstream] external/envoy/source/common/upstream/cluster_manager_impl.cc:130] cm init: initializing cds
[2018-12-28 03:53:01.596][20][info][upstream] external/envoy/source/common/upstream/cluster_manager_impl.cc:494] add/update cluster outbound|443||kubernetes.default.svc.cluster.local during init
...
[2018-12-28T04:09:09.561Z] - 115 1548 6 "127.0.0.1:9080" inbound|80||jpetstoreweb-service.myns.svc.cluster.local 127.0.0.1:40318 10.233.72.142:9080 10.233.72.1:43098
[2018-12-28T04:09:14.555Z] - 115 1548 8 "127.0.0.1:9080" inbound|80||jpetstoreweb-service.myns.svc.cluster.local 127.0.0.1:40350 10.233.72.142:9080 10.233.72.1:43130
[2018-12-28T04:09:19.556Z] - 115 1548 5 "127.0.0.1:9080" inbound|80||jpetstoreweb-service.myns.svc.cluster.local 127.0.0.1:40364 10.233.72.142:9080 10.233.72.1:43144
[2018-12-28T04:09:24.558Z] - 115 1548 6 "127.0.0.1:9080" inbound|80||jpetstoreweb-service.myns.svc.cluster.local 127.0.0.1:40378 10.233.72.142:9080 10.233.72.1:43158
10) Using Istio 1.0.5 and Kubernetes 1.13.0
All ideas are welcome ;-)
Thanks

So there really is an issue with Istio 1.0.5 and MySQL JDBC connections.
The temporary solution is to delete the mesh policy resource in the following way:
kubectl delete meshpolicies.authentication.istio.io default
As stated here and referencing this.
(FYI: I deleted the resource BEFORE deploying my petstore app.)
As of Istio 1.1.1, there is more information on this problem in the FAQ.
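If deleting the global MeshPolicy feels too broad, a narrower variant of the same idea is to switch mTLS off only for the database service. This is only a sketch (resource names are made up, the namespace is assumed to be the one the app runs in, and it was not tested in this setup), using the Istio 1.0/1.1 authentication Policy and DestinationRule APIs:
apiVersion: authentication.istio.io/v1alpha1
kind: Policy
metadata:
  name: jpetstoredb-no-mtls
  namespace: myns
spec:
  targets:
  - name: jpetstoredb-service   # no "peers:" section, so mTLS is not required for this service
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: jpetstoredb-no-mtls
  namespace: myns
spec:
  host: jpetstoredb-service.myns.svc.cluster.local
  trafficPolicy:
    tls:
      mode: DISABLE               # clients talk plain TCP/MySQL to this host, no mTLS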

Related

Is EFS a good logs backup option if Loki pod terminated accidentally in EKS Fargate

I am currently using Loki to store logs generated by my applications from EKS Fargate. The sidecar pattern with Promtail is used to scrape logs. A single Loki pod is used, and S3 is configured as the destination to store logs. It works nicely as expected. However, when I tested the availability of the logging system by deleting pods, I discovered that if Loki's pod was deleted, some logs would be missing (from roughly 20 minutes before the pod was deleted up to the time it was deleted) even after the pod restarted.
To solve this problem, I tried to use EFS as the persistent volume of Loki's pod, mounting the path /loki. The whole process follows this article (https://aws.amazon.com/blogs/aws/new-aws-fargate-for-amazon-eks-now-supports-amazon-efs/). But I got an error from the Loki pod with msg "error running loki" err="mkdir /loki/compactor: permission denied"
Therefore, I have 2 questions in my mind:
Should I use EFS as a solution for log backup in my case?
Why did I get permission denied inside the pod, and is there any way to solve this problem?
My Loki-config.yaml
auth_enabled: false
server:
  http_listen_port: 3100
  # grpc_listen_port: 9096
ingester:
  wal:
    enabled: true
    dir: /loki/wal
  lifecycler:
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    # final_sleep: 0s
  chunk_idle_period: 3m
  chunk_retain_period: 30s
  max_transfer_retries: 0
  chunk_target_size: 1048576
schema_config:
  configs:
    - from: 2020-05-15
      store: boltdb-shipper
      object_store: aws
      schema: v11
      index:
        prefix: index_
        period: 24h
storage_config:
  boltdb_shipper:
    active_index_directory: /loki/index
    cache_location: /loki/index_cache
    shared_store: s3
  aws:
    bucketnames: bucketnames
    endpoint: s3.us-west-2.amazonaws.com
    region: us-west-2
    access_key_id: access_key_id
    secret_access_key: secret_access_key
    sse_encryption: true
compactor:
  working_directory: /loki/compactor
  shared_store: s3
  compaction_interval: 5m
limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 48h
chunk_store_config:
  max_look_back_period: 0s
table_manager:
  retention_deletes_enabled: true
  retention_period: 96h
querier:
  query_ingesters_within: 0
analytics:
  reporting_enabled: false
Deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: fargate-api-dev
  name: dev-loki
spec:
  selector:
    matchLabels:
      app: dev-loki
  template:
    metadata:
      labels:
        app: dev-loki
    spec:
      volumes:
        - name: loki-config
          configMap:
            name: dev-loki-config
        - name: dev-loki-efs-pv
          persistentVolumeClaim:
            claimName: dev-loki-efs-pvc
      containers:
        - name: loki
          image: loki:2.6.1
          args:
            - -print-config-stderr=true
            - -config.file=/tmp/loki.yaml
          resources:
            limits:
              memory: "500Mi"
              cpu: "200m"
          ports:
            - containerPort: 3100
          volumeMounts:
            - name: dev-loki-config
              mountPath: /tmp
              readOnly: false
            - name: dev-loki-efs-pv
              mountPath: /loki
Promtail-config.yaml
server:
  log_level: info
  http_listen_port: 9080
clients:
  - url: http://loki.com/loki/api/v1/push
positions:
  filename: /run/promtail/positions.yaml
scrape_configs:
  - job_name: api-log
    static_configs:
      - targets:
          - localhost
        labels:
          job: apilogs
          pod: ${POD_NAME}
          __path__: /var/log/*.log
I had a similar issue using EFS as the volume to store the logs, and I found this solution: https://github.com/grafana/loki/issues/2018#issuecomment-1030221498
Basically, the Loki container on its own is not able to create the directories it needs to start working, so we used an initContainer to create them for it.
This solution worked like a charm for us.
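For reference, a minimal sketch of what that initContainer can look like in the dev-loki Deployment above (the directory list comes from the Loki config above; the 10001 user is what the official Grafana Loki images run as, so adjust it to whatever user your loki:2.6.1 image uses):
      # added under spec.template.spec of the dev-loki Deployment, alongside "containers:"
      initContainers:
        - name: fix-loki-permissions      # runs before the loki container, on the same EFS volume
          image: busybox:1.36
          command:
            - sh
            - -c
            - mkdir -p /loki/wal /loki/index /loki/index_cache /loki/compactor && chown -R 10001:10001 /loki
          volumeMounts:
            - name: dev-loki-efs-pv
              mountPath: /loki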

Kubernetes : WARN datanode.DataNode: Problem connecting to server: namenode:9001

I am a novice with Kubernetes and I am currently trying to deploy Hadoop in Kubernetes. I followed this docker-compose file for the deployment: https://github.com/big-data-europe/docker-hadoop/blob/master/docker-compose-v3.yml. Below is my YAML for the namenode:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hadoop-namenode
  labels:
    traefik.docker.network: hbase-namenode
    traefik.port: "9870"
  namespace: hdfs
spec:
  selector:
    matchLabels:
      app: hadoop-namenode
      traefik.docker.network: hbase-namenode
      traefik.port: "9870"
  template:
    metadata:
      labels:
        app: hadoop-namenode
        traefik.docker.network: hbase-namenode
        traefik.port: "9870"
    spec:
      nodeSelector:
        kubernetes.io/hostname: data-mesh-hdfs
      containers:
        - name: hadoop-namenode
          image: bde2020/hadoop-namenode:2.0.0-hadoop3.2.1-java8
          ports:
            - containerPort: 9870
          env:
            - name: CLUSTER_NAME
              value: test
          envFrom:
            - secretRef:
                name: hadoop-secret
          volumeMounts:
            - name: data-namenode
              mountPath: /hadoop/dfs/name
      volumes:
        - name: data-namenode
          persistentVolumeClaim:
            claimName: namenode-pvc
I created the PV and PVC for my pod and attached my service to the pod.
apiVersion: v1
kind: Service
metadata:
  name: service-namenode
  namespace: hdfs
spec:
  selector:
    traefik.docker.network: hbase-namenode
  type: NodePort
  ports:
    - name: namenode
      protocol: TCP
      port: 9870
      targetPort: 9870
When I deployed my hadoop-namenode pod, it was up and running correctly on my cluster, with these logs:
2021-12-14 15:43:25,893 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
2021-12-14 15:43:26,044 INFO namenode.NameNode: createNameNode []
2021-12-14 15:43:26,202 INFO impl.MetricsConfig: Loaded properties from hadoop-metrics2.properties
2021-12-14 15:43:26,367 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2021-12-14 15:43:26,368 INFO impl.MetricsSystemImpl: NameNode metrics system started
2021-12-14 15:43:26,422 INFO namenode.NameNodeUtils: fs.defaultFS is hdfs://namenode:9001
2021-12-14 15:43:26,423 INFO namenode.NameNode: Clients should use namenode:9001 to access this namenode/service.
2021-12-14 15:43:26,634 INFO util.JvmPauseMonitor: Starting JVM pause monitor
2021-12-14 15:43:26,676 INFO hdfs.DFSUtil: Starting Web-server for hdfs at: http://0.0.0.0:9870
And finally, I set up my datanode and its service using this file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hadoop-datanode
  labels:
    traefik.docker.network: hbase-datanode
    traefik.port: "9864"
  namespace: hdfs
spec:
  selector:
    matchLabels:
      app: hadoop-datanode
      traefik.docker.network: hbase-datanode
      traefik.port: "9864"
  template:
    metadata:
      labels:
        app: hadoop-datanode
        traefik.docker.network: hbase-datanode
        traefik.port: "9864"
    spec:
      nodeSelector:
        kubernetes.io/hostname: data-mesh-hdfs
      containers:
        - name: hadoop-datanode
          image: bde2020/hadoop-datanode:2.0.0-hadoop3.2.1-java8
          ports:
            - containerPort: 9864
          env:
            - name: SERVICE_PRECONDITION
              value: "10.47.0.1:9870"
          envFrom:
            - secretRef:
                name: hadoop-secret
          volumeMounts:
            - name: datanode
              mountPath: /hadoop/dfs/data
      volumes:
        - name: datanode
          persistentVolumeClaim:
            claimName: datanode-pvc
All my pods are up and running, but I get this error on my datanode:
kubectl get pod -n hdfs
NAME READY STATUS RESTARTS AGE
hadoop-datanode-7f9bdb4f54-4hh6t 1/1 Running 0 2d21h
hadoop-namenode-6bddbc7b6-h8bqk 1/1 Running 0 2d22h
2021-12-17 14:46:57,697 INFO server.AbstractConnector: Started ServerConnector#4b5c304e{HTTP/1.1,[http/1.1]}{localhost:41327}
2021-12-17 14:46:57,698 INFO server.Server: Started #2486ms
2021-12-17 14:46:57,943 INFO web.DatanodeHttpServer: Listening HTTP traffic on /0.0.0.0:9864
2021-12-17 14:46:57,952 INFO util.JvmPauseMonitor: Starting JVM pause monitor
2021-12-17 14:46:57,958 INFO datanode.DataNode: dnUserName = root
2021-12-17 14:46:57,958 INFO datanode.DataNode: supergroup = supergroup
2021-12-17 14:46:58,026 INFO ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue, queueCapacity: 1000, scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler, ipcBackoff: false.
2021-12-17 14:46:58,043 INFO ipc.Server: Starting Socket Reader #1 for port 9867
2021-12-17 14:46:58,310 INFO datanode.DataNode: Opened IPC server at /0.0.0.0:9867
2021-12-17 14:46:58,332 INFO datanode.DataNode: Refresh request received for nameservices: null
2021-12-17 14:46:58,359 WARN hdfs.DFSUtilClient: Namenode for null remains unresolved for ID null. Check your hdfs-site.xml file to ensure namenodes are configured properly.
2021-12-17 14:46:58,363 INFO datanode.DataNode: Starting BPOfferServices for nameservices: <default>
2021-12-17 14:46:58,389 INFO datanode.DataNode: Block pool <registering> (Datanode Uuid unassigned) service to namenode:9001 starting to offer service
2021-12-17 14:46:58,408 INFO ipc.Server: IPC Server Responder: starting
2021-12-17 14:46:58,409 INFO ipc.Server: IPC Server listener on 9867: starting
2021-12-17 14:46:58,480 WARN datanode.DataNode: Problem connecting to server: namenode:9001
2021-12-17 14:47:03,481 WARN datanode.DataNode: Problem connecting to server: namenode:9001
2021-12-17 14:47:08,483 WARN datanode.DataNode: Problem connecting to server: namenode:9001
2021-12-17 14:47:13,484 WARN datanode.DataNode: Problem connecting to server: namenode:9001
2021-12-17 14:47:18,486 WARN datanode.DataNode: Problem connecting to server: namenode:9001
I use a Secret resource to set up the environment variables for my datanode and namenode, which gives this output for my datanode at startup:
Configuring core
- Setting hadoop.proxyuser.hue.hosts=*
- Setting fs.defaultFS=hdfs://namenode:9001
- Setting hadoop.http.staticuser.user=root
- Setting io.compression.codecs=org.apache.hadoop.io.compress.SnappyCodec
- Setting hadoop.proxyuser.hue.groups=*
Configuring hdfs
- Setting dfs.datanode.data.dir=file:///hadoop/dfs/data
- Setting dfs.namenode.datanode.registration.ip-hostname-check=false
- Setting dfs.webhdfs.enabled=true
- Setting dfs.permissions.enabled=false
Configuring yarn
- Setting yarn.timeline-service.enabled=true
- Setting yarn.scheduler.capacity.root.default.maximum-allocation-vcores=4
- Setting yarn.resourcemanager.system-metrics-publisher.enabled=true
- Setting yarn.resourcemanager.store.class=org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore
- Setting yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage=98.5
- Setting yarn.log.server.url=http://historyserver:8188/applicationhistory/logs/
- Setting yarn.resourcemanager.fs.state-store.uri=/rmstate
- Setting yarn.timeline-service.generic-application-history.enabled=true
- Setting yarn.log-aggregation-enable=true
- Setting yarn.resourcemanager.hostname=resourcemanager
- Setting yarn.scheduler.capacity.root.default.maximum-allocation-mb=8192
- Setting yarn.nodemanager.aux-services=mapreduce_shuffle
- Setting yarn.resourcemanager.resource_tracker.address=resourcemanager:8031
- Setting yarn.timeline-service.hostname=historyserver
- Setting yarn.resourcemanager.scheduler.address=resourcemanager:8030
- Setting yarn.resourcemanager.address=resourcemanager:8032
- Setting mapred.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec
- Setting yarn.nodemanager.remote-app-log-dir=/app-logs
- Setting yarn.resourcemanager.scheduler.class=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
- Setting mapreduce.map.output.compress=true
- Setting yarn.nodemanager.resource.memory-mb=16384
- Setting yarn.resourcemanager.recovery.enabled=true
- Setting yarn.nodemanager.resource.cpu-vcores=8
Configuring httpfs
Configuring kms
Configuring mapred
- Setting mapreduce.map.java.opts=-Xmx3072m
- Setting mapreduce.reduce.memory.mb=8192
- Setting mapreduce.reduce.java.opts=-Xmx6144m
- Setting yarn.app.mapreduce.am.env=HADOOP_MAPRED_HOME=/opt/hadoop-3.2.1/
- Setting mapreduce.map.memory.mb=4096
- Setting mapred.child.java.opts=-Xmx4096m
- Setting mapreduce.reduce.env=HADOOP_MAPRED_HOME=/opt/hadoop-3.2.1/
- Setting mapreduce.framework.name=yarn
- Setting mapreduce.map.env=HADOOP_MAPRED_HOME=/opt/hadoop-3.2.1/
Configuring for multihomed network
[1/100] 10.47.0.1:9001 is available.
2021-12-17 14:46:55,883 INFO datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = hadoop-datanode-7f9bdb4f54-4mb8r/10.47.0.6
STARTUP_MSG: args = []
STARTUP_MSG: version = 3.2.1
If I look at the open ports on my namenode, I see this:
netstat -lpten
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State User Inode PID/Program name
tcp 0 0 0.0.0.0:9001 0.0.0.0:* LISTEN 0 13617204 361/java
tcp 0 0 0.0.0.0:9870 0.0.0.0:* LISTEN 0 13617098 361/java
In the core-site.xml file on the datanode, I have the data below:
<configuration>
<property><name>hadoop.proxyuser.hue.hosts</name><value>*</value></property>
<property><name>fs.defaultFS</name><value>hdfs://namenode:9001</value></property>
<property><name>hadoop.http.staticuser.user</name><value>root</value></property>
<property><name>io.compression.codecs</name><value>org.apache.hadoop.io.compress.SnappyCodec</value></property>
<property><name>hadoop.proxyuser.hue.groups</name><value>*</value></property>
</configuration>
Same content in the core-site.xml file on the namenode:
<configuration>
<property><name>hadoop.proxyuser.hue.hosts</name><value>*</value></property>
<property><name>fs.defaultFS</name><value>hdfs://namenode:9001</value></property>
<property><name>hadoop.http.staticuser.user</name><value>root</value></property>
<property><name>io.compression.codecs</name><value>org.apache.hadoop.io.compress.SnappyCodec</value></property>
<property><name>hadoop.proxyuser.hue.groups</name><value>*</value></property>
</configuration>
Can anyone please help me identify the issue?
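For context on what the datanode is doing: fs.defaultFS is hdfs://namenode:9001, so the datanode tries to resolve the DNS name namenode, while the only Service defined above is called service-namenode. The sketch below is not a verified fix, just an illustration of the naming requirement: a Service literally named namenode, selecting the namenode pods by the app: hadoop-namenode label from the Deployment above, would make that hostname resolvable inside the hdfs namespace.
apiVersion: v1
kind: Service
metadata:
  name: namenode             # must match the host part of hdfs://namenode:9001
  namespace: hdfs
spec:
  selector:
    app: hadoop-namenode     # pod label from the namenode Deployment above
  ports:
    - name: fs
      protocol: TCP
      port: 9001
      targetPort: 9001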

Microk8s Ingress returns 502

I'm new to Kubernetes and trying to do a simple project connecting MySQL and phpMyAdmin using Kubernetes on my Ubuntu 20.04. I created the needed components, and here they are.
mysql.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql-deployment
  labels:
    app: mysql
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - name: mysql
          image: mysql
          ports:
            - containerPort: 3306
          env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-secret
                  key: mysql-root-password
            - name: MYSQL_USER
              valueFrom:
                secretKeyRef:
                  name: mysql-secret
                  key: mysql-user-username
            - name: MYSQL_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-secret
                  key: mysql-user-password
            - name: MYSQL_DATABASE
              valueFrom:
                configMapKeyRef:
                  name: mysql-configmap
                  key: mysql-database
---
apiVersion: v1
kind: Service
metadata:
  name: mysql-service
spec:
  selector:
    app: mysql
  ports:
    - protocol: TCP
      port: 3306
      targetPort: 3306
phpmyadmin.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: phpmyadmin
  labels:
    app: phpmyadmin
spec:
  replicas: 1
  selector:
    matchLabels:
      app: phpmyadmin
  template:
    metadata:
      labels:
        app: phpmyadmin
    spec:
      containers:
        - name: phpmyadmin
          image: phpmyadmin
          ports:
            - containerPort: 3000
          env:
            - name: PMA_HOST
              valueFrom:
                configMapKeyRef:
                  name: mysql-configmap
                  key: database_url
            - name: PMA_PORT
              value: "3306"
            - name: PMA_USER
              valueFrom:
                secretKeyRef:
                  name: mysql-secret
                  key: mysql-user-username
            - name: PMA_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-secret
                  key: mysql-user-password
---
apiVersion: v1
kind: Service
metadata:
  name: phpmyadmin-service
spec:
  selector:
    app: phpmyadmin
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 3000
ingress-service.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-service
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  defaultBackend:
    service:
      name: phpmyadmin-service
      port:
        number: 8080
  rules:
    - host: test.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: phpmyadmin-service
                port:
                  number: 8080
when I execute microk8s kubectl get ingress ingress-service, the output is:
NAME CLASS HOSTS ADDRESS PORTS AGE
ingress-service public test.com 127.0.0.1 80 45s
and when I tried to access test.com, that's when I got the 502 error.
My kubectl version:
Client Version: v1.22.2-3+9ad9ee77396805
Server Version: v1.22.2-3+9ad9ee77396805
My microk8s' client and server version:
Client:
Version: v1.5.2
Revision: 36cc874494a56a253cd181a1a685b44b58a2e34a
Go version: go1.15.15
Server:
Version: v1.5.2
Revision: 36cc874494a56a253cd181a1a685b44b58a2e34a
UUID: b2bf55ad-6942-4824-99c8-c56e1dee5949
As for my microk8s' own version, I followed the installation instructions from here, so it should be 1.21/stable. (I couldn't find a way to check the exact version on the internet; if someone knows how, please tell me.)
mysql.yaml logs:
2021-10-14 07:05:38+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.26-1debian10 started.
2021-10-14 07:05:38+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'
2021-10-14 07:05:38+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 8.0.26-1debian10 started.
2021-10-14 07:05:38+00:00 [Note] [Entrypoint]: Initializing database files
2021-10-14T07:05:38.960693Z 0 [System] [MY-013169] [Server] /usr/sbin/mysqld (mysqld 8.0.26) initializing of server in progress as process 41
2021-10-14T07:05:38.967970Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2021-10-14T07:05:39.531763Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
2021-10-14T07:05:40.591862Z 0 [Warning] [MY-013746] [Server] A deprecated TLS version TLSv1 is enabled for channel mysql_main
2021-10-14T07:05:40.592247Z 0 [Warning] [MY-013746] [Server] A deprecated TLS version TLSv1.1 is enabled for channel mysql_main
2021-10-14T07:05:40.670594Z 6 [Warning] [MY-010453] [Server] root#localhost is created with an empty password ! Please consider switching off the --initialize-insecure option.
2021-10-14 07:05:45+00:00 [Note] [Entrypoint]: Database files initialized
2021-10-14 07:05:45+00:00 [Note] [Entrypoint]: Starting temporary server
2021-10-14T07:05:45.362827Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.26) starting as process 90
2021-10-14T07:05:45.486702Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2021-10-14T07:05:45.845971Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
2021-10-14T07:05:46.022043Z 0 [Warning] [MY-013746] [Server] A deprecated TLS version TLSv1 is enabled for channel mysql_main
2021-10-14T07:05:46.022189Z 0 [Warning] [MY-013746] [Server] A deprecated TLS version TLSv1.1 is enabled for channel mysql_main
2021-10-14T07:05:46.023446Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed.
2021-10-14T07:05:46.023728Z 0 [System] [MY-013602] [Server] Channel mysql_main configured to support TLS. Encrypted connections are now supported for this channel.
2021-10-14T07:05:46.026088Z 0 [Warning] [MY-011810] [Server] Insecure configuration for --pid-file: Location '/var/run/mysqld' in the path is accessible to all OS users. Consider choosing a different directory.
2021-10-14T07:05:46.044967Z 0 [System] [MY-011323] [Server] X Plugin ready for connections. Socket: /var/run/mysqld/mysqlx.sock
2021-10-14T07:05:46.045036Z 0 [System] [MY-010931] [Server] /usr/sbin/mysqld: ready for connections. Version: '8.0.26' socket: '/var/run/mysqld/mysqld.sock' port: 0 MySQL Community Server - GPL.
2021-10-14 07:05:46+00:00 [Note] [Entrypoint]: Temporary server started.
Warning: Unable to load '/usr/share/zoneinfo/iso3166.tab' as time zone. Skipping it.
Warning: Unable to load '/usr/share/zoneinfo/leap-seconds.list' as time zone. Skipping it.
Warning: Unable to load '/usr/share/zoneinfo/zone.tab' as time zone. Skipping it.
Warning: Unable to load '/usr/share/zoneinfo/zone1970.tab' as time zone. Skipping it.
2021-10-14 07:05:48+00:00 [Note] [Entrypoint]: Creating database testing-database
2021-10-14 07:05:48+00:00 [Note] [Entrypoint]: Creating user testinguser
2021-10-14 07:05:48+00:00 [Note] [Entrypoint]: Giving user testinguser access to schema testing-database
2021-10-14 07:05:48+00:00 [Note] [Entrypoint]: Stopping temporary server
2021-10-14T07:05:48.422053Z 13 [System] [MY-013172] [Server] Received SHUTDOWN from user root. Shutting down mysqld (Version: 8.0.26).
2021-10-14T07:05:50.543822Z 0 [System] [MY-010910] [Server] /usr/sbin/mysqld: Shutdown complete (mysqld 8.0.26) MySQL Community Server - GPL.
2021-10-14 07:05:51+00:00 [Note] [Entrypoint]: Temporary server stopped
2021-10-14 07:05:51+00:00 [Note] [Entrypoint]: MySQL init process done. Ready for start up.
2021-10-14T07:05:51.711889Z 0 [System] [MY-010116] [Server] /usr/sbin/mysqld (mysqld 8.0.26) starting as process 1
2021-10-14T07:05:51.725302Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2021-10-14T07:05:51.959356Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
2021-10-14T07:05:52.162432Z 0 [Warning] [MY-013746] [Server] A deprecated TLS version TLSv1 is enabled for channel mysql_main
2021-10-14T07:05:52.162568Z 0 [Warning] [MY-013746] [Server] A deprecated TLS version TLSv1.1 is enabled for channel mysql_main
2021-10-14T07:05:52.163400Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed.
2021-10-14T07:05:52.163556Z 0 [System] [MY-013602] [Server] Channel mysql_main configured to support TLS. Encrypted connections are now supported for this channel.
2021-10-14T07:05:52.165840Z 0 [Warning] [MY-011810] [Server] Insecure configuration for --pid-file: Location '/var/run/mysqld' in the path is accessible to all OS users. Consider choosing a different directory.
2021-10-14T07:05:52.181516Z 0 [System] [MY-011323] [Server] X Plugin ready for connections. Bind-address: '::' port: 33060, socket: /var/run/mysqld/mysqlx.sock
2021-10-14T07:05:52.181562Z 0 [System] [MY-010931] [Server] /usr/sbin/mysqld: ready for connections. Version: '8.0.26' socket: '/var/run/mysqld/mysqld.sock' port: 3306 MySQL Community Server - GPL.
phpmyadmin.yaml logs:
AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 10.1.114.139. Set the 'ServerName' directive globally to suppress this message
AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 10.1.114.139. Set the 'ServerName' directive globally to suppress this message
[Thu Oct 14 03:57:32.653011 2021] [mpm_prefork:notice] [pid 1] AH00163: Apache/2.4.51 (Debian) PHP/7.4.24 configured -- resuming normal operations
[Thu Oct 14 03:57:32.653240 2021] [core:notice] [pid 1] AH00094: Command line: 'apache2 -D FOREGROUND'
Here is also the Allocatable section from the describe nodes command:
Allocatable:
cpu: 4
ephemeral-storage: 113289380Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 5904508Ki
pods: 110
and the Allocated resources:
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 550m (13%) 200m (5%)
memory 270Mi (4%) 370Mi (6%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Any help? Thanks in advance.
Turns out it is a fudged-up mistake of mine: I specified phpMyAdmin's containerPort as 3000, while the default image listens on port 80. After changing the containerPort and the phpmyadmin-service's targetPort to 80, the phpMyAdmin page opens.
So sorry to kkopczak and AndD for the fuss, and also big thanks for trying to help! :)
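For reference, the corrected fragments of phpmyadmin.yaml above look roughly like this (only the two port values change):
          # phpmyadmin Deployment, container section:
          ports:
            - containerPort: 80    # the phpmyadmin image serves on port 80, not 3000
  # phpmyadmin-service:
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 80               # must match the containerPort above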

Fabric v2.0 in kubernetes (minikube) - error Peer channel join - TLS issue because of pod's names

I am trying to set up the Fabric v2.0 test-network (https://hyperledger-fabric.readthedocs.io/en/release-2.0/test_network.html) on Kubernetes (locally on minikube). I get an error with peer channel join.
I created kubernetes files based on the docker-compose-test-net.yaml of the test-network. I successfully deploy the following pods:
an orderer (raft)
2 peers (peer0-org1-example-com and peer0-org2-example-com)
a fabric-tools pod.
I successfully generate the crypto material with cryptogen and configtxgen.
I successfully create the channel:
when I am in the fabric-tools pod:
bash-5.0# peer channel create -o orderer-example-com:7050 -c $CHANNEL_NAME --ordererTLSHostnameOverride orderer.example.com -f /fabric/${CHANNEL_NAME}.tx --tls --cafile $ORDERER_CA
2020-02-11 08:10:14.057 CET [channelCmd] InitCmdFactory -> INFO 001 Endorser and orderer connections initialized
2020-02-11 08:10:14.080 CET [cli.common] readBlock -> INFO 002 Expect block, but got status: &{NOT_FOUND}
...
2020-02-11 08:10:15.105 CET [cli.common] readBlock -> INFO 00c Received block: 0
But when I try to have the first peer join the channel, I get an error. I have been spending days on this and cannot find a solution. Your help would be much appreciated!!
in the fabric-tools pod:
bash-5.0# peer channel join -b $CHANNEL_NAME.block
Error: error getting endorser client for channel: endorser client failed to connect to peer0-org1-example-com:7051: failed to create new connection: context deadline exceeded
what I see in the peer0-org1-example-com pod logs:
[31m2020-02-11 08:11:29.945 CET [core.comm] ServerHandshake -> ERRO 1b9[0m TLS handshake failed with error remote error: tls: bad certificate server=PeerServer remoteaddress=172.17.0.6:43270
[36m2020-02-11 08:11:29.945 CET [grpc] handleRawConn -> DEBU 1ba[0m grpc: Server.Serve failed to complete security handshake from "172.17.0.6:43270": remote error: tls: bad certificate
Thank you!!
UPDATE:
If I run peer channel join directly on the peer0-org1-example-com pod, I can see that there is a certificate issue:
addrConn.createTransport failed to connect to {peer0-org1-example-com:7051 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for peer0.org1.example.com, peer0, localhost, peer0.org1.example.com, peer0, localhost, peer0.org1.example.com, peer0, localhost, not peer0-org1-example-com". Reconnecting.
It seems that it would accept the connection for peer0.org1.example.com but not for peer0-org1-example-com. But Kubernetes does not allow me to put dots in the names of Services and Deployments, which is why I used dashes. Do you know how to solve this?
I tried to make the cryptogen tool generate certificates for peer0-org1-example-com, but it messed things up. The better option would be, I think, to give the Kubernetes objects names with dots, but I can't seem to make that work.
The names in peer deployments files:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: peer0-org1-example-com
spec:
  selector:
    matchLabels:
      name: peer0-org1-example-com
  replicas: 1
  template:
    metadata:
      labels:
        name: peer0-org1-example-com
The names in peer services files:
apiVersion: v1
kind: Service
metadata:
  name: peer0-org1-example-com
  labels:
    run: peer0-org1-example-com
spec:
  type: ClusterIP
  selector:
    name: peer0-org1-example-com
  ports:
    - protocol: TCP
      port: 7051
      name: grpc
We had a similar dot/dash certificate issue with OpenShift and solved it by setting a CommonName with dashes for each Host in our crypto-config file. Maybe this will work for you too.
Something like this:
PeerOrgs:
  - Name: Org1
    Domain: org1-example-com
    EnableNodeOUs: true
    Specs:
      - Hostname: peer0
        CommonName: "peer0-org1-example-com"
      - Hostname: peer1
        CommonName: "peer1-org1-example-com"
    CA:
      Hostname: ca
      CommonName: "ca-org1-example-com"

PeerOrgs:
  - Name: Org2
    Domain: org2-example-com
    EnableNodeOUs: true
    Specs:
      - Hostname: peer0
        CommonName: "peer0-org2-example-com"
      - Hostname: peer1
        CommonName: "peer1-org2-example-com"
    CA:
      Hostname: ca
      CommonName: "ca-org2-example-com"

OrdererOrgs:
  - Name: Orderer
    Domain: example.com
    EnableNodeOUs: true
    Specs:
      - Hostname: orderer
        CommonName: "orderer-example-com"
UPDATE:
We also changed all dot addresses in the configtx.yaml like this:
Orderer: &OrdererDefaults
  ...
  EtcdRaft:
    Consenters:
      - Host: orderer-example-com
  ...
  Addresses:
    - orderer-example-com:7050
UPDATE 2:
Probably you also have to change the csr part in the fabric-ca-server-config.yaml of each org:
csr:
  cn: ca-example-com
  names:
    - C: US
      ST: "New York"
      L: "New York"
      O: example-com
      OU:
  hosts:
    - localhost
    - example-com
  ca:
    expiry: 131400h
    pathlength: 1

csr:
  cn: ca-org1-example-com
  names:
    - C: US
      ST: "North Carolina"
      L: "Durham"
      O: org1-example-com
      OU:
  hosts:
    - localhost
    - org1-example-com
  ca:
    expiry: 131400h
    pathlength: 1

csr:
  cn: ca-org2-example-com
  names:
    - C: UK
      ST: "Hampshire"
      L: "Hursley"
      O: org2-example-com
      OU:
  hosts:
    - localhost
    - org2-example-com
  ca:
    expiry: 131400h
    pathlength: 1

Cannot connect to Hive metastore from Spark application

I am trying to connect to the Hive metastore from a Spark application, but each time it gets stuck trying to connect and crashes with a timeout:
INFO metastore:376 - Trying to connect to metastore with URI thrift://hive-metastore:9083
WARN metastore:444 - set_ugi() not successful, Likely cause: new client talking to old server. Continuing without it.
org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
The application crashes on the line where I create an external Hive table.
I run the Hive metastore as well as the Spark application (using the Spark K8s operator) in a Kubernetes cluster. I checked the accessibility of the Hive metastore service from outside the cluster using telnet (node IP:service node port) and curled the service from inside the cluster; the service seems to be accessible. What could be the reason for this error?
This is the configuration of the Hive metastore URI in the Spark application:
val sparkSession = SparkSession
  .builder()
  .config(sparkConf)
  .config("hive.metastore.uris", "thrift://hive-metastore:9083")
  .config("hive.exec.dynamic.partition", "true")
  .config("hive.exec.dynamic.partition.mode", "nonstrict")
  .enableHiveSupport()
  .getOrCreate()
The Hive metastore YAML configuration looks like this:
apiVersion: v1
kind: Service
metadata:
  name: hive-metastore-np
spec:
  selector:
    app: hive-metastore
  ports:
    - protocol: TCP
      targetPort: 9083
      port: 9083
      nodePort: 32083
  type: NodePort
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hive-metastore
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hive-metastore
  template:
    metadata:
      labels:
        app: hive-metastore
    spec:
      containers:
        - name: hive-metastore
          image: mozdata/docker-hive-metastore:1.2.1
          imagePullPolicy: Always
          env:
            - name: DB_URI
              value: postgresql
            - name: DB_USER
              value: hive
            - name: DB_PASSWORD
              value: hive-password
            - name: CORE_CONF_fs_defaultFS
              value: hdfs://hdfs-namenode:8020
          ports:
            - containerPort: 9083
UPDATE: When I try to curl hive-metastore:9083, the service is accessible, but it returns an empty response, which means there might be a problem with the hive-metastore K8s definition:
> GET / HTTP/1.1
> User-Agent: curl/7.35.0
> Host: hive-metastore:9083
> Accept: */*
This error occurs when there's a discrepancy between the versions of the Hive jars in your cluster and the Hive jars that Spark uses (which are usually consistent with the Spark version you're using). You need to determine the version of the Hive jars used in the cluster and add those jars into your Spark image. You can then make your SparkSession use the compatible Hive jars by adding the following configurations to your SparkSession:
.conf("spark.sql.hive.metastore.version", "<your hive metastore version>")
.conf("spark.sql.hive.metastore.version", "<your hive version>")
.conf("spark.sql.hive.metastore.jars", "<uri of all the correct hive jars>")