Failed to send the transaction successfully to the order status: SERVICE UNAVAILABLE - apache-kafka

I am using the kafka orderer service for Hyperledger Fabric 1.4. While updating chaincode or making any putState call I am getting an error message stating Failed to send the transaction successfully to the order status: SERVICE UNAVAILABLE. While checking the zookeeper and kafka nodes it seems like the kafka nodes are not able to talk to each other.
kafka & zookeeper logs

Could you provide more info about the topology of the zookeeper-kafka cluster?
Did you use Docker to deploy the zk cluster? If so, you can refer to this file:
https://github.com/whchengaa/hyperledger-fabric-technical-tutorial/blob/cross-machine-hosts-file/balance-transfer/artifacts/docker-compose-kafka.yaml
Remember to specify the IP addresses of the other zookeeper nodes in the hosts file which is mounted to /etc/hosts of each zookeeper node.
Make sure the port numbers of the zookeeper nodes listed in the ZOO_SERVERS environment variable are correct.
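As a minimal sketch, one node's entry in a docker-compose file along the lines of the one linked above could look like this. The hostnames zookeeper0/1/2, the IPs and the default ports 2181/2888/3888 are assumptions; adjust them to your actual topology.
zookeeper0:
  image: hyperledger/fabric-zookeeper
  environment:
    - ZOO_MY_ID=1
    # Every node must list all ensemble members, and the ports must match on every node.
    - ZOO_SERVERS=server.1=zookeeper0:2888:3888 server.2=zookeeper1:2888:3888 server.3=zookeeper2:2888:3888
  ports:
    - "2181:2181"
    - "2888:2888"
    - "3888:3888"
  extra_hosts:
    # Map the other zookeeper hostnames to the machines that actually run them (illustrative IPs).
    - "zookeeper1:192.0.2.11"
    - "zookeeper2:192.0.2.12"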

Related

Artemis k8s cluster client connection handling

I am using the Artemis Cloud operator for deploying ActiveMQ Artemis in a k8s cluster. The number of replicas configured is 3. Queues are expected to be created by client applications. The operator creates a headless service and a service for each pod in the cluster setup.
Whenever a client connects to a pod, it creates a queue in that broker pod. So if clients connect to the 3 brokers at random times, three queues get created, one in each pod respectively. So when a producer sends a message to the queue, it is sent only to the connected pod; it won't be present in the other broker pods (no replication of messages).
My question is: what service name should client applications use in order to connect to the Artemis pods and also maintain session affinity? In other words, what should be done to make a client connect to the same broker whenever it tries a connection (and avoid duplicate queue creation)?
What I currently use is a Kubernetes ClusterIP service I created that splits traffic across the pods, and queues are created via a STOMP producer.
I'm not sure which Helm chart it is using in the background, but inside the Kubernetes cluster your app should connect using the service name rather than the ClusterIP.
The service name will likely start with the Helm chart name. I was checking and referring to this chart for reference: https://github.com/deviceinsight/activemq-artemis-helm (yours could be different).
It looks like it creates the service with ClusterIP: None (a headless service), so connecting to the existing service is not working; I suspect it is currently returning all of the pods' IPs.
Option 2, if the above does not work, try this:
Alternatively, you can create a new service of type ClusterIP with a different name; everything else, like the ports, stays the same.
In the service you will notice the amqp, mqtt and other ports, so your app will connect to the new service with the port configured as required. For example: active-mq-service:61613
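A minimal sketch of what that extra ClusterIP service could look like, keeping the hypothetical name active-mq-service used above and assuming the broker pods carry the label app: activemq-artemis (match the selector to whatever labels your chart actually sets):
apiVersion: v1
kind: Service
metadata:
  name: active-mq-service         # hypothetical name, matches the example above
spec:
  type: ClusterIP
  selector:
    app: activemq-artemis         # assumed pod label; adjust to your chart
  ports:
    - name: stomp
      port: 61613
      targetPort: 61613
    - name: amqp
      port: 5672
      targetPort: 5672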
If everything is working fine for you and you are just looking for session affinity, you can add this config to your service and it will start managing session affinity (a sketch follows the quoted doc excerpt below):
sessionAffinity: ClientIP
If you want to make sure that connections from a particular client are passed to the same Pod each time, you can select the session affinity based on the client's IP addresses by setting service.spec.sessionAffinity to "ClientIP" (the default is "None"). You can also set the maximum session sticky time by setting service.spec.sessionAffinityConfig.clientIP.timeoutSeconds appropriately (the default value is 10800, which works out to 3 hours).
Ref doc : https://kubernetes.io/docs/concepts/services-networking/service/
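As a sketch, those two fields sit under the Service's spec like this (the timeout value shown is the default mentioned above):
spec:
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800   # default, i.e. 3 hours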

Kafka brokers inside Kubernetes can be addressed by clients outside the k8s cluster

Basically I'm building a system on Google Cloud. Most services run on a k8s cluster but some code does not; Lambda, Composer operators and Dataflow jobs are examples. (Composer is also k8s, but a different cluster.)
I picked Kafka as the event channel to interconnect the services and I have to decide the proper place for the Kafka brokers: k8s pods or a VM. I prefer k8s pods, but I worry about the communication between brokers and services, especially with services outside of the k8s cluster.
A consumer addresses the brokers with a "bootstrap server" list made of the brokers' static, unique addresses. I suppose if the brokers are installed inside k8s, their addresses will not be static or unique from outside. Can the brokers be reached from a service outside of k8s? If so, which string must be provided to the bootstrap server config?
A conventional virtual machine would be the obvious solution, but I want to put more and more things into k8s.
There are different solutions to your problem.
You can deploy Kafka on the K8s cluster and use a service mesh to interconnect both clusters, so brokers and services can connect with each other without any worry.
If you are on GCP you can use the MCS service or Traffic Director, or another service mesh.
You can also set up Kafka on a VM and expose it over an IP, which the services will then use to connect.
Can the brokers be reached from a service outside of k8s?
Yes, you can expose your Kafka broker using a service of type LoadBalancer or NodePort. Reference doc
I suppose if the brokers are installed inside k8s, their addresses will not be static or unique from outside.
You don't need to bind Kafka to any specific hostname for the interface; Kafka will listen on all interfaces, and you can expose it using a K8s Service if it is running on K8s.
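A minimal sketch of the LoadBalancer approach, assuming the broker pods carry the label app: kafka and expose an external listener on port 9094 (both are assumptions; use whatever your deployment actually defines):
apiVersion: v1
kind: Service
metadata:
  name: kafka-external            # hypothetical name
spec:
  type: LoadBalancer              # or NodePort if a cloud LB isn't wanted
  selector:
    app: kafka                    # assumed broker pod label
  ports:
    - name: external
      port: 9094
      targetPort: 9094
The broker's advertised.listeners config then has to include the load balancer's address on that port, and clients outside the cluster would use something like <lb-address>:9094 as their bootstrap server.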

Kubernetes: kafka pod reachability issue from another pod

I know the below information is not enough to trace the issue, but I still want some solution.
We have an Amazon EKS cluster.
Currently, we are facing a reachability issue with the Kafka pod.
Environment:
Total 10 nodes, in availability zones ap-south-1a and ap-south-1b
A three-replica Kafka cluster (Helm chart installation)
A three-replica zookeeper cluster (Helm chart installation)
Kafka uses an external advertised listener on port 19092
Kafka has a service with an internal network load balancer
I have deployed a test-pod to check the reachability of the Kafka pod
We are using Cloud Map-based DNS for the advertised listener
Working:
When I run a telnet command from EC2, like telnet 10.0.1.45 19092, it works as expected. IP 10.0.1.45 is the load balancer IP.
When I run a telnet command from EC2, like telnet 10.0.1.69 31899, it works as expected. IP 10.0.1.69 is an actual node's IP and 31899 is the NodePort.
Problem:
When I run the same command from the test-pod, like telnet 10.0.1.45 19092, it works sometimes, and sometimes it gives an error like telnet: Unable to connect to remote host: Connection timed out
The issue seems to be related to kube-proxy. We need help to resolve this issue.
Can anyone help to guide me?
Can I restart kube-proxy? Does it affect other pods/deployments?
I believe this problem is caused by AWS's NLB TCP-only nature (as mentioned in the comments).
In a nutshell, your pod-to-pod communication fails when hairpinning is needed.
To confirm this is the root cause, you can verify that when the telnet works, the Kafka pod and the client pod are not on the same EC2 node, and that when they are on the same EC2 node, the telnet fails.
There are (at least) two approaches to tackle this issue:
Use K8s internal networking - Refer to k8s Service's URL
Every K8s service has its own DNS FQDN for internal usage (meaning using the k8s network only, without reaching the LoadBalancer and coming back into k8s again). You can just telnet this instead of going through the NodePort via the LB.
I.e., let's assume your kafka service is named kafka. Then you can just telnet kafka.<namespace>.svc.cluster.local (on the port exposed by the kafka service).
Use K8s anti-affinity to make sure the client and kafka are never scheduled on the same node (a sketch follows below).
Oh and as indicated in this answer you might need to make that service headless.
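A minimal sketch of the anti-affinity approach, assuming the Kafka pods carry the label app: kafka (adjust to the labels your chart actually sets); this snippet goes into the client pod's spec (or the pod template of its Deployment):
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
            - key: app                      # assumed label on the Kafka pods
              operator: In
              values:
                - kafka
        topologyKey: kubernetes.io/hostname  # i.e. never on the same node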

Flink HA JobManager cluster cannot elect a leader

I'm trying to deploy Apache Flink 1.6 on Kubernetes, following the tutorial on the JobManager high availability
page. I already have a working Zookeeper 3.10 cluster; from its logs I can see that it's healthy and not configured for Kerberos or SASL. All ACL rules let every client read and write znodes. When I start the cluster everything works as expected: every JobManager and TaskManager pod successfully gets into the Running state, and I can see the connected TaskManager instances from the master JobManager's web UI. But when I delete the master JobManager's pod, the other JobManager pods cannot elect a leader, and I get the following error message on every JobManager UI in the cluster.
{
"errors": [
"Service temporarily unavailable due to an ongoing leader election. Please refresh."
]
}
Even if I refresh the page nothing changes; it is stuck at this error message.
My suspicion is that the problem is related to the high-availability.storageDir option. I already have a working (tested with CloudExplorer) MinIO S3 deployment in my k8s cluster, but Flink cannot write anything to the S3 server. Here you can find every config in a GitHub gist.
According to the logs it looks as if the TaskManager cannot connect to the new leader. I assume that this is the same for the web ui. The logs say that it tries to connect to flink-job-manager-0.flink-job-svc.flink.svc.cluster.local/10.244.3.166:44013. I cannot say from the logs whether flink-job-manager-1 binds to this IP. But my suspicion is that the headless service might return multiple IPs and Flink picks the wrong/old one. Could you log into the flink-job-manager-1 pod and check what its IP address is?
I think you should be able to resolve this problem by defining a dedicated service for each JobManager, or by using the pod hostname instead.
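A minimal sketch of such a per-JobManager service, assuming the JobManagers run as a StatefulSet (so each pod carries the statefulset.kubernetes.io/pod-name label) and using the pod name flink-job-manager-0 from the logs above; the service name is hypothetical and the ports are the usual Flink defaults, so adjust both to your config:
apiVersion: v1
kind: Service
metadata:
  name: flink-job-manager-0-svc   # hypothetical name, one dedicated service per JobManager pod
spec:
  selector:
    # This label is set automatically on StatefulSet pods, so the service
    # resolves to exactly one JobManager instead of the whole group.
    statefulset.kubernetes.io/pod-name: flink-job-manager-0
  ports:
    - name: rpc
      port: 6123
    - name: blob
      port: 6124
    - name: ui
      port: 8081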

In Kubernetes 1.7 with multi-master nodes and an etcd cluster, how does the master node connect to the etcd cluster by default?

I want to know: when the master nodes connect to the etcd cluster, which etcd node will be selected? Does the master node always connect to the same etcd node until it becomes unavailable? Does each node in the master cluster connect to the same node in the etcd cluster?
The scheduler and controller-manager talk to the API server present on the same node. In an HA setup you'll have only one of them active at a time (based on a lease), and whichever is currently active will be talking to the local API server. If for some reason it fails to connect to the local API server, it doesn't renew the lease and another leader will be elected.
As for reaching the etcd cluster itself, when you configure the Kubernetes API server you pass it the etcd-servers flag, which is a list of etcd nodes like:
--etcd-servers=https://10.240.0.10:2379,https://10.240.0.11:2379,https://10.240.0.12:2379
This list is then passed to the Go etcd/client library which, looking at its README, states:
etcd/client does round-robin rotation on other available endpoints if the preferred endpoint isn't functioning properly. For example, if the member that etcd/client connects to is hard killed, etcd/client will fail on the first attempt with the killed member, and succeed on the second attempt with another member. If it fails to talk to all available endpoints, it will return all errors happened.
This means that it will try each of the available endpoints until it succeeds in connecting to one.
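For illustration, in a typical setup that flag is set on the kube-apiserver command line, e.g. in a static pod manifest along these lines (the file path, image version and IPs are all illustrative, not taken from your cluster):
# e.g. /etc/kubernetes/manifests/kube-apiserver.yaml
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
    - name: kube-apiserver
      image: gcr.io/google_containers/kube-apiserver-amd64:v1.7.16   # illustrative version
      command:
        - kube-apiserver
        # Every API server gets the full list; the etcd client library rotates
        # through these endpoints if one becomes unavailable.
        - --etcd-servers=https://10.240.0.10:2379,https://10.240.0.11:2379,https://10.240.0.12:2379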