kafka-connect on ECS times out when connecting to MSK - amazon-ecs

I deployed the Kafka Connect Docker image (confluentinc/cp-kafka-connect-base:6.0.1) into ECS/Fargate, assigned a security group to my ECS service that permits both inbound ZooKeeper and Kafka bootstrap server traffic (both plaintext and TLS), as well as an IAM role that permits my ECS tasks to run Kafka actions against the MSK cluster. Still, the Connect cluster times out when it tries to get the list of brokers from the MSK cluster.
Both the Kafka Connect ECS service and the MSK cluster are in the same private subnets in AWS.
Security group (CloudFormation):
```
"Type": "AWS::EC2::SecurityGroup",
"Properties": {
    "GroupDescription": "kafka-connect-sg",
    "SecurityGroupEgress": [
        {
            "CidrIp": "0.0.0.0/0",
            "Description": "Allow all outbound traffic by default",
            "IpProtocol": "-1"
        }
    ],
    "SecurityGroupIngress": [
        {
            "CidrIp": "0.0.0.0/0",
            "Description": "Kafka Bootstrap Server Plaintext",
            "FromPort": 9092,
            "IpProtocol": "tcp",
            "ToPort": 9092
        },
        {
            "CidrIp": "0.0.0.0/0",
            "Description": "Kafka Bootstrap Server TLS",
            "FromPort": 9094,
            "IpProtocol": "tcp",
            "ToPort": 9094
        },
        {
            "CidrIp": "0.0.0.0/0",
            "Description": "ZooKeeper TLS",
            "FromPort": 2182,
            "IpProtocol": "tcp",
            "ToPort": 2182
        },
        {
            "CidrIp": "0.0.0.0/0",
            "Description": "ZooKeeper Plaintext",
            "FromPort": 2181,
            "IpProtocol": "tcp",
            "ToPort": 2181
        }
    ],
    "VpcId": "vpc-id"
}
```
IAM policy (attached to the ECS task role):
```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": "kafka:*",
            "Resource": "*",
            "Effect": "Allow"
        }
    ]
}
```
Is there anything I might be missing?

The security group on the MSK cluster was not permitting traffic from my Kafka Connect cluster.
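For anyone hitting the same thing, here is a minimal sketch of the missing ingress rule, assuming both security groups live in the same CloudFormation template (the logical IDs MSKSecurityGroup and KafkaConnectSecurityGroup are placeholders; for an existing cluster you can put the literal sg-id in GroupId instead):
```
"MSKIngressFromKafkaConnect": {
    "Type": "AWS::EC2::SecurityGroupIngress",
    "Properties": {
        "Description": "Allow Kafka bootstrap/broker traffic from the Kafka Connect tasks",
        "GroupId": { "Ref": "MSKSecurityGroup" },
        "SourceSecurityGroupId": { "Ref": "KafkaConnectSecurityGroup" },
        "IpProtocol": "tcp",
        "FromPort": 9092,
        "ToPort": 9094
    }
}
```
Referencing the Connect security group as the source keeps the MSK listeners closed to everything else in the subnets.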

Kafka connect schema registry timeout

EDIT: Turns out I turned on a firewall which limited connectivity from containers to host. Adding firewall rules solved the issue.
I am running a Kafka JDBC sink connector with the following properties:
```
{
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "table.name.format": "events",
    "connection.password": "******",
    "tasks.max": "1",
    "topics": "events",
    "value.converter.schema.registry.url": "http://IP:PORT",
    "db.buffer.size": "8000000",
    "connection.user": "postgres",
    "name": "cp-sink-events",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "connection.url": "jdbc:postgresql://IP:PORT/postgres?stringtype=unspecified",
    "insert.mode": "upsert",
    "pk.mode": "record_value",
    "pk.fields": "source,timestamp,event,event_type,value"
}
```
It was working fine before, but since this week I have been getting the following errors while trying to sink my data to Postgres:
Caused by: org.apache.kafka.common.errors.SerializationException: Error retrieving Avro value schema for id 4
Caused by: java.net.SocketTimeoutException: connect timed out
It appears my Kafka Connect can no longer access my Schema Registry server. I couldn't manage to figure out why or how; I have tried multiple things but have yet to find the solution.
I did install NGINX on this VM over the last week, and killed apache2, which was running on port 80. But I haven't found any indication that this would cause problems.
When I curl the Schema Registry address from the VM to retrieve the schemas for the mentioned IDs, it works fine (http://IP:PORT/schemas/ids/4). Any clue how to proceed?
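One way to narrow down whether this is container-to-host networking is to run the same check from inside the Connect container (the container name comes from the Docker network listing further down; this assumes curl is available in the image):
```
# exec into the Connect container and hit the registry exactly as the connector would
docker exec -it kafka-kafka-connect0-1 curl -v http://IP:PORT/schemas/ids/4
```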
EDIT:
If I configure the IP to a random value I get:
Caused by: java.net.NoRouteToHostException: No route to host (Host unreachable).
So my host's Schema Registry seems reachable when the right IP is configured; I don't know where the timeout comes from.
I tried to raise the timeout limit, but it didn't help:
SCHEMA_REGISTRY_KAFKASTORE_TIMEOUT_MS: 10000
My Compose config for the Connect container is as follows:
CONNECT_BOOTSTRAP_SERVERS: kafka0:29092
CONNECT_GROUP_ID: compose-connect-group
CONNECT_CONFIG_STORAGE_TOPIC: _connect_configs
CONNECT_CONFIG_STORAGE_REPLICATION_FACTOR: 1
CONNECT_OFFSET_STORAGE_TOPIC: _connect_offset
CONNECT_OFFSET_STORAGE_REPLICATION_FACTOR: 1
CONNECT_STATUS_STORAGE_TOPIC: _connect_status
CONNECT_STATUS_STORAGE_REPLICATION_FACTOR: 1
CONNECT_KEY_CONVERTER: org.apache.kafka.connect.storage.StringConverter
CONNECT_KEY_CONVERTER_SCHEMA_REGISTRY_URL: http://schemaregistry0:8085
CONNECT_VALUE_CONVERTER: org.apache.kafka.connect.json.JsonConverter
CONNECT_VALUE_CONVERTER_SCHEMA_REGISTRY_URL: http://schemaregistry0:8085
CONNECT_INTERNAL_KEY_CONVERTER: org.apache.kafka.connect.json.JsonConverter
CONNECT_INTERNAL_VALUE_CONVERTER: org.apache.kafka.connect.json.JsonConverter
CONNECT_REST_ADVERTISED_HOST_NAME: kafka-connect0
CONNECT_PLUGIN_PATH: /usr/share/java,/usr/share/confluent-hub-components,/usr/share/local-connectors
Docker Kafka network:
```
[
    {
        "Name": "kafka_default",
        "Id": "89cd2fe68f2ea3923a76ada4dcb89e505c18792e1abe50fa7ad047e10ee6b673",
        "Created": "2023-01-16T18:42:35.531539648+01:00",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "172.25.0.0/16",
                    "Gateway": "172.25.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "41ac45882364494a357c26e14f8e3b2aede4ace7eaab3dea748c9a5f94430529": {
                "Name": "kafka-schemaregistry1-1",
                "EndpointID": "56612fbe41396799a8249620dc07b0a5c84c65d311959214b955f538225757ac",
                "MacAddress": "02:42:ac:19:00:06",
                "IPv4Address": "172.25.0.6/16",
                "IPv6Address": ""
            },
            "42c0847ffb7545d35b2d4116fb5c590a869aec87037601d33267a35e7fe0cb2f": {
                "Name": "kafka-kafka-connect0-1",
                "EndpointID": "68eef87346aed70bc17ab9960daca4b24073961dcd93bc85c8f7bcdb714feac3",
                "MacAddress": "02:42:ac:19:00:08",
                "IPv4Address": "172.25.0.8/16",
                "IPv6Address": ""
            },
            "46160f183ba8727fde7b4a7d1770b8d747ed596b8e6b7ca7fea28b39c81dcf7f": {
                "Name": "kafka-zookeeper0-1",
                "EndpointID": "512970666d1c07a632e0f450bef7ceb6aa3281ca648545ef22de4041fe32a845",
                "MacAddress": "02:42:ac:19:00:03",
                "IPv4Address": "172.25.0.3/16",
                "IPv6Address": ""
            },
            "6804e9d36647971afe95f5882e7651e39ff8f76a9537c9c6183337fe6379ced9": {
                "Name": "kafka-ui",
                "EndpointID": "9e9a2a7a04644803703f9c8166d80253258ffba621a5990f3c1efca1112a33a6",
                "MacAddress": "02:42:ac:19:00:09",
                "IPv4Address": "172.25.0.9/16",
                "IPv6Address": ""
            },
            "8b79e3af68df7d405567c896858a863fecf7f2b32d23138fa065327114b7ce83": {
                "Name": "kafka-zookeeper1-1",
                "EndpointID": "d5055748e626f1e00066642a7ef60b6606c5a11a4210d0df156ce532fab4e753",
                "MacAddress": "02:42:ac:19:00:02",
                "IPv4Address": "172.25.0.2/16",
                "IPv6Address": ""
            },
            "92a09c7d3dfb684051660e84793b5328216bf5da4e0ce075d5918c55b9d4034b": {
                "Name": "kafka-kafka0-1",
                "EndpointID": "cbeba237d1f1c752fd9e4875c8694bdd4d85789bcde4d6d3590f4ef95bb82c6f",
                "MacAddress": "02:42:ac:19:00:05",
                "IPv4Address": "172.25.0.5/16",
                "IPv6Address": ""
            },
            "e8c5aeef7a1a4a2be6ede2e5436a211d87cbe57ca1d8c506d1905d74171c4f6b": {
                "Name": "kafka-kafka1-1",
                "EndpointID": "e310477b655cfc60c846035896a62d32c0d07533bceea2c7ab3d17385fe9507b",
                "MacAddress": "02:42:ac:19:00:04",
                "IPv4Address": "172.25.0.4/16",
                "IPv6Address": ""
            },
            "ecebbd73e861ed4e2ef8e476fa16d95b0983aaa0876a51b0d292b503ef5e9e54": {
                "Name": "kafka-schemaregistry0-1",
                "EndpointID": "844136d5def798c3837db4256b51c7995011f37576f81d4929087d53a2da7273",
                "MacAddress": "02:42:ac:19:00:07",
                "IPv4Address": "172.25.0.7/16",
                "IPv6Address": ""
            }
        },
        "Options": {},
        "Labels": {
            "com.docker.compose.network": "default",
            "com.docker.compose.project": "kafka",
            "com.docker.compose.version": "2.12.2"
        }
    }
]
```
Curling the Schema Registry container by name works from inside the Connect container:
curl http://kafka-schemaregistry0-1:8085/schemas/ids/5
EDIT 5:
After changing the URL to the docker container name, I now have access to schema registry.
"value.converter.schema.registry.url": "http://kafka-schemaregistry0-1:8085"
However, now my Postgres connection fails.
Caused by: org.postgresql.util.PSQLException: The connection attempt failed
I think the conclusion here is that my Connect container was previously able to reach other containers via the IP of the host machine, and that it no longer can. I am curious to know how this can be fixed.
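As the EDIT at the top notes, the culprit turned out to be a host firewall limiting connectivity from containers to the host. A rough sketch of the kind of rules that restore container-to-host access, assuming ufw on the host and the bridge subnet from the network listing above (the ports and the subnet need adjusting to your setup):
```
# let the Docker bridge network (172.25.0.0/16 here) reach services published on the host
sudo ufw allow from 172.25.0.0/16 to any port 8085 proto tcp   # schema registry
sudo ufw allow from 172.25.0.0/16 to any port 5432 proto tcp   # postgres
sudo ufw reload
```
The alternative is to keep addressing other containers by their Compose names (as done above for the Schema Registry), so the traffic never leaves the Docker network; the Postgres connection only needs the host route if Postgres actually runs on the host.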

Security group inbound rules wiped out and 0.0.0.0/0 added automatically by eksclusterrole

I have deployed an AWS EKS two-node cluster (version 1.18). It contains some ELBs, microservices, and a UI hosted on Kubernetes. The ELBs have their own security groups. I manually modify the primary replica's security group inbound rules to allow access to the DB for specific IPs (e.g. 117.123.111.99/32) on port 27017. However, I have noticed that after a couple of days an inbound rule for port 27017 from 0.0.0.0/0, plus Custom ICMP - IPv4 from 0.0.0.0/0, gets added automatically to all three Mongo replica LoadBalancer security groups.
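For context, the manual rule I add is roughly equivalent to the following AWS CLI call (the security group ID is a placeholder):
```
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 27017 \
  --cidr 117.123.111.99/32
```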
When I look at the logs in CloudTrail, they show that eksclusterrole made the change:
```
            "type": "Role",
            "principalId": "blablabla",
            "arn": "arn:aws:iam::MyAccountId:role/eksclusterrole",
            "accountId": "MyAccountId",
            "userName": "eksclusterrole"
        },
        "webIdFederationData": {},
        "attributes": {
            "mfaAuthenticated": "false",
            "creationDate": "date"
        }
    },
    "invokedBy": "eks.amazonaws.com"
},
"eventTime": "date",
"eventSource": "ec2.amazonaws.com",
"eventName": "AuthorizeSecurityGroupIngress",
"awsRegion": "us-east-2",
"sourceIPAddress": "eks.amazonaws.com",
"userAgent": "eks.amazonaws.com",
"requestParameters": {
    "groupId": "sg-mysecurityid",
    "ipPermissions": {
        "items": [
            {
                "ipProtocol": "icmp",
                "fromPort": 3,
                "toPort": 4,
                "groups": {},
                "ipRanges": {
                    "items": [
                        {
                            "cidrIp": "0.0.0.0/0"
                        }
                    ]
                },
                "ipv6Ranges": {},
                "prefixListIds": {}
            },
            {
                "ipProtocol": "tcp",
                "fromPort": 27017,
                "toPort": 27017,
                "groups": {},
                "ipRanges": {
                    "items": [
                        {
                            "cidrIp": "0.0.0.0/0"
                        }
                    ]
                },
                "ipv6Ranges": {},
                "prefixListIds": {}
            }
        ]
    }
}
```
From the docs:
Amazon EKS adds one inbound rule to the node's security group for client traffic and one rule for each load balancer subnet in the VPC for health checks for each Network Load Balancer that you create
This can be disabled (docs):
service.beta.kubernetes.io/aws-load-balancer-manage-backend-security-group-rules specifies whether the controller should automatically add the ingress rules to the instance/ENI security group.
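A hedged sketch of where that annotation goes, assuming the AWS Load Balancer Controller manages these load balancers (the Service name and selector are placeholders):
```
apiVersion: v1
kind: Service
metadata:
  name: mongo-replica-0          # placeholder name
  annotations:
    # tell the controller not to manage the backend security group rules itself
    service.beta.kubernetes.io/aws-load-balancer-manage-backend-security-group-rules: "false"
spec:
  type: LoadBalancer
  selector:
    app: mongo-replica-0         # placeholder selector
  ports:
    - port: 27017
      targetPort: 27017
```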

Access pod localhost from Service

New to Kubernetes.
I have a private Docker Hub image deployed on a Kubernetes instance. When I exec into the pod I can run the following, so I know my Docker image is running:
```
root@private-reg:/# curl 127.0.0.1:8085
Hello world!root@private-reg:/#
```
From the dashboard I can see my service has an external endpoint which ends with port 8085. When I try to load this I get a 404. My Service definition is below:
```
{
    "kind": "Service",
    "apiVersion": "v1",
    "metadata": {
        "name": "test",
        "namespace": "default",
        "selfLink": "/api/v1/namespaces/default/services/test",
        "uid": "a1a2ae23-339b-11e9-a3db-ae0f8069b739",
        "resourceVersion": "3297377",
        "creationTimestamp": "2019-02-18T16:38:33Z",
        "labels": {
            "k8s-app": "test"
        }
    },
    "spec": {
        "ports": [
            {
                "name": "tcp-8085-8085-7vzsb",
                "protocol": "TCP",
                "port": 8085,
                "targetPort": 8085,
                "nodePort": 31859
            }
        ],
        "selector": {
            "k8s-app": "test"
        },
        "clusterIP": "******",
        "type": "LoadBalancer",
        "sessionAffinity": "None",
        "externalTrafficPolicy": "Cluster"
    },
    "status": {
        "loadBalancer": {
            "ingress": [
                {
                    "ip": "******"
                }
            ]
        }
    }
}
```
Can anyone point me in the right direction?
What is the output of the command below?
curl clusterIP:8085
If you get the "Hello world" message, it means the Service is routing traffic correctly to the backend pod.
curl HostIP:NODEPORT should also work.
Most likely the Service is not bound to the backend pod. Did you define the label below on the pod?
labels: {
"k8s-app": "test"
}
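For the selector to match, the pod (or the pod template in a Deployment) has to carry that same label. A minimal sketch, with the image as a placeholder:
```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: test
  template:
    metadata:
      labels:
        k8s-app: test              # must match the Service's spec.selector
    spec:
      containers:
        - name: test
          image: your-dockerhub-user/your-image:latest   # placeholder
          ports:
            - containerPort: 8085
```
You can confirm the binding with kubectl get endpoints test: an empty ENDPOINTS column means the selector matches no pods.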
You didn't mention what type of load balancer or cloud provider you are using, but if your load balancer provisioned correctly (which you should be able to see in your kube-controller-manager logs), then you should be able to access your service with what you see here:
```
"status": {
    "loadBalancer": {
        "ingress": [
            {
                "ip": "******"
            }
        ]
    }
}
```
Then you could check by running:
$ curl <ip>:<whatever external port your lb is fronting>
It's likely that the load balancer didn't provision if, as described in the other answers, these work:
$ curl <clusterIP for svc>:8085
and
$ curl <NodeIP>:31859 # NodePort
Have a look at the Service types in Kubernetes; there are a few:
https://kubernetes.io/docs/concepts/services-networking/service/
ClusterIP: exposes the Service only inside the cluster.
NodePort: exposes the Service on a given port on each node.
LoadBalancer: makes the Service externally accessible through a load balancer.
I am assuming you are running on GKE.
What kind of Service is it that you launched?

Kubernetes 1.5.2 ping can't find ClusterIP of a Service from inside Cluster

I have set up a cluster with some deployments and services.
I can log into any of my pods and ping other pods by their pod-network IPs (172.x.x.x) successfully.
But when I try to ping a Service's ClusterIP address from any of my pods, it never responds, so I can't access my services.
Below is my Kibana Service; 10.254.77.135 is the IP I am trying to reach from my other services. I also can't use the NodePort; it never responds.
```
{
    "kind": "Service",
    "apiVersion": "v1",
    "metadata": {
        "name": "kibana",
        "namespace": "default",
        "selfLink": "/api/v1/namespaces/default/services/kibana",
        "uid": "21498caf-569c-11e7-a801-0050568fc023",
        "resourceVersion": "3282683",
        "creationTimestamp": "2017-06-21T16:10:23Z",
        "labels": {
            "component": "elk",
            "role": "kibana"
        }
    },
    "spec": {
        "ports": [
            {
                "name": "http",
                "protocol": "TCP",
                "port": 5601,
                "targetPort": 5601,
                "nodePort": 31671
            }
        ],
        "selector": {
            "k8s-app": "kibana"
        },
        "clusterIP": "10.254.77.135",
        "type": "NodePort",
        "sessionAffinity": "None"
    },
    "status": {
        "loadBalancer": {}
    }
}
```
Not sure if this is your problem, but ping doesn't work against a Service's ClusterIP because it is a virtual address implemented by iptables rules that only redirect packets for the Service's ports to the endpoints (pods).
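So rather than ping, test the TCP port the Service actually exposes. A quick sketch using the ClusterIP and port from the Kibana Service above:
```
# ICMP is not handled by the kube-proxy iptables rules, so ping gets no reply.
# A TCP check against the service port shows whether the Service itself works:
curl -v http://10.254.77.135:5601      # ClusterIP + service port, from inside a pod
nc -zv 10.254.77.135 5601              # plain TCP connect test, if nc is available
```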

How could a spring-boot application determine if it is running on cloud foundry?

I'm writing a microservice with Spring Boot. The DB is MongoDB. The service works perfectly in my local environment, but after I deployed it to Cloud Foundry it doesn't work. The reason is that the MongoDB connection times out.
I think the root cause is that the application doesn't know it is running in the cloud, because it still connects to 127.0.0.1:27017 instead of the redirected port.
How can it know it is running in the cloud? Thank you!
EDIT:
There is a MongoDB instance bound to the service. When I checked the environment information, I got the following:
```
{
    "VCAP_SERVICES": {
        "mongodb": [
            {
                "credentials": {
                    "hostname": "10.11.241.1",
                    "ports": {
                        "27017/tcp": "43417",
                        "28017/tcp": "43135"
                    },
                    "port": "43417",
                    "username": "xxxxxxxxxx",
                    "password": "xxxxxxxxxx",
                    "dbname": "gwkp7glhw9tq9cwp",
                    "uri": "xxxxxxxxxx"
                },
                "syslog_drain_url": null,
                "volume_mounts": [],
                "label": "mongodb",
                "provider": null,
                "plan": "v3.0-container",
                "name": "mongodb-business-configuration",
                "tags": [
                    "mongodb",
                    "document"
                ]
            }
        ]
    }
}
{
    "VCAP_APPLICATION": {
        "cf_api": "xxxxxxxxxx",
        "limits": {
            "fds": 16384,
            "mem": 1024,
            "disk": 1024
        },
        "application_name": "mock-service",
        "application_uris": [
            "xxxxxxxxxx"
        ],
        "name": "mock-service",
        "space_name": "xxxxxxxxxx",
        "space_id": "xxxxxxxxxx",
        "uris": [
            "xxxxxxxxxx"
        ],
        "users": null,
        "application_id": "xxxxxxxxxx",
        "version": "c7569d23-f3ee-49d0-9875-8e595ee76522",
        "application_version": "c7569d23-f3ee-49d0-9875-8e595ee76522"
    }
}
```
From my understanding, my Spring Boot service should connect to port 43417, not 27017, right? Thank you!
Finally I found the reason: I didn't specify the profile. After adding the following to my manifest.yml, it works:
```
env:
  SPRING_PROFILES_ACTIVE: cloud
```
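If the application itself needs to detect Cloud Foundry (rather than relying on the profile set in manifest.yml), a minimal sketch is to check for the VCAP_APPLICATION environment variable that Cloud Foundry injects (shown in the environment dump above):
```java
// Cloud Foundry sets VCAP_APPLICATION (and VCAP_SERVICES for bound services),
// so their presence is a reasonable "am I running on CF?" signal.
public final class CloudFoundryDetector {

    public static boolean runningOnCloudFoundry() {
        return System.getenv("VCAP_APPLICATION") != null;
    }

    public static void main(String[] args) {
        System.out.println("Running on Cloud Foundry: " + runningOnCloudFoundry());
    }
}
```
Spring Boot exposes a similar check via org.springframework.boot.cloud.CloudPlatform (CloudPlatform.CLOUD_FOUNDRY.isActive(environment)), which also keys off the VCAP_* variables.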