How to overcome an IllegalAccessError during connector startup in Kafka Connect

I am writing a connector for Kafka Connect. The error I see during the start up of connector is
java.lang.IllegalAccessError: tried to access field org.apache.kafka.common.config.ConfigTransformer.DEFAULT_PATTERN from class org.apache.kafka.connect.runtime.AbstractHerder
The error seems to happen at https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/AbstractHerder.java#L449
Do I need to set this DEFAULT_PATTERN manually? Is it not set by default?
I am using the Docker image confluentinc/cp-kafka:5.0.1. The version of connect-api I am using in my connector app is org.apache.kafka:connect-api:2.0.0. I am running my setup inside Kubernetes.

The issue was resolved when I changed the image to confluentinc/cp-kafka:5.0.0-2.
I had already tried this option before posting the question, but the pod was stuck in a Pending state and refused to start, so I assumed it was an issue with the image. After some more research I learned that Kubernetes can leave pods in a Pending state when it is unable to allocate enough resources.
I tried the confluentinc/cp-kafka:5.0.0-2 image again and it works fine.
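For reference, this class of IllegalAccessError usually means the connector was compiled against a different version of the Connect API than the one on the worker's classpath. A hedged sketch of how the dependency could be pinned in a Maven build, assuming the worker runs Kafka 2.0.x (the exact version and the `provided` scope are assumptions to adapt to your setup):

```xml
<!-- Pin connect-api to the same Kafka version as the Connect worker,
     and mark it provided so the worker's own copy is used at runtime. -->
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>connect-api</artifactId>
    <version>2.0.0</version>
    <scope>provided</scope>
</dependency>
```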

Related

How to Set Max User Processes on Kubernetes

I built a Docker container for my JBoss application (WildFly 11 in use). The container runs on AWS EKS Fargate. After running the container for several minutes, "java.lang.OutOfMemoryError: Unable to create new native thread" occurred.
After reading the article "How to solve java.lang.OutOfMemoryError: unable to create new native thread", I would like to change the max user processes limit from 1024 to 4096. However, I can't find any way to change it in the Kubernetes documentation.
I have tried the methods in the article "How do I set ulimit for containers in Kubernetes?", but they don't seem to help.
I have also edited the file /etc/security/limits.conf in my Dockerfile, but the number still hasn't changed.
Does anyone have an idea about this?
Thank you.
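One reason the limits.conf edit has no effect: /etc/security/limits.conf is applied by PAM at login, and container entrypoints never go through a PAM login. A hedged sketch of an entrypoint wrapper that raises the soft limit before starting the server instead; it assumes the container runs as root (or that the hard limit already permits 4096), and the WildFly path in the final comment is an assumption to adapt to your image:

```shell
#!/bin/sh
# entrypoint wrapper: try to raise the max-user-processes soft limit
# before exec'ing the real server command.
ulimit -u 4096 2>/dev/null || echo "warning: could not raise nproc limit" >&2
NPROC=$(ulimit -u)
echo "max user processes: $NPROC"
# exec /opt/jboss/wildfly/bin/standalone.sh -b 0.0.0.0
```

Whether the limit can actually be raised depends on the hard limit set by the container runtime, so verify the printed value inside the running pod.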

RabbitMQ Shovel plugin: creating duplicate data in case of node failure

I am using the Shovel plugin in RabbitMQ. It works fine with one pod. However, we are running a Kubernetes cluster with multiple pods, and after a pod restart each pod creates its own shovel instance independently, which causes duplicate message replication on the destination.
The detailed steps are below:
We deploy RabbitMQ on the Kubernetes cluster using a Helm chart.
We then create the shovel using the RabbitMQ management UI. Created this way, the shovels work fine and do not replicate data multiple times to the destination.
When any pod gets restarted, it creates a separate shovel instance, which starts causing duplicate message replication on the destination from the different shovel instances.
Looking at the shovel status in the RabbitMQ UI, we found multiple instances of the same shovel running, one per pod.
When we restart the shovel manually from the RabbitMQ UI, the issue is resolved and only one instance remains visible in the UI.
So we concluded that after a pod failure/restart the shovel is not able to sync with the other nodes/pods if another instance of the shovel is already running there. We can work around the issue by restarting the shovel from the UI, but that is not a valid approach for production.
We do not see this issue with queues and exchanges.
Can anyone help us resolve this issue?
As we have lately seen similar problems: this seems to be an issue since some 3.8.x version, see https://github.com/rabbitmq/rabbitmq-server/discussions/3154
As far as I have understood, it should be fixed from version 3.8.20 on. See the release notes:
https://github.com/rabbitmq/rabbitmq-server/releases/tag/v3.8.19
https://github.com/rabbitmq/rabbitmq-server/releases/tag/v3.8.20
https://github.com/rabbitmq/rabbitmq-server/releases/tag/v3.9.2
I haven't had time yet to check whether this is really fixed in those versions.
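Before (or after) upgrading, it may help to confirm the broker version on each pod and how many instances of each shovel are actually running. A hedged sketch using `kubectl exec` against a live cluster; the pod name `rabbitmq-0` is an assumption, and `rabbitmqctl shovel_status` requires the shovel plugin to be enabled:

```shell
# Check the broker version on each pod
# (the affected versions are roughly 3.8.x before 3.8.20).
kubectl exec rabbitmq-0 -- rabbitmqctl version

# List shovel status; duplicates show up as multiple
# entries for the same shovel name.
kubectl exec rabbitmq-0 -- rabbitmqctl shovel_status
```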

AWS EC2 free tier instance is automatically stopping frequently

I am using Ubuntu 18.04 on an AWS EC2 free tier instance, running websites on an Apache server, with Node.js and a PostgreSQL database. All deployments went smoothly and the web apps work fine without any exceptions or errors.
However, I am facing an annoying issue: this instance stops frequently without any exception or error logs. After rebooting the instance everything works fine again, but after some time it stops automatically, either within a few hours on the same day as the reboot or within 1-2 days after that.
I created another free tier instance with a separate account, and it shows the same issue. I cannot find any logs or troubleshooting options to get rid of this problem.
How can this be troubleshooted, and where can I find logs of any errors or exceptions for this instance?
The suggestions given by AWS under "Instance Status Check", as attached below, are not a practical solution to apply every time.
Something with your VM itself is causing its health checks to fail.
Have a look at the syslogs and your application logs. Also take a look at CloudWatch metrics to see if any metric changes dramatically close to that time.
You can also add a CloudWatch alarm with a recovery action to automatically reboot if there’s an issue with your VM.
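The recovery-action suggestion can be scripted with the AWS CLI. A hedged sketch, in which the alarm name, region, thresholds, instance id placeholder, and the choice of reboot (rather than recover) are all assumptions to adjust:

```shell
# Create a CloudWatch alarm that reboots the instance when the
# instance status check fails for two consecutive minutes.
aws cloudwatch put-metric-alarm \
  --alarm-name ec2-auto-reboot \
  --namespace AWS/EC2 \
  --metric-name StatusCheckFailed_Instance \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --statistic Maximum \
  --period 60 \
  --evaluation-periods 2 \
  --threshold 1 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --alarm-actions arn:aws:automate:us-east-1:ec2:reboot
```

Note this only mitigates the symptom; the syslog and CloudWatch investigation above is still needed to find the cause.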

Deployed jobs stopped working with an image error?

In the last few hours I am no longer able to execute deployed Data Fusion pipeline jobs - they just end in an error state almost instantly.
I can run the jobs in Preview mode, but when trying to run deployed jobs this error appears in the logs:
com.google.api.gax.rpc.InvalidArgumentException: io.grpc.StatusRuntimeException: INVALID_ARGUMENT: Selected software image version '1.2.65-deb9' can no longer be used to create new clusters. Please select a more recent image
I've tried with both an existing instance and a new instance, and all deployed jobs including the sample jobs give this error.
Any ideas? I cannot find any config options for which image is used for execution.
We are currently investigating an issue with the Cloud Dataproc image used by Cloud Data Fusion. We had pinned a version of the Dataproc VM image for the launch, and that version is causing the issue.
We apologize for your inconvenience. We are working to resolve the issue as soon as possible.
Will provide an update on this thread.
Nitin

Why would running a container on GCE get stuck on "Metadata request unsuccessful: Forbidden (403)"?

I'm trying to run a container in a custom VM on Google Compute Engine. This is to perform a heavy ETL process so I need a large machine but only for a couple of hours a month. I have two versions of my container with small startup changes. Both versions were built and pushed to the same google container registry by the same computer using the same Google login. The older one works fine but the newer one fails by getting stuck in an endless list of the following error:
E0927 09:10:13 7f5be3fff700 api_server.cc:184 Metadata request unsuccessful: Server responded with 'Forbidden' (403): Transport endpoint is not connected
Can anyone tell me exactly what's going on here? And can anyone explain why one of my images doesn't have this problem (well, it gives a few of these messages but gets past them) while the other does (thousands of the message, taking over 24 hours before I killed it)?
If I ssh in to a GCE instance then both versions of the container pull and run just fine. I'm suspecting the INTEGRITY_RULE checking from the logs but I know nothing about how that works.
MORE INFO: this is down to "restart policy: never". Even a simple centos:7 container that says "hello world", deployed from the console, triggers this if the restart policy is never. At least in the short term I can fix this in the entrypoint script, as the instance will be destroyed once the monitor realises that the process has finished.
I suggest you try creating a 3rd container that's focused on the metadata service functionality to isolate the issue. It may be that there's a timing difference between the 2 containers that's not being overcome.
Make sure you can 'curl' the metadata service from the VM and that the request to the metadata service is using the VM's service account.
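That metadata check can look like the following. A hedged sketch that must be run on the GCE VM itself; the `default` service-account path is an assumption (adjust it if the VM uses a custom service account):

```shell
# The Metadata-Flavor header is required or the server rejects the request.
curl -s -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/email"

# A 403 here, rather than a token, points at the same permission
# problem the container is hitting.
curl -s -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token"
```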