Command confluent local services start gives an error: Starting ZooKeeper Error: ZooKeeper failed to start

I'm trying to run this command: confluent local services start
I don't know why it gives an error each time before passing to the next step, so I have to run it again over and over until it passes all the steps.
What is the reason for the error, and how can I solve the problem?

You need to open the log files to inspect the actual errors.
It's also possible the services are hitting a race condition: Schema Registry requires Kafka, and REST Proxy and Connect require Schema Registry... Maybe each one is not waiting for the previous components to finish starting.
Or maybe your machine does not have enough resources to start all the services. I believe at least 6 GB of RAM is necessary, for example. If the machine has 8 GB, and Chrome and lots of other programs are running, then you won't have 6 GB readily available.
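If you are not sure where the logs are, the CLI can point you at them. A minimal sketch (the subcommand names below are from recent versions of the Confluent CLI and may differ in yours):

confluent local current                  # print the temporary directory this run is using
confluent local services zookeeper log   # view the ZooKeeper log
confluent local services kafka log       # view the Kafka broker log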

Related

Confluent Kafka services (local) do not start properly on wsl2 and seems to timeout communicating their status

I am seeing various issues while trying to start Kafka services on wsl2. Details/symptoms below:
Confluent Kafka (7.0.0) platform
wsl2 - ubuntu 20.04LTS
When I use the command:
confluent local services start
Typically the system takes a long time and then exits reporting that the service failed (e.g. zookeeper, as that is the first service to start).
If I check the logs, it has actually started. So I type the command again, and sure enough it immediately says zookeeper is up, then proceeds to try to start kafka, which again after a minute is reported as failed to start (but it really has started).
I suspect that after starting the service (which is quite fast), the system is not able to communicate the status back and thus times out; I am not sure where the logs related to this are.
[Screenshot omitted.]
This means starting the whole stack (zookeeper/kafka/schema-registry/kafka-rest/kafka-connect/etc.) takes forever, and in between I start getting other errors (sometimes schema-registry is not able to find the cluster id, sometimes it's a log-file-related error), which means I need to destroy everything and start again.
I have tried this over a couple of days and can't get it to work. Is Confluent Kafka that unstable on Windows, or am I missing some config change?
In terms of setup, I have not changed anything in the config and am using the default config/ports.

How to send logs from Google Stackdriver to Kafka

I see many docs and posts about how to send logs to Stackdriver, but almost no information about how to do the opposite: send logs from Stackdriver to Kafka.
In my case, our Ops want to collect the logs from our web servers using Google's Stackdriver agents and push them to Stackdriver. However, for my stream-processing needs I want to get the logs into Kafka to use its unparalleled ability to retain and reprocess data by any number of consumers, something that I cannot do with PubSub.
So, what are the options for doing this? I have only seen a couple of possible avenues, and neither sounds too good:
Based on this post (https://powerspace.tech/how-to-stream-data-from-google-pubsub-to-kafka-with-kafka-connect-dbef1c340a76): push data into PubSub first, and then read from it using either the Kafka connector or my own Kafka consumer. I hate the thought of adding yet another hop (serialize/deserialize/ack/etc.) between the source of the data and Kafka.
I noticed a brief mention in passing of adding a plugin to Google's version of Fluentd (which is what the Stackdriver log collection agent is based on) here: https://powerspace.tech/how-to-stream-data-from-google-pubsub-to-kafka-with-kafka-connect-dbef1c340a76 . Not many details, so it's hard to tell how involved this approach is.
Any other options?
Thank you!
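For reference, the first avenue above usually amounts to a Kafka Connect source configuration along these lines (a hedged sketch based on Google's Pub/Sub Kafka connector; the project, subscription, and topic names are placeholders):

# All names below are placeholders; only the connector class comes from
# the GoogleCloudPlatform/pubsub Kafka connector.
name=pubsub-to-kafka
connector.class=com.google.pubsub.kafka.source.CloudPubSubSourceConnector
tasks.max=1
cps.project=my-gcp-project
cps.subscription=from-stackdriver
kafka.topic=stackdriver-logs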
Enter the Kafka console and add certain elements in the console. Once you have added the elements in the Kafka console, you need to check whether these elements are reflected successfully in Cloud Shell. For this you will run the command gcloud pubsub subscriptions pull from-kafka --auto-ack --limit=10. Once you run this command it will take some time to sync with the Kafka console, so you will get the results after running it a couple of times.
You will run the commands in Cloud Shell and see the output in the Kafka VM SSH session.
[Image 1]
Now you will verify the exact opposite procedure, where you run the command in the Kafka VM and see the output in Cloud Shell. It will take some time for the output to be reflected, and you may have to run the command gcloud pubsub subscriptions pull from-kafka --auto-ack --limit=10 a couple of times to see the output. Your output will look like this:
[Image 2]
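A minimal sketch of the round trip described above (the broker address and topic name are assumptions; from-kafka is the subscription referenced in the pull command):

# In the Kafka VM: produce a few test records (topic name is an assumption)
kafka-console-producer --broker-list localhost:9092 --topic to-pubsub

# In Cloud Shell: pull them back out of Pub/Sub (may need a couple of attempts)
gcloud pubsub subscriptions pull from-kafka --auto-ack --limit=10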
The Kafka plugin is deprecated. For more information, refer to https://cloud.google.com/stackdriver/docs/deprecations
Note: This functionality is only available for agents running on Linux. It is not available on Windows.
Kafka is monitored via JMX. Monitoring supports Kafka version 0.8.2 and higher.
On your VM instance, download kafka-082.conf from the GitHub configuration repository and place it in the directory /etc/stackdriver/collectd.d/:
(cd /etc/stackdriver/collectd.d/ && sudo curl -O https://raw.githubusercontent.com/Stackdriver/stackdriver-agent-service-configs/master/etc/collectd.d/kafka-082.conf)
The downloaded plugin configuration file assumes that your Kafka server is configured to accept JMX connections on port 9999. If you have configured Kafka with a different JMX port, as root, edit the file and follow the instructions to change the JMX port settings.
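For example (a hedged sketch; it assumes the default port literal 9999 appears only in the JMX settings of that file):

# Kafka itself must expose JMX; its start scripts honor the JMX_PORT variable
JMX_PORT=9999 bin/kafka-server-start.sh config/server.properties

# If you chose a different port (e.g. 9998), update the agent config to match
sudo sed -i 's/9999/9998/g' /etc/stackdriver/collectd.d/kafka-082.conf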
After adding the configuration file, restart the Monitoring agent by running the following command:
sudo service stackdriver-agent restart
What is monitored:
https://cloud.google.com/monitoring/api/metrics_agent#agent-kafka

How to run MirrorMaker 2.0 in production?

From the documentation, I see that MirrorMaker 2.0 is run like this on the command line:
./bin/connect-mirror-maker.sh mm2.properties
In my case I can go to an EC2 instance and enter this command.
But what is the correct practice if this needs to run for many days? It is possible that the EC2 instance gets terminated, etc.
Therefore, I am trying to find out what are best practices if you need to run MirrorMaker 2.0 for a long period and want to make sure that it is able to stay up and running without any manual intervention.
You have many options, these include:
Add it as a service to systemd. Then you can specify that it should be started automatically and restarted on failure; see the unit-file sketch below. systemd is very common now, but if you're not running systemd, there are many other process managers: https://superuser.com/a/687188/80826.
Run it in a Docker container, where you can specify a restart policy.
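A minimal systemd unit sketch for the first option (the paths, user, and unit name are assumptions; adjust them for your installation):

# /etc/systemd/system/mirror-maker2.service
[Unit]
Description=Kafka MirrorMaker 2.0
After=network.target

[Service]
User=kafka
ExecStart=/opt/kafka/bin/connect-mirror-maker.sh /opt/kafka/config/mm2.properties
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target

Enable it with sudo systemctl enable --now mirror-maker2. For the Docker option, a restart policy such as docker run -d --restart unless-stopped ... achieves the same effect.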

AWS EC2 free tier instance is automatically stopping frequently

I am using Ubuntu 18.04 on an AWS EC2 free-tier instance, running websites on an Apache server and Node.js with a PostgreSQL database. All deployments went fine, and the web apps work without any exceptions or errors.
However, I am facing an annoying issue: the instance stops frequently without any exception or error logs. After rebooting the instance everything works fine again, but after some time it stops automatically, either within a few hours on the same day or within 1-2 days.
I created another free-tier instance with a separate account, and it has the same issue. I cannot find any logs or troubleshooting options to get rid of this problem.
I would like to know how to troubleshoot this, and where I can find logs of any errors or exceptions for this instance.
The suggestions given by AWS in "Instance Status Check", as attached below, are not a practical solution to apply every time.
Something with your VM itself is causing its health checks to fail.
Have a look at the syslogs and your application logs. Also take a look at CloudWatch metrics to see whether any metric changes dramatically close to the time of the stops.
You can also add a CloudWatch alarm with a recovery action to automatically recover the instance if there's an issue with your VM.
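A hedged sketch of that last suggestion using the AWS CLI (the instance id and region are placeholders; StatusCheckFailed_System paired with the ec2:recover action is the usual pattern for system-level failures):

aws cloudwatch put-metric-alarm \
  --alarm-name ec2-autorecover \
  --namespace AWS/EC2 \
  --metric-name StatusCheckFailed_System \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --statistic Maximum \
  --period 60 \
  --evaluation-periods 2 \
  --threshold 1 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --alarm-actions arn:aws:automate:us-east-1:ec2:recover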

Why would running a container on GCE get stuck on "Metadata request unsuccessful" Forbidden (403)?

I'm trying to run a container in a custom VM on Google Compute Engine. This is to perform a heavy ETL process so I need a large machine but only for a couple of hours a month. I have two versions of my container with small startup changes. Both versions were built and pushed to the same google container registry by the same computer using the same Google login. The older one works fine but the newer one fails by getting stuck in an endless list of the following error:
E0927 09:10:13 7f5be3fff700 api_server.cc:184 Metadata request unsuccessful: Server responded with 'Forbidden' (403): Transport endpoint is not connected
Can anyone tell me exactly what's going on here? Can anyone please explain why one of my images doesn't have this problem (well, it gives a few of these messages but gets past them) while the other does (thousands of this message, taking over 24 hours before I killed it)?
If I SSH into a GCE instance, both versions of the container pull and run just fine. I suspect the INTEGRITY_RULE checking from the logs, but I know nothing about how that works.
MORE INFO: this comes down to "restart policy: never". Even a simple centos:7 container that says "hello world", deployed from the console, triggers this if the restart policy is never. At least in the short term I can fix this in the entrypoint script, as the instance will be destroyed when the monitor realises that the process has finished.
I suggest you try creating a third container focused on the metadata-service functionality, to isolate the issue. It may be that there's a timing difference between the two containers that's not being overcome.
Make sure you can curl the metadata service from the VM, and that the request to the metadata service uses the VM's service account.
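For example, from inside the VM (these are the standard GCE metadata endpoints; the Metadata-Flavor header is required):

# Confirm the metadata server is reachable
curl -H "Metadata-Flavor: Google" "http://metadata.google.internal/computeMetadata/v1/instance/id"

# Check which service account the VM is presenting
curl -H "Metadata-Flavor: Google" "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/email"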