How to publish binary Protobuf to Google Cloud Pub/Sub with gcloud CLI tool? - command-line

It is possible to publish messages to Google Cloud Pub/Sub from the (Unix/macOS) command line with gcloud pubsub topics publish, at least for string payloads.
Is there a way to publish binary Protobuf? The command seems to take the data as a command-line argument (e.g. --message="My message"), but I haven't found a way to pass binary Protobuf content.

Unfortunately, there is no way to publish binary Protobuf with gcloud.
I would recommend the Pub/Sub Python client library's quickstart as the quickest way to set up message publishing: https://cloud.google.com/pubsub/docs/publish-receive-messages-client-library
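Once the client library is installed (pip install google-cloud-pubsub), publishing raw Protobuf bytes looks roughly like the sketch below; the project and topic IDs are placeholders, and my_pb2 stands for whatever module protoc generated from your .proto file.

# Sketch: publish serialized Protobuf bytes to Pub/Sub with the Python client.
from google.cloud import pubsub_v1
import my_pb2  # hypothetical module generated by protoc from your .proto file

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project-id", "my-topic")  # placeholders

message = my_pb2.MyMessage(field="value")   # hypothetical message type
data = message.SerializeToString()          # binary Protobuf payload (bytes)

future = publisher.publish(topic_path, data=data)  # Pub/Sub message data must be bytes
print("Published message ID:", future.result())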

Related

Kafka Sink to Data Lake Storage without Confluent

I am trying to find options for open-source Kafka writing directly to Azure Data Lake Storage Gen2. It seems I have few options, mostly circling around Confluent, like those below:
Use Confluent Cloud with Apache Kafka - need to subscribe with Confluent and pay charges (Confluent Cloud with ADLS)
Use an Azure VM with Confluent Hub and install Confluent Platform
At present I am not willing to pay Confluent licensing and do not want to test with the Confluent package (more and more wrappers and hoops around).
Is there any option to use open-source Kafka to write data directly to ADLS Gen2? If yes, how can we achieve this? Any useful information to share?
Firstly, Kafka Connect is an Apache 2.0-licensed product and an open platform consisting of plugins; Confluent Platform/Cloud is not a requirement to use it. You can download the Azure connector as a ZIP file and install it like any other plugin.
However, it is at Confluent's (or any developer's) discretion to provide a paid license agreement for their software and any support, and there might otherwise be a limited trial period during which you can use the plugin.
That being said, you do not "need" Confluent Platform, and there are no "hoops" to using it if you did, because it only adds extras on top of Apache Kafka and ZooKeeper; it is not its own thing (you can use your existing Kafka installation with the other Confluent products).
Regarding other open-source options: Stack Overflow is not the place for software recommendations or seeking tools/libraries. You could, though, use Spark/Flink/NiFi to reimplement a pipeline similar to Kafka Connect, or you can write your own Kafka connector based on the open-source kafka-connect-storage-cloud project, which is used as a base for S3, GCS, and Azure, AFAIK.
There are also the Apache Camel Kafka connectors, which include an Azure Data Lake connector for sending and receiving data (sink and source). Check this out: https://camel.apache.org/camel-kafka-connector/latest/connectors/camel-azure-storage-datalake-kafka-sink-connector.html
This is a free solution that doesn't require Confluent licenses or technologies.
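As a rough illustration (not a tested configuration), submitting such a sink connector to a running Kafka Connect worker could look like the Python sketch below. The connector class and the camel.* property names are assumptions based on the naming pattern in the linked Camel documentation, so verify the exact keys there before using them.

# Sketch only: submit a hypothetical Camel Azure Data Lake sink connector
# to Kafka Connect's REST API (worker assumed at localhost:8083).
import json
import urllib.request

connector = {
    "name": "adls-sink",
    "config": {
        # Assumed class name -- confirm it in the Camel Kafka Connector docs.
        "connector.class": "org.apache.camel.kafkaconnector.azurestoragedatalake.CamelAzurestoragedatalakeSinkConnector",
        "topics": "my-topic",
        "tasks.max": "1",
        # Placeholder destination settings -- property keys are assumptions.
        "camel.sink.path.accountName": "<storage-account>",
        "camel.sink.path.fileSystemName": "<filesystem>",
    },
}

req = urllib.request.Request(
    "http://localhost:8083/connectors",            # Connect worker REST endpoint
    data=json.dumps(connector).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
print(urllib.request.urlopen(req).read().decode())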

How to connect Kafka to Thingsboard Platform

I want to activate the Kafka-Spark pipeline for the ThingsBoard platform (Community Edition).
As per the Stack Overflow question "Couldn't able to find plugins in ThingsBoard 2.0.3 Home screen",
it's been said that we can do this via rule chains, since the plugins section has been removed, but I am not able to understand how to configure it using rule chains. I cannot find complete documentation on configuring Kafka via rule chains, so I need help with that.
I figured it out. It can be done easily by following this guide: https://thingsboard.io/docs/samples/analytics/kafka-streams/
The thing is that, using ThingsBoard CE, we can get data into a Kafka topic. However, to fetch data back from Kafka you will need a ThingsBoard Professional Edition integration.
The alternative to ThingsBoard PE is to write your own REST API script to push the insights back to ThingsBoard.
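For instance, a minimal sketch of such a script, assuming the standard ThingsBoard HTTP device API and a device access token; the host, token, and telemetry payload below are placeholders.

# Sketch: push computed insights back to ThingsBoard as device telemetry
# over its HTTP device API. Host, access token, and payload are placeholders.
import json
import urllib.request

THINGSBOARD_HOST = "http://localhost:8080"   # your ThingsBoard instance
ACCESS_TOKEN = "DEVICE_ACCESS_TOKEN"         # device access token

telemetry = {"avg_temperature": 23.4, "anomaly_score": 0.02}

req = urllib.request.Request(
    f"{THINGSBOARD_HOST}/api/v1/{ACCESS_TOKEN}/telemetry",
    data=json.dumps(telemetry).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
urllib.request.urlopen(req)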

How to Integrate a REST API Source Connector with Kafka Connect?

I have Confluent 5.0 on my local machine and am trying to read data from a REST API using the REST API source connector, which is not part of Confluent. Till now I have used only Confluent's built-in connectors. The REST API source connector is open source and available on GitHub: https://github.com/llofberg/kafka-connect-rest
I have downloaded this connector from GitHub and got stuck here.
Can anybody tell me the process to integrate this connector with Confluent, or how I can use it to pull data from the REST API?
Disclaimer: There is no single answer to adding an external Kafka Connect plugin; Confluent provides the Kafka Connect Maven plugin, but that doesn't mean people use it, or even Maven, to package their code.
If it is not on the Confluent Hub, then you'll have to build it by hand.
1) Clone the repo, and build it (install Git and Maven first)
git clone https://github.com/llofberg/kafka-connect-rest && cd kafka-connect-rest
mvn clean package
2) Create a directory for it on all Connect workers, similar to the other Connectors of Confluent Platform
mkdir $CONFLUENT_HOME/share/java/kafka-connect-rest
3) Find each of the shaded JARs (this connector happens to produce multiple JARs; I don't know why...)
find . -iname "*shaded.jar" -type f
./kafka-connect-transform-from-json/kafka-connect-transform-from-json-plugin/target/kafka-connect-transform-from-json-plugin-1.0-SNAPSHOT-shaded.jar
./kafka-connect-transform-add-headers/target/kafka-connect-transform-add-headers-1.0-SNAPSHOT-shaded.jar
./kafka-connect-transform-velocity-eval/target/kafka-connect-transform-velocity-eval-1.0-SNAPSHOT-shaded.jar
./kafka-connect-rest-plugin/target/kafka-connect-rest-plugin-1.0-SNAPSHOT-shaded.jar
4) Copy each of these files into the $CONFLUENT_HOME/share/java/kafka-connect-rest folder created in step 2, on each Connect worker
5) Make sure the plugin.path in your connect-*.properties file points at the full path to $CONFLUENT_HOME/share/java
At this point, you've done all the steps listed in the README to build the thing and set up the plugin path, just not in Docker.
6) Start Connect (Distributed)
7) Hit GET /connector-plugins to verify the plugin loaded.
8) Configure and send a JSON payload to POST /connectors (a rough sketch follows after this list).
I have not used this connector before, so I do not know how to configure it. Maybe see the examples, or follow along with rmoff's blog post up to the KSQL part.
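As a sketch only (not a tested configuration), steps 7 and 8 against the Connect REST API could look like this in Python; the connector class and the topic/url property names are assumptions, so check the kafka-connect-rest README and examples for the real keys.

# Sketch: verify the plugin loaded, then submit a connector config to
# Kafka Connect's REST API (distributed worker assumed at localhost:8083).
import json
import urllib.request

CONNECT_URL = "http://localhost:8083"

# 7) Verify the plugin was picked up from plugin.path
plugins = json.loads(urllib.request.urlopen(f"{CONNECT_URL}/connector-plugins").read())
print([p["class"] for p in plugins])

# 8) Configure and submit the connector (illustrative config only)
connector = {
    "name": "rest-source",
    "config": {
        "connector.class": "com.tm.kafka.connect.rest.RestSourceConnector",  # assumed class name
        "tasks.max": "1",
        "topic": "rest-topic",                          # assumed property name
        "rest.source.url": "https://example.com/api",   # assumed property name
    },
}
req = urllib.request.Request(
    f"{CONNECT_URL}/connectors",
    data=json.dumps(connector).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
print(urllib.request.urlopen(req).read().decode())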

Google Cloud SDK from Dataproc Cluster

What is the right way to use/install the Python Google Cloud APIs, such as Pub/Sub, from a Google Dataproc cluster? For example, if I'm using Zeppelin/PySpark on the cluster and I want to use the Pub/Sub API, how should I prepare it?
It is unclear to me what is and is not installed during default cluster provisioning, and if/how I should try to install Python libraries for the Google Cloud APIs.
I realise there may additionally be scopes/authentication to set up.
To be clear, I can use the APIs locally, but I am not sure what the cleanest way is to make them accessible from the cluster, and I don't want to perform any unnecessary steps.
In general, at the moment, you need to bring your own client libraries for the various Google APIs, unless you are using the Google Cloud Storage connector or the BigQuery connector from Java, or via the RDD methods in PySpark which automatically delegate to the Java implementations.
For authentication, you should simply use --scopes https://www.googleapis.com/auth/pubsub and/or --scopes https://www.googleapis.com/auth/cloud-platform when creating the cluster, and the service account on the Dataproc cluster's VMs will be able to authenticate to use Pub/Sub via the default installed credentials flow.
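For example, after installing the client library on the cluster nodes (e.g. pip install google-cloud-pubsub, perhaps via an initialization action), code running on the cluster can pick up the VM's service account automatically, roughly like the sketch below; the project and subscription IDs are placeholders.

# Sketch: using Pub/Sub from a Dataproc node. With the pubsub or cloud-platform
# scope set on the cluster, the client finds the VM's service account via
# application default credentials -- no key file needed.
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project-id", "my-subscription")

def callback(message):
    print("Received:", message.data)
    message.ack()

# Streaming pull; block for a short while, then shut down.
future = subscriber.subscribe(subscription_path, callback=callback)
try:
    future.result(timeout=30)
except Exception:
    future.cancel()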

Is it possible to use Kafka with Google Cloud Dataflow

I have two questions:
1) I want to use Kafka with a Google Cloud Dataflow pipeline program. In my pipeline program I want to read data from Kafka; is that possible?
2) I created an instance with BigQuery enabled; now I want to enable Pub/Sub. How can I do that?
(1) As mentioned by Raghu, support for writing to/reading from Kafka was added to Apache Beam in mid-2016 with the KafkaIO package. You can check the package's documentation [1] to see how to use it.
(2) I'm not quite sure what you mean. Can you provide more details?
[1] https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/kafka/KafkaIO.html
Kafka support was added to Dataflow (and Apache Beam) in mid-2016. You can read from and write to Kafka in streaming pipelines. See the JavaDoc for KafkaIO in Apache Beam.
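For reference, newer Beam releases also expose KafkaIO to the Python SDK through a cross-language wrapper; a minimal read sketch under that assumption (it needs a Java runtime available, and the broker address and topic below are placeholders):

# Sketch: reading from Kafka in a Beam Python pipeline via the cross-language
# KafkaIO wrapper (recent Beam releases; requires Java for transform expansion).
import apache_beam as beam
from apache_beam.io.kafka import ReadFromKafka
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadKafka" >> ReadFromKafka(
            consumer_config={"bootstrap.servers": "localhost:9092"},
            topics=["my-topic"],
        )
        | "PrintKV" >> beam.Map(print)   # each element is a (key, value) bytes pair
    )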
(2) As of April 27, 2015, you can enable the Cloud Pub/Sub API as follows:
Go to your project page on the Developer Console
Click APIs & auth -> APIs
Click More within Google Cloud APIs
Click Cloud Pub/Sub API
Click Enable API