AWS MSK. Kafka Connect - plugin class loading - apache-kafka

I am using Kafka Connect in MSK.
I have defined a plugin that points to a zip file in S3 - this works fine.
I have implemented an SMT and uploaded the SMT jar into the same bucket and folder as the plugin's zip file.
I define a new connector, and this time I add the SMT via the transforms property.
I get an error message that the class com.x.y.z.MySMT could not be found.
I verified that the jar is valid and contains the SMT.
Where should I put the SMT jar so that Kafka Connect loads it?

Pushing the SMT jar into the plugin zip (under /lib) solved the class-not-found issue.
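For illustration, a rough sketch of the layout that worked, with placeholder names (my-plugin.zip, my-connector.jar and my-smt.jar are assumptions; com.x.y.z.MySMT is the class from the question):

my-plugin.zip
  my-plugin/lib/my-connector.jar
  my-plugin/lib/my-smt.jar        (SMT jar bundled next to the connector jars)

transforms=mySmt
transforms.mySmt.type=com.x.y.z.MySMT

With the SMT jar inside the same plugin zip, it is picked up together with the connector's own jars, so the class can then be found.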

Related

Message transforms for MirrorMaker 2.0

I am running a dedicated MirrorMaker cluster and want to apply my SMT transformation to the records. Could you advise where I should put the jar with my code, i.e. where should I define the plugin.path property?
where should I define the plugin.path property?
In the worker properties file you pass when starting either connect-mirror-maker or connect-distributed.
where should I put the jar
You need to make a subfolder under a directory listed in plugin.path, then put the JARs there.
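As a sketch of what that looks like (the /opt/connect-plugins directory and my-smt names are assumptions, not from the original answer):

# in the worker properties file, e.g. mm2.properties or connect-distributed.properties
plugin.path=/opt/connect-plugins

# directory layout: one subfolder per plugin under the plugin.path entry
/opt/connect-plugins/my-smt/my-smt.jar

# then start the dedicated MirrorMaker cluster with that properties file
bin/connect-mirror-maker.sh mm2.properties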

I want to overcome some limitations I faced while using the Kafka Spooldir connector

I have set up Kafka's spooldir connector on a unix machine and it seems to work well. I would like to know if a few things can be done with spooldir:
I want to create multiple directories inside the spooldir-scanning file path, create files of the provided format inside them, and have those scanned too. How do I accomplish this?
I do not want the source files to be moved to different directories after completion/error. I tried providing the same path for source, target and error, but the connector would not accept the value. Is there any way around these?

Where to keep the zipped file of the Kafka Connect s3-source connector

I have downloaded the s3-source connector zip file as provided on the Confluent web page. I am not sure where to place the extracted files, and I am getting the following error. Please guide me. To load the connector, I am using this command -
confluent local load s3-source -- -d /etc/kafka-connect-s3/confluentinc-kafka-connect-s3-source-1.3.2/etc/quickstart-s3-source.properties
I am not sure where to place the extracted files
If you used confluent-hub install, it would put them in the correct place for you.
Otherwise, you can put them wherever you like, as long as you update plugin.path in the Connect worker properties to include the parent directory of the connector's JARs.
Extract the zip file (whether it's a source or a sink connector) and place the whole folder, with all the jars inside it, under the plugin.path you have set.
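A sketch of that arrangement, reusing the /etc/kafka-connect-s3 directory from the command in the question as the parent directory (an assumption, any directory works as long as it is listed in plugin.path):

# in the Connect worker properties
plugin.path=/etc/kafka-connect-s3

# extracted connector folder sitting under that path
/etc/kafka-connect-s3/confluentinc-kafka-connect-s3-source-1.3.2/lib/*.jar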

How to run the MongoDB connector in Kafka?

I have downloaded the Kafka binaries and the connector jar file, then specified the path to the connector jar and configured the config file with the connection to Mongo.
After that, I launched ZooKeeper, ran Kafka and created a topic. How do I launch the source connector?
There are connect-standalone and connect-distributed scripts in the Kafka bin folder for running Kafka Connect.
You'll also need to edit the respective properties files to make sure the Mongo connector plugin gets loaded.
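For example, a minimal standalone run might look like this (mongo-source.properties is a hypothetical connector config file, and the plugin.path value is an assumption):

# config/connect-standalone.properties (excerpt)
plugin.path=/path/to/plugins        # a directory whose subfolders contain the Mongo connector jar

# start a standalone worker with the connector config
bin/connect-standalone.sh config/connect-standalone.properties mongo-source.properties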

Google Spreadsheet Spark library

I am using the https://github.com/potix2/spark-google-spreadsheets library for reading a spreadsheet file in Spark. It works perfectly on my local machine.
val df = sqlContext.read.
  format("com.github.potix2.spark.google.spreadsheets").
  option("serviceAccountId", "xxxxxx@developer.gserviceaccount.com").
  option("credentialPath", "/path/to/credential.p12").
  load("<spreadsheetId>/worksheet1")
I created a new assembly jar that includes all the credentials and use that jar for reading the file. But I am facing an issue with reading the credentialPath file. I tried using
getClass.getResourceAsStream("/resources/Aircraft/allAircraft.txt")
but the library only supports an absolute path. Please help me resolve this issue.
You can use the --files argument of spark-submit or SparkContext.addFile() to distribute the credential file. If you want to get the local path of the credential file on a worker node, you should call SparkFiles.get("credential filename").
import org.apache.spark.SparkFiles

// you can also use `spark-submit --files=credential.p12`
sqlContext.sparkContext.addFile("credential.p12")
val credentialPath = SparkFiles.get("credential.p12")

val df = sqlContext.read.
  format("com.github.potix2.spark.google.spreadsheets").
  option("serviceAccountId", "xxxxxx@developer.gserviceaccount.com").
  option("credentialPath", credentialPath).
  load("<spreadsheetId>/worksheet1")
Use SBT and try the Typesafe Config library.
Here is a simple but complete sample which reads some information from a config file placed in the resources folder.
Then you can assemble a jar file using the sbt-assembly plugin.
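A minimal sketch of that approach (not the answer's original sample), assuming an application.conf in src/main/resources; the google.* keys are hypothetical:

import com.typesafe.config.ConfigFactory

object AppConfig {
  // ConfigFactory.load() reads application.conf from the classpath,
  // so it is bundled into the assembly jar along with the credentials
  private val config = ConfigFactory.load()

  // hypothetical keys - adjust to whatever your config file actually defines
  val serviceAccountId: String = config.getString("google.serviceAccountId")
  val credentialPath: String = config.getString("google.credentialPath")
}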
If you're working in the Databricks environment, you can upload the credentials file.
Setting the GOOGLE_APPLICATION_CREDENTIALS environment variable, as described here, does not get you around this requirement because it's a link to the file path, not the actual credentials. See here for more details about getting the right credentials and using the library.