I am using Kafka Connect in MSK.
I have defined a plugin that points to a zip file in S3, and this works fine.
I have implemented an SMT and uploaded the SMT jar into the same bucket and folder as the plugin's zip file.
I define a new connector, and this time I add the SMT via the transforms property.
I get an error message saying that the class com.x.y.z.MySMT could not be found.
I verified that the jar is valid and contains the SMT.
Where should I put the SMT jar so that Kafka Connect loads it?
Pushing the SMT jar into the zip (under /lib) solved the class not found issue.
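For reference, a rough sketch of the resulting zip layout and of the connector settings that reference the SMT, shown in properties style (the transform alias and jar names are placeholders; the class is the one from the question):

my-connector-plugin.zip
  lib/
    my-connector.jar
    my-smt.jar

transforms=mySmt
transforms.mySmt.type=com.x.y.z.MySMT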
I am running a dedicated MirrorMaker cluster and want to apply my SMT to the records. Could you advise where I should put the jar with my code, i.e. where should I define the plugin.path property?
where should I define the plugin.path property?
In the worker properties file that you pass when you start either connect-mirror-maker or connect-distributed.
where should I put the jar?
You need to make a subfolder under a directory listed in plugin.path, then put the JARs there.
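As a concrete sketch (the directory names are made up for illustration), the worker properties would contain something like:

plugin.path=/opt/connect/plugins

and the jar would then live in its own subfolder, e.g. /opt/connect/plugins/my-smt/my-smt.jar.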
I have set up the Kafka Connect spooldir connector on a Unix machine and it seems to work well. I would like to know if a few things can be done with spooldir:
I want to create multiple directories inside the file path that spooldir scans, create files of the provided format inside them, and scan those too. How do I accomplish this?
I do not want the source files to be moved to different directories after completion/error. I tried providing the same path for source, target, and error, but the connector would not accept the value. Is there any way around these?
I have downloaded the s3-source connector zip file, as given on the Confluent web page, but I am not sure where to place the extracted files. I am getting the following error.
Please guide me. To load the connector, I am using this command:
confluent local load s3-source -- -d /etc/kafka-connect-s3/confluentinc-kafka-connect-s3-source-1.3.2/etc/quickstart-s3-source.properties
I am not sure where to place the extracted files
If you used confluent-hub install, it would put them in the correct place for you.
Otherwise, you can put them wherever you like, as long as you update plugin.path in the Connect worker properties to include the parent directory of the connector's JARs.
Extract the zip file, whether it is a source or a sink connector, and place the whole folder, with all the jars inside it, under the plugin.path you have set.
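As a rough sketch of the two options, using the version and paths from the question (adjust to your installation):

confluent-hub install confluentinc/kafka-connect-s3-source:1.3.2

or, manually, in the Connect worker properties:

plugin.path=/etc/kafka-connect-s3

where /etc/kafka-connect-s3/confluentinc-kafka-connect-s3-source-1.3.2/lib contains the connector's jars.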
I have downloaded the Kafka binaries and then the jar file of the connector. I specified the path to the connector jar and configured the config file with the connection to Mongo.
After that, I launched ZooKeeper, started Kafka, and created a topic. How do I launch the source connector?
There are connect-standalone and connect-distributed scripts in the Kafka bin folder for running Kafka Connect.
You'll also need to edit the respective properties files to make sure the Mongo connector plugin gets loaded.
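A minimal sketch of the standalone route (the connector properties file name is a placeholder): first make sure plugin.path in config/connect-standalone.properties points at the folder holding the connector jar, then run:

bin/connect-standalone.sh config/connect-standalone.properties mongo-source.properties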
I am using the https://github.com/potix2/spark-google-spreadsheets library for reading a spreadsheet file in Spark. It works perfectly on my local machine.
val df = sqlContext.read.
format("com.github.potix2.spark.google.spreadsheets").
option("serviceAccountId", "xxxxxx#developer.gserviceaccount.com").
option("credentialPath", "/path/to/credentail.p12").
load("<spreadsheetId>/worksheet1")
I created a new assembly jar that includes all the credentials and used that jar for reading the file, but I am facing an issue with reading the credentialPath file. I tried using
getClass.getResourceAsStream("/resources/Aircraft/allAircraft.txt")
but the library only supports an absolute path. Please help me resolve this issue.
You can use the --files argument of spark-submit or SparkContext.addFile() to distribute the credential file. If you want to get the local path of the credential file on a worker node, call SparkFiles.get("credential filename").
import org.apache.spark.SparkFiles
// you can also use `spark-submit --files=credential.p12`
sqlContext.sparkContext.addFile("credential.p12")
val credentialPath = SparkFiles.get("credential.p12")
val df = sqlContext.read.
format("com.github.potix2.spark.google.spreadsheets").
option("serviceAccountId", "xxxxxx#developer.gserviceaccount.com").
option("credentialPath", credentialPath).
load("<spreadsheetId>/worksheet1")
Use SBT and try the Typesafe Config library.
Here is a simple but complete sample which reads some information from a config file placed in the resources folder.
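A minimal sketch along those lines, assuming an application.conf in src/main/resources with made-up keys:

import com.typesafe.config.{Config, ConfigFactory}

object AppConfig {
  // application.conf is bundled into the assembly jar and loaded from the classpath
  private val config: Config = ConfigFactory.load()

  // key names here are only examples
  val serviceAccountId: String = config.getString("google.serviceAccountId")
  val spreadsheetId: String = config.getString("google.spreadsheetId")
}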
Then you can assemble a jar file using the sbt-assembly plugin.
If you're working in the Databricks environment, you can upload the credentials file.
Setting the GOOGLE_APPLICATION_CREDENTIALS environment variable, as described here, does not get you around this requirement, because the variable only points to the file path rather than containing the actual credentials. See here for more details about getting the right credentials and using the library.