Why does the Kafka Connect SFTP source directory need to be writable? - apache-kafka

Using the connector: https://docs.confluent.io/current/connect/kafka-connect-sftp/source-connector/index.html
When I configure the connector and check its status, I get the exception below:
org.apache.kafka.connect.errors.ConnectException: Directory for 'input.path' '/FOO' it not writable.
	at io.confluent.connect.sftp.source.SftpDirectoryPermission.directoryWritable
This makes no sense from a source standpoint, especially if you are connecting to a third-party source you do NOT control.

You need write permission because the connector moves each file it has read into the configurable finished.path. This move into finished.path is explained in the link you provided:
Once a file has been read, it is placed into the configured finished.path directory.
The documentation for the input.path setting also states that you need write access to it:
input.path - The directory where Kafka Connect reads files that are processed. This directory must exist and be writable by the user running Connect.
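The reason a "read" needs write access can be shown with a local-filesystem sketch. This is not the SFTP connector's code: it assumes a local input directory and a hypothetical finished directory, and uses java.nio's Files.move in place of the connector's remote move. On POSIX systems, moving (renaming) a file removes an entry from the directory it leaves, which requires write permission on that directory, so a read-only input.path cannot work with a read-then-move design even though reading alone would.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Local-filesystem illustration only; NOT the SFTP connector's implementation.
class ReadThenMove {

    // Reads every line of `file`, then moves the file into `finishedDir`.
    // Reading needs only read permission, but the move deletes the entry from
    // the file's parent directory, which needs write permission on that directory.
    static List<String> readThenMove(Path file, Path finishedDir) throws IOException {
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        Files.createDirectories(finishedDir);
        Files.move(file, finishedDir.resolve(file.getFileName()),
                StandardCopyOption.REPLACE_EXISTING);
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical directories standing in for input.path and finished.path.
        Path input = Files.createTempDirectory("input");
        Path finished = Files.createTempDirectory("finished");
        Path file = Files.write(input.resolve("data.csv"),
                List.of("a,1", "b,2"), StandardCharsets.UTF_8);
        System.out.println(readThenMove(file, finished));
        System.out.println(Files.exists(file));                         // source entry is gone
        System.out.println(Files.exists(finished.resolve("data.csv"))); // file now in finished dir
    }
}
```

If the input directory were read-only, the readAllLines call would still succeed and the Files.move call would fail, which is the situation the connector's writability check rejects up front.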

Related

Kafka Connect - File Source

It seems the Connect file source reads from the beginning of the files when the connector is restarted.
I couldn't find the equivalent configuration.
How do I specify that it should read only the appended data? (Please note this only happens when Connect is restarted; otherwise, it reads only the data that got appended.)

Creating and using a custom Kafka Connect configuration provider

I have installed and tested Kafka Connect in distributed mode. It now works: it connects to the configured sink and reads from the configured source.
That being the case, I moved on to improving my installation. The area I think needs immediate attention is that the only available means of creating a connector is through REST calls, which means I need to send my information over the wire, unprotected.
To secure this, Kafka introduced the new ConfigProvider mechanism.
This is helpful because it lets you set properties on the server and then reference them in the REST call, like so:
{
.
.
"property":"${file:/path/to/file:nameOfThePropertyInFile}"
.
.
}
This works really well: just add the property file on the server and add the following config to the distributed.properties file (a properties file does not support trailing comments, so the comment goes on its own line):
# multiple comma-separated provider aliases can be specified here
config.providers=file
config.providers.file.class=org.apache.kafka.common.config.provider.FileConfigProvider
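What a file-based provider does for a reference like ${file:/path/to/file:nameOfThePropertyInFile} can be sketched without Kafka on the classpath: load the properties file at the given path and return the requested keys. This is a simplified stand-in, not Kafka's FileConfigProvider; the class name PropertiesFileLookup and the file contents in main are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

// Hypothetical, Kafka-free model of a file-based lookup: path -> properties -> keys.
class PropertiesFileLookup {

    // Loads the properties file at `path` and returns only those requested
    // keys that actually exist in it; missing keys are simply absent.
    static Map<String, String> get(Path path, Set<String> keys) throws IOException {
        Properties props = new Properties();
        try (Reader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            props.load(reader);
        }
        Map<String, String> data = new HashMap<>();
        for (String key : keys) {
            String value = props.getProperty(key);
            if (value != null) {
                data.put(key, value);
            }
        }
        return data;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical secrets file; on a real worker it lives on the worker host.
        Path file = Files.createTempFile("connector-secrets", ".properties");
        Files.writeString(file, "nameOfThePropertyInFile=s3cret\n");
        System.out.println(get(file, Set.of("nameOfThePropertyInFile")));
    }
}
```

The sketch also makes the concern raised next concrete: the secret still sits in plain text in that file, so the lookup only moves where it is stored.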
While this solution works, it does not really ease my security concerns: the information has gone from being sent over the wire to sitting in a repository, in plain text for everyone to see.
The Kafka team foresaw this issue and allows users to write their own configuration providers by implementing the ConfigProvider interface.
I have created my own implementation and packaged it in a jar, giving it the suggested final name:
META-INF/services/org.apache.kafka.common.config.ConfigProvider
and added the following entries to the distributed properties file:
config.providers=cust
config.providers.cust.class=com.somename.configproviders.CustConfigProvider
However, I am getting an error from Connect stating that a class implementing ConfigProvider with the name:
com.somename.configproviders.CustConfigProvider
could not be found.
I am at a loss now, because the documentation on their site does not explain clearly how to configure custom config providers.
Has someone worked on a similar issue and could provide some insight into this? Any help would be appreciated.
I recently went through this to set up a custom ConfigProvider. The official documentation is ambiguous and confusing.
I have created my own implementation and packaged it in a jar, giving it the suggested final name:
META-INF/services/org.apache.kafka.common.config.ConfigProvider
That path is not the name of the jar; it is a file that goes inside the jar. You can give the jar any final name you like, but it must be packaged in jar format with a .jar suffix.
Here are the complete steps. Suppose your custom ConfigProvider's fully qualified name is com.my.CustomConfigProvider.MyClass.
1. Create a file at META-INF/services/org.apache.kafka.common.config.provider.ConfigProvider (the interface lives in the org.apache.kafka.common.config.provider package, the same package as FileConfigProvider). The file content is the fully qualified class name of your implementation:
com.my.CustomConfigProvider.MyClass
2. Package your compiled code together with the META-INF folder above into a jar. If you are using Maven, put the META-INF folder under src/main/resources so that it ends up at the root of the jar.
3. Put your final jar file, say custom-config-provider-1.0.jar, into the Kafka worker plugin folder, which is set by the plugin path (PLUGIN_PATH) in the Kafka worker config file. The default is /usr/share/java.
4. Upload all the dependency jars to the plugin path as well. Use the META-INF/MANIFEST.MF file inside your jar to set the Class-Path of the dependent jars that your code uses.
5. In the Kafka worker config file, create two additional properties:
CONNECT_CONFIG_PROVIDERS: 'mycustom', // Alias name of your ConfigProvider
CONNECT_CONFIG_PROVIDERS_MYCUSTOM_CLASS:'com.my.CustomConfigProvider.MyClass',
In a plain worker properties file, the equivalent entries are config.providers=mycustom and config.providers.mycustom.class=com.my.CustomConfigProvider.MyClass.
6. Restart the workers.
7. Create or update your connector config through the Kafka Connect REST API with curl (POST /connectors creates a connector, PUT /connectors/<name>/config updates one). In the connector config, you can reference a value inside the ConfigData returned from ConfigProvider.get(path, keys) using syntax like:
database.password=${mycustom:/path/pass/to/get/method:password}
Here the ConfigData wraps a map that contains {password: 123}.
If you are still seeing a ClassNotFoundException, your Class-Path is probably not set up correctly.
Note:
• If you are using AWS ECS/EC2, you need to set the worker config through environment variables.
• The worker config and the connector config are different files.
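To make the reference syntax above (for example ${mycustom:/path/pass/to/get/method:password}) concrete, here is a simplified, Kafka-free model of how a worker could resolve it: the alias picks a registered provider, the provider is asked for the key at the path, and the value is substituted. This is an illustrative sketch, not Kafka's own resolution code; the Provider interface is a hypothetical stand-in for ConfigProvider.get(path, keys), and only the three-part alias:path:key form is handled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified model of ${alias:path:key} resolution.
class PlaceholderResolver {

    // Stand-in for ConfigProvider.get(path, keys): returns key -> value for a path.
    interface Provider {
        Map<String, String> get(String path, Set<String> keys);
    }

    // alias = text before the first ':', key = text after the last ':',
    // path = everything in between (so the path itself may contain ':').
    private static final Pattern REF =
            Pattern.compile("\\$\\{([^}:]+):([^}]*):([^}:]+)\\}");

    private final Map<String, Provider> providers = new HashMap<>();

    // Registers a provider under an alias, like config.providers=<alias>.
    void register(String alias, Provider provider) {
        providers.put(alias, provider);
    }

    // Replaces every resolvable reference in `value`; references with an
    // unknown alias or a missing key are left untouched.
    String resolve(String value) {
        Matcher m = REF.matcher(value);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String replacement = m.group(); // default: keep the reference as-is
            Provider provider = providers.get(m.group(1));
            if (provider != null) {
                String key = m.group(3);
                String found = provider.get(m.group(2), Set.of(key)).get(key);
                if (found != null) {
                    replacement = found;
                }
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        PlaceholderResolver resolver = new PlaceholderResolver();
        // Hypothetical provider returning {password: 123} for any path.
        resolver.register("mycustom", (path, keys) -> Map.of("password", "123"));
        System.out.println(resolver.resolve("${mycustom:/path/pass/to/get/method:password}"));
    }
}
```

Leaving unresolved references untouched keeps the sketch side-effect free; a real worker may instead report an error, so check your Kafka version's behavior before relying on either.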

Local Web IDE not connecting to ES4 service

I have installed SAP Web IDE locally on my machine and am trying to connect to the services below:
https://sapes4.sapdevcenter.com/sap/opu/odata/IWBEP/GWSAMPLE_BASIC/?sap-ds-debug=true
http://services.odata.org/v3/northwind/northwind.svc/
I am getting two errors (attached for reference).
Below is my first destination file:
Description=es4
Type=HTTP
TrustAll=true
Authentication=NoAuthentication
WebIDEUsage=odata_abap
Name=es4
WebIDEEnabled=true
URL=https\://sapes4.sapdevcenter.com\:443
ProxyType=Internet
WebIDESystem=es4
Second destination file:
Description=es4
Type=HTTP
TrustAll=true
Authentication=NoAuthentication
WebIDEUsage=odata_gen
Name=es4
WebIDEEnabled=true
URL=https\://sapes4.sapdevcenter.com
ProxyType=Internet
WebIDESystem=es4
Is any configuration needed in my local Cloud Connector?
First, you shouldn't have separate files for the same destination. Put everything in one file and separate the WebIDEUsage values with commas (make sure there are no spaces). More information can be found in the documentation Hofit has added.
Second, there's no need for a Cloud Connector, as there's no cloud involved here. If you install Web IDE locally, it runs on your local machine, and there's no connectivity to the cloud.
I'm sure you can find all the needed information in both the documentation and SAP community.
I just tried to connect to es4 as you did in the first screenshot, and it is working fine. (The name in the service catalog dropdown should be es4, the name in destination file 1, and not es4123.)
Here is a link to the documentation.

Read many files from Kafka Connect FileStreamSourceTask

I am reading one log file in Kafka and creating a topic. This is successful. To read this file, I edited config/connect-file-source.properties, following Step 7 of the Kafka Quickstart (http://kafka.apache.org/quickstart#quickstart_kafkaconnect).
But now I would like to read many files. In config/connect-file-source.properties, I have set the file variable to a pattern, for instance:
file=/etc/logs/archive.log*
I want to read all the files in the logs directory that match the pattern archive*.log, but this line doesn't work.
What is the best way to read files matching a pattern using config/connect-file-source.properties?
In config/connect-file-source.properties, the source class is FileStreamSource, and it uses FileStreamSourceTask as its task class.
The task reads a single file using FileInputStream, so it cannot open multiple files at once (for example, when given a directory name or a regex pattern).
You should implement your own Source and SourceTask classes, or use an existing connector that supports this feature, such as kafka-connect-spooldir.
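If you write your own SourceTask, one piece it needs is expanding a pattern into concrete files, since FileStreamSource passes the file= value straight to FileInputStream as a literal name. A minimal sketch of that expansion, assuming a glob (not a regex) and using java.nio's Files.newDirectoryStream; the class name GlobFiles is hypothetical, and the Kafka-specific parts (offsets, polling, SourceRecord creation) are deliberately left out. Note that the glob must match the real file names: archive*.log and archive.log* select different files.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper a custom SourceTask could use; not part of FileStreamSource.
class GlobFiles {

    // Returns the regular files in `dir` whose names match `glob`,
    // sorted so that repeated scans see them in a predictable order.
    static List<Path> matching(Path dir, String glob) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) { // skip sub-directories that happen to match
                    files.add(p);
                }
            }
        }
        Collections.sort(files);
        return files;
    }

    public static void main(String[] args) throws IOException {
        // Directory from the question by default; pass another one as the first argument.
        Path dir = Path.of(args.length > 0 ? args[0] : "/etc/logs");
        for (Path p : matching(dir, "archive*.log")) {
            System.out.println(p);
        }
    }
}
```

Each returned path could then be read and tracked with its own offset, which is roughly the job that connectors like kafka-connect-spooldir already do for you.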

How to copy a local directory with files to a remote server in Talend

In Talend (Data Integration), I am trying to copy a local directory to a remote directory, but when I run the job I can only copy the files, not the folders in the directory. Please help me with this job.
In my Talend job I am using local connection and remote connection components:
tFileList -> tFileProperties (to store path and name in one table) -> tMSSqlInput (extracting the path from that table) -> iterate -> tSSH (create the directory if it does not exist) -> finally tFTPPut to connect and copy to the remote directory.
When I store the entries in the table using tFileProperties, files get a non-zero size but folders get a size of zero. Using this condition, I try to create each directory with the tSSH component, but the folders are not created. Please help me.
Do you get an error message?
I believe the output of tMSSqlInput should be a row (Main) link rather than an iterate link. That might be the source of the problem.
tMSSqlInput docs:
tMSSqlInput executes a DB query with a strictly defined order which must correspond to the schema definition. Then it passes on the field list to the next component via a Main row link.