kafka-connect file-pulse connector standalone could not start (ClassNotFoundException on OffsetManager) - apache-kafka

I want to use the File Pulse connector to load XML files into Kafka.
Below is my environment:
Win10 WSL, with Ubuntu installed
downloaded Confluent Platform 5.5.1 (see "https://www.confluent.io/download/"), unpacked
downloaded the zip file for version 1.5.2 from GitHub (https://github.com/streamthoughts/kafka-connect-file-pulse/releases), unzipped
modified "connect-standalone.properties" under the Confluent path (etc/kafka/connect-standalone.properties) to include the path "/home/min/streamthoughts-kafka-connect-file-pulse-1.5.2/lib"
I did not create any topic.
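(With the broker default auto.create.topics.enable=true this is not strictly required; if auto-creation were disabled, the quickstart topic could be created manually, e.g.:
$ kafka-topics --create --bootstrap-server localhost:9092 \
    --topic connect-file-pulse-quickstart-csv --partitions 1 --replication-factor 1
)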
I started ZooKeeper and Kafka, then tried to start Kafka Connect standalone as below:
$ zookeeper-server-start etc/kafka/zookeeper.properties
$ kafka-server-start etc/kafka/server.properties
$ connect-standalone \
etc/kafka/connect-standalone.properties \
/home/min/streamthoughts-kafka-connect-file-pulse-1.5.2/etc/quickstart-connect-file-pulse-csv.properties
but it fails; see the log below:
[2020-09-08 15:57:45,522] INFO Loading plugin from: /home/min/streamthoughts-kafka-connect-file-pulse-1.5.2/lib/kafka-connect-file-pulse-expression-1.5.2.jar (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:239)
[2020-09-08 15:57:45,541] INFO Registered loader: PluginClassLoader{pluginLocation=file:/home/min/streamthoughts-kafka-connect-file-pulse-1.5.2/lib/kafka-connect-file-pulse-expression-1.5.2.jar} (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:262)
[2020-09-08 15:57:45,541] INFO Loading plugin from: /home/min/streamthoughts-kafka-connect-file-pulse-1.5.2/lib/kafka-connect-file-pulse-filters-1.5.2.jar (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:239)
[2020-09-08 15:57:45,553] INFO Registered loader: PluginClassLoader{pluginLocation=file:/home/min/streamthoughts-kafka-connect-file-pulse-1.5.2/lib/kafka-connect-file-pulse-filters-1.5.2.jar} (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:262)
[2020-09-08 15:57:45,554] INFO Loading plugin from: /home/min/streamthoughts-kafka-connect-file-pulse-1.5.2/lib/kafka-connect-file-pulse-plugin-1.5.2.jar (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader:239)
[2020-09-08 15:57:45,575] ERROR Stopping due to error (org.apache.kafka.connect.cli.ConnectStandalone:130)
java.lang.NoClassDefFoundError: io/streamthoughts/kafka/connect/filepulse/offset/OffsetManager
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)
at java.lang.Class.getConstructor0(Class.java:3075)
at java.lang.Class.newInstance(Class.java:412)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.versionFor(DelegatingClassLoader.java:385)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.getPluginDesc(DelegatingClassLoader.java:355)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.scanPluginPath(DelegatingClassLoader.java:328)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.scanUrlsAndAddPlugins(DelegatingClassLoader.java:261)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.registerPlugin(DelegatingClassLoader.java:253)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.initPluginLoader(DelegatingClassLoader.java:222)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.initLoaders(DelegatingClassLoader.java:199)
at org.apache.kafka.connect.runtime.isolation.Plugins.<init>(Plugins.java:60)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:79)
Caused by: java.lang.ClassNotFoundException: io.streamthoughts.kafka.connect.filepulse.offset.OffsetManager
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at org.apache.kafka.connect.runtime.isolation.PluginClassLoader.loadClass(PluginClassLoader.java:104)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 13 more
Question:
Can you please help me figure out the fix?
P.S. Below is what I tried for the JDBC connector, just to see whether other connectors are working. It ran without exceptions.
connect-standalone \
etc/kafka/connect-standalone.properties \
/home/min/confluent-5.5.1/etc/kafka-connect-jdbc/sink-quickstart-sqlite.properties
P.P.S. Below is the content of the File Pulse connector properties file (I did not make changes):
connector.class=io.streamthoughts.kafka.connect.filepulse.source.FilePulseSourceConnector
topic=connect-file-pulse-quickstart-csv
tasks.max=1
filters=ParseDelimitedRow
# Delimited Row filter
filters.ParseDelimitedRow.extractColumnName=headers
filters.ParseDelimitedRow.trimColumn=true
filters.ParseDelimitedRow.type=io.streamthoughts.kafka.connect.filepulse.filter.DelimitedRowFilter
skip.headers=1
task.reader.class=io.streamthoughts.kafka.connect.filepulse.reader.RowFileInputReader
# File scanning
fs.cleanup.policy.class=io.streamthoughts.kafka.connect.filepulse.clean.LogCleanupPolicy
fs.scanner.class=io.streamthoughts.kafka.connect.filepulse.scanner.local.LocalFSDirectoryWalker
fs.scan.directory.path=/tmp/kafka-connect/examples/
fs.scan.interval.ms=10000
# Internal Reporting
internal.kafka.reporter.bootstrap.servers=localhost:9092
internal.kafka.reporter.id=connect-file-pulse-quickstart-csv
internal.kafka.reporter.topic=connect-file-pulse-status
# Track file by name and hash
offset.strategy=name+hash
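For reference, once the connector runs, the quickstart can be exercised by dropping a delimited file into the scanned directory and reading the output topic; a sketch (the file name is made up, and ';' as the separator is my assumption about the filter's default):
$ mkdir -p /tmp/kafka-connect/examples
$ printf 'id;name\n1;hello\n2;world\n' > /tmp/kafka-connect/examples/sample.csv
$ kafka-console-consumer --bootstrap-server localhost:9092 \
    --topic connect-file-pulse-quickstart-csv --from-beginning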
P.P.P.S. Below is the key info from the connect-standalone.properties file (I added /home/min/streamthoughts-kafka-connect-file-pulse-1.5.2/lib):
# Set to a list of filesystem paths separated by commas (,) to enable class loading isolation for plugins
# (connectors, converters, transformations). The list should consist of top level directories that include
# any combination of:
# a) directories immediately containing jars with plugins and their dependencies
# b) uber-jars with plugins and their dependencies
# c) directories immediately containing the package directory structure of classes of plugins and their dependencies
# Note: symlinks will be followed to discover dependencies or plugins.
# Examples:
# plugin.path=/usr/local/share/java,/usr/local/share/kafka/plugins,/opt/connectors,
plugin.path=/usr/share/java,/home/min/streamthoughts-kafka-connect-file-pulse-1.5.2/lib
# plugin.path=/usr/share/java,/home/min/confluent-5.5.1/share/java/kafka-connect-jdbc

I would not call this a perfect answer, but it is the way I made it work.
I found that the plugin.path property for the file-pulse connector has to be the parent folder of the folder that contains the JAR files, not the JAR folder itself.
So the below worked:
$ cat etc/kafka/connect-standalone.properties
key info extracted:
plugin.path=/usr/share/java,/home/min/streamthoughts-kafka-connect-file-pulse-1.5.2
# plugin.path=/usr/share/java,/home/min/confluent-5.5.1/share/java/kafka-connect-jdbc
Before, it was like below and threw the exception I mentioned:
plugin.path=/usr/share/java,/home/min/streamthoughts-kafka-connect-file-pulse-1.5.2/lib
# plugin.path=/usr/share/java,/home/min/confluent-5.5.1/share/java/kafka-connect-jdbc
Maybe I did not understand the instructions below from the connect-standalone.properties file, or maybe the file-pulse connector does not follow the convention stated there; for the JDBC connector, at least, the path certainly is the directory containing the JAR files. My guess is that when plugin.path points directly at lib/, Connect registers each JAR as its own isolated plugin (as the log above shows, one loader per JAR), so the plugin JAR cannot see the sibling JAR that provides OffsetManager; pointing at the parent folder makes lib/ a single plugin directory whose JARs are loaded together.
# a) directories immediately containing jars with plugins and their dependencies
# b) uber-jars with plugins and their dependencies
# c) directories immediately containing the package directory structure of classes of plugins and their dependencies
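Once Connect starts cleanly, a quick sanity check that the plugin was actually discovered is to query the Connect REST API (port 8083 is the standalone default):
$ curl -s localhost:8083/connector-plugins | grep -i filepulse
It should list io.streamthoughts.kafka.connect.filepulse.source.FilePulseSourceConnector among the source connectors.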

Related

Unable to run Scala Play app throwing com.typesafe.config.ConfigException$Missing: No configuration setting found for key

I am trying to run a Scala Play app and got this exception:
Caused by: com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'memo'
at com.typesafe.config.impl.SimpleConfig.findKeyOrNull(SimpleConfig.java:152) ~[config-1.3.1.jar:na]
at com.typesafe.config.impl.SimpleConfig.findOrNull(SimpleConfig.java:170) ~[config-1.3.1.jar:na]
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:184) ~[config-1.3.1.jar:na]
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:189) ~[config-1.3.1.jar:na]
at com.typesafe.config.impl.SimpleConfig.getObject(SimpleConfig.java:264) ~[config-1.3.1.jar:na]
at com.typesafe.config.impl.SimpleConfig.getConfig(SimpleConfig.java:270) ~[config-1.3.1.jar:na]
at com.typesafe.config.impl.SimpleConfig.getConfig(SimpleConfig.java:37) ~[config-1.3.1.jar:na]
at lila.common.PlayApp$.loadConfig(PlayApp.scala:24) ~[na:na]
at lila.memo.Env$.lila$memo$Env$$$anonfun$1(Env.scala:28) ~[na:na]
at lila.common.Chronometer$.sync(Chronometer.scala:56) ~[na:na]
I passed an external reference to the configuration file, such as:
run -Dconfig.resource=conf/base.conf
I am new to Scala. I tried to find config-1.3.1.jar but was unable to. Can you please suggest how to overcome this situation?
(In base.conf, the configuration for 'memo' is present.)
Try without conf/: run -Dconfig.resource=base.conf
From Play 2.6 docs
Using -Dconfig.resource
This will search for an alternative configuration file in the application classpath (you usually provide these alternative configuration files into your application conf/ directory before packaging). Play will look into conf/ so you don’t have to add conf/.
$ /path/to/bin/ -Dconfig.resource=prod.conf
Using -Dconfig.file
You can also specify another local configuration file not packaged into the application artifacts:
$ /path/to/bin/ -Dconfig.file=/opt/conf/prod.conf
If you are configuring an sbt task from IntelliJ you must quote the entire command.
"run -Dconfig.resource=prod.conf"

Spark executor is throwing error "java.lang.ClassNotFoundException: oracle.jdbc.OracleDriver"

I am trying to import a table from my Oracle database using Spark, and I am using Scala to import the table.
My JDBC driver is ojdbc7.jar, and it is added to both spark.driver.extraClassPath and spark.executor.extraClassPath in the configuration file:
spark.driver.extraClassPath :/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/home/hadoop/ojdbc7.jar
spark.driver.extraLibraryPath /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native
spark.executor.extraClassPath :/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/home/hadoop/ojdbc7.jar
spark.executor.extraLibraryPath /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native
I can successfully import the table and print its schema, but performing any operation like count() or show() throws the error below:
Caused by: java.lang.ClassNotFoundException: oracle.jdbc.OracleDriver
at java.lang.ClassLoader.findClass(ClassLoader.java:530)
at org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.scala:26)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:34)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:30)
at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:77)
... 21 more
This error occurred because Spark was not able to locate ojdbc7.jar on every core node. Placing this jar in a shared location like /usr/lib/spark/jars resolves the issue.
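Concretely, a sketch of both options (assuming the jar is at /home/hadoop/ojdbc7.jar, as in the config above; your-app.jar is a placeholder): copy it to the shared jars directory on every node, or ship it with the job via spark-submit:
$ sudo cp /home/hadoop/ojdbc7.jar /usr/lib/spark/jars/
$ spark-submit --jars /home/hadoop/ojdbc7.jar \
    --driver-class-path /home/hadoop/ojdbc7.jar \
    your-app.jar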
You can also do a few other things, including adding the jar file's full path as an artifact under the dependencies of the Spark section in the interpreter settings.
If you just want %jdbc to work, update the jdbc section under the interpreter: add the jar file's full path as an artifact under the dependencies, and also update default.driver, default.url, default.user, and default.password accordingly.
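For example, the jdbc interpreter settings would look roughly like this (the values are illustrative, not from the question):
default.driver=oracle.jdbc.OracleDriver
default.url=jdbc:oracle:thin:@//dbhost:1521/ORCL
default.user=myuser
default.password=mypassword
plus /home/hadoop/ojdbc7.jar added as an artifact under the interpreter's dependencies.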

Karaf 3.0.x config:update command does not create .cfg file in /etc

I am using Karaf 3.0.1 with my bundle (https://github.com/johanlelan/camel-cxfrs-blueprint-example). I want to manage properties at runtime, but I see that config:update does not create a file in /etc. Why?
<cm:property-placeholder persistent-id="org.apache.camel.examples.cxfrs.blueprint"
                         update-strategy="reload">
  <!-- list some properties for this test -->
  <cm:default-properties>
    <cm:property name="cxf.application.in"
                 value="cxfrs:bean:rest.endpoint?throwExceptionOnFailure=false&amp;bindingStyle=SimpleConsumer&amp;loggingFeatureEnabled=true"/>
    <cm:property name="common.tenant.in" value="direct-vm:common.tenant.in"/>
    <cm:property name="common.authentication.in" value="direct-vm:common.authentication.in"/>
    <cm:property name="application.put.in" value="direct-vm:application.putById"/>
    <cm:property name="application.post.in"
                 value="direct-vm:application.postApplications"/>
    <cm:property name="log.trace.level" value="INFO"/>
  </cm:default-properties>
</cm:property-placeholder>
In Karaf, I try to modify an endpoint URL:
karaf#root()> config:edit org.apache.camel.examples.cxfrs.blueprint
karaf#root()> config:property-set common.tenant.in direct-vm:test
karaf#root()> config:property-list
service.pid = org.apache.camel.examples.cxfrs.blueprint
common.tenant.in = direct-vm:test
felix.fileinstall.filename = file:/F:/travail/servers/karaf-lan/etc/org.apache.camel.examples.cxfrs.blueprint.cfg
karaf#root()> config:update
karaf#root()>
To be clear: my bundle is updated after config:update, but no file exists in /etc... I think this worked in Karaf 2.3.5.
Configurations are persisted by the ConfigurationAdmin service. If you are using Karaf, it uses the implementation from Felix ConfigAdmin [1]. By default Karaf configures ConfigAdmin to store files in its local bundle storage area under /data, but that can be changed by editing the felix.cm.dir property.
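For example, a sketch of overriding it (the property name is from the Felix ConfigAdmin docs linked below; pick a directory the Karaf user can write to):
# in etc/config.properties
felix.cm.dir=/path/of/your/choice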
Also, the support for the .cfg files comes from Felix FileInstall [2].
[1] http://felix.apache.org/documentation/subprojects/apache-felix-config-admin.html
[2] http://felix.apache.org/site/apache-felix-file-install.html
This is a known issue in Karaf 3.0.1.
You can use Apache Karaf 3.0.2, in which this bug is fixed.

Scalding Tutorial with HDFS: Data is missing from one or more paths in: List(tutorial/data/hello.txt)

After configuring ssh and rsync, when I try to run the Scalding tutorial (https://github.com/Cascading/scalding-tutorial/) with the command:
$ scripts/scald.rb --hdfs tutorial/Tutorial0.scala
I get the following error:
com.twitter.scalding.InvalidSourceException: [com.twitter.scalding.TextLineWrappedArray(tutorial/data/hello.txt)] Data is missing from one or more paths in: List(tutorial/data/hello.txt)
This error happens even though the file tutorial/data/hello.txt really exists.
How do I fix this?
Stdout:
$ scripts/scald.rb --hdfs tutorial/Tutorial0.scala
scripts/scald.rb:194: warning: already initialized constant SCALA_LIB_DIR
dkondratev@hadoop-n002.maxus.lan's password:
dkondratev@hadoop-n002.maxus.lan's password:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/phoenix/phoenix-4.0.0.2.1.2.1-471-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
14/07/07 19:05:45 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
14/07/07 19:05:45 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
Exception in thread "main" java.lang.Throwable: GUESS: Data is missing from the path you provided.
If you know what exactly caused this error, please consider contributing to GitHub via following link.
https://github.com/twitter/scalding/wiki/Common-Exceptions-and-possible-reasons#comtwitterscaldinginvalidsourceexception
at com.twitter.scalding.Tool$.main(Tool.scala:132)
at com.twitter.scalding.Tool.main(Tool.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
I think the problem is that you are telling Scalding to run on HDFS, but the file you are providing as input is in your local file system, not in HDFS. Before running the example, upload the file to your HDFS:
hadoop fs -mkdir tutorial
hadoop fs -mkdir tutorial/data
hadoop fs -put tutorial/data/hello.txt tutorial/data/hello.txt
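You can then confirm the upload before re-running the tutorial:
hadoop fs -ls tutorial/data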
Try packing your job into a fat jar using the Maven Shade plugin, and then run your Scalding job via the hadoop command:
hadoop jar your-uber.jar com.twitter.scalding.Tool bar.foo.MyClassJob --hdfs --input ... --output ...

mule service start script raises exception with log file

I'm trying to rewrite the Mule start script so it works as a service on RHEL.
Currently I have it mostly done: it starts, and most of the log files are being written where I want them.
But there is a file literally named .log, and I don't know what it is for or where its name and path are configured.
That file is adding the following nasty lines to mule_ee.log upon startup:
log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: .log (Permission denied)
at java.io.FileOutputStream.openAppend(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:207)
at java.io.FileOutputStream.<init>(FileOutputStream.java:131)
at org.apache.log4j.FileAppender.setFile(FileAppender.java:294)
at org.apache.log4j.FileAppender.activateOptions(FileAppender.java:165)
at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
at org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
at org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:809)
at org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:735)
at org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:615)
at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:502)
at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:547)
at org.mule.module.launcher.log4j.ApplicationAwareRepositorySelector.configureFrom(ApplicationAwareRepositorySelector.java:166)
at org.mule.module.launcher.log4j.ApplicationAwareRepositorySelector.getLoggerRepository(ApplicationAwareRepositorySelector.java:95)
at org.apache.log4j.LogManager.getLoggerRepository(LogManager.java:208)
at org.apache.log4j.LogManager.getLogger(LogManager.java:228)
at org.mule.module.logging.MuleLoggerFactory.getLogger(MuleLoggerFactory.java:77)
at org.mule.module.logging.DispatchingLogger.getLogger(DispatchingLogger.java:419)
at org.mule.module.logging.DispatchingLogger.isInfoEnabled(DispatchingLogger.java:191)
at org.apache.commons.logging.impl.SLF4JLog.isInfoEnabled(SLF4JLog.java:78)
at org.mule.module.launcher.application.DefaultMuleApplication.init(DefaultMuleApplication.java:188)
at org.mule.module.launcher.application.PriviledgedMuleApplication.init(PriviledgedMuleApplication.java:46)
at org.mule.module.launcher.application.ApplicationWrapper.init(ApplicationWrapper.java:64)
at org.mule.module.launcher.DefaultMuleDeployer.deploy(DefaultMuleDeployer.java:46)
at org.mule.module.launcher.DeploymentService.guardedDeploy(DeploymentService.java:398)
at org.mule.module.launcher.DeploymentService.start(DeploymentService.java:181)
at org.mule.module.launcher.MuleContainer.start(MuleContainer.java:157)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.mule.module.reboot.MuleContainerWrapper.start(MuleContainerWrapper.java:56)
at org.tanukisoftware.wrapper.WrapperManager$12.run(WrapperManager.java:3925)
What is that .log file for? Where is the config file to set it up so it is written to a place where the mule user has permission to write?
It seems that log4j can't open that file because of OS permissions. You can chmod to add permissions in your MULE_HOME dir. Also review your log4j config to see why it is trying to write a .log file.
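For example (a sketch; it assumes the service runs as a 'mule' user and that MULE_HOME is set):
$ sudo chown -R mule:mule "$MULE_HOME"
$ sudo chmod -R u+rwX "$MULE_HOME/logs"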
The .log file seems to come from the 00_mmc-agent app.
The problem, therefore, was in the log4j.properties file located in ./apps/00_mmc-agent/classes; that file has the following appender configured: log4j.appender.file.File=${app.name}.log
The ${app.name} variable seems not to be properly set (not even by the default, unmodified Mule start script), hence the bare .log file name.
In ./apps/mmc/webapps/mmc/WEB-INF/classes/log4j.properties the appender is configured like this: log4j.appender.R.File=${mule.home}/logs/mmc-console-app.log
So, to fix the startup error, I modified the appender in the log4j.properties file located in ./apps/00_mmc-agent/classes to use this path:
log4j.appender.file.File=${mule.home}/logs/00_mmc-agent.log
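After that change, restarting the service should produce the agent log under logs/ instead of the unwritable .log; a quick check (assuming the standard wrapper script):
$ $MULE_HOME/bin/mule restart
$ ls $MULE_HOME/logs/00_mmc-agent.log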