Spark error with spark-cassandra-connector - scala

I'm trying to get Cassandra, Mesos, and Spark working together and would appreciate help with this error. I have also tried Spark 2.2 with connector 1.6.11 (and other combinations), but I cannot figure out why I'm getting it.
Environment:
spark-2.3.0-bin-hadoop2.7.tgz
datastax:spark-cassandra-connector:2.0.7-s_2.11
Scala 2.11
Mesos cluster
Python application with pyspark
Code:
import sys
from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext
sp_conf = SparkConf()
sp_conf.setAppName("spark_test")
sp_conf.setMaster("mesos://192.168.1.10:5050")
sp_conf.set("spark.local.dir", "/home/user/spark-temp")
sp_conf.set("spark.mesos.executor.home", "/home/user/spark")
sp_conf.set("spark.cassandra.connection.host", "192.168.1.51")
sp_conf.set("spark.jars.packages", "datastax:spark-cassandra-connector:2.0.7-s_2.11")
sp_conf.set("spark.mesos.coarse", "True")
sp_conf.set("spark.network.timeout","800")
sc = SparkContext(conf=sp_conf)
sqlContext = SQLContext(sc)
sys.stdout.write("\rGetting rows...")
sys.stdout.flush()
sqlContext.read\
    .format("org.apache.spark.sql.cassandra")\
    .options(table="opt_instruments", keyspace="fxinstrumentsdb")\
    .load().show()
ERROR:
datastax#spark-cassandra-connector added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
found datastax#spark-cassandra-connector;2.0.7-s_2.11 in spark-packages
found com.twitter#jsr166e;1.1.0 in spark-list
found org.joda#joda-convert;1.2 in spark-list
found commons-beanutils#commons-beanutils;1.9.3 in central
found commons-collections#commons-collections;3.2.2 in spark-list
found joda-time#joda-time;2.3 in spark-list
found io.netty#netty-all;4.0.33.Final in central
found org.scala-lang#scala-reflect;2.11.8 in spark-list
:: resolution report :: resolve 1338ms :: artifacts dl 22ms
:: modules in use:
com.twitter#jsr166e;1.1.0 from spark-list in [default]
commons-beanutils#commons-beanutils;1.9.3 from central in [default]
commons-collections#commons-collections;3.2.2 from spark-list in [default]
datastax#spark-cassandra-connector;2.0.7-s_2.11 from spark-packages in [default]
io.netty#netty-all;4.0.33.Final from central in [default]
joda-time#joda-time;2.3 from spark-list in [default]
org.joda#joda-convert;1.2 from spark-list in [default]
org.scala-lang#scala-reflect;2.11.8 from spark-list in [default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 8 | 0 | 0 | 0 || 8 | 0 |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
confs: [default]
0 artifacts copied, 8 already retrieved (0kB/31ms)
2018-04-07 19:28:45 WARN Utils:66 - Your hostname, themachine resolves to a loopback address: 127.0.1.1; using 192.168.1.10 instead (on interface enp1s0)
2018-04-07 19:28:45 WARN Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
2018-04-07 19:28:46 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
2018-04-07 19:28:47 WARN SparkConf:66 - In Spark 1.0 and later spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN).
Warning: MESOS_NATIVE_LIBRARY is deprecated, use MESOS_NATIVE_JAVA_LIBRARY instead. Future releases will not support JNI bindings via MESOS_NATIVE_LIBRARY.
I0407 19:28:51.387593 5128 sched.cpp:232] Version: 1.5.0
I0407 19:28:51.436372 5120 sched.cpp:336] New master detected at master#192.168.1.10:5050
I0407 19:28:51.447155 5120 sched.cpp:351] No credentials provided. Attempting to register without authentication
I0407 19:28:51.464504 5119 sched.cpp:751] Framework registered with 3c2a29b3-d69f-4982-802e-88342d5c42fd-0038
[Stage 0:> (0 + 1) / 1]2018-04-07 19:30:56 WARN TaskSetManager:66 - Lost task 0.0 in stage 0.0 (TID 0, 10.8.0.6, executor 1): java.io.IOException: Exception during preparation of SELECT "tradeunitsprecision", "minimumtrailingstopdistance", "displayprecision", "maximumtrailingstopdistance", "marginrate", "piplocation", "name", "type", "minimumtradesize", "displayname", "maximumpositionsize", "maximumorderunits" FROM "fxinstrumentsdb"."opt_instruments" WHERE token("name") > ? AND token("name") <= ? ALLOW FILTERING: org/apache/spark/sql/catalyst/package$ScalaReflectionLock$
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.createStatement(CassandraTableScanRDD.scala:323)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.com$datastax$spark$connector$rdd$CassandraTableScanRDD$$fetchTokenRange(CassandraTableScanRDD.scala:339)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$17.apply(CassandraTableScanRDD.scala:367)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$17.apply(CassandraTableScanRDD.scala:367)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at com.datastax.spark.connector.util.CountingIterator.hasNext(CountingIterator.scala:12)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/package$ScalaReflectionLock$
at org.apache.spark.sql.catalyst.ReflectionLock$.<init>(ReflectionLock.scala:5)
at org.apache.spark.sql.catalyst.ReflectionLock$.<clinit>(ReflectionLock.scala)
at com.datastax.spark.connector.types.TypeConverter$.<init>(TypeConverter.scala:75)
at com.datastax.spark.connector.types.TypeConverter$.<clinit>(TypeConverter.scala)
at com.datastax.spark.connector.types.BigIntType$.converterToCassandra(PrimitiveColumnType.scala:50)
at com.datastax.spark.connector.types.BigIntType$.converterToCassandra(PrimitiveColumnType.scala:46)
at com.datastax.spark.connector.types.ColumnType$.converterToCassandra(ColumnType.scala:231)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$11.apply(CassandraTableScanRDD.scala:312)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$11.apply(CassandraTableScanRDD.scala:312)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.createStatement(CassandraTableScanRDD.scala:312)
... 26 more
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.catalyst.package$ScalaReflectionLock$
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 44 more

My first guess is that you did not package the Spark SQL dependency.

Hi @panosd, it seems the connector doesn't work with Spark 2.3.0 yet; there is an open PR here. +1 it as well so that it gets attention and is merged as soon as possible.
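Until a 2.3-compatible connector is released, the pragmatic workaround is to keep Spark and the connector in a pairing that is known to work (Spark 2.2.x with the 2.0.x connector) and fail fast otherwise. A minimal sketch of such a guard, assuming connector 2.0.7 supports Spark up to 2.2 per its published compatibility table (check that table before relying on the cutoff below):
from pyspark import SparkConf, SparkContext

# Same connector pin as in the question; the guard below only checks that the
# driver's Spark version is not newer than what this connector line targets.
conf = (SparkConf()
        .setAppName("spark_test")
        .setMaster("mesos://192.168.1.10:5050")
        .set("spark.jars.packages",
             "datastax:spark-cassandra-connector:2.0.7-s_2.11")
        .set("spark.cassandra.connection.host", "192.168.1.51"))

sc = SparkContext(conf=conf)
major, minor = (int(x) for x in sc.version.split(".")[:2])
if (major, minor) > (2, 2):
    raise RuntimeError(
        "spark-cassandra-connector 2.0.7 is not known to work with Spark %s; "
        "downgrade Spark to 2.2.x or wait for a 2.3-compatible connector"
        % sc.version)
This turns the failure into an explicit version-mismatch message on the driver instead of a NoClassDefFoundError deep inside the executor tasks.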

I'm now using HDFS and everything is better and faster; after that step I save the full calculation results to the Cassandra DB for the next part.

Related

Spark shell error when running with --num-executors: YARN application has exited unexpectedly with state FAILED

I have just installed Spark 3.3.1 and am trying to run spark-shell with --num-executors, but the job fails.
This is my first attempt, and I am unable to identify the cause of the failure.
adminn@master:~$ spark-shell --master yarn --num-executors 1
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/01/07 07:13:36 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/01/07 07:13:40 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = yarn, app id = application_1673055142624_0003).
Spark session available as 'spark'.
23/01/07 07:14:17 ERROR YarnClientSchedulerBackend: YARN application has exited unexpectedly with state FAILED! Check the YARN application logs for more details.
23/01/07 07:14:17 ERROR YarnClientSchedulerBackend: Diagnostics message: Uncaught exception: org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:102)
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:110)
at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:558)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:277)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:926)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:925)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:925)
at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:957)
at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
Caused by: java.io.IOException: Failed to connect to localhost/127.0.0.1:34213
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:288)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:218)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:230)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:204)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:202)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:198)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:34213
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:710)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:750)
23/01/07 07:14:17 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to send shutdown message before the AM has registered!
23/01/07 07:14:17 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 3.3.1
/_/
Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 1.8.0_352)
Type in expressions to have them evaluated.
Type :help for more information.
scala>

kafka ignite sink connector

I am trying to replicate data from a Kafka topic to an Apache Ignite cluster. I have added ignite-core.jar along with all the other required files to the plugins directory, and it is loaded by the connector according to the logs. Still, I'm facing this error. Any suggestions would be helpful. Thank you.
INFO Instantiated connector string-ignite-connector with version 6.1.1-ccs of type class org.apache.ignite.stream.kafka.connect.IgniteSinkConnector (org.apache.kafka.connect.runtime.Worker:284)
[2021-05-15 08:52:43,717] INFO Finished creating connector string-ignite-connector (org.apache.kafka.connect.runtime.Worker:310)
[2021-05-15 08:52:43,719] ERROR WorkerConnector{id=string-ignite-connector} Error while starting connector (org.apache.kafka.connect.runtime.WorkerConnector:193)
java.lang.NoClassDefFoundError: org/apache/ignite/internal/util/typedef/internal/A
at org.apache.ignite.stream.kafka.connect.IgniteSinkConnector.start(IgniteSinkConnector.java:55)
at org.apache.kafka.connect.runtime.WorkerConnector.doStart(WorkerConnector.java:185)
at org.apache.kafka.connect.runtime.WorkerConnector.start(WorkerConnector.java:210)
at org.apache.kafka.connect.runtime.WorkerConnector.doTransitionTo(WorkerConnector.java:349)
at org.apache.kafka.connect.runtime.WorkerConnector.doTransitionTo(WorkerConnector.java:332)
at org.apache.kafka.connect.runtime.WorkerConnector.doRun(WorkerConnector.java:140)
at org.apache.kafka.connect.runtime.WorkerConnector.run(WorkerConnector.java:117)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.apache.ignite.internal.util.typedef.internal.A
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at org.apache.kafka.connect.runtime.isolation.PluginClassLoader.loadClass(PluginClassLoader.java:104)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 12 more
[2021-05-15 08:52:43,723] ERROR Failed to create job for myconfig/ignite-connector.properties (org.apache.kafka.connect.cli.ConnectStandalone:110)
[2021-05-15 08:52:43,723] ERROR Stopping after connector error (org.apache.kafka.connect.cli.ConnectStandalone:121)
java.util.concurrent.ExecutionException: org.apache.kafka.connect.errors.ConnectException: Failed to transition connector string-ignite-connector to state STARTED
at org.apache.kafka.connect.util.ConvertingFutureCallback.result(ConvertingFutureCallback.java:115)
at org.apache.kafka.connect.util.ConvertingFutureCallback.get(ConvertingFutureCallback.java:99)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:118)
Caused by: org.apache.kafka.connect.errors.ConnectException: Failed to transition connector string-ignite-connector to state STARTED
at org.apache.kafka.connect.runtime.WorkerConnector.doTransitionTo(WorkerConnector.java:335)
at org.apache.kafka.connect.runtime.WorkerConnector.doRun(WorkerConnector.java:140)
at org.apache.kafka.connect.runtime.WorkerConnector.run(WorkerConnector.java:117)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoClassDefFoundError: org/apache/ignite/internal/util/typedef/internal/A
at org.apache.ignite.stream.kafka.connect.IgniteSinkConnector.start(IgniteSinkConnector.java:55)
at org.apache.kafka.connect.runtime.WorkerConnector.doStart(WorkerConnector.java:185)
at org.apache.kafka.connect.runtime.WorkerConnector.start(WorkerConnector.java:210)
at org.apache.kafka.connect.runtime.WorkerConnector.doTransitionTo(WorkerConnector.java:349)
at org.apache.kafka.connect.runtime.WorkerConnector.doTransitionTo(WorkerConnector.java:332)
... 7 more
Caused by: java.lang.ClassNotFoundException: org.apache.ignite.internal.util.typedef.internal.A
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at org.apache.kafka.connect.runtime.isolation.PluginClassLoader.loadClass(PluginClassLoader.java:104)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 12 more
It looks like you have the ignite-kafka module in your classpath, but not ignite-core and its dependencies, which are also needed. My understanding ends there.
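As a quick sanity check on that theory, you can list which Ignite jars actually sit in the directory that Kafka Connect scans. A minimal sketch, assuming the worker's plugin.path is /opt/connect-plugins/ignite (the path and the jar-name prefixes are illustrative, not taken from your setup):
import os

# Hypothetical plugin.path from the Connect worker properties; replace with yours.
plugin_dir = "/opt/connect-plugins/ignite"

# The sink needs ignite-core (plus its runtime dependencies such as cache-api),
# not just the ignite-kafka module that provides IgniteSinkConnector.
required_prefixes = ["ignite-kafka", "ignite-core", "cache-api"]

jars = [name for name in os.listdir(plugin_dir) if name.endswith(".jar")]
for prefix in required_prefixes:
    matches = [j for j in jars if j.startswith(prefix)]
    if matches:
        print("found:   " + ", ".join(matches))
    else:
        print("MISSING: no " + prefix + "*.jar under " + plugin_dir)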

Getting so many Exceptions while running the kafka connect worker

Getting so many exceptions while running the Kafka Connect worker.
I have set all the worker properties, and all the jar paths look fine.
The exceptions are below:
2020-07-23 18:41:58 WARN Reflections:104 - could not create Dir using jarFile from url file:/kafka/bin/../clients/build/libs/kafka-clients*.jar. skipping.
java.lang.NullPointerException
at java.util.zip.ZipFile.<init>(ZipFile.java:213)
at java.util.zip.ZipFile.<init>(ZipFile.java:155)
at java.util.jar.JarFile.<init>(JarFile.java:166)
at java.util.jar.JarFile.<init>(JarFile.java:130)
at org.reflections.vfs.Vfs$DefaultUrlTypes$1.createDir(Vfs.java:216)
at org.reflections.vfs.Vfs.fromURL(Vfs.java:99)
at org.reflections.vfs.Vfs.fromURL(Vfs.java:91)
at org.reflections.Reflections.scan(Reflections.java:240)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader$InternalReflections.scan(DelegatingClassLoader.java:373)
at org.reflections.Reflections$1.run(Reflections.java:198)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2020-07-23 18:41:58 WARN Reflections:377 - could not create Vfs.Dir from url. ignoring the exception and continuing
org.reflections.ReflectionsException: Could not open url connection at org.reflections.vfs.JarInputDir$1$1.<init>(JarInputDir.java:37)
at org.reflections.vfs.JarInputDir$1.iterator(JarInputDir.java:33)
at org.reflections.Reflections.scan(Reflections.java:243)
at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader$InternalReflections.scan(DelegatingClassLoader.java:373)
at org.reflections.Reflections$1.run(Reflections.java:198)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: /kafka/bin/../clients/build/libs/kafka-clients*.jar (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
If you are seeing WARN Reflections, then this is a warning, not an error, and it is safe to ignore.
You can edit the log4j.properties file to silence the warnings if you want (for example, by raising the org.reflections logger level to ERROR). With the Confluent Docker images, this is done via the CONNECT_LOG4J_LOGGERS variable.
Thanks guys, yes, these are all warnings, and with a newer version of the Kafka client library I am no longer seeing them.

SocketTimeoutException when trying to run PySpark app from PyCharm

This is my first Python app that I'm trying to run on Spark. I have had no problems before running Scala apps on a server or in standalone mode.
I start pyspark in another command window as follows:
C:\Users\jesaremi>conda activate py3.6
(py3.6) C:\Users\jesaremi>pyspark --master local[1]
Python 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 18:50:55) [MSC v.1915 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
2019-01-23 09:05:45 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/__ / .__/\_,_/_/ /_/\_\ version 2.4.0
/_/
Using Python version 3.6.8 (default, Dec 30 2018 18:50:55)
SparkSession available as 'spark'.
And here's my Python script, which was copied over from somewhere else:
from pyspark import SparkContext, SparkConf

conf = SparkConf().setAppName('MyFirstStandaloneApp')
sc = SparkContext(conf=conf)
text_file = sc.textFile("./shakespeare.txt")
counts = text_file.flatMap(lambda line: line.split(" ")) \
    .map(lambda word: (word, 1)) \
    .reduceByKey(lambda a, b: a + b)
print("Number of elements: " + str(counts.count()))
counts.saveAsTextFile("./shakespeareWordCount")
The project interpreter in my PyCharm is set to a Python 3.6 environment that I created myself, and it contains the important packages such as pyspark and py4j.
The result of the run is the following:
C:\Users\jesaremi\AppData\Local\Continuum\anaconda3\envs\py3.6\python.exe D:/Projects/HelloSpark/Main.py
2019-01-23 09:17:27 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
2019-01-23 09:17:28 WARN Utils:66 - Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
[Stage 0:> (0 + 1) / 1]Traceback (most recent call last):
File "C:\Users\jesaremi\AppData\Local\Continuum\anaconda3\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "C:\Users\jesaremi\AppData\Local\Continuum\anaconda3\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "D:\spark-2.4.0-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\worker.py", line 25, in <module>
ModuleNotFoundError: No module named 'resource'
2019-01-23 09:17:40 ERROR Executor:91 - Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.spark.SparkException: Python worker failed to connect back.
at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:170)
at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:97)
at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:117)
at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:108)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.api.python.PairwiseRDD.compute(PythonRDD.scala:103)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketTimeoutException: Accept timed out
at java.net.DualStackPlainSocketImpl.waitForNewConnection(Native Method)
at java.net.DualStackPlainSocketImpl.socketAccept(DualStackPlainSocketImpl.java:135)
at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:199)
at java.net.ServerSocket.implAccept(ServerSocket.java:545)
at java.net.ServerSocket.accept(ServerSocket.java:513)
at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:164)
... 18 more
thanks
Apparently PySpark 2.4.0 is broken on Windows (its Python worker imports the Unix-only 'resource' module, which matches the ModuleNotFoundError in your traceback), and you need to downgrade to 2.3.2.
See this for more details:
No module named 'resource' installing Apache Spark on Windows
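If you go the downgrade route, you can pin the run to a Spark 2.3.2 distribution from inside the script before pyspark is imported. A minimal sketch, assuming findspark is installed and the path below points at an extracted spark-2.3.2-bin-hadoop2.7 directory (both the package and the path are assumptions, not taken from the question):
import os
import findspark

# Hypothetical location of the Spark 2.3.2 distribution; replace with your own.
os.environ["SPARK_HOME"] = r"D:\spark-2.3.2-bin-hadoop2.7"
findspark.init()  # adds that distribution's pyspark and py4j to sys.path

from pyspark import SparkConf, SparkContext

sc = SparkContext(conf=SparkConf().setAppName("MyFirstStandaloneApp"))
print(sc.version)  # should report 2.3.2 if the downgrade took effect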

Apache Toree and Spark Scala Not Working in Jupyter

I'm having problems running Scala Spark on Jupyter. Below is my error message when I load Apache Toree - Scala notebook in jupyter.
root@ubuntu-2gb-sgp1-01:~# jupyter notebook --ip 0.0.0.0 --port 8888
[I 03:14:54.281 NotebookApp] Serving notebooks from local directory: /root
[I 03:14:54.281 NotebookApp] 0 active kernels
[I 03:14:54.281 NotebookApp] The Jupyter Notebook is running at: http://0.0.0.0:8888/
[I 03:14:54.281 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[W 03:14:54.282 NotebookApp] No web browser found: could not locate runnable browser.
[I 03:15:09.976 NotebookApp] 302 GET / (61.6.68.44) 1.21ms
[I 03:15:15.924 NotebookApp] Creating new notebook in
[W 03:15:16.592 NotebookApp] 404 GET /nbextensions/widgets/notebook/js/extension.js?v=20161120031454 (61.6.68.44) 15.49ms referer=http://188.166.235.21:8888/notebooks/Untitled2.ipynb?kernel_name=apache_toree_scala
[I 03:15:16.677 NotebookApp] Kernel started: 94a63354-d294-4de7-a12c-2e05905e0c45
Starting Spark Kernel with SPARK_HOME=/usr/local/spark
16/11/20 03:15:18 [INFO] o.a.t.Main$$anon$1 - Kernel version: 0.1.0.dev8-incubating-SNAPSHOT
16/11/20 03:15:18 [INFO] o.a.t.Main$$anon$1 - Scala version: Some(2.10.4)
16/11/20 03:15:18 [INFO] o.a.t.Main$$anon$1 - ZeroMQ (JeroMQ) version: 3.2.2
16/11/20 03:15:18 [INFO] o.a.t.Main$$anon$1 - Initializing internal actor system
Exception in thread "main" java.lang.NoSuchMethodError: scala.collection.immutable.HashSet$.empty()Lscala/collection/immutable/HashSet;
at akka.actor.ActorCell$.<init>(ActorCell.scala:336)
at akka.actor.ActorCell$.<clinit>(ActorCell.scala)
at akka.actor.RootActorPath.$div(ActorPath.scala:185)
at akka.actor.LocalActorRefProvider.<init>(ActorRefProvider.scala:465)
at akka.actor.LocalActorRefProvider.<init>(ActorRefProvider.scala:453)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$2.apply(DynamicAccess.scala:78)
at scala.util.Try$.apply(Try.scala:192)
at akka.actor.ReflectiveDynamicAccess.createInstanceFor(DynamicAccess.scala:73)
at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$3.apply(DynamicAccess.scala:84)
at akka.actor.ReflectiveDynamicAccess$$anonfun$createInstanceFor$3.apply(DynamicAccess.scala:84)
at scala.util.Success.flatMap(Try.scala:231)
at akka.actor.ReflectiveDynamicAccess.createInstanceFor(DynamicAccess.scala:84)
at akka.actor.ActorSystemImpl.liftedTree1$1(ActorSystem.scala:585)
at akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:578)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:142)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:109)
at org.apache.toree.boot.layer.StandardBareInitialization$class.createActorSystem(BareInitialization.scala:71)
at org.apache.toree.Main$$anon$1.createActorSystem(Main.scala:35)
at org.apache.toree.boot.layer.StandardBareInitialization$class.initializeBare(BareInitialization.scala:60)
at org.apache.toree.Main$$anon$1.initializeBare(Main.scala:35)
at org.apache.toree.boot.KernelBootstrap.initialize(KernelBootstrap.scala:72)
at org.apache.toree.Main$delayedInit$body.apply(Main.scala:40)
at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.App$class.main(App.scala:76)
at org.apache.toree.Main$.main(Main.scala:24)
at org.apache.toree.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
[W 03:15:26.738 NotebookApp] Timeout waiting for kernel_info reply from 94a63354-d294-4de7-a12c-2e05905e0c45
When running Scala shell, this is my output logs
root@ubuntu-2gb-sgp1-01:~# spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
16/11/20 03:17:11 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/11/20 03:17:12 WARN Utils: Your hostname, ubuntu-2gb-sgp1-01 resolves to a loopback address: 127.0.1.1; using 10.15.0.5 instead (on interface eth0)
16/11/20 03:17:12 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/11/20 03:17:13 WARN SparkContext: Use an existing SparkContext, some configuration may not take effect.
Spark context Web UI available at http://10.15.0.5:4040
Spark context available as 'sc' (master = local[*], app id = local-1479611833426).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.0.2
/_/
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_111)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
This problem was highlighted before in JIRA: https://issues.apache.org/jira/browse/TOREE-336. However, I'm still unable to get it working for some reason.
I followed the instructions listed on the official site:
https://toree.apache.org/documentation/user/quick-start
This is my path
root@ubuntu-2gb-sgp1-01:~# echo $PATH
/root/bin:/root/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/spark:/usr/local/spark/bin
Please note I didn't install Scala separately, as it comes with Spark.
Thanks
We haven't used Spark 2.0 with Scala 2.11 and notebooks in production yet.
The root cause of your error is a compatibility issue: according to the Toree description on GitHub, the latest supported Scala version is 2.10.4, while you have 2.11.8.
Try downgrading to Scala 2.10 unless using only 2.11 is a hard production requirement.