I am trying to access streaming tweets using Spark Streaming.
This is the software configuration:
Ubuntu 14.04.2 LTS
scala -version
Scala code runner version 2.11.7 -- Copyright 2002-2013, LAMP/EPFL
spark-submit --version
Spark version 1.6.0
Following is the code:

import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter.TwitterUtils
import Utilities._ // setupTwitter() and setupLogging() come from Utilities.scala below

object PrintTweets {
  def main(args: Array[String]) {
    // Configure Twitter credentials using twitter.txt
    setupTwitter()
    // Set up a Spark streaming context named "PrintTweets" that runs locally using
    // all CPU cores and one-second batches of data
    val ssc = new StreamingContext("local[*]", "PrintTweets", Seconds(1))
    // Get rid of log spam (should be called after the context is set up)
    setupLogging()
    // Create a DStream from Twitter using our streaming context
    val tweets = TwitterUtils.createStream(ssc, None)
    // Now extract the text of each status update using map()
    val statuses = tweets.map(status => status.getText())
    // Print out the first ten
    statuses.print()
    // Kick it all off
    ssc.start()
    ssc.awaitTermination()
  }
}
Utilities.scala:

object Utilities {
  /** Makes sure only ERROR messages get logged, to avoid log spam. */
  def setupLogging() = {
    import org.apache.log4j.{Level, Logger}
    val rootLogger = Logger.getRootLogger()
    rootLogger.setLevel(Level.ERROR)
  }

  /** Configures Twitter service credentials using twitter.txt in the main workspace directory. */
  def setupTwitter() = {
    import scala.io.Source
    for (line <- Source.fromFile("./data/twitter.txt").getLines) {
      val fields = line.split(" ")
      if (fields.length == 2) {
        System.setProperty("twitter4j.oauth." + fields(0), fields(1))
      }
    }
  }
}
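For reference, setupTwitter() expects each line of ./data/twitter.txt to hold a twitter4j OAuth property name and its value, separated by a single space. A hypothetical example with placeholder values (not real credentials):

consumerKey YOUR_CONSUMER_KEY
consumerSecret YOUR_CONSUMER_SECRET
accessToken YOUR_ACCESS_TOKEN
accessTokenSecret YOUR_ACCESS_TOKEN_SECRET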
Issues:
Since it needs the twitter4j library, I have added
twitter4j-core-4.0.4 and twitter4j-stream-4.0.4 to the Eclipse build path as external JARs.
Then I ran the program. It didn't throw any error, but no tweets appeared in the console; the batches were empty.
So I read some forums and downgraded twitter4j to 3.0.3. Also, in Eclipse I chose the Scala 2.10 library container in the Build Path window.
After that I got a java.lang.NoSuchMethodError at runtime:
16/05/14 11:46:01 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.NoSuchMethodError: twitter4j.TwitterStream.addListener(Ltwitter4j/StreamListener;)V
at org.apache.spark.streaming.twitter.TwitterReceiver.onStart(TwitterInputDStream.scala:72)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:148)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:130)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:575)
at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:565)
at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1992)
at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1992)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Please help me resolve this. Initially I installed Spark built with Scala 2.11. Is that the problem? Do I need to uninstall everything and reinstall Scala 2.10 and then the pre-compiled Spark package?
Or, apart from Scala 2.11, do I need to have Scala 2.10 on my system as well?
The above exception seems to be caused by an incompatibility between Spark 1.6.0 and twitter4j 3.0.3.
The twitter4j.TwitterStream that is used in the onStart method of org.apache.spark.streaming.twitter.TwitterReceiver is expected to have an addListener method taking an instance of twitter4j.StreamListener.
twitter4j 3.0.3 has no twitter4j.TwitterStream.addListener(StreamListener) method; instead it has a few other addListener overloads that take subclasses of StreamListener.
twitter4j 4.0.4 has the desired method, which is why no error occurs with that library. So changing to twitter4j 3.0.3 will not solve the problem; the problem is somewhere else.
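To illustrate the binary mismatch, here is a minimal sketch (assuming the twitter4j 4.0.x API; StatusAdapter is twitter4j's no-op listener base class):

import twitter4j.{Status, StatusAdapter, TwitterStream, TwitterStreamFactory}

object ListenerSketch {
  def main(args: Array[String]) {
    val stream: TwitterStream = new TwitterStreamFactory().getInstance()
    val listener = new StatusAdapter {
      override def onStatus(status: Status): Unit = println(status.getText)
    }
    // Compiled against twitter4j 4.0.x, this call is linked to the descriptor
    // addListener(Ltwitter4j/StreamListener;)V. Run against 3.0.3, where only
    // narrower overloads exist, the JVM cannot find that exact descriptor and
    // throws the NoSuchMethodError shown in the stack trace above.
    stream.addListener(listener)
    stream.sample() // starts the stream (requires OAuth credentials to be configured)
  }
}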
In my case, I had a Spark Java project. I cleaned the POM file and started adding dependencies back in order: first I resolved the Spark-related errors, then spark-launcher, and from there onward the bigger libraries.
Note: I was using a CDH 6.2.0 environment.
Related
I am using Spark 3.0.1 in my Kotlin project. Compilation fails with the following error:
e: org.jetbrains.kotlin.util.KotlinFrontEndException: Exception while analyzing expression at (51,45) in /home/user/project/src/main/kotlin/ModelBuilder.kt
...
Caused by: java.lang.IllegalStateException: No parameter with index 0-0 (name=reverser$module$1 access=16) in method scala.collection.TraversableOnce.reverser$2
at org.jetbrains.kotlin.load.java.structure.impl.classFiles.AnnotationsAndParameterCollectorMethodVisitor.visitParameter(Annotations.kt:48)
at org.jetbrains.org.objectweb.asm.ClassReader.readMethod(ClassReader.java:1149)
at org.jetbrains.org.objectweb.asm.ClassReader.accept(ClassReader.java:680)
at org.jetbrains.org.objectweb.asm.ClassReader.accept(ClassReader.java:392)
at org.jetbrains.kotlin.load.java.structure.impl.classFiles.BinaryJavaClass.<init>(BinaryJavaClass.kt:77)
at org.jetbrains.kotlin.load.java.structure.impl.classFiles.BinaryJavaClass.<init>(BinaryJavaClass.kt:40)
at org.jetbrains.kotlin.cli.jvm.compiler.KotlinCliJavaFileManagerImpl.findClass(KotlinCliJavaFileManagerImpl.kt:115)
at org.jetbrains.kotlin.cli.jvm.compiler.KotlinCliJavaFileManagerImpl.findClass(KotlinCliJavaFileManagerImpl.kt:85)
at org.jetbrains.kotlin.cli.jvm.compiler.KotlinCliJavaFileManagerImpl$findClass$$inlined$getOrPut$lambda$1.invoke(KotlinCliJavaFileManagerImpl.kt:113)
at org.jetbrains.kotlin.cli.jvm.compiler.KotlinCliJavaFileManagerImpl$findClass$$inlined$getOrPut$lambda$1.invoke(KotlinCliJavaFileManagerImpl.kt:48)
at org.jetbrains.kotlin.load.java.structure.impl.classFiles.ClassifierResolutionContext.resolveClass(ClassifierResolutionContext.kt:60)
at org.jetbrains.kotlin.load.java.structure.impl.classFiles.ClassifierResolutionContext.resolveByInternalName$frontend_java(ClassifierResolutionContext.kt:101)
at org.jetbrains.kotlin.load.java.structure.impl.classFiles.BinaryClassSignatureParser$parseParameterizedClassRefSignature$1.invoke(BinaryClassSignatureParser.kt:141)
I've cleaned/rebuilt the project several times, removed the build directory, and tried building from the command line with Gradle.
The code where this happens:
import org.apache.spark.sql.types.* // import needed for StructType, StructField, DataTypes, Metadata

val data = listOf(...)
val schema = StructType(arrayOf(
    StructField("label", DataTypes.DoubleType, false, Metadata.empty()),
    StructField("sentence", DataTypes.StringType, false, Metadata.empty())
))
val dataframe = spark.createDataFrame(data, schema) // <- offending line.
I was using Kotlin 1.4.0 and upgraded to 1.4.10 without any change; still the same error.
It looks like this bug (and this one) has already been reported to JetBrains, but is it really not possible to use Spark 3 (local mode) with Kotlin 1.4?
I managed to get it working with Spring Boot (2.3.5) by adding the following to the dependencyManagement section:
dependencies {
    dependencySet("org.scala-lang:2.12.10") {
        entry("scala-library")
    }
}
This downgrades the scala-library jar from 2.12.12 to 2.12.10, which is the same version as the scala-reflect jar in my project. I'm also using Kotlin 1.4.10.
Are you trying to use this API?
https://spark.apache.org/docs/2.3.0/api/java/org/apache/spark/sql/SparkSession.html#createDataFrame-java.util.List-java.lang.Class-
There is no method that takes a java.util.List as well as a schema object, AFAIK.
I am trying to run my Scala job on my local machine (a MacBook Pro, macOS 10.13.3) and I am getting an error at runtime.
My versions:
scala: 2.11.12
spark: 2.3.0
hadoop: 3.0.0
I installed everything through brew.
The exception is:
Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end 3, length 2
It happens at these lines:
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName(getName)
  .setMaster("local[2]")
val context = new SparkContext(conf)
The last line is where the exception is thrown.
My theory is that the Hadoop and Spark versions aren't working together, but I can't find online which Hadoop version Spark 2.3.0 requires.
Thank you.
So I figured out my problem.
First, yes, I don't need Hadoop installed separately. Thanks for pointing that out.
Second, I had Java 10 installed instead of Java 8. Removing it solved the rest of the problems. (Spark 2.3.0 expects Java 8; the StringIndexOutOfBoundsException apparently comes from older Hadoop client code calling substring(0, 3) on the java.version string, which for Java 10 is just "10", only two characters long.)
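A quick way to verify which Java will be picked up (hypothetical output; any 1.8.x build works for Spark 2.3):

$ java -version
java version "1.8.0_152"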
Thank you, everyone!
I am using Spark Streaming in Scala, working with streaming Twitter data. I have the following code:
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter.TwitterUtils

val ssc = new StreamingContext(new SparkConf(), Seconds(5))
val tweets = TwitterUtils.createStream(ssc, None)
val user = tweets.map(x => x.getText())
val lang = tweets.map(x => x.getLang())
I am getting the following error:
[error] /home/user/Lab1.1/Twitterstats.scala:103: value getLang is not a member of twitter4j.Status
[error] val lang = tweets.map(x=> x.getLang())
[error] ^
[error] one error found
What is wrong with the above code? Could someone please help?
spark-streaming-twitter uses Twitter4J, and getLang() is only supported since Twitter4J 3.0.6. If you are using spark-streaming-twitter 1.5.2 or lower, you won't be able to call getLang(), because those versions depend on twitter4j 3.0.3. Since 1.6.0, twitter4j 4.0.4 is used, which does support getLang().
So you could upgrade spark-streaming-twitter to 1.6.0 or higher, or use another third-party library to detect the language of your tweets.
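A minimal sbt sketch of the upgrade (assuming an sbt build; %% selects the artifact matching your Scala binary version):

// build.sbt: spark-streaming-twitter 1.6.0 depends on twitter4j 4.0.4,
// whose Status interface provides getLang()
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming" % "1.6.0" % "provided",
  "org.apache.spark" %% "spark-streaming-twitter" % "1.6.0"
)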
(possible duplicate)
I am new to Scala and am trying to read a file using the following code:
scala> val textFile = sc.textFile("README.md")
scala> textFile.count()
But I keep getting the following error
error: not found: value sc
I have tried everything, but nothing seems to work. I am using Scala 2.10.4 and Spark 1.1.0 (I have even tried Spark 1.2.0, but it doesn't work either). I have sbt installed and compiled, yet I am not able to run sbt/sbt assembly. Is the error because of this?
You should run this code using ./spark-shell. It's the Scala REPL with a provided SparkContext, bound to the value sc. You can find it in your Apache Spark distribution, in the folder spark-1.4.1/bin.
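For example (a sketch; the shell binds sc automatically at startup):

$ ./bin/spark-shell
scala> val textFile = sc.textFile("README.md")
scala> textFile.count()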
I am trying to run a simple streaming job on the HDP 2.2 Sandbox but am facing a java.lang.NoSuchMethodError. I am able to run the SparkPi example on this machine without an issue.
These are the versions I am using:
<kafka.version>0.8.2.0</kafka.version>
<twitter4j.version>4.0.2</twitter4j.version>
<spark-version>1.2.1</spark-version>
<scala.version>2.11</scala.version>
Code snippet:
val sparkConf = new SparkConf().setAppName("TweetSenseKafkaConsumer").setMaster("yarn-cluster")
val ssc = new StreamingContext(sparkConf, Durations.seconds(5))
Error text from the Node Manager UI:
Exception in thread "Driver" scala.MatchError: java.lang.NoSuchMethodError: scala.Predef$.$conforms()Lscala/Predef$$less$colon$less; (of class java.lang.NoSuchMethodError)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:432)
15/02/12 15:07:23 INFO yarn.ApplicationMaster: Waiting for spark context initialization ... 1
15/02/12 15:07:33 INFO yarn.ApplicationMaster: Waiting for spark context initialization ... 2
The job is accepted by YARN but it never goes into RUNNING status.
I was thinking it is due to Scala version differences. I tried changing the POM configuration but still could not fix the error.
Thank you for your help in advance.
Earlier I had specified a dependency on spark-streaming_2.10 (Spark compiled with Scala 2.10), but I did not specify a dependency on the Scala compiler itself. It seems Maven automatically pulled in 2.11 (maybe due to some other dependency). While trying to debug this, I added a dependency on Scala compiler 2.11. After Paul's comment I changed that Scala dependency version to 2.10, and it is working.
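A sketch of the aligned Maven dependencies (the Scala binary version is baked into Spark's artifact ID, so all Scala artifacts must agree; 2.10.4 is an assumed patch version):

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming_2.10</artifactId>
  <version>1.2.1</version>
</dependency>
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-compiler</artifactId>
  <version>2.10.4</version>
</dependency>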