Apache Spark: Can't resolve constructor StreamingContext - scala

I've been trying to establish a StreamingContext in my program but I can't for the life of me figure out what's going on. I added the spark-streaming jar file to the dependencies and imported it in the code but I can't help feeling like I'm missing some small detail somewhere. How should I proceed?

You forgot to import the StreamingContext class itself.
Use
import org.apache.spark.streaming.StreamingContext
not
import org.apache.spark.streaming.StreamingContext._
The wildcard form imports the members of the StreamingContext companion object, not the class, so the constructor cannot be resolved.
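A minimal sketch of the corrected import in use. It assumes the spark-streaming artifact is on the classpath, a local master, and a one-second batch interval; the object name StreamingSetup and the app name are illustrative and do not come from the question.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingSetup {
  // Builds a StreamingContext from a SparkConf and a batch interval.
  // "local[2]" is an assumption for running on a single machine.
  def create(): StreamingContext = {
    val conf = new SparkConf()
      .setAppName("StreamingSetup")
      .setMaster("local[2]")
    new StreamingContext(conf, Seconds(1))
  }
}
```

With only the wildcard import `org.apache.spark.streaming.StreamingContext._`, the `new StreamingContext(...)` line would not compile, because the class name itself would not be in scope.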

Related

Why won't scalatest compile?

I have a simple test setup like
package unit
import net.kolotyluk.leaderboard.scorekeeping._
import net.kolotyluk.leaderboard.telemetry.Metrics
import net.kolotyluk.scala.extras.Logging
import org.scalatest.{FlatSpec, GivenWhenThen, Matchers, SequentialNestedSuiteExecution}
import scala.collection.mutable.ArrayBuffer
import scala.concurrent.{Await,Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Failure, Random, Success}
class LeaderboardSpec
extends FlatSpec
with SequentialNestedSuiteExecution
with GivenWhenThen
with Matchers
with Logging {
behavior of "Leaderboard"
it must "handle initial conditions correctly" in {
but when I try to compile my tests I get 53 errors like
[IJ]sbt:leaderboard> test
[info] Compiling 1 Scala source to C:\Users\ERIC\Documents\git\repos\leaderboard\target\scala-2.12\test-classes ...
[error] C:\Users\ERIC\Documents\git\repos\leaderboard\src\test\scala\unit\LeaderboardSpec.scala:21:12: could not find implicit value for parameter pos: org.scalactic.source.Position
[error] behavior of "Leaderboard"
[error] ^
Which does not actually convey any useful information on what the problem is. I can only assume that something is not configured correctly, either in my build.sbt file, or somewhere else.
This code did work at one time, and somewhere along the way I was cleaning things up, things changed, and now it's broken with no good diagnostics.
Can anyone suggest things to look for?
So one workaround that seems to compile and run correctly is to stop using SBT and use Maven instead.
I think this is the third major defect I have found in SBT so far.

Dataframe methods within SBT project

I have the following code that works on the spark-shell
df1.withColumn("tags_splitted", split($"tags", ",")).withColumn("tag_exploded", explode($"tags_splitted")).select("id", "tag_exploded").show()
But fails in sbt with the following errors:
not found: value split
not found: value explode
My scala code has the following
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().appName("Books").getOrCreate()
import spark.implicits._
Can someone give me a pointer to what is wrong in the sbt environment?
Thanks
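The split and explode functions are defined in the object org.apache.spark.sql.functions. The spark-shell imports that object for you at startup, which is why the same line works there but not in a compiled project.
So you need to import both
import org.apache.spark.sql.functions.split
import org.apache.spark.sql.functions.explode
or everything at once with
import org.apache.spark.sql.functions._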
Hope this helps!
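For a compiled sbt project, the pieces might fit together roughly as below. This is a sketch under assumptions: the sample rows, the local master, and the helper name explodeTags are made up for illustration; only the column names and the transformation come from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{explode, split}

object Books {
  // Splits the comma-separated "tags" column and emits one row per tag.
  def explodeTags(spark: SparkSession, df: DataFrame): DataFrame = {
    import spark.implicits._ // enables the $"column" syntax
    df.withColumn("tags_splitted", split($"tags", ","))
      .withColumn("tag_exploded", explode($"tags_splitted"))
      .select("id", "tag_exploded")
  }

  def main(args: Array[String]): Unit = {
    // master("local[*]") is an assumption for running outside a cluster
    val spark = SparkSession.builder().appName("Books").master("local[*]").getOrCreate()
    import spark.implicits._
    val df1 = Seq((1, "scala,spark"), (2, "sbt")).toDF("id", "tags") // hypothetical data
    explodeTags(spark, df1).show()
    spark.stop()
  }
}
```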

Why am I importing so many classes?

I'm looking at example Spark code and I'm a bit confused as to why the sample code I'm looking at requires two import statements:
import org.apache.spark._
import org.apache.spark.SparkContext._
This is Scala. As I understand it, _ is the wildcard character. So this looks like I'm importing SparkContext twice. Can anybody shed light on this?
The first line says to import everything in the package org.apache.spark. This means you can use all of those classes without prefixing them with the package name.
The second line says to import all of the members of the SparkContext companion object (what Java would call the static members of the class). This means you can use those members without prefixing their names with the class name.
Remember import doesn't really do anything at run time; it just lets you write less code. You aren't actually "importing" anything twice. The use of the term import comes from Java, and admittedly it is confusing. One Scala-specific wrinkle: a wildcard import of an object also brings its implicit conversions into scope, which is why older Spark examples (before Spark 1.3) needed the second line to use pair RDD operations.
This might help:
Without the first line, you would have to say
org.apache.spark.SparkContext
but the first import line lets you say
SparkContext
If you had only the first line and not the second, you would have to write
SparkContext.getOrCreate
but with both import lines you can just write
getOrCreate
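The same distinction can be reproduced without Spark. The sketch below uses a hypothetical object named demo as a stand-in for the package, and a hypothetical Context class with a companion object; none of these names come from Spark.

```scala
// Stand-in for a library package (an object is used so the snippet also works in a REPL).
object demo {
  class Context(val name: String)

  // Companion object: its members play the role of Java's static members.
  object Context {
    def getOrCreate(): Context = new Context("shared")
  }
}

object ImportDemo {
  import demo._         // brings the name Context into scope
  import demo.Context._ // brings getOrCreate, a member of the companion object, into scope

  val explicit: Context = Context.getOrCreate() // needs only the first import
  val short: Context = getOrCreate()            // needs the second import as well
}
```

Removing the second import makes the `short` line fail to compile, while the `explicit` line keeps working.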

Cats: how to find the specific type from implicits

I have this code which compiles and works fine
import cats.implicits._
Cartesian[ValidResponse].product(
getName(map).toValidated,
readAge(map).toValidated
).map(User.tupled)
However I don't like the import of cats.implicits._ because there are just too many classes there. I tried importing specific things related to Cartesians like
import cats.implicits.catsSyntaxCartesian
import cats.implicits.catsSyntaxUCartesian
import cats.implicits.catsSyntaxTuple2Cartesian
But these did not work. As a newbie I find the implicit imports very confusing because there are simply 1000s of them and the names are not very obvious. My only alternative is to import the entire universe by import cats.implicits._ and stop thinking about it.
In fact I have a broader confusion about cats.implicits, cats.instances._ and cats.syntax._. So far I am just importing these via trial and error. I am not really sure of when to import what.
Do not try to pick out specific things from cats.implicits. You either import the entire thing, or you don't use it at all. Further, there's no reason to be afraid of importing it all. It can't interfere with anything.
Ok, I lied. It will interfere if you import cats.instances.<x>._ and/or cats.syntax.<x>._ alongside cats.implicits._. These groups are meant to be mutually exclusive: you either import everything and forget about it with cats.implicits._, or you specifically select what you want to import with cats.instances and cats.syntax.
These two packages are not meant to be imported completely like cats.implicits. Instead, they include a bunch of objects. Each object contains some implicit instances/syntax, and you are meant to import from those.
import cats.implicits._ // Good, nothing to fear
// RESET IMPORTS
import cats.implicits.catsSyntaxCartesian // Bad, don't pick and choose
// RESET IMPORTS
import cats.instances._ // Bad, is useless
import cats.syntax._ // Ditto
// RESET IMPORTS
import cats.instances.list._ // ok
import cats.syntax.cartesian._ // ok
// RESET IMPORTS
import cats.implicits._
import cats.syntax.monad._ // Bad, don't mix these two
Additionally each of cats.{ instances, syntax } contains an all object, with the obvious function. The import cats.implicits._ is really a shortcut for import cats.syntax.all._, cats.instances.all._.
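As a small illustration of the selective style, the sketch below imports two instance objects and one syntax object instead of cats.implicits._. The object name AlaCarte is made up for the example, and it uses semigroup syntax rather than the Cartesian syntax from the question because that import path has been stable across cats versions.

```scala
import cats.instances.int._    // Semigroup/Monoid instance for Int
import cats.instances.option._ // Semigroup/Monoid instance for Option[A]
import cats.syntax.semigroup._ // the |+| operator

object AlaCarte {
  val sum: Int = 1 |+| 2                            // combines with Int addition
  val maybe: Option[Int] = Option(1) |+| Option(2)  // combines the values inside the Options
}
```

Dropping either instances import leaves |+| without the instance it needs, which is the kind of error that trial-and-error importing tends to produce.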
I'll start by saying that import cats.implicits._ is safe, reasonable and the recommended approach when starting. So if the only reason for this question is that you don't like importing too many classes, then I think you should just bite the bullet and leave it as is.
Additionally, I recommend you take a look at the official cats import guide. It tries to explain the package/logical structure of cats code and might make it easier to understand.
The "cats" library is organized in several "areas" that can be easily distinguished by their package name:
cats._ - This is where most of the typeclasses live (e.g. Monad, Foldable etc.)
cats.data._ - This is the home of data structures like Validated and State.
cats.instances._ - This is where the instances for the typeclasses defined in cats._ are. For example if you import cats.instances.list._ you'll bring into scope the Show, Monad etc. instances for the standard List. This is what you're most likely interested in.
cats.syntax._ - has some syntax enrichment that makes code easier to write and read.
An example of using cats.syntax._ would be:
import cats.Applicative
import cats.instances.list._ // Applicative[List] instance
import cats.syntax.applicative._ // pure syntax
val listOfInt = 5.pure[List]
// instead of
val otherList = Applicative[List].pure(5)

object play.http.HttpEntity.Streamed is not a value

Using Scala and Play 2.5.10 (according to plugin.sbt) I have this code:
import akka.stream.scaladsl.Source
import play.api.libs.streams.Streams
import play.http._
val source = Source.fromPublisher(Streams.enumeratorToPublisher(enumerator))
Ok.sendEntity(HttpEntity.Streamed(source, None, Some("application/zip")))
The imports there are mostly from testing because no matter what I try I can't get the framework to accept HttpEntity.Streamed. With this setup the error is what the title says. Or taken from the console:
Looking at the documentation here I can't really figure out why it doesn't work: https://www.playframework.com/documentation/2.5.10/api/java/play/http/HttpEntity.Streamed.html
This is also what the official examples use: https://www.playframework.com/documentation/2.5.x/ScalaStream
Does anyone at least have some pointers on where to start looking? I've never used Scala or Play before so any hints are welcome.
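You should import this one:
import play.api.http.HttpEntity
import play.api.libs.streams.Streams
// fileContent: an akka.stream.scaladsl.Source[ByteString, _] holding the file's bytes
val entity: HttpEntity = HttpEntity.Streamed(fileContent, None, None)
// MemeFoRTheFile: placeholder for the file's MIME type, e.g. "application/zip"
Result(ResponseHeader(200), entity).as(MemeFoRTheFile)
The error means that play.http.HttpEntity.Streamed cannot be called like a function: play.http._ belongs to Play's Java API, whereas the Scala API lives under play.api, where HttpEntity.Streamed is a case class. Import the Scala HttpEntity, then wrap the entity in a Result() with its ResponseHeader and content type.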
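Put together with the content type from the question, a self-contained version might look like the sketch below. It assumes Play 2.5 with Akka Streams on the classpath; the object name ZipResponse and the method name streamed are hypothetical, and the source is passed in rather than built from an enumerator.

```scala
import akka.stream.scaladsl.Source
import akka.util.ByteString
import play.api.http.HttpEntity
import play.api.mvc.{ResponseHeader, Result}

object ZipResponse {
  // Wraps a stream of bytes in a 200 response with the zip content type.
  // The content length is None because it is not known up front.
  def streamed(source: Source[ByteString, _]): Result =
    Result(
      header = ResponseHeader(200, Map.empty),
      body = HttpEntity.Streamed(source, None, Some("application/zip"))
    )
}
```

Inside a controller this would be returned from an Action, for example `Action { ZipResponse.streamed(source) }`.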