I am using the code below in a Qubole Notebook, and it runs successfully.
case class cls_Sch(Id: String, Name: String)

class myClass {
  implicit val sparkSession = org.apache.spark.sql.SparkSession.builder().enableHiveSupport().getOrCreate()
  sparkSession.sql("set spark.sql.crossJoin.enabled = true")
  sparkSession.sql("set spark.sql.caseSensitive=false")

  import sparkSession.sqlContext.implicits._
  import org.apache.hadoop.fs.{FileSystem, Path, LocatedFileStatus, RemoteIterator, FileUtil}
  import org.apache.hadoop.conf.Configuration
  import org.apache.spark.sql.DataFrame

  def my_Methd() {
    var my_df = Seq(("1", "Sarath"), ("2", "Amal")).toDF("Id", "Name")
    my_df.as[cls_Sch].take(my_df.count.toInt).foreach(t => {
      println(s"${t.Name}")
    })
  }
}

val obj_myClass = new myClass()
obj_myClass.my_Methd()
However, when I run the same code in Qubole's Analyze, I get the error below.
When I take out the following code, it runs fine in Qubole's Analyze.
my_df.as[cls_Sch].take(my_df.count.toInt).foreach(t => {
  println(s"${t.Name}")
})
I believe I have to change how the case class is used somewhere.
I am using Spark 2.3.
Can someone please let me know how to solve this issue?
Please let me know if you need any other details.
All you have to do is have the implicits import (import sparkSession.implicits._ for the session defined above) inside the my_Methd() function:
def my_Methd() {
  import sparkSession.implicits._

  var my_df = Seq(("1", "Sarath"), ("2", "Amal")).toDF("Id", "Name")
  my_df.as[cls_Sch].take(my_df.count.toInt).foreach(t => {
    println(s"${t.Name}")
  })
}
For some reason the kernel has problems when working with Datasets. I made two tests that worked with Apache Toree.
I would like to use a Monix Observable with a Doobie (fs2) stream, but I can't seem to get it working properly. Without streaming, my test app exits just fine, but after adding streaming, my TaskApp seems to hang on shutdown and I can't figure out why.
Here is a minimal example to reproduce the problem:
package example

import java.util.concurrent.Executors

import doobie.implicits._
import cats.effect.{Blocker, ContextShift, ExitCode, Resource}
import doobie.hikari.HikariTransactor
import monix.eval.{Task, TaskApp}
import com.typesafe.scalalogging.StrictLogging
import fs2.interop.reactivestreams._
import monix.reactive.Observable

import scala.concurrent.ExecutionContext

object Hello extends TaskApp with StrictLogging {

  private def resources()(implicit contextShift: ContextShift[Task]): Resource[Task, Resources] = {
    for {
      transactor <- Database.transactor("org.postgresql.Driver", "jdbc:postgresql://localhost/fubar", "fubar", "fubar")
    } yield Resources(transactor)
  }

  def run(args: List[String]): Task[ExitCode] =
    resources().use(task)
      .flatMap(_ => Task { println("All Done!") })
      .flatMap(_ => Task(ExitCode.Success))

  def task(resources: Resources): Task[Unit] = {
    val publisher =
      sql"""select id from message;"""
        .query[Long]
        .stream
        .transact(resources.transactor)
        .toUnicastPublisher()

    Observable.fromReactivePublisher(publisher)
      .foreachL(id => logger.info(id.toString))
  }
}

case class Resources(transactor: HikariTransactor[Task])

object Database {
  val ecBlocking = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(8))

  def transactor(dbDriver: String, dbUrl: String, dbUser: String, dbPassword: String)(implicit contextShift: ContextShift[Task]): Resource[Task, HikariTransactor[Task]] = {
    HikariTransactor.newHikariTransactor[Task](dbDriver, dbUrl, dbUser, dbPassword, ecBlocking, Blocker.liftExecutionContext(ecBlocking))
  }
}
I converted the fs2 stream to a Monix Observable according to the Monix documentation: https://monix.io/docs/current/reactive/observable.html#fs2
Do I need to somehow close the fs2 stream or the Observable to get the application to exit cleanly?
I'd appreciate any tips on getting this working, or on how to debug it properly.
The problem was that the ExecutionContext needs to be shut down. See the author's answer here.
Correct usage can be seen in the documentation.
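As a sketch of that idea (assuming doobie's ExecutionContexts helper and cats-effect's Blocker are available, and reusing the names from the Database object above), both thread pools are acquired as Resources, so they are shut down when the transactor is released and the app can exit. This illustrates the pattern rather than reproducing the linked answer verbatim:
import cats.effect.{Blocker, ContextShift, Resource}
import doobie.hikari.HikariTransactor
import doobie.util.ExecutionContexts
import monix.eval.Task

object Database {
  def transactor(dbDriver: String, dbUrl: String, dbUser: String, dbPassword: String)(implicit contextShift: ContextShift[Task]): Resource[Task, HikariTransactor[Task]] =
    for {
      connectEC <- ExecutionContexts.fixedThreadPool[Task](8) // pool is shut down when the Resource is released
      blocker   <- Blocker[Task]                              // cached pool for blocking JDBC calls, also released
      xa        <- HikariTransactor.newHikariTransactor[Task](dbDriver, dbUrl, dbUser, dbPassword, connectEC, blocker)
    } yield xa
}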
I am trying to use a ProcessWindowFunction in my Apache Flink project with Scala. Unfortunately, I already fail at implementing a basic ProcessWindowFunction like the one used in the Apache Flink documentation.
This is my code:
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.scala.{StreamExecutionEnvironment, _}
import org.apache.flink.streaming.api.windowing.time.Time
import org.fiware.cosmos.orion.flink.connector.{NgsiEvent, OrionSource}
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.streaming.api.windowing.assigners.SlidingProcessingTimeWindows
import org.apache.flink.util.Collector
import scala.collection.TraversableOnce

object StreamingJob {
  def main(args: Array[String]) {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    val eventStream = env.addSource(new OrionSource(9001))

    val processedDataStream = eventStream.flatMap(event => event.entities)
      .map(entity => (entity.id, entity.attrs("temperature").value.asInstanceOf[String]))
      .keyBy(_._1)
      .window(SlidingProcessingTimeWindows.of(Time.seconds(10), Time.seconds(5)))
      .process(new MyProcessWindowFunction())

    env.execute("Socket Window NgsiEvent")
  }
}

private class MyProcessWindowFunction extends ProcessWindowFunction[(String, String), String, String, TimeWindow] {
  def process(key: String, context: Context, input: Iterable[(String, String)], out: Collector[String]): Unit = {
    var count: Int = 0
    for (in <- input) {
      count = count + 1
    }
    out.collect(s"Window ${context.window} count: $count")
  }
}
From IntelliJ I get the following hints:
1) This is shown where the new class object is created:
Type mismatch, expected: ProcessWindowFunction[(String, String), NotInferedR, String, TimeWindow], actual: MyProcessWindowFunction
2) This is shown directly at the class:
Class 'MyProcessWindowFunction' must either be declared abstract or implement abstract member 'process(key:KEY, context:ProcessWindowFunction.Context, iterable:Iterable<IN>, collector:Collector<OUT>):void' in 'org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction'
Building the code shows me the following error:
Error:(51, 16) type mismatch;
found : org.apache.flink.MyProcessWindowFunction
required:
org.apache.flink.streaming.api.scala.function.ProcessWindowFunction[(String, String),?,String,org.apache.flink.streaming.api.windowing.windows.TimeWindow]
.process(new MyProcessWindowFunction())
I am grateful for any help.
After spending some time debugging with two more people, we finally managed to find the problem.
In my code I used the following import:
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction
But the correct import when using Scala seems to be:
import org.apache.flink.streaming.api.scala.function.ProcessWindowFunction
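For reference, a minimal sketch of the class from the question with only the import changed (assuming the Scala DataStream API; the body is the question's count logic, just condensed):
import org.apache.flink.streaming.api.scala.function.ProcessWindowFunction
import org.apache.flink.streaming.api.windowing.windows.TimeWindow
import org.apache.flink.util.Collector

private class MyProcessWindowFunction extends ProcessWindowFunction[(String, String), String, String, TimeWindow] {
  // The Scala variant expects a scala.Iterable here, which matches this signature.
  override def process(key: String, context: Context, input: Iterable[(String, String)], out: Collector[String]): Unit = {
    out.collect(s"Window ${context.window} count: ${input.size}")
  }
}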
// The package of ProcessWindowFunction should be:
import org.apache.flink.streaming.api.scala.function.ProcessWindowFunction

// With that import, .process(new MyProcessWindowFunction()) type-checks as a
// ProcessWindowFunction[(String, String), String, String, TimeWindow].
// The official documentation does not point this out; it may be a bug.
What I'm really trying to do is monitor multiple files, and when any of them is modified, update some state and produce a side effect using that state. I imagine what I want is a scan over a Traversable that produces a Traversable[IO[_]], but I don't see the path to get there.
As a minimal attempt to produce this I wrote:
package example

import better.files.{File, FileMonitor}
import cats.implicits._
import com.monovore.decline._
import cats.effect.IO

import java.nio.file.{Files, Path}
import scala.concurrent.ExecutionContext.Implicits.global

object Hello extends CommandApp(
  name = "cats-effects-playground",
  header = "welcome",
  main = {
    val filesOpts = Opts.options[Path]("input", help = "input files")

    filesOpts.map { files =>
      IO.async[File] { cb =>
        val watchers = files.map { path =>
          new FileMonitor(path, recursive = false) {
            override def onModify(file: File, count: Int) = cb(Right(file))
          }
        }

        watchers.toList.foreach(_.start)
      }
      .flatMap(f => IO { println(f) })
      .unsafeRunSync
    }
  }
)
but this has two major flaws. One, it creates a thread for each file I'm watching, which is a little heavy. But more importantly, the program finishes as soon as a single file is modified, even though onModify would be called more times if the program stayed running.
I'm not married to using better-files; it just seemed like the path of least resistance. But I do require using Cats IO.
This solution doesn't solve the issue of creating a bunch of threads, and it doesn't strictly produce a Traversable, but it solves the underlying use case. I'm very open to this being critiqued and a better solution provided.
package example

import better.files.{File, FileMonitor}
import cats.implicits._
import com.monovore.decline._
import cats.effect.IO

import java.nio.file.{Files, Path}
import java.util.concurrent.LinkedBlockingQueue
import scala.concurrent.ExecutionContext.Implicits.global

object Hello extends CommandApp(
  name = "cats-effects-playground",
  header = "welcome",
  main = {
    val filesOpts = Opts.options[Path]("input", help = "input files")

    filesOpts.map { files =>
      val bq: LinkedBlockingQueue[IO[File]] = new LinkedBlockingQueue()

      val watchers = files.map { path =>
        new FileMonitor(path, recursive = false) {
          override def onModify(file: File, count: Int) = bq.put(IO(file))
        }
      }

      def ioLoop(): IO[Unit] = bq.take()
        .flatMap(f => IO(println(f)))
        .flatMap(_ => ioLoop())

      watchers.toList.foreach(_.start)
      ioLoop().unsafeRunSync
    }
  }
)
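Since the original goal was to update some state on each modification and then produce a side effect from it, here is a hedged variation of the ioLoop above (it assumes the same bq queue and imports as the code just shown; the per-file modification count is purely illustrative):
def ioLoop(counts: Map[File, Int]): IO[Unit] =
  bq.take().flatMap { file =>
    // update the state derived from the modified file
    val updated = counts.updated(file, counts.getOrElse(file, 0) + 1)
    // produce a side effect using that state, then keep looping
    IO(println(s"$file has been modified ${updated(file)} time(s)"))
      .flatMap(_ => ioLoop(updated))
  }

// started with: ioLoop(Map.empty).unsafeRunSync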
I am trying to read lines from a text file into a foreach loop but I keep getting this error: value getLines is not a member of org.xml.sax.InputSource. Can someone explain what this error means, and how I can resolve it?
import scala.xml._
import collection.mutable.HashMap

val noDupFile = "nodup_steam_out.txt"
Source.fromFile(noDupFile).getLines().par.foreach((res: String) => {
  //....
})
import scala.xml._
import collection.mutable.HashMap
val noDupFile="nodup_steam_out.txt"
io.Source.fromFile(noDupFile).getLines().foreach(res => {
//....
})
Without the right import, Source refers to scala.xml.Source, so you are referring to an org.xml.sax.InputSource.
io.Source.fromFile returns an io.BufferedSource, and the Iterator returned by getLines() does not have a par method (par is defined on Parallelizable).
You can write
import scala.xml._
import collection.mutable.HashMap
val noDupFile="nodup_steam_out.txt"
io.Source.fromFile(noDupFile).getLines().toStream.par.foreach(res => {
//....
})
You seem to be trying to use the scala.xml library to iterate over your txt file. If you import scala.io._ instead, you should be able to do something like:
import scala.io._
val noDupFile="nodup_steam_out.txt"
Source.fromFile(noDupFile).getLines().foreach{ res =>
//....
}
I have copied Spray Client's example code into my own project, to have it easily available. I use IntelliJ 13.
Here is the code I have:
package mypackage

import scala.util.Success
import scala.util.Failure
import scala.concurrent.duration._
import akka.actor.ActorSystem
import akka.pattern.ask
import akka.event.Logging
import akka.io.IO
import spray.json.{JsonFormat, DefaultJsonProtocol}
import spray.can.Http
import spray.util._
import spray.client.pipelining._

case class Elevation(location: Location, elevation: Double)
case class Location(lat: Double, lng: Double)
case class GoogleApiResult[T](status: String, results: List[T])

object ElevationJsonProtocol extends DefaultJsonProtocol {
  implicit val locationFormat = jsonFormat2(Location)
  implicit val elevationFormat = jsonFormat2(Elevation)
  implicit def googleApiResultFormat[T: JsonFormat] = jsonFormat2(GoogleApiResult.apply[T])
}

object SprayExample extends App {
  // we need an ActorSystem to host our application in
  implicit val system = ActorSystem("simple-spray-client")
  import system.dispatcher // execution context for futures below

  val log = Logging(system, getClass)

  log.info("Requesting the elevation of Mt. Everest from Googles Elevation API...")

  val pipeline = sendReceive ~> unmarshal[GoogleApiResult[Elevation]]

  val responseFuture = pipeline {
    Get("http://maps.googleapis.com/maps/api/elevation/json?locations=27.988056,86.925278&sensor=false")
  }

  responseFuture onComplete {
    case Success(GoogleApiResult(_, Elevation(_, elevation) :: _)) =>
      log.info("The elevation of Mt. Everest is: {} m", elevation)
      shutdown()

    case Success(somethingUnexpected) =>
      log.warning("The Google API call was successful but returned something unexpected: '{}'.", somethingUnexpected)
      shutdown()

    case Failure(error) =>
      log.error(error, "Couldn't get elevation")
      shutdown()
  }

  def shutdown(): Unit = {
    IO(Http).ask(Http.CloseAll)(1.second).await
    system.shutdown()
  }
}
As it stands, this works perfectly and prints the height of Mt. Everest as expected.
The strange thing happens if I move the file down one level in the package structure, that is, I create mypackage.myinnerpackage and move the file into it.
IDEA changes my first line of code to package mypackage.myinnerpackage, and that's it.
Then I try to run the app, and compilation fails with the following message:
could not find implicit value for evidence parameter of type spray.httpx.unmarshalling.FromResponseUnmarshaller[courserahelper.sprayexamples.GoogleApiResult[courserahelper.sprayexamples.Elevation]]
val pipeline = sendReceive ~> unmarshal[GoogleApiResult[Elevation]]
^
I did not change anything in the code; I effectively just changed the package! Additionally, this code is self-contained; it does not rely on any implicits declared in any other part of my code.
What am I missing?
Thanks!
(Replaced the comment with this answer, which supports proper formatting.)
The code you posted is missing these two imports before the usage of unmarshal:
import ElevationJsonProtocol._
import SprayJsonSupport._
val pipeline = sendReceive ~> unmarshal[GoogleApiResult[Elevation]]
which exist in the original code. IntelliJ sometimes messes with imports, so that may be the reason they got lost in the move.
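In context, the relevant section looks roughly like this (a sketch using the question's names; in Spray 1.x, SprayJsonSupport lives in spray.httpx):
import ElevationJsonProtocol._          // the JsonFormats for Location, Elevation and GoogleApiResult
import spray.httpx.SprayJsonSupport._   // turns those JsonFormats into (un)marshallers

// With both in scope, the FromResponseUnmarshaller evidence for unmarshal can be resolved again.
val pipeline = sendReceive ~> unmarshal[GoogleApiResult[Elevation]]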
You need to provide a JSON format for your case class.
case class Foo(whatever: Option[String])

object FooProtocol extends DefaultJsonProtocol {
  implicit val fooJsonFormat = jsonFormat1(Foo)
}
Then include the following near the implementation...
import SprayJsonSupport._
import co.xxx.FooProtocol._
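A hedged sketch of how these pieces fit together in a pipeline (Foo, FooProtocol, the actor system name and the URL are placeholders from this answer, not types from the original question):
import akka.actor.ActorSystem
import spray.client.pipelining._
import spray.httpx.SprayJsonSupport._
import spray.json.DefaultJsonProtocol

case class Foo(whatever: Option[String])

object FooProtocol extends DefaultJsonProtocol {
  implicit val fooJsonFormat = jsonFormat1(Foo)
}

object FooClient extends App {
  implicit val system = ActorSystem("foo-client")
  import system.dispatcher
  import FooProtocol._

  // The JsonFormat plus SprayJsonSupport provide the FromResponseUnmarshaller[Foo] that unmarshal needs.
  val pipeline = sendReceive ~> unmarshal[Foo]
  val fooFuture = pipeline(Get("http://example.com/foo"))

  fooFuture.foreach(foo => println(foo))
}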