Flink ES connector not compiling as expected - Scala

My problem is somewhat as described here.
Part of the code (actually taken from the Apache site) is below:
val httpHosts = new java.util.ArrayList[HttpHost]
httpHosts.add(new HttpHost("127.0.0.1", 9200, "http"))
httpHosts.add(new HttpHost("10.2.3.1", 9200, "http"))
val esSinkBuilder = new ElasticsearchSink.Builder[String](
  httpHosts,
  new ElasticsearchSinkFunction[String] {
    def createIndexRequest(element: String): IndexRequest = {
      val json = new java.util.HashMap[String, String]
      json.put("data", element)
      return Requests.indexRequest()
        .index("my-index")
        .`type`("my-type")
        .source(json)
If I add these three import statements, I get an error:
import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkFunction
import org.apache.flink.streaming.connectors.elasticsearch.RequestIndexer
import org.apache.flink.streaming.connectors.elasticsearch6.ElasticsearchSink
The error I am getting:
object elasticsearch is not a member of package org.apache.flink.streaming.connectors
object elasticsearch6 is not a member of package org.apache.flink.streaming.connectors
If I do not add those import statements, I get these errors instead:
Compiling 1 Scala source to E:\sar\scala\practice\readstbdata\target\scala-2.11\classes ...
[error] E:\sar\scala\practice\readstbdata\src\main\scala\example\readcsv.scala:35:25: not found: value ElasticsearchSink
[error] val esSinkBuilder = new ElasticsearchSink.Builder[String](
[error] ^
[error] E:\sar\scala\practice\readstbdata\src\main\scala\example\readcsv.scala:37:7: not found: type ElasticsearchSinkFunction
[error] new ElasticsearchSinkFunction[String] {
[error] ^
[error] two errors found
[error] (Compile / compileIncremental) Compilation failed
[error] Total time: 1 s, completed 10 Feb, 2020 2:15:04 PM
In the Stack Overflow question I referred to above, some function has been extended. My understanding is that flink.streaming.connectors.elasticsearch has to be extended into the REST libraries. 1) Is my understanding correct? 2) If yes, can I have the complete extensions? 3) If my understanding is wrong, please give me a solution.
Note: I added the following statements to build.sbt:
libraryDependencies += "org.elasticsearch.client" % "elasticsearch-rest-high-level-client" % "7.5.2"
libraryDependencies += "org.elasticsearch" % "elasticsearch" % "7.5.2"

The streaming connectors are not part of the Flink binary distribution. You have to package them with your application.
For elasticsearch6 you need to add flink-connector-elasticsearch6_2.11, which you can do with:
libraryDependencies += "org.apache.flink" %% "flink-connector-elasticsearch6" % "1.6.0"
Once this jar is part of your build, the compiler will find the missing components. However, I don't know whether this ES6 connector will work with Elasticsearch 7.5.2.
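For context, here is a build.sbt sketch of how that dependency sits next to the usual Flink artifacts. The versions are assumptions taken from the answer; keep them aligned with whatever Flink version the rest of the project uses.
// Sketch only: versions are assumptions and must match your Flink distribution.
val flinkVersion = "1.6.0"

libraryDependencies ++= Seq(
  "org.apache.flink" %% "flink-streaming-scala"          % flinkVersion,
  "org.apache.flink" %% "flink-connector-elasticsearch6" % flinkVersion
)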

Flink Elasticsearch Connector 7
Please look at the working and detailed answer I have provided here, which is written in Scala.

Related

How to use Flink's KafkaSource with Scala in 2022

I've checked out this similar but seven-year-old question, but it does not apply to newer Flink versions.
I'm trying to get a simple Flink Kafka job running and have tried various versions, getting different compile errors for each. I'm using sbt to manage my dependencies:
val flinkDependencies = Seq(
  "org.apache.flink" %% "flink-clients" % flinkVersion,
  "org.apache.flink" %% "flink-scala" % flinkVersion,
  "org.apache.flink" %% "flink-streaming-scala" % flinkVersion,
  "org.apache.flink" %% "flink-connector-kafka" % flinkVersion
)
Versions tried:
Scala 2.11.12 and 2.12.15
Flink 1.14.6
The code I'm trying to compile (relevant bits):
import org.apache.flink.streaming.util.serialization.SimpleStringSchema
import org.apache.flink.connector.kafka.source.KafkaSource
...
val env = ExecutionEnvironment.getExecutionEnvironment
val kafkaConsumer = new KafkaSource.builder[String]
.setBootstrapservers("localhost:9092")
.setGroupId("flink")
.setTopics("test")
.build()
val text = env.fromSource(kafkaConsumer)
I did not find an official example showing that this is indeed how one is supposed to use the KafkaSource, but I found this setup here and here. To my still very new Java eyes this looks aligned with the API docs. But I can't get it to work with either Scala version:
[error] somepathwithmyfile: type builder is not a member of object org.apache.flink.connector.kafka.source.KafkaSource
[error] val kafkaConsumer = new KafkaSource.builder[String]
[error] ^
[error] somepathwithmyfile: value fromSource is not a member of org.apache.flink.api.scala.ExecutionEnvironment
[error] val text = env.fromSource(kafkaConsumer)
[error] ^
[error] two errors found
For the first problem, drop the new (builder is a factory method, not a constructor):
val kafkaConsumer = KafkaSource.builder[String]
...
For the second problem, fromSource requires three arguments:
/** Create a DataStream using a [[Source]]. */
@Experimental
def fromSource[T: TypeInformation](
    source: Source[T, _ <: SourceSplit, _],
    watermarkStrategy: WatermarkStrategy[T],
    sourceName: String): DataStream[T] = {
  val typeInfo = implicitly[TypeInformation[T]]
  asScalaStream(javaEnv.fromSource(source, watermarkStrategy, sourceName, typeInfo))
}
Also, note that Flink does not (yet) support Scala 2.12.15. See https://issues.apache.org/jira/browse/FLINK-20969. However, Flink 1.15 can be used with newer versions of Scala (including Scala 3) if you exclude Flink's built-in Scala API support. See https://flink.apache.org/2022/02/22/scala-free.html for more on this.
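Putting both fixes together, here is a minimal sketch assuming Flink 1.14.x and the dependencies from the question. It uses StreamExecutionEnvironment (where the fromSource shown above is defined) rather than the batch ExecutionEnvironment; the OffsetsInitializer, job name, and source name are illustrative additions rather than values from the question.
import org.apache.flink.api.common.eventtime.WatermarkStrategy
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.connector.kafka.source.KafkaSource
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer
import org.apache.flink.streaming.api.scala._

object KafkaJob {
  def main(args: Array[String]): Unit = {
    // fromSource lives on the streaming environment, not the batch ExecutionEnvironment
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    val source = KafkaSource.builder[String]() // no `new`
      .setBootstrapServers("localhost:9092")
      .setTopics("test")
      .setGroupId("flink")
      .setStartingOffsets(OffsetsInitializer.earliest())
      .setValueOnlyDeserializer(new SimpleStringSchema())
      .build()

    // fromSource takes the source, a watermark strategy, and a source name
    val text = env.fromSource(source, WatermarkStrategy.noWatermarks[String](), "kafka-source")
    text.print()

    env.execute("kafka-job")
  }
}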

Can't find SttpBackends + "Error occurred in an application involving default arguments."

I'm trying to create an extremely simple Telegram bot in Scala using bot4s. I'm pretty much following the example there. Here's the code:
package info.jjmerelo.BoBot
import cats.instances.future._
import cats.syntax.functor._
import com.bot4s.telegram.api.RequestHandler
import com.bot4s.telegram.api.declarative.Commands
import com.bot4s.telegram.clients.{FutureSttpClient, ScalajHttpClient}
import com.bot4s.telegram.future.{Polling, TelegramBot}
import scala.util.Try
import scala.concurrent.Future
import com.typesafe.scalalogging.Logger
object BoBot extends TelegramBot
    with Polling
    with Commands[Future] {
  implicit val backend = SttpBackends.default
  def token = sys.env("BOBOT_TOKEN")
  override val client: RequestHandler[Future] = new FutureSttpClient(token)
  val log = Logger("BoBot")
  // val lines = scala.io.Source.fromFile("hitos.json").mkString
  // val hitos = JSON.parseFull( lines )
  // val solo_hitos = hitos.getOrElse( hitos )
  onCommand("hey") { implicit msg =>
    log.info("Hello")
    reply("Conseguí que funcionara").void
  }
}
And here's the build.sbt
name := "bobot"
version := "0.0.1"
organization := "info.jjmerelo"
libraryDependencies += "com.bot4s" %% "telegram-core" % "4.4.0-RC2"
val circeVersion = "0.12.3"
libraryDependencies ++= Seq(
  "io.circe" %% "circe-core",
  "io.circe" %% "circe-generic",
  "io.circe" %% "circe-parser"
).map(_ % circeVersion)
libraryDependencies += "com.typesafe.scala-logging" %% "scala-logging" % "3.9.2"
retrieveManaged := true
Circe is for later
Anyway, I managed to compile most of it, but I still get these two errors:
[info] compiling 2 Scala sources to /home/jmerelo/Asignaturas/cloud-computing/BoBot/target/scala-2.12/classes ...
[error] /home/jmerelo/Asignaturas/cloud-computing/BoBot/src/main/scala/info/jjmerelo/BoBot.scala:21:26: not found: value SttpBackends
[error] implicit val backend = SttpBackends.default
[error] ^
[error] /home/jmerelo/Asignaturas/cloud-computing/BoBot/src/main/scala/info/jjmerelo/BoBot.scala:23:49: could not find implicit value for parameter backend: com.softwaremill.sttp.SttpBackend[scala.concurrent.Future,Nothing]
[error] Error occurred in an application involving default arguments.
[error] override val client: RequestHandler[Future] = new FutureSttpClient(token)
[error] ^
[error] two errors found
[error] (Compile / compileIncremental) Compilation failed
[error] Total time: 5 s, completed 11 nov. 2020 8:19:38
I can't figure out either of the two. SttpBackends is missing, that's clear, but there's nothing in the example that indicates it's needed or, for that matter, which library should be included. The second one, about the default arguments, I simply can't figure out, even if I declare token as a String or change def to val. Any ideas?
Your error messages are related to each other.
The first one tells us that the compiler couldn't find an object SttpBackends that holds a field of type SttpBackend.
The second one tells us that the compiler couldn't find an implicit backend: SttpBackend for constructing a FutureSttpClient. It requires two implicits: an SttpBackend and an ExecutionContext.
class FutureSttpClient(
    token: _root_.scala.Predef.String,
    telegramHost: _root_.scala.Predef.String = { /* compiled code */ })(
    implicit backend: com.softwaremill.sttp.SttpBackend[scala.concurrent.Future, scala.Nothing],
    ec: scala.concurrent.ExecutionContext)
  extends com.bot4s.telegram.clients.SttpClient[scala.concurrent.Future] { ... }
You can create it yourself, as in the bot4s examples.
If you look for an SttpBackends object in the bot4s repository, you will find this code in the examples:
import com.softwaremill.sttp.okhttp._

object SttpBackends {
  val default: SttpBackend[Future, Nothing] = OkHttpFutureBackend()
}
Add this object to your project to make it compile.
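Note that the import com.softwaremill.sttp.okhttp._ in that snippet needs the OkHttp backend artifact on the classpath. Here is a hedged build.sbt sketch; the version shown is an assumption, so pick the sttp 1.x version that matches what bot4s 4.4.0-RC2 already pulls in.
// Assumption: bot4s 4.4.0-RC2 is built against sttp 1.x (package com.softwaremill.sttp),
// so the sttp 1.x OkHttp backend is needed; align the version with your dependency tree.
libraryDependencies += "com.softwaremill.sttp" %% "okhttp-backend" % "1.7.2"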

Hazelcast server with scala client issue

I am trying to set up a Hazelcast server and client on my local machine. I am also trying to connect to the local Hazelcast server with a Scala client.
For the server I used the code below:
import com.hazelcast.config._
import com.hazelcast.Scala._
object HazelcastServer {
  def main(args: Array[String]): Unit = {
    val conf = new Config
    serialization.Defaults.register(conf.getSerializationConfig)
    serialization.DynamicExecution.register(conf.getSerializationConfig)
    val hz = conf.newInstance()
    val cmap = hz.getMap[String, String]("test")
    cmap.put("a", "A")
    cmap.put("b", "B")
  }
}
and the Hazelcast client as:
import com.hazelcast.Scala._
import client._
import com.hazelcast.client._
import com.hazelcast.config._
object Hazelcast_Client {
  def main(args: Array[String]): Unit = {
    val conf = new Config
    serialization.Defaults.register(conf.getSerializationConfig)
    serialization.DynamicExecution.register(conf.getSerializationConfig)
    val hz = conf.newClient()
    val cmap = hz.getMap("test")
    println(cmap.size())
  }
}
In my build.sbt,
libraryDependencies += "com.hazelcast" % "hazelcast" % "3.7.2"
libraryDependencies += "com.hazelcast" %% "hazelcast-scala" % "3.7.2"
I am getting the error below and am stuck on dependency issues.
Symbol 'type <none>.config.ClientConfig' is missing from the classpath.
[error] This symbol is required by 'value com.hazelcast.Scala.client.package.conf'.
[error] Make sure that type ClientConfig is in your classpath and check for conflicting dependencies with `-Ylog-classpath`.
[error] A full rebuild may help if 'package.class' was compiled against an incompatible version of <none>.config.
[error] val conf = new Config
I referred to the Hazelcast documentation. I am not able to find any good Hazelcast Scala examples to understand the setup and start playing with. If anybody can help solve this issue, or share really good Scala examples, that would be helpful.
I've done a Scala+Akka Hazelcast before. My build.sbt included
libraryDependencies += "com.hazelcast" % "hazelcast-all" % "3.7.2"
I seem to remember that hazelcast-all was required.
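Following that answer, here is a build.sbt sketch for this project; treat it as a guess based on the answer rather than a verified fix. hazelcast-all bundles the client classes, including the com.hazelcast.client.config.ClientConfig type the error complains about, so it replaces the plain hazelcast artifact.
// Sketch: swap the plain "hazelcast" artifact for "hazelcast-all", which also
// bundles the hazelcast-client classes (including ClientConfig); keep the Scala wrapper.
libraryDependencies += "com.hazelcast" % "hazelcast-all" % "3.7.2"
libraryDependencies += "com.hazelcast" %% "hazelcast-scala" % "3.7.2"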

No RowReaderFactory can be found for this type error when trying to map Cassandra row to case object using spark-cassandra-connector

I am trying to get a simple example working that maps rows from Cassandra to a Scala case class using Apache Spark 1.1.1, Cassandra 2.0.11, and the spark-cassandra-connector (v1.1.0). I have reviewed the documentation at the spark-cassandra-connector GitHub page, planetcassandra.org, and DataStax, and generally searched around, but have not found anyone else encountering this issue. So here goes...
I am building a tiny Spark application using sbt (0.13.5), Scala 2.10.4, and Spark 1.1.1 against Cassandra 2.0.11. Modelling the example from the spark-cassandra-connector docs, the following two lines present an error in my IDE and fail to compile.
case class SubHuman(id:String, firstname:String, lastname:String, isGoodPerson:Boolean)
val foo = sc.cassandraTable[SubHuman]("nicecase", "human").select("id","firstname","lastname","isGoodPerson").toArray
The simple error presented by eclipse is:
No RowReaderFactory can be found for this type
The compile error is only slightly more verbose:
> compile
[info] Compiling 1 Scala source to /home/bkarels/dev/simple-case/target/scala-2.10/classes...
[error] /home/bkarels/dev/simple-case/src/main/scala/com/bradkarels/simple/SimpleApp.scala:82: No RowReaderFactory can be found for this type
[error] val foo = sc.cassandraTable[SubHuman]("nicecase", "human").select("id","firstname","lastname","isGoodPerson").toArray
[error] ^
[error] one error found
[error] (compile:compile) Compilation failed
[error] Total time: 1 s, completed Dec 10, 2014 9:01:30 AM
>
Scala source:
package com.bradkarels.simple
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import com.datastax.spark.connector._
import com.datastax.spark.connector.rdd._
// Likely don't need this import - but throwing darts hits the bullseye once in a while...
import com.datastax.spark.connector.rdd.reader.RowReaderFactory
object CaseStudy {
  def main(args: Array[String]) {
    val conf = new SparkConf(true)
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext("spark://127.0.0.1:7077", "simple", conf)

    case class SubHuman(id:String, firstname:String, lastname:String, isGoodPerson:Boolean)
    val foo = sc.cassandraTable[SubHuman]("nicecase", "human").select("id","firstname","lastname","isGoodPerson").toArray
  }
}
With the bothersome lines removed, everything compiles fine, assembly works, and I can perform other Spark operations normally. For example, if I remove the problem lines and drop in:
val rdd:CassandraRDD[CassandraRow] = sc.cassandraTable("nicecase", "human")
I get back the RDD and work with it as expected. That said, I suspect that my sbt project, assembly plugin, etc. are not contributing to the issue. The working source (minus the new attempt to map to a case class as the connector intended) can be found on GitHub here.
But, to be more thorough, my build.sbt:
name := "Simple Case"
version := "0.0.1"
scalaVersion := "2.10.4"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.1.1",
  "org.apache.spark" %% "spark-sql" % "1.1.1",
  "com.datastax.spark" %% "spark-cassandra-connector" % "1.1.0" withSources() withJavadoc()
)
So the question is: what have I missed? I am hoping this is something silly, but if you have encountered this and can help me get past this puzzling little issue, I would very much appreciate it. Please let me know if there are any other details that would be helpful in troubleshooting.
Thank you.
This may be my newness with Scala in general, but I resolved this issue by moving the case class declaration out of the main method, presumably so that the connector can resolve the implicit RowReaderFactory for a stable type rather than one defined locally inside a method. So the simplified source now looks like this:
package com.bradkarels.simple
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import com.datastax.spark.connector._
import com.datastax.spark.connector.rdd._
object CaseStudy {
  case class SubHuman(id:String, firstname:String, lastname:String, isGoodPerson:Boolean)

  def main(args: Array[String]) {
    val conf = new SparkConf(true)
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext("spark://127.0.0.1:7077", "simple", conf)
    val foo = sc.cassandraTable[SubHuman]("nicecase", "human").select("id","firstname","lastname","isGoodPerson").toArray
  }
}
The complete source (updated & fixed) can be found on github https://github.com/bradkarels/spark-cassandra-to-scala-case-class
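Equivalently (a hedged side note, not part of the original answer), any non-method-local definition site appears to work, for example a top-level case class in the same package:
// Sketch: defining the case class at the top level of the file (outside any
// method) also lets the connector derive the implicit RowReaderFactory.
package com.bradkarels.simple

case class SubHuman(id: String, firstname: String, lastname: String, isGoodPerson: Boolean)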

Reading file contents with casbah gridfs throws MalformedInputException

Consider the following sample code: it writes a file to MongoDB and then tries to read it back.
import com.mongodb.casbah.Imports._
import com.mongodb.casbah.gridfs.Imports._
object TestGridFS {
  def main(args: Array[String]) {
    val mongoConn = MongoConnection()
    val mongoDB = mongoConn("gridfs_test")
    val gridfs = GridFS(mongoDB) // creates a GridFS handle on ``fs``
    val xls = new java.io.FileInputStream("ok.xls")
    val savedFile = gridfs.createFile(xls)
    savedFile.filename = "ok.xls"
    savedFile.save
    println("savedfile id: %s".format(savedFile._id.get))
    val file = gridfs.findOne(savedFile._id.get)
    val bytes = file.get.source.map(_.toByte).toArray
    println(bytes)
  }
}
This yields:
gridfs $ sbt run
[info] Loading global plugins from /Users/jean/.sbt/plugins
[info] Set current project to gridfs-test (in build file:/Users/jean/dev/sdev/src/perso/gridfs/)
[info] Running TestGridFS
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
savedfile id: 504c8cce0364a7cd145d5dc1
[error] (run-main) java.nio.charset.MalformedInputException: Input length = 1
java.nio.charset.MalformedInputException: Input length = 1
at java.nio.charset.CoderResult.throwException(CoderResult.java:260)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:319)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:158)
at java.io.InputStreamReader.read(InputStreamReader.java:167)
at java.io.BufferedReader.fill(BufferedReader.java:136)
at java.io.BufferedReader.read(BufferedReader.java:157)
at scala.io.BufferedSource$$anonfun$iter$1$$anonfun$apply$mcI$sp$1.apply$mcI$sp(BufferedSource.scala:38)
at scala.io.Codec.wrap(Codec.scala:64)
at scala.io.BufferedSource$$anonfun$iter$1.apply(BufferedSource.scala:38)
at scala.io.BufferedSource$$anonfun$iter$1.apply(BufferedSource.scala:38)
at scala.collection.Iterator$$anon$14.next(Iterator.scala:148)
at scala.collection.Iterator$$anon$25.hasNext(Iterator.scala:463)
at scala.collection.Iterator$$anon$19.hasNext(Iterator.scala:334)
at scala.io.Source.hasNext(Source.scala:238)
at scala.collection.Iterator$$anon$19.hasNext(Iterator.scala:334)
at scala.collection.Iterator$class.foreach(Iterator.scala:660)
at scala.collection.Iterator$$anon$19.foreach(Iterator.scala:333)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:99)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:250)
at scala.collection.Iterator$$anon$19.toBuffer(Iterator.scala:333)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:237)
at scala.collection.Iterator$$anon$19.toArray(Iterator.scala:333)
at TestGridFS$.main(test.scala:15)
at TestGridFS.main(test.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
java.lang.RuntimeException: Nonzero exit code: 1
at scala.sys.package$.error(package.scala:27)
[error] {file:/Users/jean/dev/sdev/src/perso/gridfs/}default-b6ab90/compile:run: Nonzero exit code: 1
[error] Total time: 1 s, completed 9 sept. 2012 14:34:22
I don't understand what the charset problem can be; I just wrote the file to the database. When querying the database I DO see the files and chunks in there, but I can't seem to read them back.
I tried this with Mongo 2.0 and 2.2, and Casbah 2.4 and 3.0.0-M2, to no avail, and I don't see what I could do to get the bytes, on Mac OS X Mountain Lion.
PS: To run the test, you can use the following build.sbt
name := "gridfs-test"
version := "1.0"
scalaVersion := "2.9.1"
libraryDependencies += "org.mongodb" %% "casbah" % "2.4.1"
libraryDependencies += "org.mongodb" %% "casbah-gridfs" % "2.4.1"
resolvers ++= Seq("Typesafe Releases" at "http://repo.typesafe.com/typesafe/releases/",
"sonatype release" at "https://oss.sonatype.org/content/repositories/releases",
"OSS Snapshots" at "https://oss.sonatype.org/content/repositories/snapshots/")
I found a way to read the file contents back from MongoDB. The source method relies on underlying.inputStream, which is defined on GridFSDBFile; it wraps the stream in a scala.io.Source, which tries to decode the bytes as characters and is what fails on binary content.
Every test I did that used underlying.inputStream failed with the same error.
However, the API offers another way to access the files: writeTo. writeTo does not go through underlying.inputStream; it copies the raw bytes.
Here is the "fixed" code from the question:
import com.mongodb.casbah.Imports._
import com.mongodb.casbah.gridfs.Imports._
object TestGridFS {
  def main(args: Array[String]) {
    val mongoConn = MongoConnection()
    val mongoDB = mongoConn("gridfs_test")
    val gridfs = GridFS(mongoDB) // creates a GridFS handle on ``fs``
    val xls = new java.io.File("ok.xls")
    val savedFile = gridfs.createFile(xls)
    savedFile.filename = "ok.xls"
    savedFile.save
    println("savedfile id: %s".format(savedFile._id.get))
    val file = gridfs.findOne(savedFile._id.get)
    val byteArrayOutputStream = new java.io.ByteArrayOutputStream()
    file.map(_.writeTo(byteArrayOutputStream))
    byteArrayOutputStream.toByteArray
  }
}
The last line, byteArrayOutputStream.toByteArray, gives you an array of bytes which can then be used however you see fit.
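To sanity-check the round trip, here is a small follow-up sketch; the output path is purely illustrative and not part of the original answer.
// Follow-up sketch (assumption: runs right after the fixed snippet above;
// "ok-roundtrip.xls" is an illustrative output path).
val recovered = byteArrayOutputStream.toByteArray
val out = new java.io.FileOutputStream("ok-roundtrip.xls")
try out.write(recovered) finally out.close()
println("recovered %d bytes".format(recovered.length))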