How to call batchWriteItemAsync to load DynamoDB in Scala/Spark

I am trying to run batchWriteItemAsync to load data into DynamoDB from Scala (Spark), but I get the error below when it runs. How do I call batchWriteItemAsync correctly in Scala?
Code
import com.amazonaws.services.dynamodbv2.model._
import com.amazonaws.services.dynamodbv2.{AmazonDynamoDBAsyncClientBuilder, AmazonDynamoDBAsync, AmazonDynamoDBClientBuilder, AmazonDynamoDB}
import java.util.concurrent.Executors
import scala.collection.JavaConverters._ // needed for .asJava below
import scala.concurrent.{ExecutionContext, Future}
val POOL_SIZE = 3
val client: AmazonDynamoDBAsync = AmazonDynamoDBAsyncClientBuilder.standard().withRegion(REGION).build()
// init threads pool
val jobExecutorPool = Executors.newFixedThreadPool(POOL_SIZE)
// create the implicit ExecutionContext based on our thread pool
implicit val xc: ExecutionContext = ExecutionContext.fromExecutorService(jobExecutorPool)
var contentSeq = ...
val batchWriteItems: java.util.List[WriteRequest] = contentSeq.asJava
def batchWriteFuture(tableName: String, batchWriteItems: java.util.List[WriteRequest])(implicit xc: ExecutionContext): Future[BatchWriteItemResult] = {
client.batchWriteItemAsync(
(new BatchWriteItemRequest()).withRequestItems(Map(tableName -> batchWriteItems).asJava)
)
}
batchWriteFuture(tableName, batchWriteItems)
Error:
error: type mismatch;
found : java.util.concurrent.Future[com.amazonaws.services.dynamodbv2.model.BatchWriteItemResult]
required: scala.concurrent.Future[com.amazonaws.services.dynamodbv2.model.BatchWriteItemResult]
client.batchWriteItemAsync(

client.batchWriteItemAsync is a Java API, so it returns a java.util.concurrent.Future.
Your method batchWriteFuture declares a return type of scala.concurrent.Future.
Try converting the Java future returned by the DynamoDB call to a Scala one, for example:
def batchWriteFuture(tableName: String, batchWriteItems: java.util.List[WriteRequest])(implicit xc: ExecutionContext): Future[BatchWriteItemResult] = {
client.batchWriteItemAsync(
(new BatchWriteItemRequest()).withRequestItems(Map(tableName -> batchWriteItems).asJava)
).asScala // <---------- converting to scala future
}
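Depending on the Scala version, a plain java.util.concurrent.Future may not have an asScala conversion (the standard converters target CompletionStage). If that is the case, an alternative is to bridge through the SDK's callback overload of batchWriteItemAsync and complete a Promise from it. A minimal sketch, assuming the same client, imports and WriteRequest list as above:

import com.amazonaws.handlers.AsyncHandler
import scala.concurrent.Promise

def batchWriteFuture(tableName: String, batchWriteItems: java.util.List[WriteRequest]): Future[BatchWriteItemResult] = {
  val promise = Promise[BatchWriteItemResult]()
  val request = new BatchWriteItemRequest().withRequestItems(Map(tableName -> batchWriteItems).asJava)
  // The AsyncHandler overload lets us complete a scala.concurrent.Promise from the SDK callback.
  client.batchWriteItemAsync(request, new AsyncHandler[BatchWriteItemRequest, BatchWriteItemResult] {
    def onError(exception: Exception): Unit = promise.failure(exception)
    def onSuccess(req: BatchWriteItemRequest, result: BatchWriteItemResult): Unit = promise.success(result)
  })
  promise.future
}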

Related

ReactiveMongo with generics: "ambiguous implicit values"

I'm not really sure where this error is coming from. I'm trying to migrate from the old .find() method, where I could pass a query: BSONDocument to find and call .one[T] on it. That worked perfectly fine, but I'd like to follow the current recommendations.
However, I have a feeling that my desire to create a generic database access object is causing some of this headache... take a look below.
The object that I'm trying to read from:
package Models
import reactivemongo.api.bson.{BSONDocumentHandler, Macros}
case class Person(name: String)
object Person {
implicit val handler: BSONDocumentHandler[Person] = Macros.handler[Person]
}
The class containing the method I'm trying to execute (ignore the Await, etc.)
package Data
import reactivemongo.api.AsyncDriver
import reactivemongo.api.bson._
import reactivemongo.api.bson.collection.BSONCollection
import reactivemongo.api.commands.WriteResult
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.reflect.ClassTag
object DataService extends DataService
trait DataService {
implicit val ec: scala.concurrent.ExecutionContext = scala.concurrent.ExecutionContext.global
private val driver = new AsyncDriver
private val connection = Await.result(driver.connect(List("localhost:32768")), 10.seconds)
private val database = Await.result(connection.database("database"), 10.seconds)
private def getCollection[T]()(implicit tag: ClassTag[T]): BSONCollection = database.collection(tag.runtimeClass.getSimpleName)
def Get[T]()(implicit handler: BSONDocumentHandler[T], tag: ClassTag[T]): Future[Option[T]] = {
val collection = getCollection[T]
collection.find(BSONDocument("name" -> "hello"), Option.empty).one[T]
}
}
How I'm calling it (ignore the Await; this is just for test purposes):
val personName = Await.result(dataService.Get[Person](), 10.seconds).map(_.name).getOrElse("Not found")
The error that I'm getting:
Error:(32, 19) ambiguous implicit values:
both object BSONDocumentIdentity in trait BSONIdentityHandlers of type reactivemongo.api.bson.package.BSONDocumentIdentity.type
and value handler of type reactivemongo.api.bson.BSONDocumentHandler[T]
match expected type collection.pack.Writer[J]
collection.find(BSONDocument("name" -> "hello"), Option.empty).cursor[T]
Thank you all for your help.
Try adding a type to the Option you pass as the second argument of the find method; with a bare Option.empty the projection's type is left undetermined, so the compiler cannot pick a single Writer for it. For example:
def Get[T]()(implicit handler: BSONDocumentHandler[T], tag: ClassTag[T]): Future[Option[T]] = {
val collection = getCollection[T]
collection.find(BSONDocument("name" -> "hello"), Option.empty[BSONDocument]).one[T]
}
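Equivalently, the projection can be pulled out into a value with an explicit type, which makes it clearer where the ambiguity came from. A sketch of the same method:

def Get[T]()(implicit handler: BSONDocumentHandler[T], tag: ClassTag[T]): Future[Option[T]] = {
  val collection = getCollection[T]
  // Typing the projection as Option[BSONDocument] pins its Writer to BSONDocumentIdentity,
  // so the compiler no longer has two candidate implicits to choose from.
  val projection: Option[BSONDocument] = None
  collection.find(BSONDocument("name" -> "hello"), projection).one[T]
}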

udf: No TypeTag available for type String

I don't understand a behavior of Spark.
I create a UDF which returns an Int, like below:
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}
object Show {
def main(args: Array[String]): Unit = {
val (sc,sqlContext) = iniSparkConf("test")
val testInt_udf = sqlContext.udf.register("testInt_udf", testInt _)
}
def iniSparkConf(appName: String): (SparkContext, SQLContext) = {
val conf = new SparkConf().setAppName(appName)//.setExecutorEnv("spark.ui.port", "4046")
val sc = new SparkContext(conf)
sc.setLogLevel("WARN")
val sqlContext = new SQLContext(sc)
(sc, sqlContext)
}
def testInt() : Int= {
return 2
}
}
It works perfectly, but if I change the return type of the method from Int to String
val testString_udf = sqlContext.udf.register("testString_udf", testString _)
def testString() : String = {
return "myString"
}
I get the following error
Error:(34, 43) No TypeTag available for String
val testString_udf = sqlContext.udf.register("testString_udf", testString _)
Error:(34, 43) not enough arguments for method register: (implicit evidence$1: reflect.runtime.universe.TypeTag[String])org.apache.spark.sql.UserDefinedFunction.
Unspecified value parameter evidence$1.
val testString_udf = sqlContext.udf.register("testString_udf", testString _)
Here are my embedded jars:
datanucleus-api-jdo-3.2.6
datanucleus-core-3.2.10
datanucleus-rdbms-3.2.9
spark-1.6.1-yarn-shuffle
spark-assembly-1.6.1-hadoop2.6.0
spark-examples-1.6.1-hadoop2.6.0
I am a little bit lost... Do you have any idea?
Since I can't reproduce the issue by copy-pasting just your example code into a new file, I bet that in your real code String is actually shadowed by something else. To verify this theory, you can try changing your signature to
def testString() : scala.Predef.String = {
return "myString"
}
or
def testString() : java.lang.String = {
return "myString"
}
If this compiles, search for "String" to see where you shadowed the standard type. If you use IntelliJ IDEA, you can use Ctrl+B (Go To Declaration) to find it. The most obvious candidate is that you used String as the name of a generic type parameter, but there are other possibilities.
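For illustration, a hypothetical way the shadowing can happen (not the asker's actual code): a generic type parameter named String hides scala.Predef.String and has no TypeTag, which produces exactly this error.

class Show[String] {             // the type parameter named String shadows scala.Predef.String
  def testString(): String = ??? // "String" here refers to the type parameter

  // With a SQLContext in scope, registering this would fail with:
  //   No TypeTag available for String
  // sqlContext.udf.register("testString_udf", testString _)
}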

Code working in spark-shell but not in Eclipse

I have a small piece of Scala code which works properly in spark-shell but not in Eclipse with the Scala plugin. I can access HDFS from the plugin; I tried writing another file and it worked.
FirstSpark.scala
package bigdata.spark
import org.apache.spark.SparkConf
import java.io._
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
object FirstSpark {
def main(args: Array[String])={
val conf = new SparkConf().setMaster("local").setAppName("FirstSparkProgram")
val sparkcontext = new SparkContext(conf)
val textFile =sparkcontext.textFile("hdfs://pranay:8020/spark/linkage")
val m = new Methods()
val q =textFile.filter(x => !m.isHeader(x)).map(x=> m.parse(x))
q.saveAsTextFile("hdfs://pranay:8020/output") }
}
Methods.scala
package bigdata.spark
import java.util.function.ToDoubleFunction
class Methods {
def isHeader(s:String):Boolean={
s.contains("id_1")
}
def parse(line:String) ={
val pieces = line.split(',')
val id1=pieces(0).toInt
val id2=pieces(1).toInt
val matches=pieces(11).toBoolean
val mapArray=pieces.slice(2, 11).map(toDouble)
MatchData(id1,id2,mapArray,matches)
}
def toDouble(s: String) = {
if ("?".equals(s)) Double.NaN else s.toDouble
}
}
case class MatchData(id1: Int, id2: Int,
scores: Array[Double], matched: Boolean)
Error Message:
Exception in thread "main" org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2032)
at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:335)
at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:334)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
Can anyone please help me with this?
Try changing class Methods { .. } to object Methods { .. }.
I think the problem is at val q = textFile.filter(x => !m.isHeader(x)).map(x => m.parse(x)). When Spark sees the filter and map functions, it tries to serialize the functions passed to them (x => !m.isHeader(x) and x => m.parse(x)) so that it can dispatch the work of executing them to all of the executors (this is the Task referred to). To do this, however, it needs to serialize m, since that object is referenced inside the functions (it is in the closure of the two anonymous functions), and it cannot, because Methods is not serializable. You could add extends Serializable to the Methods class, but in this case an object is more appropriate: the closures can then refer to the singleton directly instead of capturing an instance that has to be serialized.
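A minimal sketch of that change, keeping the parse logic from the question; the driver then calls the methods on the singleton and no instance m is captured by the closures:

object Methods {
  def isHeader(s: String): Boolean = s.contains("id_1")

  def toDouble(s: String): Double =
    if ("?".equals(s)) Double.NaN else s.toDouble

  def parse(line: String): MatchData = {
    val pieces = line.split(',')
    MatchData(pieces(0).toInt, pieces(1).toInt,
      pieces.slice(2, 11).map(toDouble), pieces(11).toBoolean)
  }
}

// In FirstSpark.main, drop `val m = new Methods()` and use:
// val q = textFile.filter(x => !Methods.isHeader(x)).map(Methods.parse)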

Op-Rabbit with Spray-Json in Akka Http

I am trying to use the library Op-Rabbit to consume a RabbitMQ queue in an Akka-Http project.
I want to use Spray-Json for the marshalling/unmarshalling.
import com.spingo.op_rabbit.SprayJsonSupport._
import com.spingo.op_rabbit.stream.RabbitSource
import com.spingo.op_rabbit.{Directives, RabbitControl}
object Boot extends App with Config with BootedCore with ApiService {
this: ApiService with Core =>
implicit val materializer = ActorMaterializer()
Http().bindAndHandle(routes, httpInterface, httpPort)
log.info("Http Server started")
implicit val rabbitControl = system.actorOf(Props[RabbitControl])
import Directives._
RabbitSource(
rabbitControl,
channel(qos = 3),
consume(queue(
"such-queue",
durable = true,
exclusive = false,
autoDelete = false)),
body(as[User])).
runForeach { user =>
log.info(user)
} // after each successful iteration the message is acknowledged.
}
In a separate file:
case class User(id: Long,name: String)
object JsonFormat extends DefaultJsonProtocol {
implicit val format = jsonFormat2(User)
}
The error I am getting is:
could not find implicit value for parameter um: akka.http.scaladsl.unmarshalling.FromRequestUnmarshaller[*.*.models.User]
[error] body(as[User])). // marshalling is automatically hooked up using implicits
[error] ^
[error]could not find implicit value for parameter um: com.spingo.op_rabbit.RabbitUnmarshaller[*.*.models.User]
[error] body(as[User])
[error] ^
[error] two errors found
I'm not sure how to get the op-rabbit spray-json support working properly.
Thanks for any help.
Try to provide an implicit marshaller for your User class, as they do for Int (in RabbitTestHelpers.scala):
implicit val simpleIntMarshaller = new RabbitMarshaller[Int] with RabbitUnmarshaller[Int] {
val contentType = "text/plain"
val contentEncoding = Some("UTF-8")
def marshall(value: Int) =
value.toString.getBytes
def unmarshall(value: Array[Byte], contentType: Option[String], charset: Option[String]) = {
new String(value).toInt
}
}
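Adapted to the User case class from the question, a sketch of such a pair could look like this (the content type and the import path of JsonFormat are assumptions; the trait members mirror the Int example above). Note also that op-rabbit's SprayJsonSupport can usually derive this automatically, provided the implicit spray-json format for User is in scope where body(as[User]) is called.

import com.spingo.op_rabbit.{RabbitMarshaller, RabbitUnmarshaller}
import spray.json._
import JsonFormat._ // assumed import path; brings the implicit jsonFormat2(User) into scope

implicit val userMarshaller = new RabbitMarshaller[User] with RabbitUnmarshaller[User] {
  val contentType = "application/json"
  val contentEncoding = Some("UTF-8")
  def marshall(value: User) =
    value.toJson.compactPrint.getBytes("UTF-8")
  def unmarshall(value: Array[Byte], contentType: Option[String], charset: Option[String]) = {
    new String(value, charset.getOrElse("UTF-8")).parseJson.convertTo[User]
  }
}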

How do I turn a Scala case class into a mongo Document

I'd like to build a generic method for transforming Scala Case Classes to Mongo Documents.
A promising Document constructor is
fromSeq(ts: Seq[(String, BsonValue)]): Document
I can turn a case class into a Map[String, Any], but then I've lost the type information I need to use the implicit conversions to BsonValues. Maybe TypeTags can help with this?
Here's what I've tried:
import org.mongodb.scala.bson.BsonTransformer
import org.mongodb.scala.bson.collection.immutable.Document
import org.mongodb.scala.bson.BsonValue
case class Person(age: Int, name: String)
//transform scala values into BsonValues
def transform[T](v: T)(implicit transformer: BsonTransformer[T]): BsonValue = transformer(v)
// turn any case class into a Map[String, Any]
def caseClassToMap(cc: Product) = {
val values = cc.productIterator
cc.getClass.getDeclaredFields.map( _.getName -> values.next).toMap
}
// transform a Person into a Document
def personToDocument(person: Person): Document = {
val map = caseClassToMap(person)
val bsonValues = map.toSeq.map { case (key, value) =>
(key, transform(value))
}
Document.fromSeq(bsonValues)
}
Error:
<console>:24: error: No bson implicit transformer found for type Any. Implement or import an implicit BsonTransformer for this type.
(key, transform(value))
def personToDocument(person: Person): Document = {
Document("age" -> person.age, "name" -> person.name)
}
Below code works without manual conversion of an object.
import reactivemongo.api.bson.{BSON, BSONDocument, Macros}
case class Person(name:String = "SomeName", age:Int = 20)
implicit val personHandler = Macros.handler[Person]
val bsonPerson = BSON.writeDocument[Person](Person())
println(s"${BSONDocument.pretty(bsonPerson.getOrElse(BSONDocument.empty))}")
You can use Salat https://github.com/salat/salat. A nice example can be found here - https://gist.github.com/bhameyie/8276017. This is the piece of code that will help you -
import salat._
val dBObject = grater[Artist].asDBObject(artist)
artistsCollection.save(dBObject, WriteConcern.Safe)
I was able to serialize a case class to a BsonDocument using the org.bson.BsonDocumentWriter. The code below runs using Scala 2.12 and mongo-scala-driver_2.12 version 2.6.0.
My quest for this solution was aided by this answer (where they are trying to serialize in the opposite direction): Serialize to object using scala mongo driver?
import org.mongodb.scala.bson.codecs.Macros
import org.mongodb.scala.bson.codecs.DEFAULT_CODEC_REGISTRY
import org.bson.codecs.configuration.CodecRegistries.{fromRegistries, fromProviders}
import org.bson.codecs.EncoderContext
import org.bson.BsonDocumentWriter
import org.mongodb.scala.bson.BsonDocument
import org.bson.codecs.configuration.CodecRegistry
import org.bson.codecs.Codec
case class Animal(name : String, species: String, genus: String, weight: Int)
object TempApp {
def main(args: Array[String]) {
val jaguar = Animal("Jenny", "Jaguar", "Panthera", 190)
val codecProvider = Macros.createCodecProvider[Animal]()
val codecRegistry: CodecRegistry = fromRegistries(fromProviders(codecProvider), DEFAULT_CODEC_REGISTRY)
val codec = Macros.createCodec[Animal](codecRegistry)
val encoderContext = EncoderContext.builder.isEncodingCollectibleDocument(true).build()
val doc = BsonDocument()
val writer = new BsonDocumentWriter(doc) // need to call new since this is a Java class without a companion object
codec.encode(writer, jaguar, encoderContext)
print(doc)
}
};
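For the opposite direction, the same codec can decode the document back into the case class through a BsonDocumentReader. A small sketch of lines that could go at the end of main above, reusing its codec and doc values:

import org.bson.BsonDocumentReader
import org.bson.codecs.DecoderContext

val reader = new BsonDocumentReader(doc)
val decoded: Animal = codec.decode(reader, DecoderContext.builder().build())
println(decoded)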