MongoDB Codec in Scala broken?

I have an app written in Scala that accesses a MongoDB database, and I can query and insert data perfectly fine.
However, when I try to use collection.distinct("user") I get the error:
Unknown Error org.bson.codecs.configuration.CodecConfigurationException:
Can't find a codec for class scala.runtime.Nothing$.
After doing some googling I found that I would need to specify a codec, so I initialised the MongoClient with one:
val clusterSettings = ClusterSettings
  .builder()
  .hosts(scala.collection.immutable.List(new com.mongodb.ServerAddress(addr)).asJava)
  .build()

val settings = MongoClientSettings.builder()
  .codecRegistry(MongoClient.DEFAULT_CODEC_REGISTRY)
  .clusterSettings(clusterSettings)
  .build()

val mongoClient: MongoClient = MongoClient(settings)
Still no luck, got the same error.
I figured I would need to make a custom codec as per this web page:
class NothingCodec extends Codec[Nothing] {
  override def encode(writer: BsonWriter, value: Nothing, encoderContext: EncoderContext) = {}

  override def decode(reader: BsonReader, decoderContext: DecoderContext): Nothing = {
    throw new Exception
  }

  override def getEncoderClass(): Class[Nothing] = {
    classOf[Nothing]
  }
}
But this doesn't work, nor does it make sense: Nothing is not a valid return type; there exist no instances of this type.
As a result it obviously fails, now with a new error of Unknown Error java.lang.Exception (obviously!).
Am I missing something with the Mongo Scala Driver? Or is it just broken?

Use Document instead:
collection.distinct[Document]("user")

// i.e.
case class User(name: String, id: String)

records.distinct[Document]("user")
  .toFuture()
  .map(_ map { doc => getUserFromBson(doc).get })

def getUserFromBson(doc: Document): Option[User] = {
  for {
    name <- doc.get("name") map { x => x.asString().getValue }
    id   <- doc.get("id") map { x => x.asString().getValue }
  } yield User(name, id)
}
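For reference, here is a fuller sketch of the same idea with the client setup included (the connection string, database, and collection names are placeholders, not from the question):

import org.mongodb.scala._
import scala.concurrent.ExecutionContext.Implicits.global

val client = MongoClient("mongodb://localhost:27017") // placeholder URI
val records: MongoCollection[Document] = client.getDatabase("mydb").getCollection("records")

// Giving distinct an explicit result type avoids the scala.runtime.Nothing$ codec lookup.
records.distinct[Document]("user")
  .toFuture()
  .foreach(users => println(s"Distinct users: $users"))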
Hope this helps.

Related

TypeInformation in Flink

I have a pipeline in place where data is sent from Flink to a Kafka topic in JSON format. I was also able to read it back from the Kafka topic and extract the JSON attributes. Now, similar to how Scala reflection lets me compare data types at runtime, I am trying to do the same thing in Flink using TypeInformation: I want to set some predefined format, and whatever data is read from the topic should go through this validation and be passed or failed accordingly.
I have data like below:
{"policyName":"String", "premium":2400, "eventTime":"2021-12-22 00:00:00" }
For my problem, I came across a couple of examples in Flink's book that show how to create a TypeInformation variable, but there was nothing on how to use it, so I tried my own way:
val objectMapper = new ObjectMapper()

val tupleType: TypeInformation[(String, Int, String)] =
  Types.TUPLE[(String, Int, String)]
println(tupleType.getTypeClass)

src.map(v => v)
  .map { x =>
    val policyName: String = objectMapper.readTree(x).get("policyName").toString()
    val premium: Int = objectMapper.readTree(x).get("premium").toString().toInt
    val eventTime: String = objectMapper.readTree(x).get("eventTime").toString()
    if ((policyName, premium, eventTime) == tupleType.getTypeClass) {
      println("Good Record: " + (policyName, premium, eventTime))
    } else {
      println("Bad Record: " + (policyName, premium, eventTime))
    }
  }
Now if I pass the input below to the Flink Kafka producer:
{"policyName":"whatever you feel like","premium":"4000","eventTime":"2021-12-20 00:00:00"}
It should give me the expected output of "Bad Record" together with the tuple, since the datatype of premium is String and not Long/Int.
If I pass the input below:
{"policyName":"whatever you feel like","premium":4000,"eventTime":"2021-12-20 00:00:00"}
It should give me the output "Good Record" together with the tuple.
But according to my code, it always takes the else branch.
If I create a datastream variable, store the results of the above map in it, and then compare like below, it gives me the correct result:
if (tupleType == datas.getType()) { // where 'datas' is a DataStream
  print("Good Records")
} else {
  println("Bad Records")
}
But I want to send the good/bad records to different streams, or maybe insert them directly into a Cassandra table. That is why I am checking the records one by one inside the map. Is my approach correct? What would be the best practice for what I am trying to achieve?
Based on Dominik's inputs, I tried creating my own CustomDeserializer class:
import com.fasterxml.jackson.databind.ObjectMapper
import org.apache.flink.api.common.serialization.DeserializationSchema
import org.apache.flink.api.common.typeinfo.TypeInformation
import java.nio.charset.StandardCharsets

class sample extends DeserializationSchema[String] {

  override def deserialize(message: Array[Byte]): Tuple3[Int, String, String] = {
    val data = new String(message, StandardCharsets.UTF_8)
    val objectMapper = new ObjectMapper()
    val id: Int = objectMapper.readTree(data).get("id").toString().toInt
    val category: String = objectMapper.readTree(data).get("Category").toString()
    val eventTime: String = objectMapper.readTree(data).get("eventTime").toString()
    return (id, category, eventTime)
  }

  override def isEndOfStream(t: String): Boolean = ???

  override def getProducedType: TypeInformation[String] = TypeInformation.of(classOf[String])
}
I want to implement something like below:
src.map(v => v)
  .map { x =>
    if (new sample().deserialize(x) == true) {
      println("Good Record: " + (id, category, eventTime))
    } else {
      println("Bad Record: " + (id, category, eventTime))
    }
  }
But the input is in Array[Byte] form, so how can I implement it? Where exactly am I going wrong? What needs to be modified? This is my first ever attempt at Flink Scala custom classes.
Inputs passed: (attached as a screenshot in the original post)
I don't really think that using TypeInformation to do what you want is the best idea. You can simply use something like a ProcessFunction that accepts a JSON String and then uses the ObjectMapper to deserialize the JSON to a class with the expected structure. You can output the correctly deserialized objects from the ProcessFunction, and the Strings that failed deserialization can be passed as a side output, since they will be your Bad Records.
This could look like below; note that this uses the Jackson Scala module to perform deserialization to a case class. You can find more info here.
import com.fasterxml.jackson.databind.{DeserializationFeature, ObjectMapper}
import com.fasterxml.jackson.module.scala.{DefaultScalaModule, ScalaObjectMapper}
import org.apache.flink.streaming.api.functions.ProcessFunction
import org.apache.flink.streaming.api.scala._
import org.apache.flink.util.Collector
import scala.util.{Failure, Success, Try}

case class Premium(policyName: String, premium: Long, eventTime: String)

class Splitter extends ProcessFunction[String, Premium] {

  val outputTag = new OutputTag[String]("failed")

  def fromJson[T](json: String)(implicit m: Manifest[T]): Either[String, T] = {
    Try {
      lazy val mapper = new ObjectMapper() with ScalaObjectMapper
      mapper.registerModule(DefaultScalaModule)
      mapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
      mapper.readValue[T](json)
    } match {
      case Success(x) => Right(x)
      case Failure(err) => Left(json)
    }
  }

  override def processElement(i: String, context: ProcessFunction[String, Premium]#Context, collector: Collector[Premium]): Unit = {
    fromJson[Premium](i) match {
      case Right(data) => collector.collect(data)
      case Left(json) => context.output(outputTag, json)
    }
  }
}
Then you can use the outputTag to get the side output data from the stream, which gives you the incorrect records.
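For completeness, the wiring might look roughly like this (a minimal sketch; src is assumed to be the DataStream[String] read from Kafka, and the print sinks are only placeholders):

val splitter = new Splitter()
val parsed: DataStream[Premium] = src.process(splitter)                        // good records
val badRecords: DataStream[String] = parsed.getSideOutput(splitter.outputTag)  // failed JSON strings

parsed.print()      // or addSink(...) towards Cassandra
badRecords.print()  // or route them to a dead-letter topic/table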

Assign dynamically injected database name in Play Slick

I have the following Play Slick DAO class. Note that the database configuration is the constant "control0001". The DAO has a function readUser that reads a user based on its user id:
class UsersDAO @Inject()(@NamedDatabase("control0001")
                         protected val dbConfigProvider: DatabaseConfigProvider)
  extends HasDatabaseConfigProvider[JdbcProfile] {

  import driver.api._

  def readUser(userid: String) = {
    val users = TableQuery[UserDB]
    val action = users.filter(_.userid === userid).result
    val future = db.run(action.asTry)
    future.map {
      case Success(s) =>
        if (s.length > 0) Some(s(0)) else None
      case Failure(e) => throw new Exception("Failure in readUser: " + e.getMessage)
    }
  }
}
Instead of having a constant in @NamedDatabase("control0001"), I need the database to be variable. In the application, I have multiple databases (control0001, control0002 and so on) configured in application.conf. Depending on a variable value, I need to determine the database to be used in the DAO. All the databases are similar and have the same tables (only the data in each database differs).
The following Play class calls the DAO function, but first it needs to determine the database name to be injected:
class TestSlick @Inject()(dao: UsersDAO) extends Controller {

  def test(someCode: Int, userId: String) = Action { request =>
    val databaseName = if (someCode == 1) "control0001" else "control0002"
    // Run the method in UsersDAO accessing the database set by databaseName
    val future = dao.readUser(userId)
    future.map { result =>
      result match {
        case Some(user) => Ok(user.firstName)
        case _ => Ok("user not found")
      }
    }
  }
}
How can this be achieved in Play Slick?
You can try to initialise the Slick Database object yourself, overriding the default config:
val db = Database.forURL("jdbc:mysql://localhost/" + databaseName, driver = "com.mysql.jdbc.Driver")
More information is in the Slick docs: http://slick.lightbend.com/doc/3.0.0/database.html
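A rough sketch of how that could be slotted into the DAO (the URL, driver, and the Slick 3.2+ MySQLProfile import are placeholders and assumptions, not from the original answer; older Slick versions expose slick.driver.MySQLDriver instead):

import slick.jdbc.MySQLProfile.api._

class UsersDAO {
  def readUser(userid: String, databaseName: String) = {
    // Placeholder URL/driver: use whatever control000x is actually configured with.
    // In real code, build each Database once and cache it, otherwise a new connection pool is created per call.
    val db = Database.forURL(s"jdbc:mysql://localhost/$databaseName", driver = "com.mysql.jdbc.Driver")
    val users = TableQuery[UserDB]
    db.run(users.filter(_.userid === userid).result.headOption)
  }
}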
Instead of trying to use Play's runtime dependency injection utilities in this case, use the SlickApi directly in your DAO and pass the datasource name to the dbConfig(DbName(name)) method. To obtain the SlickApi, mix in the SlickComponents trait:
class UsersDAO extends SlickComponents {

  def readUser(userid: String, dbName: String) = {
    val users = TableQuery[UserDB]
    val action = users.filter(_.userid === userid).result
    val dbConfig = slickApi.dbConfig(DbName(dbName))
    val future = dbConfig.db.run(action.asTry)
    ...
  }
}
Then in your controller:
def test(someCode: Int, userId: String) = Action { request =>
  val databaseName = if (someCode == 1) "control0001" else "control0002"
  val future = dao.readUser(userId, databaseName)
  ...
}
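If runtime dependency injection is preferred over the components approach, a similar sketch with an injected SlickApi might look like this (untested; the JdbcProfile type parameter and the dbConfig.profile import are assumptions about the play-slick/Slick version in use):

import javax.inject.Inject
import play.api.db.slick.{DbName, SlickApi}
import slick.jdbc.JdbcProfile

class UsersDAO @Inject()(slickApi: SlickApi) {

  def readUser(userid: String, dbName: String) = {
    val dbConfig = slickApi.dbConfig[JdbcProfile](DbName(dbName))
    import dbConfig.profile.api._
    val users = TableQuery[UserDB]
    dbConfig.db.run(users.filter(_.userid === userid).result.headOption)
  }
}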

missing FromRequestUnmarshaller[Entity] on akka post route

Let me start by saying that I am very new to akka-http; none of the books I have cover the marshalling topic well, so it is a bit of a black box for me. I was able to put together the following (un)marshaller, which is capable of returning both JSON and Protobuf based on a request header.
This part of the code works fine: I have a GET route defined in akka-http and it works as expected.
trait PBMarshaller {

  private val protobufContentType = ContentType(MediaType.applicationBinary("octet-stream", Compressible, "proto"))
  private val applicationJsonContentType = ContentTypes.`application/json`

  implicit def PBFromRequestUnmarshaller[T <: GeneratedMessage with Message[T]](companion: GeneratedMessageCompanion[T]): FromEntityUnmarshaller[T] = {
    Unmarshaller.withMaterializer[HttpEntity, T](_ => implicit mat => {
      case entity @ HttpEntity.Strict(`applicationJsonContentType`, data) =>
        val charBuffer = Unmarshaller.bestUnmarshallingCharsetFor(entity)
        FastFuture.successful(JsonFormat.fromJsonString(data.decodeString(charBuffer.nioCharset().name()))(companion))
      case entity @ HttpEntity.Strict(`protobufContentType`, data) =>
        FastFuture.successful(companion.parseFrom(CodedInputStream.newInstance(data.asByteBuffer)))
      case entity =>
        Future.failed(UnsupportedContentTypeException(applicationJsonContentType, protobufContentType))
    })
  }

  implicit def PBToEntityMarshaller[T <: GeneratedMessage]: ToEntityMarshaller[T] = {
    def jsonMarshaller(): ToEntityMarshaller[T] = {
      val contentType = applicationJsonContentType
      Marshaller.withFixedContentType(contentType) { value =>
        HttpEntity(contentType, JsonFormat.toJsonString(value))
      }
    }

    def protobufMarshaller(): ToEntityMarshaller[T] = {
      Marshaller.withFixedContentType(protobufContentType) { value =>
        HttpEntity(protobufContentType, value.toByteArray)
      }
    }

    Marshaller.oneOf(protobufMarshaller(), jsonMarshaller())
  }
}
The issue I am facing is on the POST route.
(post & entity(as[PropertyEntity])) { propertyEntity =>
  complete {
    saveProperty(propertyEntity)
  }
}
During compilation, I get the following error:
Error:(20, 24) could not find implicit value for parameter um: akka.http.scaladsl.unmarshalling.FromRequestUnmarshaller[PropertyEntity]
(post & entity(as[PropertyEntity])) { propertyEntity =>
I am not sure exactly what I am missing. Do I need to define an implicit FromRequestUnmarshaller? If so, what should it contain?
I was able to hack something together that works for the moment, but I still don't know how to create a general Unmarshaller that can decode any ScalaPB case class:
implicit val um: Unmarshaller[HttpEntity, PropertyEntity] = {
  Unmarshaller.byteStringUnmarshaller.mapWithCharset { (data, charset) =>
    JsonFormat.fromJsonString(data.decodeString(charset.nioCharset.name))(PropertyEntity)
    /*PropertyEntity.parseFrom(CodedInputStream.newInstance(data.asByteBuffer))*/
  }
}
Even with this, I don't know how to have both decoders enabled at the same time, so I have commented one out.
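For what it's worth, one way to make this generic might be to take the companion implicitly, so the compiler can summon it for any generated message type, and to combine the two content-type branches with Unmarshaller.firstOf. This is only a sketch (placed inside the same PBMarshaller trait so it can reuse the content-type vals) and is not verified against a particular akka-http/ScalaPB version:

implicit def pbEntityUnmarshaller[T <: GeneratedMessage with Message[T]](
    implicit companion: GeneratedMessageCompanion[T]): FromEntityUnmarshaller[T] =
  Unmarshaller.firstOf(
    // JSON branch: read the entity as a String and parse it with ScalaPB's JsonFormat
    Unmarshaller.stringUnmarshaller
      .forContentTypes(applicationJsonContentType)
      .map(json => JsonFormat.fromJsonString(json)(companion)),
    // Protobuf branch: parse the raw bytes with the generated companion
    Unmarshaller.byteArrayUnmarshaller
      .forContentTypes(protobufContentType)
      .map(bytes => companion.parseFrom(bytes))
  )

With the companion taken implicitly, entity(as[PropertyEntity]) should be able to resolve the unmarshaller on its own, and firstOf keeps both decoders enabled at once.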

How to query OrientDB asynchronously from Play controller?

I am writing a Play (2.2) controller in Scala, which should return the result of a query against OrientDB. Now, I have succeeded in writing a synchronous version of said controller, but I'd like to re-write it to work asynchronously.
My question is; given the below code (just put together for demonstration purposes), how do I re-write my controller to interact asynchronously with OrientDB (connecting and querying)?
import play.api.mvc.{Action, Controller}
import play.api.libs.json._
import com.orientechnologies.orient.`object`.db.OObjectDatabasePool
import java.util
import com.orientechnologies.orient.core.sql.query.OSQLSynchQuery
import scala.collection.JavaConverters._

object Packages extends Controller {

  def packages() = Action { implicit request =>
    val db = OObjectDatabasePool.global().acquire("http://localhost:2480", "reader", "reader")
    try {
      db.getEntityManager().registerEntityClass(classOf[models.Package])
      val packages = db.query[util.List[models.Package]](new OSQLSynchQuery[models.Package]("select from Package")).asScala.toSeq
      Ok(Json.obj(
        "packages" -> Json.toJson(packages)
      ))
    }
    finally {
      db.close()
    }
  }
}
EDIT:
Specifically, I wish to use OrientDB's asynchronous API. I know that asynchronous queries are supported by the API, though I'm not sure if you can connect asynchronously as well.
Attempted Solution
Based on Jean's answer, I've tried the following asynchronous implementation, but it fails with a compilation error: value execute is not a member of Nothing; possible cause: maybe a semicolon is missing before 'value execute'?
def getPackages(): Future[Seq[models.Package]] = {
  val db = openDb
  try {
    val p = promise[Seq[models.Package]]
    val f = p.future
    db.command(
      new OSQLAsynchQuery[ODocument]("select from Package",
        new OCommandResultListener() {
          var acc = List[ODocument]()

          override def result(iRecord: Any): Boolean = {
            val doc = iRecord.asInstanceOf[ODocument]
            acc = doc :: acc
            true
          }

          override def end() {
            // This is just a dummy
            p.success(Seq[models.Package]())
          }
          // Fails
        })).execute()
    f
  }
  finally {
    db.close()
  }
}
One way could be to start a promise, return the future representing the result of that promise, locally accumulate the results as they come, and complete the promise (thus resolving the future) when OrientDB notifies you that the command has completed.
def executeAsync(osql: String, params: Map[String, String] = Map()): Future[List[ODocument]] = {
  import scala.concurrent._

  val p = promise[List[ODocument]]
  val f = p.future
  val req: OCommandRequest = database.command(
    new OSQLAsynchQuery[ODocument]("select * from animal where name = 'Gipsy'",
      new OCommandResultListener() {
        var acc = List[ODocument]()

        override def result(iRecord: Any): Boolean = {
          val doc = iRecord.asInstanceOf[ODocument]
          acc = doc :: acc
          true
        }

        override def end() {
          p.success(acc)
        }
      }))
  req.execute()
  f
}
Be careful though: to enable graph navigation and field lazy loading, OrientDB objects used to keep an internal reference to the database instance they were loaded from (or to depend on a thread-local connected database instance) for lazily loading elements from the database. Manipulating these objects asynchronously may result in loading errors. I haven't checked the changes since 1.6, but that seemed to be deeply embedded in the design.
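One way to sidestep that risk, sketched below, is to copy the fields you need out of each ODocument while still inside the listener, so the completed future only carries plain Scala values (the row type and field names are hypothetical, not from the original code):

import com.orientechnologies.orient.core.record.impl.ODocument

// Hypothetical row type; use whichever fields your Package class actually has.
case class PackageRow(name: String, version: String)

// Reads the fields eagerly so the result no longer depends on the open connection.
def detach(doc: ODocument): PackageRow =
  PackageRow(doc.field[String]("name"), doc.field[String]("version"))

Calling detach(doc) inside result(...) and accumulating PackageRow values instead of ODocuments keeps the asynchronous consumer independent of the connection lifecycle.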
It's as simple as wrapping the blocking call in a Future.
import play.api.libs.concurrent.Execution.Implicits.defaultContext
import scala.concurrent.Future

object Packages extends Controller {

  def packages = Action.async { implicit request =>
    val db = OObjectDatabasePool.global().acquire("http://localhost:2480", "reader", "reader")
    db.getEntityManager().registerEntityClass(classOf[models.Package])

    val futureResult: Future[Result] = Future(
      db.query[util.List[models.Package]](new OSQLSynchQuery[models.Package]("select from Package")).asScala.toSeq
    ).map(
      queryResult => Ok(Json.obj("packages" -> Json.toJson(queryResult)))
    ).recover {
      // Handle each of the exception cases legitimately
      case e: UnsupportedOperationException => UnsupportedMediaType(e.getMessage)
      case e: MappingException => BadRequest(e.getMessage)
      case e: MyServiceException => ServiceUnavailable(e.toString)
      case e: Throwable => InternalServerError(e.toString + "\n" + e.getStackTraceString)
    }

    futureResult.onComplete { case _ =>
      db.close()
    }

    futureResult
  }
}
Note that I did not compile this code; there is a lot of room to improve it.

How could I know if a database table is exists in ScalaQuery

I'm trying ScalaQuery and it is really amazing. I can define a database table with a Scala class and query it easily.
But I would like to know, in the following code, how I can check whether a table exists, so that I won't call 'Table.ddl.create' twice and get an exception when I run this program twice.
object Users extends Table[(Int, String, String)]("Users") {
  def id = column[Int]("id")
  def first = column[String]("first")
  def last = column[String]("last")
  def * = id ~ first ~ last
}

object Main {
  val database = Database.forURL("jdbc:sqlite:sample.db", driver = "org.sqlite.JDBC")

  def main(args: Array[String]) {
    database withSession {
      // How could I know table Users is already in the DB?
      if ( ??? ) {
        Users.ddl.create
      }
    }
  }
}
ScalaQuery version 0.9.4 includes a number of helpful SQL metadata wrapper classes in the org.scalaquery.meta package, such as MTable:
http://scalaquery.org/doc/api/scalaquery-0.9.4/#org.scalaquery.meta.MTable
In the test code for ScalaQuery, we can see examples of these classes being used. In particular, see org.scalaquery.test.MetaTest.
I wrote this little function to give me a map of all the known tables, keyed by table name.
import org.scalaquery.meta.MTable

def makeTableMap(dbsess: Session): Map[String, MTable] = {
  val tableList = MTable.getTables.list()(dbsess)
  val tableMap = tableList.map { t => (t.name.name, t) }.toMap
  tableMap
}
So now, before I create an SQL table, I can check "if (!tableMap.contains(tableName))".
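Applied to the Users table from the question, the check might look like this (a sketch reusing makeTableMap from above; not tested against ScalaQuery 0.9.4):

database withSession { session: Session =>
  if (!makeTableMap(session).contains("Users")) {
    Users.ddl.create(session)
  }
}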
This thread is a bit old, but maybe someone will find this useful. All my DAOs include this:
def create = db withSession {
  if (!MTable.getTables.list.exists(_.name.name == MyTable.tableName))
    MyTable.ddl.create
}
Here's a full solution for Play Framework that checks on application start, using a PostgreSQL DB:
import globals.DBGlobal
import models.UsersTable
import org.scalaquery.meta.MTable
import org.scalaquery.session.Session
import play.api.GlobalSettings
import play.api.Application

object Global extends GlobalSettings {

  override def onStart(app: Application) {
    DBGlobal.db.withSession { session: Session =>
      import org.scalaquery.session.Database.threadLocalSession
      import org.scalaquery.ql.extended.PostgresDriver.Implicit._
      if (!makeTableMap(session).contains("tableName")) {
        UsersTable.ddl.create(session)
      }
    }
  }

  def makeTableMap(dbsess: Session): Map[String, MTable] = {
    val tableList = MTable.getTables.list()(dbsess)
    val tableMap = tableList.map {
      t => (t.name.name, t)
    }.toMap
    tableMap
  }
}
You can also use java.sql.DatabaseMetaData (an interface). Depending on your database, more or fewer of its functions might be implemented.
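A minimal plain-JDBC sketch of that idea (the connection URL and table name are taken from the question's SQLite example; error handling is omitted):

import java.sql.DriverManager

val conn = DriverManager.getConnection("jdbc:sqlite:sample.db")
try {
  // getTables(catalog, schemaPattern, tableNamePattern, types)
  val rs = conn.getMetaData.getTables(null, null, "Users", null)
  val usersExists = rs.next()
  println(s"Users table exists: $usersExists")
} finally {
  conn.close()
}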
See also the related discussion here. I personally prefer hezamu's suggestion and extend it as follows to keep it DRY:
def createIfNotExists(tables: TableQuery[_ <: Table[_]]*)(implicit session: Session) {
  tables foreach { table =>
    if (MTable.getTables(table.baseTableRow.tableName).list.isEmpty) table.ddl.create
  }
}
Then you can just create your tables with the implicit session:
db withSession { implicit session =>
  createIfNotExists(table1, table2, ..., tablen)
}
You can define in your DAO implementation the following method (taken from "Slick MTable.getTables always fails with Unexpected exception[JdbcSQLException: Invalid value 7 for parameter columnIndex [90008-60]]") that returns true or false depending on whether a table is defined in your DB:
def checkTable(): Boolean = {
  val action = MTable.getTables
  val future = db.run(action)
  val retVal = future map { result =>
    result map { x => x }
  }
  val x = Await.result(retVal, Duration.Inf)
  if (x.length > 0) {
    true
  } else {
    false
  }
}
Or, you can check whether some "GIVENTABLENAME" exists with a println-based method:
def printTable() = {
  val q = db.run(MTable.getTables)
  println(Await.result(q, Duration.Inf).toList(0))  // prints the first MTable element
  println(Await.result(q, Duration.Inf).toList(1))  // prints the second MTable element
  println(Await.result(q, Duration.Inf).toList.toString.contains("MTable(MQName(public.GIVENTABLENAME_pkey),INDEX,null,None,None,None)"))
}
Don't forget to add
import slick.jdbc.meta._
Then call the methods from anywhere with the usual @Inject(). This was done using Play 2.4 and play-slick 1.0.0.
Cheers,