How to print contents of a BSONDocument - scala

I am trying to implement CRUD operations using ReactiveMongo, and here is my find function from a tutorial online.
def findTicker(ticker: String) = {
  val query = BSONDocument("firstName" -> ticker)
  val future = collection.find(query).one
  future.onComplete {
    case Failure(e) => throw e
    case Success(result) =>
      println(result)
  }
}
However, I am getting this result:
Some(BSONDocument(<non-empty>))
How can I see the actual, readable JSON data I am looking for:
{ "_id" : ObjectId("569914557b85c62b49634c1d"), "firstName" : "Stephane", "lastName" : "Godbillon", "age" : 29 }

You can do this without the Play framework module. There is a pretty-printing function specially for this:
result match {
  case Some(document) => println(BSONDocument.pretty(document))
  case None           => println("No document")
}

With Play-ReactiveMongo
So you have a few options. It looks like you're using the Play framework, and I assume the Play-ReactiveMongo plugin. If that's the case, check out this question. It's a bit different, but I think you can re-use the ideas from the submitted answer.
import play.modules.reactivemongo.json.BSONFormats._
and then in your success case
case Success(result) => {
  result.map { data =>
    Json.toJson(data)
  }
}
There are other options to convert BSONDocuments to JSON but Play-ReactiveMongo makes things easier.
Without the Play-ReactiveMongo plugin you will need to tell ReactiveMongo how to write and read your data. To do this, ReactiveMongo uses BSONDocumentReaders and BSONDocumentWriters. They do provide a macro to generate these for most classes; this link has more info.
import reactivemongo.bson._

// let's say your domain/case class is called Person
implicit val personHandler: BSONHandler[BSONDocument, Person] = Macros.handler[Person]
A BSONHandler gathers both the BSONReader and BSONWriter traits, and you can place this implicit in Person's companion object.
ReactiveMongo's one method is generic in the type of entity it is looking for and takes an implicit reader for your entity.
def one[T](readPreference: ReadPreference)(implicit reader: Reader[T], ec: ExecutionContext): Future[Option[T]]
So in this example it would use the reader generated by the macro above to return a Future[Option[Person]] instead of a Future[Option[BSONDocument]]. Then you can use Play JSON to write your domain objects as JSON.
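For example, a minimal sketch (assuming the handler above is in scope together with an implicit ExecutionContext); asking for Person instead of BSONDocument is what makes one pick up the macro-generated reader:

val query = BSONDocument("firstName" -> "Stephane")

// Future[Option[Person]] thanks to the implicit BSONDocumentReader[Person]
val futurePerson: Future[Option[Person]] = collection.find(query).one[Person]

futurePerson.foreach {
  case Some(person) => println(person) // the case class, not raw BSON
  case None         => println("No matching document")
}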
For full disclosure, you can write your own custom writers and readers rather than use the macro, and these end up being similar to writing Play JSON writers and readers.
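A hand-rolled version, sketched against the pre-1.0 reactivemongo.bson API and assuming a Person(firstName: String, lastName: String, age: Int) case class like the document in the question:

import reactivemongo.bson._

implicit object PersonReader extends BSONDocumentReader[Person] {
  def read(doc: BSONDocument): Person = Person(
    doc.getAs[String]("firstName").getOrElse(""),
    doc.getAs[String]("lastName").getOrElse(""),
    doc.getAs[Int]("age").getOrElse(0)
  )
}

implicit object PersonWriter extends BSONDocumentWriter[Person] {
  def write(person: Person): BSONDocument = BSONDocument(
    "firstName" -> person.firstName,
    "lastName" -> person.lastName,
    "age" -> person.age
  )
}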

THIS ANSWER IS BASED ON @Barry's PREVIOUS ANSWER BEFORE THE EDITS:
I got it to work using the updated play2-reactivemongo version:
"org.reactivemongo" %% "play2-reactivemongo" % "0.11.9",
Now,
result.map { data =>
  println(Json.toJson(data))
}
returns what I want:
{"_id":0,"name":"MongoDB","type":"database","count":1,"info":{"x":203,"y":102}}

Related

How to return successfully parsed rows that are converted into my case class

I have a file where each row is a JSON array.
I am reading each line of the file, converting the row into a JSON array, and then converting each element into a case class using spray-json.
I have this so far:
for (line <- source.getLines().take(10)) {
  val jsonArr = line.parseJson.convertTo[JsArray]
  for (ele <- jsonArr.elements) {
    val tryUser = Try(ele.convertTo[User])
  }
}
How could I convert this entire process into a single line statement?
val users: Seq[User] = source.getLines.take(10).map(line => line.parseJson.convertTo[JsonArray].elements.map(ele => Try(ele.convertTo[User])
The error is:
found : Iterator[Nothing]
Note: I used Scala 2.13.6 for all my examples.
There is a lot to unpack in these few lines of code. First of all, I'll share some code that we can use to generate some meaningful input to play around with.
object User {
  import scala.util.Random

  private def randomId: Int = Random.nextInt(900000) + 100000

  private def randomName: String = Iterator
    .continually(Random.nextInt(26) + 'a')
    .map(_.toChar)
    .take(6)
    .mkString

  def randomJson(): String = s"""{"id":$randomId,"name":"$randomName"}"""

  def randomJsonArray(size: Int): String =
    Iterator.continually(randomJson()).take(size).mkString("[", ",", "]")
}

final case class User(id: Int, name: String)

import scala.util.{Try, Success, Failure}
import spray.json._
import DefaultJsonProtocol._

implicit val userFormat: RootJsonFormat[User] = jsonFormat2(User.apply)
This is just some scaffolding to define some User domain object and come up with a way to generate a JSON representation of an array of such objects so that we can then use a JSON library (spray-json in this case) to parse it back into what we want.
Now, going back to your question. This is a possible way to massage your data into its parsed representation. It may not fit 100% of what you are trying to do, but there's some nuance in the data types involved and how they work:
val parsedUsers: Iterator[Try[User]] =
  for {
    line <- Iterator.continually(User.randomJsonArray(4)).take(10)
    element <- line.parseJson.convertTo[JsArray].elements
  } yield Try(element.convertTo[User])
First difference: notice that I use the for comprehension in a form in which the "outcome" of an iteration is not a side effect (for (something) { do something }) but an actual value (for (something) yield { return a value }).
Second difference: I explicitly asked for an Iterator[Try[User]] rather than a Seq[User]. We could go very deep down a rabbit hole on the topic of why the types are what they are here, but the simple explanation is that a for ... yield expression:
returns the same type as the first generator -- if you start with val ns: Iterator[Int]; for (n <- ns) ... you'll get an iterator at the end
if you nest generators, they need to be of the same type as the "outermost" one
You can read more on for comprehensions on the Tour of Scala and the Scala Book.
One possible way of consuming this is the following:
for (user <- parsedUsers) {
  user match {
    case Success(user) => println(s"parsed object $user")
    case Failure(error) => println(s"ERROR: '${error.getMessage}'")
  }
}
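If what you ultimately want is the Seq[User] from your original question, you can also keep only the successful parses. Keep in mind that parsedUsers is an Iterator and can only be traversed once, so build it again rather than reusing the one already consumed by the loop above:

// Keep the successfully parsed users and drop the failures
val users: Seq[User] = parsedUsers.collect { case Success(user) => user }.toSeq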
As for how to turn this into a "one liner", for comprehensions are syntactic sugar applied by the compiler, which turns every nested generator into a flatMap and the final one into a map, as in the following example (which yields a result equivalent to the for comprehension above and is very close to what the compiler does automatically):
val parsedUsers: Iterator[Try[User]] = Iterator
  .continually(User.randomJsonArray(4))
  .take(10)
  .flatMap(line =>
    line.parseJson
      .convertTo[JsArray]
      .elements
      .map(element => Try(element.convertTo[User]))
  )
One note that I would like to add is that you should be mindful of readability. Some teams prefer for comprehensions, others manually rolling out their own flatMap/map chains. Coder's discretion is advised.
You can play around with this code here on Scastie (and here is the version with the flatMap/map calls).

Scala, couchbase - convert AsyncN1qlQueryResult into custom object

I have a case class with simple data:
case class MyClass(
  details: Details,
  names: List[String],
  id: String
)
I have created a Couchbase query which should retrieve all documents from the database:
val query = s"SELECT * from `docs`"
for {
  docs <- bucket
    .query(N1qlQuery.simple(query))
    .flatMap((rows: AsyncN1qlQueryResult) => rows.rows())
    .toList
    .parse[F]
    .map(_.asScala.toList)
} yield docs
parse[F] is a simple function to convert from Observable. The problem here is that I get a type mismatch error which says it found List[AsyncN1qlQueryResult] instead of the required List[MyClass]. How should I convert AsyncN1qlQueryResult into MyClass objects?
I'm using Circe to parse documents.
I'm happy to report that there is now an early release of the native Couchbase Scala SDK available, which does include support for converting each row result of a N1QL query directly into your case class:
case class Address(line1: String)
case class User(name: String, age: Int, addresses: Seq[Address])

object User {
  // Define a Codec so the SDK knows how to convert User to/from JSON
  implicit val codec: Codec[User] = Codecs.codec[User]
}

val statement = """select * from `users`;"""

val rows: Try[Seq[User]] = cluster.query(statement)
  .map(result => result.rows.flatMap(row => row.contentAs[User].toOption))

rows match {
  case Success(rows: Seq[User]) =>
    rows.foreach(row => println(row))
  case Failure(err) =>
    println(s"Error: $err")
}
This is the blocking API. There are also APIs that let you get the results as Futures, or as Flux/Mono from reactive programming, so you have a lot of flexibility in how you get the data.
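If you need a Future right away without exploring the dedicated async API, one simple stop-gap (a sketch reusing cluster, statement, and the User codec from above; the choice of execution context is up to you) is to wrap the blocking call yourself:

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

// Run the blocking query on the execution context, lift the Try into the Future,
// then map the rows exactly as in the blocking example above
val futureUsers: Future[Seq[User]] =
  Future(cluster.query(statement))
    .flatMap(Future.fromTry)
    .map(result => result.rows.flatMap(row => row.contentAs[User].toOption))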
You can see how to get started here: https://docs.couchbase.com/scala-sdk/1.0alpha/hello-world/start-using-sdk.html
Please note that this is an alpha release to let the community get an idea where we're heading with it and give them an opportunity to provide feedback. It shouldn't be used in production. The forums (https://forums.couchbase.com/) are the best place to raise any feedback you have.

Await.result on HttpService

I have a Scala project with http4s 0.15.16a and Slick 3.2.1 with these steps:
Receive an ID via a REST call
Pass the ID to MySlickDAO, which responds with a Future
Call Await.result(res, Duration.Inf) on the Future returned by MySlickDAO
Create the JSON
The problem is that I use Await.result, and this is bad practice.
Is there a better solution?
Here the code:
val service = HttpService {
  // http://localhost:8080/rest/id/9008E75A-F112-396B-E050-A8C08D26075F
  case GET -> Root / "rest" / "id" / id =>
    val res = MySlickDAO.load(id)
    Await.result(res, Duration.Inf)
    val ll = res.value.get.get
    ll match {
      case Failure(x) =>
        InternalServerError(x)
      case Success(record) =>
        val r = record.map(x => MyEntity(x._1, x._2, x._3))
        jsonOK(r.asJson)
    }
  case ....
}
Instead of awaiting, you can chain the result of one Future into another:
val resFut = MySlickDAO.load(id)
resFut.map { record =>
  val r = record.map(x => MyEntity(x._1, x._2, x._3))
  jsonOK(r.asJson)
} recover { case x =>
  InternalServerError(x)
}
The result of this will be a Future of a common supertype of jsonOK and InternalServerError (I'm not familiar with the libraries you're using, so I may have the type of load wrong: it's not a Future[Try[_]], is it?).
BTW: your original code has a very problematic line:
val ll = res.value.get.get
res.value is an Option[Try[T]]. Calling get on an Option or a Try is generally a bad idea (even though in this case because of the Await, the Option should never be None, so the get is technically safe) because it can throw an exception. You're much better off using map, flatMap, and friends.
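For illustration only, here is the same block written without any .get calls, a sketch that keeps the original branches:

res.value match {
  case Some(Success(record)) =>
    val r = record.map(x => MyEntity(x._1, x._2, x._3))
    jsonOK(r.asJson)
  case Some(Failure(x)) =>
    InternalServerError(x)
  case None =>
    // Should not happen after Await.result, but handle it anyway
    InternalServerError("result was not ready")
}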
The issue is that http4s 0.15 uses the Scalaz concurrency constructs, while Slick uses the native Scala ones, and the two aren't designed to work with each other. My understanding is that http4s 0.17+ has switched from Scalaz to Cats, which might entail using native Scala Futures, so if you can upgrade that might be worth a shot. If not, you can handle the conversion by manually creating a Task that wraps your future:
def scalaFutureRes = MySlickDAO.load(id)

val scalazTaskRes = Task.async { register =>
  scalaFutureRes.onComplete {
    case Success(success) => register(success.right)
    case Failure(ex) => register(ex.left)
  }
}
At this point you've got a Task[ResultType] from the Future[ResultType] which you can map/flatMap with the rest of your logic like in Levi's answer.
You can also use the delorean library for this, which has this logic (and the opposite direction) defined on the classes in question via implicit conversions, so that you can just call .toTask on a Future to get it in a compatible form. Their readme also has a lot of useful information on the conversion and the pitfalls involved.
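With delorean on the classpath, the hand-rolled Task.async wrapper above collapses to something like this (a sketch; ResultType stands in for whatever MySlickDAO.load actually returns, and the import is an assumption about the library's package name):

import delorean._

// .toTask comes from delorean's implicit conversion on scala.concurrent.Future
val scalazTaskRes: Task[ResultType] = MySlickDAO.load(id).toTask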

Conversion from bson.Document to JsObject and applying reads on it with joda.DateTime

I am currently integrating part of our system with MongoDB and we decided to use the official Scala driver for it.
We have a case class with joda.DateTime parameters:
case class Schema(templateId: Muid,
                  createdDate: DateTime,
                  updatedDate: DateTime,
                  columns: Seq[Column])
We also defined a format for it:
implicit lazy val checklistSchemaFormat: Format[Schema] = (
  (__ \ "templateId").format[Muid] and
  (__ \ "createdDate").format[DateTime] and
  (__ \ "updatedDate").format[DateTime] and
  (__ \ "columns").format[Seq[Column]]
)((Schema.apply _), unlift(Schema.unapply))
When I serialize this object to JSON and write it to Mongo, createdDate and updatedDate get converted to Long (which is technically fine). This is how we do it:
val bsonDoc = Document(Json.toJson(schema).toString())

collection(DbViewSchemasCollectionName)
  .replaceOne(filter, bsonDoc, new UpdateOptions().upsert(true))
  .subscribe(new Observer[UpdateResult] {
    override def onNext(result: UpdateResult): Unit = logger.info(s"Successfully updates checklist template schema with result: $result")
    override def onError(e: Throwable): Unit = logger.info(s"Failed to update checklist template schema with error: $e")
    override def onComplete(): Unit = {}
  })
As a result, Mongo has this type of object:
{
  "_id": ObjectId("56fc4247eb3c740d31b04f05"),
  "templateId": "gaQP3JIB3ppJtro9rO9BAw",
  "createdDate": NumberLong(1459372615507),
  "updatedDate": NumberLong(1459372615507),
  "columns": [
    ...
  ]
}
Now, I am trying to read it like so:
collection(DbViewSchemasCollectionName).find(filter).first().head() map { document =>
  ChecklistSchema.checklistSchemaFormat reads Json.parse(document.toJson()) match {
    case JsSuccess(value, _) =>
      Some(value)
    case JsError(e) =>
      throw JsResultException(e)
  }
} recover { case e =>
  logger.info("ERROR " + e)
  None
}
And at this point the reads always fail, since createdDate and updatedDate now look like this:
"createdDate" : { "$numberLong" : "1459372615507" }, "updatedDate" :
{"$numberLong" : "1459372615507" }
How do I deal with this situation? Is there any easier conversion between bson.Document and JsObject? Or am I digging in completely the wrong direction...
Thanks,
You can use the following approach to resolve your issue.
Firstly, I used json4s for reading and writing JSON to case classes.
example:
// Assumes json4s-native, json4s-ext (for Joda DateTime) and the Mongo Scala driver are on the classpath
import org.joda.time.DateTime
import org.json4s.{DefaultFormats, Formats}
import org.json4s.ext.JodaTimeSerializers
import org.json4s.native.Serialization.{read, write}
import org.mongodb.scala.{Completed, Document}
import org.mongodb.scala.model.Filters.equal

// json4s needs an implicit Formats in scope, with Joda support for the DateTime fields
implicit val formats: Formats = DefaultFormats ++ JodaTimeSerializers.all

case class User(_id: Option[Int], username: String, firstName: String, createdDate: DateTime, updatedDate: DateTime)

// A small wrapper to convert a case class to a Document
def toBson[A <: AnyRef](x: A): Document = Document(write[A](x))
def today() = DateTime.now

val user = User(Some(212), "binkabir", "lacmalndl", today, today)
val bson = toBson(user)
usersCollection.insertOne(bson).subscribe((x: Completed) => println(x))

val f = usersCollection.find(equal("_id", 212)).toFuture()
f.map(_.head.toJson).map(x => read[User](x)).foreach(println)
The code above will create a User case class, convert it to a Document, save it to MongoDB, query the DB, and print the returned User case class.
I hope this makes sense!
To answer your second question (bson.Document <-> JsObject) - YES; this is a solved problem, check out Play2-ReactiveMongo, which makes it seem like you're storing/retrieving JsObject instances - plus it's fully asynchronous and super-easy to get going in a Play application.
You can even go a step further and use a library like Mondrian (full disclosure: I wrote it!) to get the basic CRUD operations on top of ReactiveMongo for your Play-JSON domain case-classes.
Obviously I'm biased, but I think these solutions are a great fit if you've already defined your models as case classes in Play - you can forget about the whole BSONDocument family and stick to Json._ and JsObject etc that you already know well.
EDIT:
At the risk of further downvotes, I will demonstrate how I would go about storing and retrieving the OP's Schema object, using Mondrian. I'm going to show pretty-much everything for completeness; bear in mind you've actually already done most of this. Your final code will have fewer lines than your current code, as you'd expect when you use a library.
Model Objects
There's a Column class here that's never defined; for simplicity let's just say it's:
case class Column(name: String, width: Int)
Now we can get on with the Schema, which is just:
import com.themillhousegroup.mondrian._

case class Schema(_id: Option[MongoId],
                  createdDate: DateTime,
                  updatedDate: DateTime,
                  columns: Seq[Column]) extends MongoEntity
So far, we've just implemented the MongoEntity trait, which just required the templateId field to be renamed and given the required type.
JSON Converters
import com.themillhousegroup.mondrian._
import play.api.libs.json._
import play.api.libs.functional.syntax._

object SchemaJson extends MongoJson {
  implicit lazy val columnFormat = Json.format[Column]

  // Pick one - easy:
  implicit lazy val schemaFormat = Json.format[Schema]

  // Pick one - backwards-compatible (uses "templateId"):
  implicit lazy val checklistSchemaFormat: Format[Schema] = (
    (__ \ "templateId").formatNullable[MongoId] and
    (__ \ "createdDate").format[DateTime] and
    (__ \ "updatedDate").format[DateTime] and
    (__ \ "columns").format[Seq[Column]]
  )((Schema.apply _), unlift(Schema.unapply))
}
The JSON converters are standard Play-JSON stuff; we pick up the MongoId Format by extending MongoJson. I've shown two different ways of defining the Format for Schema. If you have clients out in the wild using templateId (or if you prefer it) then use the second, more verbose declaration.
Service layer
For brevity I'll skip the application-configuration, you can read the Mondrian README.md for that. Let's define the SchemaService that is responsible for persistence operations on Schema instances:
import com.themillhousegroup.mondrian._
import SchemaJson._
class SchemaService extends TypedMongoService[Schema]("schemas")
That's it. We've linked the model object, the name of the MongoDB collection ("schemas") and (implicitly) the necessary converters.
Saving a Schema and finding a Schema based on some criteria
Now we start to realize the value of Mondrian. save and findOne are standard operations - we get them for free in our Service, which we inject into our controllers in the standard way:
class SchemaController @Inject() (schemaService: SchemaService) extends Controller {
  ...
  // Returns a Future[Boolean]
  schemaService.save(mySchema).map { saveOk =>
    ...
  }
  ...
  ...
  // Define a filter criteria using standard Play-JSON:
  val targetDate = new DateTime()
  val criteria = Json.obj("createdDate" -> Json.obj("$gt" -> targetDate.getMillis))

  // Returns a Future[Option[Schema]]
  schemaService.findOne(criteria).map { maybeFoundSchema =>
    ...
  }
}
So there we go. No sign of the BSON family, just the Play JSON that, as you say, we all know and love. You'll only need to reach for the Mongo documentation when you need to construct a JSON query (that $gt stuff) although in some cases you can use Mondrian's overloaded findOne(example:Schema) method if you are just looking for a simple object match, and avoid even that :-)

Scalding TypedPipe API External Operations pattern

I have a copy of Programming MapReduce with Scalding by Antonios Chalkiopoulos. In the book he discusses the External Operations design pattern for Scalding code. You can see an example on his website here. I have chosen to use the Type Safe API. Naturally, this introduces new challenges, but I prefer it over the Fields API, which is what is heavily discussed in the book I have previously mentioned and on the site.
I am wondering how people have implemented the external operations pattern with the Type Safe API. My initial implementation is as follows:
I create a class that extends com.twitter.scalding.Job which will serve as my Scalding job class where I will 'manage arguments, define taps, and use external operations to construct data processing pipelines'.
I create an object where I define my functions to be used in the Type Safe pipes. Because the Type Safe pipes take functions as arguments, I can then just pass the functions in the object as arguments to the pipes.
This creates code that looks like this:
class MyJob(args: Args) extends Job(args) {
  import MyOperations._

  val input_path = args(MyJob.inputArgPath)
  val output_path = args(MyJob.outputArgPath)

  val eventInput: TypedPipe[(LongWritable, Text)] = this.mode match {
    case m: HadoopMode => TypedPipe.from(WritableSequenceFile[LongWritable, Text](input_path))
    case _ => TypedPipe.from(WritableSequenceFile[LongWritable, Text](input_path))
  }

  val eventOutput: FixedPathSource with TypedSink[(LongWritable, Text)] with TypedSource[(LongWritable, Text)] = this.mode match {
    case m: HadoopMode => WritableSequenceFile[LongWritable, Text](output_path)
    case _ => TypedTsv[(LongWritable, Text)](output_path)
  }

  val validatedEvents: TypedPipe[(LongWritable, Either[Text, Event])] = eventInput.map(convertTextToEither).fork
  validatedEvents.filter(isEvent).map(removeEitherWrapper).write(eventOutput)
}

object MyOperations {
  def convertTextToEither(v: (LongWritable, Text)): (LongWritable, Either[Text, Event]) = {
    ...
  }
  def isEvent(v: (LongWritable, Either[Text, Event])): Boolean = {
    ...
  }
  def removeEitherWrapper(v: (LongWritable, Either[Text, Event])): (LongWritable, Text) = {
    ...
  }
}
As you can see, the functions that are passed to the Scalding Type Safe operations are kept separate from the job itself. While this is not as 'clean' as the external operations pattern presented, this is a quick way to write this kind of code. Additionally, I can use JUnitRunner for doing job level integration tests and ScalaTest for function level unit tests.
The main point of this post though is to ask how people are doing this sort of thing? The documentation around the internet for Scalding Type Safe API is sparse. Are there more functional Scala friendly ways for doing this? Am I missing a key component here for the design pattern? I sort of feel nervous about this because with the Fields API you can write unit tests on pipes with ScaldingTest. As far as I know, you can't do that with TypedPipes. Please let me know if there is a generally agreed upon pattern for Scalding Type Safe API or how you create reusable, modular, and testable Type Safe API code. Thanks for the help!
Update 2 after Antonios' reply
Thank you for the reply. That was basically the answer I was looking for. I wanted to continue the conversation. The main issue I see in your answer, as I commented, is that this implementation expects a specific type, but what if the types change throughout your job? I have explored this code and it seems to work, but it feels hacked on.
def self: TypedPipe[Any]

def testingPipe: TypedPipe[(LongWritable, Text)] = self.map(
  (firstVar: Any) => {
    val tester = firstVar.asInstanceOf[(LongWritable, Text)]
    (tester._1, tester._2)
  }
)
The upside to this is I declare one implementation of self but the downside is this ugly type casting. Additionally, I have not tested this out in depth with a more complex pipeline. So basically, what are your thoughts on how to handle types as they change with only one self implementation for cleanliness/brevity?
Scala extension methods are implemented using implicit classes.
You add to the compiler the capability of converting a TypedPipe into a (wrapper) class that contains your external operations:
import com.twitter.scalding.TypedPipe
import com.twitter.scalding._
import cascading.flow.FlowDef

class MyJob(args: Args) extends Job(args) {

  implicit class MyOperationsWrapper(val self: TypedPipe[Double]) extends MyOperations with Serializable

  val pipe = TypedPipe.from(TypedTsv[Double](args("input")))

  val result = pipe
    .operation1
    .operation2(x => x * 2)
    .write(TypedTsv[Double](args("output")))
}

trait MyOperations {
  def self: TypedPipe[Double]

  def operation1(implicit fd: FlowDef): TypedPipe[Double] =
    self.map { x =>
      println(s"Input: $x")
      x / 100
    }

  def operation2(datafn: Double => Double)(implicit fd: FlowDef): TypedPipe[Double] =
    self.map { x =>
      val result = datafn(x)
      println(s"Result: $result")
      result
    }
}

import org.apache.hadoop.util.ToolRunner
import org.apache.hadoop.conf.Configuration

object MyRunner extends App {
  ToolRunner.run(new Configuration(), new Tool, (classOf[MyJob].getName :: "--local" ::
    "--input" :: "doubles.tsv" ::
    "--output" :: "result.tsv" :: args.toList).toArray)
}
Regarding how to manage types across the pipes, my recommendation would be to try to work out some basic types that make sense and use case classes. To use your example, I would rename the method convertTextToEither into extractEvents:
case class LogInput(l: Long, text: Text)
case class Event(data: String)

def extractEvents(line: LogInput): TypedPipe[Event] =
  self.filter(isEvent(line))
      .map(getEvent(line.text))
Then you would have (a rough sketch follows below):
LogInputOperations for LogInput types
EventOperations for Event types
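For illustration, here is one way those two wrappers could look, following the same implicit-class pattern as MyOperationsWrapper above (isEvent and getEvent are the placeholder helpers from the snippet above, and countByData is just an invented example operation):

trait LogInputOperations {
  def self: TypedPipe[LogInput]

  // Parse only the lines that actually contain events
  def extractEvents(implicit fd: FlowDef): TypedPipe[Event] =
    self.filter(line => isEvent(line))
        .map(line => getEvent(line.text))
}

trait EventOperations {
  def self: TypedPipe[Event]

  // Invented example operation: count events per data value
  def countByData(implicit fd: FlowDef): TypedPipe[(String, Long)] =
    self.groupBy(_.data).size.toTypedPipe
}

class MyEventJob(args: Args) extends Job(args) {
  implicit class LogInputWrapper(val self: TypedPipe[LogInput]) extends LogInputOperations with Serializable
  implicit class EventWrapper(val self: TypedPipe[Event]) extends EventOperations with Serializable

  // A pipe of LogInput now picks up extractEvents, and its result picks up countByData:
  // logLines.extractEvents.countByData
}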
I am not sure what problem you see with the snippet you showed, or why you think it is "less clean". It looks fine to me.
As for the question about unit testing jobs using the typed API, take a look at JobTest; it seems to be just what you are looking for.
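For what it's worth, a minimal sketch of what such a test could look like against the MyJob pipeline above, using JobTest with ScalaTest (the file names and expected values are made up for the example):

import com.twitter.scalding._
import org.scalatest.{Matchers, WordSpec}

class MyJobTest extends WordSpec with Matchers {
  "MyJob" should {
    "divide by 100 in operation1 and double in operation2" in {
      JobTest(new MyJob(_))
        .arg("input", "doubles.tsv")
        .arg("output", "result.tsv")
        .source(TypedTsv[Double]("doubles.tsv"), List(100.0, 200.0))
        .typedSink(TypedTsv[Double]("result.tsv")) { output =>
          output.toList shouldBe List(2.0, 4.0)
        }
        .run
        .finish() // older Scalding versions spell this .finish
    }
  }
}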