I am not currently able to run a Raw Command in ReactiveMongo 0.12.5 using the Play JSON Plugin. The documentation (Run a raw command) is not currently accessible but from a cached page in my browser I can see the following:
import scala.concurrent.{ ExecutionContext, Future }
import play.api.libs.json.{ JsObject, Json }
import reactivemongo.play.json._
import reactivemongo.api.commands.Command
def rawResult(db: reactivemongo.api.DefaultDB)(implicit ec: ExecutionContext): Future[JsObject] = {
val commandDoc = Json.obj(
"aggregate" -> "orders", // we aggregate on collection `orders`
"pipeline" -> List(
Json.obj("$match" -> Json.obj("status" -> "A")),
Json.obj(
"$group" -> Json.obj(
"_id" -> "$cust_id",
"total" -> Json.obj("$sum" -> "$amount"))),
Json.obj("$sort" -> Json.obj("total" -> -1))
)
)
val runner = Command.run(JSONSerializationPack) // run is since deprecated
runner.apply(db, runner.rawCommand(commandDoc)).one[JsObject] // one is since deprecated
}
However I am not looking to return a JsObject (or anything in fact) - I actually want to update all documents in another collection as this previous answer illustrates. My issue is that both methods contain deprecated functions and so I have put together a combination to (possibly) work with JSON Collections (as mentioned):
def bulkUpdateScoreBA(scoreBAs: List[ScoreBA]) = {
def singleUpdate(scoreBA: ScoreBA) = Json.obj(
("q" -> Json.obj("_id" ->
Json.obj("$oid" -> scoreBA.idAsString(scoreBA._id))
)),
("u" ->
Json.obj("$set" ->
Json.obj("scoreBA" -> scoreBA.scoreBA)
)
)
)
val commandJson = Json.obj(
"update" -> "rst",
"updates" -> Json.toJson(scoreBAs.map(singleUpdate)), // Json.arr(list) would wrap the whole list as a single nested element
"ordered" -> false,
"writeConcern" -> Json.obj("w" -> "majority", "wtimeout" -> 5000)
)
val runner = Command.CommandWithPackRunner(JSONSerializationPack)
runner.apply(db, runner.rawCommand(commandJson)) // ?? how to get a Future[Unit] here
}
However I need this to return a Future[Unit] so that I can call it from the controller but I cannot find how this is done or even if what I have done so far is the best way. Any help is appreciated!
The Scaladoc for bulk update is available (since 0.12.7), with example in tests.
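Independently of which API ends up executing the command, a `Future[Unit]` can be obtained by discarding the result with `map`. The following is a minimal, library-independent sketch: `runRawCommand` is a hypothetical stand-in for the driver call and is not part of the ReactiveMongo API; only `scala.concurrent` is used.

```scala
import scala.concurrent.{ ExecutionContext, Future }

object FutureUnitSketch {
  // Hypothetical stand-in for the driver call (assumption, not ReactiveMongo API):
  // it is only assumed to return a Future of some result.
  def runRawCommand(command: String)(implicit ec: ExecutionContext): Future[Map[String, Any]] =
    Future(Map("ok" -> 1, "nModified" -> 3))

  // Discard the successful result; a failed Future still propagates its error.
  def bulkUpdate(command: String)(implicit ec: ExecutionContext): Future[Unit] =
    runRawCommand(command).map(_ => ())
}
```

Because `map` only transforms a successful result, the controller can still handle failures with `recover` on the returned `Future[Unit]`.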
I have a UDF in Spark (running on EMR), written in Scala, that parses the device from a user agent using the uap-scala library. It works fine on small sets (5,000 rows), but on larger sets (2M rows) it runs very slowly.
I tried collecting the DataFrame to a list and looping over it on the driver, and that was also very slow, which makes me believe that the UDF runs on the driver and not on the workers.
How can I establish this? Does anyone have another theory?
If that is the case, why does this happen?
This is the udf code:
def calcDevice(userAgent: String): String = {
val userAgentVal = Option(userAgent).getOrElse("")
Parser.get.parse(userAgentVal).device.family
}
val calcDeviceValUDF: UserDefinedFunction = udf(calcDevice _)
usage:
.withColumn("agentDevice", udfDefinitions.calcDeviceValUDF($"userAgent"))
Thanks
Nir
The problem was instantiating the parser (Parser.get) within the UDF itself. The solution is to create the object once, outside the UDF, and use it at row level:
val userAgentAnalyzerUAParser = Parser.get
def calcDevice(userAgent: String): String = {
val userAgentVal = Option(userAgent).getOrElse("")
userAgentAnalyzerUAParser.parse(userAgentVal).device.family
}
val calcDeviceValUDF: UserDefinedFunction = udf(calcDevice _)
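The effect of moving the instantiation out of the function can be illustrated without Spark. In this sketch, `ExpensiveParser` is a hypothetical stand-in for a parser that is costly to build (it is not the uap-scala class); a counter records how many times it is constructed.

```scala
object ParserReuseSketch {
  // Counts how many parser instances have been built.
  var constructions = 0

  // Hypothetical stand-in for an expensive-to-build parser (assumption).
  final class ExpensiveParser {
    constructions += 1
    def family(userAgent: String): String =
      if (userAgent.contains("iPhone")) "iPhone" else "Other"
  }

  // Anti-pattern: a new parser is built for every row.
  def perRow(userAgent: String): String =
    new ExpensiveParser().family(userAgent)

  // Fix: build the parser once and reuse it for every row.
  lazy val shared = new ExpensiveParser
  def reused(userAgent: String): String =
    shared.family(userAgent)
}
```

With N rows, `perRow` pays the construction cost N times, while `reused` pays it once per JVM that loads the object.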
We ran into the same issue where Spark jobs were hanging. One additional thing we did was to use a broadcast variable. This UDF is actually very slow after all the changes so your mileage may vary. One other caveat is that of acquiring the SparkSession; we run in Databricks and if the SparkSession isn't available then it will crash; if you need the job to continue then you have to deal with that failure case.
object UDFs extends Serializable {
val uaParser = SparkSession.getActiveSession.map(_.sparkContext.broadcast(CachingParser.default(100000)))
val parseUserAgent = udf { (userAgent: String) =>
// We will simply return an empty map if uaParser is None because that would mean
// there is no active spark session to broadcast the parser.
//
// Also if you wrap the potentially null value in an Option and use flatMap and map to
// add type safety it becomes slower.
if (userAgent == null || uaParser.isEmpty) {
Map[String, Map[String, String]]()
} else {
val parsed = uaParser.get.value.parse(userAgent)
Map(
"browser" -> Map(
"family" -> parsed.userAgent.family,
"major" -> parsed.userAgent.major.getOrElse(""),
"minor" -> parsed.userAgent.minor.getOrElse(""),
"patch" -> parsed.userAgent.patch.getOrElse("")
),
"os" -> Map(
"family" -> parsed.os.family,
"major" -> parsed.os.major.getOrElse(""),
"minor" -> parsed.os.minor.getOrElse(""),
"patch" -> parsed.os.patch.getOrElse(""),
"patch-minor" -> parsed.os.patchMinor.getOrElse("")
),
"device" -> Map(
"family" -> parsed.device.family,
"brand" -> parsed.device.brand.getOrElse(""),
"model" -> parsed.device.model.getOrElse("")
)
)
}
}
}
You might also want to play with the size of the CachingParser.
Since the implementation of Parser.get.parse is not shown in the question, only the UDF part can be judged.
For performance, you can remove the Option:
def calcDevice(userAgent: String): String = {
val userAgentVal = if(userAgent == null) "" else userAgent
Parser.get.parse(userAgentVal).device.family
}
I am parsing a Json using JSON.parseFull.
Before parsing, Json was like this
{
"response":
{
"status":"ok",
"userTier":"developer",
"total":1,
"content":
{
"id":"technology/2014/feb/18/doge-such-questions-very-answered",
"type":"article",
"sectionId":"technology",
"sectionName":"Technology",
"webPublicationDate":"2014-02-18T10:25:30Z",
"webTitle":"What is Doge?",
"webUrl":"https://www.theguardian.com/technology/2014/feb/18/doge-such-questions-very-answered",
"apiUrl":"https://content.guardianapis.com/technology/2014/feb/18/doge-such-questions-very-answered",
"isHosted":false
}
}
}
After parsing, it becomes this,
Map(response ->
Map(status -> ok,
userTier -> developer,
total -> 1.0,
content ->
Map(webUrl ->
https://www.theguardian.com/technology/2014/feb/18/doge-such-questions-very-answered,
webPublicationDate -> 2014-02-18T10:25:30Z,
webTitle -> What is Doge?,
sectionName -> Technology,
apiUrl -> https://content.guardianapis.com/technology/2014/feb/18/doge-such-questions-very-answered,
id -> technology/2014/feb/18/doge-such-questions-very-answered,
isHosted -> false,
sectionId -> technology,
type -> article
)
)
)
I need to get values such as webUrl and webTitle.
Does anyone know how to achieve that?
Scala's built-in JSON library is a bit brittle, in the sense that it doesn't strongly type your parsed JSON. Therefore, my suggestion would be to add asInstanceOf casts here and there. Example below:
type JsonMap = Map[String, Any]
val maybeParsedMap: Option[JsonMap] = JSON.parseFull(jsonString).map(_.asInstanceOf[JsonMap])
val content: Option[JsonMap] = for {
parsedMap <- maybeParsedMap
response <- parsedMap.get("response")
content <- response.asInstanceOf[JsonMap].get("content")
} yield content.asInstanceOf[JsonMap]
// content is an Option[JsonMap], so go through flatMap rather than casting the Option itself
val webUrl: Option[String] = content.flatMap(_.get("webUrl")).map(_.asInstanceOf[String])
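As an alternative to casting, the nested maps can be walked with pattern matching so that a missing key or an unexpected type yields None instead of an exception. This is a minimal sketch, assuming the parsed value has the Map shape shown in the question; the value is built by hand here rather than by calling JSON.parseFull, and `child` and `stringAt` are hypothetical helper names.

```scala
object JsonNavigationSketch {
  // Hand-built value mimicking the shape of the parsed output in the question.
  val content: Map[String, Any] = Map(
    "webTitle" -> "What is Doge?",
    "webUrl" -> "https://www.theguardian.com/technology/2014/feb/18/doge-such-questions-very-answered")
  val response: Map[String, Any] = Map("status" -> "ok", "total" -> 1.0, "content" -> content)
  val parsed: Option[Any] = Some(Map[String, Any]("response" -> response))

  // Step into a nested object; None if the value is not a map or the key is missing.
  def child(value: Any, key: String): Option[Any] = value match {
    case m: Map[_, _] => m.asInstanceOf[Map[String, Any]].get(key)
    case _            => None
  }

  // Follow a path of keys and keep the result only if it is a String.
  def stringAt(path: String*): Option[String] =
    path.foldLeft(parsed)((acc, key) => acc.flatMap(child(_, key)))
      .collect { case s: String => s }

  val webUrl: Option[String] = stringAt("response", "content", "webUrl")
  val webTitle: Option[String] = stringAt("response", "content", "webTitle")
}
```

The single remaining cast is guarded by the `Map[_, _]` match, so a non-object value at any step simply produces None.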
I get the following list of documents back from MongoDB when I query for "campaignID":"DEMO-1".
[
{
"_id": {
"$oid": "56be0e8b3cf8a2d4f87ddb97"
},
"campaignID": "DEMO-1",
"revision": 1,
"action": [
"kick",
"punch"
],
"transactionID": 20160212095539543
},
{
"_id": {
"$oid": "56c178215886447ea261710f"
},
"transactionID": 20160215000257159,
"campaignID": "DEMO-1",
"revision": 2,
"action": [
"kick"
],
"transactionID": 20160212095539578
}
]
Now, what I am trying to do is, for a given campaignID, find all its versions (revision in my case) and change the action field to the String dead. I read the docs, but the examples there are too simple to be helpful in my case. This is what the docs say:
val selector = BSONDocument("name" -> "Jack")
val modifier = BSONDocument(
"$set" -> BSONDocument(
"lastName" -> "London",
"firstName" -> "Jack"),
"$unset" -> BSONDocument(
"name" -> 1))
// get a future update
val futureUpdate = collection.update(selector, modifier)
I can't just follow the docs, because there it's easy to create a new BSON document and use it as the modifier by hardcoding the exact fields. In my case I need to find the documents first and then modify the action field on the fly because, unlike in the docs, my action field can have different values.
Here's my code so far which obviously does not compile:
def updateDocument(campaignID: String) ={
val timeout = scala.concurrent.duration.Duration(5, "seconds")
val collection = db.collection[BSONCollection](collectionName)
val selector = BSONDocument("action" -> "dead")
val modifier = collection.find(BSONDocument("campaignID" -> campaignID)).cursor[BSONDocument]().collect[List]()
val updatedResults = Await.result(modifier, timeout)
val mod = BSONDocument(
"$set" -> updatedResults(0),
"$unset" -> BSONDocument(
"action" -> <???> ))
val futureUpdate = collection.update(selector, updatedResults(0))
futureUpdate
}
This worked for me as an answer to my own question. Thanks @cchantep for helping me out.
val collection = db.collection[BSONCollection](collectionName)
val selector = BSONDocument("campaignID" -> campaignID)
val mod = BSONDocument("$set" -> BSONDocument("action" -> "dead"))
val futureUpdate = collection.update(selector, mod, multi = true)
If you have a look at the BSON documentation, you can see BSONArray can be used to pass sequence of BSON values.
BSONDocument("action" -> BSONArray("kick", "punch"))
If you have a List[T] as the value, and a BSONWriter[_ <: BSONValue, T] is available for T, then the list can be converted to a BSONArray.
BSONDocument("action" -> List("kick", "punch"))
// as `String` is provided a `BSONWriter`
I have no idea how I should use play-reactivemongo's JSONFindAndModifyCommand.
I need to make an upsert query by some field. So I can first remove any existing entry and then insert. But Google says that FindAndModify command has upsert: Boolean option to achieve the same result.
Suppose I have two play.api.libs.json.JsObjects: query and object.
val q = (k: String) => Json.obj("sha256" -> k)
val obj = (k: String, v: String) => Json.obj(
"sha256" -> k,
"value" -> v
)
Then I do:
db.collection.findAndModify(
q(someSha256),
what?!,
...
)
I use play2-reactivemongo 0.11.9
Thanks!
The simplest approach is to use the collection operations findAndUpdate or findAndRemove, e.g.
val person: Future[BSONDocument] = collection.findAndUpdate(
BSONDocument("name" -> "James"),
BSONDocument("$set" -> BSONDocument("age" -> 17)),
fetchNewObject = true)
// on success, return the update document:
// { "age": 17 }
I have a project set up with playframework 2.2.0 and play2-reactivemongo 0.10.0-SNAPSHOT. I'd like to query for few documents by their ids, in a fashion similar to this:
def usersCollection = db.collection[JSONCollection]("users")
val ids: List[String] = /* fetched from somewhere else */
val query = ??
val users = usersCollection.find(query).cursor[User].collect[List]()
As a query I tried:
Json.obj("_id" -> Json.obj("$in" -> ids)) // 1
Json.obj("_id.$oid" -> Json.obj("$in" -> ids)) // 2
Json.obj("_id" -> Json.obj("$oid" -> Json.obj("$in" -> ids))) // 3
for which first and second return empty lists and the third fails with error assertion 10068 invalid operator: $oid.
NOTE: copy of my response on the ReactiveMongo mailing list.
First, sorry for the delay of my answer, I may have missed your question.
Play-ReactiveMongo cannot guess on its own that the values of a JSON array are ObjectIds. That's why you have to make a JSON object for each id that looks like this: {"$oid": "526fda0f9205b10c00c82e34"}. When the ReactiveMongo Play plugin sees an object whose first field is $oid, it treats it as an ObjectId so that the driver can send the right type for this value (BSONObjectID in this case).
This is a more general problem actually: the JSON format does not match exactly the BSON one. That's the case for numeric types (BSONInteger, BSONLong, BSONDouble), BSONRegex, BSONDateTime, and BSONObjectID. You may find more detailed information in the MongoDB documentation: http://docs.mongodb.org/manual/reference/mongodb-extended-json/ .
I managed to solve it with:
val objectIds = ids.map(id => Json.obj("$oid" -> id))
val query = Json.obj("_id" -> Json.obj("$in" -> objectIds))
usersCollection.find(query).cursor[User].collect[List]()
since play-reactivemongo format considers BSONObjectID only when "$oid" is followed by string
implicit object BSONObjectIDFormat extends PartialFormat[BSONObjectID] {
def partialReads: PartialFunction[JsValue, JsResult[BSONObjectID]] = {
case JsObject(("$oid", JsString(v)) +: Nil) => JsSuccess(BSONObjectID(v))
}
val partialWrites: PartialFunction[BSONValue, JsValue] = {
case oid: BSONObjectID => Json.obj("$oid" -> oid.stringify)
}
}
Still, I hope there is a cleaner solution. If not, I guess it makes it a nice pull request.
I'm wondering whether transforming each id to a BSONObjectID wouldn't be safer, this way:
val ids: List[String] = ???
val bsonObjectIds = ids.map(BSONObjectID.parse(_)).collect{case Success(t) => t}
This will only generate valid BSONObjectIDs (and discard invalid ones).
If you do it this way:
val objectIds = ids.map(id => Json.obj("$oid" -> id))
your objectIds may not be valid, depending on whether each id string really is the stringified form of a BSONObjectID.
If you import play.modules.reactivemongo.json._ it works without any $oid formatters.
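If invalid ids should be reported rather than silently dropped, the strings can be checked before building the query. The sketch below relies only on the fact that an ObjectId is 12 bytes, so its string form is exactly 24 hexadecimal characters; `looksLikeObjectId` and `splitIds` are hypothetical helper names, not part of ReactiveMongo.

```scala
object ObjectIdCheckSketch {
  private val hexDigits = "0123456789abcdefABCDEF"

  // An ObjectId is 12 bytes, i.e. exactly 24 hexadecimal characters as a string.
  def looksLikeObjectId(id: String): Boolean =
    id.length == 24 && id.forall(c => hexDigits.indexOf(c.toInt) >= 0)

  // Split the input into (valid, invalid) so the invalid ids can be logged or rejected.
  def splitIds(ids: List[String]): (List[String], List[String]) =
    ids.partition(looksLikeObjectId)
}
```

For example, `splitIds(List("526fda0f9205b10c00c82e34", "not-an-id"))` keeps the first string as valid and returns the second in the invalid list.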
import play.modules.reactivemongo.json._
...
val ids: Seq[BSONObjectID] = ???
val selector = Json.obj("_id" -> Json.obj("$in" -> ids))
usersCollection.find(selector).cursor[User].collect[Seq]()
I tried with the following and it worked for me:
val listOfItems = BSONArray(51, 61)
val query = BSONDocument("_id" -> BSONDocument("$in" -> listOfItems))
val ruleListFuture = bsonFutureColl.flatMap(_.find(query, Option.empty[BSONDocument]).cursor[ResponseAccDataBean]().
collect[List](-1, Cursor.FailOnError[List[ResponseAccDataBean]]()))