I have a problematic Mongo query. Problematic because it takes 10-12 seconds, which is why I am looking for a different implementation.
The query performs a count of how many documents are in the collection. I'm sure there is a better way to do the count.
Current implementation:
def count(criteria: Option[JsObject], skip: Int, limit: Option[Int]): Future[Long] =
  (for {
    col <- collection
    counted <- col.count(
      selector = criteria,
      limit = limit,
      skip = skip,
      hint = None,
      readConcern = ReadConcern.Majority
    )
  } yield counted) recoverWith {
    case error =>
      logger.error(
        s"failed to count by [${criteria.getOrElse(JsObject.empty)}]" +
          s" with error: [${error.getMessage}]",
        error)
      Future.failed(MongoExceptionBuilder.buildError(error))
  }
I've gone through the documentation and found the aggregateWith method, which can be combined with the Count aggregation stage. I tried to implement it myself but failed.
http://reactivemongo.org/releases/0.1x/documentation/advanced-topics/aggregation.html#count
def count(criteria: Option[JsObject], skip: Int, limit: Option[Int]): Future[Long] =
  (for {
    col <- collection
    counted <- col.aggregateWith[Long]() { framework =>
      import framework.{Count, Group, Match}
      (Match(criteria.getOrElse(JsObject.empty)), List(Count("count")))
    }.head
  } yield counted) recoverWith {
    case error =>
      logger.error(
        s"failed to count by [${criteria.getOrElse(JsObject.empty)}]" +
          s" with error: [${error.getMessage}]",
        error)
      Future.failed(MongoExceptionBuilder.buildError(error))
  }
The error I see:
lt-dispatcher-4 c.i.d.c.c.m.PlayerProfileDAOapplyOrElse(line:95) failed to count by [{}] with error: [JsResultException(errors:List((,List(JsonValidationError(List(error.expected.jsnumber),WrappedArray())))))]
play.api.libs.json.JsResultException: JsResultException(errors:List((,List(JsonValidationError(List(error.expected.jsnumber),WrappedArray())))))
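For what it's worth, the $count aggregation stage emits a document of the form { "count": <n> } rather than a bare number, which is consistent with the error.expected.jsnumber failure above: the code asks ReactiveMongo to deserialise that document directly into a Long. A minimal sketch of the shape involved, using plain Scala maps as a stand-in for the BSON/JSON document (the readCount helper is hypothetical, not part of ReactiveMongo):

```scala
// The $count stage wraps the result in a document: { "count": <n> }.
// Reading that document as a bare Long fails, much like the
// error.expected.jsnumber above; a reader must unwrap the "count" field.
def readCount(doc: Map[String, Long]): Either[String, Long] =
  doc.get("count").toRight("error.expected.jsnumber: no numeric 'count' field")

val aggregationOutput = Map("count" -> 42L) // shape produced by $count
val counted = readCount(aggregationOutput)  // Right(42)
```

So one way forward is to read the aggregation output as a document (or a small case class with a count field) and extract the number, instead of reading a Long directly.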
Related
I am trying to count the number of records; if I get them, return the records, else just log the error.
Following is the snippet:
val records: Any = Try(count()) match {
  case Success(records) => records
  case Failure(exception) => logger.error(s"Exception occurred")
}
The count method has a concrete return type, but because the two branches of the match return different types (the records on success, Unit from the logging call on failure), the match cannot be typed more precisely than Any.
How can I solve this problem?
What matters is the return type of the count method. If you want to just log the exception and continue, you should return a default value from the Failure branch, something like this:
val records: List[Int] = Try(count()) match {
  case Success(records) => records
  case Failure(exception) =>
    log.error("Exception occurred", exception)
    List.empty
}

def count(): List[Int] = List(1, 2, 3)
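When the fallback is just a default value, the same intent can be written more compactly with Try's combinators (a sketch using the same toy count as above; recover maps a Failure to a fallback, leaving a Success untouched):

```scala
import scala.util.Try

def count(): List[Int] = List(1, 2, 3)

// recover turns a Failure into a Success with the fallback value,
// so .get is safe afterwards
val records: List[Int] = Try(count())
  .recover { case e: Exception => List.empty[Int] }
  .get
```

If no logging is needed at all, Try(count()).getOrElse(List.empty) does the same job in one line.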
I am trying to pass a List[String] into a query and then match on possibly multiple nodes by a property, where the value of the property is the string passed into the query.
I get an error: Expected parameter(s): list
import org.neo4j.driver.v1._

def getNodesByPropertyValue(list: List[String]): Future[List[(String, String)]] = {
  val getNodes =
    s"""
       | UNWIND $$list AS propValue
       | MATCH (i: item {id: propValue})<-[:CONTAINS]-(c: Collection)
       | RETURN i.originalID AS OID
       |""".stripMargin
  storeAPI.NeoQuery(getNodes).resultList().map { result =>
    result.map { record =>
      record.get("OID").toString
    }
  }.recoverWith {
    case e: Exception =>
      logger.error(s"Failure in getNodesByProperty: ", e)
      throw e
  }
}
Also, when I use $list instead, I get an error saying Neo4J doesn't recognise the function List().
A solution to this would be appreciated.
Also, what is the difference between passing a variable into a query with $ versus $$? I thought $$ might be used for collections, but I am unsure; I haven't found any information on it yet.
Thanks.
In my storeAPI.NeoQuery call I was missing the parameter map that binds the $list placeholder in the query (written $$list in the interpolated string) to the val list outside of it.
Working version below.
import org.neo4j.driver.v1._
import scala.collection.JavaConverters._ // for list.asJava

def getNodesByPropertyValue(list: List[String]): Future[List[(String, String)]] = {
  val getNodes =
    s"""
       | UNWIND $$list AS propValue
       | MATCH (i: item {id: propValue})<-[:CONTAINS]-(c: Collection)
       | RETURN i.originalID AS OID
       |""".stripMargin
  storeAPI.NeoQuery(getNodes, Map("list" -> list.asJava)).resultList().map { result =>
    result.map { record =>
      record.get("OID").toString
    }
  }.recoverWith {
    case e: Exception =>
      logger.error(s"Failure in getNodesByProperty: ", e)
      throw e
  }
}
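On the $ versus $$ question: inside a Scala s-interpolator, $list splices the Scala value's toString into the query text, which is why Cypher complained about a List(...) "function". $$ escapes to a single literal $, leaving a $list Cypher parameter for the driver to bind from the parameter map. A quick self-contained illustration:

```scala
val list = List("a", "b")

// $list is interpolated by Scala, so Cypher receives the text "List(a, b)",
// which is not valid Cypher
val inlined = s"UNWIND $list AS propValue"

// $$list escapes the dollar sign, so Cypher receives the parameter $list,
// which the driver then binds from Map("list" -> list.asJava)
val parameterised = s"UNWIND $$list AS propValue"
```

So $$ has nothing to do with collections; it is just how you write a literal $ inside an s-string.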
I'm writing code in scala/play with anorm/postgres for match generation based on users profiles. The following code works, but I've commented out the section that is causing problems, the while loop. I noticed while running it that the first 3 Futures seem to work synchronously but the problem comes when I'm retrieving the count of rows in the table in the fourth step.
The fourth step returns the count before the above insert's actually happened. As far as I can tell, steps 1-3 are being queued up for postgres synchronously, but the call to retrieve the count seems to return BEFORE the first 3 steps complete, which makes no sense to me. If the first 3 steps get queued up in the correct order, why wouldn't the fourth step wait to return the count until after the inserts happen?
When I uncomment the while loop, the match generation and insert functions are called until memory runs out, as the count returned is continually below the desired threshold.
I know the format itself is subpar, but my question is not about how to write the most elegant scala code, but merely how to get it to work for now.
def matchGeneration(email: String, itNum: Int) = {
  var currentIterationNumber = itNum
  var numberOfMatches = MatchData.numberOfCurrentMatches(email)
  while (numberOfMatches < 150) {
    Thread.sleep(25000) // delay while loop execution time
    generateUsers(email) onComplete { case (s) =>
      print(s">>>>>>>>>>>>>>>>>>>>>>>>>>>STEP 1")
      Thread.sleep(5000) // time for initial user generation to take place
      genDemoMatches(email, currentIterationNumber) onComplete { case (s) =>
        print(s">>>>>>>>>>>>>>>>>>>>>>>>>>>STEP 2")
        genIntMatches(email, currentIterationNumber) onComplete { case (s) =>
          print(s">>>>>>>>>>>>>>>>>>>>>>>>>>>STEP 3")
          genSchoolWorkMatches(email, currentIterationNumber) onComplete { case (s) =>
            Thread.sleep(10000)
            print(s">>>>>>>>>>>>>>>>>>>>>>>>>>>STEP 4")
            incrementNumberOfMatches(email) onComplete { case (s) =>
              currentIterationNumber += 1
              println(s"current number of matches: $numberOfMatches")
              println(s"current Iteration: $currentIterationNumber")
            }
          }
        }
      }
    }
  }
}
The match functions are defined as Futures, such as:
def genSchoolWorkMatches(email: String, currentIterationNumber: Int): Future[Unit] =
  Future(genUsersFromSchoolWorkData(email, currentIterationNumber))
genUsersFromSchoolWorkData follows the same form as the other two. It is a function that first gets all the school/work fields a user has filled out in their profile (SELECT major FROM school_work WHERE email='$email') and then generates a dummy user that has one of those fields in common with the user identified by email. It would take about 30-40 lines of code to print this function, so I can explain it further if need be.
I have edited my code. The only way I found so far to get this to work was by hacking it with Thread.sleep(). I think the problem may lie with anorm: my Future constructs behave as I expected, but the reads are inconsistent with when the writes actually occur. The numberOfCurrentMatches(email: String) function returns the number of matches via a simple SELECT count(email) FROM table WHERE email='$email'. The problem is that sometimes, after inserting 23 matches, the count comes back as 0, and then after a second iteration it returns 46. I assumed that onComplete would fire only after the underlying anorm call (defined with DB.withConnection()) had committed, but apparently it is too far removed to guarantee that. At this point I am not sure what to research next to get around this, other than writing a separate sort of supervisor function that stops at a value closer to 150.
UPDATE
Thanks to the advice of users here, and after trying to understand Scala's documentation at this link: Scala Futures and Promises,
I have updated my code to be a bit more readable and Scala-esque:
def genMatchOfTypes(email: String, iterationNumber: Int) = {
  genDemoMatches(email, iterationNumber)
  genIntMatches(email, iterationNumber)
  genSchoolWorkMatches(email, iterationNumber)
}
def matchGeneration(email: String) = {
  var currentIterationNumber = 0
  var numberOfMatches = MatchData.numberOfCurrentMatches(email)
  while (numberOfMatches < 150) {
    println(s"current number of matches: $numberOfMatches")
    Thread.sleep(30000)
    generateUsers(email)
      .flatMap(users => genMatchOfTypes(email, currentIterationNumber))
      .flatMap(matches => incrementNumberOfMatches(email))
      .map { result =>
        currentIterationNumber += 1
        println(s"current Iteration2: $currentIterationNumber")
        numberOfMatches = MatchData.numberOfCurrentMatches(email)
        println(s"current number of matches2: $numberOfMatches")
      }
  }
}
I am still heavily dependent on the Thread.sleep(30000) to give each pass enough time before the while loop comes around again. It's still an unwieldy hack. When I comment out the Thread.sleep(),
my output in bash looks like this:
users for match generation createdcurrent number of matches: 0
[error] c.MatchDataController - here is the list: jnkj
[error] c.MatchDataController - here is the list: hbhjbjjnkjn
current number of matches: 0
current number of matches: 0
current number of matches: 0
current number of matches: 0
current number of matches: 0
This of course is a truncated output. It runs like this over and over until I get errors about too many open files and the JVM/play server crashes entirely.
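For reference, the minimal change that removes the need for a fixed sleep is to wait for each pass to actually finish before looping again. A sketch, with a hypothetical onePass standing in for the whole generate-and-increment chain (blocking with Await is acceptable in a batch task, though not on Play's request threads):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical stand-in for one full generateUsers -> genMatchOfTypes ->
// incrementNumberOfMatches pass; returns the new total match count.
def onePass(current: Int): Future[Int] = Future(current + 23)

var numberOfMatches = 0
while (numberOfMatches < 150)
  numberOfMatches = Await.result(onePass(numberOfMatches), 30.seconds)
```

This guarantees the count read at the top of each iteration reflects the previous iteration's writes, because the loop body does not continue until the Future has completed.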
One solution is to use Future.traverse for a known iteration count.
Assuming definitions like:
object MatchData {
  def numberOfCurrentMatches(email: String) = ???
}

def generateUsers(email: String): Future[Unit] = ???
def incrementNumberOfMatches(email: String): Future[Int] = ???
def genDemoMatches(email: String, it: Int): Future[Unit] = ???
def genIntMatches(email: String, it: Int): Future[Unit] = ???
def genSchoolWorkMatches(email: String, it: Int): Future[Unit] = ???
You can write code like
def matchGeneration(email: String, itNum: Int) = {
  val numberOfMatches = MatchData.numberOfCurrentMatches(email)
  Future.traverse(Stream.range(itNum, 150 - numberOfMatches + itNum)) { currentIterationNumber =>
    for {
      _ <- generateUsers(email)
      _ = print(s">>>>>>>>>>>>>>>>>>>>>>>>>>>STEP 1")
      _ <- genDemoMatches(email, currentIterationNumber)
      _ = print(s">>>>>>>>>>>>>>>>>>>>>>>>>>>STEP 2")
      _ <- genIntMatches(email, currentIterationNumber)
      _ = print(s">>>>>>>>>>>>>>>>>>>>>>>>>>>STEP 3")
      _ <- genSchoolWorkMatches(email, currentIterationNumber)
      _ = Thread.sleep(15000)
      _ = print(s">>>>>>>>>>>>>>>>>>>>>>>>>>>STEP 4")
      numberOfMatches <- incrementNumberOfMatches(email)
      _ = println(s"current number of matches: $numberOfMatches")
      _ = println(s"current Iteration: $currentIterationNumber")
    } yield ()
  }
}
Update
If you need to check some condition on each pass, one way is to use the monadic tools from the scalaz library. It defines a Monad instance for scala.Future, so we can read monadic as asynchronous whenever we want to.
For example, StreamT.unfoldM can create a conditional monadic (asynchronous) loop; even though we don't need the elements of the resulting collection, we can still use it just for the iteration.
Let's define your
def generateAll(email: String, iterationNumber: Int): Future[Unit] = for {
  _ <- generateUsers(email)
  _ <- genDemoMatches(email, iterationNumber)
  _ <- genIntMatches(email, iterationNumber)
  _ <- genSchoolWorkMatches(email, iterationNumber)
} yield ()
Then the iteration step:
def generateStep(email: String, limit: Int)(iterationNumber: Int): Future[Option[(Unit, Int)]] =
  if (MatchData.numberOfCurrentMatches(email) >= limit) Future(None)
  else for {
    _ <- generateAll(email, iterationNumber)
    _ <- incrementNumberOfMatches(email)
    next = iterationNumber + 1
  } yield Some(((), next))
Now our resulting function simplifies to
import scalaz._
import scalaz.std.scalaFuture._

def matchGeneration(email: String, itNum: Int): Future[Unit] =
  StreamT.unfoldM(itNum)(generateStep(email, 150) _).toStream.map(_.force: Unit)
It looks like the synchronous method MatchData.numberOfCurrentMatches is racing with your asynchronous modification inside incrementNumberOfMatches. Note that in general this can lead to disastrous results, and you should probably move that state inside an actor or something similar.
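The same asynchronous loop can also be written without scalaz, by recursing inside flatMap: each iteration only starts after the previous Future completes, so no sleeps are needed. A sketch with a hypothetical step function standing in for generateAll plus incrementNumberOfMatches (recursion through flatMap does not grow the call stack here, since each recursive call is scheduled on the execution context):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Asynchronous while loop: run `step` until `matches` reaches `limit`.
// Each pass begins only after the previous Future has completed.
def loopUntil(limit: Int)(matches: Int)(step: Int => Future[Int]): Future[Int] =
  if (matches >= limit) Future.successful(matches)
  else step(matches).flatMap(next => loopUntil(limit)(next)(step))

// Hypothetical step: each pass inserts 23 matches and yields the new count.
val done: Future[Int] = loopUntil(150)(0)(m => Future(m + 23))
```

The condition is re-evaluated from the value the previous step produced, so the loop never reads a stale count.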
I'm writing a Scala web application that uses MongoDB as the database and ReactiveMongo as the driver.
I have a collection named recommendation.correlation in which I save the correlation between a product and a category.
A document has the following form:
{ "_id" : ObjectId("544f76ea4b7f7e3f6e2db224"), "category" : "c1", "attribute" : "c3:p1", "value" : { "average" : 0, "weight" : 3 } }
Now I'm writing a method as follows:
def calculateCorrelation: Future[Boolean] = {
  def calculate(category: String, tag: String, similarity: List[Similarity]): Future[(Double, Int)] = {
    println("Calculate correlation of " + category + " " + tag)
    val value = similarity.foldLeft(0.0, 0)((r, c) =>
      if (c.tag1Name.split(":")(0) == category && c.tag2Name == tag) (r._1 + c.eq, r._2 + 1) else r
    ) // fold the tags
    val sum = value._1
    val count = value._2
    val result = if (count > 0) (sum / count, count) else (0.0, 0)
    Future { result }
  }

  play.Logger.debug("Start Correlation")
  Similarity.all.toList flatMap { tagsMatch =>
    val tuples = for {
      i <- tagsMatch
    } yield (i.tag1Name.split(":")(0), i.tag2Name) // create a List[(String, String)] containing the category and product name
    val res = tuples map { el =>
      calculate(el._1, el._2, tagsMatch) flatMap { value =>
        val correlation = Correlation(el._1, el._2, value._1, value._2) // create the correlation
        val query = Json.obj("category" -> value._1, "attribute" -> value._2)
        Correlations.find(query).one flatMap (element => element match {
          case Some(x) => Correlations.update(query, correlation) flatMap (status => status match {
            case LastError(ok, _, _, _, _, _, _) => Future { true }
            case _ => Future { false }
          })
          case None => Correlations.save(correlation) flatMap (status => status match {
            case LastError(ok, _, _, _, _, _, _) => Future { true }
            case _ => Future { false }
          })
        })
      }
    }
    val result = if (res.exists(_ equals false)) false else true
    Future { result }
  }
}
The problem is that the method inserts duplicate documents.
Why does this happen?
I've worked around it using db.recommendation.correlation.ensureIndex({"category": 1, "attribute": 1}, {"unique": true, "dropDups": true}), but how can I fix the problem without using indexes?
What's wrong?
What you want to do is an in-place update. To do that with ReactiveMongo, you need to pass an update operator that tells it which fields to update and how. Instead, you've passed correlation (which I assume is some sort of BSONDocument) to the collection's update method. That simply requests replacement of the whole document, and if the unique index value differs, a new document is added to the collection. Instead of passing correlation, you should pass a BSONDocument that uses one of the update operators, such as $set (set a field) or $inc (increment a numeric field by a given amount). For details, see the MongoDB documentation on modifying documents.
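For illustration, here is the shape of such an in-place update, written as plain Scala maps standing in for BSONDocument (the field names follow the sample document above; the exact document-builder API depends on your ReactiveMongo version):

```scala
// Selector: which document to update (the unique category/attribute pair).
val selector = Map("category" -> "c1", "attribute" -> "c3:p1")

// Update operators: $set replaces the named fields in place and $inc adds to
// a numeric field. Neither replaces the whole document, so no duplicate
// document can appear.
val update = Map(
  "$set" -> Map("value.average" -> 0.5),
  "$inc" -> Map("value.weight" -> 1)
)
```

Passing a document shaped like update (rather than the full correlation) to Correlations.update(selector, update) modifies the existing document instead of replacing it.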
I'm parallelising over a collection to count the number of equal item values in a List. The list in this case is uniqueSetOfLinks:
for (iListVal <- uniqueSetOfLinks.par) {
  try {
    val num: Int = listOfLinks.count(_.equalsIgnoreCase(iListVal))
    linkTotals + iListVal -> num
  } catch {
    case e: Exception =>
      e.printStackTrace()
  }
}
linkTotals is an immutable Map. To keep a running total of the links, do I need to change linkTotals so that it is mutable?
I could then do something like:
linkTotals.put(iListVal, num)
You can't update an immutable collection; all you can do is combine the immutable collection with an additional element to get a new immutable collection, like this:
val newLinkTotals = linkTotals + (iListVal -> num)
For a whole collection, you could create a new collection of pairs and then add all the pairs to the map:
val optPairs =
  for (iListVal <- uniqueSetOfLinks.par)
    yield
      try {
        val num: Int = listOfLinks.count(_.equalsIgnoreCase(iListVal))
        Some(iListVal -> num)
      } catch {
        case e: Exception =>
          e.printStackTrace()
          None
      }

val newLinkTotals = linkTotals ++ optPairs.flatten // for a non-empty initial map
val map = optPairs.flatten.toMap // in case there is no initial map
Note that you are using parallel collections (.par), so you should not use mutable state, like linkTotals += iListVal -> num.
A possible variation of @senia's answer (getting rid of the explicit flatten):
val optPairs =
  (for {
    iListVal <- uniqueSetOfLinks.par
    count <- {
      try
        Some(listOfLinks.count(_.equalsIgnoreCase(iListVal)))
      catch {
        case e: Exception =>
          e.printStackTrace()
          None
      }
    }
  } yield iListVal -> count).toMap
I think you need some form of MapReduce in order to count the items in parallel.
In your problem you already have all the unique links. The intermediate result of the "map" phase is simply a pair, and the "reduce" is just toMap. So you can par-map each link to a pair (link -> count) and then construct a map:
import scala.util.Try

def count(iListVal: String) = listOfLinks.count(_.equalsIgnoreCase(iListVal))
val listOfPairs = uniqueSetOfLinks.par.map(iListVal => Try((iListVal, count(iListVal))))
(the "map" operation here is a par-map)
Then remove exceptions:
val clearListOfPairs = listOfPairs.flatMap(_.toOption)
And then simply convert it to a map ("reduce"):
val linkTotals = clearListOfPairs.toMap
(if you need to inspect the exceptions rather than drop them, keep the Failure cases of the Try instead of flat-mapping them away)
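Putting those three steps together on concrete data (shown sequentially here; adding .par behaves the same, though on Scala 2.13+ it requires the separate scala-parallel-collections module):

```scala
import scala.util.Try

val listOfLinks      = List("Home", "home", "About", "HOME")
val uniqueSetOfLinks = Set("home", "about")

def count(link: String): Int = listOfLinks.count(_.equalsIgnoreCase(link))

val linkTotals: Map[String, Int] =
  uniqueSetOfLinks
    .map(link => Try((link, count(link)))) // "map": each link to a counted pair
    .flatMap(_.toOption)                   // drop any failed counts
    .toMap                                 // "reduce": collect pairs into a map
```

Because each pair is built independently and only combined at the end, there is no shared mutable state, which is exactly what makes the par version safe.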