mongodb casbah and list handling

I am having problems writing this function, which takes a hash and returns the list of strings associated with it.
(I'm expecting entries like {_id: ...., hash: "abcde", n: ["a","b","ijojoij"]} in MongoDB.)
def findByHash(hash: Hash) = {
  val dbobj = mongoColl.findOne(MongoDBObject("hash" -> hash.hashStr))
  val n = dbobj match {
    case Some(doc: com.mongodb.casbah.Imports.DBObject) => {
      doc("n") match {
        case Some(n: com.mongodb.casbah.Imports.DBObject) => {
          Some(List[String]() ++ n map { x => x.asInstanceOf[String] })
        }
        case _ => {
          None // hash match but no n in object
        }
      }
    }
    case _ => {
      None // no hash match
    }
  }
  n
}
Is there anything wrong with the code? Do you know how to correct it?

doc("n") returns AnyRef rather than an Option, so the case Some(...) pattern never matches. Cast the value explicitly to BasicDBList instead:
val n = doc("n").asInstanceOf[BasicDBList]
Some(List[String]() ++ n map { x => x.asInstanceOf[String] })
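
If you would rather not rely on a bare asInstanceOf, the same idea can be written with a type pattern. The following is only a sketch in plain Scala (2.13+), not Casbah code: the Doc type, the ListFieldSketch object and the stringsOf helper are hypothetical, and a java.util.ArrayList stands in for the BasicDBList stored under "n".

```scala
import scala.jdk.CollectionConverters._

object ListFieldSketch {
  // Hypothetical stand-in for a fetched document: field name -> raw value.
  type Doc = Map[String, AnyRef]

  // Accept the raw value only if it really is a Java list.
  private def asList(raw: AnyRef): Option[List[Any]] = raw match {
    case l: java.util.List[_] => Some(l.asScala.toList)
    case _                    => None
  }

  // Some(strings) when the document exists and "n" holds a list, None otherwise.
  def stringsOf(doc: Option[Doc]): Option[List[String]] =
    for {
      d   <- doc          // no hash match -> None
      raw <- d.get("n")   // hash match but no "n" field -> None
      lst <- asList(raw)  // "n" is not a list -> None
    } yield lst.collect { case s: String => s } // keep only the String entries
}
```

The collect step silently drops non-String entries, whereas x.asInstanceOf[String] would throw a ClassCastException on them. Which behaviour you want depends on how much you trust the stored data.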

Related

Exception handling in Spark Scala UDF

def parse_values(value: String) = {
  val values = value.split(",").map(_.trim)
  values.foldLeft(Array[(Int, Double)]()) {
    case (acc, present) =>
      val Array(k, v) = present.split(",")(0).split(":")
      acc :+ (k.trim.toInt, v.trim.toDouble)
  }
}
I am currently using the above UDF to parse a string column into an array of key/value pairs, for example
"50:63.25,100:58.38" to [[50,63.25], [100,58.38]].
In some cases the string is "\N", and I am unable to parse the column value.
If the string is "\N", I should return an empty array. Can anyone help me handle this exception or add a new case? I am new to Spark and Scala.
Error: scala.MatchError: [Ljava.lang.String;@497cb6a9 (of class [Ljava.lang.String;)

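The MatchError is thrown because val Array(k, v) = ... only matches an array with exactly two elements, and splitting "\N" on ":" yields just one. Match on the split result first and fall back to a default when there is no key:value pair. Note that this substitutes (0, 0.0) for such an entry rather than returning an empty array: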
def parse_values(value: String) = {
  val values = value.split(",").map(_.trim)
  values.foldLeft(Array[(Int, Double)]()) {
    case (acc, present) =>
      val Array(k, v) = {
        present.split(",")(0).split(":") match {
          case Array(_) => Array("0", "0.0")
          case arr => arr
        }
      }
      acc :+ (k.trim.toInt, v.trim.toDouble)
  }
}
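
If the goal is an empty array for "\N", as the question asks, one option is to skip entries that do not parse instead of substituting a default. This is a sketch in plain Scala, outside Spark. ParseValuesSketch and parseValues are hypothetical names, and wrapping the function in udf(...) is left out.

```scala
import scala.util.Try

object ParseValuesSketch {
  // Parses "50:63.25,100:58.38" into Array((50, 63.25), (100, 58.38)).
  // Entries that are not a valid "int:double" pair (such as "\N") are skipped,
  // so a value consisting only of "\N" yields an empty array.
  def parseValues(value: String): Array[(Int, Double)] =
    value.split(",").iterator.flatMap { entry =>
      entry.split(":") match {
        // Exactly two parts: try to convert them, dropping the entry on failure.
        case Array(k, v) => Try((k.trim.toInt, v.trim.toDouble)).toOption
        // Anything else is not a key:value pair.
        case _           => None
      }
    }.toArray
}
```

Using Try also covers entries such as "abc:1.0", which have two parts but would otherwise fail with a NumberFormatException.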

Scala functional programming dry run

Could you please help me in understanding the following method:
def extractGlobalID(custDimIndex :Int)(gaData:DataFrame) : DataFrame = {
  val getGlobId = udf[String,Seq[GenericRowWithSchema]](genArr => {
    val globId: List[String] =
      genArr.toList
        .filter(_(0) == custDimIndex)
        .map(custDim => custDim(1).toString)
    globId match {
      case Nil => ""
      case x :: _ => x
    }
  })
  gaData.withColumn("globalId", getGlobId('customDimensions))
}

The method applies a UDF to a DataFrame. The UDF seems intended to extract a single ID from a column of type array<struct>, where the first element of each struct is an index and the second is an ID. It returns the ID of the first struct whose index equals custDimIndex, or an empty string if there is none.
You could rewrite the code to be more readable:
def extractGlobalID(custDimIndex :Int)(gaData:DataFrame) : DataFrame = {
  val getGlobId = udf((genArr : Seq[Row]) => {
    genArr
      .find(_(0) == custDimIndex)
      .map(_(1).toString)
      .getOrElse("")
  })
  gaData.withColumn("globalId", getGlobId('customDimensions))
}
or even shorter with collectFirst:
def extractGlobalID(custDimIndex :Int)(gaData:DataFrame) : DataFrame = {
  val getGlobId = udf((genArr : Seq[Row]) => {
    genArr
      .collectFirst{case r if(r.getInt(0)==custDimIndex) => r.getString(1)}
      .getOrElse("")
  })
  gaData.withColumn("globalId", getGlobId('customDimensions))
}
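
To dry-run the three variants without a Spark session, you can model the array<struct> column as a plain Seq of (index, id) pairs. This is only a sketch: FirstMatchSketch, the sample data and the method names are made up for illustration.

```scala
object FirstMatchSketch {
  // Hypothetical stand-in for one row's customDimensions: (index, id) pairs.
  val customDimensions: Seq[(Int, String)] =
    Seq(1 -> "a", 4 -> "glob-42", 4 -> "glob-43")

  // Original shape: filter everything, then take the head or fall back to "".
  def viaFilter(index: Int): String =
    customDimensions.toList.filter(_._1 == index).map(_._2) match {
      case Nil    => ""
      case x :: _ => x
    }

  // find stops at the first pair whose index matches.
  def viaFind(index: Int): String =
    customDimensions.find(_._1 == index).map(_._2).getOrElse("")

  // collectFirst merges the test and the projection into one partial function.
  def viaCollectFirst(index: Int): String =
    customDimensions.collectFirst { case (i, id) if i == index => id }.getOrElse("")
}
```

All three return the first matching ID and "" when nothing matches. The find and collectFirst versions simply avoid building the intermediate filtered list.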

Scala return variable type after future is complete

I've got a problem with returning a list after handling futures in Scala. My code looks like this:
def getElements(arrayOfIds: Future[Seq[Int]]): Future[Seq[Element]] = {
  var elementArray: Seq[Element] = Seq()
  arrayOfIds.map {
    ids => ids.map(id => dto.getElementById(id).map {
      case Some(element) => elementArray = elementArray :+ element
      case None => println("Element not found")
    })
  }
  arrayOfIds.onComplete(_ => elementArray)
}
I'd like to do something like .onComplete, but its return type is Unit and I'd like to return a Future[Seq[Whatever]]. Is there a clean way to handle futures like this? Thanks!

Please provide the type of function dto.getElementById. If it is Int => Future[Option[Element]], then:
def getElements(arrayOfIds: Future[Seq[Int]]): Future[Seq[Element]] = {
  val allElements: Future[Seq[Option[Element]]] = arrayOfIds.flatMap( ids =>
    Future.sequence(ids.map(dto.getElementById))
  )
  allElements.map(_.flatMap{
    case None => println();None
    case some => some
  })
}
Without logging, it would be:
arrayOfIds.flatMap(ids => Future.traverse(ids)(dto.getElementById)).map(_.flatten)

Instead of assigning the result to a mutable variable, return it from the continuation of the Future. flatMap on the outer Future lets you combine the per-id Futures, and folding with ++ keeps only the Option results which actually contain an Element. (Future.fold is deprecated since Scala 2.12 in favour of Future.foldLeft.)
def getElements(arrayOfIds: Future[Seq[Int]]): Future[Seq[Element]] = {
  arrayOfIds.flatMap(ids => Future.fold(ids.map(dto.getElementById))(Seq.empty[Element])(_ ++ _))
}
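
For reference, here is a self-contained sketch of the same idea that can be run on its own. It assumes the lookup has type Int => Future[Option[Element]], which the question does not confirm. Element, getElementById and demo are hypothetical stand-ins, and Await is used only to show the result.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object GetElementsSketch {
  implicit val ec: ExecutionContext = ExecutionContext.global

  final case class Element(id: Int)

  // Hypothetical lookup standing in for dto.getElementById: only even ids exist.
  def getElementById(id: Int): Future[Option[Element]] =
    Future.successful(if (id % 2 == 0) Some(Element(id)) else None)

  // Run one lookup per id, wait for all of them, then drop the misses.
  def getElements(arrayOfIds: Future[Seq[Int]]): Future[Seq[Element]] =
    for {
      ids   <- arrayOfIds
      found <- Future.traverse(ids)(getElementById)
    } yield found.flatten

  // Blocking here is for demonstration only.
  def demo(): Seq[Element] =
    Await.result(getElements(Future.successful(Seq(1, 2, 3, 4))), 5.seconds)
}
```

No mutable state is involved: the resulting Future completes only after every lookup has completed, so there is nothing to read "too early".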

Tuple seen as Product, compiler rejects reference to element

Constructing phoneVector:
val phoneVector = (
  for (i <- 1 until 20) yield {
    val p = killNS(r.get("Phone %d - Value" format(i)))
    val t = killNS(r.get("Phone %d - Type" format(i)))
    if (p == None) None
    else
      if (t == None) (p,"Main") else (p,t)
  }
).filter(_ != None)
Consider this very simple snippet:
for (pTuple <- phoneVector) {
  println(pTuple.getClass.getName)
  println(pTuple)
  //val pKey = pTuple._1.replaceAll("[^\\d]","")
  associate() // stub prints "associate"
}
When I run it, I see output like this:
scala.Tuple2
((609) 954-3815,Mobile)
associate
When I uncomment the line with replaceAll(), compile fails:
....scala:57: value _1 is not a member of Product with Serializable
[error] val pKey = pTuple._1.replaceAll("[^\\d]","")
[error] ^
Why does it not recognize pTuple as a Tuple2, and treat it only as a Product?

OK, this compiles and produces the desired result. But it's too verbose. Can someone please demonstrate a more concise solution for dealing with this typesafe stuff?
for (pTuple <- phoneVector) {
  println(pTuple.getClass.getName)
  println(pTuple)
  val pPhone = pTuple match {
    case t:Tuple2[_,_] => t._1
    case _ => None
  }
  val pKey = pPhone match {
    case s:String => s.replaceAll("[^\\d]","")
    case _ => None
  }
  println(pKey)
  associate()
}

You can do:
for (pTuple <- phoneVector) {
  val pPhone = pTuple match {
    case (key, value) => key
    case _ => None
  }
  val pKey = pPhone match {
    case s:String => s.replaceAll("[^\\d]","")
    case _ => None
  }
  println(pKey)
  associate()
}
Or, once phoneVector is built so that every element is statically a (String, String) tuple, simply phoneVector.map(_._1.replaceAll("[^\\d]",""))
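
As for why the compiler only sees a Product: the if/else in the original construction yields None in one branch and a tuple in the others, and the most specific type they share is Product with Serializable, which has no _1. One way to keep the static tuple type is to work with Option directly. This is a sketch under assumptions: r is modelled as a hypothetical Map[String, Option[String]], and nonEmpty is a made-up replacement for killNS.

```scala
object PhoneVectorSketch {
  // Hypothetical record standing in for `r`: column name -> optional raw value.
  val r: Map[String, Option[String]] = Map(
    "Phone 1 - Value" -> Some("(609) 954-3815"),
    "Phone 1 - Type"  -> Some("Mobile"),
    "Phone 2 - Value" -> Some("555 0100"),
    "Phone 2 - Type"  -> Some(""),
    "Phone 3 - Value" -> None
  )

  // Treat a missing column, None, and "" alike.
  def nonEmpty(key: String): Option[String] =
    r.get(key).flatten.filter(_.nonEmpty)

  // Rows without a phone value are skipped by the Option generator, so every
  // yielded element is a (String, String) and no filter on None is needed.
  val phoneVector: Seq[(String, String)] =
    for {
      i <- 1 until 20
      p <- nonEmpty(s"Phone $i - Value")
    } yield (p, nonEmpty(s"Phone $i - Type").getOrElse("Main"))

  // _1 is now known to be a String, so replaceAll compiles without a match.
  val keys: Seq[String] = phoneVector.map(_._1.replaceAll("[^\\d]", ""))
}
```

Because the element type is (String, String) rather than Product, a None can no longer slip into either slot of the tuple.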

By changing the construction of phoneVector, as wrick's question implied, I've been able to eliminate the match/case stuff because Tuple is assured. Not thrilled by it, but Change is Hard, and Scala seems cool.
Now, it's still possible to slip a None value into either of the Tuple values. My match/case does not check for that, and I suspect that could lead to a runtime error in the replaceAll call. How is that allowed?
def killNS (s:Option[_]) = {
  (s match {
    case _:Some[_] => s.get
    case _ => None
  }) match {
    case None => None
    case "" => None
    case s => s
  }
}
val phoneVector = (
  for (i <- 1 until 20) yield {
    val p = killNS(r.get("Phone %d - Value" format(i)))
    val t = killNS(r.get("Phone %d - Type" format(i)))
    if (t == None) (p,"Main") else (p,t)
  }
).filter(_._1 != None)
println(phoneVector)
println(name)
println
// Create the Neo4j nodes:
for (pTuple <- phoneVector) {
  val pPhone = pTuple._1 match { case p:String => p }
  val pType = pTuple._2
  val pKey = pPhone.replaceAll(",.*","").replaceAll("[^\\d]","")
  associate(Map("target"->Map("label"->"Phone","key"->pKey,
                              "dial"->pPhone),
                "relation"->Map("label"->"IS_AT","key"->pType),
                "source"->Map("label"->"Person","name"->name)
               )
           )
}
}

MongoDB update a document when exists already with ReactiveMongo

I'm writing a Scala web application that uses MongoDB as its database and ReactiveMongo as its driver.
I have a collection named recommendation.correlation in which I save the correlation between a product and a category.
A document has the following form:
{ "_id" : ObjectId("544f76ea4b7f7e3f6e2db224"), "category" : "c1", "attribute" : "c3:p1", "value" : { "average" : 0, "weight" : 3 } }
I'm now writing the following method:
def calculateCorrelation: Future[Boolean] = {
  def calculate(category: String, tag: String, similarity: List[Similarity]): Future[(Double, Int)] = {
    println("Calculate correlation of " + category + " " + tag)
    val value = similarity.foldLeft(0.0, 0)( (r, c) => if(c.tag1Name.split(":")(0) == category && c.tag2Name == tag) (r._1 + c.eq, r._2 + 1) else r
    ) //fold the tags
    val sum = value._1
    val count = value._2
    val result = if(count > 0) (sum/count, count) else (0.0, 0)
    Future{result}
  }
  play.Logger.debug("Start Correlation")
  Similarity.all.toList flatMap { tagsMatch =>
    val tuples =
      for {
        i<- tagsMatch
      } yield (i.tag1Name.split(":")(0), i.tag2Name) // create a List[(String, String)] containing the category and productName
    val res = tuples map { el =>
      calculate(el._1, el._2, tagsMatch) flatMap { value =>
        val correlation = Correlation(el._1, el._2, value._1, value._2) // create the correlation
        val query = Json.obj("category" -> value._1, "attribute" -> value._2)
        Correlations.find(query).one flatMap(element => element match {
          case Some(x) => Correlations.update(query, correlation) flatMap {status => status match {
              case LastError(ok, _, _, _, _, _, _) => Future{true}
              case _ => Future{false}
            }
          }
          case None => Correlations.save(correlation) flatMap {status => status match {
              case LastError(ok, _, _, _, _, _, _) => Future{true}
              case _ => Future{false}
            }
          }
        }
        )
      }
    }
    val result = if(res.exists(_ equals false)) false else true
    Future{result}
  }
}
The problem is that the method inserts duplicate documents.
Why does this happen?
I've worked around it using db.recommendation.correlation.ensureIndex({"category": 1, "attribute": 1}, {"unique": true, "dropDups":true }), but how can I fix the problem without using indexes?
What's wrong?

What you want to do is an in-place update. To do that with ReactiveMongo you need to use an update operator to tell it which fields to update, and how. Instead, you've passed correlation (which I assume is some sort of BSONDocument) to the collection's update method. That simply requests replacement of the document, which if the unique index value is different will cause a new document to be added to the collection. Instead of passing correlation you should pass a BSONDocument that uses one of the update operators, such as $set (set a field) or $inc (increment a numeric field by a given amount). For details on doing that, please see the MongoDB Documentation, Modify Document.
It is also worth checking the query itself: as posted, it is built from value._1 and value._2 (the computed average and count) rather than el._1 and el._2 (the category and attribute), so find is unlikely to match an existing document, which would send every call down the save branch.
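
To make the intended semantics concrete, here is a small in-memory model of "update if a document with this key exists, insert otherwise". It is not ReactiveMongo code. Store, upsert and this Correlation case class are hypothetical, and the only point is that the lookup key must be the identifying fields (category, attribute), not the computed values.

```scala
object UpsertSketch {
  final case class Correlation(category: String, attribute: String, average: Double, weight: Int)

  // In-memory stand-in for the collection, keyed by the identifying fields.
  type Store = Map[(String, String), Correlation]

  // Replace the entry with the same (category, attribute), or add it if absent.
  def upsert(store: Store, c: Correlation): Store =
    store.updated((c.category, c.attribute), c)
}
```

Upserting Correlation("c1", "c3:p1", 0.0, 3) and then Correlation("c1", "c3:p1", 0.5, 4) leaves a single entry holding the second value, because both share the same key.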
