mongodb casbah and list handling - scala

I am having problems writing this function, which takes a string and returns a list of strings associated to it.
(I'm expecting entries like {_id: ...., hash: "abcde", n: ["a","b","ijojoij"]} in mongodb)
def findByHash(hash: Hash) = {
val dbobj = mongoColl.findOne(MongoDBObject("hash" -> hash.hashStr))
val n = dbobj match {
case Some(doc: com.mongodb.casbah.Imports.DBObject) => {
doc("n") match {
case Some(n: com.mongodb.casbah.Imports.DBObject) => {
Some(List[String]() ++ n map { x => x.asInstanceOf[String] })
}
case _ => {
None // hash match but no n in object
}
}
}
case _ => {
None // no hash match
}
}
n
}
Is there anything wrong with the code? Do you know how to correct it?

doc("n") returns AnyRef, so you should explicitly cast it to BasicDBList.
val n = doc("n").asInstanceOf[BasicDBList]
Some(List[String]() ++ n map { x => x.asInstanceOf[String] })

Related

Exception handling in Spark Scala UDF

def parse_values(value: String) = {
val values = value.split(",").map(_.trim)
values.foldLeft(Array[(Int, Double)]()) {
case (acc, present) =>
val Array(k, v) = present.split(",")(0).split(":")
acc :+ (k.trim.toInt, v.trim.toDouble)
}
I am currently using the above UDF to parse a column of string into an array of keys and values.
"50:63.25,100:58.38" to [[50,63.2], [100,58.38]].
In some cases, the string is "\N" and I am unable to parse the column value.
If the string is "\N" then I should return an empty array. Can anyone help me to handle this exception or help me adding a new case? I am new to spark-scala.
Error: scala.MatchError: [Ljava.lang.String;#497cb6a9 (of class [Ljava.lang.String;)
You need to check that the resulting Array has two elements. You need a pattern matching like this to avoid that parse error:
def parse_values(value: String) = {
val values = value.split(",").map(_.trim)
values.foldLeft(Array[(Int, Double)]()) {
case (acc, present) =>
val Array(k, v) = {
present.split(",")(0).split(":") match {
case Array(_) => Array("0", "0.0")
case arr => arr
}
}
acc :+ (k.trim.toInt, v.trim.toDouble)
}
}

Scala functional programming dry run

Could you please help me in understanding the following method:
def extractGlobalID(custDimIndex :Int)(gaData:DataFrame) : DataFrame = {
val getGlobId = udf[String,Seq[GenericRowWithSchema]](genArr => {
val globId: List[String] =
genArr.toList
.filter(_(0) == custDimIndex)
.map(custDim => custDim(1).toString)
globId match {
case Nil => ""
case x :: _ => x
}
})
gaData.withColumn("globalId", getGlobId('customDimensions))
}
The method applies an UDF to to dataframe. The UDF seems intended to extract a single ID from column of type array<struct>, where the first element of the struct is an index, the second one an ID.
You could rewrite the code to be more readable:
def extractGlobalID(custDimIndex :Int)(gaData:DataFrame) : DataFrame = {
val getGlobId = udf((genArr : Seq[Row]) => {
genArr
.find(_(0) == custDimIndex)
.map(_(1).toString)
.getOrElse("")
})
gaData.withColumn("globalId", getGlobId('customDimensions))
}
or even shorter with collectFirst:
def extractGlobalID(custDimIndex :Int)(gaData:DataFrame) : DataFrame = {
val getGlobId = udf((genArr : Seq[Row]) => {
genArr
.collectFirst{case r if(r.getInt(0)==custDimIndex) => r.getString(1)}
.getOrElse("")
})
gaData.withColumn("globalId", getGlobId('customDimensions))
}

Scala return variable type after future is complete Add Comment Collapse

I've got a problem with returning a list after handling futures in scala. My code looks like this:
def getElements(arrayOfIds: Future[Seq[Int]]): Future[Seq[Element]] = {
var elementArray: Seq[Element] = Seq()
arrayOfIds.map {
ids => ids.map(id => dto.getElementById(id).map {
case Some(element) => elementArray = elementArray :+ element
case None => println("Element not found")
})
}
arrayOfIds.onComplete(_ => elementArray)
}
I'd like to do something like .onComplete, however the return type is
Unit and I'd like to return a Future[Seq[Whatever]]. Is there clean way to handle futures like this? Thanks!
Please provide the type of function dto.getElementById. If it is Int => Future[Option[Element]], then:
def getElements(arrayOfIds: Future[Seq[Int]]): Future[Seq[Element]] = {
val allElements: Future[Seq[Option[Element]]] = arrayOfIds.flatMap( ids =>
Future.sequence(ids.map(dto.getElementById))
)
allElements.map(_.flatMap{
case None => println();None
case some => some
})
}
Without logging, it would be:
arrayOfIds.flatMap( ids => Future.traverse(ids.map(dto.getElementById))(_.flatten))
Instead of assigning the result to a mutable variable, return it from the continuation of the Future. You can use flatMap to extract only the Element results which actually contain a value:
def getElements(arrayOfIds: Future[Seq[Int]]): Future[Seq[Element]] = {
arrayOfIds.flatMap(id => Future.fold(id.map(getElementById))(Seq.empty[Element])(_ ++ _))
}

Tuple seen as Product, compiler rejects reference to element

Constructing phoneVector:
val phoneVector = (
for (i <- 1 until 20) yield {
val p = killNS(r.get("Phone %d - Value" format(i)))
val t = killNS(r.get("Phone %d - Type" format(i)))
if (p == None) None
else
if (t == None) (p,"Main") else (p,t)
}
).filter(_ != None)
Consider this very simple snippet:
for (pTuple <- phoneVector) {
println(pTuple.getClass.getName)
println(pTuple)
//val pKey = pTuple._1.replaceAll("[^\\d]","")
associate() // stub prints "associate"
}
When I run it, I see output like this:
scala.Tuple2
((609) 954-3815,Mobile)
associate
When I uncomment the line with replaceAll(), compile fails:
....scala:57: value _1 is not a member of Product with Serializable
[error] val pKey = pTuple._1.replaceAll("[^\\d]","")
[error] ^
Why does it not recognize pTuple as a Tuple2 and treat it only as Product
OK, this compiles and produces the desired result. But it's too verbose. Can someone please demonstrate a more concise solution for dealing with this typesafe stuff?
for (pTuple <- phoneVector) {
println(pTuple.getClass.getName)
println(pTuple)
val pPhone = pTuple match {
case t:Tuple2[_,_] => t._1
case _ => None
}
val pKey = pPhone match {
case s:String => s.replaceAll("[^\\d]","")
case _ => None
}
println(pKey)
associate()
}
You can do:
for (pTuple <- phoneVector) {
val pPhone = pTuple match {
case (key, value) => key
case _ => None
}
val pKey = pPhone match {
case s:String => s.replaceAll("[^\\d]","")
case _ => None
}
println(pKey)
associate()
}
Or simply phoneVector.map(_._1.replaceAll("[^\\d]",""))
By changing the construction of phoneVector, as wrick's question implied, I've been able to eliminate the match/case stuff because Tuple is assured. Not thrilled by it, but Change is Hard, and Scala seems cool.
Now, it's still possible to slip a None value into either of the Tuple values. My match/case does not check for that, and I suspect that could lead to a runtime error in the replaceAll call. How is that allowed?
def killNS (s:Option[_]) = {
(s match {
case _:Some[_] => s.get
case _ => None
}) match {
case None => None
case "" => None
case s => s
}
}
val phoneVector = (
for (i <- 1 until 20) yield {
val p = killNS(r.get("Phone %d - Value" format(i)))
val t = killNS(r.get("Phone %d - Type" format(i)))
if (t == None) (p,"Main") else (p,t)
}
).filter(_._1 != None)
println(phoneVector)
println(name)
println
// Create the Neo4j nodes:
for (pTuple <- phoneVector) {
val pPhone = pTuple._1 match { case p:String => p }
val pType = pTuple._2
val pKey = pPhone.replaceAll(",.*","").replaceAll("[^\\d]","")
associate(Map("target"->Map("label"->"Phone","key"->pKey,
"dial"->pPhone),
"relation"->Map("label"->"IS_AT","key"->pType),
"source"->Map("label"->"Person","name"->name)
)
)
}
}

MongoDB update a document when exists already with ReactiveMongo

I'm writing a Scala web application that use MongoDB as database and ReactiveMongo as driver.
I've a collection named recommendation.correlation in which I saved the correlation between a product and a category.
A document has the following form:
{ "_id" : ObjectId("544f76ea4b7f7e3f6e2db224"), "category" : "c1", "attribute" : "c3:p1", "value" : { "average" : 0, "weight" : 3 } }
Now I'm writing a method as following:
def calculateCorrelation: Future[Boolean] = {
def calculate(category: String, tag: String, similarity: List[Similarity]): Future[(Double, Int)] = {
println("Calculate correlation of " + category + " " + tag)
val value = similarity.foldLeft(0.0, 0)( (r, c) => if(c.tag1Name.split(":")(0) == category && c.tag2Name == tag) (r._1 + c.eq, r._2 + 1) else r
) //fold the tags
val sum = value._1
val count = value._2
val result = if(count > 0) (sum/count, count) else (0.0, 0)
Future{result}
}
play.Logger.debug("Start Correlation")
Similarity.all.toList flatMap { tagsMatch =>
val tuples =
for {
i<- tagsMatch
} yield (i.tag1Name.split(":")(0), i.tag2Name) // create e List[(String, String)] containing the category and productName
val res = tuples map { el =>
calculate(el._1, el._2, tagsMatch) flatMap { value =>
val correlation = Correlation(el._1, el._2, value._1, value._2) // create the correlation
val query = Json.obj("category" -> value._1, "attribute" -> value._2)
Correlations.find(query).one flatMap(element => element match {
case Some(x) => Correlations.update(query, correlation) flatMap {status => status match {
case LastError(ok, _, _, _, _, _, _) => Future{true}
case _ => Future{false}
}
}
case None => Correlations.save(correlation) flatMap {status => status match {
case LastError(ok, _, _, _, _, _, _) => Future{true}
case _ => Future{false}
}
}
}
)
}
}
val result = if(res.exists(_ equals false)) false else true
Future{result}
}
The problem is that the method insert duplicated documents.
Why this happen??
I've solved using db.recommendation.correlation.ensureIndex({"category": 1, "attribute": 1}, {"unique": true, "dropDups":true }), but how can I fixed the problem without using indexes??
What's wrong??
What you want to do is an in-place update. To do that with ReactiveMongo you need to use an update operator to tell it which fields to update, and how. Instead, you've passed correlation (which I assume is some sort of BSONDocument) to the collection's update method. That simply requests replacement of the document, which if the unique index value is different will cause a new document to be added to the collection. Instead of passing correlation you should pass a BSONDocument that uses one of the update operators such as $set (set a field) or $incr (increment a numeric field by one). For details on doing that, please see the MongoDB Documentation, Modify Document