scoobi concatenate DLists issue - scala

I have a scala scoobi code (that i execute locally for now) of the following structure:
def run() = {
val myvalue1 = processSource.source1("mypath/inputData/", references) // has several subfolders with input files that should be processed and some lines in these files are to be rejected due to some reasons and written to a log
val (partitionedVal1, partitionedVal2) = myvalue1.partition {
case Left(x) => true
case _ => false}
val output = processData.process(partitionedVal1) // some processing
val getRidOfRight = partitionedVal2.map(x => x.right.get)
persist(getRidOfRight.toTextFile("output/log", overwrite = true), output.toTextFile("output/processed_data", overwrite = true))
}
And the processSource is an object as:
object processSource {
def source1(a:DList[Either[Class1, String]] , b:DList[Either[Class2, String]] ) = {
val (a1, a2) = a.partition {
case Left(x) => true
case _ => false}
val (b1, b2) = b.partition {
case Left(x) => true
case _ => false}
val joinedAB1:DList[Either[Class1, String]] = a1.by(_.myKey).joinLeft(b1.by(_.myOtherKey)).map {
case (key, (val1, Some(val2))) => Left(val1.withCommon(val2))
case (key,(val1, None)) => Left(val1)}
val AB2:DList[Either[Class1, String]] = b2.map{ x => Right(x.right.get)}
AB1 ++ AB2 ++ b1
}
}
Now, my problem is that the getRidOfRight returns only part of the output it should return. WHY?

github.com/NICTA/scoobi/issues/328 its a bug in scoobi - solution is to use scoobi 0.9.1 and "collect" instead of "partition" ("partition" is not fixed)

Related

Trying to return a Map but an Iterable is currently being returned

Why is my val pairOpt a Option[Option[String,String]]?
I am trying to have it so it returns Option[(String, String)].
def blah(..): Map[String, String] = {
val map: Map[String, String] = //
val boolTry = Try(map.getOrElse("key1", "").trim.toBoolean)
val intTry = Try(map.getOrElse("key2, "").trim.toInt)
val pairOpt: Option[Option[(String, String)]] = for {
b <- boolTry.toOption
i <- intTry.toOption
} yield {
val res: Option[String] = (b, i) match {
case (true, 1) => Some("a")
case (false, 2 | 3 | 7) => Some("b")
case (true, 5 | 9 | 11) => Some("c")
case _ => None
}
res.map("foo" -> _)
}
map ++ pairOpt // map + ("foo" -> "c")
}
The return value is also currenly Iterable[Product with Serializable] when I want it to be Map[String,String].
What am I missing here?
You get an Option[Option[..]] because you have two "layers" of "optionality" here that you have to flatten:
If one of boolTry or intTry is a Failure - you'll get None
Else, if they're both Success but their values don't match anything, you'll get Some(None)
Otherwise, you'll get Some(Some(..))
More generally, given a opt: Option[V], the type of an expression of the form:
for {
x <- opt
..
} yield {
val y: T
y
}
is Option[T] - because it translates into opt.flatMap(...).map(...) which preserves the "external" structure (be it an Option, List, Seq etc.). In your case, T = Option[(String, String)], so the result has the type Option[Option[(String, String)]]. To fix this - you can use flatten:
val pairOpt: Option[(String, String)] = (for {
b <- boolTry.toOption
i <- intTry.toOption
} yield {
// same as you did...
}).flatten
Which would also fix the issue with the method's return type (now map ++ pairOpt will have the type Map[String, String] as expected).
To avoid this call to flatten - perhaps a cleaner way to achieve the same would be:
val maybeTuple: Option[(Boolean, Int)] = boolTry.flatMap(b => intTry.map((b, _))).toOption
val pairOpt: Option[(String, String)] = maybeTuple.flatMap {
case (true, 1) => Some("a")
case (false, 2 | 3 | 7) => Some("b")
case (true, 5 | 9 | 11) => Some("c")
case _ => None
}.map("foo" -> _)

Case Classes w/ Option Parameters & Case Matching

I have a case class that has multiple parameters of which some are Options. Here is a simplified example:
case class Foobar(a: String, b: Option[String], c: Option[CustomClass])
I want to be able to match cases of Foobar where b and/or c is not None. For example, one case could be:
testResult match {
case Foobar("str1", Some(_), None) => "good"
case Foobar("str2", None, Some(_)) => "ok"
case _ => "bad"
}
Furthermore, I want to reference the case patterns via variables and this is where I'm stuck. I want to do something like the following:
val goodPat = Foobar("str1", Some(_), None) // compile fail
val okPat = Foobar("str2", None, Some(_)) // compile fail
testResult match {
case `goodPat` => "good"
case `okPat` => "ok"
case _ => "bad"
}
Is something like this possible? Is there another way to specify "not None"? Is there another way to approach this problem?
EDIT: I'm adding more details and context to the question. I have a large List of 2-tuples representing unit tests for a particular function. The 2-tuples represent the input and expected output. Eg.
// imagine this list is much bigger and Foobar contains more Option parameters
val tests = List(
("test1", Foobar("idkfa", None, None)),
// I know these fail to compile but I need to do something like this
("test2", Foobar("idclip", Some("baz"), Some(_)),
("test3", Foobar("iddqd", Some(_), None)
)
tests.foreach(test => {
val (input, expected) = test
myFunction(input) match {
case `expected` => println("ok")
case _ => println("bad")
}
})
I think you seeking for something like this:
case class DoomOpt(s: String)
case class Foobar(a: String, b: Option[String], c: Option[DoomOpt])
def myFunction(s: String): Foobar = { // your code here }
val tests = Map[String, PartialFunction[Foobar, Unit]](
"idkfa" → { case Foobar("idkfa", None, None) ⇒ },
"test2" → { case Foobar("idclip", Some("baz"), Some(_)) ⇒ },
"test3" → { case Foobar("iddqd", Some(_), None) ⇒ },
"idspispopd" → { case Foobar("idspispopd", Some(_), None) ⇒ }
)
tests.foreach { case (input, checker) =>
if (checker.isDefinedAt(myFunction(input)))
println("ok")
else
println("bad")
}
Pattern matching uses extractors which provide the unapply function to deconstruct the object. So... you can just supply your custom extractor in this case. Create a list of these extractor test cases and apply them one by one.
case class Foobar(s: String, o: Option[Int])
trait TestExtractor {
def unapply(fbar: Foobar): Boolean
}
object somePatExtractor extends TestExtractor {
def unapply(fbar: Foobar): Boolean = fbar match {
case Foobar("yes", Some(_)) => true
case _ => false
}
}
object nonePatExtractor extends TestExtractor {
def unapply(fbar: Foobar): Boolean = fbar match {
case Foobar("yes", None) => true
case _ => false
}
}
object bazPatExtractor extends TestExtractor {
def unapply(fbar: Foobar): Boolean = fbar match {
case Foobar("yes", Some("baz")) => true
case _ => false
}
}
val testList: List[(String, TestExtractor)] = List(("test1", nonePatExtractor), ("test2", bazPatExtractor), ("test3", somePatExtractor))
val fooToTest = Foobar("yes", Some(5))
testList.foreach({
case (testName, extractor) => {
fooToTest match {
case pat # extractor() => println("testName :: " + testName + ", Result :: ok")
case _ => println("testName :: " + testName + ", Result :: bad")
}
}
})
And if you are looking for a more extendible approach then you can consider something like following,
case class Foobar(s: String, o1: Option[Int], o2: Option[String])
case class TestCondition(df: Foobar => Boolean) {
def test(foobar: Foobar): Boolean = df(foobar)
}
val o1IsNone = TestCondition(f => f.o1.isEmpty)
val o1IsSome = TestCondition(f => f.o1.isDefined)
val o2IsNone = TestCondition(f => f.o2.isEmpty)
val o2IsSome = TestCondition(f => f.o2.isDefined)
case class TestCase(tcs: List[TestCondition]) {
def test(foobar: Foobar) = tcs.foldLeft(true)({ case (acc, tc) => acc && tc.test(foobar) })
}
val testList = List[(String, TestCase)](
("test1", TestCase(List(o1IsSome, o2IsSome))),
("test2", TestCase(List(o1IsSome, o2IsNone))),
("test3", TestCase(List(o1IsNone, o2IsSome))),
("test4", TestCase(List(o1IsNone, o2IsNone)))
)
val foobarToTest = Foobar("yes", Some(5), None)
testList.foreach({
case (testName, testCase) => {
foobarToTest match {
case foobar: Foobar if testCase.test(foobar) => println("testName :: " + testName + ", Result :: ok")
case _ => println("testName :: " + testName + ", Result :: bad")
}
}
})

What is the simplest method to extract success from a list of Scala futures?

Let's say we have a sequence of Scala Future results like val futures: Seq[Future[Boolean]] and let's say I'd like to get a Future[Boolean] that returns the first true result, but otherwise fails or returns false normally.
At the moment the current implementation I have is something like this:
// convert to Seq[Future[Try[Boolean]]]
val tries = futures.map{ _.map(Success(_)).recover{ case t: Throwable => Failure(t) } }
val sequence = Future.sequence(tries)
// first positive
val positive = sequence.map{ _.find{ case Success(true) => true; case _ => false } }
val nonpositive = sequence.map{ _.find{ case Success(true) => false; case _ => true } }
// prioritise positive results over other results
val futureTry =
for {
p <- positive
n <- nonpositive
} yield( p.orElse(n) )
// convert from Future[Try[Boolean]] back to Future[Boolean]
val future = futureTry.map { _.map { t => t.get } }
Is there a simpler way to write the above?
You can use Future.find, and since you are looking for a Boolean, you can pass the identity function :
import scala.concurrent.Future
val failedF = Future.failed[Boolean](new Exception("boo"))
val falseF = Future.successful(false)
val trueF = Future.successful(true)
val f1 = Future.find(trueF :: falseF :: failedF :: Nil)(identity)
val f2 = Future.find(falseF :: failedF :: Nil)(identity)
val f3 = Future.find(failedF :: Nil)(identity)
Future.sequence(f1 :: f2 :: f3 :: Nil).foreach(println)
// List(Some(true), None, None)
It returns Future[Option[Boolean]] like it is, but you could easily turn it into Future[Boolean] by supplying a default value with getOrElse :
f2.map(_.getOrElse(false)).foreach(println)
// false

Can this be made prettier? Matching string to case class field

Sorry about the title, please edit it to be more descriptive if you can!
Is there a way to generalize this with scala? I have quite a few fields that can be filtered against, and this is just plain ugly! The problem I ran against was matching parameter name against the case class field, can it be done in a more general way, without this much code duplication?
get("/MostClicked") { request =>
val res = MongoDbOps.findMostClicked()
val res1 = request.params.get("source") match {
case None => res
case Some(f) => res.filter(_.source == f)
}
val res2 = request.params.get("category") match {
case None => res1
case Some(f) => res1.filter(_.category == f)
}
// more of the same...
render.plain {
res2.toJson.prettyPrint
}.toFuture
}
You can try either of the two approaches below.
case class MostClicked(
source: String,
category: String,
rating: String)
object MongoDbOps {
def findMostClicked() = List[MostClicked]()
}
class Request {
val params = Map[String, String]()
}
def get(path: String)(f: Request => String) = {
f(new Request)
}
First is to use a List of matchers and then apply them sequentially using foldLeft:
get("/MostClicked") { request =>
val res = MongoDbOps.findMostClicked()
val kfun = List(
"source" -> ((x: MostClicked, y: String) => x.source == y),
"category" -> ((x: MostClicked, y: String) => x.category == y),
"rating" -> ((x: MostClicked, y: String) => x.rating == y))
val r = kfun.foldLeft(res) { (x, y) =>
request.params.get(y._1)
.map(f => res.filter(y._2(_, f)))
.getOrElse(x)
}
r.toString
// more of the same...
render.plain {
r.toJson.prettyPrint
}.toFuture
}
Or simply make it more readable:
get("/MostClicked") { request =>
val res = MongoDbOps.findMostClicked()
val res1 = request.params.get("source")
.map(f => res.filter(_.source == f))
.getOrElse(res)
val res2 = request.params.get("category")
.map(f => res.filter(_.category == f))
.getOrElse(res1)
val res3 = request.params.get("rating")
.map(f => res.filter(_.rating == f))
.getOrElse(res2)
// more of the same...
render.plain {
res3.toJson.prettyPrint
}.toFuture
}
I had to break it down to parts and work out the types, thus the smaller methods.
It works in my simple experiment and gets rid of the duplication where it can, but might not be as readable?
Note that unless you get into reflection, you still need to create the name of the parameter and how it should be filtered.
This looks to be the same as tuxdna's answer except with types to increase readability and maintainability
SETUP
case class Request(params: Map[String, String])
case class Result(category: String, source: String)
type Filterer = (Result, String) => Boolean
case class FilterInfo(paramName: String, filterer: Filterer)
type Analyzer = FilterInfo => List[Result]
val request = Request(Map("source"->"b"))
EXTRACTION METHODS
def reduce(filterInfos: List[FilterInfo], results: List[Result]) = {
filterInfos.foldLeft(results) { (currentResult, filterInfo) =>
request.params.get(filterInfo.paramName)
.map(filterVal => currentResult.filter(filterInfo.filterer(_, filterVal)))
.getOrElse(currentResult)
}
}
USAGE
val filterInfos = List(
FilterInfo("source", (result, filterVal) => result.source == filterVal),
FilterInfo("category", (result, filterVal) => result.category == filterVal))
val res = List(Result("a","a"), Result("b", "b"))
reduce(filterInfos, res)
Used in your example it would be more like this:
get("/MostClicked") { request =>
val res = MongoDbOps.findMostClicked()
val filterInfos = List(
FilterInfo("source", (result, filterVal) => result.source == filterVal),
FilterInfo("category", (result, filterVal) => result.category == filterVal))
val finalResult = reduce(filterInfos, res)
render.plain {
finalResult.toJson.prettyPrint
}.toFuture
}

How to check if there's None in List[Option[_]] and return the element's name?

I have multiple Option's. I want to check if they hold a value. If an Option is None, I want to reply to user about this. Else proceed.
This is what I have done:
val name:Option[String]
val email:Option[String]
val pass:Option[String]
val i = List(name,email,pass).find(x => x match{
case None => true
case _ => false
})
i match{
case Some(x) => Ok("Bad Request")
case None => {
//move forward
}
}
Above I can replace find with contains, but this is a very dirty way. How can I make it elegant and monadic?
Edit: I would also like to know what element was None.
Another way is as a for-comprehension:
val outcome = for {
nm <- name
em <- email
pwd <- pass
result = doSomething(nm, em, pwd) // where def doSomething(name: String, email: String, password: String): ResultType = ???
} yield (result)
This will generate outcome as a Some(result), which you can interrogate in various ways (all the methods available to the collections classes: map, filter, foreach, etc.). Eg:
outcome.map(Ok(result)).orElse(Ok("Bad Request"))
val ok = Seq(name, email, pass).forall(_.isDefined)
If you want to reuse the code, you can do
def allFieldValueProvided(fields: Option[_]*): Boolean = fields.forall(_.isDefined)
If you want to know all the missing values then you can find all missing values and if there is none, then you are good to go.
def findMissingValues(v: (String, Option[_])*) = v.collect {
case (name, None) => name
}
val missingValues = findMissingValues(("name1", option1), ("name2", option2), ...)
if(missingValues.isEmpty) {
Ok(...)
} else {
BadRequest("Missing values for " + missingValues.mkString(", ")))
}
val response = for {
n <- name
e <- email
p <- pass
} yield {
/* do something with n, e, p */
}
response getOrElse { /* bad request /* }
Or, with Scalaz:
val response = (name |#| email |#| pass) { (n, e, p) =>
/* do something with n, e, p */
}
response getOrElse { /* bad request /* }
if ((name :: email :: pass :: Nil) forall(!_.isEmpty)) {
} else {
// bad request
}
I think the most straightforward way would be this:
(name,email,pass) match {
case ((Some(name), Some(email), Some(pass)) => // proceed
case _ => // Bad request
}
A version with stone knives and bear skins:
import util._
object Test extends App {
val zero: Either[List[Int], Tuple3[String,String,String]] = Right((null,null,null))
def verify(fields: List[Option[String]]) = {
(zero /: fields.zipWithIndex) { (acc, v) => v match {
case (Some(s), i) => acc match {
case Left(_) => acc
case Right(t) =>
val u = i match {
case 0 => t copy (_1 = s)
case 1 => t copy (_2 = s)
case 2 => t copy (_3 = s)
}
Right(u)
}
case (None, i) =>
val fails = acc match {
case Left(f) => f
case Right(_) => Nil
}
Left(i :: fails)
}
}
}
def consume(name: String, email: String, pass: String) = Console println s"$name/$email/$pass"
def fail(is: List[Int]) = is map List("name","email","pass") foreach (Console println "Missing: " + _)
val name:Option[String] = Some("Bob")
val email:Option[String]= None
val pass:Option[String] = Some("boB")
val res = verify(List(name,email,pass))
res.fold(fail, (consume _).tupled)
val res2 = verify(List(name, Some("bob#bob.org"),pass))
res2.fold(fail, (consume _).tupled)
}
The same thing, using reflection to generalize the tuple copy.
The downside is that you must tell it what tuple to expect back. In this form, reflection is like one of those Stone Age advances that were so magical they trended on twitter for ten thousand years.
def verify[A <: Product](fields: List[Option[String]]) = {
import scala.reflect.runtime._
import universe._
val MaxTupleArity = 22
def tuple = {
require (fields.length <= MaxTupleArity)
val n = fields.length
val tupleN = typeOf[Tuple2[_,_]].typeSymbol.owner.typeSignature member TypeName(s"Tuple$n")
val init = tupleN.typeSignature member nme.CONSTRUCTOR
val ctor = currentMirror reflectClass tupleN.asClass reflectConstructor init.asMethod
val vs = Seq.fill(n)(null.asInstanceOf[String])
ctor(vs: _*).asInstanceOf[Product]
}
def zero: Either[List[Int], Product] = Right(tuple)
def nextProduct(p: Product, i: Int, s: String) = {
val im = currentMirror reflect p
val ts = im.symbol.typeSignature
val copy = (ts member TermName("copy")).asMethod
val args = copy.paramss.flatten map { x =>
val name = TermName(s"_$i")
if (x.name == name) s
else (im reflectMethod (ts member x.name).asMethod)()
}
(im reflectMethod copy)(args: _*).asInstanceOf[Product]
}
(zero /: fields.zipWithIndex) { (acc, v) => v match {
case (Some(s), i) => acc match {
case Left(_) => acc
case Right(t) => Right(nextProduct(t, i + 1, s))
}
case (None, i) =>
val fails = acc match {
case Left(f) => f
case Right(_) => Nil
}
Left(i :: fails)
}
}.asInstanceOf[Either[List[Int], A]]
}
def consume(name: String, email: String, pass: String) = Console println s"$name/$email/$pass"
def fail(is: List[Int]) = is map List("name","email","pass") foreach (Console println "Missing: " + _)
val name:Option[String] = Some("Bob")
val email:Option[String]= None
val pass:Option[String] = Some("boB")
type T3 = Tuple3[String,String,String]
val res = verify[T3](List(name,email,pass))
res.fold(fail, (consume _).tupled)
val res2 = verify[T3](List(name, Some("bob#bob.org"),pass))
res2.fold(fail, (consume _).tupled)
I know this doesn't scale well, but would this suffice?
(name, email, pass) match {
case (None, _, _) => "name"
case (_, None, _) => "email"
case (_, _, None) => "pass"
case _ => "Nothing to see here"
}