Scala - functional methods - expected NotInferedA - scala

I have two Strings which represent the contents of two HTML pages. I want to remove whitespace and comments, compute the Levenshtein distance between them, and on that basis decide whether they are similar or not.
I created functions:
val removeWhiteSpacesAndHtmlComments: String => String = _.replaceAll("\\s+"," ").replaceAll("<!--.*?-->","") // replacement is a plain space: "\\s" in a replacement string would insert a literal "s"
val prepareContents: (String,String) => (String,String) = (s1,s2) => (removeWhiteSpacesAndHtmlComments.apply(s1), removeWhiteSpacesAndHtmlComments(s2))
val computeLevenshteinDistance:(String,String) => Int = StringUtils.getLevenshteinDistance(_,_)
val areContentsSimilarEnough: Int => Boolean = _ <= 50
I want to combine all those functions into a flow:
val isHtmlContentChanged: (String,String) => Boolean = prepareContents.tupled andThen computeLevenshteinDistance andThen areContentsSimilarEnough
Unfortunately, at the computeLevenshteinDistance part I get a compile error:
Type mismatch, expected: (String,String) => NotInferedA, actual: (String,String)=>Int
How can I solve this?

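Add .tupled to computeLevenshteinDistance. prepareContents.tupled returns a (String, String) tuple, so the function composed after it with andThen must take a single tuple argument rather than two separate Strings. The composed result also takes one tuple, so wrap it in Function.untupled if you need the declared (String,String) => Boolean type.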
Try it out!
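For reference, here is a minimal sketch of the whole pipeline with the fix applied. It assumes a small hand-written Levenshtein function as a stand-in for StringUtils.getLevenshteinDistance so that it runs without Apache Commons; the object name HtmlDiffDemo is only for illustration.

```scala
object HtmlDiffDemo {
  // Collapse runs of whitespace to one space, then drop HTML comments.
  val removeWhiteSpacesAndHtmlComments: String => String =
    _.replaceAll("\\s+", " ").replaceAll("<!--.*?-->", "")

  val prepareContents: (String, String) => (String, String) =
    (s1, s2) => (removeWhiteSpacesAndHtmlComments(s1), removeWhiteSpacesAndHtmlComments(s2))

  // Assumption: stand-in for StringUtils.getLevenshteinDistance (row-by-row dynamic programming).
  val computeLevenshteinDistance: (String, String) => Int = (a, b) =>
    a.foldLeft((0 to b.length).toVector) { (prev, ca) =>
      b.zipWithIndex.foldLeft(Vector(prev.head + 1)) { case (row, (cb, j)) =>
        val substitution = prev(j) + (if (ca == cb) 0 else 1)
        row :+ math.min(math.min(row.last + 1, prev(j + 1) + 1), substitution)
      }
    }.last

  val areContentsSimilarEnough: Int => Boolean = _ <= 50

  // Name kept from the question; note it returns true when the contents are similar.
  // .tupled makes each two-argument function accept one tuple, so andThen can chain them;
  // Function.untupled turns the composed function back into a two-argument one.
  val isHtmlContentChanged: (String, String) => Boolean =
    Function.untupled(
      prepareContents.tupled andThen
        computeLevenshteinDistance.tupled andThen
        areContentsSimilarEnough
    )

  def main(args: Array[String]): Unit =
    println(isHtmlContentChanged("<p>a</p> <!-- note -->", "<p>a</p>"))
}
```

Without Function.untupled the composed value has type ((String, String)) => Boolean, so you would have to call it with a tuple.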

Related

scala iterating through the set type with condition

I have a set type and a union function as follows:
type Set = Int => Boolean
def union(s: Set, t: Set): Set = (e: Int) => s(e) || t(e)
val xs = Set(12001,12002, 12003, 12004)
val ys = Set(13001,13002, 13003, 13004)
When I use the union operation,
union(xs,ys)
it should return another set which contains all the elements of both sets xs and ys.
Edited Section:
I am sorry I was not clear in my question. I have my own implementation of the iterator for both sets xs and ys:
var i = xs.iterator;
while(i.hasNext)
println(i.next())
But I was not satisfied with this implementation and found (after some googling) that you can implement the condition with a function, but I was unable to get it to work in my Eclipse worksheet.
val rs = union(xs,ys) //> rs : Learn2.Set = <function1>
I am guessing it returns a function.
So my questions are:
1. Is it possible to implement this as described above in the edited section? If so, what am I missing to get it working?
2. I don't understand how the element e in (e: Int) => s(e) || t(e) iterates over the elements in both sets.
Look at your Set type: Int => Boolean. So it takes an Int and returns a Boolean. What that means is that it is not a collection that you can iterate over to retrieve all its values, because it actually contains no values.
If you want to know what Int values return true then you have to iterate over the entire range of possible inputs (or some subset thereof) and filter for the condition you're looking for.
scala> val res = union(xs,ys)
res: Set = $$Lambda$1091/332405156@2c30c81d
scala> (0 to 20000).filter(res).foreach(println)
12001
12002
12003
12004
13001
13002
13003
13004
scala>
update
Your confusion stems from the fact that you've named your type alias after an existing collection in the standard library. xs.iterator works because xs is not an instance of your Set; it is a Set from the standard library with all the associated methods. Rename your type alias to something like Xet and you'll see what I mean.
type Xet = Int => Boolean
def union(s: Xet, t: Xet): Xet = (e: Int) => s(e) || t(e)
val xx: Xet = _ == 12001
val yx: Xet = _ == 13002
val zx: Xet = union(xx, yx)
xx.iterator // Error, won't compile
(1 to 20000).filter(zx).foreach(println) // output: 12001 & 13002
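To make the second question concrete: e is never iterated by union itself. It is whatever value the caller passes in when testing membership. A rough sketch under that reading follows; the names IntSet, fromValues and elements are made up for this example.

```scala
object FunSetDemo {
  // Hypothetical alias name, chosen so it does not shadow the standard library's Set.
  type IntSet = Int => Boolean

  // Build a membership test from concrete values.
  def fromValues(values: Int*): IntSet = {
    val members = values.toSet
    e => members.contains(e)
  }

  // union only builds a new test; e is supplied later by whoever calls the result.
  def union(s: IntSet, t: IntSet): IntSet = e => s(e) || t(e)

  // The only way to list members is to probe a range of candidate inputs.
  def elements(s: IntSet, candidates: Range): List[Int] = candidates.filter(s).toList

  val xs: IntSet = fromValues(12001, 12002, 12003, 12004)
  val ys: IntSet = fromValues(13001, 13002, 13003, 13004)
  val both: IntSet = union(xs, ys)

  def main(args: Array[String]): Unit =
    println(elements(both, 0 to 20000))
}
```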

Can I call a method inside of sortWith when sorting a sequence (or a map)?

I have a map Map[String,Option[Seq[String]]] and I have values for each of the strings in a different map: Map[String,Option[Int]]. I am trying to map over the values and use sortWith on each sequence, but in what I have read online I don't see any examples of custom methods inside sortWith.
How can I sort my sequence using sortWith? If I wanted to implement a custom method that returns a boolean to tell me what object is considered greater, is this possible?
val fieldMap = Map("user1" -> Seq("field1_name", "field2_name"), "user2" -> Seq("field3_name"))
val fieldValues = Map("field1_name" -> 2, "field2_name" -> 1, "field3_name" -> 3)
val sortedMap = fieldMap.mapValues(fieldList => fieldList.sortWith(fieldValues(_) < fieldValues(_)) // Scala doesn't like this
I tried:
fieldList.sortWith{(x,y) =>
val x = fieldValues(x)
val y = fieldValues(y)
x < y
}
This gives me a Type mismatch of expected type:
(String,String) => Boolean
and actual:
(String,String) => Any
EDIT Solution:
fieldList.sortWith{(x,y) =>
val vx = fieldValues(x) // separate names: reusing x and y here would shadow the parameters
val vy = fieldValues(y)
vx.getOrElse[Double](0.0) < vy.getOrElse[Double](0.0) // have to unwrap the Option.
}
You're using the wrong syntax. To use sortWith you have to pass a function of two arguments that returns a Boolean, something like:
fieldMap.mapValues(
  fieldList => fieldList.sortWith(
    (a,b) => fieldValues(a) > fieldValues(b)
  )
)
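To answer the "custom method" part directly: sortWith accepts any (A, A) => Boolean, so a named method works too. Below is a small sketch, assuming the values really are Option[Int] as described in the question; valueOf and comesBefore are hypothetical helper names, and a missing value is treated as 0.

```scala
object SortWithDemo {
  val fieldMap: Map[String, Seq[String]] = Map(
    "user1" -> Seq("field1_name", "field2_name"),
    "user2" -> Seq("field3_name")
  )

  // Assumption: values are optional, as in the Map[String, Option[Int]] from the question.
  val fieldValues: Map[String, Option[Int]] = Map(
    "field1_name" -> Some(2),
    "field2_name" -> Some(1),
    "field3_name" -> None
  )

  // Unwrap both the missing-key case and the None case; 0 is an arbitrary default.
  def valueOf(field: String): Int = fieldValues.get(field).flatten.getOrElse(0)

  // A custom comparison method; sortWith takes it like any other function.
  def comesBefore(a: String, b: String): Boolean = valueOf(a) < valueOf(b)

  val sortedMap: Map[String, Seq[String]] =
    fieldMap.map { case (user, fields) => user -> fields.sortWith(comesBefore) }

  // Same ordering expressed with sortBy, which only needs the key.
  val sortedByKey: Map[String, Seq[String]] =
    fieldMap.map { case (user, fields) => user -> fields.sortBy(f => valueOf(f)) }

  def main(args: Array[String]): Unit = println(sortedMap)
}
```

map is used instead of mapValues here only because mapValues behaves differently across Scala versions (it is lazy, and deprecated on Map in 2.13).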

Using contains in scala - exception

I am encountering this error:
java.lang.ClassCastException: scala.collection.immutable.$colon$colon cannot be cast to [Ljava.lang.Object;
whenever I try to use "contains" to find if a string is inside an array. Is there a more appropriate way of doing this? Or, am I doing something wrong? (I am fairly new to Scala)
Here is the code:
val matches = Set[JSONObject]()
val config = new SparkConf()
val sc = new SparkContext("local", "SparkExample", config)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val ebay = sqlContext.read.json("/Users/thomassquires/Downloads/products.json")
val catalogue = sqlContext.read.json("/Users/thomassquires/Documents/catalogue2.json")
val eins = ebay.map(item => (item.getAs[String]("ID"), Option(item.getAs[Set[Row]]("itemSpecifics"))))
.filter(item => item._2.isDefined)
.map(item => (item._1 , item._2.get.find(x => x.getAs[String]("k") == "EAN")))
.filter(x => x._2.isDefined)
.map(x => (x._1, x._2.get.getAs[String]("v")))
.collect()
def catEins = catalogue.map(r => (r.getAs[String]("_id"), Option(r.getAs[Array[String]]("item_model_number")))).filter(r => r._2.isDefined).map(r => (r._1, r._2.get)).collect()
def matched = for(ein <- eins) yield (ein._1, catEins.filter(z => z._2.contains(ein._2)))
The exception occurs on the last line. I have tried a few different variants.
My data structure is one List[Tuple2[String, String]] and one List[Tuple2[String, Array[String]]] . I need to find the zero or more matches from the second list that contain the string.
Thanks
Long story short (there is still a part that eludes me here*): you're using the wrong types. getAs is implemented as fieldIndex (String => Int) followed by get (Int => Any) followed by asInstanceOf.
Since Spark uses neither Arrays nor Sets but WrappedArray to store array column data, calls like getAs[Array[String]] or getAs[Set[Row]] are not valid. If you want specific types you should use either getAs[Seq[T]] or getSeq[T] and convert your data to the desired type with toSet / toArray.
* See Why wrapping a generic method call with Option defers ClassCastException?
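The effect can be reproduced without Spark. The sketch below is not Spark code: getAs is a hypothetical stand-in that only mimics the asInstanceOf step described above, and stored plays the role of a row cell that actually holds a List.

```scala
import scala.util.Try

object GetAsDemo {
  // Hypothetical stand-in for an untyped row cell: it holds a List, not an Array.
  val stored: Any = List("a", "b")

  // Mimics the last step of getAs: a cast to a type parameter, which is erased at runtime.
  def getAs[T](value: Any): T = value.asInstanceOf[T]

  // Assigning to a concrete Array[String] forces the real cast, so this fails right away.
  val direct: Try[Int] = Try {
    val arr: Array[String] = getAs[Array[String]](stored)
    arr.length
  }

  // Wrapped in Option the value is passed along as a plain object, so the failure
  // typically shows up only later, when the value is used as an array.
  val used: Try[Boolean] = Try(Option(getAs[Array[String]](stored)).get.contains("a"))

  // Asking for Seq (which the stored value really is) and converting afterwards works.
  val fixed: Array[String] = getAs[Seq[String]](stored).toArray

  def main(args: Array[String]): Unit = {
    println(direct)
    println(used)
    println(fixed.contains("a"))
  }
}
```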

How to find tuple with different value in a list using scala?

I have following list:
val list = List(("name1",20),("name2",20),("name1",30),("name2",30),
("name3",40),("name3",30),("name3",20))
I want following output:
List(("name3",40))
I tried following:
val distElements = list.map(_._2).distinct
list.groupBy(_._1).map{ case(k,v) =>
val h = v.map(_._2)
if(distElements.equals(h)) List.empty else distElements.diff(h)
}.flatten
But this is not what I am looking for.
Can anybody give me an answer or a hint on how to get the expected output?
I understand the question as looking for the element of the list whose _2 (number) occurs only once.
val list = List(("name1",20),("name2",20),("name1",30),("name2",30),
("name3",40),("name3",30),("name3",20))
First you group by the _2 element, which gives you a map whose keys are the numbers and whose values are lists of all elements with the same _2:
val g = list.groupBy(_._2) // Map[Int, List[(String, Int)]]
Now you can filter those entries that consist of only one element:
val opt = g.collectFirst { // Option[(String, Int)]
case (_, single :: Nil) => single
}
Or (if you are expecting possibly more than one distinct value)
val col = g.collect { // Map[String, Int]
case (_, single :: Nil) => single
}
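If you need the result as a List in the original order (groupBy returns a Map, which does not keep the list order), one possible variation is to count the occurrences first and then filter the original list. This is only a sketch under the same reading of the question; counts and unique are names made up here.

```scala
object UniqueValueDemo {
  val list = List(("name1", 20), ("name2", 20), ("name1", 30), ("name2", 30),
    ("name3", 40), ("name3", 30), ("name3", 20))

  // How often each number (_2) occurs in the list.
  val counts: Map[Int, Int] =
    list.groupBy(_._2).map { case (number, items) => number -> items.size }

  // Keep only the tuples whose number occurs exactly once, in original order.
  val unique: List[(String, Int)] =
    list.filter { case (_, number) => counts(number) == 1 }

  def main(args: Array[String]): Unit = println(unique)
}
```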
It seems to me that you're looking to match against both the left-hand and the right-hand value at the same time, while also preserving the type of collection you're looking at, a List. I would use collect:
val out = list.collect{
  case item @ ("name3", 40) => item
}
which combines a PartialFunction with filter and map like qualities. In this case, it filters out any value for which the PartialFunction is not defined while mapping the values which match. Here, I've only allowed for a singular match.

Chain Scala Filters Without the Overhead

I want to chain a bunch of filters but do not want the overhead associated with creating multiple lists.
type StringFilter = (String) => Boolean
def nameFilter(value: String): StringFilter =
(s: String) => s == value
def lengthFilter(length: Int): StringFilter =
(s: String) => s.length == length
val list = List("Apple", "Orange")
Problem is this builds a list after each filter:
list.filter(nameFilter("Apples")).filter(lengthFilter(5))
// list of string -> list of name filtered string -> list of name and length filtered string
I want:
// list of string -> list of name and length filtered string
I find out which filters are needed at run-time so I must add filters dynamically.
// Not sure how to implement add function.
val filterPipe: StringFilter = ???
// My preferred DSL (or very close to it)
filterPipe.add(nameFilter("Apples"))
filterPipe.add(lengthFilter(5))
// Must have DSL
list.filter(filterPipe)
How can I implement filterPipe?
Is there some way to recursively AND the filter conditions together in a filterPipe (which is itself a StringFilter)?
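You can use withFilter, which applies the predicates lazily and builds no intermediate list; the result is only materialized by the map, flatMap or foreach that follows:
list.withFilter(nameFilter("Apples")).withFilter(lengthFilter(5))...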
A blog post suggests another alternative: an implicit class that lets you combine multiple predicates with custom operators:
implicit class Predicate[A](val pred: A => Boolean) {
  def apply(x: A) = pred(x)
  def &&(that: A => Boolean) = new Predicate[A](x => pred(x) && that(x))
  def ||(that: A => Boolean) = new Predicate[A](x => pred(x) || that(x))
  def unary_! = new Predicate[A](x => !pred(x))
}
Then you can apply the predicate chain as follows
list.filter { (nameFilter("Apple") && lengthFilter(5)) (_) }
You can also chain the predicates dynamically
val list = List("Apple", "Orange", "Meat")
val isFruit = nameFilter("Apple") || nameFilter("Orange")
val isShort = lengthFilter(5)
list.filter { (isFruit && isShort) (_) }
As you can see, the benefit of this approach compared to the withFilter approach is that you can combine the predicates arbitrarily.
Consider also a view on the filters, like this:
list.view.filter(nameFilter("Apples")).filter(lengthFilter(5))
This prevents intermediate collections: each entry in list is passed through the subsequent filters in turn. The result is itself a view until you force it, for example with toList.
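Since the filters are only known at run time, another option is to collect them in a sequence and fold them into one StringFilter, which can then be passed to a single filter call. A minimal sketch follows; allOf and chosen are hypothetical names, and the add-style DSL from the question is replaced here by an ordinary Seq.

```scala
object FilterPipeDemo {
  type StringFilter = String => Boolean

  def nameFilter(value: String): StringFilter = s => s == value
  def lengthFilter(length: Int): StringFilter = s => s.length == length

  // AND together any number of filters; an empty sequence accepts everything.
  def allOf(filters: Seq[StringFilter]): StringFilter =
    s => filters.forall(f => f(s))

  val list = List("Apple", "Orange")

  // Filters chosen at run time would be appended to this sequence.
  val chosen: Seq[StringFilter] = Seq(nameFilter("Apple"), lengthFilter(5))
  val filterPipe: StringFilter = allOf(chosen)

  // One pass over the list, one resulting list.
  val result: List[String] = list.filter(filterPipe)

  // The view-based variant, forced at the end with toList.
  val viaView: List[String] =
    list.view.filter(nameFilter("Apple")).filter(lengthFilter(5)).toList

  def main(args: Array[String]): Unit = println(result)
}
```

forall stops at the first filter that rejects an element, so later filters are not evaluated for it.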