Functional Programming: Perimeter of a polygon - Scala

I am trying to find the perimeter of a polygon in a functional way. I tried my best but I couldn't make it purely functional. This is my code:
object Solution {
  def main(args: Array[String]) {
    var x: Double = 0
    val N = scala.io.StdIn.readInt
    val points = scala.io.Source.stdin.getLines().take(N).toList
    for (i <- 0 to N - 1) {
      if (i == N - 1) x += dist(List(points(i), points(0)))
      else x += dist(List(points(i), points(i + 1)))
    }
    println(x)
  }

  def dist(A: List[String]): Double = {
    scala.math.sqrt(
      scala.math.pow(A(0).split(" ")(0).toDouble - A(1).split(" ")(0).toDouble, 2) +
      scala.math.pow(A(0).split(" ")(1).toDouble - A(1).split(" ")(1).toDouble, 2)
    )
  }
}
I enter the number of points of the polygon first and then enter Cartesian coordinates of each point in a new line.
Can anyone help me make it purely functional?

Start by separating concerns:
// dist should just take 2 points
def dist(a: (Double, Double), b: (Double, Double)): Double = ...

// calculate the perimeter
def perimeter(points: List[(Double, Double)]): Double = {
  // build the list of edges by pairing each point with the next, wrapping around
  val lines = points zip (points.tail ++ List(points.head))
  // aggregate the length of each edge using foldLeft (/:)
  (0d /: lines)((acc, line) => acc + dist(line._1, line._2))
}

def main(args: Array[String]) {
  // main just needs to parse the lines
  val points = ... // parse the points
  println(perimeter(points))
}

Consider n = 5 points,
val n = 5
val points = (1 to n).map(_ => Math.random * 10).toArray
and a distance function, for example
def dist(a: Double, b: Double) = math.abs(a - b)
Then iterate continually (in circles) over the points, taking the first n (sliding) pairs, to which we apply dist:
Iterator.continually(points)
  .flatten
  .sliding(2)
  .take(n)
  .map { case Seq(a, b) => dist(a, b) }
  .sum
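As a quick sanity check with the dist above and illustrative 1-D values:
val pts = Array(0.0, 3.0, 1.0)

Iterator.continually(pts)
  .flatten
  .sliding(2)
  .take(pts.length)
  .map { case Seq(a, b) => dist(a, b) }
  .sum
// |0-3| + |3-1| + |1-0| = 3.0 + 2.0 + 1.0 = 6.0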

Related

Type mismatch in Scala when using reduce

Can anybody help me understand what's wrong with the code below?
case class Point(x: Double, y: Double)

def centroid(points: IndexedSeq[Point]): Point = {
  val x = points.reduce(_.x + _.x)
  val y = points.reduce(_.y + _.y)
  val len = points.length
  Point(x/len, y/len)
}
I get the error when I run it:
Error:(10, 30) type mismatch;
found : Double
required: A$A145.this.Point
val x = points.reduce(_.x + _.x)
^
reduce, in this case, takes a function of type (Point, Point) => Point and returns a Point, so a function like _.x + _.x, which produces a Double, doesn't type-check.
One way to calculate the centroid:
case class Point(x: Double, y: Double)

def centroid(points: IndexedSeq[Point]): Point = {
  val x = points.map(_.x).sum
  val y = points.map(_.y).sum
  val len = points.length
  Point(x / len, y / len)
}
If you want to use reduce, you need to reduce both x and y in a single pass, like this:
def centroid(points: IndexedSeq[Point]): Point = {
  val p = points.reduce((s, p) => Point(s.x + p.x, s.y + p.y))
  val len = points.length
  Point(p.x / len, p.y / len)
}
If you want to compute x and y independently, then use foldLeft rather than reduce, like this:
def centroid(points: IndexedSeq[Point]): Point = {
  val x = points.foldLeft(0.0)(_ + _.x)
  val y = points.foldLeft(0.0)(_ + _.y)
  val len = points.length
  Point(x / len, y / len)
}
This is perhaps clearer but does process the points twice so it may be marginally less efficient.
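As a quick check, all three versions agree on a simple triangle (illustrative values):
val tri = IndexedSeq(Point(0, 0), Point(3, 0), Point(0, 3))
centroid(tri) // Point(1.0, 1.0)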

Compare the similarity between two texts in Scala

I want to compare two texts in Scala and calculate a similarity rate. I began coding this but I'm stuck:
import org.apache.spark._
import org.apache.spark.SparkContext._

object WordCount {
  def main(args: Array[String]): Unit = {
    val white = "/whiteCat.txt" // "The white cat is eating a white soup"
    val black = "/blackCat.txt" // "The black cat is eating a white sandwich"
    val conf = new SparkConf().setAppName("wordCount")
    val sc = new SparkContext(conf)
    val b = sc.textFile(white)
    val words = b.flatMap(line => line.split("\\W+"))
    val counts = words.map(word => (word, 1)).reduceByKey { case (x, y) => x + y }
    counts.take(10).foreach(println)
    //counts.saveAsTextFile(outputFile)
  }
}
I managed to split the words of each text and count the occurrences of each word. For example, for file1 this gives:
(The,1)
(white,2)
(cat,1)
(is,1)
(eating,1)
(a,1)
(soup,1)
To calculate the similarity rate, I have to implement this algorithm, but I'm not experienced with Scala:
i = 0
foreach word in the first text
  j = 0
  IF keyFile1[i] == keyFile2[j]
    THEN MIN(valueFile1[i], valueFile2[j]) / MAX(valueFile1[i], valueFile2[j])
    ELSE j++
  i++
Can you help me, please?
You can use leftOuterJoin to join the two key/value-pair RDDs, producing an RDD of type RDD[(String, (Int, Option[Int]))], gather both counts from the tuples, flatten them to Int, and apply your min/max formula, as in the following example:
val wordCountsWhite = sc.textFile("/path/to/whitecat.txt").
  flatMap(_.split("\\W+")).
  map((_, 1)).
  reduceByKey(_ + _)
wordCountsWhite.collect
// res1: Array[(String, Int)] = Array(
//   (is,1), (eating,1), (cat,1), (white,2), (The,1), (soup,1), (a,1)
// )

val wordCountsBlack = sc.textFile("/path/to/blackcat.txt").
  flatMap(_.split("\\W+")).
  map((_, 1)).
  reduceByKey(_ + _)
wordCountsBlack.collect
// res2: Array[(String, Int)] = Array(
//   (is,1), (eating,1), (cat,1), (white,1), (The,1), (a,1), (sandwich,1), (black,1)
// )

val similarityRDD = wordCountsWhite.leftOuterJoin(wordCountsBlack).map {
  case (k: String, (c1: Int, c2: Option[Int])) => {
    val counts = Seq(Some(c1), c2).flatten
    (k, counts.min.toDouble / counts.max)
  }
}
similarityRDD.collect
// res4: Array[(String, Double)] = Array(
//   (is,1.0), (eating,1.0), (cat,1.0), (white,0.5), (The,1.0), (soup,1.0), (a,1.0)
// )
That seems straightforward using a for comprehension:
for (a <- counts1; b <- counts2 if a._1 == b._1) yield Math.min(a._2, b._2) / Math.max(a._2, b._2)
Edit:
Sorry, the above code does not work.
Here's a modified version using a for comprehension over the cartesian product. counts1 and counts2 are the two word counts from the question.
val result = for ((t1, t2) <- counts1.cartesian(counts2) if t1._1 == t2._1)
  yield Math.min(t1._2, t2._2).toDouble / Math.max(t1._2, t2._2).toDouble
The result:
result.foreach(println)
1.0
0.5
1.0
1.0
1.0
There are numerous algorithms to find similarity between strings. One of these methods is edit distance. There are different definitions of edit distance, with different sets of operations depending on the methodology, but the main idea is finding the minimum series of operations (insertion, deletion, substitution) to convert string a into string b.
Levenshtein distance and Longest Common Subsequence are widely known algorithms for finding similarity between strings. But these methods are insensitive to context, so you may want to take a look at this article, in which n-gram similarity and distance are presented. You can also find Scala implementations of these algorithms on GitHub or Rosetta Code.
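For illustration, here is a minimal (unoptimized) Levenshtein distance in Scala; the name and dynamic-programming layout are just one common way to write it:
def levenshtein(a: String, b: String): Int = {
  // dp(i)(j) = edit distance between the first i chars of a and the first j chars of b
  val dp = Array.tabulate(a.length + 1, b.length + 1) { (i, j) =>
    if (i == 0) j else if (j == 0) i else 0
  }
  for (i <- 1 to a.length; j <- 1 to b.length) {
    val cost = if (a(i - 1) == b(j - 1)) 0 else 1 // substitution cost
    dp(i)(j) = math.min(math.min(dp(i - 1)(j) + 1, dp(i)(j - 1) + 1), dp(i - 1)(j - 1) + cost)
  }
  dp(a.length)(b.length)
}

levenshtein("kitten", "sitting") // res: Int = 3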
I hope it helps!

Area under the curve programmatically in Scala

I'm trying to solve for the area under the curve in Example 1 of http://tutorial.math.lamar.edu/Classes/CalcI/AreaProblem.aspx:
f(x) = x^3 - 5x^2 + 6x + 5 and the x-axis, with n = 5.
The answer is given as 25.12, but I'm getting slightly less: 23.78880035448074.
What am I doing wrong?
Here's my code:
import scala.math.BigDecimal.RoundingMode

def summation(low: Int, up: Int, coe: List[Int], ex: List[Int]) = {
  def eva(coe: List[Int], ex: List[Int], x: Double) =
    (for (i <- 0 until coe.size) yield coe(i) * math.pow(x, ex(i))).sum

  @annotation.tailrec
  def build_points(del: Float, p: Int, xs: List[BigDecimal]): List[BigDecimal] = {
    if (p <= 0) xs map { x => x.setScale(3, RoundingMode.HALF_EVEN) }
    else build_points(del, p - 1, ((del * p): BigDecimal) :: xs)
  }

  val sub = 5
  val diff = (up - low).toFloat
  val deltaX = diff / sub
  val points = build_points(deltaX, sub, List(0.0f)); println(points)
  val middle_points =
    for (i <- 0 until points.size - 1) yield (points(i) + points(i + 1)) / 2
  (for (elem <- middle_points) yield deltaX * eva(coe, ex, elem.toDouble)).sum
}

val coe = List(1, -5, 6, 5)
val exp = List(3, 2, 1, 0)
print(summation(0, 4, coe, exp))
I'm guessing the problem is that build_points(deltaX, 5, List(0.0f)) returns a list with six elements instead of five. You are passing a list that already contains one element, where I'm guessing you wanted an empty list, like
build_points(deltaX, sub, Nil)
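For comparison, a minimal direct midpoint-rule sketch (the names f and midpointArea are illustrative, not from the question) reproduces the expected answer:
def f(x: Double): Double = math.pow(x, 3) - 5 * math.pow(x, 2) + 6 * x + 5

def midpointArea(low: Double, up: Double, n: Int)(f: Double => Double): Double = {
  val deltaX = (up - low) / n
  // evaluate f at the midpoint of each of the n subintervals and sum the rectangle areas
  (0 until n).map(i => deltaX * f(low + (i + 0.5) * deltaX)).sum
}

midpointArea(0, 4, 5)(f) // ≈ 25.12, the expected answer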

Refactoring a small Scala function

I have this function to compute the distance between two n-dimensional points using Pythagoras' theorem.
def computeDistance(neighbour: Point) = math.sqrt(coordinates.zip(neighbour.coordinates).map {
case (c1: Int, c2: Int) => math.pow(c1 - c2, 2)
}.sum)
The Point class (simplified) looks like:
class Point(val coordinates: List[Int])
I'm struggling to refactor the method so it's a little easier to read, can anybody help please?
Here's another way that makes the following three assumptions:
The length of the list is the number of dimensions for the point
Each List is correctly ordered, i.e. List(x, y) or List(x, y, z). We do not know how to handle List(x, z, y)
All lists are of equal length
def computeDistance(other: Point): Double = sqrt(
  coordinates.zip(other.coordinates)
    .flatMap(i => List(pow(i._2 - i._1, 2)))
    .fold(0.0)(_ + _)
)
The obvious disadvantage here is that we don't have any safety around list length. The quick fix for this is to simply have the function return an Option[Double] like so:
def computeDistance(other: Point): Option[Double] = {
  if (other.coordinates.length != coordinates.length) {
    return None
  }
  Some(sqrt(coordinates.zip(other.coordinates)
    .flatMap(i => List(pow(i._2 - i._1, 2)))
    .fold(0.0)(_ + _)
  ))
}
I'd be curious whether there is a type-safe way to ensure equal list length.
EDIT
It was politely pointed out to me that flatMap(x => List(foo(x))) is equivalent to map(foo), which I forgot to refactor when I was originally playing with this. A slightly cleaner version with map instead of flatMap:
def computeDistance(other: Point): Double = sqrt(
  coordinates.zip(other.coordinates)
    .map(i => pow(i._2 - i._1, 2))
    .fold(0.0)(_ + _)
)
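For a quick sanity check of the refactored version (a minimal sketch wrapping it in the simplified Point class from the question):
import math.{pow, sqrt}

class Point(val coordinates: List[Int]) {
  def computeDistance(other: Point): Double = sqrt(
    coordinates.zip(other.coordinates)
      .map(i => pow(i._2 - i._1, 2))
      .fold(0.0)(_ + _)
  )
}

new Point(List(0, 0)).computeDistance(new Point(List(3, 4))) // res: Double = 5.0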
Most of your problem is that you're trying to do math with really long variable names. It's almost always painful. There's a reason why mathematicians use single letters, and assign intermediate results to temporary variables.
Try this:
import math._

class Point(val coordinates: List[Int]) {
  def c = coordinates
  def d(p: Point) = {
    val delta = for ((a, b) <- c zip p.c) yield pow(a - b, 2)
    sqrt(delta.sum)
  }
}
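For instance, with d defined as a method on Point as above:
new Point(List(0, 0)).d(new Point(List(3, 4))) // res: Double = 5.0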
Consider type aliases and case classes, like this,
type Coord = List[Int]

case class Point(c: Coord) {
  def distTo(p: Point) = {
    val z = (c zip p.c).par
    val pw = z.aggregate(0.0)((a, v) => a + math.pow(v._1 - v._2, 2), _ + _)
    math.sqrt(pw)
  }
}
so that for any two points, for instance,
val p = Point( (1 to 5).toList )
val q = Point( (2 to 6).toList )
we have that
p distTo q
res: Double = 2.23606797749979
Note that method distTo uses aggregate on a parallelised collection of tuples and combines the partial results with the last argument (summation). For high-dimensional points this may prove more efficient than the sequential counterpart.
For simplicity of use, consider also implicit classes, as suggested in a comment above,
implicit class RichPoint(val c: Coord) extends AnyVal {
  def distTo(d: Coord) = Point(c) distTo Point(d)
}
Hence
List(1,2,3,4,5) distTo List(2,3,4,5,6)
res: Double = 2.23606797749979

Scala: groupBy (identity) of List Elements

I'm developing an application that builds pairs of words from (tokenised) text and produces the number of times each pair occurs (it's OK that same-word pairs occur multiple times, as that is evened out later in the algorithm).
When I use
elements groupBy()
I want to group by the elements' content itself, so I wrote the following:
def self(x: (String, String)) = x

/**
 * Maps a collection of words to a map where the key is a pair of words and
 * the value is the number of times this pair occurs in the passed array.
 */
def producePairs(words: Array[String]): Map[(String, String), Double] = {
  var table = List[(String, String)]()
  words.foreach(w1 =>
    words.foreach(w2 =>
      table = table ::: List((w1, w2))))
  val grouppedPairs = table.groupBy(self)
  val size = int2double(grouppedPairs.size)
  return grouppedPairs.mapValues(_.length / size)
}
Now, I fully realise that this self() trick is a dirty hack. So I thought a little and came up with:
grouppedPairs = table groupBy (x => x)
This way it produces what I want. However, I still feel that I'm clearly missing something and there should be an easier way of doing it. Any ideas at all?
Also, if you could help me improve the pair-extraction part, that would also help a lot – it looks very imperative and C++-ish right now. Many thanks in advance!
I'd suggest this:
def producePairs(words: Array[String]): Map[(String, String), Double] = {
  val table = for (w1 <- words; w2 <- words) yield (w1, w2)
  val grouppedPairs = table.groupBy(identity)
  val size = grouppedPairs.size.toDouble
  grouppedPairs.mapValues(_.length / size)
}
The for comprehension is much easier to read, and there is already a predefined function identity, which is a generalized version of your self.
You are creating a list of pairs of all words against all words by iterating over words twice, whereas I guess you just want the neighbouring pairs. The easiest fix is to use a sliding view instead.
def producePairs(words: Array[String]): Map[(String, String), Int] = {
  val pairs = words.sliding(2, 1).map(arr => arr(0) -> arr(1)).toList
  val grouped = pairs.groupBy(t => t)
  grouped.mapValues(_.size)
}
Another approach would be to fold over the list of pairs, summing them up. I'm not sure, though, that this is more efficient:
def producePairs(words: Array[String]): Map[(String, String), Int] = {
  val pairs = words.sliding(2, 1).map(arr => arr(0) -> arr(1))
  pairs.foldLeft(Map.empty[(String, String), Int]) { (m, p) =>
    m + (p -> (m.getOrElse(p, 0) + 1))
  }
}
I see you return a relative number (a Double). For simplicity I have just counted the occurrences, so you need to do the final division. I think you want to divide by the total number of pairs (words.size - 1) and not by the number of unique pairs (grouped.size), so that the relative frequencies sum up to 1.0.
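For example, running the sliding-based version on a small token array (illustrative input) and applying that final division:
val words = Array("a", "b", "a", "b", "a")
producePairs(words) // Map((a,b) -> 2, (b,a) -> 2)
// dividing each count by words.size - 1 = 4 gives 0.5 and 0.5, which sum to 1.0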
An alternative approach which is not of order O(num_words * num_words) but of order O(num_unique_words * num_unique_words) (or something like that):
def producePairs[T <% Traversable[String]](words: T): Map[(String, String), Double] = {
  val counts = words.groupBy(identity).map { case (w, ws) => w -> ws.size }
  val size = (counts.size * counts.size).toDouble
  for (w1 <- counts; w2 <- counts) yield {
    (w1._1, w2._1) -> ((w1._2 * w2._2) / size)
  }
}
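For example (illustrative input; note this version pairs all unique words with each other rather than only neighbouring ones):
producePairs(List("a", "b", "a"))
// Map((a,a) -> 1.0, (a,b) -> 0.5, (b,a) -> 0.5, (b,b) -> 0.25)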