How to sum up based on a tuple's first element? - scala

I have a 3-tuple list like the following [I added line breaks for readability]:
(2, 127, 3)
(12156, 127, 3)
(4409, 127, 2) <-- 4409 occurs 2x
(1312, 127, 12) <-- 1312 occurs 3x
(4409, 128, 1) <--
(12864, 128, 1)
(1312, 128, 1) <--
(2664, 128, 2)
(12865, 129, 1)
(183, 129, 1)
(12866, 129, 2)
(1312, 129, 10) <--
I want to sum up based on the first entry. The first entry should be unique.
The result should look like this:
(2, 127, 3)
(12156, 127, 3)
(4409, 127, 3) <- new sum = 3
(1312, 127, 23) <- new sum = 23
(12864, 128, 1)
(2664, 128, 2)
(12865, 129, 1)
(183, 129, 1)
(12866, 129, 2)
How can I achieve this in Scala?

Try this:
list.groupBy(_._1).mapValues(v => (v.head._1, v.head._2, v.map(_._3).sum))
The middle entry is preserved and it always takes the first one that appeared in the input list.

If you can just ignore the middle entry, then:
val l = List(('a,'e,1), ('b,'f,2), ('a,'g,3), ('b,'h,4))
l.groupBy(_._1).mapValues(_.map(_._3).sum)
// Map('b -> 6, 'a -> 4)
If you have to keep the middle entry around:
l.groupBy(_._1).map {
  case (_, values) =>
    val (a, b, _) = values.head
    (a, b, values.map(_._3).sum)
}
// List(('b,'f,6), ('a,'e,4))
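Note that groupBy returns a Map, so the versions above do not guarantee the input order that the expected output in the question shows. Here is a minimal sketch that keeps the order of first occurrence, assuming the rows are (Int, Int, Int) tuples. The names SumByFirst and sumByFirst are made up for illustration:

```scala
object SumByFirst {
  // Sums the third component per first component. The middle value and the
  // position are taken from the first row in which each key appears.
  def sumByFirst(rows: List[(Int, Int, Int)]): List[(Int, Int, Int)] = {
    val grouped = rows.groupBy(_._1)
    rows.map(_._1).distinct.map { key =>
      val group = grouped(key)
      (key, group.head._2, group.map(_._3).sum)
    }
  }
}
```

Calling SumByFirst.sumByFirst on the list from the question should give the rows in the order the question asks for.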

You could use the concept of a monoid. If the first two values of each entry form the key and the remaining value is the associated value, you could store the entries in a Map.
Once you have a Map you may proceed like this:
Best way to merge two maps and sum the values of same key?
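As a rough sketch of that idea (not the code from the linked question): integers under addition form a monoid, so two maps can be merged by adding the values of shared keys. MapMerge and mergeSum are hypothetical names:

```scala
object MapMerge {
  // Merges b into a; values of keys present in both maps are added,
  // with 0 acting as the identity element for missing keys.
  def mergeSum[K](a: Map[K, Int], b: Map[K, Int]): Map[K, Int] =
    b.foldLeft(a) { case (acc, (k, v)) =>
      acc.updated(k, acc.getOrElse(k, 0) + v)
    }
}
```

For the tuples in the question, each entry (a, b, c) could first be turned into a one-element map with key (a, b) or just a, and the maps then folded together with mergeSum.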

Related

How can I split a list into multiple other lists?

I only recently started working with Scala and I came face to face with a problem I can't seem to find a solution to. I'm given an input text file named "in.txt", which contains lines of coordinates that I have to work with, as shown below.
2 1
6 6
4 2
2 5
2 6
2 7
3 4
6 1
6 2
2 3
6 3
6 4
6 5
6 7
I decided to use a List to store all the values so I could use built in functions to do calculations with the values afterwards.
val lines = io.Source.fromFile("in.txt").getLines
val coordinates =
lines
.drop(0)
.toList
.sortWith(_<_)
.mkString
.replaceAll("\\s", "")
.grouped(2)
.toList
Everything works as it should, as the output of println(coordinates) is
List(21, 23, 25, 26, 27, 34, 42, 61, 62, 63, 64, 65, 66, 67)
But what I want to do next is to create multiple lists out of this one. For example, a new list should be created if, for example, a value starts with "2", and all the values that start with "2" would be placed in the new list like this:
List(21, 23, 25, 26, 27)
Then the same would be done with "3", then "4" and so on.
Using functions such as .partition and .groupBy works, but taking into account the fact that the values in the coordinates can also reach 4 digit numbers, and that they can change if the input file is edited, it would be a pain to write all those conditions manually. So basically my question is this: Is it possible to achieve this by making use of Scala's functionality, some sort of form of iterations?
Thanks in advance!
I am assuming your file can take a mixture of 2, 3, 4, ... digit strings.
scala> val l = List("12", "13", "123", "1234")
l: List[String] = List(12, 13, 123, 1234)
scala> val grouped = l.groupBy(s => s.take(s.length - 1)).values
grouped: Iterable[List[String]] = MapLike(List(123), List(12, 13), List(1234))
If you want this sorted:
val grouped = l.groupBy(s => s.take(s.length - 1)).toSeq.sortBy(_._1).map{ case (_, l) => l.sorted}
grouped: Seq[List[String]] = ArrayBuffer(List(12, 13), List(123), List(1234))
You can generate all your input conditions with a range:
val conditions = 1 to 9999
And then foldLeft them filtering your original list by each of its elements:
conditions.foldLeft(List():List[List[Int]])((acc, elem) => l.filter(_.toString.startsWith(elem.toString))::acc).filterNot(_.isEmpty)
Output
res28: List[List[Int]] = List(List(67), List(66), List(65), List(64), List(63), List(62), List(61), List(42), List(34), List(27), List(26), List(25), List(23), List(21), List(61, 62, 63, 64, 65, 66, 67), List(42), List(34), List(21, 23, 25, 26, 27))
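If the goal is simply one list per leading character, groupBy can compute the key from the value itself, so no conditions have to be written by hand. A small sketch, assuming the values are strings as in the question's coordinates list (LeadingDigit and groupByFirstChar are illustrative names):

```scala
object LeadingDigit {
  // Groups strings by their first character and returns the groups
  // ordered by that character. Empty strings are skipped.
  def groupByFirstChar(values: List[String]): List[List[String]] =
    values
      .filter(_.nonEmpty)
      .groupBy(_.head)
      .toList
      .sortBy(_._1)
      .map(_._2)
}
```

This treats "2", "21" and "2345" as belonging to the same group. If longer prefixes should form the key, the function passed to groupBy is the only part that needs to change.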

Set operation to divide lists by one another in Scala

Right now I have 2 lists in Scala:
val one = List(50, 10, 17, 8, 16)
val two = List(582, 180, 174, 159, 158)
These lists are going to be of the same length, and right now I'm looking to divide each element of the first list by a corresponding element in the second. In other words, I want a list that consists of:
List(50/582, 10/180, etc...)
Is there a set operation that accomplishes this that can be done without looping?
Thank you!
You can use the zip function.
val one = List(50, 10, 17, 8, 16)
val two = List(582, 180, 174, 159, 158)
one.zip(two).map {
  case (a, b) => a.toDouble / b.toDouble
}
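One thing to keep in mind: zip silently truncates to the shorter list. Here is a hedged sketch that checks the lengths first. ElementwiseDivide and divide are names I made up:

```scala
object ElementwiseDivide {
  // Divides each element of xs by the element of ys at the same position.
  // Fails fast if the lists differ in length instead of truncating.
  def divide(xs: List[Int], ys: List[Int]): List[Double] = {
    require(xs.length == ys.length, "lists must have the same length")
    xs.zip(ys).map { case (a, b) => a.toDouble / b }
  }
}
```

Because the division is done on Double, a zero divisor yields Infinity or NaN rather than an exception.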

ScalaCheck: choose an integer with custom probability distribution

I want to create a generator in ScalaCheck that generates numbers between say 1 and 100, but with a bell-like bias towards numbers closer to 1.
Gen.choose() distributes numbers randomly between the min and max value:
scala> (1 to 10).flatMap(_ => Gen.choose(1,100).sample).toList.sorted
res14: List[Int] = List(7, 21, 30, 46, 52, 64, 66, 68, 86, 86)
And Gen.chooseNum() has an added bias for the upper and lower bounds:
scala> (1 to 10).flatMap(_ => Gen.chooseNum(1,100).sample).toList.sorted
res15: List[Int] = List(1, 1, 1, 61, 85, 86, 91, 92, 100, 100)
I'd like a choose() function that would give me a result that looks something like this:
scala> (1 to 10).flatMap(_ => choose(1,100).sample).toList.sorted
res15: List[Int] = List(1, 1, 1, 2, 5, 11, 18, 35, 49, 100)
I see that choose() and chooseNum() take an implicit Choose trait as an argument. Should I use that?
You could use Gen.frequency() (1):
val frequencies = List(
  (50000, Gen.choose(0, 9)),
  (38209, Gen.choose(10, 19)),
  (27425, Gen.choose(20, 29)),
  (18406, Gen.choose(30, 39)),
  (11507, Gen.choose(40, 49)),
  (6681, Gen.choose(50, 59)),
  (3593, Gen.choose(60, 69)),
  (1786, Gen.choose(70, 79)),
  (820, Gen.choose(80, 89)),
  (347, Gen.choose(90, 100))
)
(1 to 10).flatMap(_ => Gen.frequency(frequencies:_*).sample).toList
res209: List[Int] = List(27, 21, 31, 1, 21, 18, 9, 29, 69, 29)
I got the frequencies from https://en.wikipedia.org/wiki/Standard_normal_table#Complementary_cumulative. The code uses only every third row of the table (z = 0.0, 0.3, ..., 2.7), but I think you can get the idea.
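To make the weighting idea concrete without depending on ScalaCheck, here is a plain-Scala sketch of choosing a bucket in proportion to integer weights. It illustrates the principle only and is not ScalaCheck's implementation. WeightedPick, bucketFor and pick are hypothetical names:

```scala
object WeightedPick {
  // Returns the index of the bucket that `roll` falls into, where bucket i
  // covers weights(i) consecutive integers of the range [0, weights.sum).
  def bucketFor(weights: List[Int], roll: Int): Int = {
    require(weights.nonEmpty && weights.forall(_ > 0), "weights must be positive")
    require(roll >= 0 && roll < weights.sum, "roll out of range")
    val upperBounds = weights.scanLeft(0)(_ + _).tail // cumulative totals
    upperBounds.indexWhere(roll < _)
  }

  // Draws a bucket index at random, proportionally to the weights.
  def pick(weights: List[Int], rng: scala.util.Random): Int =
    bucketFor(weights, rng.nextInt(weights.sum))
}
```

With the weights from the table above, bucket 0 (values 0 to 9) would be drawn far more often than bucket 9 (values 90 to 100), which is what produces the bias towards small numbers.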
I can't take much credit for this, and will point you to this excellent page:
http://www.javamex.com/tutorials/random_numbers/gaussian_distribution_2.shtml
A lot of this depends on what you mean by "bell-like". Your example doesn't show any negative numbers, but the number "1" can't be in the middle of the bell without producing any negative numbers unless it is a very, very tiny bell!
Forgive the mutable loop but I use them sometimes when I have to reject values in a collection build:
object Test_Stack extends App {
  val r = new java.util.Random()
  val maxBellAttempt = 102
  // Values within 3 * stdv of the mean occur about 99.7% of the time
  val stdv = maxBellAttempt / 3
  val collectSize = 100000
  val l = scala.collection.mutable.Buffer[Int]()
  // See "What are the minimum and maximum values with nextGaussian()?" in the article above
  while (l.size < collectSize) {
    // The +1 is the mean (average) offset; it can be whatever you like.
    // The abs folds the curve in half; you could remove it, but then you'd need to shift the mean further.
    val temp = (r.nextGaussian() * stdv + 1).abs.round.toInt
    // Reject values beyond the upper bound
    if (temp <= maxBellAttempt) l += temp
  }
  val res = l.toList // immutable copy of the collected values
  //println(res.mkString("\n"))
}
Here's the distribution. I just pasted the output into Excel and did a "countif" to show the frequency of each value (the chart itself is not reproduced here).
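The rejection loop can also be written without mutable state. This is a sketch of a variation, not the exact code above: it folds a zero-mean Gaussian in half and then shifts it by 1, so the smallest possible value is 1. HalfBell and sample are made-up names, and the choice of max / 3.0 as the standard deviation is an assumption carried over from the answer:

```scala
object HalfBell {
  // Draws n integers in [1, max], biased towards 1, by folding a Gaussian
  // with abs and rejecting anything that lands above max.
  def sample(n: Int, max: Int, rng: java.util.Random): Vector[Int] = {
    require(n >= 0 && max >= 1)
    val stdDev = max / 3.0
    Iterator
      .continually(math.round(math.abs(rng.nextGaussian() * stdDev)).toInt + 1)
      .filter(_ <= max)
      .take(n)
      .toVector
  }
}
```

Passing a seeded java.util.Random makes the output reproducible, which is handy when the generated values feed a test.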

Scala group by list of list and subtracts grouped values

Below is the sample data:
val combineList = List(("A",12),("B",11),("C",12),("D",14),("E",23),("F",12),("D",53),("C",23),("B",12),("A",22),("E",21),("F",12),("C",21),("B",34),("A",34),("G",67),("D",23),("E",21),("F",12),("D",31),("B",41),("E",14),("F",15),("G",18),("A",11),("C",10),("D",9),("A",13),("E",1),("F",14))
and
val X = 98
Now I want the final output as below.
First, group all values by key:
val groupKey = List(Map("A"->List(12,22,34,11,13)),Map("B"->List(11,12,34,41)),Map("C"->List(12,23,21,10)),Map("D"->List(14,53,23,31,9)),
Map("E"->List(23,21,21,14,1)),Map("F"->List(12,12,12,15,14)),Map("G"->List(67,18)))
Second, subtract each groupKey list value from X (X is always greater than the list values), so the second output will be:
val substrackValues = List(Map("A"->List(86,76,64,87,85)),Map("B"->List(87,86,64,57)),Map("C"->List(86,75,77,88)),Map("D"->List(84,45,75,67,89)),
Map("E"->List(75,77,77,84,97)),Map("F"->List(86,86,86,83,84)),Map("G"->List(31,80)))
Consider
combineList.groupBy(_._1).mapValues(xs => xs.map(v => X-v._2))
which delivers
Map(E -> List(75, 77, 77, 84, 97), F -> List(86, 86, 86, 83, 84), A -> List(86, 76, 64, 87, 85), G -> List(31, 80), B -> List(87, 86, 64, 57), C -> List(86, 75, 77, 88), D -> List(84, 45, 75, 67, 89))
Note that the embedded maps in groupKey above are singleton maps, which could just as well be represented as tuples of type (String, List[Int]) or, even better, merged into one map.
In the solution proposed here, after grouping by the first tuple element, we replace each value v in each list with X - v.
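One caveat: mapValues returns a lazy view in Scala 2.12 and is deprecated in 2.13. Here is a sketch of the same transformation with a strict map instead, wrapped in a function. SubtractGrouped and subtractFrom are illustrative names:

```scala
object SubtractGrouped {
  // Groups the pairs by key and replaces every value v with x - v,
  // keeping the values in their original order within each group.
  def subtractFrom(x: Int, pairs: List[(String, Int)]): Map[String, List[Int]] =
    pairs.groupBy(_._1).map { case (key, group) =>
      key -> group.map { case (_, v) => x - v }
    }
}
```

SubtractGrouped.subtractFrom(X, combineList) should produce the same map as the one-liner above.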

Group a list of Scala Ints into different intervals?

I wasn't sure if groupBy, takeWhile, or grouped would achieve what I wanted to do. I need to develop a function that automatically groups a list of numbers according to the interval I want to specify. The use case is taking a list of ages and sorting them into dynamic age categories (like 1-5, 5-10, etc.). It would need to be dynamic since the user may want to change the intervals.
For example, I have the list of numbers: List(103, 206, 101, 111, 211, 234, 242, 99)
I could group by intervals of 10 or of 100. The result for an interval of 100 would then be: List(List(99),List(101,103,111),List(206,211,234,242)).
I searched Google and SO for the last hour but couldn't find anything. Thanks for the help!
You will want groupBy:
val xs = List(103, 206, 101, 111, 211, 234, 242, 99)
xs.groupBy(_ / 100)
// Map(0 -> List(99), 1 -> List(103, 101, 111), ...)
grouped just creates subsequent clumps of a given size, not looking at the actual elements. takeWhile just takes the leading elements as long as a predicate holds.
You can use the withDefaultValue method on the resulting map to make it appear as an indexed sequence, where some entries are empty:
val ys = xs.groupBy(_ / 100) withDefaultValue Nil
ys(0) // List(99)
ys(4) // List() !
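To get the nested, sorted list from the question with a user-supplied interval, the grouping can be wrapped in a small function. This is a sketch assuming non-negative values such as ages, since integer division rounds towards zero for negative numbers. Buckets and bucket are made-up names:

```scala
object Buckets {
  // Groups xs into intervals of the given width and returns the non-empty
  // groups in ascending order, each group sorted ascending.
  def bucket(xs: List[Int], interval: Int): List[List[Int]] = {
    require(interval > 0, "interval must be positive")
    xs.groupBy(_ / interval).toList.sortBy(_._1).map(_._2.sorted)
  }
}
```

Empty intervals are simply absent from the result. If they should show up as empty lists, the withDefaultValue approach above is the better fit.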
Here's an approach that generates the ranges and filters for values within them. I think l.groupBy(_ / 100).values is preferable though.
val interval = 100
// This yields the boundary pairs (0, 100), (100, 200), (200, 300)
val intervals = (0 until l.max + interval by interval).sliding(2).toList
for {
  bounds <- intervals
  within <- List(l.filter(x => x > bounds(0) && x <= bounds(1)))
} yield within
With val l = List(103, 206, 101, 111, 211, 234, 242, 99) this gives:
List[List[Int]] = List(List(99), List(103, 101, 111), List(206, 211, 234, 242))