I need to process a diff between two (huge) Maps. To parallelize the task, I would like to split these 2 Maps by Key hash value and create smaller Maps (by range of hash value).
How would I achieve that in (idiomatic) Scala?
Here's a rough sketch to get you started with the Scala syntax:
// create two (slightly different) maps, print them as table side by side
val rnd = new util.Random
val originalMap1 = (0 to 10).map(i => (i, i * i)).toMap
val originalMap2 = (0 to 10).map(i => (i, i * i + rnd.nextInt(2))).toMap
for (i <- 0 to 10) {
val a = originalMap1(i)
val b = originalMap2(i)
val marker = if (a == b) "" else " <-"
println(s"$i: $a $b $marker")
}
//subdivide into smaller maps
val numSubmaps = 5
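// floorMod keeps the bucket index non-negative even when hashCode is negative
val submaps1 = originalMap1.groupBy(kv => Math.floorMod(kv._1.hashCode, numSubmaps))
val submaps2 = originalMap2.groupBy(kv => Math.floorMod(kv._1.hashCode, numSubmaps))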
// compare each corresponding pair of maps separately, merge diffs
val diffs = (for (s <- 0 until numSubmaps) yield {
val m1 = submaps1(s)
val m2 = submaps2(s)
for {
k <- m1.keys
a = m1(k)
b = m2(k)
if a != b
} yield (k, (a, b))
}).reduce(_ ++ _)
println(diffs.toList.sortBy(_._1))
Input:
0: 0 1 <-
1: 1 2 <-
2: 4 4
3: 9 9
4: 16 16
5: 25 26 <-
6: 36 36
7: 49 49
8: 64 65 <-
9: 81 82 <-
10: 100 101 <-
Output:
List((0,(0,1)), (1,(1,2)), (5,(25,26)), (8,(64,65)), (9,(81,82)), (10,(100,101)))
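The sketch above splits the maps but still compares the buckets one after another. One possible way to run the per-bucket comparisons concurrently, assuming both maps fit in memory, is to wrap each bucket in a Future. The names bucketOf and parallelDiff are made up for this illustration; keys that exist in only one map are skipped here, whereas the code above assumes they never occur.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical helper: non-negative bucket index for any key.
def bucketOf[K](key: K, buckets: Int): Int =
  Math.floorMod(key.hashCode, buckets)

// Hypothetical helper: diff two maps bucket by bucket, one Future per bucket.
def parallelDiff[K, V](m1: Map[K, V], m2: Map[K, V], buckets: Int): Map[K, (V, V)] = {
  val parts1 = m1.groupBy { case (k, _) => bucketOf(k, buckets) }
  val parts2 = m2.groupBy { case (k, _) => bucketOf(k, buckets) }
  val jobs = (0 until buckets).map { bucket =>
    Future {
      // A bucket may be empty in either map, so fall back to an empty Map.
      val p1 = parts1.getOrElse(bucket, Map.empty[K, V])
      val p2 = parts2.getOrElse(bucket, Map.empty[K, V])
      // Keep only keys present in both maps whose values differ.
      p1.collect { case (k, a) if p2.get(k).exists(_ != a) => (k, (a, p2(k))) }
    }
  }
  // Wait for all buckets and merge the partial diffs.
  Await.result(Future.sequence(jobs), 1.minute)
    .foldLeft(Map.empty[K, (V, V)])(_ ++ _)
}
```

Because equal keys always land in the same bucket, each Future only needs its own pair of submaps. Whether this is faster than a single pass depends on the map sizes and on how expensive the value comparison is.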
I have a file with values like this :
user id | item id | rating | timestamp
196 242 3 881250949
186 302 3 891717742
22 377 1 878887116
244 51 2 880606923
166 346 1 886397596
298 474 4 884182806
115 265 2 881171488
253 465 5 891628467
305 451 3 886324817
6 86 3 883603013
62 257 2 879372434
200 222 5 876042340
210 40 3 891035994
224 29 3 888104457
303 785 3 879485318
122 387 5 879270459
194 274 2 879539794
......
I want to find all rows where item id = "590"
and build a Map from the rating values (1-5) to their counts, like this: {1->6, 2->5, 3->10, 4->6, 5->14}
object Parse {
def main(args: Array[String]): Unit = {
// extract the data from u.data
var a: List[(String, String, String, String)] = List()
for (line <- io.Source.fromFile("F:\\big data\\u.data").getLines) {
val newLine = line.replace("\t", ",")
if (newLine.split(",").length < 4) {
break
} else {
val asd = newLine.split(",")
val userId = asd(0)
val itemId = asd(1)
val rating = asd(2)
val timestamp = asd(3)
a = a :+ ((userId, itemId, rating, timestamp))
}
a = a.filter(_._2.equals("590")) // this filters the list of tuples correctly
val empty: List[String] = a.map(_._2) // I tried to get the list of all ratings here, but it does not work
}
}
How can I create a map of ratings?
As far as I can see, a map of matching values can be generated as described in: Scala groupBy for a list
If what you want is a Map of rating->count for a given "item id", this should do it.
util.Using(io.Source.fromFile("../junk.txt")) { file =>
val rec = raw"\d+\s+590\s+(\d+)\s+\d+".r //only this item id
file.getLines()
.collect { case rec(rating) => rating }
.foldLeft(Map.empty[String, Int]) {
case (m, r) => m + (r -> (m.getOrElse(r, 0) + 1))
}
}.getOrElse(Map.empty[String,Int])
Note that the Source returned by fromFile() is closed automatically at the end of the Using block (scala.util.Using is available from Scala 2.13).
I don't think a for loop is the best choice here. Treat the problem as a data stream rather than an array: scala.io.Source.fromFile("F:\\big data\\u.data").getLines() returns an Iterator[String] over the lines, which is better consumed as a stream than collected into an array. In your case a combination of map, filter, collect and groupBy is enough to group the rows by rating.
Full correct code:
val sourceFile = scala.io.Source.fromFile("F:\\big data\\u.data")
val ratingCountsMap: Map[String, Int] =
  try {
    // u.data is tab-separated (the question replaces "\t" with ",").
    // Materialize the rows: an Iterator can only be traversed once,
    // and we need one pass for validation and one for counting.
    val rows = sourceFile.getLines().map(_.split("\t")).toList
    require(rows.forall(_.length >= 4)) // your data schema validation
    rows.collect {
      // keep only the rating of the rows for item 590
      case row if row(1) == "590" => row(2)
    }
    .groupBy(identity)
    .map { case (rating, group) => rating -> group.length }
  } finally sourceFile.close()
And don't forget to release the resource (in your case, the file) by calling close in a finally block, or use the scala-arm library (more about resources here).
I am new to Scala and want to write code that adds two numbers represented by linked lists, as in the example below.
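To try the counting logic without touching the file system, the same idea can be written as a pure function over an iterator of lines. This is only a sketch: ratingCounts is a made-up name, the sample rows in the usage note are invented for illustration, and rows that do not have exactly four fields are silently skipped rather than rejected.

```scala
// Hypothetical helper: count ratings for one item id in a single pass.
def ratingCounts(lines: Iterator[String], itemId: String): Map[String, Int] =
  lines
    .map(_.trim.split("\\s+"))          // fields: user id, item id, rating, timestamp
    .collect { case Array(_, item, rating, _) if item == itemId => rating }
    .foldLeft(Map.empty[String, Int]) { (counts, rating) =>
      counts.updated(rating, counts.getOrElse(rating, 0) + 1)
    }

// Invented sample rows, not taken from u.data.
val sample = List(
  "1\t590\t3\t881250949",
  "2\t590\t5\t891717742",
  "3\t242\t3\t878887116",
  "4\t590\t3\t880606923"
)
val counts = ratingCounts(sample.iterator, "590")
```

With a real file, the iterator would come from getLines() inside a Using block or a try/finally, as in the answers above.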
Input:
First List: 5->6->3 // represents number 365
Second List: 8->4->2 // represents number 248
Output
Resultant list: 3->1->6 // represents number 613
I have implemented a mutable singly linked list in Scala that supports adding, updating and inserting elements. My code is below:
class SinglyLinkedList[A] extends ListADT[A] {
private class Node(var data: A,var next: Node)
private var head: Node = null
def apply(index: Int): A = {
require(index >= 0)
var rover = head
for (i <- 0 until index) rover = rover.next
rover.data
}
def update(index: Int,data: A): Unit = {
require(index >= 0)
var rover = head
for (i <- 0 until index) rover = rover.next
rover.data = data
}
def insert(index: Int,data: A): Unit = {
require(index >= 0)
if(index == 0) {
head = new Node(data, head)
}
else{
var rover = head
for (i <- 0 until index-1)
rover = rover.next
rover.next = new Node(data, rover.next)
}
}
def remove(index: Int): A = {
require(index >= 0)
if(index == 0){
val ret = head.data
head = head.next
ret
} else {
var rover = head
for (i <- 0 until index-1) rover = rover.next
val ret = rover.next.data
rover.next = rover.next.next
ret
}
}
}
Can anyone let me know how to perform the addition of two numbers represented by linked lists?
How does addition work? I mean addition on paper, with one number written under the other.
Let's try it for 465 + 248:
  465
+ 248
  ---
We start with the least significant digits: 5 + 8. But 5 + 8 = 13, so the result won't fit into a single digit. So we do what we were taught in school: we write down the units digit and carry the tens digit to the next column.
   1
  465
+ 248
  ---
    3
Now tens. 6 + 4 + (carried) 1 = 11. Again, we leave 1 and carry 1 to the next column:
  11
  465
+ 248
  ---
   13
And the last column. 4 + 2 + 1 = 7.
  11
  465
+ 248
  ---
  713
Thus the result is 713. If one of the two numbers has more columns, or a carry is left over after the last addition, you just write down the remaining digits.
With an immutable linked list it works the same way (I'll explain in a moment why I used an immutable one):
take both lists
take heads of both lists (if one of them is empty, you can just return the other as a result of addition)
add heads, and split the result into carry and current digit (carry would be 0 or 1, digit 0 to 9)
if there is carry > 0 add list carry :: Nil to one of tails recursively
prepend digit to recursively added tails
You should end up with something like that:
val add: (List[Int], List[Int]) => List[Int] = {
case (a :: as, b :: bs) =>
val digit = (a + b) % 10
val carry = (a + b) / 10
if (carry > 0) digit :: add(add(as, carry :: Nil), bs)
else digit :: add(as, bs)
case (as, Nil) => as
case (Nil, bs) => bs
}
add(5 :: 6 :: 4 :: Nil, 8 :: 4 :: 2 :: Nil) // 3 :: 1 :: 7 :: Nil
Now, if you used a mutable list, it would get trickier. With a mutable list you would want to update one of the lists in place, right? Which one: the first? The second? Both? Your algorithm might compute the right result but butcher the input.
Let's say you always add the second list to the first one and want to leave the second intact. If the second list is longer and you have to add new places for digits, you have to copy all the remaining nodes (otherwise the two lists would share nodes, and updating a digit in the second list would also change the first). You would also have to handle the corner case of a final carry.
That is quite counter-intuitive behavior: numbers are not mutable, and numbers are what you want to represent.
Try this:
def add(a: List[Int], b: List[Int], o: Int): List[Int] = (a,b,o) match {
case (x::xs, y::ys, d) =>
val v = d + x + y
(v%10)::add(xs, ys, v/10)
case (Nil, Nil, 0) => Nil
case (Nil, Nil, d) => d::Nil
case (xs, Nil, d) => add(xs, 0::Nil, d)
case (Nil, ys, d) => add(0::Nil, ys, d)
}
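To check this kind of function against the example from the question (365 + 248 = 613, least significant digit first), it helps to convert between Int and digit lists. The sketch below is a self-contained variant of the carry-based addition; addDigits, toDigits and fromDigits are names invented here, and non-negative inputs are assumed.

```scala
// Hypothetical helpers: digits are stored least significant first.
def toDigits(n: Int): List[Int] =
  if (n < 10) List(n) else (n % 10) :: toDigits(n / 10)

def fromDigits(ds: List[Int]): Int =
  ds.foldRight(0)((d, acc) => acc * 10 + d)

// Add two digit lists column by column, passing the carry along.
def addDigits(a: List[Int], b: List[Int], carry: Int = 0): List[Int] =
  (a, b) match {
    case (Nil, Nil) => if (carry == 0) Nil else List(carry)
    case _ =>
      // A missing digit in the shorter list counts as 0.
      val sum = a.headOption.getOrElse(0) + b.headOption.getOrElse(0) + carry
      (sum % 10) :: addDigits(a.drop(1), b.drop(1), sum / 10)
  }

val result = addDigits(toDigits(365), toDigits(248)) // 5->6->3 plus 8->4->2
```

Neither input list is modified, which sidesteps the aliasing problems described for the mutable version.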
How can I sort multiple columns (more than ten columns) in Scala?
for example:
1 2 3 4
4 5 6 3
1 2 1 1
2 3 5 10
desired output
1 2 1 1
1 2 3 3
2 3 5 4
4 5 6 10
Not much to it.
val input = io.Source.fromFile("junk.txt") // open file
.getLines // load all contents
.map(_.split("\\W+")) // turn rows into Arrays
.map(_.map(_.toInt)) // Arrays of Ints
val output = input.toList // from Iterator to List
.transpose // swap rows/columns
.map(_.sorted) // sort rows
.transpose // swap back
output.foreach(x => println(x.mkString(" "))) // report results
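Note: Because the split is on \W+, any run of non-word characters (whitespace, commas, etc.) separates the numbers. It will fail to create the expected Array[Int] if a line begins with a separator (the first element is then an empty string and toInt throws), and a minus sign is treated as a separator too, so negative numbers lose their sign.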
Also, transpose will throw if the rows aren't all the same size.
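The transpose, sort, transpose idea can be checked in memory against the data from the question. This is a small sketch; sortColumns is a name chosen for this example, and it assumes every row has the same number of columns.

```scala
// Hypothetical helper: sort each column independently.
def sortColumns(rows: List[List[Int]]): List[List[Int]] =
  if (rows.isEmpty) rows
  else rows.transpose.map(_.sorted).transpose

// The rows from the question, parsed from whitespace-separated text.
val text = List("1 2 3 4", "4 5 6 3", "1 2 1 1", "2 3 5 10")
val table = text.map(_.trim.split("\\s+").map(_.toInt).toList)
val sortedTable = sortColumns(table)
sortedTable.foreach(row => println(row.mkString(" ")))
```

Here the split is on \s+ after trimming, so leading whitespace is tolerated and only whitespace counts as a separator.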
I used the following algorithm: first transpose the rows and columns, then sort each row, then transpose again to restore the original row-column layout. Here is a proof of concept.
object SO_42720909 extends App {
// generating dummy data
val unsortedData = getDummyData(2, 3)
prettyPrint(unsortedData)
println("----------------- ")
// altering the dimension
val unsortedAlteredData = alterDimension(unsortedData)
// prettyPrint(unsortedAlteredData)
// sorting the altered data
val sortedAlteredData = sort(unsortedAlteredData)
// prettyPrint(sortedAlteredData)
// bringing back original dimension
val sortedData = alterDimension(sortedAlteredData)
prettyPrint(sortedData)
def alterDimension(data: Seq[Seq[Int]]): Seq[Seq[Int]] = {
val col = data.size
val row = data.head.size // make it safe
for (i <- (0 until row))
yield for (j <- (0 until col)) yield data(j)(i)
}
def sort(data: Seq[Seq[Int]]): Seq[Seq[Int]] = {
for (row <- data) yield row.sorted
}
def getDummyData(row: Int, col: Int): Seq[Seq[Int]] = {
val r = scala.util.Random
for (i <- (1 to row))
yield for (j <- (1 to col)) yield r.nextInt(100)
}
def prettyPrint(data: Seq[Seq[Int]]): Unit = {
data.foreach(row => {
println(row.mkString(", "))
})
}
}
val SumABC = 1000
val Max = 468
val Min = 32
val p9 = for {
a <- Max to 250 by -1
b <- Min+(Max-a) to 249
if a*a+b*b == (SumABC-a-b)*(SumABC-a-b)
} yield a*b*(SumABC-a-b)
Can I .take(1) here? (I tried to translate it to flatmap, filter, etc, but since I failed I guess it wouldn't be as readable anyway...)
If I understood your cryptic question correctly, what you would like to do is the following:
val p9 = (for {
a <- Max to 250 by -1
b <- Min+(Max-a) to 249
if a*a+b*b == (SumABC-a-b)*(SumABC-a-b)
} yield a*b*(SumABC-a-b)).take(1)
Just add parentheses before for and after the yield expression so that take is called on the result of the whole for expression.
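One caveat: a for expression over a Range is strict, so the version above still evaluates every (a, b) pair before take(1) runs. If the goal is to stop at the first hit, a possible variant is to make the outer generator an iterator, which makes the whole expression lazy. The name firstMatch is made up for this sketch.

```scala
val SumABC = 1000
val Max = 468
val Min = 32

// The outer generator is an Iterator, so candidates are produced on demand
// and the search stops once take(1) has been satisfied.
val firstMatch: List[Int] = (for {
  a <- (Max to 250 by -1).iterator
  b <- Min + (Max - a) to 249
  if a * a + b * b == (SumABC - a - b) * (SumABC - a - b)
} yield a * b * (SumABC - a - b)).take(1).toList
```

The inner range for a given a is still evaluated in full, but no further values of a are tried after the first match.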
Given n ( say 3 people ) and s ( say 100$ ), we'd like to partition s among n people.
So we need all possible n-tuples that sum to s
My Scala code below:
def weights(n:Int,s:Int):List[List[Int]] = {
List.concat( (0 to s).toList.map(List.fill(n)(_)).flatten, (0 to s).toList).
combinations(n).filter(_.sum==s).map(_.permutations.toList).toList.flatten
}
println(weights(3,100))
This works for small values of n. ( n=1, 2, 3 or 4).
Beyond n=4, it takes a very long time, practically unusable.
I'm looking for ways to rework my code using lazy evaluation/ Stream.
My requirements: it must work for n up to 10.
Warning : The problem gets really big really fast. My results from Matlab -
---For s =100, n = 1 thru 5 results are ---
n=1 :1 combinations
n=2 :101 combinations
n=3 :5151 combinations
n=4 :176851 combinations
n=5: 4598126 combinations
---
You need dynamic programming, or memoization. Same concept, anyway.
Let's say you have to divide s among n. Recursively, that's defined like this:
def permutations(s: Int, n: Int): List[List[Int]] = n match {
case 0 => Nil
case 1 => List(List(s))
case _ => (0 to s).toList flatMap (x => permutations(s - x, n - 1) map (x :: _))
}
Now, this will STILL be slow as hell, but there's a catch here... you don't need to recompute permutations(s, n) for numbers you have already computed. So you can do this instead:
val memoP = collection.mutable.Map.empty[(Int, Int), List[List[Int]]]
def permutations(s: Int, n: Int): List[List[Int]] = {
def permutationsWithHead(x: Int) = permutations(s - x, n - 1) map (x :: _)
n match {
case 0 => Nil
case 1 => List(List(s))
case _ =>
memoP getOrElseUpdate ((s, n),
(0 to s).toList flatMap permutationsWithHead)
}
}
And this can be even further improved, because it will compute every permutation. You only need to compute every combination, and then permute that without recomputing.
To compute every combination, we can change the code like this:
val memoC = collection.mutable.Map.empty[(Int, Int, Int), List[List[Int]]]
def combinations(s: Int, n: Int, min: Int = 0): List[List[Int]] = {
def combinationsWithHead(x: Int) = combinations(s - x, n - 1, x) map (x :: _)
n match {
case 0 => Nil
case 1 => List(List(s))
case _ =>
memoC getOrElseUpdate ((s, n, min),
(min to s / 2).toList flatMap combinationsWithHead)
}
}
Running combinations(100, 10) is still slow, given the sheer number of combinations alone. The permutations for each combination can be obtained simply by calling .permutations on the combination.
Here's a quick and dirty Stream solution (on Scala 2.13 and later, LazyList replaces the deprecated Stream):
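As a rough illustration of that last step, the sketch below generates the non-decreasing combinations without memoization and then expands each one lazily. The names combos and tuples are invented here, the upper bound s / n is a tighter variant of the s / 2 used above, and n >= 1 is assumed.

```scala
// Hypothetical helper: non-decreasing lists of n values >= min that sum to s.
def combos(s: Int, n: Int, min: Int = 0): List[List[Int]] =
  if (n == 1) { if (s >= min) List(List(s)) else Nil }
  else
    // The smallest element x can be at most s / n,
    // otherwise the remaining n - 1 values (each >= x) would overshoot s.
    (min to s / n).toList.flatMap(x => combos(s - x, n - 1, x).map(x :: _))

// Expand each combination into its distinct orderings, lazily.
def tuples(s: Int, n: Int): Iterator[List[Int]] =
  combos(s, n).iterator.flatMap(_.permutations)

val small = tuples(5, 3).toList // 21 tuples for s = 5, n = 3
```

List.permutations only yields distinct orderings, so a combination with repeated values such as List(0, 0, 5) contributes 3 tuples rather than 6.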
def weights(n: Int, s: Int) = (1 until s).foldLeft(Stream(Nil: List[Int])) {
(a, _) => a.flatMap(c => Stream.range(0, n - c.sum + 1).map(_ :: c))
}.map(c => (n - c.sum) :: c)
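It handles six people in about 15 seconds on my machine (note that in this function the first argument is the total to split and the second is the number of parts, the reverse of the question's signature):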
scala> var x = 0
scala> weights(100, 6).foreach(_ => x += 1)
scala> x
res81: Int = 96560646
As a side note: by the time you get to n = 10, there are 4,263,421,511,271 of these things. That's going to take days just to stream through.
My solution to this problem; it can compute up to n = 6:
object Partition {
implicit def i2p(n: Int): Partition = new Partition(n)
def main(args : Array[String]) : Unit = {
for(n <- 1 to 6) println(100.partitions(n).size)
}
}
class Partition(n: Int){
def partitions(m: Int):Iterator[List[Int]] = new Iterator[List[Int]] {
val nums = Array.ofDim[Int](m)
nums(0) = n
var hasNext = m > 0 && n > 0
override def next: List[Int] = {
if(hasNext){
val result = nums.toList
var idx = 0
while(idx < m-1 && nums(idx) == 0) idx = idx + 1
if(idx == m-1) hasNext = false
else {
nums(idx+1) = nums(idx+1) + 1
nums(0) = nums(idx) - 1
if(idx != 0) nums(idx) = 0
}
result
}
else Iterator.empty.next
}
}
}
1
101
5151
176851
4598126
96560646
However, we can simply compute the number of possible n-tuples:
val pt: (Int,Int) => BigInt = {
val buf = collection.mutable.Map[(Int,Int),BigInt]()
(s,n) => buf.getOrElseUpdate((s,n),
if(n == 0 && s > 0) BigInt(0)
else if(s == 0) BigInt(1)
else (0 to s).map{k => pt(s-k,n-1)}.sum
)
}
for(n <- 1 to 20) printf("%2d :%s%n",n,pt(100,n).toString)
1 :1
2 :101
3 :5151
4 :176851
5 :4598126
6 :96560646
7 :1705904746
8 :26075972546
9 :352025629371
10 :4263421511271
11 :46897636623981
12 :473239787751081
13 :4416904685676756
14 :38393094575497956
15 :312629484400483356
16 :2396826047070372396
17 :17376988841260199871
18 :119594570260437846171
19 :784008849485092547121
20 :4910371215196105953021
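These counts can also be cross-checked without recursion: by the standard stars-and-bars argument, the number of n-tuples of non-negative integers summing to s is the binomial coefficient C(s + n - 1, n - 1). A minimal sketch follows; tupleCount is a name chosen for this example.

```scala
// Hypothetical helper: C(s + n - 1, n - 1), computed incrementally with BigInt.
def tupleCount(s: Int, n: Int): BigInt = {
  require(s >= 0 && n >= 1)
  // After step i the accumulator equals C(s + i, i), so each division is exact.
  (1 to n - 1).foldLeft(BigInt(1)) { (acc, i) => acc * (s + i) / i }
}

for (n <- 1 to 10) println(f"$n%2d :${tupleCount(100, n)}")
```

For s = 100 this reproduces the values listed above, for example 4598126 for n = 5.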