Checking specific key/value exists in custom sequences - scala

I'm at a loss as to how to get this to work.
I have a case class such as:
case class Visitors(start: Int, end: Int, visitor_num: Int)
Now say I create two separate Sequences of type Visitors:
val visitors_A = Seq(Visitors(start = 1, end = 1, visitor_num = 2),Visitors(start = 2, end = 2, visitor_num = 129),Visitors(start = 3, end = 3, visitor_num = 90))
val visitors_B = Seq(Visitors(start = 1, end = 1, visitor_num = 0),Visitors(start = 2, end = 2, visitor_num = 0))
I want to create a separate Visitors Sequence containing the Visitors whose start times appear in both visitors_A and visitors_B.
The expected output is:
val visitors_Same = Seq(Visitors(start = 1, end = 1, visitor_num = 2),Visitors(start = 2, end = 2, visitor_num = 129))
For each element, it should check whether its start time appears in both Sequences. If it does, take the element from visitors_A and append it to the result.
What confuses me is that I am working with a "custom" type, Visitors, and I can't seem to use intersect or contains to look for elements of visitors_A in visitors_B. I understand I probably need to check whether each start value from A exists in B and then map (?) the output to a new Sequence of type Visitors?

If you want a one-liner (though probably inefficient, since it scans visitors_B once for every element of visitors_A), here it is:
val r1 = visitors_A.filter(va => visitors_B.exists(vb => vb.start == va.start))
You can gain a bit more speed if you convert visitors_B to a Map first (logically the map is start -> visitor):
val vbm = visitors_B.map(vb => (vb.start, vb)).toMap
val r2 = visitors_A.filter(va => vbm.contains(va.start))
Edit
Actually, since the values in the Map are never used, you can use a Set instead, which will be a bit more efficient than a Map:
val vbs = visitors_B.map(vb => vb.start).toSet
val r3 = visitors_A.filter(va => vbs.contains(va.start))
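To show why intersect finds nothing in the question's data, and to package the Set-based filter as a reusable function, here is a small self-contained sketch. Visitors is copied from the question (nested in the object only for self-containment); the names SameStart, byEquality and sameStart are illustrative, not from the answer.

```scala
object SameStart {
  // Copied from the question; nested here only to keep the sketch self-contained.
  case class Visitors(start: Int, end: Int, visitor_num: Int)

  val visitors_A = Seq(Visitors(1, 1, 2), Visitors(2, 2, 129), Visitors(3, 3, 90))
  val visitors_B = Seq(Visitors(1, 1, 0), Visitors(2, 2, 0))

  // intersect compares whole case-class instances (all three fields), and no
  // element of A equals an element of B, so this is empty.
  val byEquality: Seq[Visitors] = visitors_A.intersect(visitors_B)

  // Keeps the elements of `a` whose start value also occurs in `b`, in `a`'s order.
  def sameStart(a: Seq[Visitors], b: Seq[Visitors]): Seq[Visitors] = {
    val startsInB: Set[Int] = b.map(_.start).toSet
    a.filter(v => startsInB.contains(v.start))
  }

  def main(args: Array[String]): Unit = {
    println(byEquality)                        // List()
    println(sameStart(visitors_A, visitors_B)) // List(Visitors(1,1,2), Visitors(2,2,129))
  }
}
```

The Set is built once, so each membership test is effectively constant time instead of a scan over visitors_B.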

Related

scala array.product function in 2d array

I have a 2D array:
val arr =Array(Array(2,1),Array(3,1),Array(4,1))
I need to multiply all the inner first elements and sum all the inner second elements, to get as a result:
Array(24,3)
I'm looking for a way to use map here, something like:
arr.map(a=>Array(a(1stElemnt).product , a(2ndElemnt).sum ))
Any suggestions?
Regards.
The following works, but note that it is not safe: it throws a MatchError if arr contains any element that does not have exactly 2 elements. You should add the missing cases to the pattern match as your use case requires.
val result = arr.fold(Array(1, 0)) {
case (Array(x1, x2), Array(y1, y2)) => Array(x1 * y1, x2 + y2)
}
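If rows of the wrong length should be skipped rather than crash, one option is foldLeft with a catch-all case. Skipping is a hypothetical policy chosen for illustration; substitute whatever your use case needs. The object and method names are also illustrative.

```scala
object ProductSum {
  // Multiplies the first elements and sums the second elements of each row.
  // Rows without exactly two elements are skipped (a hypothetical policy).
  // The accumulator is a (product, sum) tuple, so foldLeft is used instead of
  // fold, whose accumulator must be a supertype of the element type.
  def productAndSum(arr: Array[Array[Int]]): (Int, Int) =
    arr.foldLeft((1, 0)) {
      case ((prod, sum), Array(a, b)) => (prod * a, sum + b)
      case (acc, _)                   => acc
    }
}
```

For the question's input this yields (24,3); a malformed row such as Array(5) is simply ignored.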
Update
As @Luis suggested, if you change your original Array[Array] into an Array of tuples, another implementation could look like this:
val arr = Array((2, 1), (3, 1), (4, 1))
val (prdArr, sumArr) = arr.unzip
val result = (prdArr.product, sumArr.sum)
println(result) // (24,3)
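The map-based shape the question asked about becomes possible once the array is transposed, so that each inner array is a column rather than a row. A minimal sketch, assuming at least one row and that every row has the same length (at least 2); ragged input is not handled. The object and method names are hypothetical.

```scala
object ColumnOps {
  // Turns rows into columns, then applies a different aggregate to each column.
  // Assumes a non-empty input whose rows all have the same length (>= 2).
  def productOfFirstsSumOfSeconds(arr: Array[Array[Int]]): Array[Int] = {
    val cols = arr.transpose // Array(Array(2, 3, 4), Array(1, 1, 1)) for the question's input
    Array(cols(0).product, cols(1).sum)
  }
}
```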

Adding Sparse Vectors 3.0.0 Apache Spark Scala

I am trying to create a function like the following to add two org.apache.spark.ml.linalg.Vectors, i.e. two sparse vectors.
Such a vector could look like this:
(28,[1,2,3,4,7,11,12,13,14,15,17,20,22,23,24,25],[0.13028398104008743,0.23648605632753023,0.7094581689825907,0.13028398104008743,0.23648605632753023,0.0,0.14218861229025295,0.3580566057240087,0.14218861229025295,0.13028398104008743,0.26056796208017485,0.0,0.14218861229025295,0.06514199052004371,0.13028398104008743,0.23648605632753023])
For example:
def add_vectors(x: org.apache.spark.ml.linalg.Vector,y:org.apache.spark.ml.linalg.Vector): org.apache.spark.ml.linalg.Vector = {
}
Let's look at a use case:
val x = Vectors.sparse(2, Array(0), Array(1.0)) // [1, 0]
val y = Vectors.sparse(2, Array(1), Array(1.0)) // [0, 1]
I want the output to be
Vectors.sparse(2, Array(0, 1), Array(1.0, 1.0))
Here's another case where they share the same indices:
val x = Vectors.sparse(2, Array(1), Array(1.0))
val y = Vectors.sparse(2, Array(1), Array(1.0))
This output should be
Vectors.sparse(2, Array(1), Array(2.0))
I've realized this is harder than it seems. One possible solution I looked into (e.g. the question "Addition of two RDD[mllib.linalg.Vector]'s") is to convert the vectors to Breeze vectors, add them in Breeze, and convert the result back to a Spark vector. So I tried implementing this:
def add_vectors(x: org.apache.spark.ml.linalg.Vector,y:org.apache.spark.ml.linalg.Vector) ={
val dense_x = x.toDense
val dense_y = y.toDense
val bv1 = new DenseVector(dense_x.toArray)
val bv2 = new DenseVector(dense_y.toArray)
val vectout = Vectors.dense((bv1 + bv2).toArray)
vectout
}
However, this gave me an error on the last line:
val vectout = Vectors.dense((bv1 + bv2).toArray)
Cannot resolve the overloaded method 'dense'.
Why is this error occurring, and how can I fix it?
To answer my own question, I had to think about how sparse vectors are represented. A sparse vector is built from 3 arguments: the number of dimensions, an array of indices, and an array of values. For example:
import org.apache.spark.ml.linalg.{Vector, Vectors}
val indices: Array[Int] = Array(1,2)
val norms: Array[Double] = Array(0.5,0.3)
val num_int = 4
val vector: Vector = Vectors.sparse(num_int, indices, norms)
If I convert this SparseVector to an Array, I get the following.
Code:
val choiced_array = vector.toArray
choiced_array.foreach(element => print(element + " "))
Output:
0.0 0.5 0.3 0.0
This is the dense representation of the vector. So once you convert the two vectors to arrays, you can add them with the following code:
val add: Array[Double] = (vector.toArray, vector_2.toArray).zipped.map(_ + _)
This gives you a new array holding the element-wise sum. Next, to create the new sparse vector, you need an indices array, as in the construction above:
var i = -1
val new_indices_pre = add.map( (element:Double) => {
i = i + 1
if(element != 0.0)
i
else{
-1
}
})
Then filter out all the -1 entries, which mark positions where the sum is zero (a sparse vector can hold negative values, so the test is != 0.0 rather than > 0.0):
val new_indices = new_indices_pre.filter(element => element != -1)
Remember to also filter the zero values out of the array holding the sum of the two vectors:
val final_add = add.filter(element => element != 0.0)
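The index bookkeeping with var i and the two separate filters can also be done in a single pass with zipWithIndex, which pairs each summed value with its position. A minimal sketch in plain Scala: it has no Spark dependency, so it returns the indices and values you would pass to Vectors.sparse rather than a Vector. The object and method names are hypothetical. Only exact zeros are dropped, so negative sums are kept.

```scala
object DenseAdd {
  // Adds two dense representations element-wise and returns the (indices, values)
  // pair for the non-zero entries, i.e. the arguments Vectors.sparse expects.
  def addToSparseParts(x: Array[Double], y: Array[Double]): (Array[Int], Array[Double]) = {
    require(x.length == y.length, "vectors must have the same size")
    val summed  = x.zip(y).map { case (a, b) => a + b }
    val nonZero = summed.zipWithIndex.filter { case (v, _) => v != 0.0 }
    (nonZero.map(_._2), nonZero.map(_._1))
  }
}
```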
Lastly, we can make the new sparse Vector
Vectors.sparse(num_int,new_indices,final_add)
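Converting to dense arrays is simple, but it allocates the full vector size even when only a few entries are non-zero. An alternative is to merge the two (indices, values) representations directly, like the merge step of merge sort. The sketch below assumes both inputs have the same size and strictly increasing indices; it is plain Scala with hypothetical names. With Spark, you would presumably feed it x.toSparse.indices and x.toSparse.values and pass the result to Vectors.sparse.

```scala
import scala.annotation.tailrec

object SparseMerge {
  // Merges two sparse vectors given as sorted (indices, values) arrays, summing
  // values at shared indices and dropping entries whose sum is exactly 0.0.
  def addSparse(size: Int,
                xi: Array[Int], xv: Array[Double],
                yi: Array[Int], yv: Array[Double]): (Int, Array[Int], Array[Double]) = {
    val outI = Array.newBuilder[Int]
    val outV = Array.newBuilder[Double]
    def emit(i: Int, v: Double): Unit = if (v != 0.0) { outI += i; outV += v }

    @tailrec
    def loop(a: Int, b: Int): Unit =
      if (a < xi.length && b < yi.length) {
        if (xi(a) == yi(b)) { emit(xi(a), xv(a) + yv(b)); loop(a + 1, b + 1) }
        else if (xi(a) < yi(b)) { emit(xi(a), xv(a)); loop(a + 1, b) }
        else { emit(yi(b), yv(b)); loop(a, b + 1) }
      } else if (a < xi.length) { emit(xi(a), xv(a)); loop(a + 1, b) }
      else if (b < yi.length) { emit(yi(b), yv(b)); loop(a, b + 1) }

    loop(0, 0)
    (size, outI.result(), outV.result())
  }
}
```

Each index of both inputs is visited once, so the cost is proportional to the number of stored entries rather than to size.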

List whose elements depend on the previous ones

Suppose I have a list of increasing integers. If the difference of 2 consecutive numbers is less than a threshold, then we index them by the same number, starting from 0. Otherwise, we increase the index by 1.
For example: for the list (1,2,5,7,8,11,15,16,20) and the threshold = 3, the output will be: (0, 0, 1, 1, 1, 2, 3, 3, 4).
Here is my code:
val listToGroup = List(1,2,5,7,8,11,15,16,20)
val diff_list = listToGroup.sliding(2,1).map{case List(i, j) => j-i}.toList
val thres = 2
var j=0
val output_ = for(i <- diff_list.indices) yield {
if (diff_list(i) > thres ) {
j += 1
}
j
}
val output = List.concat(List(0), output_)
I'm new to Scala and I feel the list is not used efficiently. How can this code be improved?
You can avoid the mutable variable by using scanLeft, which gives more idiomatic code:
val output = diff_list.scanLeft(0) { (count, i) =>
if (i > thres) count + 1
else count
}
Your code shows some constructs that are usually avoided in Scala but are common when coming from procedural languages, like for(i <- diff_list.indices) ... diff_list(i), which can be replaced with for(i <- diff_list).
Other than that, I think your approach is efficient: you need to traverse the list anyway, and you do it once. One caveat: diff_list(i) on a List takes O(i) time, so the indexed loop is actually O(N²); iterating over diff_list directly brings it down to O(N). Beyond that, I would not worry about efficiency here, but rather about style and readability.
Here is how I would rewrite the whole thing in more natural Scala:
val listToGroup = List(1,2,5,7,8,11,15,16,20)
val thres = 2
val output = listToGroup.zip(listToGroup.drop(1)).scanLeft(0) { case (count, (i, j)) =>
if (j - i > thres) count + 1
else count
}
My adjustments to your code:
I use scanLeft to perform the result collection construction
I prefer x.zip(x.drop(1)) over x.sliding(2, 1) (constructing tuples seems a bit more efficient than constructing collections). You could also use x.zip(x.tail), but that does not handle empty x
I avoid the temporary result diff_list
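One edge case of the zip/scanLeft rewrite: for an empty input it still returns List(0), because scanLeft always emits its initial value. A sketch wrapping the same approach in a function that returns Nil for empty input (the names GroupByGap and groupIndices are hypothetical):

```scala
object GroupByGap {
  // Assigns a group index to each element of an increasing list: the index
  // increases by 1 whenever the gap to the previous element exceeds `thres`.
  def groupIndices(xs: List[Int], thres: Int): List[Int] =
    if (xs.isEmpty) Nil // zip/scanLeft alone would return List(0) here
    else
      xs.zip(xs.tail).scanLeft(0) { case (count, (prev, next)) =>
        if (next - prev > thres) count + 1 else count
      }
}
```

Since xs is known to be non-empty in the else branch, xs.tail is safe there.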
val listToGroup = List(1, 2, 5, 7, 8, 11, 15, 16, 20)
val thres = 2
listToGroup
.sliding(2)
.scanLeft(0)((a, b) => { if (b.tail.head - b.head > thres) a + 1 else a })
.toList
You don't need a mutable variable; you can achieve the same with scanLeft.

Can I Pair Two Sequences Together by a Matching Key?

Let's say sequence one is going out to the web to retrieve the contents of sites 1, 2, 3, 4, 5 (but will return in unpredictable order).
Sequence two is going to a database to retrieve context about these same records 1, 2, 3, 4, 5 (but for the purposes of this example will return in unpredictable order).
Is there an Rx extension method that will combine these into one sequence, emitting each matching pair as soon as it is ready in both sequences? I.e., if the first sequence returns in the order 4,2,3,5,1 and the second sequence returns in the order 1,4,3,2,5, the merged sequence would be (4,4), (3,3), (2,2), (1,1), (5,5), each emitted as soon as the pair is complete. I've looked at Merge and Zip, but they don't seem to be exactly what I'm looking for.
I wouldn't want to discard pairs that don't match, which I think rules out a simple .Where.Select combination.
using System;
using System.Linq;
using System.Reactive.Linq;
using System.Reactive.Subjects;

var aSource = new Subject<int>();
var bSource = new Subject<int>();
var paired = Observable
.Merge(aSource, bSource)
.GroupBy(i => i)
.SelectMany(g => g.Buffer(2).Take(1));
The test below gives the correct results. It only takes ints at the moment; if you're using data with keys and values, you'll need to group by i.Key instead of i.
paired.Subscribe(g => Console.WriteLine("{0}:{1}", g.ElementAt(0), g.ElementAt(1)));
aSource.OnNext(4);
bSource.OnNext(1);
aSource.OnNext(2);
bSource.OnNext(4);
aSource.OnNext(3);
bSource.OnNext(3);
aSource.OnNext(5);
bSource.OnNext(2);
aSource.OnNext(1);
bSource.OnNext(5);
yields:
4:4
3:3
2:2
1:1
5:5
Edit in response to Brandon:
For the situation where the items are different classes (AClass and BClass), the following adjustment can be made.
using Pair = Tuple<AClass, BClass>;
var paired = Observable
.Merge(aSource.Select(a => new Pair(a, null)), bSource.Select(b => new Pair(null, b)))
.GroupBy(p => p.Item1 != null ? p.Item1.Key : p.Item2.Key)
.SelectMany(g => g.Buffer(2).Take(1))
.Select(g => new Pair(
g.ElementAt(0).Item1 ?? g.ElementAt(1).Item1,
g.ElementAt(0).Item2 ?? g.ElementAt(1).Item2));
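The answers here rely on Rx in C#. As a library-free illustration of the same pairing rule, the sketch below models it in plain Scala: events arrive interleaved, unmatched halves wait in a pending map, and a pair is emitted the moment its second half arrives. FromWeb, FromDb and their string payloads are hypothetical stand-ins for AClass and BClass, and the sketch assumes each key arrives at most once per side.

```scala
object PairByKey {
  sealed trait Event
  final case class FromWeb(key: Int, page: String) extends Event
  final case class FromDb(key: Int, context: String) extends Event

  // Emits (key, page, context) as soon as both halves for a key have arrived.
  // Halves that have not found their partner yet wait in the two pending maps.
  def pairs(events: Seq[Event]): List[(Int, String, String)] = {
    type State = (Map[Int, String], Map[Int, String], List[(Int, String, String)])
    val start: State = (Map.empty, Map.empty, Nil)
    val (_, _, out) = events.foldLeft(start) {
      case ((web, db, acc), FromWeb(k, page)) =>
        db.get(k) match {
          case Some(ctx) => (web, db - k, (k, page, ctx) :: acc)
          case None      => (web + (k -> page), db, acc)
        }
      case ((web, db, acc), FromDb(k, ctx)) =>
        web.get(k) match {
          case Some(page) => (web - k, db, (k, page, ctx) :: acc)
          case None       => (web, db + (k -> ctx), acc)
        }
    }
    out.reverse
  }
}
```

Fed the interleaving from the test above (a4, b1, a2, b4, a3, b3, a5, b2, a1, b5), it completes the keys in the order 4, 3, 2, 1, 5, matching the Rx output.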
So you have 2 observable sequences that you want to pair together?
Pair from Rxx, along with GroupBy, can help here. I think code similar to the following might do what you want:
var pairs = stream1.Pair(stream2)
.GroupBy(pair => pair.Switch(source1 => source1.Key, source2 => source2.Key))
.SelectMany(group => group.Take(2).ToArray()) // each group will have at most 2 results (1 left and 1 right)
.Select(pair =>
{
T1 result1 = default(T1);
T2 result2 = default(T2);
foreach (var r in pair)
{
if (r.IsLeft) result1 = r.Left;
else result2 = r.Right;
}
return new { result1, result2 };
});
I've not tested it, and not added in anything for error handling, but I think this is what you want.

help rewriting in functional style

I'm learning Scala as my first functional-ish language. As one of the problems, I was trying to find a functional way of generating the sequence S up to n places. S is defined so that S(1) = 1, and S(x) = the number of times x appears in the sequence. (I can't remember what this is called, but I've seen it in programming books before.)
In practice, the sequence looks like this:
S = 1, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7 ...
I can generate this sequence pretty easily in Scala using an imperative style like this:
def genSequence(numItems: Int) = {
require(numItems > 0, "numItems must be >= 1")
var list: List[Int] = List(1)
var seq_no = 2
var no = 2
var no_nos = 0
var num_made = 1
while(num_made < numItems) {
if(no_nos < seq_no) {
list = list :+ no
no_nos += 1
num_made += 1
} else if(no % 2 == 0) {
no += 1
no_nos = 0
} else {
no += 1
seq_no += 1
no_nos = 0
}
}
list
}
But I don't really have any idea how to write this without using vars and the while loop.
Thanks!
Pavel's answer has come closest so far, but it's also inefficient. Two flatMaps and a zipWithIndex are overkill here :)
My understanding of the required output:
The results contain all the positive integers (starting from 1) at least once
each number n appears in the output (n/2) + 1 times
As Pavel has rightly noted, the solution is to start with a Stream then use flatMap:
Stream from 1
This generates a Stream, a potentially never-ending sequence that only produces values on demand. In this case, it generates 1, 2, 3, 4... in theory all the way up to infinity; in practice an Int wraps around to negative values after Int.MaxValue.
Streams can be mapped over, as with any other collection. For example: (Stream from 1) map { 2 * _ } generates a Stream of even numbers.
You can also use flatMap on Streams, allowing you to map each input element to zero or more output elements; this is key to solving your problem:
val strm = (Stream from 1) flatMap { n => Stream.fill(n/2 + 1)(n) }
So... How does this work? For the element 3, the lambda { n => Stream.fill(n/2 + 1)(n) } will produce the output stream 3,3. For the first 5 integers you'll get:
1 -> 1
2 -> 2, 2
3 -> 3, 3
4 -> 4, 4, 4
5 -> 5, 5, 5
etc.
and because we're using flatMap, these will be concatenated, yielding:
1, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, ...
Streams are memoised, so once a given value has been calculated it'll be saved for future reference. However, all the preceding values have to be calculated at least once. If you want the full sequence then this won't cause any problems, but it does mean that generating S(10796) from a cold start is going to be slow (a problem shared with your imperative algorithm)! If you need to do this, then none of the solutions so far is likely to be appropriate for you.
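On Scala 2.13 and later, Stream is deprecated in favour of LazyList. Assuming 2.13+, here is the same flatMap idea as a sketch wrapped in the question's genSequence(numItems) signature; take(numItems) forces only the prefix that is actually needed. The object name is hypothetical.

```scala
object LazyGen {
  // Each n is repeated n/2 + 1 times (integer division), matching the
  // question's imperative genSequence. Requires Scala 2.13+ for LazyList.
  def genSequence(numItems: Int): List[Int] = {
    require(numItems > 0, "numItems must be >= 1")
    LazyList.from(1)
      .flatMap(n => LazyList.fill(n / 2 + 1)(n))
      .take(numItems)
      .toList
  }
}
```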
The following code produces exactly the same sequence as yours:
val seq = Stream.from(1)
.flatMap(Stream.fill(2)(_))
.zipWithIndex
.flatMap(p => Stream.fill(p._1)(p._2))
.tail
However, if you want to produce the Golomb sequence (that complies with the definition, but differs from your sample code result), you may use the following:
val seq = 1 #:: a(2)
def a(n: Int): Stream[Int] = (1 + seq(n - seq(seq(n - 2) - 1) - 1)) #:: a(n + 1)
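The index arithmetic in a(n) is easier to follow next to the 1-based recurrence it encodes. With the 0-based stream defined by seq(k) = a(k+1), each part of the recurrence translates as follows:

```latex
a(1) = 1, \qquad a(n) = 1 + a\bigl(n - a(a(n-1))\bigr), \quad n \ge 2

a(n-1) = \mathrm{seq}(n-2), \qquad
a(a(n-1)) = \mathrm{seq}\bigl(\mathrm{seq}(n-2) - 1\bigr), \qquad
a\bigl(n - a(a(n-1))\bigr) = \mathrm{seq}\bigl(n - \mathrm{seq}(\mathrm{seq}(n-2) - 1) - 1\bigr)

\text{e.g. } a(4) = 1 + a\bigl(4 - a(a(3))\bigr) = 1 + a\bigl(4 - a(2)\bigr) = 1 + a(2) = 3
```

So a(n) in the code produces the 1-based term n, which is why the stream starts with 1 #:: a(2).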
You may check my article for more examples of how to deal with number sequences in functional style.
Here is a translation of your code to a more functional style:
def genSequence(numItems: Int): List[Int] = {
genSequenceR(numItems, 2, 2, 0, 1, List[Int](1))
}
def genSequenceR(numItems: Int, seq_no: Int, no:Int, no_nos: Int, numMade: Int, list: List[Int]): List[Int] = {
if(numMade < numItems){
if(no_nos < seq_no){
genSequenceR(numItems, seq_no, no, no_nos + 1, numMade + 1, list :+ no)
}else if(no % 2 == 0){
genSequenceR(numItems, seq_no, no + 1, 0, numMade, list)
}else{
genSequenceR(numItems, seq_no + 1, no + 1, 0, numMade, list)
}
}else{
list
}
}
genSequenceR is the recursive function that accumulates values in the list and calls itself with new values based on the conditions. Like the while loop, it terminates when numMade is no longer less than numItems, and returns the list to genSequence.
This is a fairly rudimentary functional translation of your code. It can be improved and there are better approaches typically used. I'd recommend trying to improve it with pattern matching and then work towards the other solutions that use Stream here.
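Following the suggestion to improve it with pattern matching, here is a sketch of the same state machine as a @tailrec loop that matches on the two conditions. It also builds the list in reverse with :: and reverses once at the end, because each list :+ no copies the whole list (O(n) per append, O(n²) overall). The names GenSequenceRec and loop are hypothetical.

```scala
import scala.annotation.tailrec

object GenSequenceRec {
  // Same transitions as genSequenceR above, but with O(1) prepends and a
  // single reverse, and with the branches expressed as a pattern match.
  def genSequence(numItems: Int): List[Int] = {
    require(numItems > 0, "numItems must be >= 1")

    @tailrec
    def loop(seqNo: Int, no: Int, noNos: Int, numMade: Int, acc: List[Int]): List[Int] =
      (numMade < numItems, noNos < seqNo) match {
        case (false, _)                   => acc.reverse
        case (true, true)                 => loop(seqNo, no, noNos + 1, numMade + 1, no :: acc)
        case (true, false) if no % 2 == 0 => loop(seqNo, no + 1, 0, numMade, acc)
        case (true, false)                => loop(seqNo + 1, no + 1, 0, numMade, acc)
      }

    loop(seqNo = 2, no = 2, noNos = 0, numMade = 1, acc = List(1))
  }
}
```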
Here's an attempt from a Scala tyro. Keep in mind I don't really understand Scala, I don't really understand the question, and I don't really understand your algorithm.
def genX_Ys[A](howMany : Int, ofWhat : A) : List[A] = howMany match {
case 1 => List(ofWhat)
case _ => ofWhat :: genX_Ys(howMany - 1, ofWhat)
}
def makeAtLeast(startingWith : List[Int], nextUp : Int, howMany : Int, minimumLength : Int) : List[Int] = {
if (startingWith.size >= minimumLength)
startingWith
else
makeAtLeast(startingWith ++ genX_Ys( howMany, nextUp),
nextUp +1, howMany + (if (nextUp % 2 == 1) 1 else 0), minimumLength)
}
def genSequence(numItems: Int) = makeAtLeast(List(1), 2, 2, numItems).slice(0, numItems)
This seems to work, but re-read the caveats above. In particular, I am sure there is a library function that performs genX_Ys, but I couldn't find it.
EDIT Could be
def genX_Ys[A](howMany : Int, ofWhat : A) : Seq[A] =
(1 to howMany) map { x => ofWhat }
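The standard library's List.fill does what genX_Ys does. A minimal sketch (the object name is hypothetical):

```scala
object FillExample {
  // List.fill(n)(elem) returns n copies of elem, and Nil when n <= 0.
  // The recursive genX_Ys above only stops at `case 1`, so howMany <= 0
  // would recurse until the stack overflows.
  def genX_Ys[A](howMany: Int, ofWhat: A): List[A] = List.fill(howMany)(ofWhat)
}
```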
Here is a very direct "translation" of the definition of the Golomb seqence:
val it = Iterator.iterate((1,1,Map(1->1,2->2))){ case (n,i,m) =>
val c = m(n)
if (c == 1) (n+1, i+1, m + (i -> n) - n)
else (n, i+1, m + (i -> n) + (n -> (c-1)))
}.map(_._1)
println(it.take(10).toList)
The triple (n,i,m) contains the current number n, the index i and a Map m, which records how often each n must be repeated. When the counter in the Map for our n reaches 1, we increase n (and can drop n from the map, as it is no longer needed); otherwise we just decrease n's counter in the map and keep n. In every case we add the new pair i -> n to the map, which will be used as a counter later (when a subsequent n reaches the value of the current i).
[Edit]
Thinking about it, I realized that I don't need indexes and not even a lookup (because the "counters" are already in the "right" order), which means that I can replace the Map with a Queue:
import collection.immutable.Queue
val it = 1 #:: Iterator.iterate((2, 2, Queue[Int]())){
case (n,1,q) => (n+1, q.head, q.tail.enqueue(n+1))
case (n,c,q) => (n,c-1,q.enqueue(n))
}.map(_._1).toStream
The Iterator works correctly when starting at 2, so I had to add a 1 at the beginning. The second tuple element is now the counter for the current n (taken from the Queue). The current counter could be kept in the Queue as well, leaving only a pair, but then it's less clear what's going on due to the more complicated Queue handling:
val it = 1 #:: Iterator.iterate((2, Queue[Int](2))){
case (n,q) if q.head == 1 => (n+1, q.tail.enqueue(n+1))
case (n,q) => (n, ((q.head-1) +: q.tail).enqueue(n))
}.map(_._1).toStream
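Since the answers point out that the question's imperative code and the Golomb sequence diverge, here is a small checker for the definition from the question, S(x) = the number of times x appears. It is a sketch with a hypothetical name, and it assumes a finite, non-decreasing prefix. Only values below the prefix's last value are checked, because the last value's run may be cut off.

```scala
object SelfDescribing {
  // Checks S(x) == (number of times x occurs) using 1-based S, so S(x) is
  // prefix(x - 1). Values whose S(x) lies beyond the prefix are skipped.
  def satisfiesDefinition(prefix: Seq[Int]): Boolean =
    prefix.nonEmpty && {
      val counts: Map[Int, Int] = prefix.groupBy(identity).map { case (k, vs) => k -> vs.size }
      (1 until prefix.last)
        .filter(x => x <= prefix.length)
        .forall(x => counts.getOrElse(x, 0) == prefix(x - 1))
    }
}
```

For instance, it accepts the first 25 Golomb terms and rejects the first 25 terms of the question's genSequence, where 8 appears five times although S(8) = 4.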