identify sequence of numbers in Scala - scala

I was wondering what would be the logic to identify from a list of numbers a pattern of consecutive numbers and max of that in Scala. For example, if
val x = List(1,2,3,8,15,26)
Then the output of the function should be
val y = List(3,8,15,26) where 3 is the max of 1,2,3 which is a sequence of consecutive numbers. 8,15,26 are not consecutive and hence those numbers are unaltered.
The position does not matter, meaning, I can sort the list and then identify the sequences.
How to approach this problem?

After x is sorted you could do something like this.
(x :+ x.last).sliding(2).filter(p => p(0) != p(1)-1).map(_(0)).toList

Related

Using `quick-look` to find certain element

First of all, I want to say that this is a school assignment and I am only seeking for some guidance.
My task was to write an algorithm that finds the k:th smallest element in a seq using quickselect. This should be easy enough but when running some tests I hit a wall. For some reason if I use input (List(1, 1, 1, 1), 1) it goes into infinite loop.
Here is my implementation:
val rand = new scala.util.Random()
def find(seq: Seq[Int], k: Int): Int = {
require(0 <= k && k < seq.length)
val a: Array[Int] = seq.toArray[Int] // Can't modify the argument sequence
val pivot = rand.nextInt(a.length)
val (low, high) = a.partition(_ < a(pivot))
if (low.length == k) a(pivot)
else if (low.length < k) find(high, k - low.length)
else find(low, k)
}
For some reason (or because I am tired) I cannot spot my mistake. If someone could hint me where I go wrong I would be pleased.
Basically you are depending on this line - val (low, high) = a.partition(_ < a(pivot)) to split the array into 2 arrays. The first one containing the continuous sequence of elements smaller than pivot-element and the second contains the rest.
Then you say that if the first array has length k that means you have already seen k elements smaller that your pivot-element. Which means pivot-element is actually k+1th smallest and you are actually returning k+1th smallest element instead of kth. This is your first mistake.
Also... A greater problem occurs when you have all elements which are same because your first array will always have 0 elements.
Not only that... your code will give you wrong answer for inputs where you have repeating elements among k smallest ones like - (1, 3, 4, 1, 2).
The solution lies in obervation that in the sequence (1, 1, 1, 1) the 4th smallest element is the 4th 1. Meaning you have to use <= instead of <.
Also... Since the partition function will not split the array until your boolean condition is false, you can not use partition for achieving this array split. you will have to write the split yourself.

How to calculate median over RDD[org.apache.spark.mllib.linalg.Vector] in Spark efficiently?

What I want to do like this:
http://cn.mathworks.com/help/matlab/ref/median.html?requestedDomain=www.mathworks.com
Find the median value of each column.
It can be done by collecting the RDD to driver, for a big data which will become impossible.
I know Statistics.colStats() can calculate mean, variance... but median is not included.
Additionally, the vector is high-dimensional and sparse.
Well I didn't understand the vector part, however this is my approach (I bet there are better ones):
val a = sc.parallelize(Seq(1, 2, -1, 12, 3, 0, 3))
val n = a.count() / 2
println(n) // outputs 3
val b = a.sortBy(x => x).zipWithIndex()
val median = b.filter(x => x._2 == n).collect()(0)._1 // this part doesn't look nice, I hope someone tells me how to improve it, maybe zero?
println(median) // outputs 2
b.collect().foreach(println) // (-1,0) (0,1) (1,2) (2,3) (3,4) (3,5) (12,6)
The trick is to sort your dataset using sortBy, then zip the entries with their index using zipWithIndex and then get the middle entry, note that I set an odd number of samples, for simplicity but the essence is there, besides you have to do this with every column of your dataset.

Scala: Issue w/ sorting

Sort of new to Scala.. Say I have an array:
1,2,3,4,5,6,7,8,9,10
And would like to get back all the numbers from 6
How would I achieve this in Scala?
I may be misunderstanding your request. Do you want all the numbers from 6 to 10? If so,
val nums = List(1,2,3,4,5,6,7,8,9,10)
nums.filter(_ >= 6)
You can do it for example like this:
val l = List(1,2,3,4,5,6,7,8,9,10)
l.sortBy(num => Math.abs(num - 6))
Take a look at documentation of sortBy method of List: http://www.scala-lang.org/api/2.10.3/index.html#scala.collection.immutable.List
sortBy takes as argument a function which defines order. In our case the ordering function takes single argument which is num and maps it to distance from number 6. Distance is computed as absolute value of 6 substracted from the given number.

Why mapped pairs get obliterated?

I'm trying to understand the example here which computes Jaccard similarity between pairs of vectors in a matrix.
val aBinary = adjacencyMatrix.binarizeAs[Double]
// intersectMat holds the size of the intersection of row(a)_i n row (b)_j
val intersectMat = aBinary * aBinary.transpose
val aSumVct = aBinary.sumColVectors
val bSumVct = aBinary.sumRowVectors
//Using zip to repeat the row and column vectors values on the right hand
//for all non-zeroes on the left hand matrix
val xMat = intersectMat.zip(aSumVct).mapValues( pair => pair._2 )
val yMat = intersectMat.zip(bSumVct).mapValues( pair => pair._2 )
Why does the last comment mention non-zero values? As far as I'm aware, the ._2 function selects the second element of a pair independent of the first element. At what point are (0, x) pairs obliterated?
Yeah, I don't know anything about scalding but this seems odd. If you look at zip implementation it mentions specifically that it does an outer join to preserve zeros on either side. So it does not seem that the comment applies to how zeroes are actually treated in matrix.zip.
Besides looking at the dimension returned by zip, it really seems this line just replicates the aSumVct column vector for each column:
val xMat = intersectMat.zip(aSumVct).mapValues( pair => pair._2 )
Also I find the val bSumVct = aBinary.sumRowVectors suspicious, because it sums the matrix along the wrong dimension. It feels like something like this would be better:
val bSumVct = aBinary.tranpose.sumRowVectors
Which would conceptually be the same as aSumVct.transpose, so that at the end of the day, in the cell (i, j) of xMat + yMat we find the sum of elements of row(i) plus the sum of elements of row(j), then we subtract intersectMat to adjust for the double counting.
Edit: a little bit of googling unearthed this blog post: http://www.flavianv.me/post-15.htm. It seems the comments were related to that version where the vectors to compare are in two separate matrices that don't necessarily have the same size.

How does this permutations function work (Scala)?

I'm looking at Pavel's solution to Project Euler question 24, but can't quite work out how this function works - can someone explain what it's doing? Its purpose is to return the millionth lexicographic permutation of the digits 0 to 9.
def ps(s: String): Seq[String] = if(s.size == 1) Seq(s) else
s.flatMap(c => ps(s.filterNot(_ == c)).map(c +))
val r = ps("0123456789")(999999).toLong
I understand that when the input String is length 1, the function returns that character as a Seq, and I think then what happens is that it's appended to the only other character that was left, but I can't really visualise how you get to that point, or why this results in a list of permutations.
(I already solved the problem myself but used the permutations method, which makes it a fairly trivial 1-liner, but would like to be able to understand the above.)
For each letter (flatMap(c => ...)) of the given string s, ps generates a permutation by permutating the remaining letters ps(s.filterNot(_ == c)) and prepending the taken letter in front of this permutation (map(c +)). For the trivial case of a one letter string, it does nothing (if(s.size == 1) Seq(s)).
Edit: Why does this work?
Let’s begin with shuffling a one letter string:
[a]
-> a # done.
Now for two letters, we split the task into subtasks. Take each character in the set, put it to the first position and permutate the rest.
a [b]
-> b
b [a]
-> a
For three letters, it’s the same. Take each character and prepend it to each of the sub-permutations of the remaining letters.
a [b c]
-> b [c]
-> c
-> c [b]
-> b
b [a c]
-> a [c]
-> c
-> c [a]
# ... and so on
So, basically the outermost function guarantees that each letter gets to the first position, the first recursive call guarantees the same for the second position and so on.
Let's write it out in pseudocode:
for each letter in the string
take that letter out
find all permutations of what remains
stick that letter on the front
Because it's working for each letter in the string, what this effectively does is move each letter in turn to the front of the string (which means that the first letter can be any of the letters present, which is what you need for a permutation). Since it works recursively, the remainder is every remaining permutation.
Note that this algorithm assumes all the letters are different (because filterNot is used to remove the selected letter); the permutations method in the collections library does not assume this.
Unrelated to this, but you might be interested in knowing you can compute the millionth lexicographical permutation without computing any of the previous ones.
The idea is pretty simple: for N digits, there are N! permutations. That means 10 digits can yield 3628800 permutations, 9 digits can yield 362880 permutations, and so on. Given that information, we can compute the following table:
First digit First Permutation Last Permutatation
0 1 362880
1 362881 725760
2 725761 1088640
3 1088641 1451520
4 1451521 1814400
5 1814401 2177280
6 2177281 2540160
7 2540161 2903040
8 2903041 3265920
9 3265921 3628800
So the first digit will be 2, because that's the range in which 1000000 is. Or, to put it more simply, the first digit is the one at the index (1000000 - 1) / fat(9). So you just need to apply that recursively:
def fat(n: Int) = (2 to n).foldLeft(1)(_*_)
def permN(digits: String, n: Int): String = if (digits.isEmpty) "" else {
val permsPerDigit = fat(digits.length - 1)
val index = (n - 1) / permsPerDigit
val firstDigit = digits(index)
val remainder = digits filterNot (firstDigit ==)
firstDigit + permN(remainder, n - index * permsPerDigit)
}