Double Hashing - Hash Values outside of Hash Table Range

I have a homework assignment about double hashing and I am stuck on one point:
I have the Array: 17, 6, 5, 8, 11, 28, 14, 15
h1(k) = k mod 11,
h2(k) = 1 + (k mod 9),
Size of hash table = 11
The double hash function built from these: dh(k) = (k mod 11) + j · (1 + (k mod 9)).
Now I calculate the hash values:
h(17) = 17 mod 11 = 6 - OK
h( 6) = 6 mod 11 = 6 = collision => 6 + (1 + (6 mod 9)) = 13 = NOK
=> This is outside the range of my indices, and with every higher key it only gets higher. If I change the addition of the second hash function into a subtraction, then the hash values go negative, which is also no good.
What am I doing wrong?
Thanks
Zuzana

I think you're misinterpreting how to compute the index for a double hash. The index should be
(h1(k) + j · h2(k)) mod TableSize
So the formula you should use with those two hash functions would be
((k mod 11) + j · (1 + (k mod 9))) mod 11
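A quick sketch of that probing in Scala (the names DoubleHashDemo, probe, and insertAll are mine, not from the assignment) shows how the outer mod keeps every probe inside the table:

```scala
object DoubleHashDemo {
  val size = 11
  def h1(k: Int): Int = k % 11
  def h2(k: Int): Int = 1 + (k % 9)

  // j-th probe for key k: (h1(k) + j * h2(k)) mod table size
  def probe(k: Int, j: Int): Int = (h1(k) + j * h2(k)) % size

  // insert keys by probing j = 0, 1, 2, ... until a free slot turns up
  def insertAll(keys: Seq[Int]): Vector[Option[Int]] = {
    val table = Array.fill[Option[Int]](size)(None)
    for (k <- keys) {
      val slot = Iterator.from(0).map(probe(k, _)).find(table(_).isEmpty).get
      table(slot) = Some(k)
    }
    table.toVector
  }
}
```

With this, key 6 collides with 17 at index 6 and the next probe is (6 + 1 · 7) mod 11 = 2, a valid index.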

Related

Scala return prime numbers from Array

I'm quite new to Scala so apologies for the very basic question.
I have this great one-liner that checks if a number is prime. What I'm trying to do with it is allow the function to take in an Array and spit out the prime numbers.
How can I best achieve this? Is it possible to do so in a one liner as well? Thanks!
def isPrime(num: Int): Boolean = (2 to num) forall (x => num % x != 0)
I'm trying to do with it is allowing the function to take in an Array and spit out the prime numbers
You can do the following
def primeNumbs(numbers: Array[Int]) = numbers.filter(x => !((2 until x-1) exists (x % _ == 0)) && x > 1)
and if you pass in an array of numbers like
println(primeNumbs(Array(1,2,3,6,7,10,11)).toList)
You should be getting
List(2, 3, 7, 11)
I hope the answer is helpful
Note: your isPrime function doesn't work
You can use this method
def isPrime(num: Int): Boolean = {
  (1 to num).filter(e => num % e == 0).size == 2
}
isPrime: (num: Int)Boolean
scala> (1 to 100) filter(isPrime(_)) foreach(e=> print(e+" "))
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
Your isPrime seems completely broken.
Even if you replace to by until, it will still return strange results for 0 and 1.
Here is a very simple implementation that returns the correct results for 0 and 1, and checks only divisors smaller than (approximately) sqrt(n) instead of n:
def isPrime(n: Int) =
  n == 2 ||
  (n > 1 && (2 to (math.sqrt(n).toInt + 1)).forall(n % _ > 0))
Now you can filter primes from a range (or a list):
(0 to 10000).filter(isPrime).foreach(println)
You could also write it like this:
0 to 10000 filter isPrime foreach println
But this version with explicit lambdas probably generalizes better, even though it's not necessary in this particular case:
(0 to 10000).filter(n => isPrime(n)).foreach(n => println(n))
I understand that the prime function may be the objective of your assignment/task/interest, but note that it's already available in the JVM as BigInteger.isProbablePrime(). With that, and the fact that Scala can call Java transparently, try the following filter:
import java.math.BigInteger
val r = (1 to 100).filter { BigInteger.valueOf(_).isProbablePrime(25) }.mkString(", ")
// "2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97"
This works by iterating over the range of numbers (or your Array, or any TraversableOnce, same syntax) and letting through only those numbers (the "_" in the closure) that fulfill the condition, i.e. that are prime. And instead of folding with a string concatenation, there's a convenient helper, mkString, that inserts a separator into a sequence and produces a String for you.
And don't worry about the "probable" prime here. For small numbers like these, there's no probability involved, despite the method name. That only kicks in for numbers with maybe 30+ digits or so.
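Applied to the Array from the question (my adaptation of the snippet above, same filter, different input):

```scala
import java.math.BigInteger

// Same isProbablePrime-based filter, but over the asker's Array.
// valueOf takes a Long; the Int elements widen automatically.
val primes = Array(1, 2, 3, 6, 7, 10, 11).filter { n =>
  BigInteger.valueOf(n).isProbablePrime(25)
}
println(primes.toList) // List(2, 3, 7, 11)
```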

Percentile calculator

I have been trying to write a small method that calculates a given percentile from a seq. It almost works, and the problem is that I don't know why it doesn't work fully. I was hoping one of you could help me with it.
What I want is for it to return the item from the seq such that p percent of the seq is smaller than or equal to the returned value.
def percentile[Int](p: Int)(seq: Seq[Int]) = {
  require(0 <= p && p <= 100) // some value requirements
  require(!seq.isEmpty)       // more value requirements
  val sorted = seq.sorted
  val k = math.ceil((seq.length - 1) * (p / 100)).toInt
  return sorted(k)
}
So for example if I have
val v = Vector(7, 34, 39, 18, 16, 17, 21, 36, 17, 2, 4, 39, 4, 19, 2, 12, 35, 13, 40, 37)
and I call my function as percentile(11)(v), the return value is 2. However, only 10% of the vector is smaller than or equal to 2, not 11% as requested. percentile(11)(v) should return 4.
Your error is in this row:
val k = math.ceil((seq.length - 1) * (p / 100)).toInt
and particularly here: p / 100. Since p is an Int with 0 <= p <= 100, p / 100 will always be 0 (or 1 when p == 100). If you want a floating-point result, you have to widen one of the two operands to Double: p / 100.0
val k = math.ceil((seq.length - 1) * (p / 100.0)).toInt
On a side note: you don't need the [Int] type parameter (it declares a new type parameter named Int that shadows scala.Int).
The problem is with the part p / 100 in
val k = math.ceil((seq.length - 1) * (p / 100)).toInt
Since p is of type Int and 100 is also an Int, the division is an integer division that returns an Int. If either p or 100 is a Double, the result will be a Double.
The easiest fix would be to change that part in p / 100.0.
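Putting the two answers together, a corrected version of the function might look like this (the asker's code with both fixes applied: Double division and no shadowing type parameter):

```scala
// Corrected percentile: p / 100.0 forces floating-point division,
// and the shadowing [Int] type parameter is dropped.
def percentile(p: Int)(seq: Seq[Int]): Int = {
  require(0 <= p && p <= 100) // p must be a percentage
  require(seq.nonEmpty)       // need at least one element
  val sorted = seq.sorted
  val k = math.ceil((seq.length - 1) * (p / 100.0)).toInt
  sorted(k)
}

val v = Vector(7, 34, 39, 18, 16, 17, 21, 36, 17, 2, 4, 39, 4, 19, 2, 12, 35, 13, 40, 37)
percentile(11)(v) // 4, as expected
```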

Spark Scala: How to work with each 3 elements of rdd?

Hi everyone.
I have the following problem:
I have a very big RDD with billions of elements, like:
Array[((Int, Int), Double)] = Array(((0,0),729.0), ((0,1),169.0), ((0,2),1.0), ((0,3),5.0), ...... ((34,45),34.0), .....)
I need to do the following operation:
take the value of each element with key (i, j) and add to it
min(rdd_value[(i-1, j)], rdd_value[(i, j-1)], rdd_value[(i-1, j-1)])
How can I do this without using collect()? After collect() I get a Java memory error, as my RDD is very big.
Thank you very much!
I am trying to port this algorithm from Python, for the case where the time series are RDDs.
from math import sqrt

def DTWDistance(s1, s2):
    DTW = {}
    for i in range(len(s1)):
        DTW[(i, -1)] = float('inf')
    for i in range(len(s2)):
        DTW[(-1, i)] = float('inf')
    DTW[(-1, -1)] = 0
    for i in range(len(s1)):
        for j in range(len(s2)):
            dist = (s1[i] - s2[j])**2
            DTW[(i, j)] = dist + min(DTW[(i-1, j)], DTW[(i, j-1)], DTW[(i-1, j-1)])
    return sqrt(DTW[len(s1)-1, len(s2)-1])
And now I should perform the last operation, the nested for loop. The dist values are already calculated.
Example:
Input (like matrix):
4 5 1
7 2 3
9 0 1
The RDD looks like
rdd.take(10)
Array(((1,1), 4), ((1,2), 5), ((1,3), 1), ((2,1), 7), ((2,2), 2), ((2,3), 3), ((3,1), 9), ((3,2), 0), ((3,3), 1))
I want to do this operation
rdd_value[(i, j)] = rdd_value[(i, j)] + min(rdd_value[(i-1, j)],rdd_value[(i, j-1)], rdd_value[(i-1, j-1)])
For example:
((1, 1), 4) = 4 + min(infinity, infinity, 0) = 4 + 0 = 4
4 5 1
7 2 3
9 0 1
Then
((1, 2), 5) = 5 + min(infinity, 4, infinity) = 5 + 4 = 9
4 9 1
7 2 3
9 0 1
Then
....
Then
((2, 2), 2) = 2 + min(7, 9, 4) = 2 + 4 = 6
4 9 1
7 6 3
9 0 1
Then
.....
((3, 3), 1) = 1 + min(3, 0, 2) = 1 + 0 = 1
A short answer is that the problem you are trying to solve cannot be efficiently and concisely expressed using Spark. It doesn't really matter if you choose plain RDDs or distributed matrices.
To understand why, you'll have to think about the Spark programming model. A fundamental Spark concept is the graph of dependencies, where each RDD depends on one or more parent RDDs. If your problem was defined as follows:
given an initial matrix M0
for i <- 1..n
find matrix Mi where Mi(m,n) = Mi-1(m,n) + min(Mi-1(m-1,n), Mi-1(m-1,n-1), Mi-1(m,n-1))
then it would be trivial to express using Spark API (pseudocode):
rdd
.flatMap(lambda ((i, j), v):
[((i + 1, j), v), ((i, j + 1), v), ((i + 1, j + 1), v)])
.reduceByKey(min)
.union(rdd)
.reduceByKey(add)
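To see what that pipeline computes, the same four steps can be mimicked on a plain Scala Map (illustrative only; groupBy plus map stands in for reduceByKey, and no Spark is needed):

```scala
// Plain-Scala mock of the flatMap / reduceByKey(min) / union /
// reduceByKey(add) pipeline above, on a tiny 2x2 example.
val rdd = Map((0, 0) -> 729.0, (0, 1) -> 169.0, (1, 0) -> 1.0, (1, 1) -> 5.0)

// each cell offers its value to its three "successor" cells
val shifted = rdd.toSeq.flatMap { case ((i, j), v) =>
  Seq(((i + 1, j), v), ((i, j + 1), v), ((i + 1, j + 1), v))
}
// reduceByKey(min): keep the smallest offer per cell
val mins = shifted.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).min }
// union + reduceByKey(add): each original value plus the min of its predecessors
val next = (rdd.toSeq ++ mins.toSeq)
  .groupBy(_._1)
  .map { case (k, vs) => k -> vs.map(_._2).sum }
  .filter { case (k, _) => rdd.contains(k) } // drop cells outside the matrix
// next((1, 1)) == 5.0 + min(729.0, 169.0, 1.0) == 6.0
```

Note this computes one whole matrix from the previous one, which is exactly the "trivial" iterative formulation above, not the in-place DTW recurrence from the question.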
Unfortunately, you are trying to express dependencies between individual values in the same data structure. Spark aside, it is a problem which is much harder to parallelize, not to mention distribute.
This type of dynamic programming is hard to parallelize because at different points it is completely or almost completely sequential. When you try to compute, for example, Mi(0,0) or Mi(m,n), there is nothing to parallelize. It is hard to distribute because it can generate complex dependencies between blocks.
There are non-trivial ways to handle this in Spark, by computing individual blocks and expressing dependencies between these blocks, or by using iterative algorithms and propagating messages over an explicit graph (GraphX), but this is far from easy to do right.
At the end of the day, there are tools which can be a much better choice for this type of computation than Spark.

Complexity estimation for simple recursive algorithm

I wrote some code in Scala, and now I want to estimate its time and memory complexity.
Problem statement
Given a positive integer n, find the least number of perfect square numbers (for example, 1, 4, 9, 16, ...) which sum to n.
For example, given n = 12, return 3 because 12 = 4 + 4 + 4; given n = 13, return 2 because 13 = 4 + 9.
My code
def numSquares(n: Int): Int = {
  import java.lang.Math._
  def traverse(n: Int, ns: Int): Int = {
    val max = ((num: Int) => {
      val sq = sqrt(num)
      // a perfect square!
      if (sq == floor(sq))
        num.toInt
      else
        sq.toInt * sq.toInt
    })(n)
    if (n == max)
      ns + 1
    else
      traverse(n - max, ns + 1)
  }
  traverse(n, 0)
}
I use a recursive solution here. So IMHO the time complexity is O(n), because I need to traverse over the sequence of numbers using recursion. Am I right? Have I missed anything?

Understanding the scala substitution model through the use of sumInts method

I'm doing a Scala course and one of the examples given is the sumInts function, which is defined like:
def sumInts(a: Int, b: Int): Int =
  if (a > b) 0
  else a + sumInts(a + 1, b)
I've tried to understand this function better by outputting some values as it's being iterated upon:
class SumInts {
  def sumInts(a: Int, b: Int): Int =
    if (a > b) 0
    else {
      println(a + " + sumInts(" + (a + 1) + " , " + b + ")")
      val res1 = sumInts(a + 1, b)
      val res2 = a
      val res3 = res1 + res2
      println("res1 is : " + res1 + ", res2 is " + res2 + ", res3 is " + res3)
      res3
    }
}
So the code :
object SumIntsMain {
  def main(args: Array[String]) {
    println(new SumInts().sumInts(3, 6))
  }
}
Returns the output :
3 + sumInts(4 , 6)
4 + sumInts(5 , 6)
5 + sumInts(6 , 6)
6 + sumInts(7 , 6)
res1 is : 0, res2 is 6, res3 is 6
res1 is : 6, res2 is 5, res3 is 11
res1 is : 11, res2 is 4, res3 is 15
res1 is : 15, res2 is 3, res3 is 18
18
Can someone explain how these values are computed? I've tried outputting all of the created variables, but I'm still confused.
manual-human-tracer on:
return sumInts(3, 6) | a = 3, b = 6
3 > 6 ? NO
return 3 + sumInts(3 + 1, 6) | a = 4, b = 6
4 > 6 ? NO
return 3 + (4 + sumInts(4 + 1, 6)) | a = 5, b = 6
5 > 6 ? NO
return 3 + (4 + (5 + sumInts(5 + 1, 6))) | a = 6, b = 6
6 > 6 ? NO
return 3 + (4 + (5 + (6 + sumInts(6 + 1, 6)))) | a = 7, b = 6
7 > 6 ? YEEEEES (return 0)
return 3 + (4 + (5 + (6 + 0))) = return 18.
manual-human-tracer off.
To understand what recursive code does, it's not necessary to analyze the recursion tree. In fact, I believe it's often just confusing.
Pretending it works
Let's think about what we're trying to do: We want to sum all integers starting at a until some integer b.
a + sumInts(a + 1 , b)
Let us just pretend that sumInts(a + 1, b) actually does what we want it to: Summing the integers from a + 1 to b. If we accept this as truth, it's quite clear that our function will handle the larger problem, from a to b correctly. Because clearly, all that is missing from the sum is the additional term a, which is simply added. We conclude that it must work correctly.
A foundation: The base case
However, this sumInts() must be built on something: The base case, where no recursion is involved.
if(a > b) 0
Looking closely at our recursive call, we can see that it makes certain assumptions: we expect a to be lower than b. This implies that the sum will look like this: a + (a + 1) + ... + (b - 1) + b. If a is bigger than b, this sum naturally evaluates to 0.
Making sure it works
Seeing that sumInts() always increases a by one in the recursive call guarantees, that we will in fact hit the base case at some point.
Noticing further that sumInts(b, b) will be called eventually, we can now verify that the code works: since b is not greater than itself, the second case will be invoked: b + sumInts(b + 1, b). From here, it is obvious that this evaluates to b + 0, which means our algorithm works correctly for all values.
You mentioned the substitution model, so let's apply it to your sumInts method:
We start by calling sumInts(3,4) (you've used 6 as the second argument, but I chose 4, so I can type less), so let's substitute 3 for a and 4 for b in the definition of sumInts. This gives us:
if(3 > 4) 0
else 3 + sumInts(3 + 1, 4)
So, what will the result of this be? Well, 3 > 4 is clearly false, so the end result will be equal to the else clause, i.e. 3 plus the result of sumInts(4, 4) (4 being the result of 3+1). Now we need to know what the result of sumInts(4, 4) will be. For that we can substitute again (this time substituting 4 for a and b):
if(4 > 4) 0
else 4 + sumInts(4 + 1, 4)
Okay, so the result of sumInts(4,4) will be 4 plus the result of sumInts(5,4). So what's sumInts(5,4)? To the substitutionator!
if(5 > 4) 0
else 5 + sumInts(5 + 1, 4)
This time the if condition is true, so the result of sumInts(5,4) is 0. So now we know that the result of sumInts(4,4) must be 4 + 0 which is 4. And thus the result of sumInts(3,4) must be 3 + 4, which is 7.
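As a side note, the substitution trace above also shows that the pending additions 3 + (4 + (5 + ...)) pile up until the base case is reached, one stack frame per call. An accumulator-passing variant does the addition before recursing, so the call is in tail position and Scala compiles it to a loop. A sketch (sumIntsTail and loop are made-up names, not from the course):

```scala
import scala.annotation.tailrec

// Accumulator-passing sumInts: the running total is carried along,
// so each recursive call is in tail position and reuses the frame.
def sumIntsTail(a: Int, b: Int): Int = {
  @tailrec
  def loop(current: Int, acc: Int): Int =
    if (current > b) acc
    else loop(current + 1, acc + current)
  loop(a, 0)
}

sumIntsTail(3, 6) // 3 + 4 + 5 + 6 = 18, same result as the traced version
```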