Given a range f.e. :
range : 1 4
there are 3 lists of pairs that 'fill' this range w/o gaps. Call it a 'chain'
chain1: 1 2, 2 3, 3 4
chain2: 1 2, 2 4
chain3: 1 3, 3 4
the following lists of pairs fail to fill the range, so it is not a chain :
1 2 <= missing 2 3, 3 4 OR 2 4
1 3 <= missing 3 4
1 2, 3 4 <= missing 2 3
1 2, 2 3 <= missing 3 4
2 3, 3 4 <= missing 1 2
Now the question is given a random list of pairs and a range how can I find is there a combination of pairs that makes a CHAIN.
Pairs do not repeat, are always (lower-value, higher-value) and never overlap i.e. this is never an input [ 1 3, 2 4 ]
In addition I can pre-filter only the pairs that are within the range, so f.e. you dont have to worry about pairs like : 4 6, 7 9, 0 1 ...
update ... this is also valid input
chain: 1 2, 1 3, 3 4
chain: 1 2, 1 3, 1 4
chain: 1 2, 2 4, 3 4
i.e a pair can subsume another .... this breaks the sort-and-loop idea
implemented the Union-Find algo from here in python ..
q=QFUF(10)
In [131]: q.is_cycle([(0,1),(1,3),(2,3),(0,3)])
Out[131]: True
In [132]: q.is_cycle([0,1,4,3,1])
Out[132]: True
only cares about first two elements of the tuple
In [134]: q.is_cycle([(0,1,0.3),(1,3,0.7),(2,3,0.4),(0,3,0.2)])
Out[134]: True
here
import numpy as np
class QFUF:
""" Detect cycles using Union-Find algorithm """
def __init__(self, n):
self.n = n
self.reset()
def reset(self): self.ids = np.arange(self.n)
def find(self, a): return self.ids[a]
def union(self, a, b):
#assign cause can be updated in the for loop
aid = self.ids[a]
bid = self.ids[b]
if aid == bid : return
for x in range(self.n) :
if self.ids[x] == aid : self.ids[x] = bid
#given next ~link/pair check if it forms a cycle
def step(self, a, b):
# print(f'{a} => {b}')
if self.find(a) == self.find(b) : return True
self.union(a, b)
return False
def is_cycle(self, seq):
self.reset()
#if not seq of pairs, make them
if not isinstance(seq[0], tuple) :
seq = zip(seq[:-1], seq[1:])
for tpl in seq :
a,b = tpl[0], tpl[1]
if self.step(a, b) : return True
return False
Related
I have a text file which contains the following data
3 5
10 20 30 40 50
0 0 0 2 5
5 10 10 10 10
Question:
first line of the file will give us number of rows and number of columns of data
print the sum of column if every element of the column is not a prime otherwise print zero
Output:
0
30
40
0
0
Explanation:
(0 because column 10 0 5 has prime number 5)
(30 because column 20 0 10 has no prime number so print 20+0+10=30) like wise apply for all columns.
suggest us the method to access the dataframe in column wise manner
General idea : Just zip every value with an index, create a pairRDD then apply a reduceByKey (the key here is the index) verifying at each step the number is a prime number.
val rows = spark.sparkContext.parallelize(
Seq(
Array(10,20,30,40,50),
Array(0,0,0,2,5),
Array(5,10,10,10,10)
)
)
def isPrime(i: Int): Boolean = i>=2 && ! ((2 until i-1) exists (i % _ == 0))
val result = rows.flatMap{arr => arr.map(Option(_)).zipWithIndex.map(_.swap)}
.reduceByKey{
case (None, _) | (_, None) => None
case (Some(a),Some(b)) if isPrime(a) | isPrime(b) => None
case (Some(a),Some(b)) => Some(a+b)
}.map{case (k,v) => k -> v.getOrElse(0)}
result.foreach(println)
Output (you'll have to collect data in order to sort by column index) :
(3,0)
(0,0)
(4,0)
(2,40)
(1,30)
I am trying to understand aggregate in Scala and with one example, i understood the logic, but the result of second one i tried confused me.
Please let me know, where i went wrong.
Code:
val list1 = List("This", "is", "an", "example");
val b = list1.aggregate(1)(_ * _.length(), _ * _)
1 * "This".length = 4
1 * "is".length = 2
1 * "an".length = 2
1 * "example".length = 7
4 * 2 = 8 , 2 * 7 = 14
8 * 14 = 112
the output also came as 112.
but for the below,
val c = list1.aggregate(1)(_ * _.length(), _ + _)
I Thought it will be like this.
4, 2, 2, 7
4 + 2 = 6
2 + 7 = 9
6 + 9 = 15,
but the output still came as 112.
It is ideally doing whatever the operation i mentioned at seqop, here _ * _.length
Could you please explain or correct me where i went wrong.?
aggregate should be used to compute only associative and commutative operations. Let's look at the signature of the function :
def aggregate[B](z: ⇒ B)(seqop: (B, A) ⇒ B, combop: (B, B) ⇒ B): B
B can be seen as an accumulator (and will be your output). You give an initial output value, then the first function is how to add a value A to this accumulator and the second is how to merge 2 accumulators. Scala "chooses" a way to aggregate your collection but if your aggregation is not associative and commutative the output is not deterministic because the order matter. Look at this example :
val l = List(1, 2, 3, 4)
l.aggregate(0)(_ + _, _ * _)
If we create one accumulator and then aggregate all the values we get 1 + 2 + 3 + 4 = 10 but if we decide to parallelize the process by splitting the list in halves we could have (1 + 2) * (3 + 4) = 21.
So now what happens in reality is that for List aggregate is the same as foldLeft which explains why changing your second function didn't change the output. But where aggregate can be useful is in Spark for example or other distributed environments where it may be useful to do the folding on each partition independently and then combine the results with the second function.
So this question is related to question Transforming matrix format, scalding
But now, I want to make the back operation. So i can make it in a such way:
Tsv(in, ('row, 'col, 'v))
.read
.groupBy('row) { _.sortBy('col).mkString('v, "\t") }
.mapTo(('row, 'v) -> ('c)) { res : (Long, String) =>
val (row, v) = res
v }
.write(Tsv(out))
But, there, we got problem with zeros. As we know, scalding skips zero values fields. So for example we got matrix:
1 0 8
4 5 6
0 8 9
In scalding format is is:
1 1 1
1 3 8
2 1 4
2 2 5
2 3 6
3 2 8
3 3 9
Using my function I wrote above we can only get:
1 8
4 5 6
8 9
And that's incorrect. So, how can i deal with it? I see two possible variants:
To find way, to add zeros (actually, dunno how to insert data)
To write own operations on own matrix format (it is unpreferable, cause I'm interested in Scalding matrix operations, and dont want to write all of them my own)
Mb there r some methods, and I can avoid skipping zeros in matrix?
Scalding stores a sparse representation of the data. If you want to output a dense matrix (first of all, that won't scale, because the rows will be bigger than can fit in memory at some point), you will need to enumerate all the rows and columns:
// First, I highly suggest you use the TypedPipe api, as it is easier to get
// big jobs right generally
val mat = // has your matrix in 'row1, 'col1, 'val1
def zero: V = // the zero of your value type
val rows = IterableSource(0 to 1000, 'row)
val cols = IterableSource(0 to 2000, 'col)
rows.crossWithTiny(cols)
.leftJoinWithSmaller(('row, 'col) -> ('row1, 'col1), mat)
.map('val1 -> 'val1) { v: V =>
if(v == null) // this value should be 0 in your type:
zero
else
v
}
.groupBy('row) {
_.toList[(Int, V)](('col, 'val1) -> 'cols)
}
.map('cols -> 'cols) { cols: List[(Int, V)] =>
cols.sortBy(_._1).map(_._2).mkString("\t")
}
.write(TypedTsv[(Int, String)]("output"))
Is there a generic way using Breeze to achieve what you can do using broadcasting in NumPy?
Specifically, if I have an operator I'd like to apply to two 3x4 matrices, I can apply that operation element-wise. However, what I have is a 3x4 matrix and a 3-element column vector. I'd like a function which produces a 3x4 matrix created from applying the operator to each element of the matrix with the element from the vector for the corresponding row.
So for a division:
2 4 6 / 2 3 = 1 2 3
3 6 9 1 2 3
If this isn't available. I'd be willing to look at implementing it.
You can use mapPairs to achieve what I 'think' you're looking for:
val adder = DenseVector(1, 2, 3, 4)
val result = DenseMatrix.zeros[Int](3, 4).mapPairs({
case ((row, col), value) => {
value + adder(col)
}
})
println(result)
1 2 3 4
1 2 3 4
1 2 3 4
I'm sure you can adapt what you want from simple 'adder' above.
Breeze now supports broadcasting of this sort:
scala> val dm = DenseMatrix( (2, 4, 6), (3, 6, 9) )
dm: breeze.linalg.DenseMatrix[Int] =
2 4 6
3 6 9
scala> val dv = DenseVector(2,3)
dv: breeze.linalg.DenseVector[Int] = DenseVector(2, 3)
scala> dm(::, *) :/ dv
res4: breeze.linalg.DenseMatrix[Int] =
1 2 3
1 2 3
The * operator says which axis to broadcast along. Breeze doesn't allow implicit broadcasting, except for scalar types.
I'm doing a scala course and one of the examples given is the sumInts function which is defined like :
def sumInts(a: Int, b: Int) : Int =
if(a > b) 0
else a + sumInts(a + 1 , b)
I've tried to understand this function better by outputting some values as its being iterated upon :
class SumInts {
def sumInts(a: Int, b: Int) : Int =
if(a > b) 0 else
{
println(a + " + sumInts("+(a + 1)+" , "+b+")")
val res1 = sumInts(a + 1 , b)
val res2 = a
val res3 = res1 + res2
println("res1 is : "+res1+", res2 is "+res2+", res3 is "+res3)
res3
}
}
So the code :
object SumIntsMain {
def main(args: Array[String]) {
println(new SumInts().sumInts(3 , 6));
}
}
Returns the output :
3 + sumInts(4 , 6)
4 + sumInts(5 , 6)
5 + sumInts(6 , 6)
6 + sumInts(7 , 6)
res1 is : 0, res2 is 6, res3 is 6
res1 is : 6, res2 is 5, res3 is 11
res1 is : 11, res2 is 4, res3 is 15
res1 is : 15, res2 is 3, res3 is 18
18
Can someone explain how these values are computed. I've tried by outputting all of the created variables but still im confused.
manual-human-tracer on:
return sumInts(3, 6) | a = 3, b = 6
3 > 6 ? NO
return 3 + sumInts(3 + 1, 6) | a = 4, b = 6
4 > 6 ? NO
return 3 + (4 + sumInts(4 + 1, 6)) | a = 5, b = 6
5 > 6 ? NO
return 3 + (4 + (5 + sumInts(5 + 1, 6))) | a = 6, b = 6
6 > 6 ? NO
return 3 + (4 + (5 + (6 + sumInts(6 + 1, 6)))) | a = 7, b = 6
7 > 6 ? YEEEEES (return 0)
return 3 + (4 + (5 + (6 + 0))) = return 18.
manual-human-tracer off.
To understand what recursive code does, it's not necessary to analyze the recursion tree. In fact, I believe it's often just confusing.
Pretending it works
Let's think about what we're trying to do: We want to sum all integers starting at a until some integer b.
a + sumInts(a + 1 , b)
Let us just pretend that sumInts(a + 1, b) actually does what we want it to: Summing the integers from a + 1 to b. If we accept this as truth, it's quite clear that our function will handle the larger problem, from a to b correctly. Because clearly, all that is missing from the sum is the additional term a, which is simply added. We conclude that it must work correctly.
A foundation: The base case
However, this sumInts() must be built on something: The base case, where no recursion is involved.
if(a > b) 0
Looking closely at our recursive call, we can see that it makes certain assumptions: we expect a to be lower than b. This implies that the sum will look like this: a + (a + 1) + ... + (b - 1) + b. If a is bigger than b, this sum naturally evaluates to 0.
Making sure it works
Seeing that sumInts() always increases a by one in the recursive call guarantees, that we will in fact hit the base case at some point.
Noticing further, that sumInts(b, b) will be called eventually, we can now verify that the code works: Since b is not greater than itself, the second case will be invoked: b + sumInts(b + 1, b). From here, it is obvious that this will evaluate to: b + 0, which means our algorithm works correctly for all values.
You mentioned the substitution model, so let's apply it to your sumInts method:
We start by calling sumInts(3,4) (you've used 6 as the second argument, but I chose 4, so I can type less), so let's substitute 3 for a and 4 for b in the definition of sumInts. This gives us:
if(3 > 4) 0
else 3 + sumInts(3 + 1, 4)
So, what will the result of this be? Well, 3 > 4 is clearly false, so the end result will be equal to the else clause, i.e. 3 plus the result of sumInts(4, 4) (4 being the result of 3+1). Now we need to know what the result of sumInts(4, 4) will be. For that we can substitute again (this time substituting 4 for a and b):
if(4 > 4) 0
else 4 + sumInts(4 + 1, 4)
Okay, so the result of sumInts(4,4) will be 4 plus the result of sumInts(5,4). So what's sumInts(5,4)? To the substitutionator!
if(5 > 4) 0
else 5 + sumInts(5 + 1, 4)
This time the if condition is true, so the result of sumInts(5,4) is 0. So now we know that the result of sumInts(4,4) must be 4 + 0 which is 4. And thus the result of sumInts(3,4) must be 3 + 4, which is 7.