Scala: return matrix of average pixels

Here's the thing: I want to modify (and then return) a matrix of integers that is passed as a parameter to the function. The function average (of the class MatrixMotionBlur) gives the average of a pixel and its neighbours at (x-1, y), (x, y-1) and (x, y+1). It follows this formula:
result(x, y) = (M1(x, y)+M1(x-1, y)+M1(x, y-1)+M1(x, y+1)) / 4
This is the code I've implemented so far:
MatrixMotionBlur - Average function
MotionBlurSingleThread - run
The objective here is to apply the "average" method to alter the matrix values and return that matrix. The problem is that the program gives me an error when I try to insert the value into the matrix.
Any ideas how to do this?

The functional way:
val updatedData = data.zipWithIndex.map { case (row, i) =>
  row.indices.map { j =>
    mx.average(i, j) // assumes average takes the row and column index
  }
}
Note that Seq is an immutable collection type, so you can't modify it in place; you can only create a new, modified collection.
By the way, why do you iterate starting from 1 rather than 0? Are you sure that's what you want?

Related

Random pivot selection for quicksort not working

I am trying to choose a random index for quicksort, but for some reason the array is not being sorted. In fact, the algorithm sometimes returns a different array (e.g. the input [2,1,4] produces the output [1,1,4]). Any help would be much appreciated. This algorithm works if, instead of choosing a random index, I always choose the first element of the array as the pivot.
from random import randint
def quicksort(array):
    if len(array) < 2:
        return array
    else:
        random_pivot_index = randint(0, len(array) - 1)
        pivot = array[random_pivot_index]
        less = [i for i in array[1:] if i <= pivot]
        greater = [i for i in array[1:] if i > pivot]
        return quicksort(less) + [pivot] + quicksort(greater)
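less = [i for i in array[1:] if i <= pivot]
Both list comprehensions iterate over array[1:], which always skips the first element rather than the element at random_pivot_index. Unless the random index happens to be 0, the pivot is still in that slice, and because less keeps elements equal to the pivot, the pivot ends up in less.
But here, you also include the pivot value explicitly in the result:
return quicksort(less) + [pivot] + quicksort(greater)
So the pivot appears twice and array[0] is lost, which is how [2,1,4] becomes [1,1,4] when the pivot index is 1.
Instead, build the partitions from every element except the one at the pivot index:
rest = array[:random_pivot_index] + array[random_pivot_index + 1:]
and filter rest instead of array[1:], keeping the return statement as it is.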
Incidentally, though this does divide-and-conquer in the same way as QuickSort does, it's not really an implementation of that algorithm: Actual QuickSort sorts the elements in place - your version will suffer from the run-time overhead associated with allocating and concatenating the utility arrays.
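For comparison, here is a minimal sketch of an in-place variant with a random pivot. It uses the Lomuto partition scheme, which is one common choice and not necessarily what the asker had in mind; the function name quicksort_in_place and the lo/hi parameters are illustrative.

```python
import random

def quicksort_in_place(array, lo=0, hi=None):
    """Sort array[lo..hi] in place using a randomly chosen pivot."""
    if hi is None:
        hi = len(array) - 1
    if lo >= hi:
        return
    # Move a randomly chosen pivot to the end of the range.
    pivot_index = random.randint(lo, hi)
    array[pivot_index], array[hi] = array[hi], array[pivot_index]
    pivot = array[hi]
    # Lomuto partition: everything left of `store` is <= pivot.
    store = lo
    for i in range(lo, hi):
        if array[i] <= pivot:
            array[i], array[store] = array[store], array[i]
            store += 1
    # Put the pivot into its final position.
    array[store], array[hi] = array[hi], array[store]
    # Recurse on the two sides, excluding the pivot itself.
    quicksort_in_place(array, lo, store - 1)
    quicksort_in_place(array, store + 1, hi)
```

The list is modified directly and nothing is returned, so no temporary lists are allocated or concatenated.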

Make absolute work inside filtering in Scala

I want to return a percentage of results from a dataset. Being a noob in Scala, tried the following
ds.filter(abs(hash(col("source"))) % 100 < percentage)
but getting abs cannot be applied to (org.apache.spark.sql.Column). I don't want to sample it, I want to return based on the hash of a column so that it's deterministic even when dataset changes.
This works just fine:
ds.filter(abs(hash(col("source"))) % 100 < percentage)
Probably you have multiple abs functions in your namespace (e.g. from imports like import math._). To be sure, use the fully qualified name:
ds.filter(org.apache.spark.sql.functions.abs(hash(col("source"))) % 100 < percentage)
But I think this will not guarantee that you get the exact percentage, because hash values may not be evenly distributed. Think about a dataframe with only one unique value of source: the hash values will all be the same, so you get either all records or none. To get the exact percentage, you would need something like:
val newDF = df
  .withColumn("rnb",row_number().over(Window.orderBy($"source"))) // or order by hash if you wish
  .withColumn("count",count("*").over())
  .where($"rnb" < lit(fraction)*$"count")

SCALA: Function for Square root of BigInt

I searched the internet for a function to find the exact square root of a BigInt in Scala. I didn't find one, but I saw a Java program and converted that function into a Scala version. It works, but I am not sure whether it can handle very large BigInt values. It also returns only a BigInt, not a BigDecimal, as the square root. The code does some bit manipulation with hardcoded numbers like shiftRight(5), BigInt("8") and shiftRight(1). I understand the logic clearly, but not the hardcoding of these bit-shift numbers and the number 8. Maybe these bit-shift functions are not available in Scala, and that's why the Java code converts to BigInteger in a few places. These hardcoded numbers may impact the precision of the result. I just translated the Java code into Scala, copying the exact algorithm. Here is the code I have written in Scala:
def sqt(n: BigInt): BigInt = {
  var a = BigInt(1)
  var b = (n >> 5) + BigInt(8)
  while ((b - a) >= 0) {
    val mid: BigInt = (a + b) >> 1
    if (mid * mid - n > 0) b = mid - 1
    else a = mid + 1
  }
  a - 1
}
My points are:
Can't we return a BigDecimal instead of BigInt? How can we do that?
How these hardcoded numbers shiftRight(5), shiftRight(1) and 8 are related to precision of the result.
I tested one number in the Scala REPL. The function sqt gives the exact square root of a perfect square, but not of the original number, as shown below:
scala> sqt(BigInt("19928937494873929279191794189"))
res9: BigInt = 141169888768369
scala> res9*res9
res10: scala.math.BigInt = 19928937494873675935734920161
scala> sqt(res10)
res11: BigInt = 141169888768369
scala>
I understand that shiftRight(5) means dividing by 2^5, i.e. by 32 in decimal, and so on. But why is 8 added after the shift operation? Why exactly 5 shifts as a first guess?
Your question 1 and question 3 are actually the same question.
How [do] these bitshifts impact [the] precision of the result?
They don't.
How [are] these hardcoded numbers ... related to precision of the result?
They aren't.
There are many different methods/algorithms for estimating/calculating the square root of a number (as can be seen here). The algorithm you've posted appears to be a pretty straight forward binary search.
1. Pick a number a that is guaranteed to be no larger than the target (square root of n).
2. Pick a number b that is guaranteed to be no smaller than the target (square root of n).
3. Calculate mid, the whole number mid-point between a and b.
4. If mid is larger than the target then move b to mid (-1 because we know it's too large).
5. If mid is smaller than (or equal to) the target then move a to mid (+1 so the search keeps advancing; the final a-1 compensates for this).
6. Repeat 3,4,5 until a is greater than b.
7. Return a-1 as the square root of n rounded down to a whole number.
The bitshifts and hardcoded numbers are used in selecting the initial value of b. But b only has to be at least as large as the target. We could have just done var b = n. Why all the bother?
It's all about efficiency. The closer b is to the target, the fewer iterations are needed to find the result. Why add 8 after the shift? Because 31>>5 is zero, which is not greater than the target. The author chose (n>>5)+8 but he/she might have chosen (n>>7)+12. There are trade-offs.
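To make the steps concrete, here is a minimal sketch of the same binary search ported to Python, whose built-in integers are arbitrary precision. The function name isqrt_binary_search is illustrative; the initial bound (n >> 5) + 8 is copied from the question's code.

```python
def isqrt_binary_search(n):
    """Floor of the square root of a non-negative integer n."""
    a = 1                 # lower end of the search range
    b = (n >> 5) + 8      # upper end, never below sqrt(n)
    while a <= b:
        mid = (a + b) >> 1        # whole-number mid-point
        if mid * mid > n:
            b = mid - 1           # mid is too large
        else:
            a = mid + 1           # mid is small enough, look higher
    return a - 1                  # last value whose square did not exceed n

root = isqrt_binary_search(19928937494873929279191794189)
print(root)  # 141169888768369, matching the REPL session in the question
```

Replacing the initial bound with b = n would give the same results; it would only take more iterations.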
Can't we return a BigDecimal instead of BigInt? How can we do that?
Here's one way to do that.
def sqt(n: BigInt): BigDecimal = {
  val d = BigDecimal(n)
  var a = BigDecimal(1.0)
  var b = d
  while (b - a >= 0) {
    val mid = (a + b) / 2
    if (mid * mid - d > 0) b = mid - 0.0001 // adjust down
    else a = mid + 0.0001 // adjust up
  }
  b
}
There are better algorithms for calculating floating-point square root values. In this case you get better precision by using smaller adjustment values but the efficiency gets much worse.
Can't we return a BigDecimal instead of BigInt? How can we do that?
This makes no sense if you want exact roots: if a BigInt's square root can be represented exactly by a BigDecimal, it can be represented by a BigInt. If you don't want exact roots, you'll need to specify precision and modify the algorithm (and for most cases, Double will be good enough and much much much faster than BigDecimal).
I understand shiftRight(5) means divide by 2^5 ie.by 32 in decimal and so on..but why 8 is added here after shift operation? why exactly 5 shifts? as a first guess?
These aren't the only options. The point is that for every positive n, n/32 + 8 >= sqrt(n) (where sqrt is the mathematical square root). This is easiest to show by a bit of calculus (or just by building a graph of the difference). So at the start we know a <= sqrt(n) <= b (unless n == 0 which can be checked separately), and you can verify this remains true on each step.
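As an alternative to the calculus argument, the bound can also be checked with the inequality of arithmetic and geometric means, x + y >= 2*sqrt(x*y) for non-negative x and y. This is a sketch for real n >= 0; the integer shift n >> 5 rounds down by less than 1, which still leaves the integer bound at or above the rounded-down square root.

```latex
\frac{n}{32} + 8 \;\ge\; 2\sqrt{\frac{n}{32}\cdot 8}
  \;=\; 2\sqrt{\frac{n}{4}}
  \;=\; \sqrt{n},
\qquad \text{with equality exactly when } \frac{n}{32} = 8,\ \text{i.e. } n = 256 .
```

At n = 256 both sides equal 16, so the bound is tight there and strictly larger everywhere else.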

Find value in vector "p" that corresponds to maximum value in vector "r = f(p)"

As simple as in title. I have nx1 sized vector p. I'm interested in the maximum value of r = p/foo - floor(p/foo), with foo being a scalar, so I just call:
max_value = max(p/foo-floor(p/foo))
How can I get which value of p gave out max_value?
I thought about calling:
[max_value, max_index] = max(p/foo-floor(p/foo))
but I soon realised that max_index is pretty useless. Sorry for asking this, I'm a real beginner here.
Having broken the issue into pieces, I realized there's no unique correspondence between the values in p and the values in my related vector p/foo-floor(p/foo), so this is a logical issue rather than a language one.
However, given my input data, I know that the solution is unique. How can I fix this?
I ended up doing:
result = p(p/foo-floor(p/foo) == max(p/foo-floor(p/foo)))
Looks terrible, so if you know any other way...
Once you have the index, use it:
result = p(max_index)
You can create a new vector with your, let's say, "transformed" values:
p2 = (p/foo-floor(p/foo))
and then just use find to locate the max values in p2:
max_index = find(p2 == max(p2))
That will return the index or indices of p2 that hold the max value of that operation. Finally, just look up the original value in p:
p(max_index)
In one line, this is:
p(find((p/foo-floor(p/foo) == max((p/foo-floor(p/foo))))))
which is basically the same thing you did in the end :)

Why are products called minterms and sums called maxterms?

Do they have a reason for doing so? I mean, in the sum of minterms, you look for the terms with the output 1; I don't get why they call it "minterms." Why not maxterms because 1 is well bigger than 0?
Is there a reason behind this that I don't know? Or should I just accept it without asking why?
The convention for calling these terms "minterms" and "maxterms" does not correspond to 1 being greater than 0. I think the best way to answer is with an example:
Say that you have a circuit and it is described by X̄YZ̄ + XȲZ.
"This form is composed of two groups of three. Each group of three is a 'minterm'. What the expression minterm is intended to imply it that each of the groups of three in the expression takes on a value of 1 only for one of the eight possible combinations of X, Y and Z and their inverses." http://www.facstaff.bucknell.edu/mastascu/elessonshtml/Logic/Logic2.html
So what the "min" refers to is the fact that these terms are the "minimal" terms you need in order to build a certain function. If you would like more information, the example above is explained in more context in the link provided.
Edit: The "reason they used MIN for ANDs, and MAX for ORs" is that:
In Sum of Products (what you call ANDs) only one of the minterms must be true for the expression to be true.
In Product of Sums (what you call ORs) all the maxterms must be true for the expression to be true.
min(0,0) = 0
min(0,1) = 0
min(1,0) = 0
min(1,1) = 1
So minimum is pretty much like logical AND.
max(0,0) = 0
max(0,1) = 1
max(1,0) = 1
max(1,1) = 1
So maximum is pretty much like logical OR.
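The two tables above can be checked mechanically. This is a small sketch that compares min and max with bitwise AND and OR over all four input pairs; the variable name rows is illustrative.

```python
from itertools import product

# One row per input pair: (a, b, min, AND, max, OR)
rows = [(a, b, min(a, b), a & b, max(a, b), a | b)
        for a, b in product((0, 1), repeat=2)]

for a, b, mn, and_, mx, or_ in rows:
    print(f"a={a} b={b}  min={mn} and={and_}  max={mx} or={or_}")
```

In every row the min column matches AND and the max column matches OR.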
In Sum of Products (SOP), each term of the expression is called a "minterm". Say an SOP expression is given as:
F(X,Y,Z) = X'.Y'.Z + X.Y'.Z' + X.Y'.Z + X.Y.Z
For this SOP expression to be "1" or true (positive logic), ANY one of its terms being 1 is enough. That is, any of the terms (X'Y'Z), (XY'Z'), (XY'Z) or (XYZ) being 1 results in F(X,Y,Z) being 1. Thus they are called "minterms" (minimum one term!).
On the other hand, in Product of Sums (POS), each term of the expression is called a "maxterm". Say a POS expression is given as:
F(X,Y,Z) = (X+Y+Z).(X+Y'+Z).(X+Y'+Z').(X'+Y'+Z)
For this POS expression to be "1", ALL of the terms (X+Y+Z), (X+Y'+Z), (X+Y'+Z') and (X'+Y'+Z) should be equal to "1"; a single term equal to "0" makes F zero (POS is built from the rows of the truth table where the output is 0). Thus each of the terms in a POS expression is called a "maxterm" (maximum all the terms!).
I believe that AB is called a minterm because it occupies the minimum area on a Venn diagram, while A+B is called a maxterm because it occupies the maximum area on a Venn diagram. Draw the two diagrams and the meanings will become obvious.
Ed Brumgnach
Here is another way to think about it.
A product is called a minterm because it has minimum satisfiability, whereas a sum is called a maxterm because it has maximum satisfiability among all practically interesting boolean functions.
They are called terms because they are used as the building-blocks of various canonical representations of arbitrary boolean functions.
Details:
Note that '0' and '1' are the trivial boolean functions.
Assume a set of boolean variables x1,x2,...,xk and a non-trivial boolean function f(x1,x2,...,xk).
Conventionally, an input is said to satisfy the boolean function f, whenever f holds a value of 1 for that input.
Note that there are exactly 2^k possible inputs, and any non-trivial boolean function is satisfied by a minimum of 1 input and a maximum of 2^k - 1 inputs.
Now consider the two simple boolean functions of interest: the sum of all variables S, and the product of all variables P (variables may or may not appear complemented). S is a boolean function with maximum satisfiability, hence it is called a maxterm, whereas P is one with minimum satisfiability, hence it is called a minterm.
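The counts can be confirmed by brute force. This is a minimal sketch for k = 3 with uncomplemented variables; the helper name count_satisfying is illustrative.

```python
from itertools import product

def count_satisfying(f, k):
    """Number of the 2**k input assignments for which f evaluates to true."""
    return sum(1 for bits in product((0, 1), repeat=k) if f(bits))

k = 3
minterm = lambda bits: all(bits)  # product P = x1 AND x2 AND x3
maxterm = lambda bits: any(bits)  # sum S = x1 OR x2 OR x3

print(count_satisfying(minterm, k))  # 1
print(count_satisfying(maxterm, k))  # 7, i.e. 2**k - 1
```

Complementing some of the variables only changes which input is the single satisfying (or single non-satisfying) one, not the counts.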