How to do a subtraction with BroadcastedRows in breeze - scala

I want to subtract one dense vector from another. Both are derived from a dataset, one by indexing and the other by a function call. Here is an example:
1. The first dense vector is a row of dataset(*, 2)
BroadcastedRows(DenseVector(0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0))
2. The second dense vector is the return value of predict_coef_sgd(dataset, coef)
DenseVector(0.2987569855650975, 0.14595105593031163, 0.08533326519733725,
0.21973731424800344, 0.24705900008926596, 0.9547021347460022,
0.8620341905282771, 0.9717729050420985, 0.9992954520878627,
0.9054893228110497)
I get an error when I subtract them:
dataset(*, 2) - predict_coef_sgd(dataset, coef)
Name: Compile Error
Message: <console>:36: error: could not find implicit value for parameter op: breeze.linalg.operators.OpSub.Impl2[breeze.linalg.BroadcastedRows[breeze.linalg.DenseVector[Double],breeze.linalg.DenseVector[Double]],breeze.linalg.DenseVector[Double],That]
dataset(*, 2) - predict_coef_sgd(dataset, coef)
^
StackTrace:
Please explain how to convert "BroadcastedRows(DenseVector" to a dense vector. Thank you.

Following John's comment, which referred to "row-broadcasting and transposed vectors?", the expression below solved it. However, I don't understand why it works, so please feel free to comment if you can explain it in detail.
(dataset.t(2,::) - predict_coef_sgd(dataset, coef).t).t
DenseVector(-0.2987569855650975, -0.14595105593031163, -0.08533326519733725,
-0.21973731424800344, -0.24705900008926596, 0.045297865253997815,
0.13796580947172288, 0.028227094957901544, 7.045479121372544E-4,
0.09451067718895034)


Number of decimals in a number in Swift

In the code below, I declare a variable whose value depends on whether some Toggles are true or false. When I print this variable, it shows a number with a single decimal digit, followed by a lot of zeros. Is there a way to display the number without the trailing zeros?
Disclaimer: I am using Swift Playgrounds.
var rvalues = [6.5, 5.9, 5.3, 4.8, 4.2, 3.5, 3.1, 2.9, 1.75, 1.5, 1.2, 1.05, 0.92, 0.82, 0.75, 0.7]
var r0: Double {
    model.rvalues[(model.wearingMaskToggle ? 1 : 0) + (model.washingHandsToggle ? 2 : 0) + (model.quarantineToggle ? 8 : 0) + (model.coverMouthToggle ? 4 : 0)]
}
First, if you are curious where those zeros come from, read "Is floating point math broken?". Many of the values in your array aren't precisely representable as Doubles.
To solve your display problem, use a NumberFormatter with minimumFractionDigits and maximumFractionDigits set appropriately.
Alternatively, use Decimal to represent your values.
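The same effect is easy to see outside Swift. The following is a minimal sketch in Python (not Swift), and the helper name trim is made up for illustration. It shows that 6.5 is stored exactly while 5.9 is only stored as a nearby binary fraction, and then formats values with at most two fraction digits and no trailing zeros, which is roughly what a NumberFormatter with minimumFractionDigits set to 0 and maximumFractionDigits set to 2 is meant to achieve.

```python
from decimal import Decimal

# Decimal(float) exposes the exact binary value that a double really stores.
exact_6_5 = Decimal(6.5)  # 6.5 is a sum of powers of two, so it is stored exactly
exact_5_9 = Decimal(5.9)  # 5.9 is not, so the stored value is only close to 5.9


def trim(x: float, max_digits: int = 2) -> str:
    """Hypothetical helper: round to max_digits, then drop trailing zeros."""
    text = f"{x:.{max_digits}f}"
    if "." in text:
        text = text.rstrip("0").rstrip(".")
    return text


print(exact_6_5 == Decimal("6.5"))
print(exact_5_9 == Decimal("5.9"))
print([trim(v) for v in (6.5, 5.9, 1.05, 3.0)])
```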

scipy dblquad providing the wrong result in simple double integral

I am trying to calculate a straightforward double definite integral in Python: the function Max(0, (4-12x) + (6-12y)) over the square [0,1] x [0,1].
We can do it with Mathematica and get the exact result:
Integrate[Max[0, 4-12*u1 + 6-12*u2], {u1, 0, 1}, {u2, 0,1}] = 125/108.
With a simple Monte Carlo simulation I can confirm this result. However, using scipy.integrate.dblquad I am getting a value of 0.0005772072907971, with error 0.0000000000031299
from scipy.integrate import dblquad

def integ(u1, u2):
    return max(0, (4 - 12*u1) + (6 - 12*u2))

sol_int, err = dblquad(integ, 0, 1, lambda _:0, lambda _:1, epsabs=1E-12, epsrel=1E-12)
print("dblquad: %0.16f. Error: %0.16f" % (sol_int, err) )
Agreed, the function is not differentiable everywhere, but it is continuous, so I see no reason for this particular integral to be problematic.
I thought maybe dblquad has an 'options' argument where I can try different numerical methods, but I found nothing like that.
So, what am I doing wrong?
"try different numerical methods"
That's what I would suggest, given the trouble that iterated quad has on Windows. After changing it to an explicit two-step process, you can replace one of the quad calls with another method; romberg seems the best alternative to me.
from scipy.integrate import quad, romberg

def integ(u1, u2):
    return max(0, (4 - 12*u1) + (6 - 12*u2))

sol_int = romberg(lambda u1: quad(lambda u2: integ(u1, u2), 0, 1)[0], 0, 1)
print("romberg-quad: %0.16f " % sol_int)
This prints 1.1574073959987758 on my computer, and hopefully you will get the same.
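As an independent cross-check on the value 125/108, here is a small sketch in plain Python that does not rely on SciPy at all. It rests on the observation that the integrand is positive only on the triangle u1 + u2 < 5/6, where it equals 10 - 12(u1 + u2); that gives a closed form, which is then compared with a simple midpoint rule. The function name midpoint_2d and the grid size 400 are arbitrary choices for illustration.

```python
from fractions import Fraction


def integ(u1, u2):
    return max(0.0, (4 - 12 * u1) + (6 - 12 * u2))


def midpoint_2d(f, n):
    """Midpoint rule on an n x n grid over the unit square."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(n):
            total += f(x, (j + 0.5) * h)
    return total * h * h


# Closed form over the triangle with legs a = 5/6:
# the area is a^2 / 2 and the integral of (u1 + u2) over it is a^3 / 3.
a = Fraction(5, 6)
exact = 10 * a**2 / 2 - 12 * a**3 / 3

approx = midpoint_2d(integ, 400)
print(exact, float(exact), approx)
```

If a quadrature routine disagrees with both numbers by orders of magnitude, as the dblquad result in the question does, the problem is more likely in how the routine handles the kink than in the integral itself.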

VectorAssembler behavior and aggregating sparse data with dense

Could someone explain the behavior of VectorAssembler?
from pyspark.ml.linalg import Vectors
from pyspark.ml.feature import VectorAssembler
assembler = VectorAssembler(
    inputCols=['CategoryID', 'CountryID', 'CityID', 'tf'],
    outputCol="features")
output = assembler.transform(tf)
output.select("features").show(truncate=False)
With the show method, the code returns:
(262147,[0,1,2,57344,61006,80641,126469,142099,190228,219556,221426,231784],[2.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0])
When I use the same variable "output" with take, I get a different result:
output.select('features').take(1)
[Row(features=SparseVector(262147, {0: 2.0, 1: 1.0, 2: 1.0, 57344: 1.0, 61006: 1.0, 80641: 1.0, 126469: 1.0, 142099: 1.0, 190228: 1.0, 219556: 1.0, 221426: 1.0, 231784: 1.0}))]
By the way, consider this case: there is a sparse array output from "tfidf", and I have additional data (metadata) available. I need to somehow combine the sparse arrays in PySpark DataFrames with the metadata for an LSH algorithm. I've tried VectorAssembler, as you can see, but it also returns a dense vector. Maybe there is a trick to combine the data and still get sparse output.
Only the format of the two results is different; in both cases, you actually get the same sparse vector.
In the first case, you get a sparse vector with 3 elements: the dimension (262147), and two lists, containing the indices & values respectively of the nonzero elements. You can easily verify that the length of these lists is the same, as it should be:
len([0,1,2,57344,61006,80641,126469,142099,190228,219556,221426,231784])
# 12
len([2.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0])
# 12
In the second case you again get a sparse vector with the same first element, but here the two lists are combined into a dictionary of the form {index: value}, which again has the same length as the lists of the previous representation:
len({0: 2.0, 1: 1.0, 2: 1.0, 57344: 1.0, 61006: 1.0, 80641: 1.0, 126469: 1.0, 142099: 1.0, 190228: 1.0, 219556: 1.0, 221426: 1.0, 231784: 1.0} )
# 12
Since assembler.transform() returns a Spark dataframe, the difference is due to the different formats returned by the Spark SQL functions show and take, respectively.
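The equivalence of the two printed forms can also be checked directly. The sketch below uses plain Python containers as stand-ins for the SparseVector (it does not import PySpark; the variable names from_show and from_take are made up for illustration): it zips the index and value lists from the show output into a dictionary and compares it with the dictionary from the take output.

```python
size = 262147

# Indices and values as printed by show()
indices = [0, 1, 2, 57344, 61006, 80641, 126469, 142099, 190228, 219556, 221426, 231784]
values = [2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
from_show = (size, dict(zip(indices, values)))

# Size and {index: value} mapping as printed by take(1)
from_take = (262147, {0: 2.0, 1: 1.0, 2: 1.0, 57344: 1.0, 61006: 1.0, 80641: 1.0,
                      126469: 1.0, 142099: 1.0, 190228: 1.0, 219556: 1.0,
                      221426: 1.0, 231784: 1.0})

print(from_show == from_take)
```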
"By the way, consider case [...]"
It is not at all clear what exactly you are asking here, and in any case I suggest you open a new question on this with a reproducible example, since it sounds like a different subject...

Scala Breeze Dirichlet distribution parameter estimation

I am trying to estimate the parameters of a Dirichlet distribution for a data set using Scala's breeze library. I already have working Python (pandas/dataframes) and R code for it, but I was curious how to do it in Scala. I am also new to Scala.
I can't seem to get it to work. I guess I don't have the syntax right or something.
The code I am trying to use is here: https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/stats/distributions/Dirichlet.scala#L111
According to the code above, ExpFam[T,I] accepts two type parameters, T and I. I don't know what T and I are. Can T be a DenseMatrix?
What I am doing is:
// Create a matrix. The values are counts in my case.
val mat = DenseMatrix((1.0, 2.0, 3.0),(4.0, 5.0, 6.0))
// Then try to get the sufficient statistics and then the MLE. I think this is where I am doing something wrong.
val diri = new ExpFam[DenseMatrix[Double],Int](mat)
println(diri.sufficientStatisticFor(mat))
Also, if one has a data matrix like DenseMatrix((1.0, 2.0, 3.0),(4.0, 5.0, 6.0)), how does one estimate the (Dirichlet) parameters in Scala?
I am not really very familiar with this aspect of breeze, but this works for me:
import breeze.linalg.DenseVector
import breeze.stats.distributions.Dirichlet

// Each observation is one point on the probability simplex.
val data = Seq(
  DenseVector(0.1, 0.1, 0.8),
  DenseVector(0.2, 0.3, 0.5),
  DenseVector(0.5, 0.1, 0.4),
  DenseVector(0.3, 0.3, 0.4)
)
val expFam = new Dirichlet.ExpFam(DenseVector.zeros[Double](3))
// Accumulate the sufficient statistics over all observations.
val suffStat = data.foldLeft(expFam.emptySufficientStatistic){(a, x) =>
  a + expFam.sufficientStatisticFor(x)
}
val alphaHat = expFam.mle(suffStat)
//DenseVector(2.9803000577558274, 2.325871404559782, 5.850530402841005)
The result is very close to but not exactly the same as what I get with my own code for maximum likelihood estimation of Dirichlets. The difference probably just comes down to differences in the optimizer being used (I'm using the fixed point iteration (9) in section 1 of this paper by T. Minka) and the stopping criteria.
Maybe there's a better way of doing this using the breeze API; if so, hopefully @dlwh or someone else more familiar with breeze will chime in.
T should be DenseVector and I should be Int. ExpFams aren't vectorized right now.
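For comparison with the Breeze result, here is a rough sketch in plain Python of the fixed-point iteration mentioned above. It is written from my understanding of Minka's update, psi(alpha_k_new) = psi(sum of alpha_old) + mean(log p_k), and is not the answerer's code. The digamma and trigamma helpers are hand-rolled approximations (recurrence plus asymptotic series), and all function names, the starting point and the tolerances are assumptions made for illustration.

```python
import math


def digamma(x: float) -> float:
    """Digamma for x > 0: shift up with psi(x) = psi(x+1) - 1/x, then asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))


def trigamma(x: float) -> float:
    """Derivative of digamma for x > 0, by the same shift-then-series approach."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + 1.0 / x + 0.5 * inv2 + (inv2 / x) * (1.0 / 6 - inv2 * (1.0 / 30 - inv2 / 42))


def inv_digamma(y: float) -> float:
    """Solve digamma(x) = y with Newton's method from a standard starting guess."""
    x = math.exp(y) + 0.5 if y >= -2.22 else -1.0 / (y - digamma(1.0))
    for _ in range(50):
        step = (digamma(x) - y) / trigamma(x)
        new_x = x - step
        if new_x <= 0.0:  # safeguard: stay in the domain x > 0
            new_x = x / 2.0
        x = new_x
        if abs(step) < 1e-12 * x:
            break
    return x


def dirichlet_mle(data, tol=1e-12, max_iter=10000):
    """Fixed-point iteration for the Dirichlet maximum-likelihood estimate."""
    k = len(data[0])
    mean_log = [sum(math.log(row[j]) for row in data) / len(data) for j in range(k)]
    alpha = [1.0] * k  # arbitrary positive starting point
    for _ in range(max_iter):
        psi_sum = digamma(sum(alpha))
        new_alpha = [inv_digamma(psi_sum + m) for m in mean_log]
        change = max(abs(a - b) for a, b in zip(alpha, new_alpha))
        alpha = new_alpha
        if change < tol:
            break
    return alpha, mean_log


data = [[0.1, 0.1, 0.8], [0.2, 0.3, 0.5], [0.5, 0.1, 0.4], [0.3, 0.3, 0.4]]
alpha_hat, mean_log = dirichlet_mle(data)

# At the maximum, digamma(alpha_k) - digamma(sum(alpha)) equals mean(log p_k).
residual = max(abs(digamma(a) - digamma(sum(alpha_hat)) - m)
               for a, m in zip(alpha_hat, mean_log))
print(alpha_hat, residual)
```

Because the Dirichlet log-likelihood is concave in alpha, this should head towards the same estimate as the Breeze optimizer for the four sample vectors; the exact digits will depend on the stopping rule, which matches the small discrepancy described above.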

float format variable with integer value in string

I ran into a minor issue, and I'm wondering why it happens.
Here we have a string:
[NSString stringWithFormat:@"%.3f/%.3f/%.3f/%i", 1.0, 1.0, 1, 1];
In this case, the result is 1.000/1.000/1/ followed by gibberish like 34875689.
Why does this happen? Of course, I know that when we change the third value to 1.0, everything is fine.
So, please explain in depth what happens during this operation.
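Since you wrote %f, the format string expects a floating-point argument (a double), but the compiler sees an int. Writing 1 instead of 1.0 makes it an integer constant, whereas 1.0 is a floating-point constant. The arguments of a variadic method are not converted to match the format string, so the mismatched argument is read as the wrong type.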