How to calculate the Aitchison distance in python - scipy

I have compositional data for which I would like to calculate the similarity. As input I have two 1D arrays such as:
data1=[0.03510584946878486, 0.09929433687476773, 0.049647168437383864, 0.03510584946878486, 0.01755292473439243, 0.07021169893756972, 0.07021169893756972, 0.024823584218691932, 0.03510584946878486, 0.01755292473439243, 0.024823584218691932, 0.2808467957502789, 0.03510584946878486, 0.01755292473439243, 0.03510584946878486, 0.049647168437383864, 0.03510584946878486, 0.01755292473439243, 0.024823584218691932, 0.024823584218691932]
data2=[0.036891382048211505, 0.29513105638569204, 0.05217229282726804, 0.02608614641363402, 0.018445691024105752, 0.07378276409642301, 0.05217229282726804, 0.02608614641363402, 0.05217229282726804, 0.018445691024105752, 0.02608614641363402, 0.10434458565453608, 0.036891382048211505, 0.018445691024105752, 0.02608614641363402, 0.036891382048211505, 0.036891382048211505, 0.018445691024105752, 0.02608614641363402, 0.018445691024105752]
These can be thought of as two different conditions, each with its own probability per category. From reading around, the Aitchison distance seems to be the most appropriate way to measure how similar these are, although I might be wrong. SciPy does not seem to implement it. Does anyone know how to compute it, or have thoughts on how best to compare compositional data?
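The Aitchison distance is usually defined as the Euclidean distance between the centred log-ratio (clr) transforms of the two compositions, where clr(x) = log(x) minus the mean of log(x). A minimal NumPy sketch follows. The function names `clr` and `aitchison_distance` are hypothetical helpers, not a SciPy API, and the sketch assumes every part is strictly positive (zeros would need to be replaced beforehand, which it does not do).

```python
import numpy as np

def clr(x):
    """Centred log-ratio transform: log(x) minus the mean of log(x)."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("clr requires strictly positive parts")
    logx = np.log(x)
    return logx - logx.mean()

def aitchison_distance(x, y):
    """Euclidean distance between the clr transforms of two compositions."""
    return float(np.linalg.norm(clr(x) - clr(y)))

# Rescaling a composition does not change its clr transform, so the distance is 0
print(aitchison_distance([0.2, 0.3, 0.5], [2.0, 3.0, 5.0]))
```

With `data1` and `data2` from the question, `aitchison_distance(data1, data2)` returns a single non-negative number; `scipy.spatial.distance.euclidean(clr(data1), clr(data2))` gives the same value if you prefer to stay within SciPy for the final step.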

Related

Loss functions in MATLAB

I want to know how to interpret the results of loss functions in MATLAB.
In other words, if I get 0.3247 as the result of the kfoldLoss() function, does that mean a 32.47% error or a 0.3247% error? How do I correctly interpret this result?
Thank you very much in advance
It means that the mean of the errors across your k folds was 32.47%.
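This reading applies when the loss is a fraction between 0 and 1, such as a misclassification rate; a regression loss like mean squared error is not a percentage. A small Python sketch with made-up labels shows what "mean error across folds" means in that case. It is an illustration only, not MATLAB's implementation, and the helper names are hypothetical.

```python
def misclassification_rate(y_true, y_pred):
    """Fraction (0..1) of observations whose predicted label differs from the true one."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need equal-length, non-empty label sequences")
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)

def mean_fold_error(folds):
    """Average the per-fold misclassification rates; folds is a list of (y_true, y_pred)."""
    rates = [misclassification_rate(t, p) for t, p in folds]
    return sum(rates) / len(rates)

# Two hypothetical folds: 1 of 4 wrong (0.25) and 2 of 4 wrong (0.5)
folds = [
    (["a", "b", "a", "b"], ["a", "a", "a", "b"]),
    (["a", "b", "b", "a"], ["b", "b", "a", "a"]),
]
print(mean_fold_error(folds))  # 0.375, i.e. 37.5% of held-out observations misclassified
```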

Matlab GPU use with functions that take arguments of different dimensions

I am trying to use parallel computing with a GPU in MATLAB, and I would like to apply a function to a large array (to avoid a for loop, which is quite slow). I have read MATLAB's documentation, and I can use arrayfun, but only for elementwise operations. Maybe I am confused, but I would appreciate it if someone could help me use it. As an example of what I want to do, imagine that I would like to perform the following operation,
$X_t = B Z_t + Q\varepsilon_t$
where $X_t$ is 2x1, $B$ is 2x5, and $Z_t$ is 5x1, with $Q$ 2x2. I define a function,
function X = propose(Z,B,Q)
X=B*Z+Q*rand(2,1);
end
Now, suppose that I have an array $Z_p$ which is 5x1000. To each of the 1000 columns I would like to apply the previous function, for given matrices $B$ and $Q$, to get an array $X_p$ which is 2x1000.
Given the documentation for arrayfun, I cannot do this:
X=arrayfun(@propose,Zp,B,Q)
So, is there any possibility to do it?
Thanks!
PS: Yes, I know that in this simple example I can just do the multiplication without a for loop, but the application I have in mind is more complicated and I cannot do it. I just put this example as an illustration.
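Because $B$ and $Q$ are the same for every column, applying the operation to all 1000 columns at once is a single matrix expression. In the sketch below, $E_p$ is a hypothetical $2\times 1000$ matrix whose $t$-th column is the draw $\varepsilon_t$:

$$
\underbrace{X_p}_{2\times 1000} = \underbrace{B}_{2\times 5}\,\underbrace{Z_p}_{5\times 1000} + \underbrace{Q}_{2\times 2}\,\underbrace{E_p}_{2\times 1000},
\qquad \text{column } t:\; X_t = B Z_t + Q\varepsilon_t
$$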

Matlab FindPeaks is oversensitive

I'm using MATLAB's findpeaks function to find local maxima in a 1D array. My aim is to count the number of maxima, and that's where I run into problems.
findpeaks() is just too sensitive. For instance, try this
v=[3.6107,3.6109, 3.6110,3.6110, 3.6108, 3.6107,3.6105, 3.6105, 3.6105,3.6106,3.6108,3.6109,3.6109, 3.6108,3.6105,3.6100,3.6094,3.6087,3.6080, 3.6073, 3.6067, 3.6062,3.6058,3.6053,3.6048,3.6041,3.6032,3.6021,3.6008,3.5993,3.5977, 3.5960,3.5942,3.5925,3.5907,3.5889,3.5869,3.5846,3.5820,3.5789,3.5753];
[maxvals, maxind] = findpeaks(v)
And you'll get several maxima, while this is obviously just a numerical artifact and not the actual number of maxima.
How would you suggest to relax the parameters so I'll get a better result?
In Matlab 2014 there is a MinPeakProminence parameter that should solve the issue, but it doesn't seem to work in 2013a. Any ideas?
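For comparison, SciPy's `scipy.signal.find_peaks` accepts a `prominence` argument that plays the same role as `MinPeakProminence`: a peak is kept only if it rises far enough above the surrounding signal. A short sketch on the question's data follows; the `1e-3` threshold is an assumption chosen for this example, not a recommended value.

```python
import numpy as np
from scipy.signal import find_peaks, peak_prominences

# Same data as in the question
v = np.array([3.6107, 3.6109, 3.6110, 3.6110, 3.6108, 3.6107, 3.6105, 3.6105, 3.6105,
              3.6106, 3.6108, 3.6109, 3.6109, 3.6108, 3.6105, 3.6100, 3.6094, 3.6087,
              3.6080, 3.6073, 3.6067, 3.6062, 3.6058, 3.6053, 3.6048, 3.6041, 3.6032,
              3.6021, 3.6008, 3.5993, 3.5977, 3.5960, 3.5942, 3.5925, 3.5907, 3.5889,
              3.5869, 3.5846, 3.5820, 3.5789, 3.5753])

# Plain local maxima: every small bump counts
peaks_all, _ = find_peaks(v)

# How far each bump rises above its surroundings
prominences = peak_prominences(v, peaks_all)[0]

# Keep only peaks whose prominence is at least the (assumed) threshold
peaks_kept, _ = find_peaks(v, prominence=1e-3)

print(peaks_all)    # [ 2 11]
print(prominences)  # roughly [0.0003 0.0004]
print(peaks_kept)   # []: both bumps fall below the threshold
```

On this data both bumps have a prominence of only a few ten-thousandths, so the threshold has to be chosen relative to what counts as a real peak in your signal.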

Calculating the expected value of a transformed random variable in MATLAB?

I am trying to compute the following expected value, where $Z$ is lognormally distributed:
$E\left[Z^{\eta}\, w(F_Z(Z))^{-\eta}\right]$
where $\eta$ is a real number, $F_Z$ is the distribution function of $Z$, and $w:[0,1]\to[0,1]$ is an increasing function.
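Written as an integral, the expectation weights the transformed value by the density $f_Z$ of $Z$, so the integrand is not just $z^{\eta} w(F_Z(z))^{-\eta}$. Below, $\mu$ and $\sigma$ are assumed lognormal parameters (the question does not give them); the second line follows from substituting $u = F_Z(z)$, so that $du = f_Z(z)\,dz$:

$$
\begin{aligned}
E\left[Z^{\eta}\, w(F_Z(Z))^{-\eta}\right]
&= \int_0^{\infty} z^{\eta}\, w\bigl(F_Z(z)\bigr)^{-\eta} f_Z(z)\, dz,
\qquad f_Z(z) = \frac{1}{z\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(\ln z - \mu)^2}{2\sigma^2}\right) \\
&= \int_0^{1} \bigl(F_Z^{-1}(u)\bigr)^{\eta}\, w(u)^{-\eta}\, du
\end{aligned}
$$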
First of all, I am pretty new to Matlab so I don't know which way of integrating is the better one, numerically or symbolically. I tried symbolically.
My idea was to subsequently define functions:
syms x;
g_1(x) = x^eta;
g_2(x) = logncdf(x);
g_3(x) = w(x)^-eta;
g_4(x) = g_1(x) * g_3(g_2(x));
And then
EV = int(g_4(x),x,0,inf)
Unfortunately this doesn't work, and MATLAB just returns the whole expression of g_4 unevaluated...
Is it better to use numerical integration with quadgk? What am I doing wrong here? I have read that MATLAB is not the best program for integration, but I have to use it, so switching to a different program does not help.
Thanks a lot!
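A numerical route is usually simpler than a symbolic one here. The sketch below uses Python/SciPy rather than MATLAB, purely to illustrate the integral $\int_0^{\infty} z^{\eta} w(F_Z(z))^{-\eta} f_Z(z)\,dz$ with adaptive quadrature (the same idea as `quadgk`). The function name, the default `mu`/`sigma`, and the example `w` are hypothetical; `w` must be positive wherever it is evaluated.

```python
import numpy as np
from scipy import integrate, stats

def transformed_expectation(eta, w, mu=0.0, sigma=1.0):
    """E[Z**eta * w(F_Z(Z))**(-eta)] for Z ~ Lognormal(mu, sigma), by numerical quadrature."""
    dist = stats.lognorm(s=sigma, scale=np.exp(mu))  # SciPy's parametrisation of Lognormal(mu, sigma)
    tiny = np.finfo(float).tiny

    def integrand(z):
        # Keep F_Z(z) away from exactly 0 so w(u)**(-eta) stays finite far in the left tail
        u = max(dist.cdf(z), tiny)
        # Note the density factor f_Z(z): without it the integral is not an expectation
        return z**eta * w(u) ** (-eta) * dist.pdf(z)

    value, _abserr = integrate.quad(integrand, 0.0, np.inf)
    return value

# Hypothetical increasing weighting function on [0, 1]
print(transformed_expectation(0.5, lambda u: u**0.7))
```

A quick sanity check: with $w \equiv 1$ the result should equal the lognormal moment $E[Z^{\eta}] = \exp(\eta\mu + \eta^2\sigma^2/2)$.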

matlab zplane function: handles of vectors

I'm interested in understanding the zeros that a given filter produces, with the ultimate goal of identifying which frequencies are passed by high-pass and low-pass filters. My idea is that finding the lowest-frequency zero of a filter will identify the passband, specifically for a LPF. I'm attempting to use the [hz,hp,ht] = zplane(z,p) function to do so.
The description for that function reads "returns vectors of handles to the zero lines, hz". Could someone help me understand what a vector of handles is and what I can do with one to find the various zeros?
For example, a simple 5-point running average filter:
runavh = (1/5) * ones(1,5);
using zplane(runavh) gives an acceptable pole/zero plot, but running the [hz,hp,ht] = zplane(z,p) function results in hz=175.1075. I don't know what this number represents and how to use it.
Many thanks.
Using the get command, you can find out things about the data.
For example, type G=get(hz) to get a list of properties of the zero lines. Then the XData is given by G.XData, i.e. X=G.XData.
Alternatively, you can only pull out the data you want
X=get(hz,'XData')
Hope that helps.
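If the goal is the zeros themselves rather than the plotted lines, they can also be computed directly as the roots of the numerator polynomial. A minimal NumPy sketch for the 5-point running average follows; it is a Python illustration, not MATLAB's zplane, and it assumes the usual FIR form $H(z) = \tfrac{1}{5}(1 + z^{-1} + z^{-2} + z^{-3} + z^{-4})$.

```python
import numpy as np

# 5-point running-average FIR filter from the question
b = np.ones(5) / 5

# Multiplying H(z) by z^4 gives a polynomial in z whose roots are the filter's zeros
zeros = np.roots(b)

# A zero on the unit circle at angle w blocks frequency w (rad/sample); Nyquist is pi
freqs = np.sort(np.abs(np.angle(zeros)))

print(np.abs(zeros))  # all approximately 1: the zeros lie on the unit circle
print(freqs / np.pi)  # approximately [0.4, 0.4, 0.8, 0.8], i.e. zeros at 2*pi/5 and 4*pi/5
```

The real and imaginary parts of `zeros` are the coordinates at which the zero markers appear in the pole/zero plot; here the lowest-frequency zero sits at $2\pi/5$ rad/sample.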