How to compute AVERAGE() with a step of 10 cells in LibreOffice?

I want A1 to compute AVERAGE(C1:C10), A2 to store AVERAGE(C11:C20), and so on...
At the same time, I want B1 to store D1, B2 to store D11, and so on...
How can I write both of these formulas? I wish I could say "I tried this...", but I'm a noob with Excel and I couldn't find anything even by googling.
I tried the proposed answer but I get weird results: in my sheet I would like to compute the average of Q2:Q11.

In A1 enter:
=AVERAGE(OFFSET(C$1,10*(ROW(1:1)-1),0,10))
and copy down
See A.S.H.'s answer.
EDIT#1:
To average Q2 through Q11, use:
=AVERAGE(OFFSET(Q$2,10*(ROW(1:1)-1),0,10))
and copy down
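For example, here is how the EDIT#1 formula expands as it is filled down (each OFFSET returns a 10-cell-high range starting 10 rows further down):
Row 1: =AVERAGE(OFFSET(Q$2,0,0,10)) averages Q2:Q11
Row 2: =AVERAGE(OFFSET(Q$2,10,0,10)) averages Q12:Q21
Row 3: =AVERAGE(OFFSET(Q$2,20,0,10)) averages Q22:Q31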

Use INDEX() to set the ranges. In A1:
=AVERAGE(INDEX(C:C,(ROW(1:1)-1)*10 + 1):INDEX(C:C,(ROW(1:1)-1)*10 + 10))
So B1 would be:
=INDEX(D:D,(ROW(1:1)-1)*10 + 1)
Copy/Drag both formulas down.
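Filled down, these resolve to the ranges the question asks for, for example:
A1: =AVERAGE(INDEX(C:C,1):INDEX(C:C,10)), i.e. AVERAGE(C1:C10)
A2: =AVERAGE(INDEX(C:C,11):INDEX(C:C,20)), i.e. AVERAGE(C11:C20)
B2: =INDEX(D:D,11), i.e. D11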
Advantage of this over OFFSET: OFFSET is volatile and will recalculate every time the spreadsheet recalculates, whether the data changed or not, whereas INDEX only recalculates if the data it refers to changes. Used too many times, OFFSET will make a difference in calculation times.
Advantage of OFFSET: it is shorter, and if used in moderation it will not have a noticeable effect on calculation time, so it saves some wear and tear on the fingers.

Related

How to avoid CIRCULAR REFERENCE in TABLEAU using LOOKUP and PREVIOUS_VALUE?

Hello!
In Excel I have 2 columns C and D with formulas in there for a specific purpose.
As an example, here are cells C12 and D12 from these two columns to show the formulas.
C12 = 0.001855 * B12/E12 + 0.998145 * (C11+D11)
D12 = 0.981119 * (C12-C11) + 0.018881 * D11
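Written for a general month/row n, these are:
C(n) = 0.001855 * B(n)/E(n) + 0.998145 * ( C(n-1) + D(n-1) )
D(n) = 0.981119 * ( C(n) - C(n-1) ) + 0.018881 * D(n-1)
so each row only needs values from the previous row, plus the C value of the same row.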
Let's say the C-column variable is "Running Base", the D-column variable is the "Growth", and the rows are months. And say I want to copy these formulas to a Tableau worksheet with months in the rows.
You see that C12 uses both its own previous value C11 (the lag -1 of C) and the lag -1 of D (D11). I can get C11 in the Tableau formula using the PREVIOUS_VALUE function, and the previous value of D with LOOKUP([D],-1) (the B12 and E12 are not important for the discussion).
Then D12 also uses its own previous value D11 and both C12 and its previous value C11. Of course we can do similar Tableau exercises here, but you can already feel a CIRCULAR REFERENCE error coming up ;-).
So, there is no actual circular reference and it works in Excel. But I do understand why Tableau reports one, and I am sure there must be a workaround for this.
Can anybody help please???
Thx very much in advance!!
Herman Mentink
One way to solve this is to drop the calculation onto Detail on the Marks card and then reference the rows of that field in another calculation, which doesn't create a circular reference. Otherwise you are correct: Tableau will raise a circular reference error, because its behaviour is different from Excel's.
Edit:
It's not just for tooltips; in fact you can use those values in calculated fields as well. A few months back I implemented the same thing in one of my reports.
A calculation creates 4 rows in my report, and I needed to do (1st row + 4th row) in the same column, so I dropped the calculation onto Detail on the Marks card and referred to it from another calculated field. See the example below:
LOOKUP(ATTR([Values]),FIRST()+4) + LOOKUP(ATTR([Values]),FIRST()+1)
Values is the calculated field; the calculation above refers to Values and picks rows 2 and 5, which I couldn't do directly if I dropped Values on Text on the Marks card.
(In the screenshot accompanying the original answer, Values on Detail is the original calculation, and the field on Text is the one that refers to Values and displays the result.)

How to calculate within a factor in Tableau

Apologies if this question is trivially easy; I'm still learning Tableau.
I have data where the variables Set and Subset are arranged by week (W1 to W52) and by Source (A or B). So if I put Week into Rows and create the calculated fields
SUM(Set)
SUM(Subset)
Rate = {INCLUDE Source: SUM(Subset) / SUM(Set)}
I get data that look like this:
Week   SUM(Set) A   SUM(Set) B   SUM(Subset) A   SUM(Subset) B   Rate A   Rate B
W1     1234         123          567             56              45.95%   45.53%
So far, so good. But what I really want is the percentage difference between Rate(A) and Rate(B) by week:
Diff = (Rate.A - Rate.B) / Rate.B
I could do this in a second if I were using Excel or R, but I can't seem to figure out how Tableau does it. Help?
There's a built-in table calculation, "Percent Difference"; you can deploy it with Compute Using set to Table (across) and Relative to set to Previous. For that you need continuous measures.
The underlying calculation is something like this:
(ZN(SUM([Quantity])) - LOOKUP(ZN(SUM([Quantity])), -1)) / ABS(LOOKUP(ZN(SUM([Quantity])), -1))
Create two such calculations, one for "Set" and one for "Subset".
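As a sketch only (the exact aggregation depends on how [Rate] is defined, and Compute Using has to be set so the calculation runs across Source), the same pattern applied to the question's rate could look like:
(ZN(AVG([Rate])) - LOOKUP(ZN(AVG([Rate])), -1)) / ABS(LOOKUP(ZN(AVG([Rate])), -1))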

Averaging 4 identical arrays cell to cell in Matlab

I have four arrays of the same dimensions: (10,000, 1). My goal is to average the four arrays into one array with the same dimensions. What is the most efficient way to achieve this?
Edit: Over a year later, the silliness of this question is very apparent. Thank you to those who helped pry off the training wheels. In the immortal words: 'Read the documentation!'
You can simply add each array and divide by 4.
Example: arrays A1, A2, A3, A4
Average = (A1 + A2 + A3 + A4)/4
Since the arrays are the same size, MATLAB will add the corresponding elements automatically and produce the element-wise average. Is there another problem you are having with this?
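A minimal MATLAB sketch, assuming A1, A2, A3 and A4 are already 10000-by-1 vectors in the workspace:
avgArray = (A1 + A2 + A3 + A4) / 4;   % element-wise average, same 10000-by-1 size
% equivalently, stack them as columns and average along the second dimension:
avgArray = mean([A1 A2 A3 A4], 2);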

Trying to find the point when two functions differ from each other a 5% with Matlab

As the title says, I'm trying to find the exact distance (a dimensionless distance in this case) at which two functions start to differ from each other by 5% on the Y-axis. The two functions intersect at a value of 1 on the X-axis, and I need the distance before the intersection, not after (i.e., it must be less than 1). I've written MATLAB code so you can see the shape of the functions, followed by the calculations I'm trying to make work; they don't, I don't know why, and I get "Explicit solution could not be found".
I don't know if I explained it clearly. Please let me know if you need a more detailed explanation.
I hope you can throw some light on this issue.
Thank you so much in advance.
r=0:0.001:1.2;
ro=0.335;
rt=r./ro;
De=0.3534;
k=2.8552;
B=(2*k/De)^0.5;
Fm=2.*De.*B.*ro.*(1-exp(B.*ro.*(1-rt))).*exp(B.*ro.*(1-rt));
A=5;
b=2.2347;
C=167.4692;
Ftt=(C.*(exp(-b.*rt).*((b.^6.*rt.^5)./120 + (b.^5.*rt.^4)./24 + (b.^4.*rt.^3)./6 + (b.^3.*rt.^2)./2 + b.^2.*rt + b) - b.*exp(-b.*rt).*((b.^6.*rt.^6)./720 + (b.^5.*rt.^5)./120 + (b.^4.*rt.^4)./24 + (b.^3.*rt.^3)./6 + (b.^2.*rt.^2)./2 + b.*rt + 1)))./rt.^6 - (6.*C.*(exp(-b.*rt).*((b.^6.*rt.^6)./720 + (b.^5.*rt.^5)./120 + (b.^4.*rt.^4)./24 + (b.^3.*rt.^3)./6 + (b.^2.*rt.^2)./2 + b.*rt + 1) - 1))./rt.^7 - A.*b.*exp(-b.*rt);
plot(rt,-Fm,'red')
axis([0 2 -1 3])
xlabel('Dimensionless distance')
ylabel('Force, -dU/dr')
hold on
plot(rt,-Ftt,'green')
clear rt
syms rt
%assume(0<rt<1)
r1=solve((Fm-Ftt)/Ftt==0.05,rt)
r2=solve((Ftt-Fm)/Fm==0.05,rt)
Welcome to the crux of floating-point data. The reason is that, for the values of r you are providing, the exact solution of 0.05 may lie between two of the values in your r array, so you won't be able to get an exact solution. Also, FWIW, your equation may never generate a solution of 0.05, which is why you're getting that error too. Either way, an explicit solve on floating-point data is never recommended unless you know very well how your data are shaped and what values you expect from the function you're applying the data to.
Instead, it's recommended that you find the nearest value that satisfies your condition. You could do something like this:
[~,ind] = min(abs((Fm-Ftt)./Ftt - 0.05));
r1 = r(ind);
The first line finds the location in your r array that comes closest to satisfying the 5% criterion. The next line then gives you the corresponding value from your r array. You can do the same for r2:
[~,ind2] = min(abs((Ftt-Fm)./Fm - 0.05));
r2 = r(ind2);
What the above code does is find the point in your array where the difference between your relative difference and 5% is closest to 0; in other words, the point in your r array that is as close to the 5% criterion as possible.
If you want to improve this, you can always change the step size of r... perhaps make it 0.00001 or something. However, the smaller the step size, the larger your array and you'll eventually run out of memory!
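If you also want to enforce the condition from the question that the answer lies before the intersection (dimensionless distance less than 1), one option is to mask out the rest of the array first. A sketch, assuming the numeric rt, Fm and Ftt vectors from the question are still in the workspace (i.e., run before the clear rt / syms rt lines):
mask = rt > 0 & rt < 1;               % only points before the intersection; rt = 0 is excluded
relDiff = abs((Fm - Ftt)./Ftt - 0.05);
relDiff(~mask) = Inf;                 % ignore everything outside the region of interest
[~, ind] = min(relDiff);
r1 = rt(ind)                          % dimensionless distance closest to the 5% criterion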

Random numbers that add to 1 with a minimum increment: Matlab

Having read carefully the previous question
Random numbers that add to 100: Matlab
I am struggling to solve a similar but slightly more complex problem.
I would like to create an array of n elements that sums to 1, however I want an added constraint that the minimum increment (or if you like number of significant figures) for each element is fixed.
For example if I want 10 numbers that sum to 1 without any constraint the following works perfectly:
num_stocks=10;
num_simulations=100000;
temp = [zeros(num_simulations,1),sort(rand(num_simulations,num_stocks-1),2),ones(num_simulations,1)];
weights = diff(temp,[],2);
I foolishly thought that by scaling this I could add the constraint as follows
num_stocks=10;
min_increment=0.001;
num_simulations=100000;
scaling=1/min_increment;
temp2 = [zeros(num_simulations,1),sort(round(rand(num_simulations,num_stocks-1)*scaling)/scaling,2),ones(num_simulations,1)];
weights2 = diff(temp2,[],2);
However, though this works for small values of n and small increments, if for example n=1,000 and the increment is 0.1%, then over a large number of trials the first and last numbers have a mean which is consistently below 0.1%.
I am sure there is a logical explanation/solution to this, but I have been tearing my hair out trying to find it and wondered whether anybody would be so kind as to point me in the right direction. To put the problem into context, I am creating random stock portfolios (hence the sum to 1).
Thanks in advance
Thank you for the responses so far. Just to clarify (as I think my initial question was perhaps badly phrased): it is the weights that have a fixed increment of 0.1%, so 0%, 0.1%, 0.2%, etc.
I did try using integers initially
num_stocks=1000;
min_increment=0.001;
num_simulations=100000;
scaling=1/min_increment;
temp = [zeros(num_simulations,1),sort(randi([0 scaling],num_simulations,num_stocks-1),2),ones(num_simulations,1)*scaling];
weights = (diff(temp,[],2)/scaling);
test=mean(weights);
but this was worse: the mean for the 1st and last weights is well below 0.1%.
Edit to reflect excellent answer by Floris & clarify
The original code I was using to solve this problem (before finding this forum) was
function x = monkey_weights_original(simulations,stocks)
stockmatrix=1:stocks;
base_weight=1/stocks;
r=randi(stocks,stocks,simulations);
x=histc(r,stockmatrix)*base_weight;
end
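For example, called as below it returns one simulated portfolio per column, each column summing to 1:
x = monkey_weights_original(10000, 1000);   % 1000-by-10000: 10,000 portfolios of 1,000 weights
sum(x(:,1))                                 % each column sums to 1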
This runs very fast, which is important considering I want to run a total of 10,000,000 simulations: 10,000 simulations on 1,000 stocks take just over 2 seconds on a single core, and I am running the whole thing on an 8-core machine using the Parallel Computing Toolbox.
It also gives exactly the distribution I was looking for in terms of means, and I think it is just as likely to get a portfolio that is 100% in 1 stock as it is to get a portfolio that is 0.1% in every stock (though I'm happy to be corrected).
My issue is that although it works for 1,000 stocks with an increment of 0.1%, and I guess it works for 100 stocks with an increment of 1%, as the number of stocks decreases each pick becomes a very large percentage (in the extreme, with 2 stocks you will always get a 50/50 portfolio).
In effect I think this solution is like the binomial solution Floris suggests (but more limited).
However, my question has arisen because I would like to make my approach more flexible and allow, say, 3 stocks with an increment of 1%, which my current code will not handle correctly; hence how I stumbled across the original question on Stack Overflow.
Floris's recursive approach will get to the right answer, but the speed will be a major issue considering the scale of the problem.
An example of the original research is here
http://www.huffingtonpost.com/2013/04/05/monkeys-stocks-study_n_3021285.html
I am currently working on extending it with more flexibility on portfolio weights and the number of stocks in the index, but it appears my programming and probability theory abilities are a limiting factor.
One problem I can see is that your formula allows numbers to be zero, when the rounding operation results in two consecutive numbers being the same after sorting. Not sure if you consider that a problem, but I suggest you think about it (it would mean your model portfolio has fewer than N stocks in it, since the contribution of one of the stocks would be zero).
The other thing to note is that the probability of getting the extreme values in your distribution is half of what you want it to be: if you have uniformly distributed numbers from 0 to 1000 and you round them, the numbers that round to 0 were in the interval [0, 0.5); the ones that round to 1 came from [0.5, 1.5) - twice as big. The last number (rounding to 1000) is again from a smaller interval: [999.5, 1000]. Thus you will not get the first and last number as often as you think. If instead of round you use floor, I think you will get the answer you expect.
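A quick numeric check of that edge effect (a sketch, using the scaling of 1000 from the question):
u = rand(1e6,1);
mean(round(u*1000) == 0)   % ~0.0005: the 0 bin only collects u in [0, 0.0005)
mean(round(u*1000) == 1)   % ~0.0010: interior bins are twice as wide
mean(floor(u*1000) == 0)   % ~0.0010: with floor, the 0 bin matches the interior bins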
EDIT
I thought about this some more, and came up with a slow but (I think) accurate method for doing this. The basic idea is this:
Think in terms of integers; rather than dividing the interval 0 - 1 in steps of 0.001, divide the interval 0 - 1000 in integer steps
If we try to divide N into m intervals, the mean size of a step should be N / m; but being integer, we would expect the intervals to be binomially distributed
This suggests an algorithm in which we choose the first interval as a binomially distributed variate with mean (N/m) - call the first value v1; then divide the remaining interval N - v1 into m-1 steps; we can do so recursively.
The following code implements this:
% random integers adding up to a definite sum
function r = randomInt(n, limit)
% returns an array of n random integers whose sum is limit
% calls itself recursively; slow but accurate
if n > 1
    v = binomialRandom(limit, 1 / n);
    r = [v randomInt(n - 1, limit - v)];
else
    r = limit;
end

function b = binomialRandom(N, p)
b = sum(rand(1, N) < p); % slow but direct
To get 10000 instances, you run this as follows:
tic
portfolio = zeros(10000, 10);
for ii = 1:10000
portfolio(ii,:) = randomInt(10, 1000);
end
toc
This ran in 3.8 seconds on a modest machine (single thread) - of course the method for obtaining a binomially distributed random variate is the thing slowing it down; there are statistical toolboxes with more efficient functions but I don't have one. If you increase the granularity (for example, by setting limit=10000) it will slow down more since you increase the number of random number samples that are generated; with limit = 10000 the above loop took 13.3 seconds to complete.
As a test, I found mean(portfolio)' and std(portfolio)' as follows (with limit=1000):
100.20 9.446
99.90 9.547
100.09 9.456
100.00 9.548
100.01 9.356
100.00 9.484
99.69 9.639
100.06 9.493
99.94 9.599
100.11 9.453
This looks like a pretty convincing "flat" distribution to me. We would expect the numbers to be binomially distributed with a mean of 100 and a standard deviation of sqrt(p*(1-p)*n). In this case p=0.1 and n=1000, so we expect s = sqrt(0.1*0.9*1000) = 9.4868. The values I actually got were again quite close.
I realize that this is inefficient for large values of limit, and I made no attempt at efficiency; I find that clarity trumps speed when you develop something new. You could, for instance, pre-compute the cumulative binomial distributions for p=1./(1:10) and then do a random lookup. But if you are just going to do this once, for 100,000 instances, it will run in under a minute, so unless you intend to do it many times I wouldn't bother. If anyone wants to improve this code I'd be happy to hear from them.
Eventually I have solved this problem!
I found a paper by two academics at Johns Hopkins University, "Sampling Uniformly from the Unit Simplex":
http://www.cs.cmu.edu/~nasmith/papers/smith+tromble.tr04.pdf
In the paper they outline how naive algorithms don't work, in a way very similar to woodchips' answer to the Random numbers that add to 100 question. They then go on to show that the method suggested by David Schwartz can also be slightly biased, and they propose a modified algorithm which appears to work.
If you want x numbers that sum to y:
Sample uniformly x-1 random numbers from the range 1 to x+y-1 without replacement
Sort them
Add a zero at the beginning & x+y at the end
Difference them and subtract 1 from each value
If you want to scale them as I do, then divide by y; a small worked example follows
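For example, with x = 3 numbers summing to y = 10: sample 2 distinct integers from 1 to 12, say 4 and 9; sort and bracket them to get [0 4 9 13]; the differences are [4 5 4]; subtracting 1 gives [3 4 3], which sums to 10.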
It took me a while to realise why this works when the original approach didn't, and it comes down to the probability of getting a zero weight (as highlighted by Floris in his answer). To get a zero weight in the original version, for all but the 1st or last weights your random numbers had to have 2 values the same; but for the 1st and last ones, a random number of zero or the maximum number would also result in a zero weight, which is more likely.
In the revised algorithm, zero and the maximum number are not in the set of random choices, and a zero weight occurs only if you select two consecutive numbers, which is equally likely for every position.
I coded it up in MATLAB as follows:
function weights = unbiased_monkey_weights(num_simulations,num_stocks,min_increment)
scaling = 1/min_increment;
sample = NaN(num_simulations,num_stocks-1);
for i = 1:num_simulations
    % sample num_stocks-1 distinct integers from 1 .. scaling+num_stocks-1
    allcomb = randperm(scaling+num_stocks-1);
    sample(i,:) = allcomb(1:num_stocks-1);
end
% bracket with 0 and scaling+num_stocks, difference, subtract 1 and rescale
temp = [zeros(num_simulations,1), sort(sample,2), ones(num_simulations,1)*(scaling+num_stocks)];
weights = (diff(temp,[],2)-1)/scaling;
end
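For example, the timing test mentioned below corresponds to:
w = unbiased_monkey_weights(10000, 1000, 0.001);   % 10000-by-1000; each row sums to 1 in steps of 0.1%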
Obviously the loop is a bit clunky, and as I'm using the 2009 version the randperm function only allows you to generate permutations of the whole set; despite this I can run 10,000 simulations for 1,000 numbers in 5 seconds on my clunky laptop, which is fast enough.
The mean weights are now correct, and as a quick test I replicated woodchips' example of generating 3 numbers that sum to 1 with a minimum increment of 0.01%, and it also looks right.
Thank you all for your help, and I hope this solution is useful to somebody else in the future.
The simple answer is to use the schemes that work well with NO minimum increment, then transform the problem. As always, be careful. Some methods do NOT yield uniform sets of numbers.
Thus, suppose I want 11 numbers that sum to 100, with a constraint of a minimum increment of 5. I would first find 11 numbers that sum to 45 (100 minus 11 times the minimum of 5), with no lower bound on the samples (other than zero). I could use a tool from the File Exchange for this. Simplest is to sample 10 numbers in the interval [0,45], sort them, then take the differences.
X = diff([0,sort(rand(1,10)),1]*45);
The vector X is a sample of 11 numbers that sums to 45. The vector Y then sums to 100, with each element at least 5.
Y = X + 5;
Of course, this is trivially vectorized if you wish to find multiple sets of numbers with the given constraint.
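A minimal vectorized sketch of this, assuming you want nSets such sets at once (nSets is just an illustrative name):
nSets = 1000;                                                       % number of sets to generate
T = [zeros(nSets,1), sort(rand(nSets,10),2), ones(nSets,1)] * 45;
Y = diff(T,[],2) + 5;                                               % each row: 11 numbers, minimum 5, summing to 100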