Sample size formula for comparing more than 2 proportions - hypothesis-test

I have one control group and two treatment groups.
I know the sample size formulae for comparing two proportions.
n = (Zα/2+Zβ)2 * (p1(1-p1)+p2(1-p2)) / (p1-p2)2,
Can someone please help me what the formula should be for comparing 3 proportions?
Thanks

Related

Tableau calculation with attributes

I'm new with tableau and I try to calculate with different attributes from one dimension and one measure.
I want to calculate the sum of the attributes RC,RD and RP and divide this by RI. I would be very thankfull if you can help me.
Create two calculated fields.
if [Rrid] = 'RC' or [Rrid] = 'RD' or [Rrid] = 'RP' then [Rrem] end
Then create the same for RI. Divide the first by the second
sum(calc1) / sum(calc2)

For loop in Matlab for displaying all possible pairs of electrodes

I'm new to MATLAB and am stumped with an assignment for my research position. I need to create a For loop that compares two sets of data (day one and day 2) from 180 electrodes. However, we only have overlapping data from 175 electrodes (common_chans). I need to make a list of all possible electrode combinations from day 1 and day 2 of the common_chans electrodes, and have it display as an Y x 2 matrix for the all the combinations.
Any suggestions on how to do this? I feel like it's simple but I just don't have the background. Thanks!
Not sure I understood your question correctly, but if you need to make a list of all possible pairs:
nchoosek(1:175,2)
This displays a 15225 x 2 matrix.
This assumes that you don't care about the order (e.g. once you compared electrode 3 with electrode 6, this assumes you don't want to compare again electrode 6 with electrode 3). So this is different from Luis' link.

Calculating value y for each XI and XII in MATLAB:

I am currently working in matlab to design a way to reconstruct 3D data. For this I have two pictures with black points. The difference in the amount of points per frame is key for the reconstruction, but MATLAB gives an error when matrixes are not equal. This is happening becaus the code is not doing what I want it to do, so can anyone hel me with the following?
I have two columns of Xdata: XLI and XRI
What matlab does when I do XLI-XRI is substracting the pairs i.e XLI(1)-XRI(1) etc, but I want to substract each value of XRI of every value of XLI. i.e
XLI(1)-XRI(1,2,3,4 etc)
XLI(2)-XRI(1 2 3 4 etc)
and so on
Can anyone help?
I think you are looking for a way to deduct all combinations from eachother. Here is an example of how you can do that with bsxfun:
xLI = [1 2 3]
xRI = [1 2]
bsxfun(#minus,xLI ,xRI')
I cannot comment on Dennis's post (not enough points on this website) : his solution should work, but depending on your version of Matlab you might get a "Error using ==> bsxfun" and need to transpose either xLI or xRI for that to work :
bsxfun(#minus,xLI' ,xRI)
Best,
Tepp

Interactive Report - aggregate sum of multiple columns from one table multiply by values from another Table

I have a challenge in Oracle Apex - I would to sum multiple columns to give 3 extra rows namely points, Score, %score. There are more columns but I'm only choosing a few for now.
Below is an example structure of my data:
Town | Sector | Outside| Inside |Available|Price
Roy-----Formal----0----------0----------1------0
Kobus --Formal----0 ---------0--------- 1------0
Wika ---Formal----0----------0--------- 1------0
Mevo----Formal----1----------1----------1------0
Hoch----Formal----1----------1----------1------1
Points------------2----------2----------5------1
Score------------10---------10---------10------10
%score-----------20---------20---------50------10
Each column has a constant weighting (which serves as a factor and it can change depending on the areas). In this case, the weighting for the areas are in the first row of the sector Formal:
Sector |Outside| Inside |Available|Price
Formal----1----------1 ----------1-----1
Informal--1----------0 ----------2-----1
I tried using the aggregate sum function in apex but it wont work because I need the factor in the other table. This is where my challenge began.
To compute the rows below the report
points = sum per column * weighting factor per column
Score = sum of no of shops visited (in this case its 5) * weighting factor per column
% score = points/Score * 100
The report should display as described above. With the new computed rows below.
I kindly ask anyone to assist me with this challenge as I have tried searching for solutions but haven't come across any.
Thanks a lot for your support in advance!!

How to compare different distribution means with reference truth value in Matlab?

I have production (q) values from 4 different methods stored in the 4 matrices. Each of the 4 matrices contains q values from a different method as:
Matrix_1 = 1 row x 20 column
Matrix_2 = 100 rows x 20 columns
Matrix_3 = 100 rows x 20 columns
Matrix_4 = 100 rows x 20 columns
The number of columns indicate the number of years. 1 row would contain the production values corresponding to the 20 years. Other 99 rows for matrix 2, 3 and 4 are just the different realizations (or simulation runs). So basically the other 99 rows for matrix 2,3 and 4 are repeat cases (but not with exact values because of random numbers).
Consider Matrix_1 as the reference truth (or base case ). Now I want to compare the other 3 matrices with Matrix_1 to see which one among those three matrices (each with 100 repeats) compares best, or closely imitates, with Matrix_1.
How can this be done in Matlab?
I know, manually, that we use confidence interval (CI) by plotting the mean of Matrix_1, and drawing each distribution of mean of Matrix_2, mean of Matrix_3 and mean of Matrix_4. The largest CI among matrix 2, 3 and 4 which contains the reference truth (or mean of Matrix_1) will be the answer.
mean of Matrix_1 = (1 row x 1 column)
mean of Matrix_2 = (100 rows x 1 column)
mean of Matrix_3 = (100 rows x 1 column)
mean of Matrix_4 = (100 rows x 1 column)
I hope the question is clear and relevant to SO. Otherwise please feel free to edit/suggest anything in question. Thanks!
EDIT: My three methods I talked about are a1, a2 and a3 respectively. Here's my result:
ci_a1 =
1.0e+008 *
4.084733001497999
4.097677503988565
ci_a2 =
1.0e+008 *
5.424396063219890
5.586301025525149
ci_a3 =
1.0e+008 *
2.429145282593182
2.838897116739112
p_a1 =
8.094614835195452e-130
p_a2 =
2.824626709966993e-072
p_a3 =
3.054667629953656e-012
h_a1 = 1; h_a2 = 1; h_a3 = 1
None of my CI, from the three methods, includes the mean ( = 3.454992884900722e+008) inside it. So do we still consider p-value to choose the best result?
If I understand correctly the calculation in MATLAB is pretty strait-forward.
Steps 1-2 (mean calculation):
k1_mean = mean(k1);
k2_mean = mean(k2);
k3_mean = mean(k3);
k4_mean = mean(k4);
Step 3, use HIST to plot distribution histograms:
hist([k2_mean; k3_mean; k4_mean]')
Step 4. You can do t-test comparing your vectors 2, 3 and 4 against normal distribution with mean k1_mean and unknown variance. See TTEST for details.
[h,p,ci] = ttest(k2_mean,k1_mean);
EDIT : I misinterpreted your question. See the answer of Yuk and following comments. My answer is what you need if you want to compare distributions of two vectors instead of a vector against a single value. Apparently, the latter is the case here.
Regarding your t-tests, you should keep in mind that they test against a "true" mean. Given the number of values for each matrix and the confidence intervals it's not too difficult to guess the standard deviation on your results. This is a measure of the "spread" of your results. Now the error on your mean is calculated as the standard deviation of your results divided by the number of observations. And the confidence interval is calculated by multiplying that standard error with appx. 2.
This confidence interval contains the true mean in 95% of the cases. So if the true mean is exactly at the border of that interval, the p-value is 0.05 the further away the mean, the lower the p-value. This can be interpreted as the chance that the values you have in matrix 2, 3 or 4 come from a population with a mean as in matrix 1. If you see your p-values, these chances can be said to be non-existent.
So you see that when the number of values get high, the confidence interval becomes smaller and the t-test becomes very sensitive. What this tells you, is nothing more that the three matrices differ significantly from the mean. If you have to choose one, I'd take a look at the distributions anyway. Otherwise the one with the closest mean seems a good guess. If you want to get deeper into this, you could also ask on stats.stackexchange.com
Your question and your method aren't really clear :
Is the distribution equal in all columns? This is important, as two distributions can have the same mean, but differ significantly :
is there a reason why you don't use the Central Limit Theorem? This seems to me like a very complex way of obtaining a result that can easily be found using the fact that the distribution of a mean approaches a normal distribution where sd(mean) = sd(observations)/number of observations. Saves you quite some work -if the distributions are alike! -
Now if the question is really the comparison of distributions, you should consider looking at a qqplot for a general idea, and at a 2-sample kolmogorov-smirnov test for formal testing. But please read in on this test, as you have to understand what it does in order to interprete the results correctly.
On a sidenote : if you do this test on multiple cases, make sure you understand the problem of multiple comparisons and use the appropriate correction, eg. Bonferroni or Dunn-Sidak.