Sample weight in group mean comparison - t-test

I want to run a t-test comparing the two groups of the binary variable binaryeducation, but I cannot run it while accounting for the observation weights. What do I need to change to include [pweight = w] in the code?
asdoc ttest binaryanswer1 if scenario==1, by(binaryeducation), [pweight = w]

Related

Understanding how pseudo random numbers in Matlab imply statistical independence

Consider the following Matlab code, in which I generate some data using a pseudo-random number generator.
I would like your help to understand "how" random these numbers are from a statistical point of view, in the terms I explain below.
I first set some parameters
%%%%%%%%Parameters
clear
rng default
Xsup=-1:6;
Zsup=1:10;
n_m=200;
n_w=200;
R=n_m;
Then I generate the data
%%%%%%%%Creation of data [XZ,etapair,zetapair,etasingle,zetasingle]
%Vector X of dimension n_mx1
idX=randi(size(Xsup,2),n_m,1); %n_mx1
X=Xsup(idX).'; %n_mx1
%Vector Z of dimension n_wx1
idZ=randi(size(Zsup,2),n_w,1);
Z=Zsup(idZ).'; %n_wx1
%Combine X and Z in a matrix XZ of dimension (n_m*n_w)x2,
%which lists all possible combinations of values in X and Z
[cX, cZ] = ndgrid(X,Z);
XZ = [cX(:), cZ(:)]; %(n_m*n_w)x2
%Vector etapair of dimension (n_m*n_w)x1
etapair=randn(n_m*n_w,1); %(n_m*n_w)x1
%Vector zetapair of dimension (n_m*n_w)x1
zetapair=randn(n_m*n_w,1); %(n_m*n_w)x1
%Vector etasingle of dimension (n_m*n_w)x1
etasingle=max(randn(n_m,R),[],2); %n_mx1
etasingle=repmat(etasingle, n_w,1); %(n_m*n_w)x1
%Vector zetasingle of dimension (n_m*n_w)x1
zetasingle=max(randn(n_w,R),[],2); %n_wx1
zetasingle=kron(zetasingle, ones(n_m,1)); %(n_m*n_w)x1
Let me now translate these draws into statistical terms:
For t=1,...,n_w*n_m, XZ(t,1) can be thought of as a realisation of a random variable X_t
For t=1,...,n_w*n_m, XZ(t,2) can be thought of as a realisation of a random variable Z_t
For t=1,...,n_w*n_m, etapair(t) can be thought of as a realisation of a random variable E_t
For t=1,...,n_w*n_m, zetapair(t) can be thought of as a realisation of a random variable Q_t
For t=1,...,n_w*n_m, etasingle(t) can be thought of as a realisation of a random variable Y_t
For t=1,...,n_w*n_m, zetasingle(t) can be thought of as a realisation of a random variable S_t
My belief was that Matlab's pseudo-random number generator allows one to claim that
(X_1,X_2,..., Z_1,Z_2,...,E_1,E_2,..., Q_1,Q_2...,Y_1,Y_2,...,S_1,S_2,...) are mutually independent
as explained here
As a check of this hypothetical claim, I define W_t:=-E_t-Q_t+Y_t+S_t and empirically compute Pr(W_t<=1|X_t=5, Z_t=1)
If mutual independence holds, then Pr(W_t<=1|X_t=5, Z_t=1)=Pr(W_t<=1) and their empirical counterparts below, named option1 and option2, should be ALMOST the same.
%option 1
num1=zeros(n_m*n_w,1);
for h=1:n_m*n_w
if -etapair(h)-zetapair(h)+etasingle(h)+zetasingle(h)<=1 && XZ(h,1)==5 && XZ(h,2)==1
num1(h)=1;
end
end
den1=zeros(n_m*n_w,1);
for h=1:n_m*n_w
if XZ(h,1)==5 && XZ(h,2)==1
den1(h)=1;
end
end
option1=sum(num1)/sum(den1);
%option 2
num2=zeros(n_m*n_w,1);
for h=1:n_m*n_w
if -etapair(h)-zetapair(h)+etasingle(h)+zetasingle(h)<=1
num2(h)=1;
end
end
option2=sum(num2)/(n_m*n_w);
Question: is the difference between option1 (=0.0021) and option2 (=0.0012) covered by that "ALMOST", or am I doing something wrong?
By the very nature of observing random events, you cannot guarantee theoretically exact results for any given empirical trial.
You have set rng default at the start of your script, which means you will always get the same result (option1 = 0.0021, option2 = 0.0012).
If we run your script many times and average the results, we should approach the theoretical value.
kk = 10000;
option1 = zeros(kk, 1);
option2 = zeros(kk, 1);
for ii = 1:kk
% No need to use 'clear' here. If you were concerned
% for some reason, you could use 'clearvars -except kk option1 option2 ii'
% do not use 'rng default'. Use 'rng shuffle' if anything, but not necessary
Xsup = -1:6;
% ... all your other code
% replace 'option1=...' with 'option1(ii)=...'
% replace 'option2=...' with 'option2(ii)=...'
end
fprintf('Results:\nMean option1 = %f\nMean option2 = %f\n', mean(option1), mean(option2));
Results:
>> Mean option1 = 0.001461
>> Mean option2 = 0.001458
We can see that these agree to some degree of accuracy, which can be made arbitrarily high by running enough trials. This is as expected for independent variables.
Note: if you have the Parallel Computing Toolbox, this for loop can easily be swapped for a parfor, and you can run the trials many times faster.
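As an aside, the two probabilities in the original script can also be computed without the explicit loops, which also speeds up the repeated-trial loop above considerably. A vectorized sketch using the same variable names as in the question:
W = -etapair - zetapair + etasingle + zetasingle;   % (n_m*n_w)x1 realisations of W_t
mask = (XZ(:,1) == 5) & (XZ(:,2) == 1);             % rows with X_t = 5 and Z_t = 1
option1 = sum(W <= 1 & mask) / sum(mask);           % empirical Pr(W_t <= 1 | X_t = 5, Z_t = 1)
option2 = mean(W <= 1);                             % empirical Pr(W_t <= 1)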

Select a subset of stocks using genetic algorithm in Matlab

I want to select 10 stocks out of a given set; those 10 should receive some weight while the rest should receive zero weight. I have read the covariance matrix and the returns from a file. My code is
Aeq = ones(1,stocks);
beq = 1;
lb = zeros(1,stocks);
up = ones(1,stocks);
options = gaoptimset;
options = gaoptimset(options,'PopulationSize' ,10);
fitnessFunction = @(x) (x * covariance * x') - (x * returns);
W = ga(fitnessFunction,stocks,[],[],Aeq,beq,lb,up,[],options);
This code gives weights to all the stocks; I cannot figure out how to limit the number of non-zero weights to 10.
The 'PopulationSize' parameter specifies how many individuals - in your case, portfolios - exist in each generation; it has nothing to do with the weights assigned to each asset.
You need to write custom crossover and mutation functions (the CrossoverFcn and MutationFcn options) that explicitly maintain exactly 10 non-zero weights, as sketched below.
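A minimal sketch of such a mutation function, assuming the signature documented for ga custom mutation functions; the "keep the 10 largest weights, zero the rest, renormalise to sum to 1" repair step would also have to appear in the crossover (and ideally creation) functions so that children stay feasible with respect to Aeq and beq:
function mutationChildren = sparseMutation(parents, options, nvars, ...
    FitnessFcn, state, thisScore, thisPopulation)
% Hypothetical mutation operator: perturb each selected parent, then repair it
% so that exactly 10 weights are non-zero, all weights are >= 0, and they sum to 1.
% (Unused input arguments are kept only to match the expected signature.)
nKeep = 10;
mutationChildren = zeros(length(parents), nvars);
for k = 1:length(parents)
    child = thisPopulation(parents(k), :) + 0.05*randn(1, nvars); % small Gaussian perturbation
    child = max(child, 0);                        % respect the lower bound of zero
    [~, order] = sort(child, 'descend');
    child(order(nKeep+1:end)) = 0;                % zero out all but the 10 largest weights
    if sum(child) == 0
        child(order(1:nKeep)) = 1/nKeep;          % fallback if everything was zeroed
    end
    mutationChildren(k, :) = child / sum(child);  % renormalise so the weights sum to 1
end
end
It could then be plugged in with something like options = gaoptimset(options,'MutationFcn',@sparseMutation), together with a matching 'CrossoverFcn'.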

Change the random number generator in Matlab function

I have a task to complete that requires quasi-random numbers as input, but I notice that the Matlab function I want to use does not have an option to select any of the quasi-random generators I want (e.g. Halton, Sobol, etc.). Matlab has them as stand-alone functions rather than as options in the ubiquitous 'randn' and 'rng' functions. What Matlab uses is the Mersenne Twister, a pseudo-random generator. So, for instance, copularnd uses 'randn'/'rng', which is based on pseudo-random numbers....
Is there a way to incorporate them into the rand or rng functions embedded in other code (e.g. copularnd)? Any pointers would be much appreciated. Note: 'copularnd' calls 'mvnrnd', which in turn uses 'randn' and then pulls 'rng'...
First you need to initialize the haltonset using the leap, skip, and scramble properties.
You can check the documentation, but the short description is as follows:
Scramble - is used for shuffling the points
Skip - helps to exclude a range of points from the set
Leap - is the size of jump from the current selected point to the next one. The points in between are ignored.
Now you can build a haltonset object:
p = haltonset(2,'Skip',1e2,'Leap',1e1);
p = scramble(p,'RR2');
This makes a 2D Halton point set by skipping the first 100 numbers and leaping over 10 numbers between selected points. The 'RR2' scramble method is applied in the second line. You can see that many points are generated:
p =
Halton point set in 2 dimensions (818836295885536 points)
Properties:
Skip : 100
Leap : 10
ScrambleMethod : RR2
When you have your haltonset object, p, you can access the values by just selecting them:
x = p(1:10,:)
Notice: you need to create the object first and then use the generated points. To get different results, you can play with the Leap and Scramble properties of the point set. Another option is to use a uniformly distributed random index, e.g. from randi, to select points from the generated set, which ensures you access uniformly random parts of the dataset each time.
For instance, you can generate a random index vector (4 points in this example) and then use it to select points from the Halton set:
>> idx = randi(size(p,1),1,4)
idx =
1.0e+14 *
3.1243 6.2683 6.5114 1.5302
>> p(idx,:)
ans =
0.5723 0.2129
0.8918 0.6338
0.9650 0.1549
0.8020 0.3532
'qrandstream' may be the answer I am looking for... with 'qrand' instead of 'rand'.
For example, from the Matlab doc:
p = haltonset(1,'Skip',1e3,'Leap',1e2);
p = scramble(p,'RR2');
q = qrandstream(p);
nTests = 1e5;
sampSize = 50;
PVALS = zeros(nTests,1);
for test = 1:nTests
X = qrand(q,sampSize);
[h,pval] = kstest(X,[X,X]);
PVALS(test) = pval;
end
I will post my solution once I am done :)
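For reference, here is a minimal sketch (assuming the Statistics Toolbox functions norminv and normcdf, and an example correlation matrix) of how quasi-random uniforms can drive a Gaussian copula by hand, instead of the randn-based draws inside copularnd:
rho = [1 0.7; 0.7 1];                % example target correlation matrix
ph  = haltonset(2,'Skip',1e3,'Leap',1e2);
ph  = scramble(ph,'RR2');
u   = net(ph, 1000);                 % first 1000 quasi-random points in the unit square
z   = norminv(u);                    % map to standard normal margins
z   = z * chol(rho);                 % impose the correlation structure
U   = normcdf(z);                    % copula-scale samples, analogous to copularnd('Gaussian',rho,1000)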

Passing flux variables to events function in Matlab ODE

I am developing a dynamical model describing plant and microbe dynamics. Plant growth can be limited by carbon (light), nitrogen, or phosphorus, and the model has different dynamics depending on which of these three elements is limiting. Plants can get nitrogen from either the soil or from the atmosphere, and I want to determine the amount of nutrients obtained by each pathway. So, at each time step, I want to calculate what nutrient is limiting and then calculate the corresponding fluxes and changes in pools. My ultimate goal is to have a table with data for all of the pools, fluxes, and limitation.
So far, to address this question, I have been using ode45 in MATLAB. My function has both events and a series of extra values that are computed in my ODE (i.e., I want to extract different fluxes over time, and not just the dY values). Is there a way to export all of the fluxes/variables and use an events function?
The code I am using to pass out the extra variables (xvt, yvt, limvt) has the form below
(based on this MATLAB Central post):
function [dydx, xvt, yvt, limvt] = myode(t, input, ps)
persistent xv yv limv
% schematic computation of the limitation flag and the two fluxes from the inputs
condition = input*ps;
if condition > 1
    lim = 1; x = input*1; y = input/1;
elseif condition == 1
    lim = 2; x = input*2; y = input/2;
else
    lim = 3; x = input*3; y = input/3;
end
limv = [limv; lim];
yv = [yv; y];
xv = [xv; x];
dydx = x + y;
if nargout > 1
    xvt = xv; yvt = yv; limvt = limv;
end
The ODE can then be solved using the commands:
[X, Y] = ode45(@myode, [0 5], 1);
Then, xv and yv and limv can be obtained via:
[dY, xv, yv, limv]=myode([], input,ps);
To use the events function, where changes in limv are the events, I set the options as follows (events_functions does the math to make events output the right values at the right times):
options = odeset('Events',@(t0,y,ps)events_function(t0,input,ps,limv,years));
So then I can run the function with the events (in a while loop, to start and stop whenever the events happen):
[t,xx] = ode23(@myode,[t0:years],input,options,ps);
However, the above lets me either trigger events based on limv or calculate limv, but not both. I believe this poster was trying to do the same thing. Do you know if this is possible?
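One workaround worth noting, sketched only: keep the events option but drop the persistent accumulation, and recompute the fluxes after integration by re-evaluating the right-hand side at the returned time points. The helper myode_rhs below is hypothetical; it would return [dydx, x, y, lim] for a single time point instead of appending to persistent vectors.
[t, xx] = ode23(@(tt, yy) myode_rhs(tt, yy, ps), t0:years, input, options);
fluxes = zeros(numel(t), 3);                           % columns: x, y, lim
for k = 1:numel(t)
    [~, fluxes(k,1), fluxes(k,2), fluxes(k,3)] = myode_rhs(t(k), xx(k,:).', ps);
end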

Generating a set of emissions given a transition matrix and starting state in a hidden markov model

I have the transition matrix, emission matrix and starting state for a hidden Markov model. I want to generate a sequence of observations (emissions). However, I'm stuck on one thing.
I understand how to choose between two states (or emissions). If Event A occurs with probability x, then Event B (really, not-A) occurs with probability 1-x. To generate a sequence of A's and B's from a random number, rand, you do the following.
for iteration in iterations:
observation[iteration] <- A if rand < x else B
I don't understand how to extend this to more than two variables. For example, if three events occur such that Event A occurs with probability x1, Event B with x2 and Event C with 1-(x1+x2), then how do I extend the above pseudocode?
I didn't find the answer Googling. In fact I get the impression that I'm missing a basic fact that many of the notes online assume. :-/
One way would be
x<-rand()
if x < x1 observation is A
else if x < x1 + x2 observation is B
else observation is C
Of course, if you have a large number of alternatives, it might be better to build a cumulative probability table (holding x1, x1+x2, x1+x2+x3, ...) and then do a binary search in that table given the random number, as sketched below. If you are willing to do more preprocessing, there is an even more efficient way; see here for example.
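For illustration, a minimal Matlab sketch of the cumulative-table idea (the probabilities are made-up example values):
p = [0.2 0.5 0.3];                  % example probabilities for events A, B, C
edges = cumsum(p);                  % cumulative table: [0.2 0.7 1.0]
k = find(rand() <= edges, 1);       % index of the event drawn (1 = A, 2 = B, 3 = C)
% For many draws at once, discretize performs the table lookup in one call:
draws = discretize(rand(1, 1000), [0, edges]);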
The two-value case is a binomial distribution, and you generate random draws from it (essentially a series of coin flips).
For more than 2 variables, you need to draw samples from a multinomial distribution, which is simply a generalisation of the binomial distribution to n>2.
Regardless of what language you use, there will most likely be built-in functions to accomplish this task. Below is some Python code, which simulates a set of observations and states given your HMM model object:
import numpy as np

# (These are intended as methods of an HMM class with attributes priors, transition and emission.)
def random_MN_draw(self, n, probs):
    """ get a random draw from the multinomial distribution whose probabilities are given by 'probs' """
    mn_draw = np.random.multinomial(n, probs)  # one multinomial experiment; with probs = [0.5, 0.5] this is a coin flip
    return np.where(mn_draw == 1)[0][0]        # index of the outcome drawn, e.g. 0, 1, ...

def simulate(self, nSteps):
    """ given an HMM = (A, B1, B2, pi), simulate state and observation sequences """
    lenB = len(self.emission)
    observations = np.zeros((lenB, nSteps), dtype=int)  # one row per observed variable
    states = np.zeros(nSteps, dtype=int)
    states[0] = self.random_MN_draw(1, self.priors)  # draw the first state from the prior distribution
    for i in range(0, lenB):  # initialise observations[i, 0] for all observed variables
        observations[i, 0] = self.random_MN_draw(1, self.emission[i][states[0], :])  # ith variable, states[0]th row
    for t in range(1, nSteps):  # loop through time
        states[t] = self.random_MN_draw(1, self.transition[states[t-1], :])  # previous state picks the row of A to use
        for i in range(0, lenB):  # loop through the observed variables at each t
            observations[i, t] = self.random_MN_draw(1, self.emission[i][states[t], :])  # current state picks the row of B to use
    return observations, states
In pretty much every language, you can find equivalents of np.random.multinomial() for multinomial and other discrete or continuous distributions as built-in functions.
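In Matlab, for example, the equivalent is mnrnd from the Statistics and Machine Learning Toolbox (a sketch with made-up probabilities):
probs = [0.2 0.5 0.3];          % example event probabilities
draw  = mnrnd(1, probs);        % one multinomial trial, e.g. [0 1 0]
state = find(draw == 1);        % index of the event drawn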