getting the R-square parameter - linear-regression

I ran a regression for each of 25 portfolios. The loop below extracts the alphas (intercepts) and the betas (slopes) of the 25 portfolios. I also want to extract the R-squared and the standard errors of all 25 portfolios and append them to an initially empty Series, just as I did for the alphas and betas, but this is not working.
I am using statsmodels.formula.api.
I tried reg.rsquared and reg.bse, but neither worked.
Both showed the error "dataframe object has no attribute".
Code:
# create empty series to store parameter estimates
CAPM_alphas = pd.Series()
CAPM_betas = pd.Series()
CAPM_R2 = pd.Series()
FF3_alphas = pd.Series()
FF3_betas = pd.Series()
FF3_R2 = pd.Series()
r_list = ['SMALL LoBM', 'ME1 BM2', 'ME1 BM3', 'ME1 BM4', 'SMALL HiBM',
'ME2 BM1', 'ME2 BM2', 'ME2 BM3', 'ME2 BM4', 'ME2 BM5',
'ME3 BM1', 'ME3 BM2', 'ME3 BM3', 'ME3 BM4', 'ME3 BM5',
'ME4 BM1', 'ME4 BM2', 'ME4 BM3', 'ME4 BM4', 'ME4 BM5',
'BIG LoBM', 'ME5 BM2', 'ME5 BM3', 'ME5 BM4', 'BIG HiBM']
for qqq in r_list:
    reg = assetPriceReg(r[qqq])['results']
    # store alphas and betas
    CAPM_alphas = CAPM_alphas.append(pd.Series(reg.CAPMcoeff[0]))
    CAPM_betas = CAPM_betas.append(pd.Series(reg.CAPMcoeff[1]))
    FF3_alphas = FF3_alphas.append(pd.Series(reg.FF3coeff[0]))
    FF3_betas = FF3_betas.append(pd.Series(reg.FF3coeff[1]))
    CAPM_R2 = CAPM_R2.append(pd.Series(reg.rsquared_adj))
    FF3_R2 = FF3_R2.append(pd.Series(reg.rsquared_adj))
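The error message suggests that `reg` is a pandas DataFrame (whatever `assetPriceReg` stores under `'results'`) rather than a fitted statsmodels results object; attributes such as `rsquared`, `rsquared_adj` and `bse` only exist on the object returned by `.fit()`. As a minimal sketch of what to collect per portfolio, the plain-Python example below computes the same quantities (coefficients, R-squared, standard errors) by hand for a one-factor regression. The helper `ols_one_factor` and the return series are made up for illustration and are not part of the original code.

```python
import math

def ols_one_factor(y, x):
    """Fit y = alpha + beta * x by ordinary least squares (hypothetical helper).

    Returns alpha, beta, R-squared and the standard errors of alpha and beta.
    """
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    residuals = [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]
    ss_res = sum(e ** 2 for e in residuals)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    sigma2 = ss_res / (n - 2)  # residual variance, two estimated parameters
    se_beta = math.sqrt(sigma2 / sxx)
    se_alpha = math.sqrt(sigma2 * (1.0 / n + mean_x ** 2 / sxx))
    return {"alpha": alpha, "beta": beta, "r2": r2,
            "se_alpha": se_alpha, "se_beta": se_beta}

# Made-up excess returns: one market factor and two portfolios.
market = [0.01, -0.02, 0.03, 0.015, -0.005, 0.02]
portfolios = {
    "SMALL LoBM": [0.012, -0.031, 0.041, 0.02, -0.01, 0.027],
    "BIG HiBM": [0.008, -0.015, 0.022, 0.013, -0.002, 0.016],
}

# One dictionary per statistic, keyed by portfolio name.
alphas, betas, r2s, se_betas = {}, {}, {}, {}
for name, returns in portfolios.items():
    fit = ols_one_factor(returns, market)
    alphas[name] = fit["alpha"]
    betas[name] = fit["beta"]
    r2s[name] = fit["r2"]
    se_betas[name] = fit["se_beta"]

for name in portfolios:
    print(f"{name}: beta={betas[name]:.3f}, "
          f"R2={r2s[name]:.3f}, se(beta)={se_betas[name]:.3f}")
```

On a fitted statsmodels model the corresponding attributes are `params`, `rsquared` (or `rsquared_adj`) and `bse`. Collecting values in dictionaries or lists and building one Series at the end also avoids `Series.append`, which was removed in pandas 2.0.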

Machine Translation FFN : Dimension problem due to window size

This is my first time building a feed-forward network (FFN). I am training it to translate French to English using word prediction:
The input consists of two arrays: a window of 2 x window_size + 1 words from the source language and a window of window_size words from the target language. The label has size 1.
For example, with window_size = 2:
["je","mange", "la", "pomme","avec"]
and
["I", "eat"]
So the inputs have sizes [5] and [2], which gives 7 after concatenation.
Label: "the" (referring to "la" in French)
The label is converted to a one-hot encoding before it is compared with yHat.
I use a unique index for each word (1 to len(vocab)) and train on the indices, not the words.
The output of the FFN is a probability distribution over the vocabulary of the target language.
The problem is that the FFN doesn't learn and the accuracy stays at 0.
When I print the sizes of y_final (the target probabilities) and yHat (the model's hypothesis), they have different dimensions:
yHat.size()=[512, 7, 10212]
where 512 is the batch size, 7 is the concatenated input size and 10212 is the size of the target vocabulary, while
y_final.size()= [512, 10212]
Over the whole forward method I get these sizes:
torch.Size([512, 5, 32])
torch.Size([512, 5, 64])
torch.Size([512, 5, 64])
torch.Size([512, 2, 256])
torch.Size([512, 2, 32])
torch.Size([512, 2, 64])
torch.Size([512, 2, 64])
torch.Size([512, 7, 64])
torch.Size([512, 7, 128])
torch.Size([512, 7, 10212])
Since the accuracy only increases when yHat equals y_final, I suspect that this never happens because they don't even have the same shape (2D vs 3D). Is this the problem?
Please refer to the code below; if you need any other information, let me know.
The code runs without errors.
trainingData = TensorDataset(encoded_source_windows, encoded_target_windows, encoded_labels)
# print(trainingData)
batchsize = 512
trainingLoader = DataLoader(trainingData, batch_size=batchsize, drop_last=True)
def ffnModel(vocabSize1,vocabSize2, learningRate=0.01):
    class ffNetwork(nn.Module):
        def __init__(self):
            super().__init__()
            self.embeds_src = nn.Embedding(vocabSize1, 256)
            self.embeds_target = nn.Embedding(vocabSize2, 256)
            # input layer
            self.inputSource = nn.Linear(256, 32)
            self.inputTarget = nn.Linear(256, 32)
            # hidden layer 1
            self.fc1 = nn.Linear(32, 64)
            self.bnormS = nn.BatchNorm1d(5)
            self.bnormT = nn.BatchNorm1d(2)
            # Layer(s) after Concatenation:
            self.fc2 = nn.Linear(64,128)
            self.output = nn.Linear(128, vocabSize2)
            self.softmaaax = nn.Softmax(dim=0)

        # forward pass
        def forward(self, xSource, xTarget):
            xSource = self.embeds_src(xSource)
            xSource = F.relu(self.inputSource(xSource))
            xSource = F.relu(self.fc1(xSource))
            xSource = self.bnormS(xSource)
            xTarget = self.embeds_target(xTarget)
            xTarget = F.relu(self.inputTarget(xTarget))
            xTarget = F.relu(self.fc1(xTarget))
            xTarget = self.bnormT(xTarget)
            xCat = torch.cat((xSource, xTarget), dim=1)#dim=128 or 1 ?
            xCat = F.relu(self.fc2(xCat))
            print(xCat.size())
            xCat = self.softmaaax(self.output(xCat))
            return xCat

    # creating instance of the class
    net = ffNetwork()
    # loss function
    lossfun = nn.CrossEntropyLoss()
    # lossfun = nn.NLLLoss()
    optimizer = torch.optim.Adam(net.parameters(), lr=learningRate)
    return net, lossfun, optimizer

def trainModel(vocabSize1,vocabSize2, learningRate):
    # number of epochs
    numepochs = 64
    # create a new Model instance
    net, lossfun, optimizer = ffnModel(vocabSize1,vocabSize2, learningRate)
    # initialize losses
    losses = torch.zeros(numepochs)
    trainAcc = []
    # loop over training data batches
    batchAcc = []
    batchLoss = []
    for epochi in range(numepochs):
        #Switching on training mode
        net.train()
        # loop over training data batches
        batchAcc = []
        batchLoss = []
        for A, B, y in tqdm(trainingLoader):
            # forward pass and loss
            final_y = []
            for i in range(y.size(dim=0)):
                yy = [0] * target_vocab_length
                yy[y[i]] = 1
                final_y.append(yy)
            final_y = torch.tensor(final_y)
            yHat = net(A, B)
            loss = lossfun(yHat, final_y)
            ################
            print("\n yHat.size()")
            print(yHat.size())
            print("final_y.size()")
            print(final_y.size())
            # backprop
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            # loss from this batch
            batchLoss.append(loss.item())
            print(f'batchLoss: {loss.item()}')
            #Accuracy calculator:
            matches = torch.argmax(yHat) == final_y # booleans (false/true)
            matchesNumeric = matches.float() # convert to numbers (0/1)
            accuracyPct = 100 * torch.mean(matchesNumeric) # average and x100
            batchAcc.append(accuracyPct) # add to list of accuracies
            print(f'accuracyPct: {accuracyPct}')
        trainAcc.append(np.mean(batchAcc))
        losses[epochi] = np.mean(batchLoss)
    return trainAcc,losses,net

trainAcc,losses,net = trainModel(len(source_vocab),len(target_vocab), 0.01)
print(trainAcc)
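On the shape question: `nn.Linear` acts on the last dimension only, so a `[512, 7, 64]` input to `fc2` stays three-dimensional and the output ends up as `[512, 7, 10212]`. To get one prediction per example, the seven 64-dimensional vectors would have to be flattened into a single 448-dimensional vector before the layers that follow the concatenation (which would also mean `fc2` takes 448 inputs). Accuracy is then the share of rows whose arg-max over the vocabulary dimension equals the integer label, not a comparison against the one-hot matrix. The sketch below walks through that bookkeeping in plain Python with small made-up numbers; `linear_out_shape`, `flatten_shape` and `accuracy` are hypothetical helpers, not PyTorch functions.

```python
def linear_out_shape(in_shape, in_features, out_features):
    """Shape after a fully connected layer, which acts on the last dimension."""
    if in_shape[-1] != in_features:
        raise ValueError(f"expected last dim {in_features}, got {in_shape[-1]}")
    return in_shape[:-1] + (out_features,)

def flatten_shape(in_shape):
    """Shape after merging every non-batch dimension into one."""
    size = 1
    for d in in_shape[1:]:
        size *= d
    return (in_shape[0], size)

def accuracy(scores, labels):
    """Percentage of rows whose highest score sits at the label index."""
    hits = 0
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=row.__getitem__)
        hits += predicted == label
    return 100.0 * hits / len(labels)

batch, tokens, vocab = 512, 7, 10212
cat_shape = (batch, tokens, 64)  # after concatenating source and target

# As posted: the token dimension survives all the way to the output.
as_posted = linear_out_shape(linear_out_shape(cat_shape, 64, 128), 128, vocab)
print(as_posted)  # (512, 7, 10212)

# Flattened first: one score vector per example.
flat = flatten_shape(cat_shape)  # (512, 448)
flattened = linear_out_shape(linear_out_shape(flat, 448, 128), 128, vocab)
print(flattened)  # (512, 10212)

# Accuracy on a toy batch of 4 examples and 3 classes, using integer labels.
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.2, 0.6], [0.3, 0.4, 0.3]]
labels = [1, 0, 2, 0]
print(accuracy(scores, labels))  # 75.0
```

In the posted code, `torch.argmax(yHat)` without a `dim` argument returns a single index into the flattened tensor, so comparing it with `final_y` cannot measure per-example accuracy. `nn.CrossEntropyLoss` also expects raw scores rather than softmax output, and it accepts the integer labels `y` directly.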

Count number of variables in netcdf file

I have a netcdf file
ncdisp('F:\Data\Coriolis\ArgoProfiles\ArgoProfiles\DataSelection_20191119_093255_9342316/argo-profiles-2902636.nc')
Source:
    F:\Data\Coriolis\ArgoProfiles\ArgoProfiles\DataSelection_20191119_093255_9342316\argo-profiles-2902636.nc
Format:
    classic
Global Attributes:
    title = 'Argo float vertical profile'
    institution = 'CSIO'
    source = 'Argo float'
    history = '2019-11-19T09:33:19Z creation'
    references = 'http://www.argodatamgt.org/Documentation'
    comment = ''
    user_manual_version = '3.03'
    Conventions = 'Argo-3.0 CF-1.6'
    featureType = 'trajectoryProfile'
Dimensions:
    DATE_TIME = 14
    STRING256 = 256
    STRING64 = 64
    STRING32 = 32
    STRING16 = 16
    STRING8 = 8
    STRING4 = 4
    STRING2 = 2
    N_PROF = 2
    N_PARAM = 3
    N_LEVELS = 71
    N_CALIB = 1
    N_HISTORY = 6 (UNLIMITED)
Variables:
    DATA_TYPE
        Size: 16x1
        Dimensions: STRING16
        Datatype: char
        Attributes:
            long_name = 'Data type'
            _FillValue = ' '
    FORMAT_VERSION
        Size: 4x1
        Dimensions: STRING4
        Datatype: char
        Attributes:
            long_name = 'File format version'
            _FillValue = ' '
    HANDBOOK_VERSION
        Size: 4x1
        Dimensions: STRING4
        Datatype: char
        Attributes:
            long_name = 'Data handbook version'
            _FillValue = ' '
And I would like to count the number of variables. In this case, the answer would be 3.
I tried to use
info=ncinfo(FilePath);
numel(info.Variables.Name)
But this gives me
2.6413e+96
Warning: Number of elements exceeds maximum flint 2^53-1.
The result may be inaccurate.
Which is not correct.
How do I find the number of variables in my netcdf file? My original netcdf file is a lot bigger and I can't count them by hand anymore.
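@obchardon is right. info.Variables is a struct array with one element per variable, so info.Variables.Name expands to a comma-separated list of names instead of a single array, which is why numel returns a meaningless number.
numel(info.Variables) counts the number of variables in the netcdf file.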

How do I define a variable from another function?

I have a multi-function script that is supposed to ask the user for 4 different cars and weigh them based on ratings to recommend the best car to purchase.
What I want is a prompt for every car the user enters, so the user can supply data for each variable they decide to use. When titling the prompt, I want to use the car's name. It seems impossible to me and I'm not sure what to do; I'm very new to coding.
Main Script
prompt1 = {'How Many Cars (4): '};
title1 = 'Cars';
answer1 = inputdlg(prompt1, title1, [1 40]);
Q1 = str2double(answer1{1});
[N] = Group_Function1(Q1);
Car1 = N(1); %Stores the names of the cars
Car2 = N(2);
Car3 = N(3);
Car4 = N(4);
prompt2 = {'How Many Variables (4): '};
title2 = 'Variables';
answer2 = inputdlg(prompt2, title2, [1 50]);
fprintf('This code can accept costs between 0-100000\n');
fprintf('This code can accept top speeds between 0-200\n');
fprintf('This code can also accept the terms none, some, & alot\n');
fprintf('This code can accept safety ratings between 0-5\n');
Q2 = str2double(answer2{1});
[V,W] = Group_Function2(Q2);
W1 = W(1); %Stores the weights of the variables
W2 = W(2);
W3 = W(3);
W4 = W(4);
for h=1:Q1
    [H] = Group_Function3(V);
    Weights(h,:)=H;
end
Group_Function1
function [N] = Group_Function1(Q1)
    for Q = 1:Q1
        prompt = {'Name of Car:'};
        title = 'Car Name';
        answer = inputdlg(prompt,title, [1 80])';
        N(Q) = answer(1);
    end
end
Group_Function2
function [V,W] = Group_Function2(Q2)
    for Q=1:Q2
        prompt = {'Variable? (Negative Variables First):','weights in decimal form?'};
        title = 'Variables and Weights';
        answer = inputdlg(prompt,title, [1 80])';
        V(Q)=answer(1);
        W(Q)=str2double(answer{2});
        s=sum(W);
    end
    if s~=1
        fprintf('Weights do not add up to 1. Try Again!\n');
        [V,W] = Group_Function2(Q2); % keep the values entered on the retry
    end
end
Group_Function3 (Where the problem occurs)
function [H] = Group_Function3(V)
    prompt = {V};
    title = ['Variable Ratings For' Group_Function1(answer{1})];
    h = inputdlg(prompt, title, [1 80])';
end
The Problem
For 'Group_Function3' I want the prompt to include the user's inputs from 'Group_Function1', so that when the prompt comes up I know which vehicle I am entering data for.
Each function runs in its own workspace, which means it does not know the state or content of variables outside of it. If you want a function to know something specific (like the name of a car), you have to pass it to the function as an input parameter. A function can have several input parameters; you are not limited to one.
Before going into Group_Function3, I'd like to propose a new approach for Group_Function1.
Group_Function1:
You run a loop to ask for each car name independently. It is rather tedious to have to validate each dialog box. Here is a way to ask for the 4 car names in one go:
Replace the beginning of your script with:
title1 = 'Cars';
prompt1 = {'How Many Cars (4): '};
answer1 = inputdlg(prompt1, title1 );
nCars = str2double( answer1{1} );
CarNames = getCarNames(nCars) ; % <= use this function
% [N] = Group_Function1(Q1); % instead of this one
and replace Group_Function1 with:
function CarNames = getCarNames(nCars)
    title = 'Car Names';
    prompt = cellstr( [repmat('Name of car #',nCars,1) , sprintf('%d',(1:nCars)).'] ) ;
    CarNames = inputdlg( prompt, title, [1 80] ) ;
end
Now CarNames is a cell array containing the names of your 4 cars (as your variable N did earlier; I recommend slightly more explicit variable names).
You can run the rest of your code as is (just replace N with CarNames, and Q1 with nCars).
Group_Function3:
When you get to Group_Function3, you have to send the current car name to the function (so it can use the name in the title or prompt). So replace your Group_Function3 as follows (we add an input variable to the function definition):
function H = Group_Function3( V , thisCarName )
    prompt = V; % V is already a cell array of prompts
    title = ['Variable Ratings For ' thisCarName]; % trailing space separates the car name
    H = inputdlg(prompt, title, [1 80])';
end
and in your main script, call it this way:
for h = 1:nCars
    thisCarName = CarNames{h} ; % MATLAB is case-sensitive: CarNames, as defined above
    H = Group_Function3( V , thisCarName ) ;
    % ...
    % anything else you want to do in this loop
end

Random selection of a member's location in a nested cell of cells: Matlab

I have a nested cell of cells like the one below:
CellArray={1,1,1,{1,1,1,{1,1,{1,{1 1 1 1 1 1 1 1}, 1,1},1,1},1,1,1},1,1,1,{1,1,1,1}};
I need to randomly pick a location in CellArray. Every member's location must have the same chance of being chosen in the random selection process. Thanks.
You can capture the output of the celldisp function and then use a regex to extract the indices:
s=evalc('celldisp(CellArray,'''')');
m = regexp(s, '\{[^\=]*\}', 'match');
Thanks to @excaza, who suggested a clearer use of regexp.
Result:
m =
{
[1,1] = {1}
[1,2] = {2}
[1,3] = {3}
[1,4] = {4}{1}
[1,5] = {4}{2}
[1,6] = {4}{3}
[1,7] = {4}{4}{1}
[1,8] = {4}{4}{2}
[1,9] = {4}{4}{3}{1}
[1,10] = {4}{4}{3}{2}{1}
[1,11] = {4}{4}{3}{2}{2}
[1,12] = {4}{4}{3}{2}{3}
[1,13] = {4}{4}{3}{2}{4}
[1,14] = {4}{4}{3}{2}{5}
[1,15] = {4}{4}{3}{2}{6}
[1,16] = {4}{4}{3}{2}{7}
[1,17] = {4}{4}{3}{2}{8}
[1,18] = {4}{4}{3}{3}
[1,19] = {4}{4}{3}{4}
[1,20] = {4}{4}{4}
[1,21] = {4}{4}{5}
[1,22] = {4}{5}
[1,23] = {4}{6}
[1,24] = {4}{7}
[1,25] = {5}
[1,26] = {6}
[1,27] = {7}
[1,28] = {8}{1}
[1,29] = {8}{2}
[1,30] = {8}{3}
[1,31] = {8}{4}
}
Use randi to select an index:
m{randi(numel(m))}
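The same enumeration can also be done by walking the structure recursively instead of parsing the celldisp output. The sketch below shows the idea in Python, with nested lists standing in for the nested cell array; it illustrates the technique and is not MATLAB code. `leaf_paths` is a hypothetical helper, and the paths are 1-based to match the listing above.

```python
import random

def leaf_paths(nested, prefix=()):
    """Return the 1-based index path of every non-list element, in order."""
    paths = []
    for i, item in enumerate(nested, start=1):
        if isinstance(item, list):
            # Descend into the nested container, remembering how we got there.
            paths.extend(leaf_paths(item, prefix + (i,)))
        else:
            paths.append(prefix + (i,))
    return paths

# Nested lists mirroring CellArray from the question.
cell_array = [1, 1, 1,
              [1, 1, 1, [1, 1, [1, [1, 1, 1, 1, 1, 1, 1, 1], 1, 1], 1, 1], 1, 1, 1],
              1, 1, 1,
              [1, 1, 1, 1]]

paths = leaf_paths(cell_array)
print(len(paths))  # 31

# Every leaf appears exactly once, so each is equally likely to be drawn.
print(random.choice(paths))
```

Because the draw is made over the flat list of paths, deeply nested members are as likely to be picked as top-level ones.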

import data and Looping for the file

My goal is to split each string into parts and then check whether the corresponding 'f11_data' contains those parts: if yes, return 0; if no, return 1. I have 100 strings, and it doesn't make sense to type out str/needles 100 times in my code. How can I use a loop to do that? I'm also having a problem using importdata.
str1 = 'http://en.wikipedia.org/wiki/hostname';
str2 = 'http://hello/world/hello';
str3 = 'http://hello/asd/wee';
f11_data1 = 'hostname From wikipedia, the free encyclopedia Jump to: navigation, search In computer networking, a hostname (archaically nodename .....';
f11_data2 = 'hell';
f11_data3 = 'hello .....';
needles1 = strcat('\<', regexpi(str1,'[:/.]*','split'), '\>')
needles2 = strcat('\<', regexpi(str2,'[:/.]*','split'), '\>')
needles3 = strcat('\<', regexpi(str3,'[:/.]*','split'), '\>')
~cellfun('isempty', regexpi(f11_data1, needles1, 'once'))
~cellfun('isempty', regexpi(f11_data2, needles2, 'once'))
~cellfun('isempty', regexpi(f11_data3, needles3, 'once'))
This is how I modified the above code using a loop:
data = importdata('URL')
needles = regexp(data,'[:/.]*','split') %// note the different search string
for i = 1:2
    A11_data = needles{i};
    data2 = importdata(strcat('f11_data', int2str(i)));
    %feature11_data=(~cellfun('isempty', regexpi(data2, needles, 'once')))
    %feature11(i)=feature11_data
    ~cellfun('isempty', regexpi(data2, needles, 'once'))
end
I get the error :
"
??? Error using ==> regexpi
All cells for regexpi must be strings.
Error in ==> f11_test2 at 14
~cellfun('isempty', regexpi(haystack, needles, 'once')) "
It's hard to tell what you are trying to do, but if you want something equivalent to your first version:
for i = 1:3
    str = importdata(URL_LIST{i});
    needle = strcat('\<', regexp(str,'[:/.]*','split'), '\>');
    haystack = importdata(HAYSTACK_LIST{i});
    ~cellfun('isempty', regexpi(haystack, needle, 'once'))
end
Obviously you need to define URL_LIST and HAYSTACK_LIST (assumed here to be cell arrays of file names, hence the curly-brace indexing).