I was trying to calculate the beta for Reckitt Benckiser, but found many sites with different answers. So should I choose the S&P500 or another index?
After studying the monthly price or the daily price?
If possible can you teach me to do it again?
Thanks for the answer.
You can calculate the beta as cov / var.
First, you have to calculate (log) returns for your stock and the market index. I chose the FTSE 100 as the market proxy for Reckitt Benckiser, as Reckitt Benckiser is part of the index. Next, you can do the beta calculation.
Here is some sample code:
import yfinance as yf
import numpy as np
close = yf.download(['RBGLY', 'UKX'])['Adj Close']
log_returns = np.log(close/close.shift())
cov = log_returns.cov()
var = log_returns['UKX'].var()
beta = cov.loc['RBGLY', 'UKX']/var
Output:
0.011709992935796415
Related
I have calculated the embedding with the help of doc2vec and I have also calculated the distance between sentences in vector form. now I have a vector of sentences that tells the distance between them(sentences). how can I cluster them without giving the number of clusters? I have used k-means and agglomerative algo but they are not giving me good results. can anybody tell me the best method to determine the optimal number of clusters?
Try this. If it doesn't do what you want, I have a few other code samples to share. This may be the best option. The best option to use, can change, based on the dataset that you feed into the algo.
import numpy as np
from sklearn.cluster import AffinityPropagation
import distance
words = "kitten belly squooshy merley best eating google feedback face extension impressed map feedback google eating face extension climbing key".split(" ") #Replace this line
words = np.asarray(words) #So that indexing with a list will work
lev_similarity = -1*np.array([[distance.levenshtein(w1,w2) for w1 in words] for w2 in words])
affprop = AffinityPropagation(affinity="precomputed", damping=0.5)
affprop.fit(lev_similarity)
for cluster_id in np.unique(affprop.labels_):
exemplar = words[affprop.cluster_centers_indices_[cluster_id]]
cluster = np.unique(words[np.nonzero(affprop.labels_==cluster_id)])
cluster_str = ", ".join(cluster)
print(" - *%s:* %s" % (exemplar, cluster_str))
Result:
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression
a = make_regression(n_samples=300,n_features=5,noise=5)
df1 = pd.DataFrame(a[0])
df1 = pd.concat([df1,pd.DataFrame(a[1].T)],axis=1,ignore_index=True)
df1.rename(columns={0:"X1",1:"X2",2:"X3",3:"X4",4:"X5",5:"Target"},inplace=True)
sns.heatmap(df1.corr(),annot=True);
Correlation Matrix
Now I can ask my question. How can I choose features that will be included in the model?
I am not that well-versed in python as I use R most of the time.
But it should be something like this:
# Create a model
model = LinearRegression()
# Call the .fit method and pass in your data
model.fit(Variables,Target)
# Or simply do
model = LinearRegression().fit(Variables,Target)
# So based on the dataset head provided, it should be
X<-df1[['X1','X2','X3','X4','X5']]
Y<-df1['Target']
model = LinearRegression().fit(X,Y)
In order to do feature selections. You need to run the model first. Then check for the p-value. Typically, a p-value of 5% (.05) or less is a good cut-off point. If the p-value crosses the upper threshold of .05, the variable is insignificant and you can remove it from your model. You will have to do this manually. You can also tell by looking from the correlation matrix to see which value has less correlation to the target. AFAIK, there are no libs with built-in functionality to do feature selection automatically. In the end, statistics are just numbers. It is up to humans to interpret the results.
How would I go about downloading the min_max_scaler attributes so that I could apply the same transform to data within a different notebook?
For full disclosure I've trained a NN within one notebook, and am running it in a different locations. It is simple for me to load the trained weights of the NN in the second location, but I need to scale the data before inputting it into the model. To be accurate I believe it has to use the original scale attributes.
Per the documentation, you can recreate what min max scaler does using
X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
X_scaled = X_std * (max - min) + min
where X is your original dataset. (Although as long as your feature range is the default of (0,1), the second line above is not needed - you will come out with X_scaled = X_std)
If you want to do this same computation using your already trained MaxMinScaler instead of your original dataset, consider the following example (again assuming feature range is left at the default (0,1))
from sklearn.preprocessing import MinMaxScaler
import pandas as pd
import numpy as np
# Test data set
X = pd.DataFrame(np.random.randint(0, 100, size=(20,4)))
# Test scaler
scaler = MinMaxScaler()
sklearn_result = scaler.fit_transform(X)
# Compute, and verify results match up to machine precision
manual_result = (X - scaler.data_min_)/(scaler.data_max_ - scaler.data_min_)
(sklearn_result - test).max().max() . # Is around 10e-16
I am using Keras (version 2.0.0) and I'd like to make use of pretrained models like e.g. VGG16.
In order to get started, I ran the example of the [Keras documentation site ][https://keras.io/applications/] for extracting features with VGG16:
from keras.applications.vgg16 import VGG16
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input
import numpy as np
model = VGG16(weights='imagenet', include_top=False)
img_path = 'elephant.jpg'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)
features = model.predict(x)
The used preprocess_input() function bothers me
(the function does Zero-centering by mean pixel what can be seen by looking at the source code).
Do I really have to preprocess input data (validation/test data) before using a trained model?
a)
If yes, one can conclude that you always have to be aware of what preprocessing steps have been performed during training phase?!
b)
If no: Does preprocessing of validation/test data cause a bias?
I appreciate your help.
Yes you should use the preprocessing step. You can retrain the model without it but the first layers will learn to center your datas so this is a waste of parameters.
If you do not recenter your performances will suffer.
Great thread on reddit : https://www.reddit.com/r/MachineLearning/comments/3q7pjc/why_is_removing_the_mean_pixel_value_from_each/
I have a optimization problem. The object is to max two variable x and y. How to represent it in google optimization tools python version.
what i can do now is:
from ortools.linear_solver import pywraplp
solver = pywraplp.Solver('RunIntegerExampleCppStyleAPI',pywraplp.Solver.CBC_MIXED_INTEGER_PROGRAMMING)
x=solver.IntVar(0, 1, 'x')
y=solver.Intvar(0,1,'y')
objective = solver.Objective()
#so how to get the object function of max x*y?
The question I want to define is quadratic programming, so it can't be solved in linear programming.