GPflow change point kernel issue with multiple dimensions - gpflow

I'm following the tutorial here for implementing a change point kernel in gpflow.
However, I have 3 inputs and 1 output, and I would like the change point kernel to act on the first input dimension only, with standard kernels on the other two input dimensions. I'm getting the following error:
InvalidArgumentError: Incompatible shapes: [2000,3,1] vs. [3,2000,1] [Op:Mul] name: mul/
Below is a minimal working example. Could anyone please let me know where I'm going wrong?
gpflow version 2.0.0.rc1
import pandas as pd
import gpflow
from gpflow.utilities import print_summary
df_all = pd.read_csv(
    'https://raw.githubusercontent.com/ipan11/gp/master/dataset.csv')
# Training dataset in numpy format
X = df_all[['X1', 'X2', 'X3']].to_numpy()
Y1 = df_all['Y'].to_numpy().reshape(-1, 1)
# Changepoint kernel only on first dimension and standard kernels for the other two dimensions
base_k1 = gpflow.kernels.Matern32(lengthscale=0.2, active_dims=[0])
base_k2 = gpflow.kernels.Matern32(lengthscale=2., active_dims=[0])
k1 = gpflow.kernels.ChangePoints(
    [base_k1, base_k2], [.4], steepness=5)
k2 = gpflow.kernels.Matern52(lengthscale=[1., 1.], active_dims=[1, 2])
k_all = k1+k2
print_summary(k_all)
m1 = gpflow.models.GPR(data=(X, Y1), kernel=k_all, mean_function=None)
print_summary(m1)
opt = gpflow.optimizers.Scipy()
def objective_closure():
    return -m1.log_marginal_likelihood()
opt_logs = opt.minimize(objective_closure, m1.trainable_variables,
                        options=dict(maxiter=100))

The correct fix would be to move active_dims=[0] from the base_k* kernels to the ChangePoints() kernel itself:
k1 = gpflow.kernels.ChangePoints([base_k1, base_k2], [0.4], steepness=5, active_dims=[0])
However, this is currently not supported in GPflow 2, which is a bug. I've opened an issue on GitHub and will update this answer once it's fixed. If you feel up to fixing this bug yourself, feel free to open a pull request; help is always welcome!
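For intuition, the covariance the question is after can be written out directly in NumPy. This is a minimal sketch, not GPflow code: it assumes the usual change point construction (the two base kernels blended by a sigmoid of the first input), unit kernel variances, and the lengthscales, location and steepness from the question. The function names are made up for illustration.

```python
import numpy as np

def scaled_dist(a, b, lengthscales):
    # Pairwise Euclidean distance after dividing each column by its lengthscale.
    diff = (a / lengthscales)[:, None, :] - (b / lengthscales)[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))

def matern32(a, b, lengthscales):
    r = scaled_dist(a, b, lengthscales)
    return (1.0 + np.sqrt(3.0) * r) * np.exp(-np.sqrt(3.0) * r)

def matern52(a, b, lengthscales):
    r = scaled_dist(a, b, lengthscales)
    return (1.0 + np.sqrt(5.0) * r + 5.0 / 3.0 * r ** 2) * np.exp(-np.sqrt(5.0) * r)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def changepoint_plus_matern(X, X2, location=0.4, steepness=5.0):
    # Change point part: only column 0 of the inputs is used.
    x0, x0_2 = X[:, :1], X2[:, :1]
    s = sigmoid(steepness * (x0 - location))      # shape (n, 1)
    s2 = sigmoid(steepness * (x0_2 - location))   # shape (m, 1)
    k_before = matern32(x0, x0_2, np.array([0.2]))
    k_after = matern32(x0, x0_2, np.array([2.0]))
    k_cp = k_before * ((1 - s) @ (1 - s2).T) + k_after * (s @ s2.T)
    # Standard kernel part: columns 1 and 2 only.
    k_rest = matern52(X[:, 1:3], X2[:, 1:3], np.array([1.0, 1.0]))
    return k_cp + k_rest

rng = np.random.default_rng(0)
X_demo = rng.uniform(size=(5, 3))
K_demo = changepoint_plus_matern(X_demo, X_demo)
print(K_demo.shape)
```

The point of the sketch is only that every array entering the change point term has a single input column, which is the behaviour active_dims=[0] on the ChangePoints kernel would provide.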

Related

Can I draw a bipartite graph from every dataset?

I am trying to draw a bipartite graph for my data set, which is like below:
source target weight
reduce energy 25
reduce consumption 25
energy pennsylvania 4
energy natural 4
consumption balancing 4
The code I am using to build the graph is as follows:
C_2021 = nx.Graph()
C_2021.add_nodes_from(df_final_2014['source'], bipartite=0)
C_2021.add_nodes_from(df_final_2014['target'], bipartite=1)
edges = df_final_2014[['source', 'target','weight']].apply(tuple, axis=1)
C_2021.add_weighted_edges_from(edges)
But when I check whether the graph is bipartite with the code below, it returns False.
nx.is_bipartite(C_2021)
Could you please advise what the issue is?
The previous issue is resolved, but when I try to plot the bipartite graph with the steps below, I do not get a proper result. I would appreciate any help:
top_nodes_2021 = set(n for n,d in C_2021.nodes(data=True) if d['bipartite']==0)
top_nodes_2021
the output of the above is:
{'reduce'}
bottom_nodes_2021 = set(C_2021) - top_nodes_2021
bottom_nodes_2021
the output of the above is:
{'balancing', 'consumption', 'energy', 'natural', 'pennsylvania '}
then plot it by:
pos = nx.bipartite_layout(C_2021,top_nodes_2021)
plt.figure(figsize=[8,6])
# Pass that layout to nx.draw
nx.draw(C_2021,pos,node_color='#A0CBE2',edge_color='black',width=0.2,
        edge_cmap=plt.cm.Blues,with_labels=True)
and the result is:
It works for me using your code: nx.is_bipartite(C_2021) returns True. Check the example below:
import sys
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO
import pandas as pd
import networkx as nx
data = StringIO('''source;target;weight
reduce;energy;25
reduce;consumption;25
energy;pennsylvania ;4
energy;natural;4
consumption;balancing;4
''')
df_final_2014 = pd.read_csv(data, sep=";")
C_2021 = nx.Graph()
C_2021.add_nodes_from(df_final_2014['source'], bipartite=0)
C_2021.add_nodes_from(df_final_2014['target'], bipartite=1)
edges = df_final_2014[['source', 'target','weight']].apply(tuple, axis=1)
C_2021.add_weighted_edges_from(edges)
nx.is_bipartite(C_2021)
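Finally, to draw the graph, get the bipartite sets from the graph itself. The bipartite attributes assigned at creation (bipartite=0 and bipartite=1) do not describe a valid two-sided split: 'energy' and 'consumption' appear in both the source and the target column, so they end up with bipartite=1, the same value as their own targets ('natural', 'balancing', and so on).
Use the following commands:
import matplotlib.pyplot as plt
from networkx.algorithms import bipartite
top_nodes_2021, bottom_nodes_2021 = bipartite.sets(C_2021)
pos = nx.bipartite_layout(C_2021, top_nodes_2021)
plt.figure(figsize=[8,6])
# Pass that layout to nx.draw
nx.draw(C_2021,pos,node_color='#A0CBE2',edge_color='black',width=0.2,
        edge_cmap=plt.cm.Blues,with_labels=True)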
With the following result:
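For reference, the idea of deriving the two node sets from the edges alone, rather than from the source/target columns, can be sketched in plain Python as a breadth-first two-colouring. This is an illustrative sketch and not networkx's implementation; the function name two_colour is made up, and the edge list is the one from the question with whitespace stripped.

```python
from collections import deque

def two_colour(edges):
    """Return (top, bottom) node sets, or None if the graph is not bipartite."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    colour = {}
    # Each connected component is coloured independently, starting with colour 0.
    for start in adjacency:
        if start in colour:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in colour:
                    colour[neighbour] = 1 - colour[node]
                    queue.append(neighbour)
                elif colour[neighbour] == colour[node]:
                    return None  # an edge joins two nodes of the same colour
    top = {n for n, c in colour.items() if c == 0}
    bottom = {n for n, c in colour.items() if c == 1}
    return top, bottom

edges = [
    ("reduce", "energy"),
    ("reduce", "consumption"),
    ("energy", "pennsylvania"),
    ("energy", "natural"),
    ("consumption", "balancing"),
]
top, bottom = two_colour(edges)
print(sorted(top))     # ['balancing', 'natural', 'pennsylvania', 'reduce']
print(sorted(bottom))  # ['consumption', 'energy']
```

Note that 'reduce' lands on the same side as 'pennsylvania', 'natural' and 'balancing', which is not what tagging by column produces.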

Why is scipy's griddata much slower than MATLAB's griddata?

I found some existing topics about this, but none of them gave me an answer.
Here is a Python example taken from https://gist.github.com/fjarri/b6f1faefa95995d119b8 (already used in "Why is scipy.interpolate.griddata so slow?"):
Python:
import time
import numpy as np
from scipy.interpolate import griddata
def func(x, y):
    return x*(1-x)*np.cos(4*np.pi*x) * np.sin(4*np.pi*y**2)**2
grid_x, grid_y = np.mgrid[0:1:200j, 0:1:200j]
points = np.random.rand(410500, 2)
values = func(points[:,0], points[:,1])
t1 = time.time()
grid_z1 = griddata(points, values, (grid_x, grid_y), method='linear')
print(time.time() - t1)
The printed time is always about 6.4 s.
MATLAB:
[grid_x, grid_y] = meshgrid(1:200, 1:200);
points = rand(410500, 2);
x=points(:,1);
y=points(:,2);
values = x.*(1-x).*cos(4*pi*x).*sin(4*pi*y.^2).^2;
tic;vq = griddata(x,y,values,grid_x,grid_y,'linear');toc;
The printed time is always about 2.4 s.
Does anyone know why there is such a big difference between the two programs, and whether there is a way to speed up scipy's griddata? I deal with many large arrays of scattered points, and griddata accounts for most of my computation time. I need to use Python for this, and it is very slow.

Pytorch: NN function approximator, 2 in 1 out

[Please be aware of the Edit History below, as the major problem statement has changed.]
We are trying to implement a neural network in PyTorch that approximates a function f(x,y)=z. So there are two real numbers as input and one as output; we therefore want 2 nodes in the input layer and one in the output layer. We constructed a test set of 5050 samples and had pretty good results for that task in Keras with the TensorFlow backend, using 3 hidden layers with the node configuration 2(in) - 4 - 16 - 4 - 1(out), ReLU activation functions on all hidden layers, and linear activations on the input and output.
Now in PyTorch we tried to implement a similar network, but our loss literally explodes: it changes in the first few steps and then converges to some value around 10^7. In Keras we had an error of around 10 percent. We already tried different network configurations without any improvement. Could someone have a look at our code and suggest changes?
To explain: tr_data is a list containing 5050 2*1 numpy arrays, which are the inputs for the network. tr_labels is a list containing 5050 numbers, which are the outputs we want to learn. loadData() just loads those two lists.
import torch
import torch.nn as nn
import torch.nn.functional as F
BATCH_SIZE = 5050
DIM_IN = 2
DIM_HIDDEN_1 = 4
DIM_HIDDEN_2 = 16
DIM_HIDDEN_3 = 4
DIM_OUT = 1
LEARN_RATE = 1e-4
EPOCH_NUM = 500
class Net(nn.Module):
    def __init__(self):
        #super(Net, self).__init__()
        super().__init__()
        self.hidden1 = nn.Linear(DIM_IN, DIM_HIDDEN_1)
        self.hidden2 = nn.Linear(DIM_HIDDEN_1, DIM_HIDDEN_2)
        self.hidden3 = nn.Linear(DIM_HIDDEN_2, DIM_HIDDEN_3)
        self.out = nn.Linear(DIM_HIDDEN_3, DIM_OUT)
    def forward(self, x):
        x = F.relu(self.hidden1(x))
        x = F.tanh(self.hidden2(x))
        x = F.tanh(self.hidden3(x))
        x = self.out(x)
        return x
model = Net()
loss_fn = nn.MSELoss(size_average=False)
optimizer = torch.optim.Adam(model.parameters(), lr=LEARN_RATE)
tr_data,tr_labels = loadData()
tr_data_torch = torch.zeros(BATCH_SIZE, DIM_IN)
tr_labels_torch = torch.zeros(BATCH_SIZE, DIM_OUT)
for i in range(BATCH_SIZE):
    tr_data_torch[i] = torch.from_numpy(tr_data[i])
    tr_labels_torch[i] = tr_labels[i]
for t in range(EPOCH_NUM):
    labels_pred = model(tr_data_torch)
    loss = loss_fn(labels_pred, tr_labels_torch)
    #print(t, loss.item())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
I have to say, those are our first steps in Pytorch, so please forgive me if there are some obvious, dumb mistakes. I appreciate any help or hint,
Thank you!
EDIT 1 ------------------------------------------------------------------
Following the comments and answers, we improved our code. For the first time, the loss now has reasonable values, around 250. Our new class definition looks like:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        #super().__init__()
        self.hidden1 = nn.Sequential(nn.Linear(DIM_IN, DIM_HIDDEN_1), nn.ReLU())
        self.hidden2 = nn.Sequential(nn.Linear(DIM_HIDDEN_1, DIM_HIDDEN_2), nn.ReLU())
        self.hidden3 = nn.Sequential(nn.Linear(DIM_HIDDEN_2, DIM_HIDDEN_3), nn.ReLU())
        self.out = nn.Linear(DIM_HIDDEN_3, DIM_OUT)
    def forward(self, x):
        x = self.hidden1(x)
        x = self.hidden2(x)
        x = self.hidden3(x)
        x = self.out(x)
        return x
and the loss function:
loss_fn = nn.MSELoss(size_average=True, reduce=True)
As we stated before, we already had far more satisfying results in keras with tensorflow backend. The loss function was around 30, with a similar network configuration. I share the essential parts(!) of our keras code here:
model = Sequential()
model.add(Dense(4, activation="linear", input_shape=(2,)))
model.add(Dense(16, activation="relu"))
model.add(Dense(4, activation="relu"))
model.add(Dense(1, activation="linear" ))
model.summary()
model.compile ( loss="mean_squared_error", optimizer="adam", metrics=["mse"] )
history=model.fit ( np.array(tr_data), np.array(tr_labels),
                    validation_data = ( np.array(val_data), np.array(val_labels) ),
                    batch_size=50, epochs=200, callbacks = [ cbk ] )
Thank you already for all the help! If anybody still has suggestions to improve the network, we would be happy to hear them. As somebody already asked for the data, we want to share a pickle file here:
https://mega.nz/#!RDYxSYLY!P4a9mEDtZ7A5Bl7ZRjRk8EzLXQt2gyURa3wN3NCWFPA
together with the code to access it:
import pickle
f=open("data.pcl","rb")
tr_data=pickle.load ( f )
tr_labels=pickle.load ( f )
val_data=pickle.load ( f )
val_labels=pickle.load ( f )
f.close()
It may be worth looking at the differences between torch.nn and torch.nn.functional (see here). Essentially, it might be that your backpropagation graph is not executed 100% correctly due to a different specification.
As pointed out by previous commenters, I would suggest defining your layers including the activations. My personal favorite way is to use nn.Sequential(), which allows you to specify multiple operations chained together, like so:
self.hidden1 = nn.Sequential(nn.Linear(DIM_IN, DIM_HIDDEN_1), nn.ReLU())
and then simply calling self.hidden1 later (without wrapping it in F.relu()).
May I also ask why you do not call the commented-out super(Net, self).__init__()? It is the form most often recommended, although in Python 3 the bare super().__init__() you use is equivalent.
Additionally, if that does not fix the problem, could you share the Keras code for comparison?
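One thing worth keeping in mind when comparing the loss values quoted in the question: size_average=False makes nn.MSELoss sum the squared errors over all elements instead of averaging them, whereas Keras's mean_squared_error averages. The plain-Python sketch below uses made-up numbers (not the asker's data) and hypothetical helper names to show how far apart the two reductions sit for a batch of 5050 samples.

```python
def squared_errors(predictions, targets):
    return [(p - t) ** 2 for p, t in zip(predictions, targets)]

def mse_sum(predictions, targets):
    # Summed reduction: grows with the number of samples.
    return sum(squared_errors(predictions, targets))

def mse_mean(predictions, targets):
    # Mean reduction: independent of the number of samples.
    errors = squared_errors(predictions, targets)
    return sum(errors) / len(errors)

n = 5050
targets = [float(i % 7) for i in range(n)]
predictions = [t + 2.0 for t in targets]  # every prediction is off by exactly 2

print(mse_mean(predictions, targets))  # 4.0
print(mse_sum(predictions, targets))   # 20200.0
```

So a summed loss is 5050 times the averaged one on this batch, and the two numbers should not be compared directly.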

3-layered Neural network doesn't learn properly

So, I'm trying to implement a neural network with 3 layers in Python; however, I am not the brightest person, so anything with more than 2 layers is kind of difficult for me. The problem with this one is that it gets stuck at 0.5 and does not learn, and I have no clue where it went wrong. Thank you to anyone with the patience to explain the error to me. (I hope the code makes sense.)
import numpy as np
def sigmoid(x):
    return 1/(1+np.exp(-x))
def reduce(x):
    return x*(1-x)
l0=[np.array([1,1,0,0]),
    np.array([1,0,1,0]),
    np.array([1,1,1,0]),
    np.array([0,1,0,1]),
    np.array([0,0,1,0]),
    ]
output=[0,1,1,0,1]
syn0=np.random.random((4,4))
syn1=np.random.random((4,1))
for justanumber in range(1000):
    for i in range(len(l0)):
        l1=sigmoid(np.dot(l0[i],syn0))
        l2=sigmoid(np.dot(l1,syn1))
        l2_err=output[i]-l2
        l2_delta=reduce(l2_err)
        l1_err=syn1*l2_delta
        l1_delta=reduce(l1_err)
        syn1=syn1.T
        syn1+=l0[i].T*l2_delta
        syn1=syn1.T
        syn0=syn0.T
        syn0+=l0[i].T*l1_delta
        syn0=syn0.T
print l2
PS. I know that it might be a piece of trash as a script but that is why I asked for assistance
Your computations are not fully correct. For example, reduce is called on l1_err and l2_err, where it should be called on l1 and l2.
You are performing stochastic gradient descent. With so few parameters it oscillates hugely; in this case use full-batch gradient descent.
The bias units are not present, although technically you can still learn without them.
I tried to rewrite your code with minimal changes, and commented out your lines to show what changed.
#!/usr/bin/python3
import matplotlib.pyplot as plt
import numpy as np
def sigmoid(x):
    return 1/(1+np.exp(-x))
def reduce(x):
    return x*(1-x)
l0=np.array ([np.array([1,1,0,0]),
              np.array([1,0,1,0]),
              np.array([1,1,1,0]),
              np.array([0,1,0,1]),
              np.array([0,0,1,0]),
              ])
output=np.array ([[0],[1],[1],[0],[1]])
syn0=np.random.random((4,4))
syn1=np.random.random((4,1))
final_err = list ()
gamma = 0.05
maxiter = 100000
for justanumber in range(maxiter):
    syn0_del = np.zeros_like (syn0)
    syn1_del = np.zeros_like (syn1)
    l2_err_sum = 0
    for i in range(len(l0)):
        this_data = l0[i,np.newaxis]
        l1=sigmoid(np.matmul(this_data,syn0))[:]
        l2=sigmoid(np.matmul(l1,syn1))[:]
        l2_err=(output[i,:]-l2[:])
        #l2_delta=reduce(l2_err)
        l2_delta=np.dot (reduce(l2), l2_err)
        l1_err=np.dot (syn1, l2_delta)
        #l1_delta=reduce(l1_err)
        l1_delta=np.dot(reduce(l1), l1_err)
        # Accumulate gradient for this point for layer 1
        syn1_del += np.matmul(l2_delta, l1).T
        #syn1=syn1.T
        #syn1+=l1.T*l2_delta
        #syn1=syn1.T
        # Accumulate gradient for this point for layer 0
        syn0_del += np.matmul(l1_delta, this_data).T
        #syn0=syn0.T
        #syn0-=l0[i,:].T*l1_delta
        #syn0=syn0.T
        # The error for this datapoint. Mean sum of squares
        l2_err_sum += np.mean (l2_err ** 2)
    l2_err_sum /= l0.shape[0] # Mean sum of squares
    syn0 += gamma * syn0_del
    syn1 += gamma * syn1_del
    print ("iter: ", justanumber, "error: ", l2_err_sum)
    final_err.append (l2_err_sum)
# Predicting
l1=sigmoid(np.matmul(l0,syn0))[:] # n x 4 * 4 x 4 = n x 4
l2=sigmoid(np.matmul(l1,syn1))[:] # n x 4 * 4 x 1 = n x 1
print ("Predicted: \n", l2)
print ("Actual: \n", output)
plt.plot (np.array (final_err))
plt.show ()
The output I get is:
Predicted:
[[0.05214011]
[0.97596354]
[0.97499515]
[0.03771324]
[0.97624119]]
Actual:
[[0]
[1]
[1]
[0]
[1]]
Therefore the network was able to predict all the toy training examples. (Note that on real data you would not want to fit the training set this closely, as it leads to overfitting.) You may get slightly different results, as the weight initialisation is random. Also, as a rule of thumb, try to initialise the weights in [-0.01, +0.01] when you have no problem-specific knowledge of a better initialisation.
Here is the convergence plot.
Note that you do not actually need to iterate over each example; you can do the matrix multiplication for all examples at once, which is much faster. Also, the above code does not have bias units. Make sure you include bias units when you re-implement the code.
I would recommend going through Raúl Rojas' Neural Networks: A Systematic Introduction, Chapters 4, 6 and 7. Chapter 7 will tell you how to implement deeper networks in a simple way.
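The two closing suggestions (process all examples with one matrix multiplication, and add bias units) can be sketched as follows. This is a minimal illustration under assumptions of my own, not the code from the answer above: the function names, the learning rate, the epoch count and the initialisation range are all made up, and the error is multiplied by the sigmoid derivative element-wise, which is the standard backpropagation rule.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train(X, y, hidden=4, lr=0.5, epochs=5000, seed=0):
    rng = np.random.default_rng(seed)
    W0 = rng.uniform(-0.5, 0.5, size=(X.shape[1], hidden))
    b0 = np.zeros((1, hidden))   # bias units of the hidden layer
    W1 = rng.uniform(-0.5, 0.5, size=(hidden, 1))
    b1 = np.zeros((1, 1))        # bias unit of the output layer
    losses = []
    for _ in range(epochs):
        # Forward pass for the whole batch at once.
        a1 = sigmoid(X @ W0 + b0)
        a2 = sigmoid(a1 @ W1 + b1)
        err = a2 - y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: element-wise sigmoid derivative a * (1 - a).
        d2 = err * a2 * (1 - a2)
        d1 = (d2 @ W1.T) * a1 * (1 - a1)
        # Full-batch gradient descent step.
        W1 -= lr * (a1.T @ d2)
        b1 -= lr * d2.sum(axis=0, keepdims=True)
        W0 -= lr * (X.T @ d1)
        b0 -= lr * d1.sum(axis=0, keepdims=True)
    return (W0, b0, W1, b1), losses

def predict(params, X):
    W0, b0, W1, b1 = params
    return sigmoid(sigmoid(X @ W0 + b0) @ W1 + b1)

X = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [1, 1, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
y = np.array([[0], [1], [1], [0], [1]], dtype=float)

params, losses = train(X, y)
print("first loss:", losses[0], "last loss:", losses[-1])
print(np.round(predict(params, X), 3))
```

The per-example loop disappears because each matrix product handles all five rows of X together.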

Trying to balance my dataset through sample_weight in scikit-learn

I'm using RandomForest for classification, and I have an unbalanced dataset: 5830 no, 1006 yes. I am trying to balance my dataset with class_weight and sample_weight, but it does not work.
My code is:
X_train,X_test,y_train,y_test = train_test_split(arrX,y,test_size=0.25)
cw='auto'
clf=RandomForestClassifier(class_weight=cw)
param_grid = { 'n_estimators': [10,50,100,200,300],'max_features': ['auto', 'sqrt', 'log2']}
sw = np.array([1 if i == 0 else 8 for i in y_train])
CV_clf = GridSearchCV(estimator=clf, param_grid=param_grid, cv= 10,fit_params={'sample_weight': sw})
But I don't get any improvement in TPR, FPR, or ROC when using class_weight and sample_weight.
Why? Am I doing anything wrong?
However, if I use the function called balanced_subsample, my metrics improve considerably:
def balanced_subsample(x,y,subsample_size):
    class_xs = []
    min_elems = None
    for yi in np.unique(y):
        elems = x[(y == yi)]
        class_xs.append((yi, elems))
        if min_elems is None or elems.shape[0] < min_elems:
            min_elems = elems.shape[0]
    use_elems = min_elems
    if subsample_size < 1:
        use_elems = int(min_elems*subsample_size)
    xs = []
    ys = []
    for ci,this_xs in class_xs:
        if len(this_xs) > use_elems:
            np.random.shuffle(this_xs)
        x_ = this_xs[:use_elems]
        y_ = np.empty(use_elems)
        y_.fill(ci)
        xs.append(x_)
        ys.append(y_)
    xs = np.concatenate(xs)
    ys = np.concatenate(ys)
    return xs,ys
My new code is:
X_train_subsampled,y_train_subsampled=balanced_subsample(arrX,y,0.5)
X_train,X_test,y_train,y_test = train_test_split(X_train_subsampled,y_train_subsampled,test_size=0.25)
cw='auto'
clf=RandomForestClassifier(class_weight=cw)
param_grid = { 'n_estimators': [10,50,100,200,300],'max_features': ['auto', 'sqrt', 'log2']}
sw = np.array([1 if i == 0 else 8 for i in y_train])
CV_clf = GridSearchCV(estimator=clf, param_grid=param_grid, cv= 10,fit_params={'sample_weight': sw})
This is not a full answer yet, but hopefully it'll help get there.
First some general remarks:
To debug this kind of issue it is often useful to have deterministic behavior. You can pass the random_state parameter to RandomForestClassifier and to the other scikit-learn objects that have inherent randomness to get the same result on every run. You'll also need to seed the global generators with a fixed value:
import numpy as np
np.random.seed(0)
import random
random.seed(0)
for your balanced_subsample function to behave the same way on every run.
Don't grid search on n_estimators: in a random forest, adding trees generally does not hurt accuracy; it only costs more computation.
Note that sample_weight and class_weight have a similar objective: the actual sample weights will be sample_weight multiplied by the weights inferred from class_weight.
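To make the last remark concrete, here is a small plain-Python sketch of how the two weightings compound. It assumes the "balanced" heuristic documented by scikit-learn, n_samples / (n_classes * count); the 'auto' setting used in the question is an older option and may compute slightly different values. The class counts are the ones quoted in the question for the whole dataset, used here only as an approximation of the training split, and the helper name is made up.

```python
def balanced_class_weights(class_counts):
    # "balanced" heuristic: n_samples / (n_classes * count_of_class)
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {label: n_samples / (n_classes * count)
            for label, count in class_counts.items()}

counts = {0: 5830, 1: 1006}    # no / yes, as quoted in the question
class_w = balanced_class_weights(counts)
sample_w = {0: 1, 1: 8}        # the sw array from the question

# Effective per-sample weight = sample_weight * class weight.
effective = {label: sample_w[label] * class_w[label] for label in counts}
ratio = effective[1] / effective[0]

print(class_w)
print(ratio)
```

With both mechanisms active, each minority sample ends up weighted far more than 8 times a majority sample, because the class weights already compensate for the imbalance once.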
Could you try:
Using subsample_size=1 in your balanced_subsample function. Unless there's a particular reason not to do so, we're better off comparing the results on a similar number of samples.
Using your subsampling strategy with class_weight and sample_weight both set to None.
EDIT: Reading your comment again I realize your results are not so surprising!
You get a better (higher) TPR but a worse (higher) FPR.
It just means your classifier tries hard to get the samples from class 1 right, and thus makes more false positives (while also getting more of those right of course!).
You will see this trend continue if you keep increasing the class/sample weights in the same direction.
The imbalanced-learn library also helps with oversampling/undersampling data, which might be useful in this situation. You can pass your training set into one of its samplers and it will return the resampled data for you. See the simple example below:
from imblearn.over_sampling import RandomOverSampler
ros = RandomOverSampler(random_state=1)
# fit_resample was named fit_sample in older imbalanced-learn releases
x_oversampled, y_oversampled = ros.fit_resample(orig_x_data, orig_y_data)
Here is the link to the API: http://contrib.scikit-learn.org/imbalanced-learn/api.html
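To show what random oversampling does conceptually, here is a standard-library sketch. It is an illustration only, not imbalanced-learn's implementation, and random_oversample is a made-up helper: minority-class rows are duplicated at random until every class has as many rows as the largest one.

```python
import random
from collections import Counter

def random_oversample(features, labels, seed=0):
    rng = random.Random(seed)
    # Group the rows by class label.
    by_class = {}
    for row, label in zip(features, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    out_x, out_y = [], []
    for label, rows in by_class.items():
        # Draw extra rows with replacement until the class reaches the target size.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            out_x.append(row)
            out_y.append(label)
    return out_x, out_y

features = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2
x_over, y_over = random_oversample(features, labels)
print(Counter(y_over))  # Counter({0: 8, 1: 8})
```

As with any resampling, it should be applied to the training split only, so that duplicated rows do not leak into the test set.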
Hope this helps!