Why is scipy's griddata much slower than MATLAB's griddata?

I found some existing topics about this, but I could not find an answer.
Here is a Python example taken from https://gist.github.com/fjarri/b6f1faefa95995d119b8 (already used in "Why is scipy.interpolate.griddata so slow?"), followed by the equivalent MATLAB code.
Python:
import time
import numpy as np
from scipy.interpolate import griddata
def func(x, y):
    return x*(1-x)*np.cos(4*np.pi*x) * np.sin(4*np.pi*y**2)**2
grid_x, grid_y = np.mgrid[0:1:200j, 0:1:200j]
points = np.random.rand(410500, 2)
values = func(points[:,0], points[:,1])
t1 = time.time()
grid_z1 = griddata(points, values, (grid_x, grid_y), method='linear')
print(time.time() - t1)
The print always gives about 6.4 s.
MATLAB:
[grid_x, grid_y] = meshgrid(1:200, 1:200);
points = rand(410500, 2);
x=points(:,1);
y=points(:,2);
values = x.*(1-x).*cos(4*pi*x).*sin(4*pi*y.^2).^2;
tic;vq = griddata(x,y,values,grid_x,grid_y,'linear');toc;
The timer always reports about 2.4 s.
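One thing worth checking before comparing the two timings: the MATLAB snippet queries meshgrid(1:200, 1:200), i.e. integer coordinates 1..200, while the scattered points lie in [0, 1], whereas the Python grid spans [0, 1]. Linear griddata returns NaN for query points outside the convex hull of the data, so the two runs may not be doing comparable work. A small SciPy check of that effect, using fewer points to keep it quick (the point count and the simplified value function are arbitrary choices):

```python
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(0)
points = rng.random((5000, 2))                 # scattered points in [0, 1]^2
values = points[:, 0] * (1 - points[:, 0])     # simplified values, only the hull matters here

# Grid like the Python version: 200 x 200 nodes spanning [0, 1]
gx01, gy01 = np.mgrid[0:1:200j, 0:1:200j]
# Grid like the MATLAB version: integer nodes 1..200
gx_int, gy_int = np.meshgrid(np.arange(1, 201), np.arange(1, 201))

z01 = griddata(points, values, (gx01, gy01), method='linear')
z_int = griddata(points, values, (gx_int, gy_int), method='linear')

# Fraction of query nodes outside the convex hull (returned as NaN)
nan_frac_01 = np.isnan(z01).mean()
nan_frac_int = np.isnan(z_int).mean()
print(f"[0, 1] grid: {nan_frac_01:.3f} NaN, 1..200 grid: {nan_frac_int:.3f} NaN")
```

Using linspace(0, 1, 200) for both MATLAB grid vectors would make the comparison like-for-like.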
Does anyone know why there is such a big difference between the two? And is there a way to speed up scipy's griddata? I deal with many large arrays of scattered points, and griddata accounts for most of my computation time, but I need to use Python for this and it is very slow.
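A common speed-up, assuming many value arrays are interpolated from the *same* scattered points: with method='linear', griddata builds a Delaunay triangulation of the points (via Qhull) on every call and wraps it in a LinearNDInterpolator, and for large point sets that triangulation typically dominates the run time. Building it once with scipy.spatial.Delaunay and passing the triangulation to LinearNDInterpolator lets you reuse it. This does not help if the point coordinates change on every call. A minimal sketch (the point count and the second value array are illustrative):

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator, griddata
from scipy.spatial import Delaunay

def func(x, y):
    return x*(1-x)*np.cos(4*np.pi*x) * np.sin(4*np.pi*y**2)**2

rng = np.random.default_rng(0)
points = rng.random((20000, 2))
grid_x, grid_y = np.mgrid[0:1:200j, 0:1:200j]

# Triangulate once: this is the expensive step griddata repeats on every call
tri = Delaunay(points)

# Several value arrays defined on the same scattered points
value_sets = [func(points[:, 0], points[:, 1]),
              np.sin(points[:, 0]) + points[:, 1]]

results = []
for values in value_sets:
    interp = LinearNDInterpolator(tri, values)   # reuses the triangulation
    results.append(interp(grid_x, grid_y))

# Should match calling griddata from scratch for the first value array
reference = griddata(points, value_sets[0], (grid_x, grid_y), method='linear')
print(np.allclose(results[0], reference, equal_nan=True))
```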


GPflow change point kernel issue with multiple dimensions

I'm following the tutorial here for implementing a change-point kernel in GPflow.
However, I have 3 inputs and 1 output, and I would like the change-point kernel to act on the first input dimension only, with other standard kernels on the other two input dimensions. I'm getting the following error:
InvalidArgumentError: Incompatible shapes: [2000,3,1] vs. [3,2000,1] [Op:Mul] name: mul/
Below is a minimal working example. Could anyone please let me know where I'm going wrong?
GPflow version 2.0.0.rc1
import pandas as pd
import gpflow
from gpflow.utilities import print_summary
df_all = pd.read_csv(
'https://raw.githubusercontent.com/ipan11/gp/master/dataset.csv')
# Training dataset in numpy format
X = df_all[['X1', 'X2', 'X3']].to_numpy()
Y1 = df_all['Y'].to_numpy().reshape(-1, 1)
# Changepoint kernel only on first dimension and standard kernels for the other two dimensions
base_k1 = gpflow.kernels.Matern32(lengthscale=0.2, active_dims=[0])
base_k2 = gpflow.kernels.Matern32(lengthscale=2., active_dims=[0])
k1 = gpflow.kernels.ChangePoints(
[base_k1, base_k2], [.4], steepness=5)
k2 = gpflow.kernels.Matern52(lengthscale=[1., 1.], active_dims=[1, 2])
k_all = k1+k2
print_summary(k_all)
m1 = gpflow.models.GPR(data=(X, Y1), kernel=k_all, mean_function=None)
print_summary(m1)
opt = gpflow.optimizers.Scipy()
def objective_closure():
    return -m1.log_marginal_likelihood()
opt_logs = opt.minimize(objective_closure, m1.trainable_variables,
options=dict(maxiter=100))
The correct fix would be to move active_dims=[0] from the base_k* kernels to the ChangePoints() kernel:
k1 = gpflow.kernels.ChangePoints([base_k1, base_k2], [0.4], steepness=5, active_dims=[0])
However, this is currently not supported in GPflow 2, which is a bug. I've opened an issue on GitHub and will update this answer once it's fixed (if you feel up to having a go at fixing this bug, feel free to open a pull request; help is always welcome!).
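As a plain NumPy illustration (not GPflow's implementation), here is what a single change point acting on the first input column computes, using the sigmoid-weighted combination described for GPflow's ChangePoints kernel: K(x, x') = (1 - σ(x)) K1(x, x') (1 - σ(x')) + σ(x) K2(x, x') σ(x'), with σ(x) = 1 / (1 + exp(-s (x - x0))). The helper names, the unit variance and the random inputs are assumptions for illustration. Slicing the column once, before the sigmoid and both base kernels are evaluated, keeps every term at shape (N, N), which is the behaviour that putting active_dims=[0] on the ChangePoints kernel is meant to provide.

```python
import numpy as np

def matern32(x1, x2, lengthscale, variance=1.0):
    """Matern-3/2 kernel on 1-D inputs x1 (N,) and x2 (M,), returns (N, M)."""
    r = np.abs(x1[:, None] - x2[None, :]) / lengthscale
    return variance * (1.0 + np.sqrt(3.0) * r) * np.exp(-np.sqrt(3.0) * r)

def sigmoid(x, location, steepness):
    return 1.0 / (1.0 + np.exp(-steepness * (x - location)))

def changepoint_kernel(X, location=0.4, steepness=5.0):
    """Hypothetical helper: one change point acting on column 0 of X only."""
    x = X[:, 0]                                      # slice the active dimension first
    s = sigmoid(x, location, steepness)[:, None]     # (N, 1)
    k1 = matern32(x, x, lengthscale=0.2)             # governs the left of the change point
    k2 = matern32(x, x, lengthscale=2.0)             # governs the right of the change point
    return (1 - s) * k1 * (1 - s).T + s * k2 * s.T   # (N, N)

rng = np.random.default_rng(0)
X = rng.random((50, 3))          # 3 inputs, like X1, X2, X3
K = changepoint_kernel(X)
print(K.shape)  # (50, 50)
```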

How to implement an exponentially decaying learning rate in Keras following the global step

Look at the following example
# encoding: utf-8
import numpy as np
import pandas as pd
import random
import math
from keras import Sequential
from keras.layers import Dense, Activation
from keras.optimizers import Adam, RMSprop
from keras.callbacks import LearningRateScheduler
X = [i*0.05 for i in range(100)]
def step_decay(epoch):
    initial_lrate = 1.0
    drop = 0.5
    epochs_drop = 2.0
    lrate = initial_lrate * math.pow(drop,
                                     math.floor((1+epoch)/epochs_drop))
    return lrate
def build_model():
    model = Sequential()
    model.add(Dense(32, input_shape=(1,), activation='relu'))
    model.add(Dense(1, activation='linear'))
    adam = Adam(lr=0.5)
    model.compile(loss='mse', optimizer=adam)
    return model
model = build_model()
lrate = LearningRateScheduler(step_decay)
callback_list = [lrate]
for ep in range(20):
    X_train = np.array(random.sample(X, 10))
    y_train = np.sin(X_train)
    X_train = np.reshape(X_train, (-1,1))
    y_train = np.reshape(y_train, (-1,1))
    model.fit(X_train, y_train, batch_size=2, callbacks=callback_list,
              epochs=1, verbose=2)
In this example, the LearningRateScheduler does not change the learning rate at all, because each call to fit() runs a single epoch, so the schedule always receives epoch index 0. The learning rate therefore stays constant (1.0, according to step_decay). I cannot simply set epochs>1 directly; I have to use the outer loop shown in the example and run just 1 epoch inside each iteration. (This is the case when I implement deep reinforcement learning rather than supervised learning.)
My question is how to set an exponentially decaying learning rate in my example, and how to get the learning rate in each iteration of ep.
You can actually give LearningRateScheduler a schedule function that takes two arguments.
According to the Keras documentation, the schedule is
a function that takes an epoch index as input (integer, indexed from
0) and current learning rate and returns a new learning rate as output
(float).
So, basically, replace your initial_lrate with a function parameter, like so:
def step_decay(epoch, lr):
    # initial_lrate = 1.0  # no longer needed
    drop = 0.5
    epochs_drop = 2.0
    lrate = lr * math.pow(drop, math.floor((1+epoch)/epochs_drop))
    return lrate
The function you implemented is not an exponential decay (as your title suggests) but a staircase function.
You also mention that your learning rate does not change inside your loop. That is expected, because you set model.fit(..., epochs=1, ...) and epochs_drop = 2.0 at the same time. I am not sure whether this is what you intended; your toy example does not make it clear.
I would also like to cover the more common case, where you don't mix a for loop with fit() and simply pass a different epochs value to fit(). In this case you have the following options:
First of all, Keras provides decay functionality itself in its predefined optimizers. For example, for Adam(), which you use, the actual code is:
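To see why the learning rate stays constant, here is a plain-Python check of the question's step_decay (no Keras involved): each fit(..., epochs=1) call starts again at epoch index 0, so the schedule always sees 0. If the outer-loop index should drive the schedule instead, one option is to pass initial_epoch=ep and epochs=ep + 1 to fit(), in which case Keras should hand ep to the scheduler; the simulation below only mimics that and does not call Keras.

```python
import math

def step_decay(epoch):
    # Same staircase schedule as in the question
    initial_lrate = 1.0
    drop = 0.5
    epochs_drop = 2.0
    return initial_lrate * math.pow(drop, math.floor((1 + epoch) / epochs_drop))

# fit(..., epochs=1) in every loop iteration: the scheduler always sees epoch 0
constant = [step_decay(0) for ep in range(6)]
# Epoch index driven by the outer loop (e.g. initial_epoch=ep, epochs=ep + 1)
driven = [step_decay(ep) for ep in range(6)]

print(constant)  # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
print(driven)    # [1.0, 0.5, 0.5, 0.25, 0.25, 0.125]
```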
lr = lr * (1. / (1. + self.decay * K.cast(self.iterations, K.dtype(self.decay))))
which is not exactly exponential either, and it differs from TensorFlow's version. Also, it is only applied when decay > 0.0.
To follow the TensorFlow convention for exponential decay, you should implement:
decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)
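The formula above can be written as a small pure-Python function (a sketch, not TensorFlow's code). TensorFlow's exponential decay also offers a staircase option, which uses the integer division global_step // decay_steps so the rate drops in discrete steps; the parameter values below are arbitrary.

```python
def exponential_decay(learning_rate, global_step, decay_steps, decay_rate, staircase=False):
    """learning_rate * decay_rate ** (global_step / decay_steps)."""
    if staircase:
        exponent = global_step // decay_steps   # integer exponent: staircase schedule
    else:
        exponent = global_step / decay_steps    # smooth exponential decay
    return learning_rate * decay_rate ** exponent

smooth = [exponential_decay(1.0, step, decay_steps=2, decay_rate=0.5) for step in range(5)]
stairs = [exponential_decay(1.0, step, decay_steps=2, decay_rate=0.5, staircase=True)
          for step in range(5)]
print(smooth)
print(stairs)  # [1.0, 1.0, 0.5, 0.5, 0.25]
```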
Depending on your needs, you could implement a Callback subclass and define a function within it (see the last step below), or use LearningRateScheduler, which is essentially exactly that with some checks: a Callback subclass that updates the learning rate at the beginning of each epoch.
If you want finer control over your learning-rate policy (per batch, for example), you have to implement your own subclass, since as far as I know there is no built-in one for this task. The good news is that it's very easy:
Create a subclass:
from keras import backend as K
from keras.callbacks import Callback
class LearningRateExponentialDecay(Callback):
and add an __init__() method that initializes the instance with all needed parameters and creates a global_step variable to keep track of the iterations (batches):
    def __init__(self, init_learning_rate, decay_rate, decay_steps):
        super().__init__()
        self.init_learning_rate = init_learning_rate
        self.decay_rate = decay_rate
        self.decay_steps = decay_steps
        self.global_step = 0
Finally, add the actual method inside the class:
    def on_batch_begin(self, batch, logs=None):
        # Decay from the initial rate, as in the TensorFlow formula (decaying the
        # current rate would compound on every batch); ** is exponentiation, ^ is XOR
        decayed_learning_rate = self.init_learning_rate * self.decay_rate ** (self.global_step / self.decay_steps)
        K.set_value(self.model.optimizer.lr, decayed_learning_rate)
        self.global_step += 1
The nice part is that if you want the above subclass to update every epoch instead, you can use on_epoch_begin(self, epoch, logs=None), which conveniently has epoch as a parameter in its signature. This case is even easier, as you can skip global_step altogether (there is no need to track it unless you want a fancier way to apply your decay) and use epoch in its place.

PyTorch: NN function approximator, 2 in 1 out

[Please be aware of the Edit History below, as the major problem statement has changed.]
We are trying to implement a neural network in PyTorch that approximates a function f(x,y)=z. There are two real numbers as input and one as output, so we want 2 nodes in the input layer and one in the output layer. We constructed a test set of 5050 samples and had pretty good results for this task in Keras with the TensorFlow backend, using 3 hidden layers with the node configuration 2(in) - 4 - 16 - 4 - 1(out), ReLU activations on all hidden layers, and linear activations on input and output.
Now in PyTorch we tried to implement a similar network, but our loss function literally explodes: it changes in the first few steps and then converges to a value around 10^7. In Keras we had an error of around 10 percent. We already tried different network configurations without any improvement. Could someone have a look at our code and suggest changes?
To explain: tr_data is a list containing 5050 2*1 numpy arrays, which are the inputs for the network. tr_labels is a list containing 5050 numbers, which are the outputs we want to learn. loadData() just loads those two lists.
import torch
import torch.nn as nn
import torch.nn.functional as F
BATCH_SIZE = 5050
DIM_IN = 2
DIM_HIDDEN_1 = 4
DIM_HIDDEN_2 = 16
DIM_HIDDEN_3 = 4
DIM_OUT = 1
LEARN_RATE = 1e-4
EPOCH_NUM = 500
class Net(nn.Module):
    def __init__(self):
        #super(Net, self).__init__()
        super().__init__()
        self.hidden1 = nn.Linear(DIM_IN, DIM_HIDDEN_1)
        self.hidden2 = nn.Linear(DIM_HIDDEN_1, DIM_HIDDEN_2)
        self.hidden3 = nn.Linear(DIM_HIDDEN_2, DIM_HIDDEN_3)
        self.out = nn.Linear(DIM_HIDDEN_3, DIM_OUT)
    def forward(self, x):
        x = F.relu(self.hidden1(x))
        x = F.tanh(self.hidden2(x))
        x = F.tanh(self.hidden3(x))
        x = self.out(x)
        return x
model = Net()
loss_fn = nn.MSELoss(size_average=False)
optimizer = torch.optim.Adam(model.parameters(), lr=LEARN_RATE)
tr_data,tr_labels = loadData()
tr_data_torch = torch.zeros(BATCH_SIZE, DIM_IN)
tr_labels_torch = torch.zeros(BATCH_SIZE, DIM_OUT)
for i in range(BATCH_SIZE):
    tr_data_torch[i] = torch.from_numpy(tr_data[i])
    tr_labels_torch[i] = tr_labels[i]
for t in range(EPOCH_NUM):
    labels_pred = model(tr_data_torch)
    loss = loss_fn(labels_pred, tr_labels_torch)
    #print(t, loss.item())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
I have to say, these are our first steps in PyTorch, so please forgive any obvious, dumb mistakes. I appreciate any help or hints.
Thank you!
EDIT 1 ------------------------------------------------------------------
Following the comments and answers, we improved our code. The loss function now has reasonable values for the first time, around 250. Our new class definition looks like this:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        #super().__init__()
        self.hidden1 = nn.Sequential(nn.Linear(DIM_IN, DIM_HIDDEN_1), nn.ReLU())
        self.hidden2 = nn.Sequential(nn.Linear(DIM_HIDDEN_1, DIM_HIDDEN_2), nn.ReLU())
        self.hidden3 = nn.Sequential(nn.Linear(DIM_HIDDEN_2, DIM_HIDDEN_3), nn.ReLU())
        self.out = nn.Linear(DIM_HIDDEN_3, DIM_OUT)
    def forward(self, x):
        x = self.hidden1(x)
        x = self.hidden2(x)
        x = self.hidden3(x)
        x = self.out(x)
        return x
and the loss function:
loss_fn = nn.MSELoss(size_average=True, reduce=True)
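Part of the change in the reported loss comes from the reduction itself: size_average=False makes MSELoss return the sum of squared errors over the batch, while size_average=True returns their mean, so with all 5050 samples in one batch the two values differ by a factor of 5050 (in current PyTorch these options correspond to reduction='sum' and reduction='mean'). A NumPy sketch of that relationship on made-up data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5050                                # one batch holding all samples, as in the question
pred = rng.normal(size=(n, 1))          # made-up predictions
target = rng.normal(size=(n, 1))        # made-up labels

squared_errors = (pred - target) ** 2
loss_sum = squared_errors.sum()         # like size_average=False / reduction='sum'
loss_mean = squared_errors.mean()       # like size_average=True / reduction='mean'

# The ratio is the number of elements (up to floating-point rounding)
print(loss_sum / loss_mean)
```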
As stated before, we already had far more satisfying results in Keras with the TensorFlow backend: the loss was around 30 with a similar network configuration. I share the essential parts(!) of our Keras code here:
model = Sequential()
model.add(Dense(4, activation="linear", input_shape=(2,)))
model.add(Dense(16, activation="relu"))
model.add(Dense(4, activation="relu"))
model.add(Dense(1, activation="linear" ))
model.summary()
model.compile(loss="mean_squared_error", optimizer="adam", metrics=["mse"])
history = model.fit(np.array(tr_data), np.array(tr_labels),
                    validation_data=(np.array(val_data), np.array(val_labels)),
                    batch_size=50, epochs=200, callbacks=[cbk])
Thank you already for all the help! If anybody has further suggestions to improve the network, we would be happy to hear them. As somebody asked for the data, we are sharing a pickle file here:
https://mega.nz/#!RDYxSYLY!P4a9mEDtZ7A5Bl7ZRjRk8EzLXQt2gyURa3wN3NCWFPA
together with the code to access it:
import pickle
with open("data.pcl", "rb") as f:
    tr_data = pickle.load(f)
    tr_labels = pickle.load(f)
    val_data = pickle.load(f)
    val_labels = pickle.load(f)
It may help to look at the differences between torch.nn and torch.nn.functional (see here). My suspicion is that a different specification might cause your backpropagation graph to be executed differently from what you intend.
As pointed out by previous commenters, I would suggest defining your layers including the activations. My personal favorite way is to use nn.Sequential(), which lets you chain multiple operations together, like so:
self.hidden1 = nn.Sequential(nn.Linear(DIM_IN, DIM_HIDDEN_1), nn.ReLU())
and then simply call self.hidden1 later (without wrapping it in F.relu()).
May I also ask why you do not use the commented-out super(Net, self).__init__() (the form most PyTorch examples use, although in Python 3 the zero-argument super().__init__() is equivalent)?
Additionally, if that does not fix the problem, could you share the Keras code for comparison?

Trying to balance my dataset through sample_weight in scikit-learn

I'm using RandomForest for classification, and I have an unbalanced dataset: 5830 "no" and 1006 "yes". I tried to balance my dataset with class_weight and sample_weight, but it doesn't work.
My code is:
X_train,X_test,y_train,y_test = train_test_split(arrX,y,test_size=0.25)
cw='auto'
clf=RandomForestClassifier(class_weight=cw)
param_grid = { 'n_estimators': [10,50,100,200,300],'max_features': ['auto', 'sqrt', 'log2']}
sw = np.array([1 if i == 0 else 8 for i in y_train])
CV_clf = GridSearchCV(estimator=clf, param_grid=param_grid, cv= 10,fit_params={'sample_weight': sw})
But I don't get any improvement in my metrics (TPR, FPR, ROC) when using class_weight and sample_weight.
Why? Am I doing anything wrong?
However, if I use the following balanced_subsample function, my metrics improve considerably:
def balanced_subsample(x,y,subsample_size):
    class_xs = []
    min_elems = None
    for yi in np.unique(y):
        elems = x[(y == yi)]
        class_xs.append((yi, elems))
        if min_elems is None or elems.shape[0] < min_elems:
            min_elems = elems.shape[0]
    use_elems = min_elems
    if subsample_size < 1:
        use_elems = int(min_elems*subsample_size)
    xs = []
    ys = []
    for ci,this_xs in class_xs:
        if len(this_xs) > use_elems:
            np.random.shuffle(this_xs)
        x_ = this_xs[:use_elems]
        y_ = np.empty(use_elems)
        y_.fill(ci)
        xs.append(x_)
        ys.append(y_)
    xs = np.concatenate(xs)
    ys = np.concatenate(ys)
    return xs,ys
My new code is:
X_train_subsampled,y_train_subsampled=balanced_subsample(arrX,y,0.5)
X_train,X_test,y_train,y_test = train_test_split(X_train_subsampled,y_train_subsampled,test_size=0.25)
cw='auto'
clf=RandomForestClassifier(class_weight=cw)
param_grid = { 'n_estimators': [10,50,100,200,300],'max_features': ['auto', 'sqrt', 'log2']}
sw = np.array([1 if i == 0 else 8 for i in y_train])
CV_clf = GridSearchCV(estimator=clf, param_grid=param_grid, cv= 10,fit_params={'sample_weight': sw})
This is not a full answer yet, but hopefully it'll help get there.
First some general remarks:
To debug this kind of issue, it is often useful to have deterministic behavior. You can pass the random_state parameter to RandomForestClassifier and other scikit-learn objects with inherent randomness to get the same result on every run. You'll also need:
import numpy as np
np.random.seed(42)  # any fixed integer; seed() with no argument reseeds from the OS
import random
random.seed(42)
for your balanced_subsample function to behave the same way on every run.
Don't grid search on n_estimators: in a random forest, more trees generally do not hurt accuracy (they mainly cost computation time), so there is little to tune there.
Note that sample_weight and class_weight serve a similar purpose: the actual sample weights will be sample_weight multiplied by the weights inferred from class_weight.
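A NumPy sketch of how the two weightings combine for the class counts in the question (5830 "no", 1006 "yes"), assuming class_weight='balanced' (the current name for automatic class weighting; the question's 'auto' is an older option), which scikit-learn documents as n_samples / (n_classes * np.bincount(y)). With 'balanced' each class already carries the same total weight, so the question's extra sample weight of 8 makes the minority class count about 8 times as much in total.

```python
import numpy as np

# Labels with the class counts from the question: 0 = "no", 1 = "yes"
y = np.array([0] * 5830 + [1] * 1006)

# class_weight='balanced' formula from the scikit-learn docs
counts = np.bincount(y)
balanced = len(y) / (len(counts) * counts)     # one weight per class

# sample_weight used in the question: 1 for class 0, 8 for class 1
sw = np.where(y == 0, 1.0, 8.0)

# Effective per-sample weight = sample_weight * class weight
effective = sw * balanced[y]

# Total weight carried by each class
per_class_total = np.array([effective[y == c].sum() for c in (0, 1)])
print(balanced)          # approx. 0.586 and 3.398
print(per_class_total)
```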
Could you try:
Using subsample_size=1 in your balanced_subsample function. Unless there's a particular reason not to, we're better off comparing results on a similar number of samples.
Using your subsampling strategy with class_weight and sample_weight both set to None.
EDIT: Reading your comment again I realize your results are not so surprising!
You get a better (higher) TPR but a worse (higher) FPR.
It just means your classifier tries hard to get the samples from class 1 right, and thus makes more false positives (while also getting more of those right of course!).
You will see this trend continue if you keep increasing the class/sample weights in the same direction.
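For reference, the two rates being compared are TPR = TP / (TP + FN) and FPR = FP / (FP + TN). A small NumPy helper (the name tpr_fpr and the labels are made up for illustration) shows how a classifier that predicts class 1 more eagerly raises both rates at once:

```python
import numpy as np

def tpr_fpr(y_true, y_pred):
    """True- and false-positive rates for binary labels (positive class = 1)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    return tp / (tp + fn), fp / (fp + tn)

y_true   = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
cautious = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]   # rarely predicts class 1
eager    = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]   # predicts class 1 more often

print(tpr_fpr(y_true, cautious))  # TPR = 0.25, FPR = 0.0
print(tpr_fpr(y_true, eager))     # TPR = 0.75, FPR = 2/6
```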
There is an imbalanced-learn package that helps with oversampling/undersampling data and might be useful in this situation. You can pass your training set to one of its samplers and it will output the resampled data for you. See the simple example below:
from imblearn.over_sampling import RandomOverSampler
ros = RandomOverSampler(random_state=1)
x_oversampled, y_oversampled = ros.fit_resample(orig_x_data, orig_y_data)
Here is the link to the API: http://contrib.scikit-learn.org/imbalanced-learn/api.html
Hope this helps!
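To illustrate what random oversampling does conceptually, here is a NumPy sketch (not imbalanced-learn's implementation; random_oversample is a hypothetical name): rows of the smaller classes are drawn with replacement until every class has as many rows as the largest one. As with any resampling, it is safest to apply it to the training split only, so duplicated rows cannot end up in both the training and the test set.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate rows of the smaller classes (sampling with replacement)
    until every class has as many rows as the largest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    keep = []
    for c, n in zip(classes, counts):
        idx = np.flatnonzero(y == c)
        extra = rng.choice(idx, size=target - n, replace=True)   # empty for the largest class
        keep.append(np.concatenate([idx, extra]))
    keep = np.concatenate(keep)
    return X[keep], y[keep]

X = np.arange(20).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)
X_res, y_res = random_oversample(X, y)
print(np.bincount(y_res))  # [8 8]
```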

Is NumPy array slicing too slow in Numba?

I'm a Numba user. Could someone tell me why slicing a NumPy array is so slow? Here is an example:
def pairwise_python2(X):
    n_samples = X.shape[0]
    result = np.zeros((n_samples, n_samples), dtype=X.dtype)
    for i in range(X.shape[0]):
        for j in range(X.shape[0]):
            result[i, j] = np.sqrt(np.sum((X[i, :] - X[j, :]) ** 2))
    return result
%timeit pairwise_python2(X)
1 loops, best of 3: 18.2 s per loop
from numba import double
from numba.decorators import jit, autojit
pairwise_numba = autojit(pairwise_python2)
%timeit pairwise_numba(X)
1 loops, best of 3: 13.9 s per loop
It seems there is almost no difference between the jitted and the CPython version. Am I wrong?
You're timing numpy memory allocations.
X[i,:] - X[j,:] allocates a new temporary array of shape (n_features,) on every iteration, as does the squaring operation. Try something like the following instead, which reuses a single buffer:
def pairwise_python2(X):
    n_samples, n_features = X.shape
    result = np.empty((n_samples, n_samples), dtype=X.dtype)
    # one reusable buffer for the row difference (length n_features, not n_samples)
    temp = np.empty((n_features,), dtype=X.dtype)
    for i in range(n_samples):
        row_i = X[i,:]
        for j in range(n_samples):
            # subtract and square in place into temp, avoiding new allocations
            result[i,j] = np.sqrt(np.sum(np.power(np.subtract(row_i, X[j,:], temp), 2.0, temp)))
    return result
Numba doesn't add a whole lot here because you're doing all of your operations in NumPy (it does speed up the loop iterations themselves, which shows in your timings).
Newer versions of Numba support NumPy array slicing and the np.sqrt() function, so this question can be closed.
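Going further in the direction the answer suggests, the usual Numba-friendly form avoids temporary arrays entirely by writing the innermost loop over features explicitly, so the compiled code only works with scalars. A sketch, assuming a recent Numba that provides njit (the question's numba.decorators.autojit comes from an older release); if Numba is not installed, the decorator falls back to plain Python, which is correct but slow. The result is checked against a NumPy broadcasting version:

```python
import numpy as np

try:
    from numba import njit
except ImportError:           # fallback so the sketch still runs without Numba
    def njit(func):
        return func

@njit
def pairwise_loops(X):
    n_samples, n_features = X.shape
    result = np.empty((n_samples, n_samples))
    for i in range(n_samples):
        for j in range(n_samples):
            acc = 0.0
            for k in range(n_features):     # scalar arithmetic only, no temporaries
                diff = X[i, k] - X[j, k]
                acc += diff * diff
            result[i, j] = np.sqrt(acc)
    return result

X = np.random.default_rng(0).random((100, 3))
# Reference: vectorized NumPy broadcasting (allocates an (n, n, d) array)
expected = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
print(np.allclose(pairwise_loops(X), expected))  # True
```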