simple linear regression with Python with 10 lines of code - linear-regression

I am taking my first steps in machine learning. To start, I am trying to implement a simple algorithm: linear regression with two variables. This tutorial (https://towardsdatascience.com/linear-regression-using-gradient-descent-in-10-lines-of-code-642f995339c0) seemed like the best example of how to code it. However, when I adapted its code, it did not work. More precisely, it prints unrealistic regression parameters. Please help me overcome the problem. The script is below.
x_1 = range(1,100)
y_1 = range(1,100)
N = float(len(y_1))
epochs=1000
m_current = b_current = 0
learning_rate=0.01
for i in range(epochs):
    for X,y in zip(x_1, y_1):
        y_current = (m_current * X) + b_current
        cost = (y-y_current)/N
        m_gradient = -(2/N)*(X * (y - y_current))
        b_gradient = -(2/N)*(y - y_current)
        m_current = m_current - (learning_rate * m_gradient)
        b_current = b_current - (learning_rate * b_gradient)
print(m_current)
print(b_current)
print(cost)
# Printed output:
# 1.9999   (I expect 0.9999999 or 1)
# 9.2333   (I expect 0.00000001 or 0)
# 101.11   (I expect 0.1)
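For comparison, here is a minimal sketch of the batch variant of gradient descent, where the gradients are summed over all samples before each update instead of being applied one sample at a time. Standardizing x is an assumption added here (it is not in the question) so that a learning rate of 0.01 stays stable with inputs as large as 99; the names z, m_s and b_s are illustrative only.

```python
import numpy as np

x = np.arange(1, 100, dtype=float)
y = np.arange(1, 100, dtype=float)

# Assumption: standardize x so that learning_rate = 0.01 does not overshoot.
x_mean, x_std = x.mean(), x.std()
z = (x - x_mean) / x_std

m_s = b_s = 0.0          # slope and intercept in standardized units
learning_rate = 0.01
epochs = 1000
n = len(y)

for _ in range(epochs):
    residual = y - (m_s * z + b_s)
    # Gradients of the mean squared error, accumulated over the whole batch
    m_grad = -(2.0 / n) * np.sum(z * residual)
    b_grad = -(2.0 / n) * np.sum(residual)
    m_s -= learning_rate * m_grad
    b_s -= learning_rate * b_grad

# Map the fitted line back to the original units of x
m = m_s / x_std
b = b_s - m_s * x_mean / x_std
cost = np.mean((y - (m * x + b)) ** 2)
print(m, b, cost)
```

With unscaled inputs the same loop may need a much smaller learning rate, because the size of the slope gradient grows with the square of x.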

Related

Function fzero in Matlab is not converging

I am solving a problem in my Macroeconomics class. Consider the following equation:
Here, k is fixed and c(k) was defined through the interp1 function in MATLAB. Here is my code:
beta = 0.98;
delta = 0.13;
A = 2;
alpha = 1/3;
n_grid = 1000; % Number of points for capital
k_grid = linspace(5, 15, n_grid)';
tol = 1e-5;
max_it = 1000;
c0 = ones(n_grid, 1);
new_k = zeros(n_grid, 1);
dist_c = tol + 1;
it_c = 0;
while dist_c > tol && it_c < max_it
    c_handle = @(k_tomorrow) interp1(k_grid, c0, k_tomorrow, 'linear', 'extrap');
    for i=1:n_grid
        % Solve for k'
        euler = @(k_tomorrow) (1/((1-delta)* k_grid(i) + A * k_grid(i)^alpha - k_tomorrow)) - beta*(1-delta + alpha*A*k_tomorrow^(alpha - 1))/c_handle(k_tomorrow);
        new_k(i) = fzero(euler, k_grid(i)); % What's a good guess for fzero?
    end
    % Compute new values for consumption
    new_c = A*k_grid.^alpha + (1-delta)*k_grid - new_k;
    % Check convergence
    dist_c = norm(new_c - c0);
    c0 = new_c;
    it_c = it_c + 1;
end
When I run this code, for some indices $i$ it runs fine and fzero finds the solution. But for other indices it just returns NaN and exits without finding the root. This is a fairly well-behaved problem in economics: the solution we are looking for does exist, and the algorithm I tried to implement is guaranteed to work. But I don't have much experience solving this in MATLAB, and I guess I have made a silly mistake somewhere. Any ideas on how to proceed?
This is the typical error message:
Exiting fzero: aborting search for an interval containing a sign change
because complex function value encountered during search.
(Function value at -2.61092 is 0.74278-0.30449i.)
Check function or try again with a different starting value.
Thanks a lot in advance!
The only term that can produce complex numbers is
k'^(alpha - 1) = k'^(-2/3)
You probably want the result according to the real variant of the cube root, which you could get as
sign(k') * abs(k')^(-2/3)
or more generally and avoiding divisions by zero
k' * (1e-16+abs(k'))^(alpha - 2)
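The effect is not specific to MATLAB. As a small illustration in Python (used here only for demonstration; the helper name signed_power and the eps default are hypothetical), raising a negative float to a fractional power yields a complex number, while the last formula above stays real:

```python
def signed_power(k, exponent, eps=1e-16):
    """Real-valued stand-in for k**exponent: k * (eps + |k|)**(exponent - 1)."""
    return k * (eps + abs(k)) ** (exponent - 1)

alpha = 1 / 3

# A negative base with a fractional exponent gives a complex result in Python 3
naive = (-2.61092) ** (alpha - 1)
print(type(naive))

# The sign-preserving variant stays real, also at zero and for negative inputs
print(signed_power(-8.0, alpha - 1))
print(signed_power(8.0, alpha - 1))
print(signed_power(0.0, alpha - 1))
```

Whether a sign-preserving extension is economically meaningful for negative capital is a separate question; restricting the search to k' > 0 may be the cleaner fix.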

How to convert deep learning gradient descent equation into python

I've been following an online tutorial on deep learning. It has a practical question on gradient descent and cost calculation, and I have been struggling to reproduce the given answers once the equations are converted to Python code. I hope you can kindly help me get the correct answer.
Please see the following link for the equations used
Click here to see the equations used for the calculations
Following is the function given to calculate the gradients, cost, etc. The values need to be computed with matrix operations rather than for loops.
import numpy as np
def propagate(w, b, X, Y):
    """
    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat) of size (1, number of examples)
    Return:
    cost -- negative log-likelihood cost for logistic regression
    dw -- gradient of the loss with respect to w, thus same shape as w
    db -- gradient of the loss with respect to b, thus same shape as b
    Tips:
    - Write your code step by step for the propagation. np.log(), np.dot()
    """
    m = X.shape[1]
    # FORWARD PROPAGATION (FROM X TO COST)
    ### START CODE HERE ### (≈ 2 lines of code)
    A = # compute activation
    cost = # compute cost
    ### END CODE HERE ###
    # BACKWARD PROPAGATION (TO FIND GRAD)
    ### START CODE HERE ### (≈ 2 lines of code)
    dw =
    db =
    ### END CODE HERE ###
    assert(dw.shape == w.shape)
    assert(db.dtype == float)
    cost = np.squeeze(cost)
    assert(cost.shape == ())
    grads = {"dw": dw,
             "db": db}
    return grads, cost
Following are the data given to test the above function
w, b, X, Y = np.array([[1],[2]]), 2, np.array([[1,2],[3,4]]), np.array([[1,0]])
grads, cost = propagate(w, b, X, Y)
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))
print ("cost = " + str(cost))
Following is the expected output of the above
Expected Output:
dw [[ 0.99993216] [ 1.99980262]]
db 0.499935230625
cost 6.000064773192205
For the above propagate function I have used the below replacements, but the output is not what is expected. Please kindly help on how to get the expected output
A = sigmoid(X)
cost = -1*((np.sum(np.dot(Y,np.log(A))+np.dot((1-Y),(np.log(1-A))),axis=0))/m)
dw = (np.dot(X,((A-Y).T)))/m
db = np.sum((A-Y),axis=0)/m
Following is the sigmoid function used to calculate the Activation:
def sigmoid(z):
    """
    Compute the sigmoid of z
    Arguments:
    z -- A scalar or numpy array of any size.
    Return:
    s -- sigmoid(z)
    """
    ### START CODE HERE ### (≈ 1 line of code)
    s = 1 / (1+np.exp(-z))
    ### END CODE HERE ###
    return s
Hope someone could help me understand how to solve this as I couldn't continue with rest of the tutorials without understanding this. Many thanks
You can calculate A,cost,dw,db as the following:
A = sigmoid(np.dot(w.T,X) + b)
cost = -1 / m * np.sum(Y*np.log(A)+(1-Y)*np.log(1-A))
dw = 1/m * np.dot(X,(A-Y).T)
db = 1/m * np.sum(A-Y)
where sigmoid is :
def sigmoid(z):
    s = 1 / (1 + np.exp(-z))
    return s
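Put together, the four replacements can be checked against the test data from the question. The following is a minimal runnable sketch rather than the tutorial's reference solution; it drops the template's assertions and docstring for brevity.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def propagate(w, b, X, Y):
    m = X.shape[1]  # number of examples
    # Forward pass: activations of shape (1, m), then the scalar cost
    A = sigmoid(np.dot(w.T, X) + b)
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    # Backward pass: gradients with respect to w and b
    dw = np.dot(X, (A - Y).T) / m
    db = np.sum(A - Y) / m
    return {"dw": dw, "db": db}, float(cost)

# Test data from the question
w, b = np.array([[1], [2]]), 2
X, Y = np.array([[1, 2], [3, 4]]), np.array([[1, 0]])
grads, cost = propagate(w, b, X, Y)
print("dw =", grads["dw"])
print("db =", grads["db"])
print("cost =", cost)
```

The key difference from the attempt in the question is that the sigmoid is applied to w.T X + b instead of to X, and that the cost uses element-wise products instead of np.dot.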
After going through the code and notes a few times, I was finally able to figure out the error.
First, Z needs to be calculated and then passed to the sigmoid function, instead of X.
The formula is Z = w^T X + b, so in Python it is calculated as below:
Z=np.dot(w.T,X)+b
Then calculate A by passing Z to the sigmoid function:
A = sigmoid(Z)
Then dw can be calculated as below
dw=np.dot(X,(A-Y).T)/m
The other variables, the cost and the derivative with respect to b, are calculated as follows:
cost = -1*((np.sum((Y*np.log(A))+((1-Y)*(np.log(1-A))),axis=1))/m)
db = np.sum((A-Y),axis=1)/m
def sigmoid(x):
    # You have it right
    return 1/(1 + np.exp(-x))
def derivSigmoid(x):
    return sigmoid(x) * (1 - sigmoid(x))
error = targetSample - output
# Make sure to keep the sigmoided value around. For instance, an output that has
# already been sigmoided can be used to get the sigmoid derivative faster (output = sigmoid(x)):
dOutput = output * (1 - output)
Looks like you're already working on the backprop. Just thought I'd help simplify some of the forward prop for you.
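A quick check of that shortcut, as a sketch with made-up input values: the derivative computed from the cached output matches the one that calls sigmoid again.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def deriv_sigmoid(x):
    # Recomputes sigmoid(x) twice
    return sigmoid(x) * (1 - sigmoid(x))

x = np.array([-2.0, 0.0, 3.0])   # arbitrary example inputs
output = sigmoid(x)              # cached during the forward pass
d_output = output * (1 - output) # derivative reusing the cached value

print(np.allclose(d_output, deriv_sigmoid(x)))
```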

How Can I change Theano gradients during backpropagation wrt the current output of the network?

I am trying to code up an example of the Inverting Gradients method from "Deep Reinforcement Learning in Parameterized Action Space" (equation 11) in Lasagne/Theano. Basically, what I am trying to do is ensure the output of the network stays within some specified bounds, in this case [-1, 1].
I have been looking at the example given here that inverts the gradient, which has helped, but at this point I am stuck. I think the best place to perform this operation is in the gradient computation method, so I copied rmsprop and am trying to edit the gradients before the updates are applied.
This is what I have so far
def rmspropWithInvert(loss_or_grads, params, p, learning_rate=1.0, rho=0.9, epsilon=1e-6):
    clip = 2.0
    grads = lasagne.updates.get_or_compute_grads(loss_or_grads, params)
    # grads = theano.gradient.grad_clip(grads, -clip, clip)
    grads_ = []
    for grad in grads:
        grads_.append(theano.gradient.grad_clip(grad, -clip, clip) )
    grads = grads_
    a, p_ = T.scalars('a', 'p_')
    z_lazy = ifelse(T.gt(a,0.0), (1.0-p_)/(2.0), (p_-(-1.0))/(2.0))
    f_lazyifelse = theano.function([a,p_], z_lazy,
                                   mode=theano.Mode(linker='vm'))
    # compute the parameter vector to invert the gradients by
    ps = theano.shared(
        np.zeros((3, 1), dtype=theano.config.floatX),
        broadcastable=(False, True))
    for i in range(3):
        ps[i] = f_lazyifelse(grads[-1][i], p[i])
    # Apply vector through computed gradients
    grads2=[]
    for grad in grads.reverse():
        grads2.append(theano.mul(ps, grad))
        ps = grad
    grads = grads2.reverse()
    print "Grad Update: " + str(grads[0])
    updates = OrderedDict()
    # Using theano constant to prevent upcasting of float32
    one = T.constant(1)
    for param, grad in zip(params, grads):
        value = param.get_value(borrow=True)
        accu = theano.shared(np.zeros(value.shape, dtype=value.dtype),
                             broadcastable=param.broadcastable)
        accu_new = rho * accu + (one - rho) * grad ** 2
        updates[accu] = accu_new
        updates[param] = param - (learning_rate * grad /
                                  T.sqrt(accu_new + epsilon))
    return updates
Maybe someone more skilled with Theano/Lasagne will see a solution? Conceptually I think the computation is easy but coding everything in the update step symbolically has proven challenging for me. I am still getting used to Theano.
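For reference, the scaling rule that the ifelse branch above encodes can be written element-wise. This is a NumPy sketch of the numeric rule only, not a Theano implementation; the function name invert_gradients and its bounds arguments are hypothetical. It mirrors the branch condition in the snippet (a positive entry is scaled by the distance to the upper bound), and whether a positive entry really means "increase the parameter" depends on the sign convention of your update, which is an assumption here.

```python
import numpy as np

def invert_gradients(grad, p, p_min=-1.0, p_max=1.0):
    """Scale each gradient entry by the headroom left for its parameter."""
    grad = np.asarray(grad, dtype=float)
    p = np.asarray(p, dtype=float)
    width = p_max - p_min
    up = (p_max - p) / width     # used where the gradient entry is positive
    down = (p - p_min) / width   # used otherwise
    return grad * np.where(grad > 0, up, down)

# Example: two parameters at 0.5 and one already at the upper bound
scaled = invert_gradients([1.0, -1.0, 2.0], [0.5, 0.5, 1.0])
print(scaled)
```

Because the rule is element-wise, a symbolic version would probably not need a compiled helper function or a Python loop over entries; an element-wise conditional such as theano.tensor.switch applied to the whole gradient tensor might be enough, though that is untested here.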

How to monitor tensor values in Theano/Keras?

I know this question has been asked in various forms, but I can't find any answer I can understand and use. So forgive me if this is a basic question, because I'm a newbie to these tools (Theano/Keras).
Problem to Solve
Monitor variables in Neural Networks
(e.g. input/forget/output gate values in LSTM)
What I'm currently getting
No matter at which stage I fetch those values, I get something like:
Elemwise{mul,no_inplace}.0
Elemwise{mul,no_inplace}.0
[for{cpu,scan_fn}.2, Subtensor{int64::}.0, Subtensor{int64::}.0]
[for{cpu,scan_fn}.2, Subtensor{int64::}.0, Subtensor{int64::}.0]
Subtensor{int64}.0
Subtensor{int64}.0
Is there any way I can monitor them (e.g. print to stdout, write to a file, etc.)?
Possible Solution
Seems like callbacks in Keras can do the job, but it doesn't work either for me. I'm getting same thing as above
My Guess
Seems like I'm making very simple mistakes.
Thank you very much in advance, everyone.
ADDED
Specifically, I'm trying to monitor input/forget/output gating values in LSTM.
I found that LSTM.step() is for computing those values:
def step(self, x, states):
    h_tm1 = states[0] # hidden state of the previous time step
    c_tm1 = states[1] # cell state from the previous time step
    B_U = states[2] # dropout matrices for recurrent units?
    B_W = states[3] # dropout matrices for input units?
    if self.consume_less == 'cpu': # just cut x into 4 pieces in columns
        x_i = x[:, :self.output_dim]
        x_f = x[:, self.output_dim: 2 * self.output_dim]
        x_c = x[:, 2 * self.output_dim: 3 * self.output_dim]
        x_o = x[:, 3 * self.output_dim:]
    else:
        x_i = K.dot(x * B_W[0], self.W_i) + self.b_i
        x_f = K.dot(x * B_W[1], self.W_f) + self.b_f
        x_c = K.dot(x * B_W[2], self.W_c) + self.b_c
        x_o = K.dot(x * B_W[3], self.W_o) + self.b_o
    i = self.inner_activation(x_i + K.dot(h_tm1 * B_U[0], self.U_i))
    f = self.inner_activation(x_f + K.dot(h_tm1 * B_U[1], self.U_f))
    c = f * c_tm1 + i * self.activation(x_c + K.dot(h_tm1 * B_U[2], self.U_c))
    o = self.inner_activation(x_o + K.dot(h_tm1 * B_U[3], self.U_o))
    with open("test_visualization.txt", "a") as myfile:
        myfile.write(str(i)+"\n")
    h = o * self.activation(c)
    return h, [h, c]
As shown in the code above, I tried to write the value of i to a file, but it only gave me values like:
Elemwise{mul,no_inplace}.0
[for{cpu,scan_fn}.2, Subtensor{int64::}.0, Subtensor{int64::}.0]
Subtensor{int64}.0
So I tried i.eval() or i.get_value(), but both failed to give me values.
.eval() gave me this:
theano.gof.fg.MissingInputError: An input of the graph, used to compute Subtensor{::, :int64:}(<TensorType(float32, matrix)>, Constant{10}), was not provided and not given a value.Use the Theano flag exception_verbosity='high',for more information on this error.
and .get_value() gave me this:
AttributeError: 'TensorVariable' object has no attribute 'get_value'
So I traced back through the call chain (which line calls which function) and tried to get values at every step I found, but in vain.
It feels like I'm falling into some basic pitfall.
I use the solution described in the Keras FAQ:
http://keras.io/getting-started/faq/#how-can-i-visualize-the-output-of-an-intermediate-layer
In detail:
from keras import backend as K
intermediate_tensor_function = K.function([model.layers[0].input],[model.layers[layer_of_interest].output])
intermediate_tensor = intermediate_tensor_function([thisInput])[0]
yields:
array([[ 3., 17.]], dtype=float32)
However I'd like to use the functional API but I can't seem to get the actual tensor, only the symbolic representation. For example:
model.layers[1].output
yields:
<tf.Tensor 'add:0' shape=(?, 2) dtype=float32>
I'm missing something about the interaction of Keras and Tensorflow here but I'm not sure what. Any insight much appreciated.
One solution is to create a version of your network that is truncated at the LSTM layer whose gate values you want to monitor, and then replace the original layer with a custom layer in which the step function is modified to return not only the hidden layer values, but also the gate values.
For instance, say you want to access the gate values of a GRU. Create a custom layer GRU2 that inherits everything from the GRU class, but adapt the step function so that it returns a concatenation of the states you want to monitor, and then takes only the part containing the previous hidden layer activations when computing the next activations. I.e.:
def step(self, x, states):
    # get prev hidden layer from input that is concatenation of
    # prev hidden layer + reset gate + update gate
    x = x[:self.output_dim, :]
    ###############################################
    # This is the original code from the GRU layer
    #
    h_tm1 = states[0] # previous memory
    B_U = states[1] # dropout matrices for recurrent units
    B_W = states[2]
    if self.consume_less == 'gpu':
        matrix_x = K.dot(x * B_W[0], self.W) + self.b
        matrix_inner = K.dot(h_tm1 * B_U[0], self.U[:, :2 * self.output_dim])
        x_z = matrix_x[:, :self.output_dim]
        x_r = matrix_x[:, self.output_dim: 2 * self.output_dim]
        inner_z = matrix_inner[:, :self.output_dim]
        inner_r = matrix_inner[:, self.output_dim: 2 * self.output_dim]
        z = self.inner_activation(x_z + inner_z)
        r = self.inner_activation(x_r + inner_r)
        x_h = matrix_x[:, 2 * self.output_dim:]
        inner_h = K.dot(r * h_tm1 * B_U[0], self.U[:, 2 * self.output_dim:])
        hh = self.activation(x_h + inner_h)
    else:
        if self.consume_less == 'cpu':
            x_z = x[:, :self.output_dim]
            x_r = x[:, self.output_dim: 2 * self.output_dim]
            x_h = x[:, 2 * self.output_dim:]
        elif self.consume_less == 'mem':
            x_z = K.dot(x * B_W[0], self.W_z) + self.b_z
            x_r = K.dot(x * B_W[1], self.W_r) + self.b_r
            x_h = K.dot(x * B_W[2], self.W_h) + self.b_h
        else:
            raise Exception('Unknown `consume_less` mode.')
        z = self.inner_activation(x_z + K.dot(h_tm1 * B_U[0], self.U_z))
        r = self.inner_activation(x_r + K.dot(h_tm1 * B_U[1], self.U_r))
        hh = self.activation(x_h + K.dot(r * h_tm1 * B_U[2], self.U_h))
    h = z * h_tm1 + (1 - z) * hh
    #
    # End of original code
    ###########################################################
    # concatenate states you want to monitor, in this case the
    # hidden layer activations and gates z and r
    all = K.concatenate([h, z, r])
    # return everything
    return all, [h]
(Note that the only lines I added are at the beginning and end of the function).
If you then run your network with GRU2 as the last layer instead of GRU (with return_sequences = True for the GRU2 layer), you can just call predict on your network; this will give you all hidden layer and gate values.
The same thing should work for LSTM, although you might have to puzzle a bit to figure out how to store all the outputs you want in one vector and retrieve them again afterwards.
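Retrieving the parts afterwards can be sketched with NumPy. This assumes the concatenation happened along the last axis and that the monitored pieces all have width output_dim; the array fake below is a stand-in for what predict would return, and split_monitored is a hypothetical helper name.

```python
import numpy as np

def split_monitored(outputs, output_dim, n_parts=3):
    """Split a (batch, time, n_parts * output_dim) array into equal parts."""
    parts = np.split(outputs, n_parts, axis=-1)
    assert all(part.shape[-1] == output_dim for part in parts)
    return parts

# Stand-in for the predict output: batch=2, time steps=4, output_dim=5
output_dim = 5
fake = np.concatenate([
    np.full((2, 4, output_dim), 0.1),  # hidden activations h
    np.full((2, 4, output_dim), 0.2),  # update gate z
    np.full((2, 4, output_dim), 0.3),  # reset gate r
], axis=-1)

h, z, r = split_monitored(fake, output_dim)
print(h.shape, z.shape, r.shape)
```

For an LSTM the same idea would apply with more parts (for example h, i, f and o), as long as the order used in the concatenation is the order assumed when splitting.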
Hope that helps!
You can use theano's printing module for printing during execution (and not during definition, which is what you're doing and the reason why you're not getting values, but their abstract definition).
Print
Just use the Print function. Don't forget to use the output of Print to continue your graph, otherwise the output will be disconnected and Print will most likely be removed during optimisation. And you will see nothing.
from keras import backend as K
from theano.printing import Print
def someLossFunction(x, ref):
    loss = K.square(x - ref)
    loss = Print('Loss tensor (before sum)')(loss)
    loss = K.sum(loss)
    loss = Print('Loss scalar (after sum)')(loss)
    return loss
Plot
A little bonus you might enjoy.
The Print class has a global_fn parameter that overrides the default printing callback. You can provide your own function and access the data directly, to build a plot for instance.
from keras import backend as K
from theano.printing import Print
import matplotlib.pyplot as plt
curve = []
# the callback function
def myPlottingFn(printObj, data):
    global curve
    # Store scalar data
    curve.append(data)
    # Plot it
    fig, ax = plt.subplots()
    ax.plot(curve, label=printObj.message)
    ax.legend(loc='best')
    plt.show()
def someLossFunction(x, ref):
    loss = K.sum(K.square(x - ref))
    # The callback is passed on the line below
    loss = Print('Loss scalar (after sum)', global_fn=myPlottingFn)(loss)
    return loss
BTW the string you passed to Print('...') is stored in the print object under property name message (see function myPlottingFn). This is useful for building multi-curves plot automatically

Bayesian Lasso using PyMC3

I'm trying to reproduce the results of this tutorial (see LASSO regression) on PyMC3. As commented on this reddit thread, the mixing for the first two coefficients wasn't good because the variables are correlated.
I tried implementing it in PyMC3 but it didn't work as expected when using Hamiltonian samplers. I could only get it working with the Metropolis sampler, which achieves the same result as PyMC2.
I don't know if it's something related to the fact that the Laplacian is peaked (discontinuous derivative at 0), but it worked perfectly well with Gaussian priors. I tried with or without MAP initialization and the result is always the same.
Here is my code:
from pymc import *
from scipy.stats import norm
import pylab as plt
# Same model as the tutorial
n = 1000
x1 = norm.rvs(0, 1, size=n)
x2 = -x1 + norm.rvs(0, 10**-3, size=n)
x3 = norm.rvs(0, 1, size=n)
y = 10 * x1 + 10 * x2 + 0.1 * x3
with Model() as model:
    # Laplacian prior only works with Metropolis sampler
    coef1 = Laplace('x1', 0, b=1/sqrt(2))
    coef2 = Laplace('x2', 0, b=1/sqrt(2))
    coef3 = Laplace('x3', 0, b=1/sqrt(2))
    # Gaussian prior works with NUTS sampler
    #coef1 = Normal('x1', mu = 0, sd = 1)
    #coef2 = Normal('x2', mu = 0, sd = 1)
    #coef3 = Normal('x3', mu = 0, sd = 1)
    likelihood = Normal('y', mu= coef1 * x1 + coef2 * x2 + coef3 * x3, tau = 1, observed=y)
    #step = Metropolis() # Works just like PyMC2
    start = find_MAP() # Doesn't help
    step = NUTS(state = start) # Doesn't work
    trace = sample(10000, step, start = start, progressbar=True)
plt.figure(figsize=(7, 7))
traceplot(trace)
plt.tight_layout()
autocorrplot(trace)
summary(trace)
Here is the error I get:
PositiveDefiniteError: Simple check failed. Diagonal contains negatives
Am I doing something wrong or is the NUTS sampler not supposed to work on cases like this?
Whyking from the reddit thread suggested using the MAP as the scaling instead of the state, and it worked wonders.
Here is a notebook with the results and the updated code.