I would like to integrate the equation of motion for the harmonic oscillator with a static and kinetic friction zone (as depicted in the following image).
For x < a the mass slides without friction over the surface. For x > a static and kinetic friction take effect, with coefficients of friction mu0 (static) and mu (kinetic). I tried to implement this with scipy.integrate.solve_ivp in a while-loop, using solve_ivp's event handling to stop and restart the integration each time one of the following events is triggered:
the oscillator's velocity is zero (mass_stops in code below)
the oscillator reaches x=a and has positive velocity, i.e. entering the friction zone (entering_friction_zone)
the oscillator reaches x=a and has negative velocity, i.e. leaving the friction zone (leaving_friction_zone)
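For readers unfamiliar with solve_ivp's event mechanism, here is a minimal sketch of a terminal, direction-filtered event. It uses a frictionless unit oscillator rather than the system in this question; the names rhs and crosses_half and the threshold 0.5 are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, xv):
    # Frictionless unit oscillator: x' = v, v' = -x
    x, v = xv
    return [v, -x]

def crosses_half(t, xv):
    # Event function: zero when x == 0.5
    return xv[0] - 0.5

crosses_half.terminal = True   # stop the integration when the event fires
crosses_half.direction = -1    # fire only while x - 0.5 goes from positive to negative

sol = solve_ivp(rhs, [0, 10], [1.0, 0.0], events=crosses_half,
                rtol=1e-9, atol=1e-12)
t_event = sol.t_events[0][0]

# With x(0) = 1 and v(0) = 0 the solution is x(t) = cos(t),
# so the first downward crossing of 0.5 is at t = pi/3.
print(t_event, np.pi / 3)
print(sol.status)  # 1 means a termination event occurred
```

Note that the initial state here does not lie on the event surface, so the restart problem described in this question does not show up in this sketch.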
Using this approach, I faced the same problem as other users already did, documented here and here, namely that the initial conditions (inits) trigger an event and the while-loop gets stuck.
Using one of the workarounds described in the second thread, it kind of works, but it isn't as robust as I would like. To be more specific, I now integrate the equations (initially and after each event) over a very short time interval (1e-13) and use the final state of this integration (sol_in_between) as the initial condition for the "real integration" (sol). Of course, I don't know beforehand how short the interval needs to be for this to work: too short an interval results in an infinite loop (as before), while too long an interval makes the results crude.
Now I'm wondering if there is a more straightforward and, more importantly, more robust way to integrate the given system, ideally using solve_ivp, but I'm open to other suggestions.
import numpy as np
from scipy.integrate import solve_ivp
import matplotlib.pyplot as plt
import os, sys
def system(t,xv,params):
    """INPUT: current time t, current position x & velocity v in list xv = [x, v],
    current system parameter in dict params
    OUTPUT: xdot and vdot in list f = [xdot, vdot]"""
    x, v = xv
    m, c, g, r, muq = [params[par] for par in ['m', 'c', 'g', 'r', 'muq']]
    return [v, -c*x/m - r*muq*g]
def mass_stops(t,xv,params):
    """Event function for event 'velocity zero'"""
    x, v = xv
    return v
mass_stops.terminal = True
mass_stops.direction = 0
def leaving_friction_zone(t,xv,params):
    """Event function for event 'mass crosses border with positive velocity' (direction = 1).
    NOTE: the friction zone is x > a, so despite its name this event marks ENTERING the zone;
    the main loop treats it that way (second entry of the events list)."""
    x, v = xv
    return x-params['a']
leaving_friction_zone.terminal = True
leaving_friction_zone.direction = 1
def entering_friction_zone(t,xv,params):
    """Event function for event 'mass crosses border with negative velocity' (direction = -1).
    NOTE: despite its name this event marks LEAVING the zone x > a;
    the main loop treats it that way (third entry of the events list)."""
    x, v = xv
    return x-params['a']
entering_friction_zone.terminal = True
entering_friction_zone.direction = -1
def event_number(events):
    """Helper function for identifying which event got triggered"""
    for i, event in enumerate(events):
        if event.size > 0: return i
def integrate_oscillator(kinet_friction_coeff=.4, static_friction_coeff=.4, tend=20, m=20, c=300, a=.5, g=9.81, inits=[1, 0]):
    mu = kinet_friction_coeff
    mu0 = static_friction_coeff
    params = {'m': m, 'c': c, 'g': g, 'a': a}
    tspan = [0, tend]
    params['r'] = -1 if inits[0] > 0 else 1
    params['muq'] = mu if inits[0] > a else 0
    if params['muq'] == 0:  # start position lies outside the friction zone
        print('no friction')
    tout = list()
    x = list()
    v = list()
    CONT = 1
    while CONT == 1:
        # we integrate the equation of motion for a short time interval without event handling, HOPING that the initial/last event doesn't get triggered again in sol
        # we use the end state of this integration as initial condition for the "actual integration" and throw everything else away
        sol_in_between = solve_ivp(system, [0, 1e-13], inits, args=(params,), rtol=1e-9, atol=1e-12)
        inits = sol_in_between.y[:,-1]
        sol = solve_ivp(system, tspan, inits, args=(params,), events=[mass_stops,leaving_friction_zone,entering_friction_zone], rtol=1e-9, atol=1e-12)
        tout = np.concatenate([tout, sol.t])
        x = np.concatenate([x, sol.y[0,:]]) #x.append(sol.y[0,:])
        v = np.concatenate([v, sol.y[1,:]]) #y.append(sol.y[1,:])
        tnow = tout[-1]
        if tnow >= tend:
            CONT = 0
        tspan = [tnow, tend]
        inits = [x[-1], v[-1]]
        #CONT = 0
        event_num = event_number(sol.y_events)
        if event_num == 0: # Event: "velocity zero"
            print('Velocity zero')
            if np.abs(c*x[-1]) < mu0*m*g and x[-1] > a: # spring force is smaller than static friction force...
                CONT = 0 # ... therefore stop!
                print(f'Mass stopped at time t={tout[-1]}')
                tout = np.concatenate([tout, np.array([tend])])
                x = np.concatenate([x, np.array([x[-1]])])
                v = np.concatenate([v, np.array([0])])
            elif x[-1] - a < 0: # mass moves to the right
                print('to the right')
                params['r'] = 1 # change direction of kinetic friction force to the left
            else: # mass moves to the left
                print('to the left')
                params['r'] = -1 # change direction of kinetic friction force to the right
        elif event_num == 1: # entering friction zone
            print('entering friction zone')
            params['muq'] = mu
        elif event_num == 2: # leaving friction zone
            print('leaving friction zone')
            params['muq'] = 0
    return [tout,x,v]
[t,x,v] = integrate_oscillator(kinet_friction_coeff=.4,static_friction_coeff=.8,tend=50,inits=[1,0])
plt.rcParams['font.size'] = 16
fig, (ax1, ax2) = plt.subplots(2,figsize=(12,9),sharex=True )
ax1.plot([0,t[-1]], [.5, .5])
Python version: 3.8
PyTorch version: 1.9.0+cpu
Platform: Anaconda Spyder5.0
To reproduce this problem, copy all of the code below into a single file.
The ILSVRC2012_val_00000293.jpg file used in this code is shown below; you also need to download it and change its path in the code.
Some background of this problem:
I am working on a project that aims to develop a hardware accelerator for the inference process of the MobileNet V2 network. I used a pretrained quantized PyTorch model to simulate the outcome, and the result is very good.
In order to do this in hardware, I need to know every input and output as well as every intermediate variable while this piece of PyTorch code runs. I used a package named torchextractor to fetch the output of the first layer, which in this case is a 3*3 convolution layer.
import numpy as np
import torchvision
import torch
from torchvision import transforms, datasets
from PIL import Image
from torchvision import transforms
import torchextractor as tx
import math
##### Processing of input image
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225])
test_transform = transforms.Compose([
preprocess = transforms.Compose([
transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
#image file destination
filename = r"D:\Project_UM\MobileNet_VC709\MobileNet_pytorch\ILSVRC2012_val_00000293.jpg"
input_image =
input_tensor = preprocess(input_image)
input_batch = input_tensor.unsqueeze(0)
#----First verify that the torchextractor class does not influence the inference outcome
# ofmp of layer1 before putting into torchextractor
a,b,c = quantize_tensor(input_batch)# to quantize the input tensor and return an int8 tensor, scale and zero point
input_qa = torch.quantize_per_tensor(torch.tensor(input_batch.clone().detach()), b, c, torch.quint8)# Using quantize_per_tensor method of torch
# Load a quantized mobilenet_v2 model
model_quantized = torchvision.models.quantization.mobilenet_v2(pretrained=True, quantize=True)
with torch.no_grad():
output = model_quantized.features[0][0](input_qa)# Ofmp of layer1, datatype : quantized_tensor
# print("FM of layer1 before tx_extractor:\n",output.int_repr())# Ofmp of layer1, datatype : int8 tensor
output1_clone = output.int_repr().detach().numpy()# Clone ofmp of layer1, datatype : ndarray
# ofmp of layer1 after adding torchextractor
model_quantized_ex = tx.Extractor(model_quantized, ["features.0.0"])#Capture of the module inside first layer
model_output, features = model_quantized_ex(input_batch)# Forward propagation
# feature_shapes = {name: f.shape for name, f in features.items()}
# print(features['features.0.0']) # Ofmp of layer1, datatype : quantized_tensor
out1_clone = features['features.0.0'].int_repr().numpy() # Clone ofmp of layer1, datatype : ndarray
if np.array_equal(out1_clone, output1_clone):  # element-wise comparison of both outputs
    print('Model with torchextractor attached outputs the same values as the original model')
else:
    print('Torchextractor influences the outcome')
Here I define a numpy quantization scheme based on the one proposed in "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference".
# Convert a normal regular tensor to a quantized tensor with scale and zero_point
def quantize_tensor(x, num_bits=8):# to quantize the input tensor and return an int8 tensor, scale and zero point
    qmin = 0.
    qmax = 2.**num_bits - 1.
    min_val, max_val = x.min(), x.max()
    scale = (max_val - min_val) / (qmax - qmin)
    initial_zero_point = qmin - min_val / scale
    zero_point = 0
    if initial_zero_point < qmin:
        zero_point = qmin
    elif initial_zero_point > qmax:
        zero_point = qmax
    else:
        zero_point = initial_zero_point
    # print(zero_point)
    zero_point = int(zero_point)
    q_x = zero_point + x / scale
    q_x.clamp_(qmin, qmax).round_()
    q_x = q_x.round().byte()
    return q_x, scale, zero_point
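As a cross-check of the scheme above, here is a minimal NumPy-only sketch of the same affine quantization together with its inverse. The names quantize_array and dequantize_array and the sample values are illustrative assumptions, not part of the code in this question, and the uint8 cast assumes num_bits <= 8.

```python
import numpy as np

def quantize_array(x, num_bits=8):
    """Affine (asymmetric) quantization of a float array to unsigned integers."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    # Zero point: the integer that represents the real value 0.0, clamped to the valid range
    zero_point = int(np.clip(qmin - x.min() / scale, qmin, qmax))
    q = np.clip(np.round(zero_point + x / scale), qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize_array(q, scale, zero_point):
    """Map the integers back to approximate real values."""
    return scale * (q.astype(np.float64) - zero_point)

x = np.array([-1.0, -0.25, 0.0, 0.5, 1.0])
q, scale, zero_point = quantize_array(x)
x_hat = dequantize_array(q, scale, zero_point)

print(q, scale, zero_point)
print(np.abs(x - x_hat).max())  # round-trip error, at most about one quantization step
```

Truncating the zero point with int() and then rounding the values means the round trip can be off by up to one full step rather than half a step.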
# #############################################################################################
# --------- Simulate the inference process of layer0: conv33 using numpy
# #############################################################################################
# get the input_batch quantized buffer data
input_scale = b.item()
input_zero = c
input_quantized = a[0].detach().numpy()
# get the layer0 output scale and zero_point
output_scale = model_quantized.features[0][0].state_dict()['scale'].item()
output_zero = model_quantized.features[0][0].state_dict()['zero_point'].item()
# get the quantized weight with scale and zero_point
weight_scale = model_quantized.features[0][0].state_dict()["weight"].q_scale()
weight_zero = model_quantized.features[0][0].state_dict()["weight"].q_zero_point()
weight_quantized = model_quantized.features[0][0].state_dict()["weight"].int_repr().numpy()
# print(weight_quantized)
# print(weight_quantized.shape)
# bias_quantized,bias_scale,bias_zero= quantize_tensor(model_quantized.features[0][0].state_dict()["bias"])# to quantize the input tensor and return an int8 tensor, scale and zero point
# print(bias_quantized.shape)
bias = model_quantized.features[0][0].state_dict()["bias"].detach().numpy()
# print(input_quantized)
Then I wrote a quantized 2D convolution using numpy, hoping to figure out every detail of the PyTorch data flow during inference.
#%% numpy simulated layer0 convolution function define
def conv_cal(input_quantized, weight_quantized, kernel_size, stride, out_i, out_j, out_k):
    weight = weight_quantized[out_i]
    # input patch under the kernel for output position (out_j, out_k)
    patch = np.zeros((input_quantized.shape[0], kernel_size, kernel_size))
    for i in range(weight.shape[0]):
        for j in range(weight.shape[1]):
            for k in range(weight.shape[2]):
                patch[i][j][k] = input_quantized[i][stride*out_j+j][stride*out_k+k]
    # print(patch,"\n")
    # print(weight)
    return np.multiply(weight,patch).sum()
def QuantizedConv2D(input_scale, input_zero, input_quantized, output_scale, output_zero, weight_scale, weight_zero, weight_quantized, bias, kernel_size, stride, padding, ofm_size):
    output = np.zeros((weight_quantized.shape[0],ofm_size,ofm_size))
    input_quantized_padding = np.full((input_quantized.shape[0],input_quantized.shape[1]+2*padding,input_quantized.shape[2]+2*padding),0)
    zero_temp = np.full(input_quantized.shape,input_zero)
    input_quantized = input_quantized - zero_temp
    for i in range(input_quantized.shape[0]):
        for j in range(padding,padding + input_quantized.shape[1]):
            for k in range(padding,padding + input_quantized.shape[2]):
                input_quantized_padding[i][j][k] = input_quantized[i][j-padding][k-padding]
    zero_temp = np.full(weight_quantized.shape, weight_zero)
    weight_quantized = weight_quantized - zero_temp
    for i in range(output.shape[0]):
        for j in range(output.shape[1]):
            for k in range(output.shape[2]):
                # output[i][j][k] = (weight_scale*input_scale)*conv_cal(input_quantized_padding, weight_quantized, kernel_size, stride, i, j, k) + bias[i] #floating_output
                output[i][j][k] = weight_scale*input_scale/output_scale*conv_cal(input_quantized_padding, weight_quantized, kernel_size, stride, i, j, k) + bias[i]/output_scale + output_zero
                output[i][j][k] = round(output[i][j][k])
                # int_output
    return output
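To make the arithmetic in QuantizedConv2D easier to check by hand, here is a minimal sketch of the same formula for a single output value. The function name requantize_patch and all numbers are made-up assumptions, chosen as powers of two so the result is exact; whether PyTorch's backend rounds in exactly this way is not established here.

```python
import numpy as np

def requantize_patch(patch_q, weight_q, bias, in_scale, in_zp,
                     w_scale, w_zp, out_scale, out_zp):
    """One output value of a quantized convolution, clamped to the uint8 range."""
    # Integer accumulation on zero-point-corrected values
    acc = np.sum((patch_q.astype(np.int32) - in_zp) * (weight_q.astype(np.int32) - w_zp))
    # Rescale to the output scale, add the float bias, shift by the output zero point
    out = (in_scale * w_scale / out_scale) * acc + bias / out_scale + out_zp
    return int(np.clip(np.round(out), 0, 255))

patch_q = np.array([[10, 12], [14, 16]], dtype=np.uint8)   # hypothetical input patch
weight_q = np.array([[1, 2], [3, 4]], dtype=np.int8)       # hypothetical kernel

value = requantize_patch(patch_q, weight_q, bias=0.25,
                         in_scale=0.5, in_zp=10,
                         w_scale=0.25, w_zp=0,
                         out_scale=0.125, out_zp=3)
# acc = 0*1 + 2*2 + 4*3 + 6*4 = 40, multiplier = 0.5*0.25/0.125 = 1,
# bias/out_scale = 2, so the result is 40 + 2 + 3 = 45
print(value)
```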
Here I input the same image, weight, and bias together with their zero_point and scale, then compare this "numpy simulated" result to the PyTorch calculated one.
quantized_model_out1_int8 = np.squeeze(features['features.0.0'].int_repr().numpy())
out1_np = QuantizedConv2D(input_scale, input_zero, input_quantized, output_scale, output_zero, weight_scale, weight_zero, weight_quantized, bias, 3, 2, 1, 112)
np.save("out1_np.npy",out1_np)
# clamp negative values to 0
for i in range(quantized_model_out1_int8.shape[0]):
    for j in range(quantized_model_out1_int8.shape[1]):
        for k in range(quantized_model_out1_int8.shape[2]):
            if(out1_np[i][j][k] < 0):
                out1_np[i][j][k] = 0
# mark matching values and zero them out so that only mismatches remain visible
flag = np.zeros(quantized_model_out1_int8.shape)
for i in range(quantized_model_out1_int8.shape[0]):
    for j in range(quantized_model_out1_int8.shape[1]):
        for k in range(quantized_model_out1_int8.shape[2]):
            if(quantized_model_out1_int8[i][j][k] == out1_np[i][j][k]):
                flag[i][j][k] = 1
                out1_np[i][j][k] = 0
                quantized_model_out1_int8[i][j][k] = 0
# Compare the simulated result to extractor fetched result, gain the total hit rate
If a "numpy simulated" value equals the extracted one, call it a hit. Printing the total hit rate shows that numpy gets 92% of the values right. The problem is that I have no idea why the remaining 8% of the values come out wrong.
Comparison of two outcomes:
The picture below shows the values that differ between the NumPy result and the PyTorch result for the sample channel, index [1]. The upper-left corner is the NumPy result and the upper-right corner is the PyTorch result. I have set all values that agree to 0. As you can see, most of the remaining values differ by only 1 (this can be viewed as precision loss from fixed-point arithmetic), but some differ a lot, e.g. the value at [1][4]: 121 vs. 76 (I don't know why).
Focus on one strange value:
This code replays the calculation of the value at [1][4]. I was hoping that trial and error would lead me to the expected value of 76, but no matter what I tried, it never output 76. I paste the code here in case you want to try it.
#%% A test code to check the calculation process
weight_quantized_sample = weight_quantized[2]
M_t = input_scale * weight_scale / output_scale
ifmap_t = np.int32(input_quantized[:,1:4,7:10])
weight_t = np.int32(weight_quantized_sample)
bias_t = bias[2]
bias_q = bias_t/output_scale
res_t = 0
for ch in range(3):
    ifmap_offset = ifmap_t[ch]-np.int32(input_zero)
    weight_offset = weight_t[ch]-np.int32(weight_zero)
    res_ch = np.multiply(ifmap_offset, weight_offset)
    res_ch = res_ch.sum()
    res_t = res_t + res_ch
res_mul = M_t*res_t
# for n in range(1, 30):
# res_mul = multiply(n, M_t, res_t)
res_t = round(res_mul + output_zero + bias_q)
Could you help me with this? I have been stuck here for a long time.
I implemented my own version of quantized convolution and got a hit rate between 99.999% and 100% (the mismatch is a single value off by 1, which I consider a rounding issue). The link to the paper in the question helped a lot.
However, I found that your formulas are the same as mine, so I don't know what your issue was. As I understand it, quantization in PyTorch is hardware dependent.
Here is my code:
def my_Conv2dRelu_b2(input_q, conv_layer, output_shape):
    """
    input_q: quantized tensor
    conv_layer: quantized tensor
    output_shape: the pre-computed shape of the result
    """
output = np.zeros(output_shape)
# extract needed float numbers from quantized operations
weights_scale = conv_layer.weight().q_per_channel_scales()
input_scale = input_q.q_scale()
weights_zp = conv_layer.weight().q_per_channel_zero_points()
input_zp = input_q.q_zero_point()
# extract needed convolution parameters
padding = conv_layer.padding
stride = conv_layer.stride
# extract float numbers for results
output_zp = conv_layer.zero_point
output_scale = conv_layer.scale
conv_weights_int = conv_layer.weight().int_repr()
input_int = input_q.int_repr()
biases = conv_layer.bias().numpy()
for k in range(input_q.shape[0]):
for i in range(conv_weights_int.shape[0]):
output[k][i] = manual_convolution_quant(
image_zp=input_zp, image_scale=input_scale,
kernel_zp=weights_zp[i].item(), kernel_scale=weights_scale[i].item(),
result_zp=output_zp, result_scale=output_scale
return output
def manual_convolution_quant(image, kernel, b, padding, stride, image_zp, image_scale, kernel_zp, kernel_scale,
result_zp, result_scale):
H = image.shape[1]
W = image.shape[2]
new_H = H // stride[0]
new_W = W // stride[1]
results = np.zeros([new_H, new_W])
M = image_scale * kernel_scale / result_scale
bias = b / result_scale
paddedIm = np.pad(
[(0, 0), (padding[0], padding[0]), (padding[1], padding[1])],
s = kernel.shape[1]
for i in range(new_H):
for j in range(new_W):
patch = paddedIm[
:, i * stride[0]: i * stride[0] + s, j * stride[1]: j * stride[1] + s
]
res = M * ((kernel - kernel_zp) * (patch - image_zp)).sum() + result_zp + bias
if res < 0:
res = 0
results[i, j] = round(res)
return results
Code to compare pytorch and my own version.
def calc_hit_rate(array1, array2):
    good = (array1 == array2).sum()  # number of matching elements
    total = array1.size
    return good / total
# during inference
y2 = model.conv1(y1)
y2_int = torch.int_repr(y2)
y2_int_manual = my_Conv2dRelu_b2(y1, model.conv1, y2.shape)
print(f'y2 hit rate= {calc_hit_rate(y2.int_repr().numpy(), y2_int_manual)}') #hit_rate=1.0
In GPflow I have multiple time series whose sampling times are not aligned, and the series may have different lengths (longitudinal data). I assume that they are independent realizations of the same GP. What is the right way to handle this with SVGP, and more generally with GPflow? Do I need to use coregionalization? The coregionalization notebook assumes correlated trajectories, while I want a shared mean/kernel but independent series.
Yes, the Coregion kernel implemented in GPflow is what you can use for your problem.
Let's set up some data from the generative model you describe, with different lengths for the timeseries:
import numpy as np
import gpflow
import matplotlib.pyplot as plt
Ns = [80, 90, 100] # number of observations for three different realizations
Xs = [np.random.uniform(0, 10, size=N) for N in Ns] # observation locations
# three different draws from the same GP:
k = gpflow.kernels.Matern52(variance=2.0, lengthscales=0.5) # kernel
Ks = [k(X[:, None]) for X in Xs]
Ls = [np.linalg.cholesky(K) for K in Ks]
vs = [np.random.randn(N, 1) for N in Ns]
fs = [(L @ v).squeeze(axis=-1) for L, v in zip(Ls, vs)]
To actually set up the training data for the gpflow GP model:
# output indicator for the observations: which timeseries is this?
os = [o * np.ones(N) for o, N in enumerate(Ns)] # [0 ... 0, 1 ... 1, 2 ... 2]
# now assemble the three timeseries in single data set:
allX = np.concatenate(Xs)
allo = np.concatenate(os)
allf = np.concatenate(fs)
X = np.c_[allX, allo]
Y = allf[:, None]
assert X.shape == (sum(Ns), 2)
assert Y.shape == (sum(Ns), 1)
# now let's set up a copy of the original kernel:
k2 = gpflow.kernels.Matern52(active_dims=[0]) # the same as k above, but with different hyperparameters
# and a Coregionalization kernel that effectively says they are all independent:
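kc = gpflow.kernels.Coregion(output_dim=len(Ns), rank=1, active_dims=[1])
kc.W.assign(np.zeros(kc.W.shape)) # W = 0: no correlation between the realizations
kc.kappa.assign(np.ones(kc.kappa.shape)) # kappa = 1 (the default)
gpflow.set_trainable(kc, False) # we want W and kappa fixed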
The Coregion kernel defines a covariance matrix B = W Wᵀ + diag(kappa), so by setting W=0 we prescribe zero correlations (independent realizations) and kappa=1 (actually the default) ensures that the variance hyperparameter of the copy of the original kernel remains interpretable.
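The effect of W = 0 and kappa = 1 can be checked with plain NumPy, independently of GPflow. The following is only a sketch: coregion_matrix and matern52 are hand-written stand-ins (assumptions) for the GPflow kernels, and the inputs are made up.

```python
import numpy as np

def coregion_matrix(W, kappa):
    # B = W W^T + diag(kappa), as described for the Coregion kernel
    return W @ W.T + np.diag(kappa)

def matern52(x1, x2, variance=1.0, lengthscale=1.0):
    # Hand-written Matern-5/2 kernel, for illustration only
    r = np.abs(x1[:, None] - x2[None, :]) / lengthscale
    s = np.sqrt(5.0) * r
    return variance * (1.0 + s + s ** 2 / 3.0) * np.exp(-s)

B = coregion_matrix(np.zeros((3, 1)), np.ones(3))   # W = 0, kappa = 1 gives the identity

t = np.array([0.0, 0.3, 0.1, 0.4])   # observation times
o = np.array([0, 0, 1, 1])           # index of the time series each point belongs to
K = matern52(t, t) * B[o][:, o]      # product kernel k(t, t') * B[o, o']

print(B)
print(K[:2, 2:])   # cross-covariance between series 0 and series 1: all zeros
```

The joint covariance is block-diagonal, so the series share kernel hyperparameters but are modelled as independent.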
Now construct the actual model and optimize hyperparameters:
k2c = k2 * kc
m = gpflow.models.GPR((X, Y), k2c, noise_variance=1e-5)
opt = gpflow.optimizers.Scipy()
opt.minimize(m.training_loss, m.trainable_variables, compile=False)
which recovers the initial variance and lengthscale hyperparameters pretty well.
If you want to predict, you have to provide the extra "output" column in the Xnew argument to m.predict_f(), e.g. as follows:
Xtest = np.linspace(0, 10, 100)
Xtest_augmented = np.c_[Xtest, np.zeros_like(Xtest)]
f_mean, f_var = m.predict_f(Xtest_augmented)
(whether you set the output column to 0, 1, or 2 does not matter, as we set them all to be the same with our choice of W and kappa).
If your input was more than one-dimensional, you could set active_dims=list(range(X.shape[1] - 1)) for the first kernel(s) and active_dims=[X.shape[1]-1] for the Coregion kernel.
I have been playing around with odeint in scipy and I could not understand what the function returns as return values. For example,
# -*- coding: utf-8 -*-
# Created on Sat Feb 04 20:01:16 2017
# author: Esash
from scipy.integrate import odeint
import matplotlib.pyplot as plt
import numpy as np
def MassSpring(state,t):
    # unpack the state vector
    x = state[0]
    xd = state[1]
    # these are our constants
    k = -5.5 # Newtons per metre
    m = 1.5 # Kilograms
    g = 9.8 # metres per second squared
    # compute acceleration xdd
    xdd = ((k*x)/m) + g
    # return the two state derivatives
    return [xd, xdd]
state0 = [0.0, 0.0]
t = np.arange(0.0, 10.0, 0.1)
state = odeint(MassSpring, state0, t)
plt.plot(t, state)
plt.xlabel('TIME (sec)')
plt.title('Mass-Spring System')
plt.legend((r'$x$ (m)', r'$\dot{x}$ (m/sec)'))
In the above code, I have set the two parameters as 0.0 and 0.0 and the xd in the function is just 0.0 which I return as well. But the return value is not just 0.0, it varies.
In [14]: state
array([[ 0. , 0. ],
[ 0.04885046, 0.97402207],
[ 0.19361613, 1.91243899],
[ 0.10076832, -1.39206172],
[ 0.00941998, -0.42931942],
[ 0.01542821, 0.54911655]])
Also, if I have one differential equation for which I need to send many parameters, then I cannot send M parameters in the odeint call as a list or tuple and return only the solution of the ODE as a single array. It expects that the number of parameters sent should be equal to the number of parameters returned from the function. Why is this?
I am not able to understand how this function works. Can someone please explain this to me? My apologies if I sound too confusing.
Thanks a lot.
I could not understand what the function returns as return values.
The return value of odeint is the computed solution at the requested time values. That is, after this call
state = odeint(MassSpring, state0, t)
state[0] is [x(t[0]), x'(t[0])], state[1] is [x(t[1]), x'(t[1])], etc. If you wanted to plot just the x coordinate, you could call plt.plot(t, state[:, 0]) to plot the first column of state.
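A short sketch of how the returned array is laid out, using the same constants as the question (the lower-case name mass_spring is just a local rewrite of the question's function):

```python
import numpy as np
from scipy.integrate import odeint

def mass_spring(state, t):
    x, xd = state
    k, m, g = -5.5, 1.5, 9.8   # same constants as in the question
    return [xd, (k * x) / m + g]

t = np.arange(0.0, 10.0, 0.1)
state = odeint(mass_spring, [0.0, 0.0], t)

print(state.shape)   # (len(t), 2): one row per time value, one column per state variable
x = state[:, 0]      # position at every time in t
xd = state[:, 1]     # velocity at every time in t
print(state[0])      # the first row is the initial condition
```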
I have set the two parameters as 0.0 and 0.0 [...]
What you are calling the "parameters" are usually called the initial conditions. They are the values of x(t) and x'(t) at t=0.
But the return value is not just 0.0, it varies.
That is because (0, 0) is not an equilibrium of the system. Look at the equation
xdd = ((k*x)/m) + g
When x is 0, you get xdd = g, so xdd is initially positive. That is, there is a nonzero force (gravity) acting on the mass, so it accelerates.
The equilibrium state is [-g*m/k, 0].
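The equilibrium can be verified with a few lines of arithmetic; this sketch only re-evaluates the acceleration formula from the question at x = 0 and at x = -g*m/k:

```python
k, m, g = -5.5, 1.5, 9.8   # constants from the question

def acceleration(x):
    # xdd as computed in MassSpring
    return (k * x) / m + g

x_eq = -g * m / k            # position where the spring force balances gravity

print(x_eq)                  # roughly 2.67
print(acceleration(0.0))     # equals g: the mass starts accelerating from x = 0
print(acceleration(x_eq))    # zero up to floating-point rounding
```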
Also, if I have one differential equation for which I need to send many parameters, then I cannot send M parameters in the odeint call as a list or tuple and return only the solution of the ODE as a single array. It expects that the number of parameters sent should be equal to the number of parameters returned from the function. Why is this?
odeint only solves the system for one set of initial conditions at a time. If you want to generate several solutions (corresponding to different initial conditions), you'll have to call odeint multiple times.
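If "parameters" here means constants such as k, m and g rather than initial conditions, odeint accepts them through its args tuple, and their number is unrelated to the size of the state. A minimal sketch, assuming that reading of the question (the function signature with extra arguments is my rewrite, not the asker's code):

```python
import numpy as np
from scipy.integrate import odeint

def mass_spring(state, t, k, m, g):
    # The extra arguments after t are supplied by odeint from `args`
    x, xd = state
    return [xd, (k * x) / m + g]

t = np.linspace(0.0, 10.0, 101)
params = (-5.5, 1.5, 9.8)    # three constants: k, m, g
state0 = [0.0, 0.0]          # two initial conditions: x(0), x'(0)

state = odeint(mass_spring, state0, t, args=params)
print(state.shape)   # (101, 2): determined by t and state0, not by the number of constants
```

The only size constraint is that the function returns as many derivatives as there are entries in state0.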
I am somewhat new to coding, but I really want to get started with simple ANNs, so I decided to start this project.
This is the error that I'm receiving:
File "mtrand.pyx", line 1192, in mtrand.RandomState.randint (numpy/random/mtrand/mtrand.c:14128)
TypeError: randint() takes at least 1 positional argument (0 given)
# -*- coding: utf-8 -*-
# Created on Sun Sep 18 14:56:44 2016
# author: Jamoonie
##theano practice
import numpy as np
import theano
import theano.tensor as T
from sklearn.datasets import load_digits
print (
train_x = list(
#print train_x.count
train_x = np.array(train_x)
#print train_x
train_y = list(
#print train_y.count
train_y = np.array(train_y)
#print train_y
#q = T.matrix('q') checking how matrix dot products work, and how the row,col of the W0 should be set up
#q = np.zeros([5,10])
#print q
#p = T.matrix('p')
#p = np.zeros([10,5])
nn_input_dim = train_x.shape[1] ## if shape[0] it yields 1797, which is the number of rows
print(nn_input_dim) ## shows 64: shape[1] is the number of columns, i.e. the number of features per sample
nn_hdim0 = 10
nn_output_dim = len(train_y)
#nn_hdim0 = np.transpose(np.zeros(
#print nn_hdim0
epsilon = 0.008
batch_size = 100 ## how much data input per iteration
X = T.matrix('X')
y = T.lvector('y')
## set weight shapeswith random values
#W0 = np.transpose(np.zeros(
W0 = theano.shared(np.random.randn(nn_input_dim,nn_hdim0),name='W0') ##the shape of W0 should be row=input_dim, col=# hidden nodes
b0 = theano.shared(np.zeros(nn_hdim0),name='b0')
W1 = theano.shared(np.random.randn(nn_hdim0,nn_output_dim),name='W1') ## shape of W1 should have row=#hidden nodes, col = output dimension
b1 = theano.shared(np.zeros(nn_output_dim),name='b1')
z0 =
a0 = T.nnet.softmax(z0) ## first hidden layer result
z1 =
a1 = T.nnet.softmax(z1) ## final result or prediction
loss = T.nnet.categorical_crossentropy(a1,y).mean() ## howmuch the prediction differrs from the real result
prediction = T.argmax(a1,axis=1) ## the maximum values of a1, presented in index posn 1
fwd_propagation = theano.function([X],a1) ## forward propagation function dpeneding on the array of X values and final prediction
calc_loss = theano.function([X,y],loss)
predict= theano.function([X],prediction)
accuracy = theano.function([X],T.sum(T.eq(prediction,train_y))) ## T.eq is elementwise. so this does an elementwise sum of prediction and train_y
dW0 = T.grad(loss,W0)
dW1 = T.grad(loss,W1)
gradient_step = theano.function(
[X,y], ##for each set of X,y values
updates=((W1,W1-epsilon*dW1), ##updates W1 by deltaW1(error)*learning rate and subtracting from original W1
def build(iterations = 80000):
W1.set_value(np.random.randn(nn_hdim0,nn_output_dim)/np.sqrt(nn_input_dim)) ## why dividing by the sqrt of nn_input_dim,i'm not sure, but they're meant to be random anyway.
for i in range(0, iterations):
##so we're providing the values now for the weights, biases and input output values
if i%2000==0:
print("loss after iteration %r: %r" % (i, calc_loss(train_x,train_y)))
if i==80000:
print (W0,b0,W1,b1)
As per the documentation, you need to pass at least one argument to randint: the bound of the range to draw integers from. If you want a random integer from 0 up to, but not including, 213, use r = np.random.randint(213); if you want one from a range, say from 213 up to, but not including, 537, use r = np.random.randint(213, 537). Also, you are calling randint(..) without storing the result in a variable or passing it to a function, which is useless. I would suggest going through the basic Theano tutorials to get started.
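A small sketch of the calls described above. The mini-batch line is only an assumption about what the question intended, using its batch_size of 100 and the 1797 rows it mentions:

```python
import numpy as np

r1 = np.random.randint(213)         # one integer from 0 up to (not including) 213
r2 = np.random.randint(213, 537)    # one integer from 213 up to (not including) 537

# Hypothetical use: 100 random row indices into a data set with 1797 rows
batch = np.random.randint(0, 1797, size=100)

try:
    np.random.randint()             # no arguments: raises the TypeError from the question
except TypeError as err:
    print("TypeError:", err)

print(r1, r2, batch.shape)
```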