Scipy.stats / Why can't I get the value for randint?

Hello
Here is my code:
from scipy.stats import randint
param_distributions = {'n_estimators': randint(1, 5),
                       'max_depth': randint(5, 10)}
Printing param_distributions gives as a result:
{'n_estimators': <scipy.stats._distn_infrastructure.rv_frozen object at 0x7f05f1b05210>,
 'max_depth': <scipy.stats._distn_infrastructure.rv_frozen object at 0x7f05f1b053d0>}
Why can't I get a value for this?

randint(low, high) returns a frozen distribution object, not a number. To draw a value from it, call its rvs() method:
from scipy.stats import randint
param_distributions = {'n_estimators': randint(1, 5).rvs(),
                       'max_depth': randint(5, 10).rvs()}
>>> param_distributions
{'n_estimators': 3, 'max_depth': 9}
The docs list all methods available.
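rvs() also accepts a size argument if you want several samples at once; a quick sketch (the variable name here is just for illustration):
from scipy.stats import randint

# draw five samples from the discrete uniform distribution over {1, 2, 3, 4}
samples = randint(1, 5).rvs(size=5)
print(samples)  # e.g. [3 1 4 2 2]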

Related

predicting time series: my python code prints out a (very long) list rather than a (small) array

I am learning neural network modeling and its uses in time series prediction.
First, thank you for reading this post and for your help :)
On this page there are various NN models (LSTM, CNN etc.) for predicting "traffic volume":
https://michael-fuchs-python.netlify.app/2020/11/01/time-series-analysis-neural-networks-for-forecasting-univariate-variables/#train-validation-split
I got inspired and decided to use/shorten/adapt the code in there for a problem of my own: predicting the bitcoin price.
I have daily bitcoin prices starting 1.1.2017, 2024 daily prices in total.
I use the first 85% of the data for training and the rest for validation (except the last 10 observations, which I would like to use as test data to see how good my model is).
I would like to use a feedforward model.
My goal is merely to have code that runs.
So far I have managed to get most of my code to run. However, the test forecast results come out in a strange format: they should simply be an array of 10 numbers (i.e. the predicted prices for the 10 days at the end of my data), but what gets printed is a long list of numbers instead. I need help figuring out what changes to make to the code so it works as intended.
Thank you for helping me :)
The code is pasted below:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import preprocessing #import MinMaxScaler
from sklearn import metrics #import mean_squared_error
import seaborn as sns
sns.set()
import tensorflow as tf
from tensorflow import keras
from keras.layers import Input, Dense, Flatten
from keras.optimizers import Adam
from keras.models import Sequential
from keras.callbacks import EarlyStopping
tf.__version__
df = pd.read_csv('/content/BTC-USD.csv')
def mean_absolute_percentage_error_func(y_true, y_pred):
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100
def timeseries_evaluation_metrics_func(y_true, y_pred):
    print('Evaluation metric results: ')
    print(f'MSE is : {metrics.mean_squared_error(y_true, y_pred)}')
    print(f'MAE is : {metrics.mean_absolute_error(y_true, y_pred)}')
    print(f'RMSE is : {np.sqrt(metrics.mean_squared_error(y_true, y_pred))}')
    print(f'MAPE is : {mean_absolute_percentage_error_func(y_true, y_pred)}')
    print(f'R2 is : {metrics.r2_score(y_true, y_pred)}', end='\n\n')
def univariate_data_prep_func(dataset, start, end, window, horizon):
    X = []
    y = []
    start = start + window
    if end is None:
        end = len(dataset) - horizon
    for i in range(start, end):
        indicesx = range(i - window, i)
        X.append(np.reshape(dataset[indicesx], (window, 1)))
        indicesy = range(i, i + horizon)
        y.append(dataset[indicesy])
    return np.array(X), np.array(y)
# Generating the test set
test_data = df['close'].tail(10)
df = df.drop(df['close'].tail(10).index)
df.shape
# Defining the target variable
uni_data = df['close']
uni_data.index = df['formatted_date']
uni_data.head()
#scaling
from sklearn import preprocessing
uni_data = uni_data.values
scaler_x = preprocessing.MinMaxScaler()
x_scaled = scaler_x.fit_transform(uni_data.reshape(-1, 1))
# Single Step Style (sss) modeling
univar_hist_window_sss = 50
horizon_sss = 1
# 2014 observations in total
# 2014*0.85=1710 should be part of the training (304 validation)
train_split_sss = 1710
x_train_uni_sss, y_train_uni_sss = univariate_data_prep_func(x_scaled, 0, train_split_sss,
                                                             univar_hist_window_sss, horizon_sss)
x_val_uni_sss, y_val_uni_sss = univariate_data_prep_func(x_scaled, train_split_sss, None,
                                                         univar_hist_window_sss, horizon_sss)
print ('Length of first Single Window:')
print (len(x_train_uni_sss[0]))
print()
print ('Target horizon:')
print (y_train_uni_sss[0])
BATCH_SIZE_sss = 32
BUFFER_SIZE_sss = 150
train_univariate_sss = tf.data.Dataset.from_tensor_slices((x_train_uni_sss, y_train_uni_sss))
train_univariate_sss = train_univariate_sss.cache().shuffle(BUFFER_SIZE_sss).batch(BATCH_SIZE_sss).repeat()
validation_univariate_sss = tf.data.Dataset.from_tensor_slices((x_val_uni_sss, y_val_uni_sss))
validation_univariate_sss = validation_univariate_sss.batch(BATCH_SIZE_sss).repeat()
n_steps_per_epoch = 55
n_validation_steps = 10
n_epochs = 100
#FFNN architecture
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(8, input_shape=x_train_uni_sss.shape[-2:]),
    tf.keras.layers.Dense(units=horizon_sss)])
model.compile(loss='mse',
              optimizer='adam')
#fit the model
model_path = '/content/FFNN_model_sss.h5'
keras_callbacks = [tf.keras.callbacks.EarlyStopping(monitor='val_loss',
                                                    min_delta=0, patience=10,
                                                    verbose=1, mode='min'),
                   tf.keras.callbacks.ModelCheckpoint(model_path, monitor='val_loss',
                                                      save_best_only=True,
                                                      mode='min', verbose=0)]
history = model.fit(train_univariate_sss, epochs=n_epochs, steps_per_epoch=n_steps_per_epoch,
                    validation_data=validation_univariate_sss, validation_steps=n_validation_steps,
                    verbose=1, callbacks=keras_callbacks)
#validation
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(loss) + 1)
plt.plot(epochs, loss, 'r', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.show()
# Testing our model
trained_ffnn_model_sss = tf.keras.models.load_model(model_path)
df_temp = df['close']
test_horizon = df_temp.tail(univar_hist_window_sss)
test_history = test_horizon.values
result = []
# Define Forecast length here
window_len = len(test_data)
test_scaled = scaler_x.fit_transform(test_history.reshape(-1, 1))
for i in range(1, window_len+1):
    test_scaled = test_scaled.reshape((1, test_scaled.shape[0], 1))
    # Inserting the model
    predicted_results = trained_ffnn_model_sss.predict(test_scaled)
    print(f'predicted : {predicted_results}')
    result.append(predicted_results[0])
    test_scaled = np.append(test_scaled[:, 1:], [[predicted_results]])
result_inv_trans = scaler_x.inverse_transform(result)
result_inv_trans
I believe the problem has to do with the shapes of the data, but I do not yet know exactly how.
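One plausible cause, offered as an assumption since I have not run this against the linked data: the model has no Flatten layer, so each Dense layer acts on the last axis of the (50, 1) window and model.predict returns a (1, 50, 1) array per step; result then collects 50 values per forecast day, which is why a long list gets printed. A minimal sketch of the idea, reusing the variable names from the code above (the model would have to be retrained with this architecture):
# Sketch only: flatten the (window, 1) input so each prediction is one scalar
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=x_train_uni_sss.shape[-2:]),
    tf.keras.layers.Dense(8),
    tf.keras.layers.Dense(units=horizon_sss)])
model.compile(loss='mse', optimizer='adam')

# Rolling one-step-ahead forecast over the last window_len days
window = scaler_x.transform(test_history.reshape(-1, 1))  # reuse the scaler fitted on the training data
result = []
for _ in range(window_len):
    pred = trained_ffnn_model_sss.predict(window.reshape(1, -1, 1))  # shape (1, 1) once Flatten is used
    result.append(pred[0, 0])
    window = np.append(window[1:], pred).reshape(-1, 1)  # drop the oldest value, append the forecast
result_inv_trans = scaler_x.inverse_transform(np.array(result).reshape(-1, 1))
print(result_inv_trans.ravel())  # the 10 forecast prices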

Updating a Dash Callback using RadioItems

I am fairly new to Python coding, so I apologize in advance for my ignorance. I am trying to create a Dash app that drops outliers using standard deviation. The user selects a standard deviation using RadioItems inputs.
My question is what amendments do I need to make to my code so that the RadioItem value updates max_deviations using a callback?
Import packages, clean the data and define a query
import dash
import plotly.express as px
from dash import Dash, dcc, html, Input, Output, State
import pandas as pd
import numpy as np
app = dash.Dash(__name__)
server = app.server
df=pd.read_csv(r'C:\SVS_GIS\POWER BI\CSV_DATA\QSAS2021.csv', encoding='unicode_escape')
#SET DATE OF VALUATION
df['TIME'] = ((pd.to_datetime(df['Sale Date'], dayfirst=True)
.rsub(pd.to_datetime('01/10/2021', dayfirst=True))
.dt.days
)*-1)
df=df[df['TIME'] >= -365]
df = df.query("(SMA >=1 and SMA <= 3) and (LGA==60)")
Prepare the dataframe for dropping outliers
data = pd.DataFrame(data=df)
x = df.TIME
y = df.CHANGE
mean = np.mean(y)
standard_deviation = np.std(y)
distance_from_mean = abs(y - mean)
App layout
app.layout = html.Div([
    html.Label("Standard Deviation Picker:", style={'fontSize': 25, 'textAlign': 'center'}),
    html.Br(),
    html.Label("1.0 = 68%, 2.0 = 95%, 3.0 = 99.7%", style={'fontSize': 15,
                                                            'textAlign': 'center'}),
    html.Div(id="radio_items"),
    dcc.RadioItems(
        options=[{'label': i, 'value': i} for i in [1.0, 2.0, 3.0]],
        value=2.0
    ),
    html.Div([
        dcc.Graph(id="the_graph")
    ])
])
Callback
@app.callback(
    Output("the_graph", "figure"),
    Input("radio_items", 'value')
)
def update_graph(max_deviations):
    not_outlier = distance_from_mean < max_deviations * standard_deviation
    no_outliers = y[not_outlier]
    trim_outliers = pd.DataFrame(data=no_outliers)
    dff = pd.merge(trim_outliers, df, left_index=True, right_index=True)
    return (dff)
    fig = px.scatter(dff, x='TIME', y='CHANGE_y',
                     color='SMA',
                     trendline='ols',
                     size='PV',
                     height=500,
                     width=800,
                     hover_name='SMA',
                     )
    return dcc.Graph(id='the_graph', figure=fig)
if __name__ == '__main__':
    app.run_server(debug=False)
Your dcc.RadioItems doesn't have an id prop. Add that, and make sure it matches the ID given in the callback, and you should be good.
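A minimal sketch of that piece of the layout, keeping the "radio_items" id the callback already uses:
dcc.RadioItems(
    id="radio_items",  # must match Input("radio_items", "value") in the callback
    options=[{'label': i, 'value': i} for i in [1.0, 2.0, 3.0]],
    value=2.0
),
Note that the layout above also contains html.Div(id="radio_items"); once the RadioItems takes that id, the empty Div needs a different id (or can be removed), otherwise Dash will complain about a duplicate component id.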

I am trying to write the output of a Python dataframe to a CSV file.

I am getting an error on line 35, in the spamwriter.writerow() call. The error states that wRR should be bytes instead of a string. Please check the code given below:
import csv
import sys
import numpy as np
from numpy import genfromtxt
from numpy.linalg import inv
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d, Axes3D
Lambda = float(sys.argv[1])
Sigma2 = float(sys.argv[2])
X_train = genfromtxt(sys.argv[3], delimiter=',')
y_train = genfromtxt(sys.argv[4], delimiter=',')
X_test = genfromtxt(sys.argv[5], delimiter=',')
# Get the number of columns -> dimension of the input vector -> size of identity_matrix
N = X_train.shape[0]
d = X_train.shape[1]
N_test = X_test.shape[0]
######################
## PART 1 ##
######################
identity_matrix = np.identity(d)
LambdaDotIdentityMatrix = np.multiply(identity_matrix,Lambda)
XTransposeX = np.transpose(X_train).dot(X_train)
Inverse = inv(LambdaDotIdentityMatrix+XTransposeX)
XtransposeY = np.transpose(X_train).dot(y_train)
wRR = Inverse.dot(XtransposeY)
nameOfThePart1File = "wRR_"+str(int(Lambda))+".csv"
with open(nameOfThePart1File, 'wb') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter='\n', quotechar='|', quoting=csv.QUOTE_MINIMAL)
    spamwriter.writerow(np.transpose(wRR))
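For what it's worth, the usual cause of that complaint under Python 3 is the 'wb' mode: the csv module expects a file opened in text mode. A minimal sketch of the likely fix, assuming Python 3 and keeping the same writer settings as above:
# Open in text mode with newline='' so csv.writer receives str, not bytes
with open(nameOfThePart1File, 'w', newline='') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter='\n', quotechar='|', quoting=csv.QUOTE_MINIMAL)
    spamwriter.writerow(np.transpose(wRR))
Alternatively, np.savetxt(nameOfThePart1File, wRR) writes one value per line without going through the csv module.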

Why does Pybrain predict always the same value ?

I'm trying to use PyBrain for prediction, but my code almost always gives the same prediction on a test set. Could anyone explain why?
Thanks !
## ----------------------- Data ---------------------------- ##
import pandas as pd
bdata = pd.read_csv(r'C:\Users\philippe.colo\Projects\regret\data\MT_161.csv', delimiter=';', na_values=0, nrows=96*3,
                    index_col=0, parse_dates=True, infer_datetime_format=True)
# 0: weekday, 1: month, 2: time, 3: monthday
X = []
for k, v in bdata.iterrows():
    dow = k.dayofweek
    day = k.day
    mth = k.month
    sec = k.hour * 3600 + k.minute * 60
    X.append([dow, day, mth, sec])
Y = bdata.values
from pybrain.datasets import SupervisedDataSet
DS = SupervisedDataSet(4, 1)
for i in range(0, 96*2):
    DS.addSample((X[i][0], X[i][1], X[i][2], X[i][3],), (float(Y[i]),))
## ----------------------- ANN ---------------------------- ##
from pybrain.structure import FeedForwardNetwork
n = FeedForwardNetwork()
from pybrain.structure import LinearLayer, SigmoidLayer
from pybrain.structure import FullConnection
n.addInputModule(LinearLayer(4, name='in'))
n.addModule(SigmoidLayer(3, name='hidden'))
n.addOutputModule(LinearLayer(1, name='out'))
n.addConnection(FullConnection(n['in'], n['hidden'], name='c1'))
n.addConnection(FullConnection(n['hidden'], n['out'], name='c2'))
n.sortModules() #initialisation
## ----------------------- Trainer ---------------------------- ##
from pybrain.supervised.trainers import BackpropTrainer
tstdata, trndata = DS.splitWithProportion(0.25)
#print len(tstdata)
#print len(trndata)
trainer = BackpropTrainer(module=n, dataset=DS)
#print trainer.trainUntilConvergence()
trainer.trainOnDataset(trndata, 100)
print n.activate((2, 1, 3, 0))
print n.activate((2, 1, 3, 90))
The first part of my code just builds the data set. Then comes the artificial neural network, and finally the trainer. I suspect the trainer part is badly coded.
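One thing worth checking, purely as an assumption since I cannot run the data: the targets in Y (and inputs such as seconds of the day, up to 86400) are fed in unscaled, and backprop through a small sigmoid hidden layer with targets on that scale often settles on a near-constant output. A rough sketch of rescaling the targets into [0, 1] before filling the data set, and training on the training split rather than the full DS (reusing the names from the code above):
import numpy as np

# Hypothetical rescaling of the raw targets into [0, 1]
Y = np.asarray(Y, dtype=float).ravel()
y_min, y_max = Y.min(), Y.max()
Y_scaled = (Y - y_min) / (y_max - y_min)

DS = SupervisedDataSet(4, 1)
for i in range(0, 96 * 2):
    DS.addSample(tuple(X[i]), (Y_scaled[i],))

tstdata, trndata = DS.splitWithProportion(0.25)
trainer = BackpropTrainer(module=n, dataset=trndata)  # train on the training split, not the full DS
trainer.trainOnDataset(trndata, 100)

# Map a prediction back to the original scale:
# price = n.activate((2, 1, 3, 0))[0] * (y_max - y_min) + y_min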

Column widths with TabularAdapters?

Using Enthought Canopy's TraitsUI, I'm using TabularAdapters to display some Arrays, but they always produce evenly proportioned column widths. I'd like to make some columns narrower than others, but haven't found a simple way to do it yet. Does anyone have any suggestions?
One way to control the widths of the columns is to override the get_width() method of the TabularAdapter. For example:
import numpy as np
from traits.api import HasTraits, Array
from traitsui.api import View, Item, TabularEditor
from traitsui.tabular_adapter import TabularAdapter
test_dtype = np.dtype([('x', 'int'),
                       ('y', 'int'),
                       ('r', 'float')])

class TestArrayAdapter(TabularAdapter):
    columns = [(name, idx) for idx, name in enumerate(test_dtype.names)]
    even_bg_color = 0xF0F4FF

    def get_width(self, object, name, col):
        widths = {0: 50, 1: 50, 2: 150}
        return widths[col]

class Test(HasTraits):
    array1 = Array(dtype=test_dtype)
    view = \
        View(
            Item(name='array1', show_label=False,
                 editor=TabularEditor(adapter=TestArrayAdapter())),
            resizable=True,
        )

a1 = np.array([(10, 20, 1.5), (15, 31, 2.4), (14, 11, 1.9), (21, 13, 2.5)],
              dtype=test_dtype)
test = Test(array1=a1)
test.configure_traits()