Changing Date Format X-Axis Matplotlib - date

I have a plot (image not included here) whose x-axis dates I need to show in 'YYYY-mm' format.
The code looks like this:
import datetime
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.dates import DateFormatter

def mean_absolute_percentage_error(y_true, y_pred):
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100
mape = mean_absolute_percentage_error(testarray.monthly_flow, testarray.predicted).round(2) # PRINT MAPE
print(mape)
fpm = (1+mape/100) #'fitted plus mape -- fpm
fmm = (1-mape/100) #'fitted minus mape -- fmm
years = mdates.YearLocator() # every year
months = mdates.MonthLocator() # every month
plus_mape = fitted_series2.multiply(other=fpm)
minus_mape= fitted_series2.multiply(other=fmm)
# Plot WITH MAPE +/-
fig,ax = plt.subplots()
ax.plot(df2.monthly_flow[-24:])
ax.plot(fitted_series2[:12], color='darkgreen')
ax.plot(plus_mape[:12], color='black',linestyle='dotted')
ax.plot(minus_mape[:12], color='black',linestyle='dotted')
ax.fill_between(lower_series.index[:12],
                lower_series[:12],
                upper_series[:12],
                color='k',
                alpha=.15)
plt.title("SARIMAX Forecast of Monthly Col River Flow")
plt.show()
date_form = DateFormatter('%YYYY-%mm')
ax.xaxis.set_major_formatter(date_form)
plt.show()
Despite setting the major formatter with the date format specified, nothing changes, as you can see in the plot above. I'm not sure what else to try. Thank you for any suggestions.
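A possible explanation, offered as a sketch rather than a confirmed diagnosis: strftime-style format strings use the codes %Y and %m, so '%YYYY-%mm' would print stray letters after the year and month, and the formatter in the code above is only attached after the first plt.show(), by which point the figure has usually already been drawn. The example below uses made-up monthly data in place of df2.monthly_flow (the data, the variable names and the 3-month tick spacing are all assumptions) and sets the locator and formatter before the figure is shown.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this line in a notebook
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Hypothetical stand-in for the monthly series in the question: 24 month starts
dates = [dt.datetime(2020 + m // 12, m % 12 + 1, 1) for m in range(24)]
values = [100 + 10 * (m % 12) for m in range(24)]

fig, ax = plt.subplots()
ax.plot(dates, values)

# strftime codes: %Y is the 4-digit year, %m the zero-padded month
formatter = mdates.DateFormatter("%Y-%m")
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=3))  # one tick every 3 months
ax.xaxis.set_major_formatter(formatter)
fig.autofmt_xdate()  # rotate the labels so they do not overlap

ax.set_title("SARIMAX Forecast of Monthly Col River Flow")
# plt.show() would go here, after the formatter has been set
```

If the x values are strings or a pandas PeriodIndex rather than datetimes, the formatter may still appear to do nothing; converting the index to datetimes first is one thing to check.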

Related

predicting time series: my python code prints out a (very long) list rather than a (small) array

I am learning neural network modeling and its uses in time series prediction.
First, thank you for reading this post and for your help :)
On this page there are various NN models (LSTM, CNN etc.) for predicting "traffic volume":
https://michael-fuchs-python.netlify.app/2020/11/01/time-series-analysis-neural-networks-for-forecasting-univariate-variables/#train-validation-split
I got inspired and decided to use/shorten/adapt the code in there for a problem of my own: predicting the bitcoin price.
I have daily bitcoin prices starting 1.1.2017, 2024 daily prices in total.
I use the first 85% of the data for training and the rest for validation (except the last 10 observations, which I would like to use as test data to see how good my model is).
I would like to use a feedforward model.
My goal is merely to have code that runs.
So far most of my code runs. However, I get a strange format for my test forecast results: it should simply be an array of 10 numbers (i.e. the predicted prices for the last 10 days of my data). To my surprise, what is printed out is a long list of numbers. I need help finding out what changes I need to make to the code to make it work.
Thank you for helping me :)
The code is pasted below, followed by the error:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import preprocessing #import MinMaxScaler
from sklearn import metrics #import mean_squared_error
import seaborn as sns
sns.set()
import tensorflow as tf
from tensorflow import keras
from keras.layers import Input, Dense, Flatten
from keras.optimizers import Adam
from keras.models import Sequential
from keras.callbacks import EarlyStopping
tf.__version__
df = pd.read_csv('/content/BTC-USD.csv')
def mean_absolute_percentage_error_func(y_true, y_pred):
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

def timeseries_evaluation_metrics_func(y_true, y_pred):
    print('Evaluation metric results: ')
    print(f'MSE is : {metrics.mean_squared_error(y_true, y_pred)}')
    print(f'MAE is : {metrics.mean_absolute_error(y_true, y_pred)}')
    print(f'RMSE is : {np.sqrt(metrics.mean_squared_error(y_true, y_pred))}')
    print(f'MAPE is : {mean_absolute_percentage_error_func(y_true, y_pred)}')
    print(f'R2 is : {metrics.r2_score(y_true, y_pred)}',end='\n\n')

def univariate_data_prep_func(dataset, start, end, window, horizon):
    X = []
    y = []
    start = start + window
    if end is None:
        end = len(dataset) - horizon
    for i in range(start, end):
        indicesx = range(i-window, i)
        X.append(np.reshape(dataset[indicesx], (window, 1)))
        indicesy = range(i,i+horizon)
        y.append(dataset[indicesy])
    return np.array(X), np.array(y)
# Generating the test set
test_data = df['close'].tail(10)
df = df.drop(df['close'].tail(10).index)
df.shape
# Defining the target variable
uni_data = df['close']
uni_data.index = df['formatted_date']
uni_data.head()
#scaling
from sklearn import preprocessing
uni_data = uni_data.values
scaler_x = preprocessing.MinMaxScaler()
x_scaled = scaler_x.fit_transform(uni_data.reshape(-1, 1))
# Single Step Style (sss) modeling
univar_hist_window_sss = 50
horizon_sss = 1
# 2014 observations in total
# 2014*0.85=1710 should be part of the training (304 validation)
train_split_sss = 1710
x_train_uni_sss, y_train_uni_sss = univariate_data_prep_func(x_scaled, 0, train_split_sss,
                                                             univar_hist_window_sss, horizon_sss)
x_val_uni_sss, y_val_uni_sss = univariate_data_prep_func(x_scaled, train_split_sss, None,
                                                         univar_hist_window_sss, horizon_sss)
print ('Length of first Single Window:')
print (len(x_train_uni_sss[0]))
print()
print ('Target horizon:')
print (y_train_uni_sss[0])
BATCH_SIZE_sss = 32
BUFFER_SIZE_sss = 150
train_univariate_sss = tf.data.Dataset.from_tensor_slices((x_train_uni_sss, y_train_uni_sss))
train_univariate_sss = train_univariate_sss.cache().shuffle(BUFFER_SIZE_sss).batch(BATCH_SIZE_sss).repeat()
validation_univariate_sss = tf.data.Dataset.from_tensor_slices((x_val_uni_sss, y_val_uni_sss))
validation_univariate_sss = validation_univariate_sss.batch(BATCH_SIZE_sss).repeat()
n_steps_per_epoch = 55
n_validation_steps = 10
n_epochs = 100
#FFNN architecture
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(8, input_shape=x_train_uni_sss.shape[-2:]),
    tf.keras.layers.Dense(units=horizon_sss)])
model.compile(loss='mse',
              optimizer='adam')
#fit the model
model_path = '/content/FFNN_model_sss.h5'
keras_callbacks = [tf.keras.callbacks.EarlyStopping(monitor='val_loss',
                                                    min_delta=0, patience=10,
                                                    verbose=1, mode='min'),
                   tf.keras.callbacks.ModelCheckpoint(model_path, monitor='val_loss',
                                                      save_best_only=True,
                                                      mode='min', verbose=0)]
history = model.fit(train_univariate_sss, epochs=n_epochs, steps_per_epoch=n_steps_per_epoch,
                    validation_data=validation_univariate_sss, validation_steps=n_validation_steps, verbose=1,
                    callbacks=keras_callbacks)
#validation
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(loss) + 1)
plt.plot(epochs, loss, 'r', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.show()
# Testing our model
trained_ffnn_model_sss = tf.keras.models.load_model(model_path)
df_temp = df['close']
test_horizon = df_temp.tail(univar_hist_window_sss)
test_history = test_horizon.values
result = []
# Define Forecast length here
window_len = len(test_data)
test_scaled = scaler_x.fit_transform(test_history.reshape(-1, 1))
for i in range(1, window_len+1):
    test_scaled = test_scaled.reshape((1, test_scaled.shape[0], 1))
    # Inserting the model
    predicted_results = trained_ffnn_model_sss.predict(test_scaled)
    print(f'predicted : {predicted_results}')
    result.append(predicted_results[0])
    test_scaled = np.append(test_scaled[:,1:],[[predicted_results]])
result_inv_trans = scaler_x.inverse_transform(result)
result_inv_trans
I believe the problem might have to do with the shapes of data. How exactly I do not yet know.
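One plausible cause, stated as an assumption since the traceback is not available here: a Keras Dense layer only transforms the last axis of its input, so with an input of shape (50, 1) and no Flatten layer between the two Dense layers, each prediction has shape (1, 50, 1) instead of (1, 1). The NumPy sketch below imitates that behaviour with random, bias-free weights (the dense helper is hypothetical and is not the Keras implementation) to show how the shapes differ with and without flattening.

```python
import numpy as np

rng = np.random.default_rng(0)
window = 50
x = rng.random((1, window, 1))  # one sample shaped like test_scaled in the question


def dense(inputs, units):
    """Hypothetical stand-in for a bias-free Dense layer: only the last axis is transformed."""
    kernel = rng.random((inputs.shape[-1], units))
    return inputs @ kernel


# Architecture as posted: Dense(8) followed directly by Dense(1)
hidden = dense(x, 8)                 # shape (1, 50, 8)
out_no_flatten = dense(hidden, 1)    # shape (1, 50, 1): 50 numbers per prediction

# Same layers with a flatten step in between
flat = hidden.reshape(hidden.shape[0], -1)  # shape (1, 400)
out_flatten = dense(flat, 1)                # shape (1, 1): one number per prediction

print(out_no_flatten.shape, out_flatten.shape)
```

If this is indeed the cause, adding tf.keras.layers.Flatten() between the two Dense layers and retraining should give one value per step. Separately, calling scaler_x.fit_transform on the test history refits the scaler; scaler_x.transform would reuse the scaling learned on the training data.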

boto3 - losing date format

I'm trying to read a parquet file using boto3. The original file has dates with the following format:
2016-12-07 23:00:00.000
And they are stored as timestamps.
My code in SageMaker is:
import boto3
import pandas as pd

boto_s3 = boto3.client('s3')
r = boto_s3.select_object_content(
    Bucket='bucket_name',
    Key='path/file.gz.parquet',
    ExpressionType='SQL',
    Expression="select fecha_instalacion,pais from s3object s ",
    InputSerialization = {'Parquet': {}},
    OutputSerialization = {'CSV': {}},
)
rl0 = list(r['Payload'])[0]
from io import StringIO
string_csv = rl0['Records']['Payload'].decode('ISO-8859-1')
csv = StringIO(string_csv)
pd.read_csv(csv, names=['fecha_instalacion', 'pais'])
But instead of the date I get:
fecha_instalacion pais
45352962065516692798029824 ESPAÃA
I looked for dates with only one day in between, and only the first 6 digits are the same. As an example:
45337153205849123712294912--> 2016-12-09 23:00:00.000
45337116312360976293191680--> 2016-12-07 23:00:00.000
I need to get the correctly formatted date and avoid the special characters.
Thanks.
The problem is the format: that Parquet file uses Int96 numbers to represent timestamps.
Here is a function to convert an Int96 timestamp to a Python datetime:
import datetime

def dateFromInt96Timestamp(int96Timestamp):
    # Upper 32 bits: Julian day number; lower 64 bits: nanoseconds within that day
    julianCalendarDays = int96Timestamp >> 8*8
    time = (int96Timestamp & 0xFFFFFFFFFFFFFFFF) // 1_000  # nanoseconds -> microseconds
    linuxEpoch = 2_440_588  # Julian day number of 1970-01-01
    return datetime.datetime(1970, 1, 1) + datetime.timedelta(days=julianCalendarDays - linuxEpoch, microseconds=time)
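As a check, here is a self-contained sketch that applies the same bit arithmetic to the two example values from the question. The function name and constant are my own, and it assumes the values reach Python as exact integers: a 96-bit number does not fit in a float64, so reading the column as text (for example with dtype=str in pd.read_csv) and calling int() on it is safer than letting it become a float. The last line also suggests, as an assumption about this particular file, that decoding the payload as UTF-8 instead of ISO-8859-1 would restore the Ñ.

```python
import datetime

JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588  # Julian day number of 1970-01-01


def date_from_int96(value: int) -> datetime.datetime:
    """Split an Int96 timestamp into Julian day (high 32 bits) and nanoseconds of day (low 64 bits)."""
    julian_day = value >> 64
    nanos_of_day = value & 0xFFFFFFFFFFFFFFFF
    return datetime.datetime(1970, 1, 1) + datetime.timedelta(
        days=julian_day - JULIAN_DAY_OF_UNIX_EPOCH,
        microseconds=nanos_of_day // 1_000,  # integer division keeps the arithmetic exact
    )


# The two sample values from the question, kept as strings until converted with int()
samples = ["45337116312360976293191680", "45337153205849123712294912"]
converted = [date_from_int96(int(s)) for s in samples]
print(converted)

# The UTF-8 bytes for "Ñ" are C3 91; decoding them as UTF-8 gives the expected text
country = b"ESPA\xc3\x91A".decode("utf-8")
print(country)
```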

How to calculate hour to day in NetCDf file using scala

Is there a method to convert the unit from hours to days in this dataset?
double time(time) ;
time:units = "hours since 1800-01-01 00:00:0.0" ;
time:long_name = "Time" ;
time:delta_t = "0000-01-00 00:00:00" ;
time:avg_period = "0000-01-00 00:00:00" ;
time:standard_name = "time" ;
time:axis = "T" ;
time:actual_range = 1569072., 1895592. ;
If you can use Python, it's an easy process:
The first step is to convert the numeric times to datetime objects using netCDF4's num2date.
The second step is to compute the number of days between each datetime object and the reference date given in the units of the time variable (i.e. 1800-01-01).
import netCDF4
import datetime
ncfile = netCDF4.Dataset('./precip.mon.mean.nc', 'r')
time = ncfile.variables['time']
# Convert from numeric times to datetime objects
dates = netCDF4.num2date(time[:], time.units)
# Compute number of days since the original date
orig_date = datetime.datetime(1800,1,1)
days_since = [(t - orig_date).days for t in dates]
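Because the units are "hours since 1800-01-01", the conversion itself is plain arithmetic: dividing by 24 gives "days since" the same reference date, and the same division carries over to Scala unchanged. The sketch below needs no NetCDF library; it just uses the two actual_range values from the header above as sample inputs.

```python
import datetime

HOURS_PER_DAY = 24
REFERENCE = datetime.datetime(1800, 1, 1)  # from the "hours since ..." units attribute

# Sample inputs: the two actual_range values from the header
hours_since = [1569072.0, 1895592.0]

# hours since the reference -> days since the same reference
days_since = [h / HOURS_PER_DAY for h in hours_since]

# Optional cross-check: the calendar dates those offsets correspond to
dates = [REFERENCE + datetime.timedelta(hours=h) for h in hours_since]

print(days_since)
print(dates)
```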

Converting Epoch to Date in Matlab

I have an array of Epoch milliseconds (array of numbers) in Matlab. I would like to convert these into UTC date-time format, such as DD-MM-YYYY HH:MM.
Is there a pre-defined Matlab way to do this or will I have to write my own function?
Suppose you start with a vector time_unix (in milliseconds); then:
>> time_unix = 1339116554872; % example time
>> time_reference = datenum('1970', 'yyyy');
>> time_matlab = time_reference + time_unix / 8.64e7;
>> time_matlab_string = datestr(time_matlab, 'yyyymmdd HH:MM:SS.FFF')
time_matlab_string =
20120608 00:49:14.872
Notes:
1) See the definition of matlab's time.
2) 8.64e7 is the number of milliseconds in a day.
3) Matlab does not apply any time-zone shifts, so the result is the same UTC time.
4) Example for backward transformation:
>> matlab_time = now;
>> unix_time = round(8.64e7 * (matlab_time - datenum('1970', 'yyyy')))
unix_time =
1339118367664
To summarize, here are two functions:
function tm = unix2matlab(tu)
tm = datenum('1970', 'yyyy') + tu / 864e5;
end
function tu = matlab2unix(tm)
tu = round(864e5 * (tm - datenum('1970', 'yyyy')));
end
The matlab time here is numeric. You can always convert it to string using datestr()
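Update for nanoseconds:
time_unix_nanos = 1339116554872666666; % exceeds 2^53, so as a double the trailing digits are only approximate
millis = floor(time_unix_nanos / 1e6); % floor rather than round, so the remainder below is never negative
nanos = time_unix_nanos - 1e6 * millis;
time_matlab = unix2matlab(millis);
s = [datestr(time_matlab, 'yyyymmdd HH:MM:SS.FFF'), num2str(nanos)];
s =
20120608 00:49:14.872666666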
I tried the above code, but the results were wrong. I realised the main error is related to the definition of Unix time (epoch time): it is the number of seconds since 1-1-1970, 00h:00, not the number of **milli**seconds (http://en.wikipedia.org/wiki/Unix_time). With this definition, the Unix time should be divided by 8.64e4 (the number of seconds in a day) instead of 8.64e7.
In addition, datenum('1970', 'yyyy') does not seem to give the desired reference time of 1-1-1970, 00h:00.
Here's my improved code:
tMatlab = datenum (1970,1,1,0,0) + tUnix / 86400;
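The two answers differ only in the unit of the input: divide by 8.64e7 if the epoch values are in milliseconds (as in the question) and by 86400 if they are in seconds. To cross-check the arithmetic outside MATLAB, here is a small Python sketch; the function names are my own, and 719529 is the MATLAB datenum of 1970-01-01.

```python
import datetime

MATLAB_DATENUM_OF_EPOCH = 719529  # datenum(1970, 1, 1)
SECONDS_PER_DAY = 86_400


def unix_to_datenum(value, unit="s"):
    """Convert a Unix time in seconds ("s") or milliseconds ("ms") to a MATLAB-style datenum."""
    per_day = SECONDS_PER_DAY * (1000 if unit == "ms" else 1)
    return MATLAB_DATENUM_OF_EPOCH + value / per_day


def unix_ms_to_utc(ms):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    epoch = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)
    return epoch + datetime.timedelta(milliseconds=ms)


example_ms = 1339116554872  # the example time used in the first answer
print(unix_to_datenum(example_ms, unit="ms"))
print(unix_ms_to_utc(example_ms).strftime("%Y%m%d %H:%M:%S.%f")[:-3])
```

Python's date.toordinal() counts days from 0001-01-01, so adding 366 to it gives the same day number as datenum, which is a quick way to confirm the 719529 constant.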
Serg's answer is what I normally use when I'm working in MATLAB. Today I found myself wanting to do the conversion to a MATLAB date as the title says, without the date-string conversion specified in the question body, and to output the date number from the shell.
Here is what I settled on for the rounded date number:
TODAY_MATLAB="$[719529 + $[`date +%s` / 24/60/60]]"
This is really just the bash equivalent of what you would expect: 719529 is the datenum of the epoch (1970-01-01 or datenum(1970,1,1) in MATLAB). I'm also fumbling through ksh lately and it seems this can be done there with:
TODAY_EPOCH=`date +%s`
TODAY_MATLAB=`expr $TODAY_EPOCH / 24 / 60 / 60 + 719529`
As a side exercise, I added the decimal portion back onto the date in bash - I didn't bother in ksh, but it's only arithmetic and goes similarly:
N_DIGITS=7
FORMAT=$(printf "%%d.%%0%dd" $N_DIGITS)
NOW_EP_SEC=`date +%s`
SEC_PER_DAY=$(( 24*60*60 ))
NOW_EP_DAY=$(( $NOW_EP_SEC / $SEC_PER_DAY ))
SEC_TODAY=$(( $NOW_EP_SEC - $NOW_EP_DAY*$SEC_PER_DAY ))
TODAY_MATLAB="$(( NOW_EP_DAY+719529 ))"
FRACTION_MATLAB="$( printf '%07d' $(( ($SEC_TODAY*10**$N_DIGITS)/SEC_PER_DAY )) )"
MATLAB_DATENUM=$( printf $FORMAT $TODAY_MATLAB $FRACTION_MATLAB )
echo $MATLAB_DATENUM

Matlab: Converting Timestamps to Readable Format given the Reference Date-Time

I have a text file that contains timestamps from a camera that captures 50 frames per second. The data are as follows:
1 20931160389
2 20931180407
3 20931200603
4 20931220273
5 20931240360
.
.
50 20932139319
... and so on.
It gives also the starting time of capturing like
Date: **02.03.2012 17:57:01**
The timestamps are in microseconds, not milliseconds. MATLAB can only display down to milliseconds, but that is OK for me.
Now I need the human-readable form of these timestamps for each row, like
1 20931160389 02.03.2012 17:57:01.045 % just an example
2 20931180407 02.03.2012 17:57:01.066
3 20931200603 02.03.2012 17:57:01.083
4 20931220273 02.03.2012 17:57:01.105
5 20931240360 02.03.2012 17:57:01.124
and so on
I tried this:
% Reference data
clc; format longg
refTime = [2012,03,02,17,57,01];
refNum = datenum(refTime);
refStr = datestr(refNum,'yyyy-mm-dd HH:MM:SS.FFF');
% Processing data
dn = 24*60*60*1000*1000; % Microseconds! I have changed this equation to many options but nothing was helpful
for i = 1 : size(Data,1)
    gzTm = double(Data{i,2}); %timestamps are uint64
    gzTm2 = gzTm / dn;
    gzTm2 = refNum + gzTm2;
    gzNum = datenum(gzTm2);
    gzStr = datestr(gzNum,'yyyy-mm-dd HH:MM:SS.FFF'); % I can't use 'SS.FFFFFF'
    fprintf('i = %d\t Timestamp = %f\t TimeStr = %s\n', i, gzTm, gzStr);
end;
But I got always strange outputs like
i = 1 Timestamp = 20931160389.000000 TimeStr = **2012-03-08 13:29:28.849**
i = 2 Timestamp = 20931180407.000000 TimeStr = **2012-03-08 13:29:29.330**
i = 3 Timestamp = 20931200603.000000 TimeStr = **2012-03-08 13:29:29.815**
The output time is off from the reference time by hours, and even the day is different.
The time gap between consecutive entries should be nearly 20 milliseconds, since I have 50 frames per second (1000 milliseconds / 50 = 20), and the year, month, day, hour, minute and seconds should match the reference time, because the reference is only a few seconds earlier.
I expect something like:
% just an example
1 20931160389 02.03.2012 **17:57:01.045**
2 20931180407 02.03.2012 **17:57:01.066**
Could someone help me, please? Where is my mistake?
It looks like you can work out the number of microseconds between a record and the first record:
usecs = double(Data{i,2}) - double(Data{1,2});
convert that into seconds:
secsDiff = usecs / 1e6;
then add that to the initial datetime you'd calculated:
matDateTime = refNum + secsDiff / (24*60*60);
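The same three steps can be sketched in Python using the first five timestamps from the question. It assumes, as the steps above do, that the first record coincides with the reference time, so the resulting times illustrate the method rather than the camera's true capture times.

```python
import datetime

# Reference start time given in the question: 02.03.2012 17:57:01
reference = datetime.datetime(2012, 3, 2, 17, 57, 1)

# First five camera timestamps from the question, in microseconds
timestamps_us = [20931160389, 20931180407, 20931200603, 20931220273, 20931240360]

# Offset of each record from the first record, added to the reference time
first = timestamps_us[0]
frame_times = [
    reference + datetime.timedelta(microseconds=t - first) for t in timestamps_us
]

# Format as dd.mm.yyyy HH:MM:SS.mmm (the slice truncates microseconds to milliseconds)
formatted = [t.strftime("%d.%m.%Y %H:%M:%S.%f")[:-3] for t in frame_times]
for row, (raw, text) in enumerate(zip(timestamps_us, formatted), start=1):
    print(row, raw, text)
```

Consecutive rows come out roughly 20 ms apart, which matches the 50 frames per second mentioned in the question.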