Transforming dates in tensorflow or tensorflow extended - date

I am working with Tensorflow Extended, preprocessing data and among this data are date values (e.g. values of the form 16-04-2019). I need to apply some preprocessing to this, like the difference between two dates and extracting the day, month and year from it.
For example, I could need to have the difference in days between 01-04-2019 and 16-04-2019, but this difference could also span days, months or years.
Now, just using Python scripts this is easy to do, but I am wondering if it is also possible to do this with Tensorflow? It's important for my use case to do this within Tensorflow, because the transform needs to be done in the graph format so that I can serve the model with the transformations inside the pipeline.
I am using Tensorflow 1.13.1, Tensorflow Extended and Python 2.7 for this.

Posting from similar issue on tft github.
Here's a way to do it:
import tensorflow_addons as tfa
import tensorflow as tf
from typing import TYPE_CHECKING
#tf.function(experimental_follow_type_hints=True)
def fn_seconds_since_1970(date_time: tf.string, date_format: str = "%Y-%m-%d %H:%M:%S %Z"):
seconds_since_1970 = tfa.text.parse_time(date_time, date_format, output_unit='SECOND')
seconds_since_1970 = tf.cast(seconds_since_1970, dtype=tf.int64)
return seconds_since_1970
string_date_tensor = tf.constant("2022-04-01 11:12:13 UTC")
seconds_since_1970 = fn_seconds_since_1970(string_date_tensor)
seconds_in_hour, hours_in_day = tf.constant(3600, dtype=tf.int64), tf.constant(24, dtype=tf.int64)
hours_since_1970 = seconds_since_1970 / seconds_in_hour
hours_since_1970 = tf.cast(hours_since_1970, tf.int64)
hour_of_day = hours_since_1970 % hours_in_day
days_since_1970 = seconds_since_1970 / (seconds_in_hour * hours_in_day)
days_since_1970 = tf.cast(days_since_1970, tf.int64)
day_of_week = (days_since_1970 + 4) % 7 #Jan 1st 1970 was a Thursday, a 4, Sunday is a 0
print(f"On {string_date_tensor.numpy().decode('utf-8')}, {seconds_since_1970} seconds had elapsed since 1970.")
My two cents on the broader underlying issue, here the question is computing time differences, for which we want to do these computations on tensors. Then the question becomes "What are the units of these tensors?" This is a question of granularity. "The next question is what are the data types involved?" Start with a string likely, end with a numeric. Then the next question becomes is there a "native" tensorflow function that can do this? Enter tensorflow addons!
Just like we are trying to optimize training by doing everything as tensor operations within the graph, similarly we need to optimize "getting to the graph". I have seen the way datetime would work with python functions here, and I would do everything I could do avoid going into python function land as the code becomes so complex and the performance suffers as well. It's a lose-lose in my opinion.
PS - This op is not yet implemented on windows as per this, maybe because it only returns unix timestamps :)

I had a similar problem. The issue because of an if-check with in TFX that doesn't take dates types into account. As far as I've been able to figure out, there are two options:
Preprocess the date column and cast it to an int (e.g. calling toordinal() on each element) field before reading it into TFX
Edit the TFX function that checks types to account for date-like types and cast them to ordinal on the fly.
You can navigate to venv/lib/python3.7/site-packages/tfx/components/example_gen/utils.py and look for the function dict_to_example. You can add a datetime check there like so
def dict_to_example(instance: Dict[Text, Any]) -> tf.train.Example:
"""Converts dict to tf example."""
feature = {}
for key, value in instance.items():
# TODO(jyzhao): support more types.
if isinstance(value, datetime.datetime): # <---- Check here
value = value.toordinal()
if value is None:
feature[key] = tf.train.Feature()
...
value will become an int, and the int will be handled and cast to a Tensorflow type later on in the function.

Related

Testing a trading system on bootstrap samples using Arch library in python

I am trying to test a hypothesis on outperformance of a trading strategy over the buy and hold. I have original data's returns containing 1261 observations as a sample to be used for bootstrap.
I want to know if I have applied it correctly.
def back_test_series(x):
df= pd.DataFrame(x, columns= ['Close'])
return df.Close
from arch.bootstrap import CircularBlockBootstrap
bs = CircularBlockBootstrap(40, sample_return)
results = bs.apply(back_test_series, 2500)
Above, sample_return is the sample containing 2761 returns on actual data. I created 2500 bootstrapped samples containing 2761 observations each.
and then created a cummulative return to get price time series.
time_series = []
for simu in results:
df = pd.DataFrame(simu, columns=["Close"])
df['Close'] = (1+df).cumprod()
time_series.append(df)
and finally ran my backtesting in the price series obatained from bootstrap.
final_results = []
for simulation in enumerate(time_series):
x = Backtesting.scrip_backtest(simulation)
final_results.append(x)
Backtesting.scrip_backtest is my trading strategy which will return stats like buy and hold cagr, strategy cagr, std dev of strategy returns.
My question is can I use bootstrap in this way? Should I use MovingBlockBootstrap or CircularBlockBootstrap?
It it correct to run trading strategy on bootstrapped time series as mentioned above?

FMU 2.0 interaction - requires parallel "container" for parameter values etc?

I work with pyfmi in Jupyter notebooks to run simulations and I like to work interactively and evaluate incremental changes in parameters etc. Long time ago I found it necessary to introduce a dictionary that work as a "container" for parameter and initial values. Now I wonder if here is a way to get rid of this "container" that after all is partly a parallel structure to "model"?
A typical workflow look like this:
create a diagram where results from different simulations below should be shown
model = load_fmu(fmu_model)
parDict['model.x_0'] = 1
parDict['model.a'] = 2
for key in parDict.keys(): model.set(key,parDict[key])
sim_res = model.simulate(10)
plot results...
model = load_fmu(fmu_model)
parDict['model.x_0'] = 3
for key in parDict.keys(): model.set(key,parDict[key])
sim_res = model.simulate(10)
plot results...
There is a function model.reset() that brings the state back to default values at compilation without loading again, but you need to do more than the following
model.reset()
parDict['model.x_0'] = 3
for key in parDict.keys(): model.set(key,parDict[key])
sim_res = model.simulate(10)
plot results...
So,  this does NOT work...
and after all parameters and initial values needs to be brought back and we still need parDict, but we may avoid the load-command though.

Visualizing an AutoDiff MultibodyPlant in PyDrake

I am trying to build a simple multibody plant system in Drake using the basic DrakeVisualizer. However, for my use case, I also want to be able to automatically track the derivatives through the physics simulation, so am using the AutoDiffXd version of system:
timestep = 1e-3
builder = DiagramBuilder_[AutoDiffXd]()
plant = MultibodyPlant(timestep)
scene_graph = SceneGraph_[AutoDiffXd]()
brick_file = FindResourceOrThrow("drake/examples/manipulation_station/models/061_foam_brick.sdf")
parser = Parser(plant)
brick = parser.AddModelFromFile(brick_file, model_name="brick")
plant.Finalize()
plant_ad = plant.ToAutoDiffXd()
plant_ad.RegisterAsSourceForSceneGraph(scene_graph)
scene_graph.AddRenderer("renderer", MakeRenderEngineVtk(RenderEngineVtkParams()))
DrakeVisualizer.AddToBuilder(builder, scene_graph)
builder.AddSystem(plant_ad)
builder.AddSystem(scene_graph)
builder.Connect(plant_ad.get_geometry_poses_output_port(), scene_graph.get_source_pose_port(plant_ad.get_source_id()))
builder.Connect(scene_graph.get_query_output_port(), plant_ad.get_geometry_query_input_port())
diagram = builder.Build()
context = diagram.CreateDefaultContext()
simulator = Simulator_[AutoDiffXd](diagram, context)
simulator.AdvanceTo(2.0)
However, when I run this, I get the following error:
File "/home/craig/Repos/drake-exps/autoDiffExperiment.py", line 102, in auto_phys
DrakeVisualizer.AddToBuilder(builder, scene_graph)
TypeError: AddToBuilder(): incompatible function arguments. The following argument types are supported:
1. (builder: pydrake.systems.framework.DiagramBuilder_[float], scene_graph: drake::geometry::SceneGraph<double>, lcm: pydrake.lcm.DrakeLcmInterface = None, params: pydrake.geometry.DrakeVisualizerParams = <pydrake.geometry.DrakeVisualizerParams object at 0x7ff6274e14b0>) -> pydrake.geometry.DrakeVisualizer
2. (builder: pydrake.systems.framework.DiagramBuilder_[float], query_object_port: pydrake.systems.framework.OutputPort_[float], lcm: pydrake.lcm.DrakeLcmInterface = None, params: pydrake.geometry.DrakeVisualizerParams = <pydrake.geometry.DrakeVisualizerParams object at 0x7ff627736730>) -> pydrake.geometry.DrakeVisualizer
Invoked with: <pydrake.systems.framework.DiagramBuilder_[AutoDiffXd] object at 0x7ff65654f8f0>, <pydrake.geometry.SceneGraph_[AutoDiffXd] object at 0x7ff656562130>
From this error, it appears the DrakeVisualizer class only accepts systems which use float scalars exlusively. So I am stuck --- either I can go back to floats (but lose the autodiff differentiable simulation functionality I was after in the first place), or continue to use autodiffxd systems (but be completely unable to visualize what is going on in my simulation).
Is there a way to get both that I am missing?
Sorry for the pain and inconvenience. Your description and assessment are all spot on. Most of the visualization mechanisms are float only and, in its current state, attempts to visualizing an AutoDiff diagram will fail.
You have a couple of options (neither of which is appealing):
Go with one of the outcomes you've described above (no vis or no derivatives).
Put in a Drake feature request to be able to attach a visualizer to an AutoDiff diagram.
I can come up with some hacky workarounds (that aren't immediately clear would even work). So, if you're desperate for derivatives and visualization, they could be explored. But, ultimately, the feature request and a formal Drake solution would be the best long-term resolution.
=====================================
Big update. As of #14569, the DrakeVisualizer class is now templated on the scalar type (item 2 in the list above). That has two implications:
You can build an AutoDiffXd-valued diagram with a visualizer in it (as in your example), or
You can create a double-valued diagram and scalar convert it (i.e., diagram.ToAutoDiffXd() into an AutoDiffXd-valued diagram.

Why am i getting a difference between matlab's "lla2eci" and "sgp4.propagate"?

I am not experienced in this area but over the past few days I've put together some code in python that tracks (hopefully) the ISS. I've done the math and have that side of things working, but only when I inject the satellite position using matlab's lla2eci. To get a correct answer, I take the latitude and longitude of the satellite's subpoint from live data and convert that to eci using matlab. This method gives me correct look angles (azimuth and elevation) for the ISS, and I've confirmed them with the pyephem method using iss.compute(home) where "home" is my lla.
I'm comparing matlab's lla2eci to what satellite.propagate(...) is getting me and at time = 2019 12 16 8 53 19, i get the following results:
Matlab lla2eci: x,y,z = (3873.9, -902.18, -4969.9)
sgp4 propagate: x,y,z= (-4082.5, 3458.3, -4195.1)
I have to be missing something here! Any help would be greatly appreciated, and I'm glad to answer any questions to clarify.
Looking at the question, seems like you are not taking Altitude into account?
Since your aim is to track the ISS using a python code may I suggest a slightly different approach?
TLE values for space objects are available at: https://www.space-track.org/, so sign-up there.
Then find the position of the satellite in python by using sgp4(https://pypi.org/project/sgp4/) and spacetrack(https://pypi.org/project/spacetrack/) libraries.
An example code would look like this:
from sgp4.earth_gravity import wgs84
from sgp4.io import twoline2rv
from spacetrack import SpaceTrackClient
from datetime import datetime
#generate TLE from database
st = SpaceTrackClient('YOUR_USERNAME', 'YOUR_PASSWORD')
tle = st.tle_latest(norad_cat_id=[<ISS_NORAD_CAT_ID>], ordinal=1, format='tle')
line1 = tle[:69]
line2 = tle[70:-7]
#create satellite object
satellite = twoline2rv(line1, line2, wgs84)
date_time = datetime.utcnow()
#find position
sat_position, sat_velocity = satellite.propagate(date_time.year, date_time.month,...
date_time.day, date_time.hour, date_time.minute, date_time.second)
Use your own username, password and norad ID.
welcome to stackoverflow :)

FITS_rec and selection of data: masking instead of "true" filtering?

Probably a duplicate to Ashley's post (but I can't comment -yet ;) ).
I have the same issue when trying to add a column to a sub-selection/sample of my initial FITS_rec (based on numpy's recarray); all the rows reappear (AND the filling of this new column doesn't seem to be respected...). "hdu_sliced._get_raw_data()" proposed by Vlas Sokolov is a solution that is working very fine for me, but I was wondering:
1) What are "the better ways" suggested by Iguananaut? I certainly need someone to just google it for me; the newbie me is feeling stuck :$ (Staying in a FITS_rec would be required).
2) Is that an expected behaviour? Meaning, are we really wanting to work on a "masked array" which would a copy of our original array? What is worrying me the most is the "collapse" of the values in the new computed column. See below:
# A nice FITS_rec
a1 = np.array(['NGC1001', 'NGC1002', 'NGC1003'])
a2 = np.array([11.1, 12.3, 15.2])
col1 = fits.Column(name='target', format='20A', array=a1)
col2 = fits.Column(name='V_mag', format='E', array=a2)
cols = fits.ColDefs([col1, col2])
hdu = fits.BinTableHDU.from_columns(cols)
ori_rec=hdu.data
ori_rec
`
FITS_rec([('NGC1001', 11.1), ('NGC1002', 12.3), ('NGC1003', 15.2)],
dtype=(numpy.record, [('target', 'S20'), ('V_mag', '
# Sub-selection
bug=ori_rec[ori_rec["V_mag"]>12.]
bug
FITS_rec([('NGC1002', 12.3), ('NGC1003', 15.2)],
dtype=(numpy.record, [('target', 'S20'), ('V_mag', '
So far so good...
# Let's add a new column
col0=bug.columns
col1 =fits.ColDefs([fits.Column(name='new',format='D',array=bug.field("V_mag")+1.)])
newbug = fits.BinTableHDU.from_columns(col0 + col1).data
FITS_rec([('NGC1001', 11.1, 13.30000019), ('NGC1002', 12.3, 16.20000076),
('NGC1003', 15.2, 0. )],
dtype=(numpy.record, [('target', 'S20'), ('V_mag', '
...AND ... the values of the new column for NGC1002 and NGC1003 are correct but in the row of NGC1001 and NGC1002 respectively... :|
Any enlightenment will be welcomed :)
This is a confusing problem, and it stems from the fact that there are many layers of legacy classes and data structures in astropy.io.fits (stemming back from earlier versions of PyFITS). For example, you can see in your example that hdu.data is a FITS_rec object, which is like a Numpy recarray (itself a soft-deprecated legacy class), but it also has a .columns attribute (as you've noted):
>>> bug.columns
ColDefs(
name = 'target'; format = '20A'
name = 'V_mag'; format = 'E'
)
This in turn actually holds references back to the original arrays from which you described the columns. For example:
>>> bug.columns['target'].array
chararray(['NGC1001', 'NGC1002', 'NGC1003'],
dtype='|S20')
You can see here that even though bug is a "slice" of your original table, the arrays referenced through bug.columns are still contain the original, unsliced array data. So when you do something like in your original post
>>> col0 = bug.columns
>>> col1 = fits.ColDefs([fits.Column(name='new',format='D',array=bug.field("V_mag")+1.)])
it's doing its best here to figure out the intent, but col0 here has no idea that bug is a slice of the original table anymore, it only has the original "coldefs" with the full columns to rely on here.
Most of these classes, including FITS_rec, Column, and especially ColDefs almost never need to be used directly anymore. Unfortunately not all of the documentation has been updated to reflect this fact, and there are a lot of older tutorials and example code that show usage of these classes. Nobody with the requisite expertise has been able to take the time to update the docs and clarify this point.
On occasion Column is useful if you already have columnar data with each column in a separate array, and you want to build a table from it and give some specific FITS attributes to the table columns. But I have redesigned much of the API so that you can take native Python data structures like Numpy arrays and save them to FITS files without worrying about the details of how FITS is implemented or annoying things like FITS data format codes in many cases.
This work is slightly incomplete, because it seems if you want to define a FITS table from some columnar arrays, you still need to use the Column class and specify a FITS format at a minimum (but you never need to use ColDefs directly):
>>> hdu = fits.BinTableHDU.from_columns([fits.Column(name='target', format='20A', array=a1), fits.Column(name='V_mag', format='E', array=a2)])
>>> hdu.data
FITS_rec([('NGC1001', 11.1), ('NGC1002', 12.3), ('NGC1003', 15.2)],
dtype=(numpy.record, [('target', 'S20'), ('V_mag', '<f4')]))
However, you can also work with Numpy structured arrays directly, and I usually find that simpler personally, as it allows you to ignore most FITS-isms and just focus on your data, for those cases where it's not important to finely tweak the FITS-specific stuff. For example, to define a structured array for your data, there are several ways to go about that, but you might try:
>>> nrows = 3
>>> data = np.empty(nrows, dtype=[('target', 'S20'), ('V_mag', np.float32)])
>>> data['target'] = a1
>>> data['V_mag'] = a2
>>> data
array([('NGC1001', 11.100000381469727), ('NGC1002', 12.300000190734863),
('NGC1003', 15.199999809265137)],
dtype=[('target', 'S20'), ('V_mag', '<f4')])
and then you can instantiate a BinTableHDU directly from this array:
>>> hdu = fits.BinTableHDU(data)
>>> hdu.data
FITS_rec([('NGC1001', 11.1), ('NGC1002', 12.3), ('NGC1003', 15.2)],
dtype=(numpy.record, [('target', 'S20'), ('V_mag', '<f4')]))
>>> hdu.header
XTENSION= 'BINTABLE' / binary table extension
BITPIX = 8 / array data type
NAXIS = 2 / number of array dimensions
NAXIS1 = 24 / length of dimension 1
NAXIS2 = 3 / length of dimension 2
PCOUNT = 0 / number of group parameters
GCOUNT = 1 / number of groups
TFIELDS = 2 / number of table fields
TTYPE1 = 'target '
TFORM1 = '20A '
TTYPE2 = 'V_mag '
TFORM2 = 'E '
Likewise when it comes to things like masking and slicing and adding new columns, working directly with the native Numpy data structures is best.
Or, as suggested in the answers to other question, use the Astropy Table API and don't mess with low-level FITS stuff at all if you can help it. Because as I discussed, it contains several layers of legacy interfaces that make things confusing (and that long term should probably be cleaned up, but it's hard to do because code that uses them in some way or another are pervasive). The Table API was designed from the ground-up to make table manupulations, including things like masking rows and adding columns, relatively easy. Whereas the old PyFITS APIs never quite worked for many simple cases.
I hope this answer was edifying--I know it's maybe a bit long and confusing. If there is anything specific I can clear up let me know.