Calculating IV60 and IV90 on Interactive Brokers - MATLAB

I am trading options and I need to calculate the historical implied volatility for the last year. I am using Interactive Brokers' TWS. Unfortunately they only calculate V30 (the implied volatility of the stock using options that will expire in 30 days). I need to calculate the implied volatility of the stock using options that will expire in 60 days, and in 90 days.
The problem: calculate the implied volatility of an individual stock over at least a whole year, using options that will expire in 60 days and in 90 days, given that:
TWS does not provide V60 or V90.
TWS does not provide historical pricing data for individual options for more than 3 months.
The attempted solution:
Use the V30 that TWS provides to come up with V60 and V90, given the fact that option prices usually behave like a skew (horizontal skew). However, the problem with this attempted solution is that the skew does not always have a positive slope, so I can't come up with a mathematical formula that always correctly estimates IV60 and IV90, since the skew can have a positive or a negative slope, as in the picture below.
Any ideas?

Your question is either confusing or isn't about programming. This is what IB says:
The IB 30-day volatility is the at-market volatility estimated for a
maturity thirty calendar days forward of the current trading day, and
is based on option prices from two consecutive expiration months.
It makes no sense to me, and I can't even get those ticks to arrive (generic tick type 24). But even if you get them, they don't seem useful. My guess is that it's an average that estimates what the IV would be for an option expiring exactly 30 days in the future. I can't imagine the purpose of this: the data would be impossible to trade on and doesn't represent reality. Imagine an earnings report at 29 or 31 days!
If you'd like the IV about 60 or 90 days in the future, call reqMktData with an option contract that expires around then and an empty generic tick list. You will get tick types 10, 11, 12, and 13, which all carry an IV. That's how you build the IV surface. If you'd like to combine two expiries with a weighted average to estimate 60 days, that's possible too.
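For example, here is a minimal sketch of one common convention, interpolating linearly in total variance between the two expiries that bracket 60 days. The function name and the sample expiries/IVs are made up for illustration:

import math

def interp_iv(target_days, days1, iv1, days2, iv2):
    # Linear interpolation in total variance (sigma^2 * T) between
    # two expiries, with days1 < target_days < days2.
    var1 = iv1 ** 2 * days1 / 365.0
    var2 = iv2 ** 2 * days2 / 365.0
    w = (target_days - days1) / (days2 - days1)
    total_var = var1 + w * (var2 - var1)
    return math.sqrt(total_var * 365.0 / target_days)

# e.g. ATM IVs of 22% at 45 days and 25% at 73 days -> IV60 of about 24%
print(interp_iv(60, 45, 0.22, 73, 0.25))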
As for getting the option IVs themselves, this is Python (the old IbPy API) but should be self-explanatory:
# assumes the IbPy package (ib.ext / ib.opt) and a running TWS session
from ib.ext.Contract import Contract
from ib.opt import ibConnection

tws = ibConnection()  # default host/port/clientId; adjust for your TWS setup
tws.connect()

tickerId = 1
optCont = Contract()
optCont.m_localSymbol = "AAPL 170120C00130000"
optCont.m_exchange = "SMART"
optCont.m_currency = "USD"
optCont.m_secType = "OPT"
tws.reqMktData(tickerId, optCont, "", False)  # empty generic tick list, streaming
Then I get data like
<tickOptionComputation tickerId=1, field=10, impliedVol=0.20363398519176756, delta=0.0186015418248492, optPrice=0.03999999910593033, pvDividend=0.0, gamma=0.007611155331932943, vega=0.012855970569816431, theta=-0.005936076573849303, undPrice=116.735001>
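(To actually receive those messages with IbPy you register a handler on the connection before requesting data; a minimal sketch, reusing the tws connection from the snippet above:)

def watcher(msg):
    # the tickOptionComputation messages (fields 10-13) carry the implied vols
    print(msg)

tws.registerAll(watcher)  # register before calling reqMktData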
If there's something I'm missing about options, you should ask this at https://quant.stackexchange.com/

Related

AnyLogic "triggered by rate" implementation

Does anybody have a reference for how AnyLogic implements its rate per day? Specifically, my agent is at different locations (based on time of day) throughout the day. If there are 10 triggers a day, do they happen randomly for each agent throughout the day, or only at the beginning of a day (when the agent is at home), etc.?
The rate follows a Poisson process: take 1/rate and you get the mean of the inter-arrival times, which follow the exponential distribution.
As this is random, you may not actually get 10 a day - you may get 9 one day and 11 the next. If you want exactly 10 in a day, you need to write your own code to make that happen. That might be something like generating 10 dynamic events at randomly sampled times that all trigger a transition in their action code (the gaps between those events would then not be exponential); see the sketch below.
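A minimal pure-Python illustration of the two schemes (this is not AnyLogic code; the rate is the one from the question):

import random

rate = 10  # intended triggers per day

# (a) what a rate-triggered transition does: exponential gaps between
# events, so the realised count per day fluctuates around 10
gaps = [random.expovariate(rate / 24.0) for _ in range(12)]  # mean gap 2.4 h

# (b) exactly 10 triggers per day: sample 10 uniform times and sort them
trigger_times = sorted(random.uniform(0, 24) for _ in range(10))
print(gaps, trigger_times)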

Cyclic transformation of dates

I would like to use the day of the year in a machine learning model. As the day of the year is not continuous (day 365 of 2019 is followed by day 1 of 2020), I am thinking of performing a cyclic (sine or cosine) transformation, following this link.
However, within a year the transformed variable does not take unique values; for example, there are two values of 0.5 in the same year, see the figures below.
I need to be able to use the day of the year both in model training and in prediction. A value of 0.5 in the sine transformation can correspond to either 31.01.2019 or 31.05.2019, so the value 0.5 can be confusing for the model.
Is it possible to make the model differentiate between the two dates that share the value 0.5 within the same year?
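To illustrate (day numbers are approximate; the sine values nearly coincide while the cosines differ):

import numpy as np

for day in (31, 151):  # ~31 Jan and ~31 May
    angle = 2 * np.pi * day / 365
    print(day, round(np.sin(angle), 3), round(np.cos(angle), 3))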
I am modelling the distribution of a species using the Maxent software. The species data is continuous, with daily records over 20 years. I need the model to capture the signal of the day or the season without using either of them explicitly as a categorical variable.
Thanks
EDIT1
Based on furcifer's comment below: I find the incremental modelling approach not useful for my application. It solves the issue of a consistent difference between subsequent days, e.g. 30.12.2018, 31.12.2018 and 01.01.2019, but it is no different from counting the number of days from a certain reference day (weight = 1). Having much higher values for the same date in 2019 than in 2014 does not make ecological sense; I hope the interannual changes will be captured by the daily environmental conditions used as explanatory variables. The reason I need to use the day in the model is to capture the seasonal trend in the distribution of a migratory species, without the explicit use of month or season as a categorical variable. To predict suitable habitats for today, I need that prediction to depend not only on today's environmental conditions but also on the day of the year.
This is a common problem, but I'm not sure if there is a perfect solution. One thing I would note is that there are two things that you might want to model with your date variable:
Seasonal effects
Season-independent trends and autocorrelation
For seasonal effects, the cyclic transformation is sometimes used for linear models, but I don't see the point for ML models - with enough data, you would expect a nice connection at the edges, so what's the problem? I think the posts you link to are a distraction, or at least they do not properly explain why and when a cyclic transformation is useful. I would just use dYear (the day of the year) to model the seasonal effect.
However, the discontinuity might be a problem for modelling trends / autocorrelation / variation in the time series that is not seasonal or that is shared between years. For that reason, I would also add an absolute date to the model, i.e. use
y = dYear + dAbsolute + otherPredictors
A well-tuned ML model should be able to do the rest, with the usual caveats, and if you have enough data.
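A minimal pandas sketch of constructing those two predictors (the column names are just for illustration):

import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(["2018-12-30", "2018-12-31", "2019-01-01"])})
df["dYear"] = df["date"].dt.dayofyear                      # seasonal position (1-366)
df["dAbsolute"] = (df["date"] - df["date"].min()).dt.days  # days since the first record
print(df)  # dYear wraps at New Year, dAbsolute keeps increasing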
This may or may not be the right choice depending on your needs; there are two approaches that come to my mind.
Incremental modeling
In this case the dates are modeled in a linear fashion, so, say, 12 Dec 2018 < 12 Dec 2019.
For this you just need some form of transformation function that converts dates to numeric values.
As there are many dates that need to be converted to a numeric representation, the first thing to make sure is that the output preserves the same order, as Lukas mentioned. The easiest way to do this is by giving each unit a weight (weight_year > weight_month > weight_day).
def date2num(date_time):
    d, m, y = date_time.split('-')  # expects 'dd-mm-yyyy'
    # each unit's weight must exceed the largest value carried by the
    # smaller units, otherwise the ordering breaks (e.g. 29 Jan vs 1 Feb)
    num = int(y)*10000 + int(m)*100 + int(d)
    return num
Now, it's important to normalize the numeric values.
import numpy as np

# df is assumed to be a DataFrame with a 'date_time' column of 'dd-mm-yyyy' strings
date_features = []
for d in list(df['date_time']):
    date_features.append(date2num(d))
date_features = np.array(date_features)
date_features_normalized = (date_features - np.min(date_features)) / (np.max(date_features) - np.min(date_features))
Using the day, month and year as separate features. Instead of considering the date as a whole, we segregate it. The motivation is that there may be relations between the output and specific days, months, etc.: maybe the output suddenly increases in the summer season (specific months), or on weekends (specific days). A sketch follows below.
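A pandas sketch of the segregation, assuming the same df['date_time'] strings as in the code above:

import pandas as pd

dates = pd.to_datetime(df['date_time'], format='%d-%m-%Y')
df['day'] = dates.dt.day
df['month'] = dates.dt.month
df['year'] = dates.dt.year
df['weekday'] = dates.dt.weekday  # 5 and 6 are the weekend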

Interesting results from LSTM RNN: lagged results for train and validation data

As an introduction to RNN/LSTM (stateless) I'm training a model on sequences of 200 days of previous data (X), including things like daily price change, daily volume change, etc. For the labels (Y) I have the % price change from the current price to the price 4 months later. Basically I want to estimate the market direction, not be 100% accurate. But I'm getting some odd results...
When I then test my model with the training data, I notice the output from the model is a perfect fit when compared to the actual data, it just lags by exactly 4 months:
When I shift the data by 4 months, you can see it's a perfect fit.
I can obviously understand why the training data would be a very close fit, as the model has seen it all during training - but why the 4-month lag?
It does the same thing with the validation data (note the area I highlighted with the red box for future reference):
Time-shifted:
It's not as close-fitting as with the training data, as you'd expect, but it's still too close for my liking - I just don't think it can be this accurate (see the little blip in the red rectangle as an example). I think the model is acting as a naive predictor; I just can't work out how or why it's doing that.
To generate this output from the validation data, I input a sequence of 200 timesteps, but there's nothing in the data sequence that says what the % price change will be in 4 months - it's entirely disconnected, so how is it so accurate? The 4-month lag is obviously another indicator that something's not right here; I can't explain it, but I suspect the two are linked.
I tried to explain the observation based on some general underlying concepts:
If you don't provide a time-lagged X input dataset (lagged by t-k, where k is the number of time steps), you are basically feeding the LSTM today's closing price to predict that same closing price during training. The model will overfit and behave exactly as if the answer were already known (data leakage).
If Y is the predicted percentage change (i.e. X * (1 + Y%) = the price 4 months in the future), then the predicted value is really just the future price discounted by Y%, so the predicted series shows up with a 4-month shift.
Okay, I realised my error; the way I was using the model to generate the forecast line was naive. For every date in the graph above, I was getting an output from the model and then applying the forecasted % change to the actual price for that date - that gives the predicted price 4 months later.
Given that markets usually only move within a margin of about plus or minus 0-3% over a 4-month period, that meant my forecasts were always going to closely mirror the current price, just with a 4-month lag.
So at every date the predicted output was being re-based, and the model line could never deviate far from the actual price; it would be the same series, give or take 0-3%.
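A toy illustration of that re-basing effect (all numbers here are synthetic):

import numpy as np

rng = np.random.default_rng(0)
price = 100 + np.cumsum(rng.normal(0, 1, 300))  # synthetic price series
pred_change = rng.uniform(-0.03, 0.03, 300)     # stand-in for the model's 4-month forecast
forecast = price * (1 + pred_change)            # re-based on the actual price at every date
# plotted at t + 4 months, forecast is just price shifted by the
# horizon, give or take 3% - exactly the lag seen in the graphs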
Really, the graph isn't important, and it doesn't reflect the way I'll use the output anyway, so I'm going to stop trying to get a visual representation and concentrate on finding metrics that lower the validation loss.

MATLAB: average number of customers during a single day

I'm having problems creating a graph of the average number of people inside a 24-hour shopping complex. I have two columns of data in a spreadsheet: the time a customer comes in (intime) and the time he leaves (outtime). The data spans a couple of years and is in datetime format (dd-mm-yyyy hh:mm:ss).
I want to make a graph of the data with time of day as x-axis, and average number of people as y-axis. So the graph would display the average number of people inside during the day.
Problems arise because the place is open 24 hours and the data spans years. Also, a customer's intime and outtime might be on different days.
Example:
intime 2.1.2017 21:50
outtime 3.1.2017 8:31
Any idea how to display the data easily using MATLAB?
I've been on this for multiple hours without any progress...
It seems like you need to decide what defines a customer being in the shop during the day: is 1 minute enough, or is there a minimum length of stay below which you don't want to count it as a visit?
In the former case you shouldn't be concerned with the hours at all; just count it as one entry if the entry and exit are on the same day, or as two different entries if not.
It's been a couple of years since I coded actively in MATLAB and I don't have an IDE handy, but if you add the code you've got so far, I can fix it for you.
I think you need to start by just plotting the raw count of people in the complex over time. Once that is visualized, it may help you decide how you want to define "average people per day" and how to go about calculating it. Does that mean the average at a given time, or the total number of entries per day? E.g. 100 people enter the complex in a day, but on average there are only 5 inside at a given time. Which stat is more important? Maybe you want both.
Here is an example of how to get the raw plot of the number of people inside at any given time. I simulated your in and out times with random numbers.
inTime = cumsum(rand(100,1));          % they show up randomly
outTime = inTime + rand(100,1) + 0.25; % stay for 0.25 to 1.25 hrs
inCount = ones(size(inTime));          % add one for each entry
outCount = -ones(size(outTime));       % subtract one for each exit
allTime = [inTime; outTime];           % stick them together
allCount = [inCount; outCount];
[allTime, idx] = sort(allTime);        % sort the timestamps
allCount = allCount(idx);              % sort the counts by timestamp
allCount = cumsum(allCount);           % running total = occupancy at any time
plot(allTime, allCount);               % plot occupancy over time
Note that the x-values are not uniformly spaced.
If you decide you are more interested in total customers per day, then you could just find the inTimes within a given time range (each day) and probably ignore the outTimes altogether.

Calculate interest on postgresql with trigger/function

I'm currently working on a simple banking application.
I have built a PostgreSQL database with the right tables and functions.
My problem is that I am not sure how to calculate the interest on the accounts. I have a function that tells me the balance at a given point in time.
Say we have a one-month period over which I want to calculate the interest on the account, and the balance develops like this:
February balances:
Feb 1: $1000
Feb 3: $300
Feb 10: $700
Feb 27: $500
Balance at end of month: $500
My initial thought is to make a loop running from the first day of the month to the last, adding the interest earned for each particular day as a row.
The function I want to run at the end of the month should be something like addInterest(startDate, endDate, accountNumber), which should insert one row into the table with the interest earned.
Could someone put me on the right track, or show me some good learning resources for PL/pgSQL?
Edit
I have been reading a bit about cursors. Should I use a cursor to walk through the table?
I find cursors a bit confusing to use; does anyone here have some well-explained examples?
There are various ways of calculating interest in a banking system:
Interest = Balance x Rate x Days / Year
Types of Balances
Periodical Aggregate Balance
Daily Aggregate Balance
Types of Rates
Fixed Rate
Dynamic Rate (according to balance)
Dynamic Rate (according to term)
Dynamic Rate (according to schedule)
Types of Days/Schedules
End of Day Processing (One day)
End of Month Processing (One month)
End of Quarter Processing (Three months)
End of Half Processing (Six months)
End of Year Processing (One year)
Year Formula
A year can consist of 365 or 366 days. Your users might want to override the number of days in a year, so maintain a separate days-per-year property in your application.
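To make the daily aggregate balance variant concrete, here is a minimal sketch (Python rather than PL/pgSQL, but the loop translates directly; the 5% rate, the dates and the function name are made up):

from datetime import date, timedelta

def add_interest(balances, start, end, annual_rate, days_in_year=365):
    # Accrue simple daily interest from start to end (inclusive);
    # balances maps a date to the balance effective from that date.
    total, current = 0.0, None
    day = start
    while day <= end:
        current = balances.get(day, current)  # balance carries forward
        if current is not None:
            total += current * annual_rate / days_in_year
        day += timedelta(days=1)
    return total

# the February balances from the question, at a made-up 5% annual rate
feb = {date(2023, 2, 1): 1000, date(2023, 2, 3): 300,
       date(2023, 2, 10): 700, date(2023, 2, 27): 500}
print(round(add_interest(feb, date(2023, 2, 1), date(2023, 2, 28), 0.05), 2))  # ~2.33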
Conclusion
Interest should be calculated as a routine task. The best approach would be a routine that runs on a schedule, depending on the frequency setup of the individual accounts.
The manual has a section about loops and looping through query results. There are also examples of trigger functions written in PL/pgSQL. The manual is very complete; it's the best source I know of.