I am analysing intra-day volume profiles on stocks. I have built a (rough) piece of code that does 2 things well, but slowly. One stock can have north of 200k trades over a given period and I want to analyse around 200 stocks.
My code looks over 3 months' worth of trade data, binning the data into 10 minute buckets for each day. I do this to make sure a stock trades at least x value per bucket. I then aggregate the intra day buckets to just time buckets to get a sense of the average volume distribution.
Code sample below just shows how I bin data and then aggregate by bin:
% Totals by time bucket
for i = 1:size(VALUE,1)
MyDay = day(datenum(sprintf('%d',VALUE(i,1)),'yyyymmdd'));
MyMonth = month(datenum(sprintf('%d',VALUE(i,1)),'yyyymmdd'));
MyYear = year(datenum(sprintf('%d',VALUE(i,1)),'yyyymmdd'));
StartHour = hour(VALUE(i,2));
StartMinute = minute(VALUE(i,2));
EndHour = hour(VALUE(i,3));
EndMinute = minute(VALUE(i,3));
if StartMinute ~= 50
t = (day(MyTrades(:,1)) == MyDay & month(MyTrades(:,1)) == MyMonth & year(MyTrades(:,1)) == MyYear & hour(MyTrades(:,1)) == StartHour & minute(MyTrades(:,1)) >= StartMinute & minute(MyTrades(:,1)) <= EndMinute);
else
t = (day(MyTrades(:,1)) == MyDay & month(MyTrades(:,1)) == MyMonth & year(MyTrades(:,1)) == MyYear & hour(MyTrades(:,1)) == StartHour & hour(MyTrades(:,1)) < EndHour & minute(MyTrades(:,1)) >= StartMinute);
end
tt = MyTrades(t,:);
MyVALUE(i,1) = sum(tt(:,5));
end
% Aggregate totals
for ii = 1:50
VWAP(ii,1) = datenum(0,0,0,9,0,0)+datenum(0,0,0,0,10,0)*ii-datenum(0,0,0,0,10,0) ;
VWAP(ii,2) = datenum(0,0,0,9,0,0)+datenum(0,0,0,0,10,0)*ii;
StartTime = VWAP(ii,1);
temp = (VALUE(:,2) == StartTime);
temp2 = VALUE(temp,:);
VWAP(ii,3) = sum(temp2(:,4))/100;
end
Is there a more elegant and (more importantly) faster way of calculating these types of "brute force" analyses?
Instead of working with a composite type like a date number, work with plain timestamps and do the datenum conversion only once per value.
You'll have to rewrite your code completely, but working in timestamps is friendlier to both computation and databases.
Here is some help converting from a timestamp to a date number: http://www.mathworks.it/matlabcentral/newsreader/view_thread/119237
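To illustrate the "convert once, then work with integers" idea, here is a minimal sketch in Python rather than MATLAB. It assumes each trade is a hypothetical (unix_seconds, value) pair; the names bucket_totals and profile_by_bucket are made up for this example. Each trade's bucket index is computed once with integer arithmetic, so the whole data set is binned in a single pass instead of being rescanned for every bucket.

```python
from collections import defaultdict

BUCKET_SECONDS = 10 * 60      # 10-minute buckets
DAY_SECONDS = 24 * 60 * 60


def bucket_totals(trades):
    """Sum traded value per (day, bucket) in one pass.

    trades: iterable of (unix_seconds, value) pairs (assumed layout).
    """
    totals = defaultdict(float)
    for ts, value in trades:
        day = ts // DAY_SECONDS                         # whole days since the epoch
        bucket = (ts % DAY_SECONDS) // BUCKET_SECONDS   # 0..143 within the day
        totals[(day, bucket)] += value
    return totals


def profile_by_bucket(totals):
    """Collapse the per-day totals into one total per time-of-day bucket."""
    profile = defaultdict(float)
    for (_day, bucket), value in totals.items():
        profile[bucket] += value
    return profile


# Three trades on day 0 (two in the 09:00 bucket, one in 09:10), one on day 1.
trades = [
    (9 * 3600 + 30, 100.0),
    (9 * 3600 + 599, 50.0),
    (9 * 3600 + 600, 25.0),
    (DAY_SECONDS + 9 * 3600 + 5, 10.0),
]
totals = bucket_totals(trades)
profile = profile_by_bucket(totals)
```

The per-day totals can be used for the "at least x value per bucket" check, and the collapsed profile gives the average distribution after dividing by the number of days.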
I am using a discrete event simulator in AnyLogic. I am having an issue with some code which updates a variable in my simulation. I store both the datetime at which the agent leaves the source block and the datetime at which it enters the sink block. I am trying to record the number of "rule breaks" for all agents. The rule break is defined below (two ways to break):
1) If the agent is received before a certain time (called SDC) and the agent is not completed by 5pm the same day, then the agent has broken the rule
2) If the agent is not completed by the next day at a certain time (called NDC), then the agent has broken the rule
I record a zero or a one for each agent if they break either rule in the variable called RuleBreak. However, in my simulation runs, the variable does not update at all. I hope I am just missing something small. Would appreciate any help! (code below)
Calendar received = Calendar.getInstance();
received.setTime(ReceivedDate);
Calendar completion = Calendar.getInstance();
completion.setTime(Completion);
Calendar SD_at_5 = Calendar.getInstance();
SD_at_5.setTime(ReceivedDate);
SD_at_5.set(Calendar.HOUR_OF_DAY,17);
SD_at_5.set(Calendar.MINUTE, 0);
SD_at_5.set(Calendar.SECOND, 0);
Calendar Tomorrow_at_NDC = Calendar.getInstance();
Tomorrow_at_NDC.setTime(ReceivedDate);
if(Tomorrow_at_NDC.get(Calendar.DAY_OF_WEEK) == 6)
Tomorrow_at_NDC.add(Calendar.DATE, 3);
else
Tomorrow_at_NDC.add(Calendar.DATE, 1);
Tomorrow_at_NDC.add(Calendar.DATE, 1);
Tomorrow_at_NDC.set(Calendar.HOUR_OF_DAY,NDC);
Tomorrow_at_NDC.set(Calendar.MINUTE, 0);
Tomorrow_at_NDC.set(Calendar.SECOND, 0);
int Either_rule_break = 0;
double time_diff_SDC = differenceInCalendarUnits(TimeUnits.SECOND,completion.getTime(),SD_at_5.getTime());
double time_diff_NDC = differenceInCalendarUnits(TimeUnits.SECOND,completion.getTime(),Tomorrow_at_NDC.getTime());
if((received.get(Calendar.HOUR_OF_DAY) < SDC) && (time_diff_SDC <= 0))
Either_rule_break = Either_rule_break + 1;
else
Either_rule_break = Either_rule_break + 0;
if((received.get(Calendar.HOUR_OF_DAY) >= SDC) && (time_diff_NDC <= 0))
Either_rule_break = Either_rule_break + 1;
else
Either_rule_break = Either_rule_break + 0;
if((Either_rule_break >= 1))
RuleBreak = RuleBreak + 1;
else
RuleBreak = RuleBreak + 0;
You haven't really explained where this code is used and what it receives. I assume the code is in a function, called in the sink's on-enter action, where ReceivedDate and Completion are Date instances stored per agent (source exit time and sink entry time, as dates, captured via AnyLogic's date() function).
And looks like your SDC hour-of-day is stored in SDC and your NDC hour-of-day in NDC (with RuleBreak being a variable in Main or similar storing the total number of rule-breaks).
Your calculations look OK except that the Tomorrow_at_NDC Calendar calculation seems wrong: because the if/else has no braces, the second add(Calendar.DATE, 1) always runs, so you add 1 day twice (if not Friday) or 3 days plus 1 day (if Friday; in a Java Calendar, day-of-week 1 is Sunday, so 6 is Friday).
(Your Java is also very 'inefficient' with unnecessary extra local variables and performing logic when you don't need to; e.g., no point doing all the calendar preparation and check for your type 1 rule-break if the receive time is after the SDC hour.)
But are you sure there are any rule-breaks? How have you set up your model to ensure that there are some (to test it)? Is RuleBreak definitely a variable outside of the agents that flow through your DES (i.e., in Main or similar)? And are Completion and ReceivedDate definitely stored per agent? For example, if your function was called checkForRuleBreaks, you would be doing something like the below in your sink on-enter action:
agent.Completion = date(); // Agent received date set earlier in Source action
checkForRuleBreaks(agent.ReceivedDate, agent.Completion);
(In fact, you don't need to store the completion date in the agent at all since that will always be the current sim-date inside your function and so you can just calculate it there.)
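As a rough illustration of the deadline logic with the day arithmetic corrected, here is a minimal sketch in Python rather than AnyLogic Java. The function names and parameters are hypothetical, and it assumes that a Friday receipt rolls over to Monday and that "by the deadline" means completion strictly after the deadline counts as a break.

```python
from datetime import datetime, timedelta


def next_business_day(d):
    """Return the same time of day on the next weekday (assumed rule)."""
    # datetime.weekday(): Monday is 0, Friday is 4, Saturday is 5.
    days_ahead = {4: 3, 5: 2}.get(d.weekday(), 1)
    return d + timedelta(days=days_ahead)


def breaks_rule(received, completion, sdc_hour, ndc_hour):
    """Return True if the agent missed its deadline.

    Received before the SDC hour: due by 5 PM the same day.
    Otherwise: due by the NDC hour on the next business day.
    """
    if received.hour < sdc_hour:
        deadline = received.replace(hour=17, minute=0, second=0, microsecond=0)
    else:
        deadline = next_business_day(received).replace(
            hour=ndc_hour, minute=0, second=0, microsecond=0
        )
    return completion > deadline


# 1 March 2024 is a Friday, so the next-day deadline falls on Monday 4 March.
friday_afternoon = datetime(2024, 3, 1, 15, 0)
late = breaks_rule(friday_afternoon, datetime(2024, 3, 4, 11, 0), 12, 10)
```

Only one deadline is built per agent, which also avoids the unnecessary calendar preparation mentioned above.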
I have written a program that generates prime numbers. It works well, but I want to speed it up, as it takes quite a while to generate all the prime numbers up to 10000.
var list = [2,3]
var limitation = 10000
var flag = true
var tmp = 0
for (var count = 4 ; count <= limitation ; count += 1 ){
    while(flag && tmp <= list.count - 1){
        if (count % list[tmp] == 0){
            flag = false
        }else if ( count % list[tmp] != 0 && tmp != list.count - 1 ){
            tmp += 1
        }else if ( count % list[tmp] != 0 && tmp == list.count - 1 ){
            list.append(count)
        }
    }
    flag = true
    tmp = 0
}
print(list)
Two simple improvements that will make it fast up through 100,000 and maybe 1,000,000.
All primes except 2 are odd
Start the loop at 5 and increment by 2 each time. This isn't going to speed it up a lot because you are finding the counter example on the first try, but it's still a very typical improvement.
Only search through the square root of the value you are testing
The square root is the point at which you halve the factor space: any factor below the square root is paired with a factor above it, so you only have to check one side. There are far fewer numbers below the square root, so check only the values less than or equal to it.
Take 10,000 for example. The square root is 100, so you only have to test divisors up to 100, which in terms of primes is 25 values instead of over 1,000 checks against all primes less than 10,000.
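The two improvements above can be sketched as follows. This is a Python illustration rather than Swift, and the function name primes_up_to is made up for the example; it tests odd candidates only and stops dividing once the prime divisor exceeds the square root of the candidate.

```python
def primes_up_to(limit):
    """Trial division by previously found primes, stopping at sqrt(n)."""
    primes = [2] if limit >= 2 else []
    for n in range(3, limit + 1, 2):      # odd candidates only
        is_prime = True
        for p in primes:
            if p * p > n:                 # no factor can exist past sqrt(n)
                break
            if n % p == 0:                # found a divisor, so n is composite
                is_prime = False
                break
        if is_prime:
            primes.append(n)
    return primes


small = primes_up_to(30)
```

Comparing p * p with n avoids computing a floating-point square root for every candidate.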
Doing it even faster
Try another method altogether, like a sieve. These methods are much faster but have a higher memory overhead.
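For reference, a minimal Sieve of Eratosthenes might look like the sketch below (again in Python, with a made-up function name). It trades memory for speed by keeping one flag per number up to the limit.

```python
def sieve(limit):
    """Return all primes <= limit using the Sieve of Eratosthenes."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)       # one flag per number: the memory cost
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= limit:
        if is_prime[p]:
            # Smaller multiples of p were already crossed out by smaller primes.
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
        p += 1
    return [n for n, flag in enumerate(is_prime) if flag]


first_primes = sieve(30)
```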
In addition to what Nick already explained, you can also easily take advantage of the following property: all primes greater than 3 are congruent to 1 or -1 mod 6.
Because you've already included 2 and 3 in your initial list, you can therefore start with count = 6, test count - 1 and count + 1 and increment by 6 each time.
Below is my first attempt ever at Swift, so pardon the syntax which is probably far from optimal.
var list = [2,3]
var limitation = 10000
var flag = true
var tmp = 0
var max = 0
for(var count = 6 ; count <= limitation ; count += 6) {
    for(var d = -1; d <= 1; d += 2) {
        max = Int(floor(sqrt(Double(count + d))))
        for(flag = true, tmp = 0; flag && list[tmp] <= max; tmp++) {
            if((count + d) % list[tmp] == 0) {
                flag = false
            }
        }
        if(flag) {
            list.append(count + d)
        }
    }
}
print(list)
I've tested the above code on iswift.org/playground with limitation = 10,000, 100,000 and 1,000,000.
I was able to successfully make a schedule in which the output is 1 if the time is between 7 AM and 5 PM and 0 otherwise; the time is taken from my computer's clock. The day of the week (Monday-Sunday) comes from my computer as well, but I can't find a way to output 1 on Monday-Saturday and 0 on Sunday. The code I have is below:
function y = IsBetween5AMand7PM
coder.extrinsic('clock');
time = zeros(1,6);
time = clock;
current = 3600*time(4) + 60*time(5) + time(6); %seconds passed from the beginning of day until now
morning = 3600*7; %seconds passed from the beginning of day until 7AM
evening = 3600*17; %seconds passed from the beginning of day until 5PM
y = current > morning && current < evening;
end
The time check is already correct; what I need is for the day of the week (Monday-Sunday) to produce the required output. Also, this MATLAB code is inside a MATLAB Function block in Simulink.
If you use weekday like this, you can generate a 0/1 value as you specified for today's date:
if (weekday(now) > 1)   % weekday returns 1 for Sunday
    day_of_week_flag = 1;
else
    day_of_week_flag = 0;
end
or if you like, this one-liner does the same thing, but may not be as easy to read if you're not familiar with the syntax:
day_of_week_flag = ( weekday(now) > 1);
You can also use date-strings like this to convert other dates:
day_of_week_flag = ( weekday('01-Mar-2016') > 1 )
Finally, if you have a numeric array of date/time values, like [2016 3 3 12 0 0], you first need to convert to a serial date using datenum, then use weekday:
time = clock;
day_of_week_flag = ( weekday(datenum(time)) > 1);
An alternate way to check without using weekday is the following:
time = clock;
day_of_week = datestr(time, 8);
if strcmp(day_of_week, 'Sun')   % compare the whole string rather than character by character
    day_of_week_flag = 0;
else
    day_of_week_flag = 1;
end
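Putting the day check together with the time-of-day check from the question, the combined logic could look like the sketch below. It is written in Python purely as an illustration, not as Simulink-ready MATLAB, and the function name is hypothetical. Note that the day numbering differs between languages: Python's datetime.weekday() uses Monday = 0 and Sunday = 6, whereas MATLAB's weekday uses Sunday = 1.

```python
from datetime import datetime


def is_scheduled_on(now):
    """Return 1 between 7 AM and 5 PM on Monday to Saturday, else 0."""
    # datetime.weekday(): Monday is 0 ... Sunday is 6.
    is_working_day = now.weekday() != 6
    seconds = 3600 * now.hour + 60 * now.minute + now.second
    in_hours = 7 * 3600 < seconds < 17 * 3600   # same strict bounds as the question
    return int(is_working_day and in_hours)


# 1 March 2016 is a Tuesday; 6 March 2016 is a Sunday.
tuesday_noon = is_scheduled_on(datetime(2016, 3, 1, 12, 0))
sunday_noon = is_scheduled_on(datetime(2016, 3, 6, 12, 0))
```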
I have a text file that contains timestamps out of a camera that captures 50 frames per second .. The data are as follows:
1 20931160389
2 20931180407
3 20931200603
4 20931220273
5 20931240360
.
.
50 20932139319
... and so on.
It gives also the starting time of capturing like
Date: **02.03.2012 17:57:01**
The timestamps are in microseconds, not milliseconds. MATLAB only supports down to milliseconds, but that's OK for me.
Now I need to know the human format of these timestamps for each row..like
1 20931160389 02.03.2012 17:57:01.045 % just an example
2 20931180407 02.03.2012 17:57:01.066
3 20931200603 02.03.2012 17:57:01.083
4 20931220273 02.03.2012 17:57:01.105
5 20931240360 02.03.2012 17:57:01.124
and so on
I tried this:
%Refernce Data
clc; format longg
refTime = [2012,03,02,17,57,01];
refNum = datenum(refTime);
refStr = datestr(refNum,'yyyy-mm-dd HH:MM:SS.FFF');
% Processing data
dn = 24*60*60*1000*1000; % Microseconds! I have changed this equation to many options but nothing was helpful
for i = 1 : size(Data,1)
gzTm = double(Data{i,2}); %timestamps are uint64
gzTm2 = gzTm / dn;
gzTm2 = refNum + gzTm2;
gzNum = datenum(gzTm2);
gzStr = datestr(gzNum,'yyyy-mm-dd HH:MM:SS.FFF'); % I can't use 'SS.FFFFFF'
fprintf('i = %d\t Timestamp = %f\t TimeStr = %s\n', i, gzTm, gzStr);
end;
But I got always strange outputs like
i = 1 Timestamp = 20931160389.000000 TimeStr = **2012-03-08 13:29:28.849**
i = 2 Timestamp = 20931180407.000000 TimeStr = **2012-03-08 13:29:29.330**
i = 3 Timestamp = 20931200603.000000 TimeStr = **2012-03-08 13:29:29.815**
The output time is several hours away from the reference time, and the day is different.
The time gap between consecutive entries should be nearly 20 milliseconds, since I have 50 frames per second (1000 milliseconds / 50 = 20), and the year, month, day, hour, minute and second should also match the initial time given as the reference time, since that is only a few seconds earlier.
I expect something like:
% just an example
1 20931160389 02.03.2012 **17:57:01.045**
2 20931180407 02.03.2012 **17:57:01.066**
Could someone help me, please? Where is my mistake?
It looks like you can work out the number of microseconds between a record and the first record:
usecs = double(Data{i,2}) - double(Data{1,2});
convert that into seconds:
secsDiff = usecs / 1e6;
then add that to the initial datetime you'd calculated:
matDateTime = refNum + secsDiff / (24*60*60);
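The three steps above can be sketched end to end as follows. This is a Python illustration rather than MATLAB, with a made-up function name, and it makes the same assumption as the steps above: the first record coincides with the stated capture start time. Python's datetime keeps microsecond resolution, so no precision is lost.

```python
from datetime import datetime, timedelta


def to_wall_clock(timestamps_us, start):
    """Map raw microsecond counters to datetimes.

    Assumes the first timestamp corresponds to the capture start time.
    """
    first = timestamps_us[0]
    return [start + timedelta(microseconds=t - first) for t in timestamps_us]


start = datetime(2012, 3, 2, 17, 57, 1)   # "Date: 02.03.2012 17:57:01"
stamps = [20931160389, 20931180407, 20931200603, 20931220273, 20931240360]
converted = to_wall_clock(stamps, start)

for n, (t, when) in enumerate(zip(stamps, converted), start=1):
    print(n, t, when.strftime("%d.%m.%Y %H:%M:%S.%f"))
```

Subtracting the first counter value before scaling is the key step: the raw counter does not start at zero, which is why adding it directly to the reference time lands days away from the expected result.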
I have some code that is taking a long time to run (several hours), and I think it is because it does a lot of comparisons in the if statement. I would like it to run faster; does anyone have any suggestions to improve the runtime? If anyone has a different idea of what is slowing the code down, so that I can try to fix that instead, it would be appreciated.
xPI = zeros(1,1783);
argList2 = zeros(1,1783);
aspList2 = zeros(1,1783);
cysList2 = zeros(1,1783);
gluList2 = zeros(1,1783);
hisList2 = zeros(1,1783);
lysList2 = zeros(1,1783);
tyrList2 = zeros(1,1783);
minList= xlsread('20110627.xls','CM19:CM25');
maxList= xlsread('20110627.xls','CN19:CN25');
N = length(pIList);
for i = 1:N
    if (argList(i)>= minList(1) && argList(i) <= maxList(1)) ...
            && (aspList(i)>= minList(2) && aspList(i) <= maxList(2)) ...
            && (cysList(i)>= minList(3) && cysList(i) <= maxList(3)) ...
            && (gluList(i)>= minList(4) && gluList(i) <= maxList(4)) ...
            && (hisList(i)>= minList(5) && hisList(i) <= maxList(5)) ...
            && (lysList(i)>= minList(6) && lysList(i) <= maxList(6)) ...
            && (tyrList(i)>= minList(7) && tyrList(i) <= maxList(7))
        xPI(i) = pIList(i);
        argList2(i) = argList(i);
        aspList2(i) = aspList(i);
        cysList2(i) = cysList(i);
        gluList2(i) = gluList(i);
        hisList2(i) = hisList(i);
        lysList2(i) = lysList(i);
        tyrList2(i) = tyrList(i);
        disp('passed test');
    end
end
You can try vectorising the code; I have made up some sample data sets and duplicated some of the operations you're performing below.
matA1 = floor(rand(10)*1000);
matB1 = floor(rand(10)*1000);
matA2 = zeros(10);
matB2 = zeros(10);
minList = [10, 20];
maxList = [100, 200];
indicesToCopy = ( matA1 >= minList(1) ) & ( matA1 <= maxList(1) ) & ( matB1 >= minList(2) ) & ( matB1 <= maxList(2) );
matA2(indicesToCopy) = matA1(indicesToCopy);
matB2(indicesToCopy) = matB1(indicesToCopy);
No idea whether this is any faster, you'll have to try it out.
EDIT:
This doesn't matter too much since you're only making two calls, but xlsread is horribly slow. You can speed up those calls by using this variant syntax of the function.
num = xlsread(filename, sheet, 'range', 'basic')
The catch is that the range argument is ignored and the entire sheet is read, so you'll have to mess with indexing the result correctly.
Use the profiler to see which lines or functions are using the most execution time.
You can probably get a huge increase in execution speed by vectorizing your code. This means using operations which operate on an entire vector at once, instead of using a for-loop to iterate through it. Something like:
% make a logical vector indicating what you want to include
ii = (argList >= minList(1) & argList <= maxList(1)) & ...
% use it
argList2(ii) = argList(ii); % copies over every element where the corresponding ii is 1
...
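The mask-then-copy pattern described in the answers can be shown on a toy data set. The sketch below is in plain Python rather than MATLAB, so it illustrates the structure (build one logical mask over all lists, then apply it to each list) rather than the speed-up itself; the helper names and sample values are made up.

```python
def in_range_mask(columns, mins, maxs):
    """True at position i when every column's i-th value lies in its [min, max]."""
    return [
        all(lo <= v <= hi for v, lo, hi in zip(row, mins, maxs))
        for row in zip(*columns)
    ]


def keep_where(values, mask):
    """Copy values where mask is True and leave 0 elsewhere (like zeros(...))."""
    return [v if keep else 0 for v, keep in zip(values, mask)]


arg = [5, 50, 150]
asp = [30, 40, 250]
mask = in_range_mask([arg, asp], mins=[10, 20], maxs=[100, 200])
arg2 = keep_where(arg, mask)
asp2 = keep_where(asp, mask)
```

Computing the mask once and reusing it for every list mirrors what the logical index does in the MATLAB snippets above.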