Set days to respective quarter in quarterly format


I have a database of the following type:
Y033RD3Q086SBEA CONSDEF PERIC ... PCDG COMPNFB FEDFUNDS
1994-01-01 135.474 68.102186 2.914123 ... 588.824 52.651 3.05
1994-04-01 135.724 68.477710 2.886523 ... 598.721 53.074 3.56
1994-07-01 135.427 68.966372 2.841751 ... 609.310 53.067 4.26
1994-10-01 134.544 69.321509 2.789701 ... 631.830 53.481 4.76
1995-01-01 133.984 69.661335 2.744497 ... 621.252 53.799 5.53
... ... ... ... ... ... ...
2014-10-01 99.149 111.754096 0.761446 ... 1269.727 105.171 0.09
2015-01-01 99.144 111.340463 0.756917 ... 1282.177 106.534 0.11
2015-04-01 98.943 111.902682 0.744341 ... 1307.650 107.357 0.12
2015-07-01 98.508 112.379110 0.732450 ... 1316.681 107.839 0.13
2015-10-01 98.102 112.506108 0.723991 ... 1317.079 107.616 0.12
As one clearly sees, every data point refers to a specific quarter. However, I'd like to have the date index displayed in quarter format, for example 1994-01-01 would be 1994Q1, 1994-04-01 would be 1994Q2 and so on. Is there a simple way to do it?

I have already found a solution that looks very simple. I am posting it here in case it turns out to be helpful for someone:
df.index=pd.PeriodIndex(df.index, freq='Q')
This changes the index of the dataframe df shown above into its quarterly equivalent.
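A minimal sketch of that one-liner on a small frame, assuming the index already holds datetimes. The FEDFUNDS values are copied from the first four rows of the table in the question; the rest of the setup is illustrative only.

```python
import pandas as pd

# Four quarter-start dates and FEDFUNDS values copied from the table in the question.
df = pd.DataFrame(
    {"FEDFUNDS": [3.05, 3.56, 4.26, 4.76]},
    index=pd.to_datetime(["1994-01-01", "1994-04-01", "1994-07-01", "1994-10-01"]),
)

# Replace the DatetimeIndex with a quarterly PeriodIndex.
df.index = pd.PeriodIndex(df.index, freq="Q")

# Each period prints in the requested form, e.g. 1994Q1.
quarter_labels = [str(p) for p in df.index]
print(quarter_labels)
```

If the index is a DatetimeIndex, `df.index.to_period("Q")` should give the same result.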

Related

How do I calculate a rolling 30 day window in KDB?

I have a keyed table of the form:
t | ar av mr mv
-----------------------------| ----------------------------------------
2016.01.04D09:51:00.000000000| -0.001061315 513 -0.01507338 576
2016.01.04D11:37:00.000000000| -0.0004846135 618 -0.001100514 583
2016.01.04D12:04:00.000000000| -0.0009708739 1619 -0.001653045 1000
I want to calculate the 30 day rolling correlation ar cor mr.
I'm stuck trying to create a self join with wj, but I'm not getting anywhere. Is this the way to do it?

You could do something like:
/-Function which creates the rolling windows (w:window size, s:list)
q)f:{[w;s] (w-1)_({ 1_x,y }\[w#0;s])}
/-e.g.
q)f[3;til 5]
0 1 2
1 2 3
2 3 4
/-Apply cor to each rolling window of 30 rows (f counts rows, not calendar days):
q)ar:exec ar from t;
q)mr:exec mr from t;
q)cor'[f[30;ar]; f[30; mr]]
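The q function above builds windows of a fixed number of rows. For comparison, here is a sketch of a window defined by calendar time, written in pandas rather than kdb+. The data is made up (one row per day, with mr an exact linear function of ar), and only the column names ar and mr come from the question.

```python
import math
import pandas as pd

# Hypothetical data: one observation per day for 60 days.
times = pd.date_range("2016-01-04 09:51", periods=60, freq="D")
ar = [math.sin(i) for i in range(60)]
# mr is an exact linear function of ar, so full windows should correlate at about 1.
t = pd.DataFrame({"ar": ar, "mr": [2 * x + 1 for x in ar]}, index=times)

# "30D" makes each window span the 30 calendar days ending at that row's
# timestamp, however many rows happen to fall inside it.
rolling_cor = t["ar"].rolling("30D").corr(t["mr"])

print(rolling_cor.tail(3))
```

The earliest rows have too few observations for a meaningful correlation and come out as NaN.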

Analyze weather data stored in csv

I have some weather data stored in a csv file in the form of: „id, date, temperature, rainfall“, with id being the weather station and, obviously, date being the date of measurement. The file contains the data of 3 different stations over a period of 10 years.
What I'd like to do is analyze the data of each station and each year. For example: I'd like to calculate day-to-day differences in temperature [abs((n+1)-n)] for each station and each year.
I thought while-loops could be a possibility, with the loop calculating something as long as the id value is equal to the one in the next row.
But I’ve no idea how to do it.
Best regards

If you still need assistance, I would consider importing the .csv file data using "readtable". So long as only the first row is text, MATLAB will create a 'table' variable (this shouldn't be an issue for a .csv file). The individual columns can be accessed via "tablename.header" and extracted as double-type variables (e.g. variable_1=tablename.header). You can then concatenate your dataset as you like. As for sorting by date and station id, I would advocate using "sortrows". For example, if the station id is the first column, sortrows(data,1) will sort "data" by the station id, and sortrows(data, [1 2]) will sort "data" by the first column, then by the second column. From there, you can write an if statement to compare the station ids and perform the required calculations. I hope my brief answer is somewhat helpful.
A basic code structure would be:
folder='copy and paste file path here'; % folder that contains the csv file (avoids shadowing the built-in path function)
data=readtable(fullfile(folder,'filename.csv'), 'ReadVariableNames',true); % read the file from csv format to table
variable1=data.header1; % general example of making double type variable from table
variable2=data.header2;
variable3=data.header3;
double_data=[variable1 variable2 variable3]; % concatenates the three columns together
sorted_data=sortrows(double_data, [1 2]); % sorts double_data by column 1 then column 2

It always helps to have actual data to work on and specifics as to what kind of output format is expected. Basically, ins and outs :) With the little info provided, I figured I would generate random data for you in the first section, and then calculate some stats in the second. I include the loop as an example since that's what you asked, but I highly recommend using vectorized calculations whenever available, such as the one done in summary stats.
%% example for weather stations
% generation of random data to correspond to what your csv file looks like
rng(1); % keeps the random seed for testing purposes
nbDates = 1000; % number of days of data
nbStations = 3; % number of weather stations
measureDates = repmat((now()-(nbDates-1):now())',nbStations,1); % nbDates days of data ending today
stationIds = kron((1:nbStations)',ones(nbDates,1)); % assuming 3 weather stations with IDs [1,2,3]
temp = rand(nbStations*nbDates,1)*70+30; % temperatures are in F and vary between 30 and 100 degrees
rain = max(rand(nbStations*nbDates,1)*40-20,0); % rain fall is 0 approximately half the time, and between 0mm and 20mm the rest of the time
csv = table(measureDates, stationIds, temp, rain);
clear measureDates stationIds temp rain;
% augment the original dataset as needed
years = year(csv.measureDates);
data = [csv,array2table(years)];
sorted = sortrows( data, {'stationIds', 'measureDates'}, {'ascend', 'ascend'} );
% example looping through your data
for i = 1 : size( sorted, 1 )
    fprintf( 'Id=%d, year=%d, temp=%g, rain=%g', sorted.stationIds( i ), sorted.years( i ), sorted.temp( i ), sorted.rain( i ) );
    if( i > 1 && sorted.stationIds( i )==sorted.stationIds( i-1 ) && sorted.years( i )==sorted.years( i-1 ) )
        fprintf( ' => absolute difference with day before: %g', abs( sorted.temp( i ) - sorted.temp( i-1 ) ) );
    end
    fprintf( '\n' ); % new line
end
% depending on the statistics you wish to do, other more efficient ways of
% accessing summary stats might be accessible, for example:
grpstats( data ...
, {'stationIds','years'} ... % group by categories
, {'mean','min','max','meanci'} ... % statistics we want
, 'dataVars', {'temp','rain'} ... % variables on which to calculate stats
) % doesn't require data to be sorted or any looping
This prints one line for each row of data (and only calculates the difference in temperature when there is no year or station change). It also produces some summary stats at the end. Here's what I get:
stationIds years GroupCount mean_temp min_temp max_temp meanci_temp mean_rain min_rain max_rain meanci_rain
__________ _____ __________ _________ ________ ________ ________________ _________ ________ ________ ________________
1_2016 1 2016 82 63.13 30.008 99.22 58.543 67.717 6.1181 0 19.729 4.6284 7.6078
1_2017 1 2017 365 65.914 30.028 99.813 63.783 68.045 5.0075 0 19.933 4.3441 5.6708
1_2018 1 2018 365 65.322 30.218 99.773 63.275 67.369 4.7039 0 19.884 4.0615 5.3462
1_2019 1 2019 188 63.642 31.16 99.654 60.835 66.449 5.9186 0 19.864 4.9834 6.8538
2_2016 2 2016 82 65.821 31.078 98.144 61.179 70.463 4.7633 0 19.688 3.4369 6.0898
2_2017 2 2017 365 66.002 30.054 99.896 63.902 68.102 4.5902 0 19.902 3.9267 5.2537
2_2018 2 2018 365 66.524 30.072 99.852 64.359 68.69 4.9649 0 19.812 4.2967 5.6331
2_2019 2 2019 188 66.481 30.249 99.889 63.647 69.315 5.2711 0 19.811 4.3234 6.2189
3_2016 3 2016 82 61.996 32.067 98.802 57.831 66.161 4.5445 0 19.898 3.1523 5.9366
3_2017 3 2017 365 63.914 30.176 99.902 61.932 65.896 4.8879 0 19.934 4.246 5.5298
3_2018 3 2018 365 63.653 30.137 99.991 61.595 65.712 5.3728 0 19.909 4.6943 6.0514
3_2019 3 2019 188 64.201 30.078 99.8 61.319 67.082 5.3926 0 19.874 4.4541 6.3312
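The same per-station, per-year difference can also be sketched without an explicit loop. The example below uses Python with pandas instead of MATLAB, on a made-up sample in the "id, date, temperature, rainfall" layout from the question. The sample values and the helper columns year and temp_diff are assumptions for illustration.

```python
import io
import pandas as pd

# Hypothetical sample in the "id, date, temperature, rainfall" layout.
csv_text = """id,date,temperature,rainfall
1,2015-12-30,10.0,0.0
1,2015-12-31,12.5,0.0
1,2016-01-01,11.0,1.2
1,2016-01-02,15.0,0.0
2,2015-12-31,20.0,0.0
2,2016-01-01,18.0,3.4
2,2016-01-02,21.0,0.0
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
df = df.sort_values(["id", "date"]).reset_index(drop=True)
df["year"] = df["date"].dt.year

# diff() runs inside each (station, year) group, so the first row of every
# group is NaN instead of a difference across a station or year boundary.
df["temp_diff"] = df.groupby(["id", "year"])["temperature"].diff().abs()

print(df[["id", "date", "temperature", "temp_diff"]])
```

Grouping first means no if statement is needed to detect where one station or year ends and the next begins.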

Choosing specific dates/hours from an array

I have a matrix holding about 3 months of data. It is a 952x1 matrix whose elements are 3-hourly timestamps in the following format:
Aug-05-2015 03:00:00
Aug-05-2015 06:00:00
Aug-05-2015 09:00:00
Aug-05-2015 12:00:00
Aug-05-2015 15:00:00
Aug-05-2015 18:00:00
Aug-05-2015 21:00:00
Aug-06-2015 00:00:00
Aug-06-2015 03:00:00
Aug-06-2015 06:00:00
I would like to select, say, only the daytime entries, only the night-time entries, or only those for August. How do I do that?
Further to my question: if I have a group of .wav files and want to select them by month, compute daily PSD averages, or pick the files belonging to one month, how should I go about it? The following are the first 10 .wav files in a .txt file that is read into the MATLAB code:
AMAR168.1.20150823T200235Z.wav
AMAR168.1.20150823T201040Z.wav
AMAR168.1.20150823T201845Z.wav
AMAR168.1.20150823T202650Z.wav
AMAR168.1.20150823T203455Z.wav
AMAR168.1.20150823T204300Z.wav
AMAR168.1.20150823T205105Z.wav
AMAR168.1.20150823T205910Z.wav
AMAR168.1.20150823T210715Z.wav
The yyyymmddTHHMMSSZ.wav part of each file name encodes the recording date and time.
Thanks.

Are these datetimes? If so, you can use logical indexing here if you make use of some of the datetime functions. To get the times in August:
t = datetime(2015, 8, 1, 3, 0, 0) + hours(3:3:3000)';
t(month(t) == 8) % Times in August
To split the times by hour of day (here before noon versus noon onwards; change the thresholds to match your own definition of day and night):
t(hour(t) < 12) % Before noon
t(hour(t) >= 12) % Noon onwards
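The follow-up about the .wav files is not covered by the answer above. One possible approach, sketched here in Python, is to parse the timestamp out of each file name and then filter on its month or hour. It assumes the timestamp is always the third dot-separated field, as in the listed names. The helper name recording_time and the night-time cut-offs are hypothetical.

```python
from datetime import datetime

# A few of the file names listed in the question.
names = [
    "AMAR168.1.20150823T200235Z.wav",
    "AMAR168.1.20150823T201040Z.wav",
    "AMAR168.1.20150823T210715Z.wav",
]


def recording_time(name):
    """Parse the yyyymmddTHHMMSSZ field, assumed to be the third dot-separated part."""
    stamp = name.split(".")[2]
    return datetime.strptime(stamp, "%Y%m%dT%H%M%SZ")


times = [recording_time(n) for n in names]

# Keep the August files; the same pattern works for any month.
august = [n for n, t in zip(names, times) if t.month == 8]

# "Night" is taken here as 21:00 onwards or before 06:00 (an assumed cut-off).
night = [n for n, t in zip(names, times) if t.hour >= 21 or t.hour < 6]

print(august)
print(night)
```

Once each file has a parsed timestamp, grouping by `t.date()` would give the per-day sets needed for daily averages.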

How to match daily data from monthly using Matlab?

I have monthly macroeconomic data series and I am planning to use them for a weekly (every Monday) regression analysis. How can I match a data point that is released once a month to my date template (about 4 dates during that month), carrying each value forward until the next release?
for u=2:size(daily,1)
    l=find(dailytemplate(u)==monthly)
    %# when the monthly date is not equal to my daily template
    if isempty(l)
        %# I need cleverer code for this part to find the previous release
        dailyclose(u)=dailyclose(u-1)
    else
        dailyclose(u)=monthlyclose(l)
    end
end
UPDATE from comment
I have the following monthly macro data and want to use it to fill the weekly dates. For example, on 31/03/2012 the M_input was 2.7, so any weekly date after that should have
W_output=2.7
until 30/04/2012. From then on the weekly W_output will be 2.3, which is the new monthly point, M_input. The following table gives examples of the weekly W_output and the monthly M_input:
Weekly date   W_output   Monthly date   M_input
08/06/2012    1.7        30/06/2012     1.7
01/06/2012    1.7        31/05/2012     1.7
25/05/2012    2.3        30/04/2012     2.3
18/05/2012    2.3        31/03/2012     2.7
11/05/2012    2.3        29/02/2012     2.9
04/05/2012    2.3        31/01/2012     2.9
27/04/2012    2.7        31/12/2011     3
20/04/2012    2.7

format long g
% Create a vector of dates (what I am assuming your date template looks like: March 31 and the 10 weekly dates that follow it)
datetemplate = [datenum('2012/03/31')];
for i = 1:10
    datetemplate(i + 1) = datetemplate(i) + 7;
end
datetemplate';
% Your macroeconomic inputs and their release dates
macrochangedate = [datenum('2012/03/31');datenum('2012/04/30')]
macrochangedate = [macrochangedate [2.7; 2.3]]
for i = 1:size(macrochangedate,1)
    result(datetemplate >= macrochangedate(i,1)) = macrochangedate(i,2);
end
Results:
result =
2.7
2.7
2.7
2.7
2.7
2.3
2.3
2.3
2.3
2.3
2.3
datestr(datetemplate)
ans =
31-Mar-2012
07-Apr-2012
14-Apr-2012
21-Apr-2012
28-Apr-2012
05-May-2012
12-May-2012
19-May-2012
26-May-2012
02-Jun-2012
09-Jun-2012
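The same "latest release on or before each date" lookup can be sketched in Python with a binary search over the sorted release dates. The monthly values and weekly dates below are taken from the table in the question. The function name value_as_of is hypothetical.

```python
from bisect import bisect_right
from datetime import date

# Monthly releases (date, M_input) from the question, sorted by date.
monthly = [
    (date(2012, 3, 31), 2.7),
    (date(2012, 4, 30), 2.3),
    (date(2012, 5, 31), 1.7),
]
release_dates = [d for d, _ in monthly]


def value_as_of(day):
    """Return the latest monthly value released on or before `day`, or None."""
    i = bisect_right(release_dates, day)
    return monthly[i - 1][1] if i else None


# Some of the weekly dates from the question's table.
weekly = [
    date(2012, 4, 20), date(2012, 4, 27), date(2012, 5, 4),
    date(2012, 5, 25), date(2012, 6, 1), date(2012, 6, 8),
]
w_output = [value_as_of(d) for d in weekly]
print(w_output)  # [2.7, 2.7, 2.3, 2.3, 1.7, 1.7]
```

Dates earlier than the first release return None here instead of being filled with a guessed value.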

How can I convert a 5 digit int date and 7 digit int time to a real date?

I've come across some data where the date for today's value is 77026 and the time (as of a few minutes ago) is 4766011. FYI: today is Fri, 18 Nov 2011 12:54:46 -0600
I can't figure out how these represent a date/time, and there is no supporting documentation.
How can I convert these numbers to a date value?
Some other dates from today are:
77026 | 4765509
77026 | 4765003
77026 | 4714129
77026 | 4617107
And some dates from what is probably yesterday:
77025 | 6292509
77025 | 6238790
77025 | 4009544

Ok, with your expanded examples, it would appear the first number is a day count. That'd put this time system's epoch at
to_days(today) = 734824
734824 - 77026 = 657798
from_days(657798) = Dec 28, 1800
The time values are harder to pin down. They appear to be decreasing (unless you listed the most recent first?), but if they count intervals since midnight, centiseconds are a likely unit. That would give a range of 0 - 8,640,000.
4765509 = 47655.09 seconds -> sec_to_time(47655) = 13:14:15
sec_to_time(47650.03) -> 13:14:10
sec_to_time(47141.29) -> 13:05:41
sec_to_time(46171.07) -> 12:49:31
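Putting both guesses together, here is a sketch of the conversion in Python. It assumes the first number counts days and the second counts centiseconds since midnight. Taking 77026 as the count for 18 Nov 2011 puts day 0 at 28 Dec 1800. Neither assumption is confirmed by any documentation in the question, and the function name to_datetime is hypothetical.

```python
from datetime import datetime, timedelta

# Assumed epoch: day 77026 is 18 Nov 2011, which puts day 0 at 28 Dec 1800.
EPOCH = datetime(1800, 12, 28)


def to_datetime(day_count, centiseconds):
    """Combine a day count and centiseconds-since-midnight into a datetime."""
    # 1 centisecond = 10 milliseconds; integer arithmetic avoids float rounding.
    return EPOCH + timedelta(days=day_count, milliseconds=centiseconds * 10)


stamp = to_datetime(77026, 4765509)
print(stamp)  # 2011-11-18 13:14:15.090000
```

If the source system turns out to store times with a fixed offset (for example starting at 1 instead of 0), the centisecond value would need adjusting before the conversion.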
