Date range for backfill between two dates in q kdb

I want to backfill from a particular date to the latest date (say, the last working day).
Suppose I have 671 partitions in total:
count .Q.pv / 671j
and we need to backfill the last 10 days:
{0N!x}@'660 11 sublist .Q.pv / 0N!x stands in for the backfill function
Is there any other or better way to pass partition dates to the backfill function, other than using sublist?

You can use -10#date to get the last 10 dates in your HDB.
A safer option is sublist, which still works when the HDB holds fewer than 10 dates (in that case # would cycle through the list to pad the result to 10 items, whereas sublist just returns what is there):
-10 sublist date

To list all dates between two dates, you could use the following function:
q)daterange:{[date1;date2] 1+date1+til date2-date1}
q)daterange[2019.05.29;2019.06.03]
2019.05.30 2019.05.31 2019.06.01 2019.06.02 2019.06.03
This adds 1, 2, ..., date2-date1 days to date1, so the result starts on the day after date1 and ends on date2. To include date1 as well, use date1+til 1+date2-date1.
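For readers more at home outside q, the same start-inclusive versus start-exclusive choice can be sketched in Python. This is only an illustration, not q code; the function name date_range and its include_start flag are made up for the example.

```python
from datetime import date, timedelta

def date_range(start, end, include_start=False):
    """Return consecutive dates up to and including end.

    With include_start=False the list begins on the day after start,
    which mirrors the q daterange function in the answer.
    """
    first_offset = 0 if include_start else 1
    total_days = (end - start).days
    return [start + timedelta(days=i) for i in range(first_offset, total_days + 1)]

print(date_range(date(2019, 5, 29), date(2019, 6, 3)))
```

Passing include_start=True corresponds to the variant that also returns the first date.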
Hope this helps

Related

Weekly hour allocation problem in Rails and Postgresql

I have a list of tasks, each with a date range, and each task is broken into weekly chunks of hours (e.g. 30 hours from 2018-12-31 to 2019-01-06, and so on, with weeks starting on Monday).
The kind of operations I would like to do are
Display all the weekly hours of all the tasks for a list of users
Sum the weekly hours for a user for all his tasks for the week
When the duration of the task is modified, create/destroy the weekly hour chunks.
Would it be more efficient to store these weekly records as
start date/end date/hours, or
year/week number/hours?
Storing start/end dates probably gives the table more flexibility, as it could also store hours that are not aligned to weeks.
Storing week number means given a date range, creating the weekly chunks is as simple as finding the week number of the start date and the week number of the end date, and populating the weeks in between (without converting to date ranges). Also easier validation for updating the hours for a week, as long as the week number is 1-53.
Wondering if anyone has tried out either option and can give any pointers on their preferred option.
I would probably go for a daterange column.
That gives you the flexibility to have differently sized chunks and allows you to define an exclusion constraint to prevent overlapping ranges.
Finding the row for a given week is still quite simple using the "contains" operator @>, e.g. where the_column @> to_date('2019-24', 'iyyy-iw') finds the row(s) that contain week number 24 in 2019.
The expression to_date('2019-24', 'iyyy-iw') returns the first day (Monday) of the specified week.
Finding all rows that are between two weeks can also be done, although constructing the corresponding date range looks a bit ugly. You can either construct an inclusive range with the first and last day: daterange(to_date('2019-24', 'iyyy-iw'), to_date('2019-24', 'iyyy-iw') + 6, '[]')
Or you can create a range with an exclusive upper range with the next week's first day: daterange(to_date('2019-24', 'iyyy-iw'), to_date('2019-25', 'iyyy-iw'), '[)')
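If the range bounds are built on the application side instead, the same week arithmetic can be sketched in Python. This is a minimal illustration that assumes ISO weeks (Monday start, matching 'iyyy-iw'); the helper name iso_week_bounds is hypothetical.

```python
from datetime import date, timedelta

def iso_week_bounds(year, week):
    """Return (monday, next_monday) for an ISO year/week pair.

    The pair describes a half-open range [monday, next_monday),
    like the '[)' daterange in the answer.
    """
    monday = date.fromisocalendar(year, week, 1)  # requires Python 3.8+
    return monday, monday + timedelta(days=7)

start, end = iso_week_bounds(2019, 24)
print(start, end)
```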
While ranges can be indexed quite efficiently, the required GiST indexes are a bit more expensive to maintain than a B-Tree index on two integer columns.
Another downside of using ranges (if you don't really need the flexibility) is that they take up more space than two integer columns (14 bytes instead of 8, or even 4 with two smallint). So if the size of the table is of any concern, then your current solution with the year/week columns is more efficient.
"Storing week number means given a date range, creating the weekly chunks is as simple as finding the week number of the start date and the week number of the end date"
If your input is a start and end date to begin with (rather than a "week number"), then I would definitely go for a daterange column. If that start and end date cover more than one week, then you store only one row, rather than multiple rows.

Extract highest date per month from a list of dates

I have a date column which I am trying to query to return only the largest date per month.
What I currently have, albeit very simple, returns 99% of what I am looking for. For example, if I list the column in ascending order, the first entry is 2016-10-17 and the values range up to 2017-10-06.
A point to note is that the last day of every month may not be present in the data, so I'm really just looking to pull back whatever is the "largest" date present for any existing month.
The query I'm running at the moment looks like
SELECT MAX(date_col)
FROM schema_name.table_name
WHERE <condition1>
AND <condition2>
GROUP BY EXTRACT (MONTH FROM date_col)
ORDER BY max;
This does actually return most of what I'm looking for - what I'm actually getting back is
"2016-11-30"
"2016-12-30"
"2017-01-31"
"2017-02-28"
"2017-03-31"
"2017-04-28"
"2017-05-31"
"2017-06-30"
"2017-07-31"
"2017-08-31"
"2017-09-29"
"2017-10-06"
which are indeed the maximal values present for every month in the column. However, the result set doesn't seem to include the maximum date value from October 2016 (the first month's worth of data in the column). There are multiple values in the column for that month, ranging up to 2016-10-31.
If anyone could point out why the max value for this month isn't being returned, I'd much appreciate it.
You are grouping by month (1 to 12) rather than by month and year. Since 2017-10-06 is greater than any day in October 2016, that's what you get for the "October" group.
You should group by year and month together instead:
GROUP BY date_trunc('month', date_col)
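The effect of the grouping key can be sketched in plain Python. This is only an illustration of the idea, not the SQL itself; the function name max_per_month and the sample dates are made up, apart from the values quoted in the question.

```python
from datetime import date

def max_per_month(dates):
    """Return the latest date for each (year, month) pair, in ascending order."""
    latest = {}
    for d in dates:
        key = (d.year, d.month)  # keying on d.month alone would merge Oct 2016 with Oct 2017
        if key not in latest or d > latest[key]:
            latest[key] = d
    return sorted(latest.values())

sample = [date(2016, 10, 17), date(2016, 10, 31), date(2016, 11, 30), date(2017, 10, 6)]
print(max_per_month(sample))
```

With the (year, month) key, October 2016 and October 2017 each keep their own maximum.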

How to do dynamic date range iteration in Talend?

I have MinLoginTime and MaxLoginTime stored in 2 globalmap variables:
globalMap.put("MinLoginTime","2017-10-24") //ignore the datetime format, but it is a date
globalMap.put("MaxLoginTime","2018-04-26")
I want to iterate month by month and fetch records. In this example there are 7 months: 10, 11, 12, 1, 2, 3, 4
I want to generate these kind of dates:
FromDate ToDate
2017-10-01 2017-10-31
2017-11-01 2017-11-30
2017-12-01 2017-12-31
...
2018-04-01 2018-04-30
Then I need to iterate over each of these rows and do something (let's use tLog for now).
Could someone please help as to what Talend components can be used here for generating date ranges, where to store them and how to iterate them to do something?
You can achieve this pretty easily using a combination of Talend components and some Java code. Talend has a good collection of date manipulation functions.
First, store your global variable dates as Date type.
globalMap.put("MinLoginTime", TalendDate.parseDate("yyyy-MM-dd", "2017-10-24"))
Then tLoop_1 loops on all the months between your min and max dates. This code gets the number of months between the 2 dates :
TalendDate.diffDate((Date)globalMap.get("MaxLoginTime"),(Date)globalMap.get("MinLoginTime"),"MM")
tJava_3 just stores the date of the current iteration in a CURRENT_DATE global variable. It is the sum of the min date and the current iteration value (from 0 to N months).
globalMap.put("CURRENT_DATE", TalendDate.addDate((Date)globalMap.get("MinLoginTime"), (Integer)globalMap.get("tLoop_1_CURRENT_VALUE"), "MM"))
tFixedFlowInput_1 defines 2 Date columns: FromDate and ToDate in order to get the first and last day of the current iteration's month respectively.
TalendDate.getFirstDayOfMonth((Date)globalMap.get("CURRENT_DATE"))
TalendDate.getLastDayOfMonth((Date)globalMap.get("CURRENT_DATE"))
Check TalendDate class reference for all date manipulation methods.
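The month-by-month logic that the job implements can also be sketched outside Talend. The following Python is a rough illustration of the same idea, not Talend code; the function name month_ranges is hypothetical.

```python
import calendar
from datetime import date

def month_ranges(min_date, max_date):
    """Yield (first_day, last_day) for every month from min_date to max_date."""
    year, month = min_date.year, min_date.month
    while (year, month) <= (max_date.year, max_date.month):
        last_day = calendar.monthrange(year, month)[1]  # number of days in the month
        yield date(year, month, 1), date(year, month, last_day)
        month += 1
        if month > 12:  # roll over into January of the next year
            year, month = year + 1, 1

for from_date, to_date in month_ranges(date(2017, 10, 24), date(2018, 4, 26)):
    print(from_date, to_date)
```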

Finding dates, omitting weekends and holidays

I'm trying to think of the best way to create a function that avoids using loops/while. Any ideas?
Given a function prototype:
{[sd;n;hols]
/ return list of n number of dates <= sd, excluding weekends and hols
}
Thanks and happy holidays
You can use lists and the fact that dates in kdb+ are built on underlying integers:
{[sd;n;hols] d where not[d in hols]&1<mod[d:sd-til n]7}
This uses til to generate the n dates ending at sd, filters out weekends with mod and, in the same step, drops any dates found in the holiday list, then uses where on the boolean result to index back into the generated date list. Note that n here is the number of calendar days scanned, so fewer than n dates may come back. The result is in descending order, but you can use
{[sd;n;hols] reverse d where not[d in hols]&1<mod[d:sd-til n]7}
to get ascending date order.
An alternative solution would exclude the holidays before calculating the modulus:
{[sd;n;hols] d where 1<mod[d:except[;hols]sd-til n]7}
The components here are similar to Ryan's answer, other than using "except" to exclude the holidays.
In order to extract exactly n days, you can initially generate a larger list and return a sublist of the correct length, e.g.
{[sd;n;hols] n#d where 1<mod[d:except[;hols]sd-til 2*n]7}
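As a point of comparison, here is a minimal Python sketch that returns exactly n dates by consuming candidates lazily instead of guessing an upper bound such as 2*n. It is not q, and the function name business_days_back is hypothetical.

```python
from datetime import date, timedelta
from itertools import count, islice

def business_days_back(sd, n, hols):
    """Return the n most recent dates <= sd that are not weekends or holidays."""
    hols = set(hols)
    candidates = (sd - timedelta(days=i) for i in count())  # sd, sd-1, sd-2, ...
    working = (d for d in candidates if d.weekday() < 5 and d not in hols)  # Mon=0 .. Fri=4
    return list(islice(working, n))  # descending order, like the q versions

print(business_days_back(date(2019, 12, 25), 3, [date(2019, 12, 25)]))
```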
Knowing that 2000.01.01 and 2000.01.02 are a Saturday and a Sunday, and that those dates mod 7 are 0 and 1, I excluded every date whose modulus is 0 or 1:
getBusinessDays:{[Dates;N;Hols] N#(Dates*(Dates mod 7) in 2 3 4 5 6) except 2000.01.01,Hols}
This returns the first N business days from the dates you pass in, excluding the holidays you supply.

How to get total experience in terms of date object

I have a form where the user enters total experience in years and months, using two drop-downs. So if I have been working since 1 Jan 2012, I would enter 3 years and 11 months. I now need to convert this 3 years and 11 months into a date so that I can save it in the database.
You could use java.util.Calendar:
Calendar calendar = Calendar.getInstance();
calendar.add(Calendar.MONTH, month);
calendar.add(Calendar.YEAR, year);
Date date = calendar.getTime();
As a word of caution, the day field would be set to today's date. Check the intended behaviour if the current day is outside of the bounds for the target month. For example, setting the month to February when calendar has a day field of 30. It might be wise to set the day to a known, valid value for every month (eg: 1) before setting the month and year.
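The day-of-month caveat can be made concrete with a small Python sketch. It assumes the goal is to go back the given years and months from a reference date, which is how the SQL answers below read the question; the function name subtract_experience is hypothetical.

```python
import calendar
from datetime import date

def subtract_experience(today, years, months):
    """Go back the given years and months, clamping the day to the target month's length."""
    total_months = today.year * 12 + (today.month - 1) - (years * 12 + months)
    year, month_index = divmod(total_months, 12)
    month = month_index + 1
    day = min(today.day, calendar.monthrange(year, month)[1])  # e.g. 31 becomes 29 in Feb 2016
    return date(year, month, day)

print(subtract_experience(date(2015, 12, 1), 3, 11))
```

Clamping is one possible policy; fixing the day to 1, as suggested above, is another.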
Use the DATE_SUB() function:
SELECT DATE_SUB(DATE_SUB(CURRENT_DATE(), INTERVAL 3 YEAR), INTERVAL 11 MONTH);
You can use MySQL's date_sub() function or the <date> - interval <expression> unit syntax to subtract an interval from a date.
select date_sub(curdate(),interval '3-11' YEAR_MONTH) as start_date
UPDATE:
Following the conversation between the OP and @eggyal, the OP needs to replace the period in the incoming data with - and construct an insert statement as follows:
insert into mytable (...,join_date,...) values (...,date_sub(curdate(),interval '3-11' YEAR_MONTH),...)