I need to generate a random date (the 1st of a month) selected from a given date range in Hive (inclusive range).
For example, if the range is 25/12/2021 - 01/06/2022, then I want to select a random date from this set of dates: {01/01/2022, 01/02/2022, 01/03/2022, 01/04/2022, 01/05/2022, 01/06/2022}.
Can anyone guide me with my query?
I tried using
select concat('2019','-',lpad(floor(RAND()*100.0)%10+1,2,0),'-',lpad(floor(RAND()*100.0)%31+1,2,0));
but this needs a date. I need to pass a column value as the lower bound and a particular date as the upper bound, since the lower bound to be passed differs from column to column.
You can use the code below to calculate a random date between two dates.
select trunc(date_add(start_dt, cast (datediff( end_dt,start_dt)*rand() as INT)),'MM') as random_dt
You can test the logic using the code below:
select trunc(date_add('2021-01-17', cast (datediff( '2022-01-27','2021-01-17')*rand() as INT)), 'MM') as random_dt
Explanation -
The idea is to add a random number that is less than the date difference to the start date.
datediff() - This returns the difference between the two dates as an INT (number of days).
rand() - This returns a number from 0 (included) up to, but not including, 1. This means the random offset can be 0, so the date picked before truncation can equal the start date, but it always falls before the end date.
date_add - This adds the random integer to the start date to generate random date.
trunc(dt,'MM') - is going to return first day of the month.
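To check the logic outside Hive, here is a minimal Python sketch (an illustration only, not Hive code). `random_month_start` mirrors the expression above step by step, and `first_of_months_in_range` is a hypothetical helper, not part of the answer, that lists the 1st-of-month dates actually inside the inclusive range. One thing it makes visible: truncating to the month can return the 1st of the start month, which lies before the start date when the range does not begin on the 1st (2021-12-01 for the range in the question).

```python
import random
from datetime import date, timedelta

def random_month_start(start_dt, end_dt, rng=random):
    """Mirror of trunc(date_add(start, cast(datediff(end, start) * rand() as int)), 'MM')."""
    span_days = (end_dt - start_dt).days        # datediff(end_dt, start_dt)
    offset = int(span_days * rng.random())      # random() is in [0, 1), so offset < span_days
    picked = start_dt + timedelta(days=offset)  # date_add
    return picked.replace(day=1)                # trunc(dt, 'MM')

def first_of_months_in_range(start_dt, end_dt):
    """Hypothetical helper: every 1st-of-month date d with start_dt <= d <= end_dt."""
    year, month = start_dt.year, start_dt.month
    result = []
    while date(year, month, 1) <= end_dt:
        candidate = date(year, month, 1)
        if candidate >= start_dt:
            result.append(candidate)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return result

valid = first_of_months_in_range(date(2021, 12, 25), date(2022, 6, 1))
print(valid)                 # the six dates listed in the question
print(random.choice(valid))  # uniform pick that always stays inside the range
```

If the result must always be one of the listed dates, picking uniformly from the enumerated set (as in the last line) is one option; in Hive that would need an equivalent way to enumerate the months, which is not shown here.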
I have a date filter on the dashboard that allows for a custom date range:
[Image: Dashboard Date Filter]
How can I add the number of days in the filter to a formula? I am just trying to show the number of days in a column of a pivot table. In this example the date range is 45 days. The dataset doesn't have one record for each day, so a distinct count of days from the dataset returns 42.
Is it possible to use the "date from" and "date to" filter values in a formula? DDIFF([datefilter-from], [datefilter-to])
Extract the date from the fact table and create a dimension table which will contain all the dates.
Link the date column from fact table and dimension table. Use the dimension table's date column in filters.
I have to calculate the number of days between two dates. I searched and did not find any similar function available in ADF.
What I've noticed so far is that to get the number of days between 2 columns, the columns must be date columns, but I have timestamp columns (date + time).
How can I transform these columns into date columns? Or do you have another idea?
This uses the fact that there are 86,400 seconds in a day.
The function ticks returns the ticks property value for a specified timestamp. A tick is a 100-nanosecond interval, so one day is 86,400 x 10,000,000 = 864,000,000,000 ticks.
@string(div(sub(ticks(last_date),ticks(first_date)),864000000000))
You can re-format any timestamp using the function formatDateTime():
@formatDateTime(your_time_stamp,'yyyy-MM-dd HH:mm:ss')
Example:
@string(div(sub(ticks('2022-02-23 15:58:16'),ticks('2022-01-31 15:58:16')),864000000000))
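The arithmetic behind that constant can be checked with a small Python sketch. This is not ADF code; `whole_days_between` is a hypothetical name, and the sketch only reproduces the "difference in ticks, integer-divided by ticks per day" idea for the case where the second timestamp is not earlier than the first.

```python
from datetime import datetime

TICKS_PER_SECOND = 10_000_000              # one tick = 100 nanoseconds
TICKS_PER_DAY = 86_400 * TICKS_PER_SECOND  # 864000000000, the divisor in the expression

def whole_days_between(first, last):
    """Whole days from first to last, computed via 100 ns ticks (assumes last >= first)."""
    delta = last - first
    ticks = (delta.days * 86_400 + delta.seconds) * TICKS_PER_SECOND + delta.microseconds * 10
    return ticks // TICKS_PER_DAY

first_date = datetime(2022, 1, 31, 15, 58, 16)
last_date = datetime(2022, 2, 23, 15, 58, 16)
print(whole_days_between(first_date, last_date))  # 23
```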
This is the expression that I used for Data Flow.
toDate(toString({max_po create date},'yyyy-MM-dd')) - toDate(toString(max_datetimetoday,'yyyy-MM-dd'))
{max_po create date} and max_datetimetoday are timestamp (date + time) columns.
The result is in days.
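Note that truncating both timestamps to dates before subtracting counts calendar-day boundaries, which is not always the same as counting elapsed 24-hour periods. A minimal Python sketch of the difference (the function names are hypothetical and this is not Data Flow syntax):

```python
from datetime import datetime

def calendar_days_between(first, last):
    """Drop the time part first, then subtract: counts midnight boundaries crossed."""
    return (last.date() - first.date()).days

def elapsed_whole_days(first, last):
    """Subtract full timestamps: counts complete 24-hour periods (assumes last >= first)."""
    return (last - first).days

late_evening = datetime(2022, 1, 31, 23, 0, 0)
early_morning = datetime(2022, 2, 1, 1, 0, 0)
print(calendar_days_between(late_evening, early_morning))  # 1
print(elapsed_whole_days(late_evening, early_morning))     # 0
```

Which of the two is wanted depends on the report; the date-truncating expression above corresponds to the first function.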
I have a date column which I am trying to query to return only the largest date per month.
What I currently have, albeit very simple, returns 99% of what I am looking for. For example, if I list the column in ascending order, the first entry is 2016-10-17 and the values range up to 2017-10-06.
A point to note is that the last day of every month may not be present in the data, so I'm really just looking to pull back whatever is the "largest" date present for any existing month.
The query I'm running at the moment looks like
SELECT MAX(date_col)
FROM schema_name.table_name
WHERE <condition1>
AND <condition2>
GROUP BY EXTRACT (MONTH FROM date_col)
ORDER BY max;
This does actually return most of what I'm looking for - what I'm actually getting back is
"2016-11-30"
"2016-12-30"
"2017-01-31"
"2017-02-28"
"2017-03-31"
"2017-04-28"
"2017-05-31"
"2017-06-30"
"2017-07-31"
"2017-08-31"
"2017-09-29"
"2017-10-06"
which are indeed the maximal values present for every month in the column. However, the result set doesn't seem to include the maximum date value from October 2016 (the first month's worth of data in the column). There are multiple values in the column for that month, ranging up to 2016-10-31.
If anyone could point out why the max value for this month isn't being returned, I'd much appreciate it.
You are grouping by month (1 to 12) rather than by month and year. Since 2017-10-06 is greater than any day in October 2016, that's what you get for the "October" group.
You should group by month and year instead:
GROUP BY date_trunc('month', date_col)
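The effect of the two grouping keys can be reproduced with a few sample dates. This is a minimal Python sketch, not PostgreSQL; the sample values are a subset of those mentioned in the question and `max_per_group` is a hypothetical helper.

```python
from datetime import date

dates = [
    date(2016, 10, 17), date(2016, 10, 31), date(2016, 11, 30),
    date(2017, 9, 29), date(2017, 10, 6),
]

def max_per_group(values, key):
    """Return the largest date in each group, sorted ascending."""
    groups = {}
    for d in values:
        k = key(d)
        groups[k] = max(groups.get(k, d), d)
    return sorted(groups.values())

# Like GROUP BY EXTRACT(MONTH FROM date_col): both Octobers share the key 10.
by_month = max_per_group(dates, key=lambda d: d.month)
# Like GROUP BY date_trunc('month', date_col): year and month together form the key.
by_year_month = max_per_group(dates, key=lambda d: (d.year, d.month))

print(by_month)       # 2016-10-31 is missing, 2017-10-06 wins the shared October group
print(by_year_month)  # 2016-10-31 appears as its own group
```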
I tried it this way:
select tv.reg_number, tv.make, tv.model, tev.date_taken,tev.date_return, count(tev.date_taken, tev.date_return) as day_difference
from table_vehicle tv, table_evehicle tev
where tv.reg_number=tev.reg_number
and tev.date_return is not null
group by tv.reg_number, tv.make, tv.model, tev.date_taken, tev.date_return;
Is anyone able to help me on this one?
Subtracting one date from another yields the number of days between the two dates, so assuming both tev.date_taken and tev.date_return are of the DATE data type, you can use tev.date_return - tev.date_taken as day_difference to get the number of days between the two dates. If tev.date_taken and/or tev.date_return contain a time component, the returned number may include a fractional portion. If you don't want the fractional day, you can TRUNCate, ROUND or take the CEILing of the resulting value.
However, if either value is a TIMESTAMP, the resulting value will be of the INTERVAL data type. If this is the case, you can either cast the TIMESTAMP values to DATE values, or use EXTRACT(DAY FROM (tev.date_return - tev.date_taken)) as day_difference to get just the truncated number of days between the two dates.
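To illustrate the fractional-day point, here is a minimal Python sketch. It is not database code, and the two timestamps are made-up sample values; it only shows how truncating, rounding and taking the ceiling differ on the same fractional result.

```python
import math
from datetime import datetime

# Hypothetical sample values for date_taken and date_return, including a time of day.
date_taken = datetime(2022, 3, 1, 9, 0)
date_return = datetime(2022, 3, 5, 3, 0)

# Difference expressed in days, fractional part included (3 days 18 hours).
day_difference = (date_return - date_taken).total_seconds() / 86_400

print(day_difference)              # 3.75
print(math.trunc(day_difference))  # 3
print(round(day_difference))       # 4
print(math.ceil(day_difference))   # 4
```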
My goal is to create a table that looks something like this using PostgreSQL:
date        date_start  date_end
12/16/2015  11/17/2015  12/16/2015
12/17/2015  11/18/2015  12/17/2015
etc.
So that I can then join to a different table to get the aggregations for each date on a rolling 30 day window. I've been doing some research and I think generate_series() is what I want to use, but I am unsure.
Something like this:
SELECT '2015-12-16'::date + g - 30 AS date_start
, '2015-12-16'::date + g AS date_end
FROM generate_series (0, 25) g; -- number of rows
I skipped the redundant date column. (You shouldn't use a basic type name as identifier anyways.)
There is a variant of generate_series() that works with timestamps, but the simple version generating integer numbers is just as good for dates. Maybe even better because you avoid possible confusion with time zones.
Always use the ISO 8601 format for date literals, which is unambiguous with any datestyle or locale settings.
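To see which rows the query produces, the same arithmetic can be sketched in Python. This is an illustration only; `rolling_windows` and its `days_back` parameter are hypothetical names. Note that subtracting 30 days gives 2015-11-16 as the first date_start, while the sample table in the question starts at 11/17/2015, which would correspond to subtracting 29.

```python
from datetime import date, timedelta

def rolling_windows(anchor, rows, days_back=30):
    """One (date_start, date_end) pair per g in 0..rows-1, like generate_series(0, rows - 1)."""
    return [
        (anchor + timedelta(days=g - days_back), anchor + timedelta(days=g))
        for g in range(rows)
    ]

windows = rolling_windows(date(2015, 12, 16), rows=26)  # generate_series(0, 25) yields 26 rows
for date_start, date_end in windows[:2]:
    print(date_start.isoformat(), date_end.isoformat())  # ISO 8601 output
```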
Related:
Rolling sum / count / average over date interval