Redshift datediff() vs. date_diff()

It appears that Redshift supports two possible functions for computing a time interval distance between two DATE-like objects: DATEDIFF() & date_diff(). The following code snippet provides an example of this behavior:
SELECT datediff(DAYS, '2021-01-01'::DATE, '2021-02-01'::DATE) AS datediff_interval_output
, datediff('day', '2021-01-01'::DATE, '2021-02-01'::DATE) AS datediff_str_literal_output
, date_diff('day', '2021-01-01'::DATE, '2021-02-01'::DATE) AS date_diff_output
;
AWS provides documentation on DATEDIFF(); however, no record of date_diff() appears to exist within either the Redshift or the PostgreSQL documentation. A curious difference between the two functions is that DATEDIFF() accepts either a raw interval keyword for its first argument (e.g. DAY, MONTH, SECOND) or a string literal ('day', 'month', 'second'), whereas date_diff() only accepts the string-literal form.
The only reference I can find to date_diff() is an entry within pg_catalog which returns the following auto-generated definition:
CREATE FUNCTION date_diff(text, time with time zone, time with time zone) RETURNS bigint
IMMUTABLE
LANGUAGE internal AS
$$
begin
-- missing source code
end;
$$;
Does anyone have additional insight around the origin of the date_diff() function, and why it's included on Redshift but is undocumented?
Note: Using Redshift Version: 1.0.25109

Related

Can I use to_char() and make_date() in postgreSQL table definition?

I'm working on a poc to migrate an on-prem SQL Server database to Amazon Aurora for PostgreSQL. Amazon's Schema Conversion Tool struggled to translate the SQL Server code for the creation of a table on this column:
[DOB] AS (CONVERT([varchar],datefromparts([DOB_year],[DOB_month],[DOB_day]),(120))) PERSISTED,
as the CONVERT function is unsupported in Postgres.
The best translation I can come up with is:
dob varchar(30) GENERATED ALWAYS AS (to_char((make_date(dob_year, dob_month, dob_day))::timestamp, 'YYYY-MM-DD HH24:MI:SS')) STORED,
but neither the SCT nor pgAdmin4 are recognising to_char() and make_date() as functions. 'dob_day', 'dob_month' and 'dob_year' are all column names with datatype of integer. I'm new to all this but another column definition is using other functions, e.g. replace() and right(), successfully, so I'm confused why this isn't working.
When I tried to run the code in pgAdmin I got this error:
ERROR: generation expression is not immutable
SQL state: 42P17
Thanks
to_char() is not marked as immutable, even though in your case it would be. There are format masks that are not immutable, e.g. when time zones or different locales are involved.
If you really want to (or are forced to) convert day, month and year into a formatted string (rather than a proper date, which would be the correct thing to do), then you can only achieve this with a custom function:
create function create_string_date(p_year int, p_month int, p_day int)
returns text
as
$$
select to_char(make_date(p_year, p_month, p_day), 'yyyy-mm-dd hh24:mi:ss');
$$
language sql
immutable;
Marking the function as immutable isn't cheating, because we know that with the given input and format string the result is indeed immutable. The column can then be defined as:
dob text generated always as (create_string_date(dob_year, dob_month, dob_day)) stored
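For completeness, here is a minimal end-to-end sketch of the wrapper-function approach. The names dob_as_text and person_demo are made up for illustration, and it assumes PostgreSQL 12 or later (the first release with generated columns). The explicit ::timestamp cast mirrors the expression from the question.

```sql
-- Hypothetical names (dob_as_text, person_demo); assumes PostgreSQL 12+.
-- The wrapper is declared IMMUTABLE by hand: to_char() itself is only STABLE,
-- but with a fixed, locale-independent format mask the result cannot vary.
CREATE FUNCTION dob_as_text(p_year int, p_month int, p_day int)
RETURNS text
LANGUAGE sql
IMMUTABLE
AS $$
  SELECT to_char(make_date(p_year, p_month, p_day)::timestamp,
                 'YYYY-MM-DD HH24:MI:SS');
$$;

CREATE TABLE person_demo (
  dob_year  int,
  dob_month int,
  dob_day   int,
  -- The generation expression only calls the immutable wrapper.
  dob text GENERATED ALWAYS AS (dob_as_text(dob_year, dob_month, dob_day)) STORED
);

INSERT INTO person_demo (dob_year, dob_month, dob_day) VALUES (1990, 5, 17);

SELECT dob FROM person_demo;  -- 1990-05-17 00:00:00
```

If the format mask ever changes to one that depends on locale or time zone (month names, TZ), the IMMUTABLE promise no longer holds and stored values may become inconsistent.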

How to create a new date range type with included upper bound in Postgres?

Postgres comes with a nice feature called Range Types that provides useful range functionality (overlaps, contains, etc).
I am looking to use the daterange type, however I think the type was implemented with an awkward choice: the upper bound of the daterange is excluded. That means that if I defined my value as 2014/01/01 - 2014/01/31, this is displayed as [2014/01/01, 2014/01/31) and the 31st of January is excluded from the range!
I think this was the wrong default choice here. I cannot think of any application or reference in real life that assumes that the end date of a date range is excluded. At least not in my experience.
I want to implement a range type for dates with both lower and upper bounds included, but I am hitting the Postgres documentation wall: References on how to create a new discrete range type are cryptic and lack any examples (taken from the documentation: Creating a canonical function is a bit tricky, since it must be defined before the range type can be declared).
Can someone provide some help on this? Or even directly the implementation itself; it should be 5-10 lines of code, but putting these 5-10 lines together is a serious research effort.
EDIT: Clarification: I am looking for information on how to create the proper type so that inserting [2014/01/01, 2014/01/31] results in an upper(daterange) = '2014/01/31'. With the existing daterange type this value is "converted" to [2014/01/01, 2014/02/01) and gives an upper(daterange) = '2014/02/01'
Notice the third constructor parameter:
select daterange('2014/01/01', '2014/01/31', '[]');
daterange
-------------------------
[2014-01-01,2014-02-01)
Or a direct cast with the upper bound included:
select '[2014/01/01, 2014/01/31]'::daterange;
daterange
-------------------------
[2014-01-01,2014-02-01)
Edit
Not a new type (wrong approach IMHO) but a proper function:
create function inclusive_upper_daterange(dtr daterange)
returns date as $$
select upper(dtr) - 1;
$$ language sql immutable;
select inclusive_upper_daterange('[2014/01/01, 2014/01/31]'::daterange);
inclusive_upper_daterange
---------------------------
2014-01-31
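To see that the canonical [) form loses nothing, a quick check along these lines can help. It is only a sketch: the column aliases are made up, and it uses nothing beyond the built-in daterange constructor, the @> containment operator, upper() and upper_inc().

```sql
-- Build the range with an inclusive upper bound; Postgres stores it as [lower, upper + 1).
SELECT r @> date '2014-01-31' AS contains_jan_31,     -- true
       r @> date '2014-02-01' AS contains_feb_01,     -- false
       upper(r)               AS exclusive_upper,     -- 2014-02-01
       upper_inc(r)           AS upper_is_inclusive,  -- false
       upper(r) - 1           AS inclusive_upper      -- 2014-01-31
FROM (SELECT daterange('2014-01-01', '2014-01-31', '[]') AS r) AS s;
```

So the 31st of January is still inside the range; only the way the bound is displayed changes, and upper(r) - 1 recovers the inclusive end date.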
Following the instructions on Postgres documentation I came up with the following code to create the type I need. However it won't work (read on).
CREATE TYPE daterange_;
CREATE FUNCTION date_minus(date1 date, date2 date) RETURNS float AS $$
SELECT cast(date1 - date2 as float);
$$ LANGUAGE sql immutable;
CREATE FUNCTION dr_canonical(dr daterange_) RETURNS daterange_ AS $$
BEGIN
IF NOT lower_inc(dr) THEN
dr := daterange_(lower(dr) + 1, upper(dr), '[]');
END IF;
IF NOT upper_inc(dr) THEN
dr := daterange_(lower(dr), upper(dr) - 1, '[]');
END IF;
RETURN dr;
END;
$$ LANGUAGE plpgsql;
CREATE TYPE daterange_ AS RANGE (
SUBTYPE = date,
SUBTYPE_DIFF = date_minus,
CANONICAL = dr_canonical
);
As far as I can tell, this definition follows the specification exactly. However it fails at declaring the dr_canonical function with ERROR: SQL function cannot accept shell type daterange_.
It looks like (code also) it is impossible to declare a canonical function using any language other than C! So it is practically impossible to declare a new discrete range type, especially if you use a Postgres cloud service that gives no access to the machine running it. Well played Postgres.
Using PostgreSQL 11 you can solve the presentation part with the upper_inc function, for example:
select
CASE
WHEN upper_inc(mydaterange) THEN upper(mydaterange)
ELSE date(upper(mydaterange) - INTERVAL '1 day')
END
I managed to create a custom type for the date range:
-- subtype_diff must return the first argument minus the second (here in seconds)
CREATE or replace FUNCTION to_timestamptz(arg1 timestamptz, arg2 timestamptz) RETURNS float8 AS
'select extract(epoch from (arg1 - arg2));' LANGUAGE sql STRICT
IMMUTABLE;
create type tsrangetz AS RANGE
(
subtype = timestamptz,
subtype_diff =
to_timestamptz
)
;
select tsrangetz(current_date, current_date + 1)
--["2020-10-05 00:00:00+07","2020-10-06 00:00:00+07")
;

Create timestamp index from JSON on PostgreSQL

I have a PostgreSQL table with a jsonb field named data that holds a lot of objects, and I want to create an index to speed up queries. I'm testing with just 15 rows for now, but I don't want to run into problems with the queries later: I'm collecting data from the Twitter API, so within a week I get around 10 GB of data. If I create the normal index
CREATE INDEX ON tweet((data->>'created_at'));
I get a text index. If I create:
Create index on tweet((CAST(data->>'created_at' AS timestamp)));
I get
ERROR: functions in index expression must be marked IMMUTABLE
I've tried to make it immutable by setting the time zone with
date_trunc('seconds', CAST(data->>'created_at' AS timestamp) at time zone 'GMT')
but I'm still getting the "immutable" error. So how can I build a timestamp index from a JSON field? I know that I could add a plain column holding the date, because it will probably never change, but I want to learn how to do this.
This expression won't be allowed in the index either:
(CAST(data->>'created_at' AS timestamp) at time zone 'UTC')
It's not immutable, because the first cast depends on your DateStyle setting (among other things). Doesn't help to translate the result to UTC after the function call, uncertainty has already crept in ...
The solution is a function that makes the cast immutable by fixing the time zone (like @a_horse already hinted).
I suggest using to_timestamp() (which is also only STABLE, not IMMUTABLE) instead of the cast to rule out some sources of trouble, DateStyle being one.
CREATE OR REPLACE FUNCTION f_cast_isots(text)
RETURNS timestamptz AS
$$SELECT to_timestamp($1, 'YYYY-MM-DD HH24:MI')$$ -- adapt to your needs
LANGUAGE sql IMMUTABLE;
Note that this returns timestamptz. Then:
CREATE INDEX foo ON t (f_cast_isots(data->>'created_at'));
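A self-contained sketch of the whole round trip might look like this. Everything named here (tweet_demo, f_created_at, the index name) is hypothetical, and the format mask assumes the JSON stores 'YYYY-MM-DD HH24:MI:SS' strings, which is not what the Twitter API actually returns, so adapt it. Also note that without a time zone in the input, to_timestamp() interprets the string in the session's TimeZone, so the IMMUTABLE label is a promise that this setting stays fixed.

```sql
-- Hypothetical table, function and index names; the format mask is an assumption.
CREATE TABLE tweet_demo (
  id   bigserial PRIMARY KEY,
  data jsonb NOT NULL
);

-- Wrapper declared IMMUTABLE by hand around the STABLE to_timestamp().
CREATE FUNCTION f_created_at(text)
RETURNS timestamptz
LANGUAGE sql
IMMUTABLE
AS $$ SELECT to_timestamp($1, 'YYYY-MM-DD HH24:MI:SS') $$;

CREATE INDEX tweet_demo_created_at_idx
    ON tweet_demo (f_created_at(data->>'created_at'));

INSERT INTO tweet_demo (data)
VALUES ('{"created_at": "2020-10-05 12:30:00"}');

-- The WHERE clause repeats the indexed expression exactly,
-- which is what allows the planner to consider the index.
SELECT id
FROM   tweet_demo
WHERE  f_created_at(data->>'created_at') >= timestamptz '2020-10-01 00:00:00+00'
AND    f_created_at(data->>'created_at') <  timestamptz '2020-11-01 00:00:00+00';
```

On a tiny test table the planner will most likely still choose a sequential scan; the index only pays off once the table has grown.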
Detailed explanation for this technique in this related answer:
Does PostgreSQL support "accent insensitive" collations?
Related:
Query on a time range ignoring the date of timestamps

Using extract function in your own sql function

I'm trying to create the following function into my Postgres database:
create or replace function public.extract_hour(time_param timestamp with time zone) returns double precision language sql as $function$
SELECT EXTRACT(hour from timestamp time_param);
$function$;
but I get the following error:
ERROR: syntax error at or near "time_param"
I tried using $0 instead of time_param, but the same error occurs. Can somebody explain how to solve this?
Obviously, you are using an older version of PostgreSQL (which you neglected to provide). Referring to parameter names is possible since PostgreSQL 9.2. In older versions you can only refer to positional parameters:
CREATE OR REPLACE FUNCTION public.extract_hour(time_param timestamptz)
RETURNS double precision LANGUAGE sql AS
$function$
SELECT EXTRACT(hour from $1);
$function$;
Also I removed the misplaced keyword timestamp.
In PL/pgSQL functions, by contrast, referring to parameter names has been possible for a long time.
We fixed a documentation glitch that claimed the opposite just recently.
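For PostgreSQL 9.2 or later, the named-parameter form could be sketched as follows. The function name extract_hour_demo is made up to avoid clashing with the one above. The cast to double precision is there because newer releases return numeric from EXTRACT, and the function is marked STABLE because the hour of a timestamptz depends on the session time zone.

```sql
-- Hypothetical function name; assumes PostgreSQL 9.2+ (parameter names in SQL functions).
CREATE OR REPLACE FUNCTION extract_hour_demo(time_param timestamptz)
RETURNS double precision
LANGUAGE sql
STABLE  -- result depends on the session's TimeZone setting
AS $function$
    SELECT EXTRACT(hour FROM time_param)::double precision;
$function$;

-- Usage: the returned hour is expressed in the session time zone.
SELECT extract_hour_demo(timestamptz '2021-01-01 15:30:00+00');
```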

now() default values are all showing same timestamp

I have created my tables with a column (type: timestamp with time zone) and set its default value to now() (i.e. current_timestamp).
I run a series of inserts in separate statements in a single function, and I noticed that all the timestamps are equal down to the millisecond. Is the function value somehow cached and shared for the entire function call or transaction?
That is expected and documented behaviour:
From the manual:
Since these functions return the start time of the current transaction, their values do not change during the transaction. This is considered a feature: the intent is to allow a single transaction to have a consistent notion of the "current" time, so that multiple modifications within the same transaction bear the same time stamp.
If you want something that changes each time you run a statement, you need to use statement_timestamp() or even clock_timestamp() (again see the description in the manual)
now() and current_timestamp (the latter without parentheses - odd SQL standard) are STABLE functions returning the point in time when the transaction started as timestamptz.
Consider one of the other options PostgreSQL offers, in particular statement_timestamp(). The manual:
statement_timestamp() returns the start time of the current statement (more specifically, the time of receipt of the latest command message from the client)
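The difference is easy to observe inside a single transaction. The following is only an illustrative sketch (the column aliases and the event_demo table are made up); the actual values depend on when it is run.

```sql
BEGIN;

-- First statement: all three values are close together.
SELECT now()                 AS txn_start,
       statement_timestamp() AS stmt_start,
       clock_timestamp()     AS wall_clock;

SELECT pg_sleep(1);

-- Second statement: txn_start is unchanged, stmt_start has advanced by
-- about a second, and wall_clock keeps moving even within one statement.
SELECT now()                 AS txn_start,
       statement_timestamp() AS stmt_start,
       clock_timestamp()     AS wall_clock;

COMMIT;

-- Hypothetical table whose default differs for every inserted row,
-- even when the inserts share one transaction or one function call.
CREATE TABLE event_demo (
  id         bigserial PRIMARY KEY,
  created_at timestamptz NOT NULL DEFAULT clock_timestamp()
);
```

A default of statement_timestamp() would instead give one value per statement, which is usually enough when each insert is its own statement sent by the client; inside a server-side function, all statements run as part of the calling statement, so clock_timestamp() is the one that changes between them.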
Related:
Difference between now() and current_timestamp