casting text to date in redshift - amazon-redshift

I have stored dates in a column of type text. A few invalid dates are preventing me from running any date-related operation. For example:
select case when deliver_date = '0000-00-00 00:00:00' then '2014-01-01' else deliver_date::date end as new_date, count(*) as cnt from some_table group by new_date
Error in query: ERROR: Error converting text to date
I am using the following work-around that seems to be working.
select left(deliver_date,10) as new_date, count(*) as cnt from sms_dlr group by new_date
But I would like to know if it is possible to convert this column to date.

You need to separate the valid and invalid date values.
One solution is to use regular expressions. I'll let you decide how thorough you want to be, but this will broadly cover date and datetime values:
SELECT
CASE
WHEN
deliver_date SIMILAR TO '[0-9]{4}-[0-9]{2}-[0-9]{2}'
OR deliver_date SIMILAR TO '[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}'
THEN TO_DATE(deliver_date, 'YYYY-MM-DD')
ELSE TO_DATE('2014-01-01', 'YYYY-MM-DD')
END as new_date,
COUNT(*) as cnt
FROM some_table
GROUP BY new_date
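Note that a pattern check only validates the shape of the string. The value '0000-00-00 00:00:00' from the question matches the second pattern, so it may still reach TO_DATE and fail there. As a rough illustration outside the database, here is a minimal Python sketch of "check the shape, then check that the date really exists". The names SHAPE, FALLBACK and clean_date are made up for this example, and the fallback date is the one used in the question.

```python
import re
from datetime import datetime

# Same shape as the two SIMILAR TO patterns: a date, optionally followed by a time.
SHAPE = re.compile(r"\d{4}-\d{2}-\d{2}( \d{2}:\d{2}:\d{2})?")
FALLBACK = "2014-01-01"  # placeholder date taken from the question (assumption)


def clean_date(value):
    """Return the YYYY-MM-DD part if it is a real calendar date, else FALLBACK."""
    if not SHAPE.fullmatch(value):
        return FALLBACK
    try:
        # strptime raises ValueError for impossible dates such as month 00
        return datetime.strptime(value[:10], "%Y-%m-%d").date().isoformat()
    except ValueError:
        return FALLBACK
```

The shape test alone accepts '0000-00-00 00:00:00'; only the second step rejects it. In SQL the equivalent would be an extra condition that excludes such values before the conversion.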

Try dropping the ::date part:
select
cast(
case
when deliver_date = '0000-00-00 00:00:00' then '2014-01-01'
else deliver_date
end
as date
) as new_date,
count(*) as cnt
from some_table
group by new_date

Related

Postgres - Pass dynamically generated date to where clause

I need to generate a series of dates up to current_date, based on the job's last run date:
last run date ='2022-10-01'
current date = '2022-10-05'
The generated dates should look like this:
varchar dynamic_date = '2022-10-01','2022-10-02','2022-10-03','2022-10-04','2022-10-05'
and then be passed to the WHERE clause:
select *
from t1
where created_date in (dynamic_date)
This is not allowed because dynamic_date is varchar and created_date is a date column.
I am trying to find an efficient way to do this.
You can use generate_series():
select *
from t1
where created_date in (select g.dt::date
from generate_series(date '2022-10-01',
current_date,
interval '1 day') as g(dt)
)
Or even simpler:
select *
from t1
where created_date >= date '2022-10-01'
and created_date <= current_date
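To see why the two queries select the same rows, it can help to spell out what the series contains. The following Python sketch is only an illustration of the idea, not of how PostgreSQL implements generate_series(); the function name date_series is made up, and the end date is fixed to the value from the question instead of the real current date.

```python
from datetime import date, timedelta


def date_series(start, stop):
    """Every calendar day from start to stop, both ends included."""
    days = (stop - start).days
    # range() is empty when stop is before start, so the result is then []
    return [start + timedelta(days=i) for i in range(days + 1)]


# Dates from the question: last run date and (assumed) current date.
dates = date_series(date(2022, 10, 1), date(2022, 10, 5))
```

Since the series holds every day between the two bounds, testing membership in it is the same as testing that created_date lies between the bounds, which is what the simpler range query does.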

Create additional columns based on other column values in PostgreSQL

I have following data in a PostgreSQL table:
trial start_date end_date
1 20_12_2001 20_01_2005
The expected output is below:
trial start_date end_date Date[(start_end_date)] marker_start_end
1 20_12_2001 20_01_2005 20_12_2001 start
1 20_12_2001 20_01_2005 20_01_2005 end
Is there a way to calculate the two additional columns (Date[(start_end_date)], marker_start_end) without a join, using a CASE expression instead?
You can use a lateral join to turn two columns into two rows:
select *
from the_table t
cross join lateral (
values (t.start_date, 'start'), (t.end_date, 'end')
) as x(start_end_date, marker);
The UNION ALL solution might be faster though.
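As a rough picture of what the lateral VALUES clause does, here is the same "two columns into two rows" step sketched in Python. The function name unpivot and the dictionary layout are assumptions made for this illustration; the sample row is the one from the question.

```python
def unpivot(rows):
    """Emit two output rows per input row: one for the start date, one for the end date."""
    out = []
    for row in rows:
        # Mirrors: values (t.start_date, 'start'), (t.end_date, 'end')
        pairs = ((row["start_date"], "start"), (row["end_date"], "end"))
        for value, marker in pairs:
            out.append({**row, "start_end_date": value, "marker": marker})
    return out


rows = [{"trial": 1, "start_date": "20_12_2001", "end_date": "20_01_2005"}]
result = unpivot(rows)
```

Each input row is read once and expanded in place, which is the main difference from UNION ALL, where the table is scanned once per branch.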
UNION ALL
select trial, start_date, end_date, start_date as date, 'start' marker_start_end from table1
union all
select trial, start_date, end_date, end_date as date, 'end' marker_start_end from table1
UNNEST with CASE
select trial, start_date, end_date,
case when a.num = 1 then start_date else end_date end date,
case when a.num = 1 then 'start' else 'end' end marker_start_end from
(
select trial, start_date, end_date,
unnest(array[1,2]) num from table1
) a
Hidden JOIN (but still a join)
select
trial,
start_date,
end_date,
case when a.num = 1 then start_date else end_date end date,
marker_start_end
from table1, (values(1,'start'),(2, 'end')) a(num,marker_start_end)

PGSQL selecting columns on certain conditions

In a table I have 5 columns day1, day2... day5. Records in the table can have all days set to TRUE or few days set to TRUE.
Is there any way in PGSQL to select only those columns of a record whose boolean value is TRUE?
Example:
My table is: Course, with columns as Course Name, Day1, Day2, Day3, Day4,Day5 with record set as
English,True,False,True,False,True
German,False,False,True,True,True
French,False,True,False,True,True
What I need to display as result set is:
English,Mon,Wed,Fri
German,Wed,Thu,Fri
French,Tue,Thu,Fri
I believe something like the following should do the job. It's a bit ugly because your schema isn't the most awesome. This should work on Postgres 9+
SELECT course, string_agg(day, ',') as days_of_week
FROM
(
SELECT course, 'Mon' as day FROM yourtable WHERE day1 = 'True'
UNION ALL
SELECT course, 'Tue' as day FROM yourtable WHERE day2 = 'True'
UNION ALL
SELECT course, 'Wed' as day FROM yourtable WHERE day3 = 'True'
UNION ALL
SELECT course, 'Thu' as day FROM yourtable WHERE day4 = 'True'
UNION ALL
SELECT course, 'Fri' as day FROM yourtable WHERE day5 = 'True'
) sub
GROUP BY course
PostgreSQL has no built-in iif function, but you can easily create one:
create or replace function iif(boolean, anyelement, anyelement = null) returns anyelement
language sql
immutable
as $$
select case when $1 is null then null when $1 then $2 else $3 end
$$;
then:
select
course_name,
concat_ws(',', iif(day1,'Mon'), iif(day2,'Tue'), iif(day3,'Wed'), iif(day4,'Thu'), iif(day5, 'Fri'))
from course;
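The approach relies on concat_ws skipping NULL arguments, so the days that are false simply disappear from the list. A small Python sketch of the same logic, using the sample rows from the question; the Python functions iif, concat_ws and days_of_week below are stand-ins written for this illustration, not database calls.

```python
def iif(condition, if_true, if_false=None):
    """Python stand-in for the SQL iif() defined above."""
    if condition is None:
        return None
    return if_true if condition else if_false


def concat_ws(separator, *values):
    """Join the values with the separator, skipping None like concat_ws skips NULL."""
    return separator.join(v for v in values if v is not None)


DAY_NAMES = ("Mon", "Tue", "Wed", "Thu", "Fri")


def days_of_week(flags):
    """Map five booleans (day1..day5) to a comma-separated list of day names."""
    return concat_ws(",", *(iif(flag, name) for flag, name in zip(flags, DAY_NAMES)))
```

For example, days_of_week((True, False, True, False, True)) corresponds to the English row.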

How to "loop" through dates in PostgreSQL

Say I have a query with a nested query inside of a where condition.
SELECT COUNT(id)
FROM table
WHERE create_date = date_trunc('month', current_timestamp)
and id NOT IN (
SELECT DISTINCT id
FROM some_table
WHERE date_trunc('month', current_timestamp)
)
This query gets the metric for this month. However, what if I want it for all months?
I tried this query, although it doesn't seem to run/takes a very long time:
SELECT date_trunc('month', t.create_date), COUNT(id)
FROM table t
WHERE id NOT IN (
SELECT DISTINCT id
FROM some_table tt
WHERE date_trunc('month', tt.create_date)= date_trunc('month', t.create_date)
)
GROUP BY date_trunc('month', t.create_date)
I would like to execute this command via Postgres CLI (from the command line).
Any guidance to make this query more efficient or logical appreciated!

Extracting the number of days from a calculated interval

I am trying to get a query like the following one to work:
SELECT EXTRACT(DAY FROM INTERVAL to_date - from_date) FROM histories;
In the referenced table, to_date and from_date are of type timestamp without time zone. A regular query like
SELECT to_date - from_date FROM histories;
gives me interval results such as '65 days 04:58:09.99'. But using this expression inside the first query gives me an error: invalid input syntax for type interval. I've tried various quotations and even nesting the query without luck. Can this be done?
SELECT EXTRACT(DAY FROM INTERVAL to_date - from_date) FROM histories;
This makes no sense. INTERVAL xxx is syntax for interval literals. So INTERVAL from_date is a syntax error, since from_date isn't a literal. If your code really looks more like INTERVAL '2012-02-01' then that's going to fail, because 2012-02-01 is not valid syntax for an INTERVAL.
The INTERVAL keyword here is just noise. I suspect you misunderstood an example from the documentation. Remove it and the expression will be fine.
I'm guessing you're trying to get the number of days between two dates represented as timestamp or timestamptz.
If so, either cast both to date:
SELECT to_date::date - from_date::date FROM histories;
or get the interval, then extract the day component:
SELECT extract(day from to_date - from_date) FROM histories;
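The two variants can give different numbers, because the first counts calendar-date boundaries and the second counts whole 24-hour periods. A quick way to see the difference outside the database is this Python sketch; the two timestamps are made-up sample values, and timedelta only mirrors the SQL behaviour here because to_ts is later than from_ts.

```python
from datetime import datetime

# Hypothetical sample values, chosen so that the two methods disagree.
from_ts = datetime(2020, 1, 1, 23, 0, 0)
to_ts = datetime(2020, 1, 3, 1, 0, 0)

# Like extract(day from to_date - from_date): the difference is 1 day 02:00:00.
whole_days = (to_ts - from_ts).days

# Like to_date::date - from_date::date: the time of day is dropped first.
calendar_days = (to_ts.date() - from_ts.date()).days
```

Here whole_days is 1 and calendar_days is 2, so pick the variant that matches what "number of days" means in your report.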
This example demonstrates the creation of a table with a trigger which updates the difference between stop_time and start_time in DDD HH24:MI:SS format, where DDD stands for the number of days ...
DROP TABLE IF EXISTS benchmarks ;
SELECT 'create the "benchmarks" table'
;
CREATE TABLE benchmarks (
guid UUID NOT NULL DEFAULT gen_random_uuid()
, id bigint UNIQUE NOT NULL DEFAULT cast (to_char(current_timestamp, 'YYMMDDHH12MISS') as bigint)
, git_hash char (8) NULL DEFAULT 'hash...'
, start_time timestamp NOT NULL DEFAULT DATE_TRUNC('second', NOW())
, stop_time timestamp NOT NULL DEFAULT DATE_TRUNC('second', NOW())
, diff_time varchar (20) NOT NULL DEFAULT 'HH:MI:SS'
, update_time timestamp DEFAULT DATE_TRUNC('second', NOW())
, CONSTRAINT pk_benchmarks_guid PRIMARY KEY (guid)
) WITH (
OIDS=FALSE
);
create unique index idx_uniq_benchmarks_id on benchmarks (id);
-- START trigger trg_benchmarks_upsrt_diff_time
-- hrt = human readable time
CREATE OR REPLACE FUNCTION fnc_benchmarks_upsrt_diff_time()
RETURNS TRIGGER
AS $$
BEGIN
-- NEW.diff_time = age(NEW.stop_time::timestamp-NEW.start_time::timestamp);
NEW.diff_time = to_char(NEW.stop_time-NEW.start_time, 'DDD HH24:MI:SS');
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trg_benchmarks_upsrt_diff_time
BEFORE INSERT OR UPDATE ON benchmarks
FOR EACH ROW EXECUTE PROCEDURE fnc_benchmarks_upsrt_diff_time();
--
-- STOP trigger trg_benchmarks_upsrt_diff_time
Just remove the keyword INTERVAL:
SELECT EXTRACT(DAY FROM to_date - from_date) FROM histories;