How to identify invalid dates in postgres table field? - postgresql

I have a table in PostgreSQL that has two date fields (start and end). There are many invalid dates in both fields, such as 0988-08-11, 4987-09-11, etc. Is there a simple query to identify them? The data type of the fields is DATE. Thanks in advance.

Values in a date column are valid by definition. The year 0988 (988) is a valid historic date, just as the year 4987 is a valid date far in the future.
To filter out dates that are too far in the past or too far in the future, use a query like this:
SELECT
date_col
FROM
table
WHERE
date_col < /* <MINIMUM DATE> */
OR date_col > /* <MAXIMUM DATE> */
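As a concrete illustration of the filter above, the sketch below plugs in bounds of 1900-01-01 and 2100-01-01. The bounds and the table name my_table are assumptions for illustration only; use whatever limits your data actually requires.

```sql
-- Hypothetical bounds and table name; adjust to your own rules.
SELECT date_col
FROM my_table
WHERE date_col < DATE '1900-01-01'   -- earlier than the minimum date
   OR date_col > DATE '2100-01-01';  -- later than the maximum date
```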
For date ranges (your minimum and maximum date) you could use the daterange functionality:
https://www.postgresql.org/docs/current/static/rangetypes.html
https://www.postgresql.org/docs/current/static/functions-range.html
Example table:
start_date | end_date   | note
2015-01-01 | 2017-01-01 | valid
0200-01-01 | 0900-01-01 | completely too early
3000-01-01 | 4000-01-01 | completely too late
0200-01-01 | 2000-01-01 | begin too early
2000-01-01 | 4000-01-01 | end too late
0200-01-01 | 4000-01-01 | begin too early, end too late
Query:
SELECT
start_date,
end_date
FROM
dates
WHERE
daterange('1900-01-01', '2100-01-01') @> daterange(start_date, end_date)
Result:
start_date | end_date
2015-01-01 | 2017-01-01
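Since the question asks for the invalid rows, the containment check can also be negated. The following is a minimal sketch, assuming the same dates table and the same illustrative bounds. It handles rows where start_date is later than end_date separately, because daterange() raises an error when the lower bound is greater than the upper bound.

```sql
-- List the rows that fall (partly) outside the accepted range.
SELECT
    start_date,
    end_date
FROM
    dates
WHERE
    CASE
        -- daterange() would fail for these rows, so flag them directly
        WHEN start_date > end_date THEN true
        ELSE NOT (daterange('1900-01-01', '2100-01-01') @> daterange(start_date, end_date))
    END;
```

The CASE expression is used instead of a plain OR so that the range is only built for rows where the bounds are in order.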

Those are valid dates, but if your business rules say they are not valid for your purpose, you can delete them based on those rules.
For example, if you don't want any dates before 1900-01-01 or on or after 2999-01-01, this statement deletes the records with those dates:
DELETE FROM mytable
WHERE
start_date < '1900-01-01'::DATE OR
start_date >= '2999-01-01'::DATE OR
end_date < '1900-01-01'::DATE OR
end_date >= '2999-01-01'::DATE;
If you want to replace the dates with the lowest/highest acceptable dates instead of deleting the entire record, you could do something like this:
UPDATE mytable
SET
start_date = least('2999-01-01'::DATE, greatest('1900-01-01'::DATE, start_date)),
end_date = least('2999-01-01'::DATE, greatest('1900-01-01'::DATE, end_date))
WHERE
start_date < '1900-01-01'::DATE OR
start_date >= '2999-01-01'::DATE OR
end_date < '1900-01-01'::DATE OR
end_date >= '2999-01-01'::DATE;
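Before running a destructive statement like the ones above, it can help to preview the affected rows. One possible approach, sketched here with the same mytable and the same bounds, is to run the DELETE inside a transaction with RETURNING and roll it back:

```sql
BEGIN;

-- RETURNING shows exactly which rows the statement removes.
DELETE FROM mytable
WHERE
    start_date < '1900-01-01'::DATE OR
    start_date >= '2999-01-01'::DATE OR
    end_date < '1900-01-01'::DATE OR
    end_date >= '2999-01-01'::DATE
RETURNING *;

ROLLBACK;  -- change to COMMIT once the returned rows look right
```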

Related

How do I know that these two dates are in the range of start date and end date? (PostgreSQL)

How do I know that these two dates are in the range of start date and end date?
for example, I have data in the database as follows:
id | start_date          | end_date
1  | 2022-11-07 09:00:00 | 2022-11-07 16:00:00
2  | 2022-11-08 10:00:00 | 2022-11-08 12:00:00
Input:
start_date : '2022-11-07 08:00:00'
end_date : '2022-11-07 09:00:00'
If I input the data above, it should not count as being in the range, right? But when I use this query, the row is still returned as being in the range:
SELECT
start_date,
end_date
FROM
table
WHERE
(start_date >= '2022-11-07 08:00:00' AND start_date <= '2022-11-07 09:00:00')
OR
(end_date >= '2022-11-07 08:00:00' AND end_date <= '2022-11-07 09:00:00')
Could you help me? Thank you in advance.
I want the right query for checking a range between two dates.
The easiest way to deal with ranges is to use a range type:
select *
from the_table
where tsrange(start_date, end_date) @> tsrange('2022-11-07 08:00:00', '2022-11-07 09:00:00');
The @> is the "contains" operator and is true if the left-hand range completely includes the range on the right-hand side.
The above creates ranges that exclude the upper value. If the ending time should be included, use e.g. tsrange(start_date, end_date, '[]').
Alternatively you can use the standard compliant overlaps operator:
select *
from the_table
where (start_date, end_date) overlaps (timestamp '2022-11-07 08:00:00', timestamp '2022-11-07 09:00:00');
I am, however, not 100% sure that both solutions behave the same, especially in edge cases.
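To see how the edge case from the question behaves, the sketch below evaluates the operators on literal values for row 1 (09:00 to 16:00) against the input (08:00 to 09:00). The && range operator ("overlap") is not part of the answer above and is added here only for comparison.

```sql
SELECT
    -- containment: does the stored range fully include the input range?
    tsrange('2022-11-07 09:00', '2022-11-07 16:00')
        @> tsrange('2022-11-07 08:00', '2022-11-07 09:00') AS contains_input,        -- false
    -- overlap with the default bounds '[)': upper bounds are excluded
    tsrange('2022-11-07 09:00', '2022-11-07 16:00')
        && tsrange('2022-11-07 08:00', '2022-11-07 09:00') AS overlaps_half_open,    -- false
    -- overlap with inclusive bounds '[]': both ranges share 09:00
    tsrange('2022-11-07 09:00', '2022-11-07 16:00', '[]')
        && tsrange('2022-11-07 08:00', '2022-11-07 09:00', '[]') AS overlaps_closed, -- true
    -- SQL-standard OVERLAPS treats periods as start <= time < end
    (timestamp '2022-11-07 09:00', timestamp '2022-11-07 16:00')
        OVERLAPS (timestamp '2022-11-07 08:00', timestamp '2022-11-07 09:00') AS sql_overlaps; -- false
```

Note that @> tests containment while && and OVERLAPS test for any shared point in time, so the two solutions answer slightly different questions even when they agree on this input.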

Updating date field with min date from selected dates if they are >= to current_date, except when all dates are >= to current date

I would like to update the contents of the Date1 column to reflect the oldest date in each row, unless the date has already passed (Date1 < current date), in which case I'd like Date1 to be populated with the second-oldest date in the row.
ID  | Date 1     | Date 2     | Date 3     | Date 4
001 | 01/14/2022 | 01/14/2022 | 01/15/2022 | 01/16/2022
002 | 04/15/2019 | 04/15/2019 | 01/10/2021 | 01/10/2021
I am currently using
update mytable t
set date1 = (
select min(dt)
from (values (date2), (date3), (date4)) d(dt)
where dt >= current_date
)
The only problem I run into is when all available dates are prior to the current date. In this case it overwrites the value in the date1 column with null, which is not ideal. I'd like the query to leave the date1 field intact in these instances.
Figured it out:
update mytable t
set date1 = coalesce((
select min(dt)
from (values (date2), (date3), (date4)) d(dt)
where dt >= current_date
), date1);
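An alternative to wrapping the subquery in coalesce is to skip the rows that have no upcoming date at all, so they are never rewritten. This is only a sketch under the same table layout. It relies on PostgreSQL's greatest() ignoring NULL arguments: if the largest of the three dates is not on or after current_date, none of them is.

```sql
UPDATE mytable t
SET date1 = (
    SELECT min(dt)
    FROM (VALUES (date2), (date3), (date4)) d(dt)
    WHERE dt >= current_date
)
-- only touch rows where at least one candidate date has not yet passed
WHERE greatest(date2, date3, date4) >= current_date;
```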

How to select specific dates in PostgreSQL?

My table:
create table example
(
code varchar(7),
date date,
CONSTRAINT pk_date PRIMARY KEY (code)
);
Dates:
insert into example(code, date)
values('001','2016/05/12');
insert into example(code, date)
values('002','2016/04/11');
insert into example(code, date)
values('003','2017/02/03');
My problem: how do I select the dates previous to six months from today?
In MySQL I can use PERIOD_DIFF, but what about PostgreSQL?
You can try an INTERVAL expression:
SELECT date
FROM example
WHERE date < CURRENT_DATE + INTERVAL '6 months'
AND date > CURRENT_DATE;
This returns the dates after today and before the date six months from now.
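If the goal is instead the six months before today, which is one possible reading of the question, the interval can be subtracted rather than added. A minimal sketch against the same example table:

```sql
-- Dates from six months ago up to and including today.
SELECT date
FROM example
WHERE date >= CURRENT_DATE - INTERVAL '6 months'
  AND date <= CURRENT_DATE;
```

To get the dates older than six months, the condition would be date < CURRENT_DATE - INTERVAL '6 months' on its own.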

Postgres where clause compare timestamp

I have a table where a column is of datatype timestamp.
It contains multiple records per day.
I want to select all rows corresponding to a given day.
How do I do it?
Assuming you actually mean timestamp, because there is no datetime in Postgres:
Cast the timestamp column to a date, which removes the time part:
select *
from the_table
where the_timestamp_column::date = date '2015-07-15';
This will return all rows from July 15th, 2015.
Note that the above will not use an index on the_timestamp_column. If performance is critical, you need to either create an index on that expression or use a range condition:
select *
from the_table
where the_timestamp_column >= timestamp '2015-07-15 00:00:00'
and the_timestamp_column < timestamp '2015-07-16 00:00:00';
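The expression index mentioned above could look like the sketch below. The index name is hypothetical, and the example assumes the column is a plain timestamp (without time zone). For a timestamp with time zone column the cast to date depends on the session time zone, so PostgreSQL rejects it in an index expression.

```sql
-- Index on the cast expression so that
-- "the_timestamp_column::date = date '2015-07-15'" can use it.
-- Note the double parentheses required around an expression.
CREATE INDEX the_table_ts_date_idx
    ON the_table ((the_timestamp_column::date));
```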

Count data per day in a specific month in Postgresql

I have a table with a creation date called created_at and a deletion date called deleted_at for each record. If the record was deleted, the field stores that date; it's a logical delete.
I need to count the active records in a specific month. To show what I mean by an active record, here is an example.
For this example we'll use this hypothetical record:
id | created_at | deleted_at
1 | 23-01-2014 | 05-06-2014
This record is active on every day between its creation date and its deletion date, including the last one. So if I need to count the active records for March, this record must be counted on every day of that month.
I have a query (really easy to write) that shows the active records for a specific month, but my main problem is how to count the active records for each day in that month.
SELECT
date_trunc('day', created_at) AS dia_creacion,
date_trunc('day', deleted_at) AS dia_eliminacion
FROM
myTable
WHERE
created_at < TO_DATE('01-04-2014', 'DD-MM-YYYY')
AND (deleted_at IS NULL OR deleted_at >= TO_DATE('01-03-2014', 'DD-MM-YYYY'))
Here you are:
select
    TO_DATE('01-03-2014', 'DD-MM-YYYY') + g.i,
    -- count only the records that are active on this day
    count(case
              when TO_DATE('01-03-2014', 'DD-MM-YYYY') + g.i
                   between created_at and coalesce(deleted_at, TO_DATE('01-03-2014', 'DD-MM-YYYY') + g.i)
              then 1
          end)
-- offsets 0 .. 30 give one row per day of March; the "- 1" keeps April 1st out
from generate_series(0, TO_DATE('01-04-2014', 'DD-MM-YYYY') - TO_DATE('01-03-2014', 'DD-MM-YYYY') - 1) as g(i)
left join myTable on true
group by 1
order by 1;
You can add a more specific join condition so that only relevant records from myTable are joined, but even without one this gives you an idea of how to achieve the desired counting.
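A more specific join condition, as mentioned above, might look like the following sketch. It assumes myTable has the id column shown in the example record and generates the days of March 2014 as timestamps; the aliases d, t, day and active_records are illustrative.

```sql
SELECT
    d.day::date AS day,
    count(t.id) AS active_records  -- counts 0 for days without a match
FROM generate_series(timestamp '2014-03-01',
                     timestamp '2014-03-31',
                     interval '1 day') AS d(day)
LEFT JOIN myTable t
       -- created on this day or earlier
       ON t.created_at < d.day + interval '1 day'
       -- not deleted yet, or deleted on this day or later
      AND (t.deleted_at IS NULL OR t.deleted_at >= d.day)
GROUP BY d.day
ORDER BY d.day;
```

Because only matching records are joined, count(t.id) replaces the CASE expression, and the deletion day itself still counts as active.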