Optimizing query for huge database - postgresql

I am using a SQL query for a monthly extraction of data from a huge PostgreSQL replica database that stores location data. Currently I have split it into 3 parts (10 days each), and each part takes roughly 21 hours to complete. Is there any way to optimize the query and process the data more quickly?
select
asset_dcs.registration_number,
date_trunc('day', transmitter_received_dttm + '08:00:00' + '-04:00:00') AS bussines_date,
min(seq_num) as min_seq_num,
max(seq_num) as max_seq_num,
count (*) row_count
from dcs_posn
LEFT OUTER JOIN asset_dcs on (asset_id = asset_dcs.id)
where 1=1
and date_trunc('day', transmitter_received_dttm + '08:00:00' + '-04:00:00') > '2015-12-31'
and date_trunc('day', transmitter_received_dttm + '08:00:00' + '-04:00:00') <= '2016-01-10'
group by asset_id, bussines_date, asset_dcs.registration_number;

The most obvious improvement is in your filter:
where 1=1
and date_trunc('day', transmitter_received_dttm + '08:00:00' + '-04:00:00') > '2015-12-31'
and date_trunc('day', transmitter_received_dttm + '08:00:00' + '-04:00:00') <= '2016-01-10'
should be rewritten as:
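WHERE transmitter_received_dttm >= '2015-12-31 20:00:00'::timestamp
AND transmitter_received_dttm < '2016-01-10 20:00:00'::timestamp
The way you use date_trunc() is very wasteful: the expression has to be evaluated for every row, and wrapping the column in a function keeps PostgreSQL from using a plain index on transmitter_received_dttm. Note the bounds: the net shift is +4 hours, so the first qualifying business day starts at 2015-12-31 20:00:00 (inclusive) and the last one ends just before 2016-01-10 20:00:00 (exclusive).
Otherwise, you should add the EXPLAIN output to your question so that we can see the query plan, as well as other performance-related information such as any indexes.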
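A rewrite like this is easy to get wrong by one boundary value, so it can help to check it outside the database first. The following is a minimal sketch in Python, not PostgreSQL: the function names are made up for illustration, and it assumes transmitter_received_dttm behaves like a plain timestamp without time zone. It compares the original date_trunc() filter with a range on the raw column, minute by minute around both edges.

```python
from datetime import datetime, timedelta

# '08:00:00' + '-04:00:00' nets out to a +4 hour shift.
OFFSET = timedelta(hours=8) + timedelta(hours=-4)


def business_day(ts):
    """Mimic date_trunc('day', ts + '08:00:00' + '-04:00:00')."""
    shifted = ts + OFFSET
    return shifted.replace(hour=0, minute=0, second=0, microsecond=0)


def original_filter(ts):
    """The filter from the question, applied to the truncated value."""
    return datetime(2015, 12, 31) < business_day(ts) <= datetime(2016, 1, 10)


def range_filter(ts):
    """Candidate rewrite: a half-open range on the raw column."""
    return datetime(2015, 12, 31, 20) <= ts < datetime(2016, 1, 10, 20)


def filters_agree(start, stop, step):
    """Return True if both filters give the same answer on every sample."""
    ts = start
    while ts <= stop:
        if original_filter(ts) != range_filter(ts):
            return False
        ts += step
    return True


if __name__ == "__main__":
    print(filters_agree(datetime(2015, 12, 30), datetime(2016, 1, 12),
                        timedelta(minutes=1)))
```

Under these assumptions the two filters only agree when the lower bound is inclusive and the upper bound is exclusive.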

Extract days of date range grouped by month - PostgreSQL

I have a pickupDate and a returnDate in my OrderHistory table. I want to extract the sum of rental days of all OrderHistory entries, grouped and ordered by month. A CTE seems to be the solution, but I don't understand how to use one in my query, since the CTEs I saw were referring to themselves where it says "FROM cte".
I tried something like this:
SELECT
SUM((EXTRACT (DAY FROM("OrderHistory"."returnDate")-("OrderHistory"."pickupDate")))) as traveltime
, to_char("OrderHistory"."pickupDate"::date, 'YYYY-MM') as M
FROM
"OrderHistory"
GROUP BY
M
ORDER BY
M
But the result doesn't split bookings between two months (e.g. pickupDate = 27 March 2022 and returnDate = 3 April 2022); it assigns the whole 7 days to March, since the pickupDate is in it. It should show 4 days in March and 3 in April.
Sorry for the probably very basic question, but I am a beginner. (My code is written in PostgreSQL.)
See "PostgreSQL naming conventions" and "Are PostgreSQL column names case-sensitive?": use legal, lower-case names exclusively so double-quoting is not needed.
Final result in db fiddle.
Add daterange column.
alter table order_history add column date_ranges daterange;
with a(m_begin, m_end, pickup_date) as
(select date_trunc('month', pickup_date)::date,
(date_trunc('month', pickup_date) + interval '1 month - 1 day')::date,
pickup_date from order_history)
update order_history set date_ranges =
daterange(a.m_begin, a.m_end,'[]') from a
where a.pickup_date = order_history.pickup_date;
then final query:
WITH A AS(
select
pickup_date,
return_date,
return_date - pickup_date as total,
case when return_date <@ date_ranges then (return_date - pickup_date)
else ( date_trunc('month', pickup_date) + interval '1 month - 1 day')::date - pickup_date
end partial_mth
from order_history),
b as (SELECT *, a.total - partial_mth parital_not_mth FROM a)
select *,
case when to_char(pickup_date,'YYYY-MM') = to_char(return_date,'YYYY-MM')
then
sum(partial_mth) over(partition by to_char(pickup_date,'YYYY-MM')) +
sum(parital_not_mth) over (partition by to_char(return_date,'YYYY-MM'))
else sum(partial_mth) over(partition by to_char(pickup_date,'YYYY-MM'))
end
from b;
After trying different things, I think I found the best answer to my question, which I want to share with the community:
WITH hier as (
SELECT
"OrderHistory"."pickupDate" as start_date
, "OrderHistory"."returnDate" as end_date
, to_char("OrderHistory"."pickupDate"::date, 'YYYY-MM') as M
FROM
"OrderHistory"
GROUP BY
1, 2, 3
ORDER BY
3
), calendar as (
select date '2022-01-01' + (n || ' days')::interval calendar_date
from generate_series(0, 365) n
)
select
to_char(calendar_date::date, 'YYYY-MM')
, count(*) as tage_gebucht
from calendar
inner join hier on calendar.calendar_date between start_date and end_date
where calendar_date between '2022-01-01' and '2022-12-31'
group by 1
order by 1;
I think this is the simplest solution I came up with.
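The idea behind the calendar join is to expand every booking into its individual days and then count days per month. Here is a rough Python sketch of that idea; the function name is hypothetical, and it assumes the pickup and return values are plain dates and that both ends count, as they do with BETWEEN.

```python
from collections import Counter
from datetime import date, timedelta


def booked_days_per_month(bookings):
    """Count booked calendar days per 'YYYY-MM' for (start, end) date pairs."""
    counts = Counter()
    for start, end in bookings:
        day = start
        while day <= end:  # inclusive on both ends, like BETWEEN
            counts[day.strftime("%Y-%m")] += 1
            day += timedelta(days=1)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # The booking from the question: 27 March 2022 to 3 April 2022.
    print(booked_days_per_month([(date(2022, 3, 27), date(2022, 4, 3))]))
```

Because both ends are counted, the example booking yields 5 days in March and 3 in April rather than the 4 and 3 from the question. Whether the pickup day or the return day should count is a business decision; changing `<=` to `<` would drop the return day instead.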

Count Distinct on Converted and Concatenated Columns and Grouping by Each Record

select
distinct convert(varchar(8), Creative.Width) + 'x' + convert(varchar(8), Creative.Height) as FormatName
from Creative
where CreativeFileDate > '1 SEP 18'
The query pulls the unique records as per my concatenation. How do I most efficiently find the count of each one?
Thank you
You'll want to use group by and count
https://www.w3schools.com/sql/sql_groupby.asp
https://www.w3schools.com/sql/sql_count_avg_sum.asp
My SQL is a bit rusty, but your query should look something like:
select convert(varchar(8), Creative.Width) + 'x' + convert(varchar(8), Creative.Height) as FormatName, count(0)
from Creative
where CreativeFileDate > '1 SEP 18'
group by convert(varchar(8), Creative.Width) + 'x' + convert(varchar(8), Creative.Height)
Cheers
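For intuition, grouping by the concatenated expression and counting is the same as building the label for each row and tallying the labels. A small Python sketch of that (the function name and the sample sizes are made up for illustration):

```python
from collections import Counter


def format_counts(creatives):
    """Tally 'WIDTHxHEIGHT' labels for a list of (width, height) pairs."""
    return Counter(f"{width}x{height}" for width, height in creatives)


if __name__ == "__main__":
    sample = [(300, 250), (728, 90), (300, 250)]
    print(format_counts(sample))
```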

Unify select sql. Postgres

Can I combine the two selects below into a single query, where the first column returns the result of the first select and the second column the result of the second?
select count(*) from rrhh.empleado where fecha_contratado > current_date - interval '100 days'; -- select1
select count(*) from rrhh.empleado where fecha_fin_contrato > current_date - interval '100 days'; -- select2
Thank you
try:
with a as (
select
case when fecha_contratado > current_date - interval '100 days' then 1
else 0 end q1
, case when fecha_fin_contrato > current_date - interval '100 days' then 1
else 0 end q2
from rrhh.empleado
)
select sum(q1), sum(q2)
from a
;
This is a typical case for conditional aggregation:
select count(*) filter (where fecha_contratado > current_date - interval '100 days'),
count(*) filter (where fecha_fin_contrato > current_date - interval '100 days')
from rrhh.empleado
For versions earlier than 9.4, you can use a CASE expression instead (relying on the fact that most aggregates ignore NULL values):
select count(case when fecha_contratado > current_date - interval '100 days' then 1 end),
count(case when fecha_fin_contrato > current_date - interval '100 days' then 1 end)
from rrhh.empleado
Note: these queries will scan the whole table, while your original queries could make use of indexes on fecha_contratado and fecha_fin_contrato. If performance matters to you, you could append a filter to these queries too. The filter has to keep every row that satisfies at least one of the two conditions, so it must test the later of the two dates:
where greatest(fecha_contratado, fecha_fin_contrato) > current_date - interval '100 days'
and you could index the expression: greatest(fecha_contratado, fecha_fin_contrato).
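To see the CASE-based counting at work without a PostgreSQL server, here is a small sketch that runs the same shape of query against an in-memory SQLite database. SQLite is only a stand-in here: the dates are stored as ISO text, the cutoff is passed in as a parameter instead of current_date - interval '100 days', and the function name is hypothetical.

```python
import sqlite3


def count_recent(rows, cutoff):
    """Return (hired_after_cutoff, contract_end_after_cutoff) in one pass."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE empleado (fecha_contratado TEXT, fecha_fin_contrato TEXT)"
    )
    conn.executemany("INSERT INTO empleado VALUES (?, ?)", rows)
    # count() skips NULLs, and CASE without ELSE yields NULL when the
    # condition is false or unknown, so each column counts only its matches.
    result = conn.execute(
        """
        SELECT count(CASE WHEN fecha_contratado > :cutoff THEN 1 END),
               count(CASE WHEN fecha_fin_contrato > :cutoff THEN 1 END)
        FROM empleado
        """,
        {"cutoff": cutoff},
    ).fetchone()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [
        ("2024-01-10", "2024-06-01"),
        ("2023-01-01", "2024-02-01"),
        ("2022-05-05", "2024-05-05"),
        ("2024-03-03", None),  # open-ended contract
    ]
    print(count_recent(sample, "2024-01-01"))
```

A row with a NULL end date is simply not counted in the second column, which matches what the two separate queries would return.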

SQL Server - WHERE Date Range & GROUP BY MonthName

I have two identical queries (returning "MonthName Year" and a count), shown below; only the date range in the WHERE condition differs. Query 1 gets only the June count, while Query 2 gets counts from April to July, and the June count in Query 2 is not the same as the June count from Query 1. Please advise.
Query 1:
SELECT DATENAME(MONTH, SubmissionDate) + ' ' + DateName(Year, SubmissionDate) AS MonthNumber, COUNT(1) AS InquiryCount
, Cast(Datename(MONTH,SubmissionDate) + ' ' + Datename(YEAR,SubmissionDate) AS DATETIME) AS tmp
FROM [dbo].[InvestigationDetails] (nolock)
WHERE SubmissionDate>= '06/01/2016'
AND SubmissionDate <= '06/30/2016'
GROUP BY DATENAME(MONTH, SubmissionDate) + ' ' + DateName(Year, SubmissionDate), DateName(Year, SubmissionDate)
ORDER BY tmp ASC
Query 2:
SELECT DATENAME(MONTH, SubmissionDate) + ' ' + DateName(Year, SubmissionDate) AS MonthNumber, DateName(Year, SubmissionDate), COUNT(1) AS InquiryCount
, Cast(Datename(MONTH,SubmissionDate) + ' ' + Datename(YEAR,SubmissionDate) AS DATETIME) AS tmp
FROM [dbo].[InvestigationDetails] (nolock)
WHERE SubmissionDate>= '04/01/2016'
AND SubmissionDate <= '07/31/2016'
GROUP BY DATENAME(MONTH, SubmissionDate) + ' ' + DateName(Year, SubmissionDate), DateName(Year, SubmissionDate)
ORDER BY tmp ASC
Thanks,
Jay
SubmissionDate must be of type DATETIME, and thus you are missing all values for your last day, 06/30/2016, since that literal equates to 06/30/2016 00:00:00. This means any records whose SubmissionDate has a time later than 00:00:00 on 6/30/2016 will be excluded. For example, 6/30/2016 12:44:22 wouldn't be included in your results with your current logic.
Use one of these instead:
AND SubmissionDate < '07/01/2016'
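AND SubmissionDate <= '06/30/2016 23:59:59.997'
The first method is preferred, since it returns every record before 7/1/2016 no matter how precise the column is. The second depends on the type: DATETIME is only accurate to roughly 3 milliseconds (values are rounded to .000, .003 or .007), so a literal ending in .999 would round up to 07/01/2016 00:00:00.000, which is why .997 is used above. Run the code below to see the difference in precision.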
declare @dt datetime2 = getdate()
select @dt --more precise with datetime2
select getdate() --not as precise
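The boundary problem itself is independent of SQL Server and can be reproduced with plain timestamps. A minimal Python sketch, with made-up function names, comparing the closed range from the question with a half-open range:

```python
from datetime import datetime


def in_june_closed(ts):
    """Mirrors: SubmissionDate >= '06/01/2016' AND SubmissionDate <= '06/30/2016'."""
    return datetime(2016, 6, 1) <= ts <= datetime(2016, 6, 30)


def in_june_half_open(ts):
    """Mirrors: SubmissionDate >= '06/01/2016' AND SubmissionDate < '07/01/2016'."""
    return datetime(2016, 6, 1) <= ts < datetime(2016, 7, 1)


if __name__ == "__main__":
    sample = datetime(2016, 6, 30, 12, 44, 22)
    print(in_june_closed(sample), in_june_half_open(sample))
```

The sample timestamp falls on the last day of June after midnight, so only the half-open version counts it.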

Select today's (since midnight) timestamps only

I have a server with PostgreSQL 8.4 which is being rebooted every night at 01:00 (don't ask) and need to get a list of connected users (i.e. their timestamps are u.login > u.logout):
SELECT u.login, u.id, u.first_name
FROM pref_users u
WHERE u.login > u.logout and
u.login > now() - interval '24 hour'
ORDER BY u.login;
login | id | first_name
----------------------------+----------------+-------------
2012-03-14 09:27:33.41645 | OK171511218029 | Alice
2012-03-14 09:51:46.387244 | OK448670789462 | Bob
2012-03-14 09:52:36.738625 | OK5088512947 | Sergej
But comparing u.login > now()-interval '24 hour' also delivers the users before the last 01:00, which is bad, esp. in the mornings.
Is there any efficient way to get the logins since the last 01:00 without doing string acrobatics with to_char()?
This should be 1) correct and 2) as fast as possible:
SELECT u.login, u.id, u.first_name
FROM pref_users u
WHERE u.login >= now()::date + interval '1h'
AND u.login > u.logout
ORDER BY u.login;
As there are no future timestamps in your table (I assume), you need no upper bound.
Some equivalent expressions:
SELECT localtimestamp::date + interval '1h'
, current_date + interval '1h'
, date_trunc('day', now()) + interval '1h'
, now()::date + interval '1h'
now()::date used to perform slightly faster than CURRENT_DATE in older versions, but that's not true any more in modern Postgres. But either is still faster than LOCALTIMESTAMP in Postgres 14 for some reason.
date_trunc('day', now()) + interval '1h' slightly differs in that it returns timestamptz. But it is coerced to timestamp according to the timezone setting of the current session in comparison to the timestamp column login, doing effectively the same.
See:
Ignoring time zones altogether in Rails and PostgreSQL
To return rows for the previous day instead of returning nothing when issued between 00:00 and 01:00 local time, use instead:
WHERE u.login >= (LOCALTIMESTAMP - interval '1h')::date + interval '1h'
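The trick in that last expression is to shift the clock back by one hour before taking the date, so that anything between 00:00 and 01:00 still belongs to the previous day. A small Python sketch of the same arithmetic (the function name is hypothetical, and it assumes naive local timestamps):

```python
from datetime import datetime, time, timedelta


def last_reboot_cutoff(now):
    """Return the most recent 01:00 at or before `now`."""
    shifted_date = (now - timedelta(hours=1)).date()
    return datetime.combine(shifted_date, time(0)) + timedelta(hours=1)


if __name__ == "__main__":
    print(last_reboot_cutoff(datetime(2012, 3, 14, 9, 52)))
    print(last_reboot_cutoff(datetime(2012, 3, 14, 0, 30)))
```

Called at 09:52 it returns 01:00 of the same day; called at 00:30 it returns 01:00 of the previous day.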
select * from termin where DATE(dateTimeField) >= CURRENT_DATE AND DATE(dateTimeField) < CURRENT_DATE + INTERVAL '1 DAY'
This works for me: it selects all rows with today's date.
select * from termin where DATE(dateTimeField) = '2015-11-17'
This works well for me!
An easy way of getting only time stamps for the current day since 01:00 is to filter with
CURRENT_DATE + interval '1 hour'
So your query should look like this:
SELECT u.login, u.id, u.first_name
FROM pref_users u
WHERE u.login > u.logout AND
u.login > CURRENT_DATE + interval '1 hour'
ORDER BY u.login;
Hope that helps.
where
u.login > u.logout
and
date_trunc('day', u.login) = date_trunc('day', now())
and
extract(hour from u.login) >= 1
All answers so far are incorrect because they give the wrong result between 00:00 and 01:00: if you happen to run the query in that time period, you get no results. Based on @ErwinBrandstetter's answer, what you want is this:
WHERE u.login > u.logout
AND u.login >= CASE WHEN NOW()::time < '1:00'::time THEN NOW()::date - INTERVAL '23 HOUR' ELSE NOW()::date + INTERVAL '1 HOUR' END;
I would love to do without the conditional but found no way to.
Edit: @ErwinBrandstetter did do it without a conditional; I am leaving this here for completeness.