Postgres optimize SELECT and indexes - postgresql

I have written a query to summarise financial transactions so that I can go back to any date and find out what the financial position was at that time.
The data is multi-company, so it holds information for all the group companies.
Being financial information, some of the accounts reset themselves at each year end whilst others keep a running balance.
The structure of the tables is:
nominal_account - returns one row for each account
nominal_transaction_lines - the full dataset
nominal_period_year - a summary of transactions keyed on the nominal account, month/year and the financial company.
My SQL below works but takes over a minute to generate (the SQL below is based on today's date).
The query is broken into a few sections.
The first part of the case (na1.id=178) is a special account that is the summary of all income/expenditure records for that financial year.
The second part first looks for any records in the summary table up to the last month end, then goes to the transaction table to find any records since the beginning of the current month. Added together, these make the balance.
At present the transaction table has around 25 million records and the summary table around 26,000.
I am not asking for the query to be written for me, just some hints on how to speed it up. If anyone could suggest ways to optimise it, or which indexes would help, I would be very grateful.
SELECT id, nominal_code AS nom_code, description AS descr, COALESCE(management_type, 0) AS type,
  CASE
    WHEN na1.id = 178 THEN
      (SELECT COALESCE(
        (SELECT sum(period_movement)
         FROM nominal_period_year
         JOIN nominal_account ON nominal_account.id = nominal_period_year.nominal_account
         WHERE (period_key / 10000000000) <= 201704
           AND financial_company = 1
           AND nominal_account.profit_or_balance = true)
        +
        (SELECT sum(period_movement)
         FROM nominal_period_year
         WHERE nominal_account = na1.id
           AND (period_key / 10000000000) <= 201803
           AND nominal_period_year.financial_company = 1)
        +
        (SELECT GREATEST(0, sum(db_Amount)) - GREATEST(0, sum(cr_Amount))
         FROM nominal_transaction_lines
         WHERE transaction_date BETWEEN '2018-04-01' AND '2018-04-27'
           AND original_id = 0
           AND reversed_by = 0
           AND status = 'A'
           AND financial_company = 1
           AND nominal_account = na1.id)
        , 0.00) AS balance)
    ELSE
      (SELECT COALESCE(
        (SELECT sum(period_movement)
         FROM nominal_period_year
         WHERE nominal_account = na1.id
           AND (CASE
                  WHEN na1.profit_or_balance = true
                  THEN (period_key / 10000000000) > 201704
                  ELSE period_key > 0
                END)
           AND (period_key / 10000000000) <= 201803
           AND nominal_period_year.financial_company = 1)
        +
        (SELECT GREATEST(0, sum(db_Amount)) - GREATEST(0, sum(cr_Amount))
         FROM nominal_transaction_lines
         WHERE transaction_date BETWEEN '2018-04-01' AND '2018-04-27'
           AND original_id = 0
           AND reversed_by = 0
           AND financial_company = 1
           AND status = 'A'
           AND nominal_account = na1.id)
        , 0) AS balance)
  END
FROM nominal_account AS na1
ORDER BY nom_code;
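As a starting point for the index question: the correlated subqueries probe the 25m-row transaction table on nominal_account, financial_company, status and transaction_date, and the summary table on account/company plus the derived value (period_key / 10000000000). A sketch of matching indexes, with names assumed from the query above (verify each with EXPLAIN (ANALYZE, BUFFERS) before and after):

```sql
-- Sketch only: column names are taken from the query above; adjust to your schema.

-- Supports the per-account probes into the 25m-row transaction table.
CREATE INDEX idx_ntl_acct_co_status_date
    ON nominal_transaction_lines (nominal_account, financial_company, status, transaction_date);

-- The summary lookups filter on (period_key / 10000000000); an expression
-- index lets the planner match that predicate directly instead of scanning.
CREATE INDEX idx_npy_acct_co_period
    ON nominal_period_year (nominal_account, financial_company, (period_key / 10000000000));
```

The expression index only helps if the query keeps writing the predicate exactly as `(period_key / 10000000000) <= ...`; a plain index on period_key cannot be used for a filter on the divided value.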

Related

Dividing 2 count statements in Postgresql

I have a question about the division of the two COUNT statements below, which gives me the error underneath.
(SELECT COUNT(transactions.transactionNumber)
FROM transactions
INNER JOIN account ON account.sfid = transactions.accountsfid
INNER JOIN transactionLineItems ON transactions.transactionNumber
= transactionLineItems.transactionNumber
INNER JOIN products ON transactionLineItems.USIM = products.USIM
WHERE products.gender = 'male' AND products.agegroup = 'adult'
AND transactions.transactionDate >= current_date - interval
'730' day)/
(SELECT COUNT(transactions.transactionNumber)
FROM transactions
WHERE transactions.transactionDate >=
current_date - interval '730' day)
ERROR: syntax error at or near "/"
LINE 6: ...tions.transactionDate >= current_date - interval '730' day)/
I think the problem is that my count statements each produce a result set, and dividing those result sets is the issue, but how can I make this division work?
Afterwards I want to check the result against a percentage, e.g. < 0.2.
Can anyone help me with this?
Is that your complete query? Something like this works in Postgres 10:
SELECT
(SELECT COUNT(id) FROM test WHERE state = false) / (SELECT COUNT(id) FROM test WHERE state = true) as y
The extra SELECT wrapping both subqueries and the division is what's important. Without it I get the same error you mentioned.
See also my DB Fiddle version of this query.
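One caveat, since the asker wants to compare the result against a percentage like 0.2: in Postgres, COUNT() returns bigint, so dividing two counts performs integer division and yields 0 or 1 rather than a fraction. Casting one side to numeric avoids that (sketch, using the same toy table as the answer above):

```sql
SELECT
  (SELECT COUNT(id) FROM test WHERE state = false)::numeric
  /
  (SELECT COUNT(id) FROM test WHERE state = true) AS y;
-- Without the ::numeric cast, 3 / 4 evaluates to 0, so a check
-- like "< 0.2" would almost always be true.
```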

How do you organize this query by week

Here is my Query so far:
select one.week, total, comeback, round(comeback)::Numeric / total::numeric * 100 as comeback_percent
FROM
(
SELECT count(username) as total, week
FROM
(
select row_number () over (partition by u.id order by creation_date) as row, username, date_trunc ('month', creation_date)::date AS week
FROM users u
left join entries e on u.id = e.user_id
where ((entry_type = 0 and distance >= 1) or (entry_type = 1 and seconds_running >= 600))
) x
where row = 1
group by week
order by week asc
) one
join
(
SELECT count(username) as comeback, week
FROM
(
select row_number () over (partition by u.id order by creation_date) as row, username, runs_completed, date_trunc ('month', creation_date)::date AS week
FROM entries e
left join users u on e.user_id = u.id
where ((entry_type = 0 and distance >= 1) or (entry_type = 1 and seconds_running >= 600))
) y
where runs_completed > 1 and row = 1
group by week
order by week asc
) two
on one.week = two.week
What I want to accomplish, is return a line graph for users that have completed one run with us, grouped by week, and assign percentages for that week of anyone who has completed a second run EVER, not just within that week. Our funnel has improved by a factor of 5 since we started, yet the line graph that is produced does not show similar results.
I could be incorrectly joining them together, or there may be a cleaner way to use CTE or window functions to perform this query, I am open to any and all suggestions. Thanks!
If you need tables or further information, let me know. I'm happy to provide anything that may be needed.
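Not a full answer, but one thing stands out: the inner queries alias date_trunc('month', creation_date) as week, so the result is actually grouped by month. A sketch of the same idea with a real weekly bucket, folding both counts into one pass with FILTER instead of joining two subqueries (assuming the users/entries columns from the question):

```sql
-- Sketch: assumes the users/entries schema from the question.
SELECT
  week,
  count(*)                                   AS total,
  count(*) FILTER (WHERE runs_completed > 1) AS comeback,
  round(count(*) FILTER (WHERE runs_completed > 1)::numeric
        / count(*) * 100, 2)                 AS comeback_percent
FROM (
  -- One row per user: their first qualifying run, bucketed by week.
  SELECT DISTINCT ON (u.id)
         u.id,
         runs_completed,
         date_trunc('week', creation_date)::date AS week
  FROM users u
  JOIN entries e ON u.id = e.user_id
  WHERE (entry_type = 0 AND distance >= 1)
     OR (entry_type = 1 AND seconds_running >= 600)
  ORDER BY u.id, creation_date
) first_runs
GROUP BY week
ORDER BY week;
```

DISTINCT ON plus ORDER BY u.id, creation_date replaces the row_number()/row = 1 pattern, and computing both counts from the same derived table avoids the join dropping weeks that have a total but no comebacks.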

Looping SQL query - PostgreSQL

I'm trying to get a query to loop through a set of pre-defined integers:
I've made the query very simple for this question. This is pseudo code as well, obviously!
my_id = 0
WHILE my_id < 10
SELECT * from table where id = :my_id
my_id += 1
END
I know that for this query I could just do something like where id < 10, but the actual query I'm performing is about 60 lines long, with quite a few window statements all referring to the variable in question.
It works and gets me the results I want when the variable is set to a single value. I just need to be able to re-run the query 10 times with different values, hopefully ending up with one single set of results.
So far I have this:
CREATE OR REPLACE FUNCTION stay_prices ( a_product_id int ) RETURNS TABLE (
pid int,
pp_price int
) AS $$
DECLARE
nights int;
nights_arr INT[] := ARRAY[1,2,3,4];
j int;
BEGIN
j := 1;
FOREACH nights IN ARRAY nights_arr LOOP
-- query here..
END LOOP;
RETURN;
END;
$$ LANGUAGE plpgsql;
But I'm getting this back:
ERROR: query has no destination for result data
HINT: If you want to discard the results of a SELECT, use PERFORM instead.
So do I need to get my query to SELECT ... INTO the returning table somehow? Or is there something else I can do?
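On the immediate error: inside a plpgsql function that RETURNS TABLE, a bare SELECT has nowhere to send its rows; RETURN QUERY appends them to the function's result set. A minimal sketch of the loop body (the inner query here is a placeholder, not the real one from the question):

```sql
CREATE OR REPLACE FUNCTION stay_prices(a_product_id int) RETURNS TABLE (
    pid int,
    pp_price int
) AS $$
DECLARE
    nights int;
    nights_arr int[] := ARRAY[1,2,3,4];
BEGIN
    FOREACH nights IN ARRAY nights_arr LOOP
        -- RETURN QUERY appends this SELECT's rows to the result set
        -- instead of discarding them (which caused the error).
        RETURN QUERY
        SELECT p.id, p.price * nights      -- placeholder query: substitute the
        FROM products p                    -- real 60-line query, using "nights"
        WHERE p.id = a_product_id;         -- wherever :nights appeared
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;
```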
EDIT: this is an example of the actual query I'm running:
\x auto
\set nights 7
WITH x AS (
SELECT
product_id, night,
LAG(night, (:nights - 1)) OVER (
PARTITION BY product_id
ORDER BY night
) AS night_start,
SUM(price_pp_gbp) OVER (
PARTITION BY product_id
ORDER BY night
ROWS BETWEEN (:nights - 1) PRECEDING
AND CURRENT ROW
) AS pp_price,
MIN(spaces_available) OVER (
PARTITION BY product_id
ORDER BY night
ROWS BETWEEN (:nights - 1) PRECEDING
AND CURRENT ROW
) AS min_spaces_available,
MIN(period_date_from) OVER (
PARTITION BY product_id
ORDER BY night
ROWS BETWEEN (:nights - 1) PRECEDING
AND CURRENT ROW
) AS min_period_date_from,
MAX(period_date_to) OVER (
PARTITION BY product_id
ORDER BY night
ROWS BETWEEN (:nights - 1) PRECEDING
AND CURRENT ROW
) AS max_period_date_to
FROM products_nightlypriceperiod pnpp
WHERE
spaces_available >= 1
AND min_group_size <= 1
AND night >= '2016-01-01'::date
AND night <= '2017-01-01'::date
)
SELECT
product_id as pid,
CASE WHEN x.pp_price > 0 THEN x.pp_price::int ELSE null END as pp_price,
night_start as from_date,
night as to_date,
(night-night_start)+1 as duration,
min_spaces_available as spaces
FROM x
WHERE
night_start = night - (:nights - 1)
AND min_period_date_from = night_start
AND max_period_date_to = night;
That will get me all the :nights-night periods available for all my products in 2016, along with the price for each period and the maximum number of spaces I could fill in it.
I'd like to be able to run this query to get all the periods available between 2 and 30 days for all my products.
This is likely to produce a table with millions of rows. The plan is to re-create this table periodically to enable a very quick lookup of what's available for a particular date. A products_nightlypriceperiod row represents one night of availability of a product - e.g. Product X has 3 spaces left for Jan 1st 2016 and costs £100 for the night.
Why use a loop? You can do something like this (using your first query):
with params as (
select generate_series(1, 10) as id
)
select t.*
from params cross join
table t
where t.id = params.id;
You can modify params to have the values you really want. Then just use cross join and let the database "do the looping."
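Applied to the nights query, the same idea would generate the 2..30 range and cross join it in. One wrinkle: frame bounds like ROWS BETWEEN n PRECEDING must be constants in Postgres, so the fixed-size window frames cannot take the generated value directly; a LATERAL aggregate over the preceding nights is one workaround. A sketch, using column names from the question:

```sql
-- Sketch: assumes one row per (product_id, night) in products_nightlypriceperiod.
WITH params AS (
    SELECT generate_series(2, 30) AS nights
)
SELECT
    p.nights,
    pnpp.product_id,
    pnpp.night - (p.nights - 1) AS from_date,
    pnpp.night                  AS to_date,
    agg.pp_price,
    agg.min_spaces
FROM params p
CROSS JOIN products_nightlypriceperiod pnpp
CROSS JOIN LATERAL (
    -- Aggregate the p.nights rows ending at this night; this replaces the
    -- window frames, whose PRECEDING offsets must be constants.
    SELECT sum(w.price_pp_gbp)     AS pp_price,
           min(w.spaces_available) AS min_spaces,
           count(*)                AS n
    FROM products_nightlypriceperiod w
    WHERE w.product_id = pnpp.product_id
      AND w.night BETWEEN pnpp.night - (p.nights - 1) AND pnpp.night
) agg
WHERE agg.n = p.nights;  -- keep only complete runs of consecutive nights
```

The count(*) = nights check does the job of the LAG(night, nights - 1) test in the original: it only passes when every night in the range is present.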

Doing SELECT from a VIEW is very slow

I have a table, say tbl_test. In this table I have 2.4 million records. I run the following three queries, which are very slow and frustrating.
select count(*) from tbl_test;
-- 2.4 mil records in ~9 seconds
select count(*) from tbl_test where status = 'active';
-- 2.4 mil records in ~9 seconds
select count(*) from tbl_test where status = 'inactive';
-- 0 records in ~0 seconds
I have created a view say view_tbl_test using the following query:
create view view_tbl_test as
select * from
(select count(*) count_active from tbl_test where status = 'active' ) x,
(select count(*) count_inactive from tbl_test where status = 'inactive' ) y,
(select count(*) count_total from tbl_test) z
Now, I am picking only a single row from the view and it's taking the same amount of time as before.
select * from view_tbl_test limit 1;
Am I doing something wrong here? Is there any way which can make the view to return data in ~0 seconds?
Your statement runs three selects on the table. This can be done with a single statement:
create view view_tbl_test
as
select count(case when status = 'active' then 1 end) as count_active,
count(case when status = 'inactive' then 1 end) as count_inactive,
count(*) as count_total
from tbl_test;
This should run in approximately 9 seconds, as it essentially does the same work as your first statement.
The last statement is probably that fast because you have an index on status, but as you did not provide the execution plans, this is nearly impossible to tell.
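If the counts must stay fast on 2.4 million rows, indexes on status are worth testing: count(*) with no filter has to visit every row, but a filtered count can use an index-only scan once the visibility map is populated. A sketch (assuming status changes rarely):

```sql
-- A tiny partial index on the rare value keeps that count near-instant.
CREATE INDEX idx_tbl_test_status_inactive
    ON tbl_test (status)
    WHERE status = 'inactive';

-- A plain index on status allows an index-only scan for the 'active' count
-- when most heap pages are marked all-visible.
CREATE INDEX idx_tbl_test_status ON tbl_test (status);
VACUUM (ANALYZE) tbl_test;  -- refresh the visibility map and statistics
```

The unfiltered count(*) will still take seconds whatever you do; if an approximate total is acceptable, reltuples in pg_class is effectively free.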

Optimize recursive query using exclusion list

I'm trying to optimize a recursive query for speed. The full query runs for 15 minutes.
The part I'm trying to optimize takes ~3.5 minutes to execute, and the same logic is used twice in the query.
Description:
Table Ret contains over 300K rows with 30 columns (daily snapshot)
Table Ret_Wh is the warehouse for Ret with over 5 million rows (snapshot history, 90 days)
datadate - the day the info was recorded (like 10-01-2012)
statusA - a status like (Red, Blue) that an account can have.
statusB - a different status like (Large, Small) that an account can have.
Statuses can change from day to day.
old - an integer age on the account. Age can be increased/decreased if there is a payment on the account; otherwise it increases by 1 each day.
account - the account number, and primary key of a row.
In Ret the account is unique.
In Ret_Wh account is unique per datadate.
money - dollars in the account
Both Ret and Ret_Wh have the columns listed above
Query goal: select all accounts from Ret_Wh that had an age in a certain range at ANY time during the month, and had a specific status while in that range.
Then, from those results, select the matching accounts in Ret with a specific age "today", no matter their status.
My goal: do this in a way that doesn't take 3.5 minutes.
Pseudo_Code:
#sdt='2012-10-01' -- or the beginning of any month
#dt = getdate()
create table #temp (account char(20))
create table #result (account char(20), money money)
while #sdt < #dt
BEGIN
insert into #temp
select
A.account
from Ret_Wh as A
where a.datadate = #sdt
and a.statusA = 'Red'
and a.statusB = 'Large'
and a.old between 61 and 80
set #sdt=(add 1 day to #sdt)
END
------
select distinct
b.account
,b.money
into #result
from #temp as A
join (Select account, money from Ret where old = 81) as B
on A.account=B.account
I want to create a distinct list of accounts in Ret_Wh (call it #shrinking_list). Then, in the while, I join Ret_Wh to #shrinking_list. At the end of the while, I delete one account from #shrinking_list. Then the while iterates, with a smaller list joined to Ret_Wh, thereby speeding up the query as #sdt increases by 1 day. However, I don't know how to pass the exact account number selected to an external variable in the while, so that I can delete it from #shrinking_list.
Any ideas on that, or how to speed this up in general?
Why are you using a cursor to get dates from #sdt to #dt one at a time?
select distinct b.account, b.money
from Ret as B
join Ret_Wh as A
on A.account = B.account
and a.datadate >= #sdt
and a.datadate < #dt
and a.statusA = 'Red'
and a.statusB = 'Large'
and a.old between 61 and 80
where b.old = 81
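Beyond going set-based, an index matching the filter columns should help either way. (The question's #temp/getdate() syntax looks like SQL Server rather than Postgres, but the composite-index idea is the same in both; names below are illustrative.)

```sql
-- Equality filters first, then the range filters, so the whole
-- WHERE clause in the set-based query above can be satisfied from the index.
CREATE INDEX idx_retwh_status_old_date
    ON Ret_Wh (statusA, statusB, old, datadate);

-- Ret is probed by old = 81, so a narrow index there helps the final join.
CREATE INDEX idx_ret_old ON Ret (old);
```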