Inefficient subquery in PostgreSQL

Hi, I have this query:
select distinct r.fparams::json->>'uuid_level_2' as uuid_level_2
from jhft.run r
where r.ts_run >= :ts_run
which returns in 323ms:
49c954c3-9d57-4777-99cb-634e59393053
4e9f3aac-b9d0-422b-badf-171c24dac138
d68726a0-7176-4bd3-aac8-b796dab074a5
I'm using it as a subquery in an IN clause in this other query:
select distinct
r.fparams::json->>'uuid_level_2' as uuid_level_2,
first_value(r.fparams) over
(partition by r.fparams::json->>'uuid_level_2' order by r.id) as first_fparams
from jhft.run r
where r.fparams::json->>'uuid_level_2' in (
select distinct r.fparams::json->>'uuid_level_2' as uuid_level_2
from jhft.run r
where r.ts_run >= :ts_run )
The results take about 20 seconds to be retrieved.
But when I run the same query with the WHERE clause written as:
where r.fparams::json->>'uuid_level_2' in (
'd68726a0-7176-4bd3-aac8-b796dab074a5',
'49c954c3-9d57-4777-99cb-634e59393053',
'4e9f3aac-b9d0-422b-badf-171c24dac138' )
the results take only about 300 ms.
It looks like a subquery in the WHERE clause causes the whole table to be scanned.
Is there any way to "simulate" hard-coding the keys?

An obvious candidate for a faster solution would be to use a CTE and a join (but as Erwin and a_horse_with_no_name pointed out, your question is lacking in detail to come up with a definitive solution):
WITH target AS (
SELECT DISTINCT fparams::json->>'uuid_level_2' AS uuid_level_2
FROM jhft.run
WHERE ts_run >= :ts_run
)
SELECT DISTINCT
fparams::json->>'uuid_level_2' AS uuid_level_2,
first_value(fparams) OVER
(PARTITION BY fparams::json->>'uuid_level_2' ORDER BY id) AS first_fparams
FROM jhft.run r
JOIN target t ON r.fparams::json->>'uuid_level_2' = t.uuid_level_2
However, without any EXPLAIN ANALYZE VERBOSE output from your query as an absolute minimum, this is only an educated guess.
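One way to "simulate" hard-coding the keys, as the question asks, is to run the cheap key query first and then bind its results as parameters to the second query. The sketch below is only an illustration under assumptions: it uses Python's built-in sqlite3 module rather than PostgreSQL, a hypothetical table run with a plain uuid_level_2 column in place of the JSON expression, and a hypothetical helper name first_fparams_for_recent. It needs SQLite 3.25 or later for window functions.

```python
import sqlite3


def first_fparams_for_recent(conn, ts_run):
    """Two-step lookup: fetch the keys, then bind them as literal parameters."""
    # Step 1: the fast query from the question, run on its own.
    keys = [row[0] for row in conn.execute(
        "SELECT DISTINCT uuid_level_2 FROM run WHERE ts_run >= ?", (ts_run,))]
    if not keys:
        return {}

    # Step 2: one placeholder per key, so the planner sees a plain IN list
    # instead of a subquery.
    placeholders = ", ".join("?" for _ in keys)
    sql = (
        "SELECT DISTINCT uuid_level_2, "
        "first_value(fparams) OVER (PARTITION BY uuid_level_2 ORDER BY id) "
        "FROM run WHERE uuid_level_2 IN (" + placeholders + ")"
    )
    # Each row is (uuid_level_2, first_fparams); return them as a dict.
    return dict(conn.execute(sql, keys).fetchall())
```

This costs an extra round trip, and whether it helps on the real table still depends on the plan PostgreSQL chooses, so it is worth comparing both forms with EXPLAIN ANALYZE.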

Related

Why am I getting ERROR: syntax error at end of input?

I keep getting 'syntax error at end of input' and don't know why.
I want to divide the count from the disease subquery by the count from the total subquery, and show condition_id from the disease subquery.
select disease.condition_id, (disease::float/total::float) as prevalence
from (
select condition_id, count(person_id)
from a.condition
where condition_id=316139
group by condition_id
) as disease
join (
select count(distinct person_id) as total
from a.person
)as total;
Can someone please help me with this?
Thanks!
I don't have an exact fix for your current syntax, but I would phrase this query as a join with an aggregation over the entire tables:
SELECT
(COUNT(*) FILTER (WHERE c.condition_id = 316139))::float /
COUNT(DISTINCT p.person_id) AS prevalence
FROM a.person p
LEFT JOIN a.condition c
ON p.person_id = c.person_id;
The main reason for your error is the missing join condition. The join operator requires a join condition (defined using ON).
But given the structure of your query, I think you don't actually want an inner join but a cross join between the two.
Additionally, the expression disease::float tries to cast a complete row to a float value, not a single column. I assume you wanted to alias the count aggregate to something, e.g. count(person_id) as num_person.
Using total::float is also ambiguous, as you have both a sub-query alias and a column with that name. That is highly confusing and best avoided.
select disease.condition_id,
(disease.num_person::float / total.total::float) as prevalence
from (
select condition_id, count(person_id) as num_person
from a.condition
where condition_id = 316139
group by condition_id
) as disease
cross join (
select count(distinct person_id) as total
from a.person
) as total
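To try the cross-join form on toy data, here is a minimal sketch. It assumes SQLite via Python's sqlite3 module instead of PostgreSQL, so the cast is written CAST(... AS REAL) rather than ::float, and the tables are plain condition and person without the a schema. The helper name prevalence is hypothetical.

```python
import sqlite3


def prevalence(conn, condition_id):
    """Return (condition_id, prevalence), or None if the condition never occurs."""
    sql = """
        SELECT disease.condition_id,
               -- Cast before dividing; integer / integer would truncate to 0.
               CAST(disease.num_person AS REAL) / total.total AS prevalence
        FROM (
            SELECT condition_id, count(person_id) AS num_person
            FROM condition
            WHERE condition_id = ?
            GROUP BY condition_id
        ) AS disease
        CROSS JOIN (
            SELECT count(DISTINCT person_id) AS total
            FROM person
        ) AS total
    """
    return conn.execute(sql, (condition_id,)).fetchone()
```

Both subqueries return at most one row, so the cross join yields a single row without needing an ON clause.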

Implement ROW_NUMBER() in beamSQL

I have the query below:
SELECT DISTINCT Summed, ROW_NUMBER () OVER (order by Summed desc) as Rank from table1
I have to write it in Apache Beam (BeamSQL). Below is my code:
PCollection<BeamRecord> rec_2_part2 = rec_2.apply(BeamSql.query("SELECT DISTINCT Summed, ROW_NUMBER(Summed) OVER (ORDER BY Summed) Rank1 from PCOLLECTION "));
But I'm getting the error below:
Caused by: java.lang.UnsupportedOperationException: Operator: ROW_NUMBER is not supported yet!
Any idea how to implement ROW_NUMBER() in BeamSQL?
Here is one way you can approximate your current query without using ROW_NUMBER:
SELECT
t1.Summed,
(SELECT COUNT(*) FROM (SELECT DISTINCT Summed FROM table1) t2
WHERE t2.Summed >= t1.Summed) AS Rank
FROM
(
SELECT DISTINCT Summed
FROM table1
) t1
The basic idea is to first use a subquery to get a table with only the distinct Summed values. Then, use a correlated subquery to simulate the row number by counting how many distinct values are greater than or equal to the current one. This isn't a very efficient method, but if ROW_NUMBER is not available, then you're stuck with some alternative.
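The counting trick can be checked on a small data set. The sketch below runs the same correlated subquery in SQLite through Python's sqlite3 module, which is an assumption made for illustration only; it says nothing about which constructs BeamSQL accepts. The alias rnk and the helper name ranked_distinct are hypothetical.

```python
import sqlite3

RANK_SQL = """
SELECT t1.Summed,
       (SELECT COUNT(*)
        FROM (SELECT DISTINCT Summed FROM table1) t2
        WHERE t2.Summed >= t1.Summed) AS rnk
FROM (SELECT DISTINCT Summed FROM table1) t1
ORDER BY rnk
"""


def ranked_distinct(values):
    """Load values into a throwaway table and rank the distinct ones, largest first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (Summed INTEGER)")
    conn.executemany("INSERT INTO table1 VALUES (?)", [(v,) for v in values])
    rows = conn.execute(RANK_SQL).fetchall()
    conn.close()
    return rows
```

For the input [5, 5, 3, 9] this returns [(9, 1), (5, 2), (3, 3)]: duplicates collapse first, and each remaining value is ranked by how many distinct values are at least as large.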
The solution which worked for the above query:
PCollection<BeamRecord> rec_2 = rec_1.apply(BeamSql.query("SELECT max(Summed) as maxed, max(Summed)-10 as least, 'a' as Dummy from PCOLLECTION"));

Unable to get Percentile_Cont() to work in Postgresql

I am trying to calculate a percentile using the percentile_cont() function in PostgreSQL with common table expressions. The goal is to find the top 1% of accounts with regard to their balances (called amount here). My logic is to find the 99th percentile, which will return those whose account balances are greater than 99% of their peers (and thus find the one-percenters).
Here is my query
--ranking subquery works fine
with ranking as(
select a.lname,sum(c.amount) as networth from customer a
inner join
account b on a.customerid=b.customerid
inner join
transaction c on b.accountid=c.accountid
group by a.lname order by sum(c.amount)
)
select lname, networth, percentile_cont(0.99) within group
order by networth over (partition by lname) from ranking ;
I keep getting the following error:
ERROR: syntax error at or near "order"
LINE 2: ...ame, networth, percentile_cont(0.99) within group order by n..
I am thinking that perhaps I forgot a closing brace etc. but I can't seem to figure out where. I know it could be something with the order keyword but I am not sure what to do. Can you please help me to fix this error?
This tripped me up, too.
It turns out percentile_cont is not supported in postgres 9.3, only in 9.4+.
https://www.postgresql.org/docs/9.4/static/release-9-4.html
So you have to use something like this:
with ordered_purchases as (
select
price,
row_number() over (order by price) as row_id,
(select count(1) from purchases) as ct
from purchases
)
select avg(price) as median
from ordered_purchases
where row_id between ct/2.0 and ct/2.0 + 1
That query is courtesy of https://www.periscopedata.com/blog/medians-in-sql (section: "Median on Postgres")
You are missing the brackets in the within group (order by x) part.
Try this:
with ranking
as (
select a.lname,
sum(c.amount) as networth
from customer a
inner join account b on a.customerid = b.customerid
inner join transaction c on b.accountid = c.accountid
group by a.lname
order by networth
)
select lname,
networth,
percentile_cont(0.99) within group (
order by networth
) over (partition by lname)
from ranking;
I want to point out that you don't need a subquery for this:
select c.lname, sum(t.amount) as networth,
percentile_cont(0.99) within group (order by sum(t.amount)) over (partition by lname)
from customer c inner join
account a
on c.customerid = a.customerid inner join
transaction t
on a.accountid = t.accountid
group by c.lname
order by networth;
Also, when using table aliases (which should be always), table abbreviations are much easier to follow than arbitrary letters.
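For readers who want to see what percentile_cont computes, here is a small pure-Python sketch of the continuous percentile: sort the values, locate the fractional position fraction * (n - 1), and interpolate linearly between the two neighbouring values. The function names and the sample data in the usage note are hypothetical, and the cutoff filter uses >= as one possible reading of "top 1%".

```python
import math


def percentile_cont(values, fraction):
    """Continuous percentile with linear interpolation between neighbours."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if not values:
        return None
    ordered = sorted(values)
    # Zero-based fractional position of the requested percentile.
    position = fraction * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    if lower == upper:
        return float(ordered[lower])
    weight = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight


def top_accounts(networth_by_name, fraction=0.99):
    """Names whose net worth is at or above the given percentile cutoff."""
    cutoff = percentile_cont(list(networth_by_name.values()), fraction)
    return sorted(name for name, worth in networth_by_name.items()
                  if worth >= cutoff)
```

For example, percentile_cont([10, 20, 30, 40], 0.5) gives 25.0, the midpoint between the two middle values. The cutoff is computed once over all accounts and then used as a filter.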

JOIN tables inside a subquery in DB2

I'm having trouble with paginating with joined tables in DB2. I want to return rows 10-30 of a query that contains an INNER JOIN.
This works:
SELECT *
FROM (
SELECT row_number() OVER (ORDER BY U4SLSMN.SLNAME) AS ID,
U4SLSMN.SLNO, U4SLSMN.SLNAME, U4SLSMN.SLLC
FROM U4SLSMN) AS P
WHERE P.ID BETWEEN 10 AND 30
This does not work:
SELECT *
FROM (
SELECT row_number() OVER (ORDER BY U4SLSMN.SLNAME) AS ID,
U4SLSMN.SLNO, U4SLSMN.SLNAME, U4SLSMN.SLLC, U4CONST.C4NAME
FROM U4SLSMN INNER JOIN U4CONST ON U4SLSMN.SLNO = U4CONST.C4NAME
) AS P
WHERE P.ID BETWEEN 10 AND 30
The error I get is:
Selection error involving field *N.
Note that the JOIN query works correctly by itself, just not when it's run as a subquery.
How do I perform a join inside a subquery in DB2?
Works fine for me on v7.1 TR9
Here's what I actually ran:
select *
from ( select rownumber() over (order by vvname) as ID, idescr, vvname
from olsdta.ioritemmst
inner join olsdta.vorvendmst on ivndno = vvndno
) as P
where p.id between 10 and 30;
I much prefer the CTE version however:
with p as
( select rownumber() over (order by vvname) as ID, idescr, vvname
from olsdta.ioritemmst
inner join olsdta.vorvendmst on ivndno = vvndno
)
select *
from p
where p.id between 10 and 30;
Finally, note that at 7.1 TR11 (7.2 TR3), IBM added support for the LIMIT and OFFSET clauses. Your query could be rewritten as follows:
SELECT
U4SLSMN.SLNO, U4SLSMN.SLNAME, U4SLSMN.SLLC, U4CONST.C4NAME
FROM U4SLSMN INNER JOIN U4CONST ON U4SLSMN.SLNO = U4CONST.C4NAME
ORDER BY U4SLSMN.SLNAME
LIMIT 21 OFFSET 9;
However, note that the LIMIT and OFFSET clauses are only supported in prepared or embedded SQL. You can't use them in STRSQL or STRQMQRY. I believe the "Run SQL Scripts" GUI does support them.
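When translating an inclusive row range such as BETWEEN 10 AND 30 into LIMIT and OFFSET, the offset is first - 1 and the row count is last - first + 1, which gives 21 rows for rows 10 through 30. The sketch below checks that arithmetic against SQLite through Python's sqlite3 module; the numbers table and both helper names are hypothetical, and nothing here is specific to DB2.

```python
import sqlite3


def limit_offset(first_row, last_row):
    """Convert a 1-based inclusive row range into (limit, offset)."""
    if first_row < 1 or last_row < first_row:
        raise ValueError("expected 1 <= first_row <= last_row")
    return last_row - first_row + 1, first_row - 1


def fetch_rows(conn, first_row, last_row):
    """Fetch rows first_row..last_row (inclusive) from the numbers table."""
    limit, offset = limit_offset(first_row, last_row)
    cursor = conn.execute(
        "SELECT n FROM numbers ORDER BY n LIMIT ? OFFSET ?", (limit, offset))
    return [row[0] for row in cursor]
```

As with the row_number() version, the result is only stable if the ORDER BY is deterministic.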

In Firebird, how to aggregate the first N rows?

I would like to do something like this:
CNT=2;
//[edit]
select avg(price) from (
select first :CNT p.Price
from Price p
order by p.Date desc
);
This does not work: Firebird does not allow :cnt as a parameter to FIRST. I need to average the CNT newest prices. The number 2 changes, so it cannot be hard-coded.
This can be broken out into a FOR SELECT loop that breaks when a count is reached. Is that the best way, though? Can this be done in a single SQL statement?
Creating the SQL as a string and running it is not the best fit either. It is important that the database compile my SQL statement.
You don't have to use a CTE; you can do it directly:
select avg(price) from (
select first :cnt p.Price
from Price p
order by p.Date desc
);
You can use a CTE (Common Table Expression) (see http://www.firebirdsql.org/refdocs/langrefupd21-select.html#langrefupd21-select-cte) to select the data before calculating the average.
See example below:
with query1 as (
select first 2 p.Price
from Price p
order by p.Date desc
)
select avg(price) from query1
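The shape of the query (take the N newest rows in a derived table, then average them) can be tried with a bound row count in SQLite. This is only an analogy under stated assumptions: it uses Python's sqlite3 module, SQLite's LIMIT ? in place of Firebird's FIRST, a hypothetical price table with price and date columns, and a hypothetical helper name average_newest. It does not show what Firebird itself accepts.

```python
import sqlite3


def average_newest(conn, count):
    """Average the `count` most recent prices in one parameterized statement."""
    sql = """
        SELECT avg(price) FROM (
            SELECT price
            FROM price
            ORDER BY date DESC
            LIMIT ?
        )
    """
    return conn.execute(sql, (count,)).fetchone()[0]
```

The statement text stays fixed while the row count is supplied as a parameter, so no SQL string has to be built at run time.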