How to use a created SQL variable in the same query - postgresql

Can I modify the following query so that it uses the column aliases avg_time and cnt in the expression ROUND(avg_time * cnt, 2)?
SELECT
COALESCE(ROUND(stddev_samp(time), 2), 0) as stddev_time,
MAX(time) as max_time,
ROUND(AVG(time), 2) as avg_time,
MIN(time) as min_time,
COUNT(path) as cnt,
ROUND(avg_time * cnt, 2) as slowdown, path
FROM
loadtime
GROUP BY
path
ORDER BY
avg_time DESC
LIMIT 10;
It raises the following error:
ERROR: column "avg_time" does not exist
LINE 7: ROUND(avg_time * cnt, 2) as slowdown, path
The following query, however, works fine (it uses the underlying expressions instead of the column aliases):
SELECT
COALESCE(ROUND(stddev_samp(time), 2), 0) as stddev_time,
MAX(time) as max_time,
ROUND(AVG(time), 2) as avg_time,
MIN(time) as min_time,
COUNT(path) as cnt,
ROUND(AVG(time) * COUNT(path), 2) as slowdown, path
FROM
loadtime
GROUP BY
path
ORDER BY
avg_time DESC
LIMIT 10;

In PostgreSQL you can use a column alias from the SELECT list in the GROUP BY or ORDER BY clause, but not in WHERE, HAVING, or another expression of the same SELECT list. All of the SELECT list expressions are evaluated together, so while one of them is being computed, the alias of another does not exist yet.
The solution is to wrap the query in a subquery. The aliases then become ordinary columns that the outer query can use.
SELECT path, stddev_time, max_time, avg_time, min_time, cnt,
ROUND(avg_time * cnt, 2) as slowdown
FROM (
SELECT
COALESCE(ROUND(stddev_samp(time), 2), 0) as stddev_time,
MAX(time) as max_time,
ROUND(AVG(time), 2) as avg_time,
MIN(time) as min_time,
COUNT(path) as cnt,
path
FROM
loadtime
GROUP BY
path
) X
ORDER BY
avg_time DESC
LIMIT 10;
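You can try the same derived-table idea end to end with Python's built-in sqlite3 module. This is a sketch under several assumptions:

- SQLite stands in for PostgreSQL, and it needs version 3.25+ only if you add window functions.
- The sample rows are made up.
- stddev_samp is dropped because SQLite does not provide it.
- top_slowdowns is a hypothetical helper name.

SQLite is also more lenient about aliases than PostgreSQL, so this shows only that the workaround works, not the original error.

```python
import sqlite3


def top_slowdowns(conn, limit=10):
    """Hypothetical helper: per-path stats plus a slowdown computed from the inner aliases."""
    # The inner query names the aggregates avg_time and cnt.
    # The outer query treats those aliases as ordinary columns of the derived table x.
    sql = """
        SELECT path, max_time, avg_time, min_time, cnt,
               ROUND(avg_time * cnt, 2) AS slowdown
        FROM (
            SELECT path,
                   MAX(time) AS max_time,
                   ROUND(AVG(time), 2) AS avg_time,
                   MIN(time) AS min_time,
                   COUNT(path) AS cnt
            FROM loadtime
            GROUP BY path
        ) AS x
        ORDER BY avg_time DESC
        LIMIT ?
    """
    return conn.execute(sql, (limit,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loadtime (path TEXT, time REAL)")
conn.executemany(
    "INSERT INTO loadtime VALUES (?, ?)",
    [("/a", 1.0), ("/a", 3.0), ("/b", 2.0), ("/b", 4.0), ("/b", 6.0)],
)
for row in top_slowdowns(conn):
    print(row)
```

The ORDER BY and LIMIT sit on the outer query, so the final row order is the one you asked for rather than whatever order the subquery happened to produce.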

The order of execution of a query (and thus the evaluation of expressions and aliases) is NOT the same as the way it is written. The "general" position is that the clauses are evaluated in this sequence:
FROM
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
Hence column aliases are unknown to most of the query until the SELECT clause is complete (which is why you can use aliases in the ORDER BY clause). Table aliases, however, are established in the FROM clause and are therefore understood in every later clause, from WHERE through ORDER BY.
The most common workaround is to wrap your query in a "derived table" (a subquery in the FROM clause).
Suggested reading: Order Of Execution of the SQL query
Note: different SQL DBMSs have different specific rules regarding the use of aliases.
EDIT
The purpose of reminding readers of the logical clause sequence is that often (but not always) aliases only become referable AFTER the clause where they are declared. The most common case is that aliases declared in the SELECT clause can be used in the ORDER BY clause. In particular, an alias declared in a SELECT clause cannot be referenced within the same SELECT clause.
But please note that, due to differences between products, not every DBMS behaves in this manner.
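To make the clause sequence concrete, here is a toy Python model of the logical order for the loadtime example. It is only an illustration under stated assumptions:

- The rows are made up.
- inner_query and outer_query are hypothetical names.
- Real planners may execute queries differently, as long as the result matches this logical order.

```python
from collections import defaultdict

# Made-up rows standing in for the loadtime table: (path, time)
ROWS = [("/a", 10), ("/a", 30), ("/b", 5), ("/b", 7), ("/c", 50)]


def inner_query(rows):
    """Toy model of FROM -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY."""
    # FROM: only the table's own columns exist at this point
    relation = [{"path": p, "time": t} for p, t in rows]
    # WHERE: filters single rows; no select-list alias exists yet
    relation = [r for r in relation if r["time"] is not None]
    # GROUP BY path
    groups = defaultdict(list)
    for r in relation:
        groups[r["path"]].append(r["time"])
    # HAVING would filter whole groups here, still before any alias exists
    # SELECT: every output expression is computed from the group in one step,
    # so one expression cannot read another's alias (the output row is still being built)
    output = [
        {"path": path, "avg_time": round(sum(times) / len(times), 2), "cnt": len(times)}
        for path, times in groups.items()
    ]
    # ORDER BY: runs after SELECT, so it may refer to the alias avg_time
    output.sort(key=lambda row: row["avg_time"], reverse=True)
    return output


def outer_query(inner_rows):
    """A derived table: the outer SELECT sees avg_time and cnt as ordinary columns."""
    return [dict(row, slowdown=round(row["avg_time"] * row["cnt"], 2)) for row in inner_rows]


result = outer_query(inner_query(ROWS))
print([(r["path"], r["slowdown"]) for r in result])
```

The slowdown column can only be built in outer_query, because that is the first stage where avg_time and cnt exist as named values.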

Aliases are not available until the virtual relation is actually created. If you want to build additional expressions from the aliases themselves, you have to create the virtual relation in a subquery and then run an additional query on top of it. So I would modify your query as follows:
SELECT stddev_time, max_time, avg_time, min_time, ROUND(avg_time * cnt, 2) as slowdown, path FROM
(
SELECT
COALESCE(ROUND(stddev_samp(time), 2), 0) as stddev_time,
MAX(time) as max_time,
ROUND(AVG(time), 2) as avg_time,
MIN(time) as min_time,
COUNT(path) as cnt,
path
FROM
loadtime
GROUP BY
path
) AS t
ORDER BY
avg_time DESC
LIMIT 10;
I want to add that your second query worked because it repeats the aggregate expressions AVG(time) and COUNT(path), which refer to real columns of the loadtime table, instead of referring to the aliases avg_time and cnt.

Either repeat the expressions:
ROUND(ROUND(AVG(time), 2) * COUNT(path), 2) as slowdown
or use a subquery:
SELECT *, ROUND(avg_time * cnt, 2) as slowdown FROM (
SELECT
COALESCE(ROUND(stddev_samp(time), 2), 0) as stddev_time,
MAX(time) as max_time,
ROUND(AVG(time), 2) as avg_time,
MIN(time) as min_time,
COUNT(path) as cnt,
path
FROM loadtime
GROUP BY path) x
ORDER BY avg_time DESC
LIMIT 10;
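A quick way to convince yourself that the two options are interchangeable is to run both and compare the results. This sketch uses Python's sqlite3 as a stand-in for PostgreSQL, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loadtime (path TEXT, time REAL)")
conn.executemany(
    "INSERT INTO loadtime VALUES (?, ?)",
    [("/a", 1.25), ("/a", 2.5), ("/b", 0.5), ("/b", 0.75), ("/b", 1.0)],
)

# Option 1: repeat the aggregate expressions instead of using the aliases
repeated = conn.execute("""
    SELECT path, ROUND(ROUND(AVG(time), 2) * COUNT(path), 2) AS slowdown
    FROM loadtime
    GROUP BY path
    ORDER BY path
""").fetchall()

# Option 2: name the aggregates in a subquery and combine the aliases outside
via_subquery = conn.execute("""
    SELECT path, ROUND(avg_time * cnt, 2) AS slowdown
    FROM (
        SELECT path, ROUND(AVG(time), 2) AS avg_time, COUNT(path) AS cnt
        FROM loadtime
        GROUP BY path
    ) AS x
    ORDER BY path
""").fetchall()

print(repeated == via_subquery)
```

Both options round avg_time before multiplying. Their result can therefore differ slightly from ROUND(AVG(time) * COUNT(path), 2) in the question's second query, which rounds only once.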

Related

Limit by percent instead of number of rows without subqueries

I would like to select the top 1% of rows; however, I cannot use subqueries to do it. I.e., this won't work:
SELECT * FROM mytbl
WHERE var='value'
ORDER BY id,random()
LIMIT(SELECT (COUNT(*) * 0.01)::integer FROM mytbl)
How would I accomplish the same output without using a subquery with limit?
You can utilize PERCENT_RANK:
WITH cte(ID, var, pc) AS
(
SELECT ID, var, PERCENT_RANK() OVER (ORDER BY random()) AS pc
FROM mytbl
WHERE var = 'value'
)
SELECT *
FROM cte
WHERE pc <= 0.01
ORDER BY id;
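The PERCENT_RANK approach can be tried locally with Python's sqlite3 module. Assumptions: SQLite (3.25 or newer, for window functions) stands in for PostgreSQL, and mytbl is filled with 1000 made-up rows. PERCENT_RANK is (rank - 1) / (rows - 1), so with 1000 rows the condition pc <= 0.01 keeps the rows whose rank is at most 10.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytbl (id INTEGER, var TEXT)")
conn.executemany("INSERT INTO mytbl VALUES (?, ?)", [(i, "value") for i in range(1000)])

sample = conn.execute("""
    WITH cte AS (
        SELECT id, var, PERCENT_RANK() OVER (ORDER BY random()) AS pc
        FROM mytbl
        WHERE var = 'value'
    )
    SELECT id, var
    FROM cte
    WHERE pc <= 0.01
    ORDER BY id
""").fetchall()
print(len(sample))
```

Because random() decides the ranking, a different set of ids comes back on each run, while the number of rows stays the same (barring ties in random(), which are vanishingly unlikely).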
I solved it with Python using the psycopg2 package:
# assumes cur is an open psycopg2 cursor
cur.execute("""SELECT ROUND(COUNT(id) * 0.01, 0)
               FROM mytbl""")
nrows = int(cur.fetchone()[0])
# pass the row count as a query parameter (a one-element tuple)
cur.execute("""SELECT *
               FROM mytbl
               WHERE var = 'value'
               ORDER BY id, random() LIMIT %s""", (nrows,))
Perhaps there is a more elegant solution using just SQL, or a more efficient one, but this does exactly what I'm looking for.
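The same two-step idea (count first, then bind the count as the LIMIT) can be run without a PostgreSQL server by using Python's sqlite3 module, which accepts a bound parameter in LIMIT. The helper one_percent_rows and the sample data are hypothetical. Note that ORDER BY id, random() sorts by id first, so random() only breaks ties between equal ids.

```python
import sqlite3


def one_percent_rows(conn):
    """Hypothetical helper mirroring the two-step approach: count first, then LIMIT."""
    # Step 1: compute 1% of the table's row count
    (total,) = conn.execute("SELECT COUNT(id) FROM mytbl").fetchone()
    nrows = round(total * 0.01)
    # Step 2: bind the row count as a parameter instead of formatting it into the SQL
    return conn.execute(
        "SELECT * FROM mytbl WHERE var = ? ORDER BY id, random() LIMIT ?",
        ("value", nrows),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytbl (id INTEGER, var TEXT)")
conn.executemany("INSERT INTO mytbl VALUES (?, ?)", [(i, "value") for i in range(1000)])
print(len(one_percent_rows(conn)))
```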
If I got it right, you need:
Random 1% sample of all rows,
If some id is within the sample, all rows with the same id must be there too.
The following SQL should do the trick:
with ids as (
select id,
total,
sum(cnt) over (order by max(rnd)) running_total
from (
select id,
count(*) over (partition by id) cnt,
count(*) over () total,
row_number() over(order by random()) rnd
from mytbl
) q
group by id,
cnt,
total
)
select mytbl.*
from mytbl,
ids
where mytbl.id = ids.id
and ids.running_total <= ids.total * 0.01
order by mytbl.id;
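Here is the logic of that query restated in plain Python, as an illustration rather than a replacement:

- Whole id groups are taken in random order.
- Groups are added while the running row count stays within 1% of all rows.

The function name sample_whole_groups and the data are hypothetical. Shuffling the groups here plays the role that max(rnd) over random row numbers plays in the SQL.

```python
import random
from collections import Counter


def sample_whole_groups(ids, fraction, seed=None):
    """Keep whole id groups, in random order, while the running total stays within fraction * rows."""
    rng = random.Random(seed)
    sizes = Counter(ids)            # rows per id, like count(*) over (partition by id)
    order = list(sizes)
    rng.shuffle(order)              # random group order, like order by max(rnd)
    budget = len(ids) * fraction    # like total * 0.01
    chosen = set()
    running = 0
    for key in order:
        running += sizes[key]       # running total never decreases, so we can stop early
        if running > budget:
            break
        chosen.add(key)
    return [i for i in ids if i in chosen]


# 1000 made-up rows: ids 0..499, each appearing twice
IDS = [k for k in range(500) for _ in range(2)]
picked = sample_whole_groups(IDS, 0.01, seed=42)
print(len(picked), sorted(set(picked)))
```

If the first group drawn is already larger than the budget, nothing is returned, which matches the running_total <= total * 0.01 filter in the SQL.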
I don’t have your data, of course, but I have no trouble using a subquery in the LIMIT clause.
However, the subquery contains only the count(*) part, and I then multiply the result by 0.01:
SELECT * FROM mytbl
WHERE var='value'
ORDER BY id,random()
LIMIT(SELECT count(*) FROM mytbl)*0.01;

ORDER BY complicated expressions in PostgreSQL

I am trying to ORDER BY a difference between 2 double values (which are aliased columns), but it does not see the aliased columns.
Example:
SELECT COALESCE(
ROUND(
SUM(amount * currency1.rate / currency2.rate)
, 4)
, 0) AS first_amount,
SUM(
(SELECT
COALESCE(
ROUND(
SUM(table2.amount * currency3.rate / currency2.rate)
, 4)
, 0)
FROM table2
JOIN currencies currency3
ON currency3.id = table2.currency_id
WHERE table2.date BETWEEN table1.start_date AND table1.end_date
)
) AS second_amount
FROM table1
JOIN currencies currency1
ON currency1.id = table1.currency_id
JOIN currencies currency2
ON currency2.id = 123 -- some hardcoded ID
ORDER BY first_amount - second_amount ASC
Postgres tells me that column first_amount does not exist.
Reading the documentation, I saw that Postgres 9.0 does not allow expressions with aliased columns.
How can I solve the problem and sort everything I need in the correct manner?
A column alias cannot be used in the WHERE clause, and in ORDER BY it can only be used on its own, not inside an expression such as first_amount - second_amount. You need to wrap the query in a derived table:
select *
from (
... your original query goes here ...
) as t
ORDER BY first_amount - second_amount ASC;
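A runnable sketch of that wrapper follows. Python's sqlite3 stands in for PostgreSQL, and the entries table with its amounts is hypothetical and much simpler than the currency joins above. SQLite is more permissive about aliases than PostgreSQL, so this demonstrates only that the derived-table form sorts as intended.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (grp TEXT, amount INTEGER, other_amount INTEGER)")
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?)",
    [("g1", 10, 2), ("g2", 5, 5), ("g3", 1, 4)],
)

rows = conn.execute("""
    SELECT *
    FROM (
        -- the inner query defines the aliases
        SELECT grp,
               SUM(amount) AS first_amount,
               SUM(other_amount) AS second_amount
        FROM entries
        GROUP BY grp
    ) AS t
    -- outside, the aliases are plain columns, so an expression over them is fine
    ORDER BY first_amount - second_amount ASC
""").fetchall()
print(rows)
```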

Trouble in calculating the field while creating view in postgresql

I have two tables q1data and q1lookup in postgres database. q1data contains 3 columns (postid, reasonid, other) and q1lookup contains 2 columns (reasonid, reason).
I am trying to create a view which will include 4 columns (reasonid, reason, count, percentage). count is the count of each reason, and percentage should be each count divided by the total count(*) of q1data (i.e. the total number of rows in q1data).
But it gives an error and says syntax error near count(*). The following is the code I am using. Please help.
select
cwfis_web.q1data.reasonid AS reasonid,
cwfis_web.q1lookup.reason AS reason,
count(cwfis_web.q1data.reasonid) AS count,
round(
(
(
count(cwfis_web.q1data.reasonid)
/
(select count(0) AS count(*) from cwfis_web.q1data)
) * 100
)
,0) AS percentage
from
cwfis_web.q1data
join
cwfis_web.q1lookup
ON cwfis_web.q1data.reasonid = cwfis_web.q1lookup.reasonid
group by
cwfis_web.q1data.reasonid;
Firstly, you have a completely invalid piece of syntax there: count(0) AS count(*). Replacing that with a plain count(*), and adding the missing Group By entry for reason, gives this:
select
cwfis_web.q1data.reasonid AS reasonid,
cwfis_web.q1lookup.reason AS reason,
count(cwfis_web.q1data.reasonid) AS count,
round(
(
(
count(cwfis_web.q1data.reasonid)
/
(select count(*) from cwfis_web.q1data)
) * 100
)
,0) AS percentage
from
cwfis_web.q1data
join
cwfis_web.q1lookup
ON cwfis_web.q1data.reasonid = cwfis_web.q1lookup.reasonid
group by
cwfis_web.q1data.reasonid,
cwfis_web.q1lookup.reason;
However, as this live demo shows this doesn't give the right value for percentage, because count(cwfis_web.q1data.reasonid) and (select count(*) from cwfis_web.q1data) are both of type integer, so integer division is performed, and the result truncated to 0.
If you cast these to numeric (the expected argument type of the 2-parameter round() function), you get this:
select
cwfis_web.q1data.reasonid AS reasonid,
cwfis_web.q1lookup.reason AS reason,
count(cwfis_web.q1data.reasonid) AS count,
round(
(
(
count(cwfis_web.q1data.reasonid)::numeric
/
(select count(*) from cwfis_web.q1data)::numeric
) * 100
)
,0) AS percentage
from
cwfis_web.q1data
join
cwfis_web.q1lookup
ON cwfis_web.q1data.reasonid = cwfis_web.q1lookup.reasonid
group by
cwfis_web.q1data.reasonid,
cwfis_web.q1lookup.reason;
Which as this live demo shows gives something more like you were hoping for. (Alternatively, you can cast to float, and lose the ,0 argument to round(), as in this demo.)
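You can reproduce the integer-division pitfall with Python's sqlite3 module, which also truncates integer / integer. Assumptions:

- SQLite stands in for PostgreSQL.
- A cast to REAL plays the role of the cast to numeric or float.
- The four q1data rows and three reasons are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE q1lookup (reasonid INTEGER, reason TEXT);
    CREATE TABLE q1data (postid INTEGER, reasonid INTEGER, other TEXT);
    INSERT INTO q1lookup VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO q1data VALUES (1, 1, NULL), (2, 2, NULL), (3, 3, NULL), (4, 3, NULL);
""")

rows = conn.execute("""
    SELECT d.reasonid,
           l.reason,
           COUNT(d.reasonid) AS cnt,
           -- integer / integer: truncated to 0 before the * 100
           ROUND(COUNT(d.reasonid) / (SELECT COUNT(*) FROM q1data) * 100, 0) AS pct_int,
           -- cast one side first so the division keeps the fraction
           ROUND(CAST(COUNT(d.reasonid) AS REAL) / (SELECT COUNT(*) FROM q1data) * 100, 0) AS pct_real
    FROM q1data d
    JOIN q1lookup l ON d.reasonid = l.reasonid
    GROUP BY d.reasonid, l.reason
    ORDER BY d.reasonid
""").fetchall()
for row in rows:
    print(row)
```

The pct_int column is 0 for every reason, while pct_real gives 25, 25 and 50.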
Try changing your subquery from
select count(0) AS count(*) from cwfis_web.q1data
to
select count(0) from cwfis_web.q1data
Also you need to add cwfis_web.q1lookup.reason to group by.

Error "Invalid column name" in CTE

I'm having an issue using a column alias in a join inside a CTE. I get "Invalid column name" on the line with RowNumber2 >= (t1.RowNumber - 20). Does anyone have a suggestion? Thanks.
DECLARE @latestDate Date = dbo.LatestDateWithPricingVolCountOver4k()
;WITH AllSymbsAndDates AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY Symbol ORDER BY TradingDate) AS RowNumber,
Symbol, TradingDate
FROM tblSymbolsMain
CROSS JOIN tblTradingDays
WHERE TradingDate <= @latestDate
),
SymbsDatesGrouped AS
(
SELECT * FROM
(
SELECT
t1.Symbol, t1.TradingDate, t2.TradingDate AS TradingDate2, t1.RowNumber,
t2.RowNumber AS RowNumber2
FROM AllSymbsAndDates t1
JOIN AllSymbsAndDates t2 ON t1.Symbol = t2.Symbol
AND RowNumber2 >= (t1.RowNumber - 20)
) t
)
SELECT
Symbol, TradingDate, TradingDate2, RowNumber, RowNumber2
FROM
SymbsDatesGrouped
ORDER BY
Symbol, TradingDate, TradingDate2
You can't reference a column alias in the WHERE or JOIN clauses. In fact, the only clause where you can reference an alias from the SELECT list is ORDER BY (or an outer scope, e.g. when selecting from a subquery or CTE).
In this case, the solution is pretty trivial. Why not just say:
AND t2.RowNumber >= (t1.RowNumber - 20)
?
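Here is a small runnable version of that fix using Python's sqlite3 module. Assumptions:

- SQLite (3.25+, for ROW_NUMBER) stands in for SQL Server.
- The prices table and the lowercase column names are hypothetical.
- The offset is 1 instead of 20 to keep the data tiny.

The key point is that the ON clause uses the qualified column t2.rn rather than the alias rn2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (symbol TEXT, trading_date TEXT)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?)",
    [("A", "2024-01-01"), ("A", "2024-01-02"), ("A", "2024-01-03")],
)

rows = conn.execute("""
    WITH numbered AS (
        SELECT ROW_NUMBER() OVER (PARTITION BY symbol ORDER BY trading_date) AS rn,
               symbol, trading_date
        FROM prices
    )
    SELECT t1.symbol, t1.trading_date, t2.trading_date AS trading_date2,
           t1.rn, t2.rn AS rn2
    FROM numbered t1
    JOIN numbered t2
      ON t1.symbol = t2.symbol
     AND t2.rn >= t1.rn - 1  -- qualified column, not the alias rn2
    ORDER BY t1.symbol, t1.trading_date, trading_date2
""").fetchall()
for row in rows:
    print(row)
```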

Perl prepare DB2 statement not returning what I need

Since I am using DB2, in order to select a portion of a database in the middle (like a limit/offset pairing), I need to do a different kind of prepare statement. The example I was given was this:
SELECT *
FROM (SELECT col1, col2, col3, ROW_NUMBER() OVER () AS RN FROM table) AS cols
WHERE RN BETWEEN 1 AND 10000;
Which I adapted to this:
SELECT * FROM (SELECT ROW_NUMBER() OVER (ORDER BY 2,3,4,6,7 ASC) AS rownum FROM TRANSACTIONS) AS foo WHERE rownum >= 500 AND rownum <1000
And when I call fetchall_arrayref(), I do get 500 results like I want, but it only returns an array with references to the row numbers, not all of the data I want to pull. I know for a fact that this is what the code is SUPPOSED to do as it's written, and I have tried a bunch of permutations to get my desired result with no luck.
All I want is to grab all of the columns like my previous prepare statement into an array of arrays:
SELECT * FROM TU_TRANSACTIONS ORDER BY 2, 3, 4, 6, 7
but just on a designated section. There is just something fundamental I am missing, and I just can't see it.
Any help is appreciated, even if it's paired with some constructive criticism.
Your table expression:
(SELECT ROW_NUMBER() OVER (ORDER BY 2,3,4,6,7 ASC) AS rownum FROM TRANSACTIONS) as foo
Has only one column - rownum - so when you select "*" from "foo" you get only the one column.
Your table expression needs to include all of the columns you want, just like the example you posted.
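Here is a sketch of a table expression that carries every column plus the row number. Assumptions:

- Python's sqlite3 (3.25+) stands in for DB2.
- The transactions columns and rows are made up.
- The window is ordered by id because the real column names behind ORDER BY 2,3,4,6,7 are not known here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id INTEGER, account TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [(i, f"acct{i % 3}", i * 10) for i in range(1, 21)],
)

page = conn.execute("""
    SELECT *
    FROM (
        -- t.* keeps every table column next to the row number
        SELECT t.*, ROW_NUMBER() OVER (ORDER BY t.id) AS rownum
        FROM transactions t
    ) AS foo
    WHERE rownum >= ? AND rownum < ?
    ORDER BY rownum
""", (5, 10)).fetchall()
for row in page:
    print(row)
```

Each returned row now has the three table columns plus rownum, instead of rownum alone.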
I don't use DB2 so I could be off-base but it seems that:
SELECT * FROM (SELECT ROW_NUMBER() OVER (ORDER BY 2,3,4,6,7 ASC) AS rownum FROM TRANSACTIONS) AS foo WHERE rownum >= 500 AND rownum <1000
Would only return the row numbers because, while the subquery reads the table, the main query only sees the subquery's output, which is a single column holding the row numbers.
Perhaps this would work, selecting the table's columns inside the subquery:
SELECT * FROM (SELECT t.*, ROW_NUMBER() OVER (ORDER BY 2,3,4,6,7 ASC) AS rownum FROM TRANSACTIONS t) AS foo WHERE rownum >= 500 AND rownum <1000