PostgreSQL subqueries using a calculated column

I am new to this platform and need to get a value using a column I already calculated. I know I need a subquery, but am confused by the proper syntax.
SELECT well_id, reported_date, oil,
(EXTRACT(EPOCH FROM age(reported_date,
LAG(reported_date) OVER w))/3600)::int as hourly_rate,
(oil/hourly_rate)::double precision as six
FROM public.production
WINDOW w AS (PARTITION BY well_id ORDER BY well_id, reported_date
ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
The error I am getting is
ERROR: column "hourly_rate" does not exist
LINE 4: (oil/hourly_rate)::double precision as six
^
HINT: Perhaps you meant to reference the column "production.hour_rate".
SQL state: 42703
Character: 171
Which I understand... I have tried brackets, naming the subqueries, and other tactics. I know this is a syntax thing; can someone please give me a hand? Thank you.

I'm a bit confused by your notation, but it looks like there are parenthesis issues: your FROM clause is not linked to the SELECT.
In my opinion, the best way to manage subqueries is to write something like this:
WITH query1 AS (
    SELECT col1, col2
    FROM table1
),
query2 AS (
    SELECT col1, col2
    FROM query1
    -- additional clauses
)
SELECT (what you want)
FROM query2
-- additional statements
Then you can transform your data step by step until it has the right shape for the final SELECT, including aggregations.
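Applied to the query from the question, that could look like this. It is only a sketch: I kept the original table and column names, dropped the ROWS frame (LAG() ignores the frame anyway), and added NULLIF as a guard in case two reports fall within the same hour:
WITH rates AS (
    SELECT well_id, reported_date, oil,
           (EXTRACT(EPOCH FROM age(reported_date,
               LAG(reported_date) OVER w))/3600)::int AS hourly_rate
    FROM public.production
    WINDOW w AS (PARTITION BY well_id ORDER BY reported_date)
)
SELECT well_id, reported_date, oil, hourly_rate,
       (oil/NULLIF(hourly_rate, 0))::double precision AS six
FROM rates;
The outer SELECT can refer to hourly_rate by name, which is exactly what the original query tried to do.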

You cannot reference a column alias from within the same SELECT list. You need to repeat the original calculation in that column, so your updated query would look like this:
SELECT well_id, reported_date, oil,
(EXTRACT(EPOCH FROM age(reported_date, LAG(reported_date) OVER w))/3600)::int as hourly_rate,
(oil/(EXTRACT(EPOCH FROM age(reported_date, LAG(reported_date) OVER w))/3600))::double precision as six
FROM public.production
WINDOW w AS (PARTITION BY well_id ORDER BY well_id, reported_date
ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)

Related

How do I select only 1 record per user id using ROW_NUMBER without a subquery?

My current method of de-duping is really dumb.
select col1, col2 ... col500
from (
    select col1, col2 ... col500,
           ROW_NUMBER() OVER (PARTITION BY uid) as row_num
    from the_table
) t
where row_num = 1;
Is there a way to do this without a subquery? Select distinct is not an option as there can be small variations in the columns which are not significant for this output.
In Postgres, distinct on () is typically faster than the equivalent solution using a window function, and it also doesn't require a subquery:
select distinct on (uid) *
from the_table
order by uid, something
You have to supply an ORDER BY (which you should have done with row_number() as well) to get stable results, and with DISTINCT ON it has to start with the DISTINCT ON expressions; otherwise the chosen row is "random".
The above is true for Postgres. You also tagged your question with amazon-redshift; I have no idea whether Redshift (which is in fact a very different DBMS) supports the same thing, nor whether it is as efficient.
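For example, to keep the most recent row per uid (updated_at is a hypothetical column here, just for illustration):
select distinct on (uid) *
from the_table
order by uid, updated_at desc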

Postgres: poor performance with an IN clause

I have this query:
with serie as (
select to_char(kj, 'yyyymmdd')::numeric
from generate_series('2016-02-06 01:56:00','2016-02-06 23:57:00', '1 day'::interval) kj
)
select col1,col2,col3
from foreign_table
where col3 in (select * from serie) -- from CTE serie here is only one number 20160216
Its performance is poor, even though the foreign table has an index on col3.
But if I write the values from the CTE serie manually, it performs fast:
select col1,col2,col3
from foreign_table
where col3 in (20160216,20160217)
I put one more value there just to show that it works fast with more than one value.
And if I use "=" in the first query instead of "in", it also performs fast:
with serie as (
select to_char(kj, 'yyyymmdd')::numeric
from generate_series('2016-02-06 01:56:00','2016-02-06 23:57:00', '1 day'::interval) kj
)
select col1,col2,col3
from foreign_table
where col3 = (select * from serie) -- I can write "=" in this case because I have just one number returned from CTE
(I am using Postgres 9.5.1)
Why does Postgres perform so poorly with an IN clause over a CTE, compared to writing the values manually or using "="? I obviously cannot write the values manually every time, since I need this query to be generic, and I cannot use "=" either, for the same reason.
So, any ideas?
btw: this is not the only case where an IN clause performed poorly compared to the other two methods I showed here.
These are the query plans. I have other queries that are not affected by the foreign table; once I find them I will add them here as well.
http://i.imgur.com/zeiXwwW.png
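One rewrite worth trying (just a sketch, untested here): express the IN as a join against the CTE. It is logically equivalent as long as serie has distinct values (which it does by construction), and it gives the planner a different plan shape for the foreign table; whether it actually helps depends on what your foreign data wrapper can push down.
with serie as (
select to_char(kj, 'yyyymmdd')::numeric as num -- alias added so the join can reference it
from generate_series('2016-02-06 01:56:00','2016-02-06 23:57:00', '1 day'::interval) kj
)
select f.col1, f.col2, f.col3
from foreign_table f
join serie s on f.col3 = s.num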

SQLAlchemy: count of distinct over multiple columns

I can't do:
>>> session.query(
...     func.count(distinct(Hit.ip_address, Hit.user_agent))).first()
TypeError: distinct() takes exactly 1 argument (2 given)
I can do:
session.query(
    func.count(distinct(func.concat(Hit.ip_address, Hit.user_agent)))).first()
Which is fine (count of unique users in a 'pageload' db table).
This isn't correct in the general case; e.g., it will give a count of 1 instead of 2 for the following table:
col_a | col_b
------+------
xx    | yy
xxy   | y
Is there any way to generate the following SQL (which is valid in postgresql at least)?
SELECT count(distinct (col_a, col_b)) FROM my_table;
distinct() accepts more than one argument when appended to the query object:
session.query(Hit).distinct(Hit.ip_address, Hit.user_agent).count()
It should generate something like:
SELECT count(*) AS count_1
FROM (SELECT DISTINCT ON (hit.ip_address, hit.user_agent)
hit.ip_address AS hit_ip_address, hit.user_agent AS hit_user_agent
FROM hit) AS anon_1
which is even a bit closer to what you wanted.
The exact query can be produced using the tuple_() construct:
session.query(
func.count(distinct(tuple_(Hit.ip_address, Hit.user_agent)))).scalar()
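If I read the construct correctly, it should render essentially the SQL you asked for (my expectation of the output, not a verified echo):
SELECT count(DISTINCT (hit.ip_address, hit.user_agent)) AS count_1
FROM hit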
Looks like sqlalchemy distinct() accepts only one column or expression.
Another way around it is to use group_by and count. This should be more efficient than using concat of two columns: with GROUP BY, the database can use indexes if they exist:
session.query(Hit.ip_address, Hit.user_agent).\
group_by(Hit.ip_address, Hit.user_agent).count()
The generated query would still look different from what you asked about:
SELECT count(*) AS count_1
FROM (SELECT hittable.user_agent AS hittable_user_agent, hittable.ip_address AS hittable_ip_address
FROM hittable GROUP BY hittable.user_agent, hittable.ip_address) AS anon_1
You can add a separator character in the concat function in order to make the concatenation unambiguous. Taking your example as a reference, it would be:
session.query(
    func.count(distinct(func.concat(Hit.ip_address, "-", Hit.user_agent)))).first()
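For reference, that should render to something like this (again my expectation, reusing the hit table name from above):
SELECT count(DISTINCT concat(hit.ip_address, '-', hit.user_agent)) AS count_1
FROM hit
The separator trick is safe as long as the separator character never appears in the first column; an IP address never contains "-", so it works here.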

ROWID equivalent in Postgres 9.2

Is there any way to get the rowid of a record in Postgres?
In Oracle I can use something like:
SELECT MAX(BILLS.ROWID) FROM BILLS
Yes, there is a ctid column, which is the equivalent of rowid. But it is useless for you: rowid and ctid are physical row/tuple identifiers, so they can change after a rebuild or vacuum.
See: Chapter 5. Data Definition > 5.4. System Columns
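For illustration only (using the bills table from the question):
select ctid, * from bills -- ctid looks like '(page, tuple)', e.g. '(0,1)'
Remember that the same logical row can get a different ctid after an UPDATE or a VACUUM FULL, so never store it or use it as a key.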
The PostgreSQL row_number() window function can be used for most purposes where you would use rowid. Whereas in Oracle the rowid is an intrinsic numbering of the result data rows, in Postgres row_number() computes a numbering within a logical ordering of the returned data. Normally if you want to number the rows, it means you expect them in a particular order, so you would specify which column(s) to order the rows when numbering them:
select client_name, row_number() over (order by date) from bills;
If you just want the rows numbered arbitrarily you can leave the over clause empty:
select client_name, row_number() over () from bills;
If you want to calculate an aggregate over the row number you'll have to use a subquery:
select max(rownum) from (
select row_number() over () as rownum from bills
) r;
If all you need is the last item from a table, and you have a column to sort sequentially, there's a simpler approach than using row_number(). Just reverse the sort order and select the first item:
select * from bills
order by date desc limit 1;
Use a Sequence. You can choose 4 or 8 byte values.
http://www.neilconway.org/docs/sequences/
Add a unique column to your table (named rowid, maybe), and prevent it from changing with a BEFORE UPDATE trigger that raises an exception if someone tries to update it.
You may populate this column from a sequence, as @JohnMudd mentioned.
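A sketch of that setup (the names are mine; bigserial creates the backing sequence and populates the column in one step):
alter table bills add column rowid bigserial unique;

create function forbid_rowid_update() returns trigger as $$
begin
    -- reject any attempt to change the surrogate key
    if new.rowid is distinct from old.rowid then
        raise exception 'rowid is immutable';
    end if;
    return new;
end;
$$ language plpgsql;

create trigger bills_rowid_guard
before update on bills
for each row execute procedure forbid_rowid_update();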

Is there a way to find TOP X records with grouped data?

I'm working with a Sybase 12.5 server and I have a table defined as such:
CREATE TABLE SomeTable(
[GroupID] [int] NOT NULL,
[DateStamp] [datetime] NOT NULL,
[SomeName] varchar(100),
PRIMARY KEY CLUSTERED (GroupID,DateStamp)
)
I want to be able to list, per [GroupID], only the latest X records by [DateStamp]. The kicker is X > 1, so plain old MAX() won't cut it. I'm assuming there's a wonderfully nasty way to do this with cursors and what-not, but I'm wondering if there is a simpler way without that stuff.
I know I'm missing something blatantly obvious and I'm gonna kick myself for not getting it, but .... I'm not getting it. Please help.
According to the online manual, Sybase 12.5 supports WINDOW functions and ROW_NUMBER(), though their syntax differs from standard SQL slightly.
Try something like this:
SELECT SP.*
FROM (
SELECT *, ROW_NUMBER() OVER (windowA ORDER BY [DateStamp] DESC) AS RowNum
FROM SomeTable
WINDOW windowA AS (PARTITION BY [GroupID])
) AS SP
WHERE SP.RowNum <= 3
ORDER BY RowNum DESC;
I don't have an instance of Sybase, so I haven't tested this. I'm just synthesizing this example from the doc.
I made a mistake. The doc I was looking at was Sybase SQL Anywhere 11. It seems that Sybase ASA does not support the WINDOW clause at all, even in the most recent version.
Here's another query that can accomplish the same thing. You can use a self-join to match each row of SomeTable to all rows with the same GroupID and a later DateStamp. If there are fewer than three later rows, we've got one of the top three.
SELECT s1.[GroupID], s1.[DateStamp], s1.[SomeName]
FROM SomeTable s1
LEFT OUTER JOIN SomeTable s2
  ON s1.[GroupID] = s2.[GroupID] AND s1.[DateStamp] < s2.[DateStamp]
GROUP BY s1.[GroupID], s1.[DateStamp], s1.[SomeName]
HAVING COUNT(s2.[GroupID]) < 3
ORDER BY s1.[GroupID], s1.[DateStamp] DESC;
Note that you must list the same columns in the SELECT list as you list in the GROUP BY clause. Basically, all columns from s1 that you want this query to return.
Here's quite an unscalable way!
SELECT GroupID, DateStamp, SomeName
FROM SomeTable ST1
WHERE X >
(SELECT COUNT(*)
FROM SomeTable ST2
WHERE ST1.GroupID=ST2.GroupID AND ST2.DateStamp > ST1.DateStamp)
Edit: Bill's solution is vastly preferable, though.