Postgres: poor performance with an "IN" clause

I have this query:
with serie as (
select to_char(kj, 'yyyymmdd')::numeric
from generate_series('2016-02-06 01:56:00','2016-02-06 23:57:00', '1 day'::interval) kj
)
select col1,col2,col3
from foreign_table
where col3 in (select * from serie) -- here the CTE serie returns only one number, 20160216
Its performance is poor, even though the foreign table has an index on col3.
But if I write the values from the CTE serie by hand, it is fast:
select col1,col2,col3
from foreign_table
where col3 in (20160216,20160217)
(I added a second value just to show that it is fast with more than one value.)
And if I use "=" instead of "in" in the first query, it is also fast:
with serie as (
select to_char(kj, 'yyyymmdd')::numeric
from generate_series('2016-02-06 01:56:00','2016-02-06 23:57:00', '1 day'::interval) kj
)
select col1,col2,col3
from foreign_table
where col3 = (select * from serie) -- "=" only works here because the CTE returns a single number
(I am using Postgres 9.5.1)
Why does Postgres perform so poorly with an IN clause over a CTE, compared with writing the values by hand or using "="? Obviously I cannot write the values by hand every time, because the query has to be general, and I cannot use "=" for the same reason.
Any ideas?
By the way, this is not the only case where the IN clause performed poorly compared with the other two methods shown here.
These are the query plans. I have other queries that do not involve a foreign table; once I find them I will post them here as well.
http://i.imgur.com/zeiXwwW.png

Related

Postgres: insert multiple rows into a table from a join of other tables

I am trying to insert multiple records, obtained from a join, into another table, user_to_property. In the user_to_property table, user_to_property_id is the primary key and NOT NULL, but it is not auto-incrementing, so I am trying to set user_to_property_id manually, incrementing it by 1.
WITH selectedData AS
( -- selection of the data that needs to be inserted
SELECT t2.user_id as userId
FROM property_lines t1
INNER JOIN user t2 ON t1.account_id = t2.account_id
)
INSERT INTO user_to_property (user_to_property_id, user_id, property_id, created_date)
VALUES ((SELECT MAX( user_to_property_id )+1 FROM user_to_property),(SELECT
selectedData.userId
FROM selectedData),3,now());
The above query gives me the below error:
ERROR: more than one row returned by a subquery used as an expression
How can I insert multiple records into a table from a join of other tables? The user_to_property table must contain only one record for each (user_id, property_id) pair.
Typically for INSERT you use either VALUES or SELECT. The structure VALUES (SELECT ...) often (generally?) causes more trouble than it is worth, and it is never necessary: you can always select a constant or an expression. In this case, convert the statement to a plain SELECT. To generate your IDs, get the max value from your table and then add the row_number of each row you are inserting:
insert into user_to_property(user_to_property_id
, user_id
, property_id
, created_date
)
with start_with(current_max_id) as
( -- coalesce so numbering starts at 1 when the table is empty
select coalesce(max(user_to_property_id), 0) from user_to_property )
select current_max_id + id_incr, user_id, 3, now()
from (
-- number the rows being inserted: 1, 2, 3, ...
select t2.user_id, row_number() over() id_incr
from property_lines t1
join users t2 on t1.account_id = t2.account_id
) js
join start_with on true;
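A runnable sketch of the same INSERT ... SELECT pattern, using Python's built-in sqlite3 module as a stand-in for PostgreSQL (it assumes the bundled SQLite is 3.25 or newer, which window functions require). Table names follow the thread; the sample rows and the helpers build_demo() and insert_links() are hypothetical. COALESCE makes numbering start at 1 on an empty table, CROSS JOIN replaces join ... on true, and CURRENT_TIMESTAMP stands in for now(). Like the original, it does nothing about concurrent inserts.

```python
import sqlite3


def build_demo():
    """In-memory database shaped like the question's tables (sample rows are made up)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, account_id INTEGER);
        CREATE TABLE property_lines (account_id INTEGER);
        CREATE TABLE user_to_property (
            user_to_property_id INTEGER NOT NULL PRIMARY KEY,
            user_id INTEGER,
            property_id INTEGER,
            created_date TEXT
        );
        INSERT INTO users VALUES (10, 1), (11, 2), (12, 3);
        INSERT INTO property_lines VALUES (1), (2);
        """
    )
    return conn


def insert_links(conn, property_id):
    """Insert one row per joined user, numbering ids on from the current max id."""
    conn.execute(
        """
        WITH start_with(current_max_id) AS (
            -- COALESCE so the first id is 1 when the table is empty
            SELECT COALESCE(MAX(user_to_property_id), 0) FROM user_to_property
        )
        INSERT INTO user_to_property
            (user_to_property_id, user_id, property_id, created_date)
        SELECT current_max_id + id_incr, user_id, ?, CURRENT_TIMESTAMP
        FROM (
            -- number the rows being inserted: 1, 2, 3, ...
            SELECT t2.user_id, ROW_NUMBER() OVER () AS id_incr
            FROM property_lines t1
            JOIN users t2 ON t1.account_id = t2.account_id
        ) AS js
        CROSS JOIN start_with
        """,
        (property_id,),
    )
    conn.commit()
```

On the demo data, insert_links(conn, 3) inserts ids 1 and 2; a second call continues from the new maximum with ids 3 and 4.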
A couple of notes:
- DO NOT use user as a table name, or as the name of any other object. It is a
documented reserved word in both Postgres and the SQL standard (and has
been since Postgres v7.1 and the SQL-92 standard at least).
- You really should create another column, or change the type of
user_to_property_id, so that it is auto-generated. Using MAX()+1, or
anything based on that idea, is a virtual guarantee that you will generate
duplicate keys, much to the amusement of users and developers alike.
Consider what happens under MVCC when two users run the query concurrently: both read the same MAX() value and try to insert the same IDs.
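A sketch of the alternative the notes recommend, again with sqlite3 standing in for PostgreSQL: let the database assign the id, and let a UNIQUE (user_id, property_id) constraint enforce the one-record-per-pair rule from the question. In SQLite an INTEGER PRIMARY KEY column is assigned automatically and INSERT OR IGNORE skips rows that would violate the constraint; the PostgreSQL counterparts would be an identity or serial column and INSERT ... ON CONFLICT (user_id, property_id) DO NOTHING. The sample data and helper names are hypothetical.

```python
import sqlite3


def build_demo():
    """Hypothetical sample data; account 2 appears twice to create a duplicate pair."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, account_id INTEGER);
        CREATE TABLE property_lines (account_id INTEGER);
        CREATE TABLE user_to_property (
            user_to_property_id INTEGER PRIMARY KEY,  -- assigned by SQLite
            user_id INTEGER NOT NULL,
            property_id INTEGER NOT NULL,
            created_date TEXT,
            UNIQUE (user_id, property_id)             -- one row per pair
        );
        INSERT INTO users VALUES (10, 1), (11, 2), (12, 3);
        INSERT INTO property_lines VALUES (1), (2), (2);
        """
    )
    return conn


def link_users(conn, property_id):
    """Insert without supplying an id; pairs that already exist are skipped."""
    conn.execute(
        """
        INSERT OR IGNORE INTO user_to_property (user_id, property_id, created_date)
        SELECT t2.user_id, ?, CURRENT_TIMESTAMP
        FROM property_lines t1
        JOIN users t2 ON t1.account_id = t2.account_id
        """,
        (property_id,),
    )
    conn.commit()
```

Running link_users(conn, 3) twice still leaves exactly one row per (user_id, property_id) pair, and no code ever computes MAX()+1.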

PostgreSQL subqueries using a calculated column

I am new to this platform and need to get a value using a column I already calculated. I know I need a subquery, but am confused by the proper syntax.
SELECT well_id, reported_date, oil,
(EXTRACT(EPOCH FROM age(reported_date,
LAG(reported_date) OVER w))/3600)::int as hourly_rate,
(oil/hourly_rate)::double precision as six
FROM public.production
WINDOW w AS (PARTITION BY well_id ORDER BY well_id, reported_date
ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
The error I am getting is
ERROR: column "hourly_rate" does not exist
LINE 4: (oil/hourly_rate)::double precision as six
^
HINT: Perhaps you meant to reference the column "production.hour_rate".
SQL state: 42703
Character: 171
I understand why. I have tried parentheses, naming the subqueries and other tactics. I know this is a syntax issue; can someone please give me a hand? Thank you.
I'm a bit confused by your notation, but it looks like there are parenthesis issues: your FROM clause is not linked to the SELECT.
In my opinion, the best way to manage subqueries is to write something like this:
WITH query1 AS (
select col1, col2
from table1
),
query2 as (
select col1, col2
from query1
(additional clauses)
)
select (what you want)
from query2
(additional statements)
This lets you transform your data step by step until it has the right shape for the final SELECT, including aggregations.
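The CTE approach applied to the hourly_rate question, sketched with Python's sqlite3 as a stand-in for PostgreSQL (SQLite 3.25+ assumed for LAG). The first CTE computes the derived column under a name, so the final SELECT can use that name like any other column. SQLite has no age() or EXTRACT(EPOCH ...), so here the julianday() difference times 24 gives hours; NULLIF guards against division by zero. The production rows and helper names are made up.

```python
import sqlite3


def build_demo():
    """Hypothetical readings: well 1 has 6- and 12-hour gaps, well 2 a single row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE production (well_id INTEGER, reported_date TEXT, oil REAL)")
    conn.executemany(
        "INSERT INTO production VALUES (?, ?, ?)",
        [
            (1, "2020-01-01 00:00:00", 100.0),
            (1, "2020-01-01 06:00:00", 120.0),
            (1, "2020-01-01 18:00:00", 60.0),
            (2, "2020-01-01 00:00:00", 50.0),
        ],
    )
    return conn


def oil_per_hour(conn):
    sql = """
    WITH gaps AS (
        -- stage 1: compute the derived column and give it a name
        SELECT well_id, reported_date, oil,
               (julianday(reported_date)
                - julianday(LAG(reported_date) OVER (PARTITION BY well_id
                                                     ORDER BY reported_date))) * 24
                   AS hourly_rate
        FROM production
    )
    -- stage 2: hourly_rate is now an ordinary column of gaps
    SELECT well_id, reported_date, oil, hourly_rate,
           oil / NULLIF(hourly_rate, 0) AS six
    FROM gaps
    ORDER BY well_id, reported_date
    """
    return conn.execute(sql).fetchall()
```

The first reading of each well has no predecessor, so LAG returns NULL and both hourly_rate and six come back as None for that row.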
You cannot reference a column alias elsewhere in the same select list. You need to repeat the original calculation in that column, so your updated query would look like this:
SELECT well_id, reported_date, oil,
(EXTRACT(EPOCH FROM age(reported_date, LAG(reported_date) OVER w))/3600)::int as hourly_rate,
(Oil/(EXTRACT(EPOCH FROM age(reported_date, LAG(reported_date) OVER w))/3600))::double precision as six
FROM public.production
WINDOW w AS (PARTITION BY well_id ORDER BY well_id, reported_date
ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)

How do I select only 1 record per user id using ROW_NUMBER without a subquery?

My current method of de-duping is really dumb.
select col1, col2 ... col500 from
(select col1, col2 ... col500, ROW_NUMBER() OVER(PARTITION BY uid) as row_num
from the_table) t
where row_num=1;
Is there a way to do this without a subquery? Select distinct is not an option as there can be small variations in the columns which are not significant for this output.
In Postgres, distinct on () is typically faster than the equivalent solution using a window function, and it also doesn't require a subquery:
select distinct on (uid) *
from the_table
order by uid, something
You have to supply an order by to get stable results (which is something you should have done with row_number() as well); otherwise the chosen row is "random". Postgres also requires the leading order by expressions to match the distinct on expressions, which is why uid comes first.
The above is true for Postgres. You also tagged your question with amazon-redshift; I have no idea whether Redshift (which is in fact a very different DBMS) supports the same thing, or whether it is as efficient.
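DISTINCT ON is PostgreSQL-specific, so this sqlite3 sketch shows the portable ROW_NUMBER() form from the question instead, with a FROM clause, a subquery alias and a deterministic ORDER BY inside the window (the ordering point the answer makes). The updated_at and note columns and the sample rows are hypothetical; in PostgreSQL the same result would come from select distinct on (uid) * from the_table order by uid, updated_at desc.

```python
import sqlite3


def build_demo():
    """Hypothetical rows: uid 1 has two versions, uid 2 has one."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE the_table (uid INTEGER, updated_at INTEGER, note TEXT)")
    conn.executemany(
        "INSERT INTO the_table VALUES (?, ?, ?)",
        [(1, 1, "a"), (1, 3, "b"), (2, 2, "c")],
    )
    return conn


def one_row_per_uid(conn):
    """Keep the most recent row per uid; ORDER BY inside OVER makes the pick stable."""
    sql = """
    SELECT uid, updated_at, note
    FROM (
        SELECT uid, updated_at, note,
               ROW_NUMBER() OVER (PARTITION BY uid ORDER BY updated_at DESC) AS row_num
        FROM the_table
    ) AS ranked
    WHERE row_num = 1
    ORDER BY uid
    """
    return conn.execute(sql).fetchall()
```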

Does PostgreSQL have an equivalent to SAS' obsnum?

When converting scripts, tables, datasets, etc. from a SAS environment to a PostgreSQL environment, is there an equivalent to referencing SAS' obsnum in PostgreSQL? For example, if a query says:
SELECT * FROM schema.table
WHERE obsnum = 1
Is there a way to track the observation number or similar in PostgreSQL? Or should a different approach be taken?
Thanks.
I should probably specify that I was told obsnum is a built-in SAS value associated with datasets and tables; my SAS scripts contain no declaration of obsnum, only a single reference to it in a SELECT statement.
OBSNUM isn't an automatic variable in SAS, so it's an ordinary variable (a column) in your dataset.
You should be able to use the same query in Postgres to keep only the rows where that column's value is 1.
To add: PROC SQL doesn't have an automatic variable for numbering rows. It can use monotonic() for row numbers, but that function is not supported.
(wrong answer):
Try CTID
See your previous question, the first answer and the comments/questions.
What exactly is the/this data statement in SAS doing? PostgreSQL equivalent?
PostgreSQL documentation on system columns (including ctid):
http://www.postgresql.org/docs/8.2/static/ddl-system-columns.html
-- make(fake) a dataset
CREATE TABLE dataset
( val double precision NOT NULL
);
-- populate it with random
INSERT INTO dataset(val)
SELECT random()
FROM generate_series(1,100)
;
-- Use row_number() to enumerate the unordered tuples
-- note the subquery. It is needed because otherwise
-- you cannot refer to row_number
SELECT * FROM (
SELECT val
, row_number() OVER() AS obsnum
FROM dataset
) qq -- subquery MUST have an alias
WHERE qq.obsnum = 1
;
-- you can just as well order over ctid (the result is the same)
SELECT * FROM (
SELECT val
, row_number() OVER(ORDER BY ctid) AS obsnum
FROM dataset
) qq -- subquery MUST have an alias
WHERE qq.obsnum = 1
;
-- Instead of a subquery you could use
-- a CTE to wrap the enumeration part
WITH zzz AS (
SELECT val
, row_number() OVER(ORDER BY ctid) AS obsnum
FROM dataset
)
SELECT * FROM zzz
WHERE obsnum = 1
;
-- Or, if you just want one observation: use LIMIT
-- (order of records is not defined,
-- but the order of ctid is not stable either)
SELECT * FROM dataset
LIMIT 1
;
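Since neither ctid order nor an unordered row_number() is stable, one option (an assumption, not something from the thread) is to carry the original SAS observation order as an explicit column when the data is loaded, and number over that. A sqlite3 sketch with a hypothetical load_order column, using the same CTE shape as above:

```python
import sqlite3


def build_demo():
    """Rows are inserted out of load order on purpose; load_order is hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dataset (load_order INTEGER NOT NULL, val REAL NOT NULL)")
    conn.executemany(
        "INSERT INTO dataset (load_order, val) VALUES (?, ?)",
        [(3, 0.3), (1, 0.1), (2, 0.2)],
    )
    return conn


def first_observation(conn):
    """Number rows by the explicit load order, then keep obsnum = 1."""
    sql = """
    WITH numbered AS (
        SELECT val,
               ROW_NUMBER() OVER (ORDER BY load_order) AS obsnum
        FROM dataset
    )
    SELECT val, obsnum FROM numbered WHERE obsnum = 1
    """
    return conn.execute(sql).fetchall()
```

Because the numbering depends only on load_order, the result does not change if the physical row order does.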

nested SELECT statements interact in ways that I don't understand

I thought I understood how I can do a SELECT from the results of another SELECT statement, but there seems to be some sort of blurring of scope that I don't understand. I am using SQL Server 2008R2.
It is easiest to explain with an example.
Create a table with a single nvarchar column, and load it with a single text value and a couple of numbers:
CREATE TABLE #temptable( a nvarchar(30) );
INSERT INTO #temptable( a )
VALUES('apple');
INSERT INTO #temptable( a )
VALUES(1);
INSERT INTO #temptable( a )
VALUES(2);
select * from #temptable;
This will return: apple, 1, 2
Use IsNumeric to get only the rows of the table that can be cast to numeric - this will leave the text value apple behind. This works fine.
select cast(a as int) as NumA
from #temptable
where IsNumeric(a) = 1 ;
This returns: 1, 2
However, if I use that exact same query as an inner select and add a numeric WHERE clause, it fails, saying it cannot convert the nvarchar value 'apple' to data type int. How did it get the value 'apple' back?
select
x.NumA
from
(
select cast(a as int) as NumA
from #temptable
where IsNumeric(a) = 1
) x
where x.NumA > 1
;
Note that the failing query works just fine without the WHERE clause:
select
x.NumA
from
(
select cast(a as int) as NumA
from #temptable
where IsNumeric(a) = 1
) x
;
I find this very surprising. What am I not getting? TIA
If you take a look at the estimated execution plan you'll find that it has optimized the inner query into the outer and combined the WHERE clauses.
Using a CTE to isolate the operations works (in SQL Server 2008 R2):
declare @temptable as table ( a nvarchar(30) );
INSERT INTO @temptable( a )
VALUES ('apple'), ('1'), ('2');
with Numbers as (
select cast(a as int) as NumA
from @temptable
where IsNumeric(a) = 1
)
select * from Numbers
The reason for this is fairly simple. When a query is executed, it goes through several steps: parse, algebrize, optimize and compile.
The algebrize step resolves all the objects this query needs. The optimize step uses these objects to build the best query plan, which is then compiled and executed.
So when you look at that step, you will see it does a table scan on #temptable, and #temptable is defined the way you created it. That you then compute something from it is a different matter: the column still has the nvarchar data type.
To understand how this works, you have to know the logical order in which a query is read: first all the objects are retrieved (FROM table, INNER JOIN table), then the predicates are applied (WHERE, ON), then the grouping and so on, then the columns are selected (with the CAST), and finally the ORDER BY.
With that in mind, when you combine SELECTs the optimizer still processes them that way. Because your SELECT list (with the CAST) is subordinate to the FROM and JOIN parts of your query, this is what causes the error.
I hope that makes it a little clearer.
The optimizer is free to move expressions around in the query plan in order to produce the most cost-efficient plan for retrieving the data (the evaluation order of the predicates is not guaranteed). I think using a CASE expression like the one below works: without an ELSE clause it yields NULL for non-numeric values, which takes 'apple' out before any conversion to int happens.
select a from #temptable where case when isnumeric(a) = 1 then a end > 1