Why does this SQL unnest query result in 2 rows rather than 4? - postgresql

Relatively new SQL user question.
If my PostgreSQL query looks like this:
select
to_timestamp((unnest(enrolled_ranges) ->> 'start_time')::float) as start_time
, to_timestamp((unnest(enrolled_ranges) ->> 'end_time')::float) as end_time
from student_inclusions
where student_id = '123456'
And the initial enrolled_ranges json data is this:
{"{\"start_time\":1536652800.00007,\"end_time\":1563981839.966626}","{\"start_time\":1563982078.624668,\"end_time\":1563989693.830777}"}
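Why does SQL return 2 rows, with each start_time next to its matching end_time,
instead of 4 rows, with every combination of start and end times?
The first result is what I want; I just don't understand how SQL knows from the query to associate the matching start and end times. Do you have any insight?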

The documentation on set-returning functions describes the behavior you observed:
For each row from the underlying query, there is an output row using the first result from each set-returning function, then an output row using the second result, and so on.
See also "What is the expected behaviour for multiple set-returning functions in SELECT clause?"
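To make the pairing explicit rather than relying on that lockstep rule, the array can be unnested once in the FROM clause, so that both columns read from the same element. This is a minimal sketch, assuming enrolled_ranges is a json[] column; the temp table only mirrors the data in the question, and the aliases r, s and e are made up for illustration.

```sql
-- Hypothetical stand-in for the table in the question (assumes enrolled_ranges is json[]).
create temp table student_inclusions (student_id text, enrolled_ranges json[]);
insert into student_inclusions values (
  '123456',
  array[
    '{"start_time":1536652800.00007,"end_time":1563981839.966626}'::json,
    '{"start_time":1563982078.624668,"end_time":1563989693.830777}'::json
  ]
);

-- One unnest in FROM: one row per array element,
-- so start_time and end_time always come from the same element (2 rows).
select
to_timestamp((r ->> 'start_time')::float) as start_time
, to_timestamp((r ->> 'end_time')::float) as end_time
from student_inclusions
cross join lateral unnest(enrolled_ranges) as r
where student_id = '123456';

-- Two independent unnests in FROM: every start is combined with every end (4 rows).
select
to_timestamp((s ->> 'start_time')::float) as start_time
, to_timestamp((e ->> 'end_time')::float) as end_time
from student_inclusions
cross join lateral unnest(enrolled_ranges) as s
cross join lateral unnest(enrolled_ranges) as e
where student_id = '123456';
```

The second query is only there for contrast: it produces the 4-row cross product that the question expected, which the select-list form does not.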

Related

Pivot function without manually typing values in `for in`?

The Redshift documentation provides an example of using PIVOT:
SELECT *
FROM (SELECT partname, price FROM part) PIVOT (
AVG(price) FOR partname IN ('prop', 'rudder', 'wing')
);
I would like to use pivot() without having to manually specify each value of partname. I want all parts. I tried:
SELECT *
FROM (SELECT partname, price FROM part) PIVOT (
AVG(price) FOR partname);
That gave an error. Then I tried:
SELECT *
FROM (SELECT partname, price FROM part) PIVOT (
AVG(price) FOR partname IN (select distinct partname from part)
);
That also threw an error.
How can I tell Redshift to include all values of partname in the pivot?
I don't think this can be done in a simple single query. This would mean that the query compiler would need to work without knowing how many output columns will be produced. I don't think it can do that.
You can do this in multiple queries: use one query to create the list of partnames, then use that list to generate a second query that populates the IN list. So something needs to issue the first query and generate the second. This can be code external to Redshift (lots of options) or a stored procedure in Redshift. That code, wherever it lives, should take into account that Redshift limits the number of columns to 1,600.
The Redshift docs are fairly good on the topic of dynamic SQL for stored procedures. The EXECUTE statement will be used to fire off the second query in a stored procedure. See: https://docs.aws.amazon.com/redshift/latest/dg/c_PLpgSQL-statements.html
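The first of those two queries can itself be written in SQL so that it returns the text of the second. The sketch below is untested and rests on assumptions: it relies on Redshift's LISTAGG and QUOTE_LITERAL functions, and the names quoted_name, names and pivot_sql are made up for illustration. Whatever issues the queries (external code, or a stored procedure using EXECUTE) would run this first and then run the string it returns.

```sql
-- Step 1: build the PIVOT statement, with the IN list filled in from the data.
SELECT 'SELECT * FROM (SELECT partname, price FROM part) PIVOT ('
    || 'AVG(price) FOR partname IN ('
    || LISTAGG(quoted_name, ', ') WITHIN GROUP (ORDER BY quoted_name)
    || '));' AS pivot_sql
FROM (
    -- One quoted literal per distinct part name.
    SELECT DISTINCT QUOTE_LITERAL(partname) AS quoted_name
    FROM part
    WHERE partname IS NOT NULL
) AS names;

-- Step 2 (issued by the caller): run the text returned in pivot_sql.
-- If part held only the names prop, rudder and wing, that text would be:
-- SELECT * FROM (SELECT partname, price FROM part) PIVOT (AVG(price) FOR partname IN ('prop', 'rudder', 'wing'));
```

Because the column list is fixed only when step 2 is generated, the caller should also check the number of distinct part names against the column limit before running it.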

Postgres "first" aggregation function

I am aggregating a table by a file ID field. Each file has a name that matches exactly one file ID (its own).
select file_key, min(fullfilepath)
from table
group by file_key
Because I know the structure of the table, I know that any fullfilepath will do. Both min and max give an acceptable result, but they take a long time.
I came across this aggregation function which returns the first value. Unfortunately, this function takes a long time, because it scans the whole table. For example, this is very slow:
select first(file_id) from table;
What is the fastest way to do that? With or without aggregation function.
There is no way to make your first query with the GROUP BY clause faster, because it has to scan the whole table to find all groups.
Your second query can be made faster:
SELECT (
SELECT file_id FROM "table"
WHERE file_id IS NOT NULL
LIMIT 1
);
There is no way to optimize the query as you wrote it, because the aggregate function is a black box to PostgreSQL.
I doubt that this will help performance, but it may be useful if anyone actually wants a first aggregate.
-- coalesce isn't a function, so make an equivalent function.
create function coalesce_("anyelement","anyelement") returns "anyelement"
language sql as $$ select coalesce( $1,$2 ) $$;
create aggregate first("anyelement") (sfunc=coalesce_, stype="anyelement");
select
distinct on (file_key)
file_key, fullfilepath
from "table"
order by file_key
That will return one record for each file_key.

How to group by in cypher efficiently?

I translated the SQL query below to Cypher. GROUP BY is implicit in Cypher, which causes confusion and longer query execution time. My SQL query is:
INSERT INTO tmp_build
(result_id, hshld_id, product_id)
SELECT b.result_id, a.hshld_id, b.cluster_id
FROM fact a
INNER JOIN productdata b ON a.product_id = b.barcode
WHERE b.result_id = 1
GROUP BY b.result_id, a.hshld_id, b.cluster_id;
The equivalent cypher query is:
MATCH (b:PRODUCTDATA {RESULT_ID: 1 })
WITH b
MATCH (b)<-[:CREATES_PRODUCTDATA]-(a:FACT)
WITH b.RESULT_ID as RESULT_ID , collect(b.RESULT_ID) as result, a.HSHLD_ID as HSHLD_ID,
collect(a.HSHLD_ID) as hshld, b.CLUSTER_ID as CLUSTER_ID, collect(b.CLUSTER_ID) as cluster
CREATE (:TMP_BUILD { RESULT_ID:RESULT_ID , HSHLD_ID:HSHLD_ID , PRODUCT_ID:CLUSTER_ID });
This query runs slowly because of collect(). Without the collect function, it does not give me the GROUP BY results. Is there any way to optimise it, or a better way to implement GROUP BY in Cypher?
In the Cypher query, you are attempting to return rows with both singular values (RESULT_ID, HSHLD_ID, CLUSTER_ID) and their collections. Because the singular values are also the grouping keys, each collection will only contain the same value repeated once per occurrence in the results (for example, RESULT_ID = 1, result = [1,1,1,1]). I don't think that's useful for you.
Also, nothing in your original query seems to suggest you need aggregations. Your GROUP BY columns are the only columns being returned, there are no aggregation columns, so that seems like you just need distinct rows. Try removing the collection columns from your Cypher query, and use WITH DISTINCT instead of just WITH.
If that doesn't work, then I think you will need to further explain exactly what it is that you are attempting to get as the result.
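The observation that a GROUP BY with no aggregate columns is just a request for distinct rows can be checked on the SQL side first. This is a small sketch in PostgreSQL syntax; the sample rows and column types are invented for illustration and are not from the question.

```sql
-- Hypothetical sample data; column types are assumptions.
CREATE TEMP TABLE fact (hshld_id int, product_id text);
CREATE TEMP TABLE productdata (barcode text, result_id int, cluster_id int);
INSERT INTO fact VALUES (10, 'A'), (10, 'A'), (10, 'B'), (20, 'A');
INSERT INTO productdata VALUES ('A', 1, 100), ('B', 1, 100), ('C', 2, 200);

-- Original form: GROUP BY with no aggregate functions.
SELECT b.result_id, a.hshld_id, b.cluster_id
FROM fact a
INNER JOIN productdata b ON a.product_id = b.barcode
WHERE b.result_id = 1
GROUP BY b.result_id, a.hshld_id, b.cluster_id;

-- Equivalent form: DISTINCT over the same columns.
SELECT DISTINCT b.result_id, a.hshld_id, b.cluster_id
FROM fact a
INNER JOIN productdata b ON a.product_id = b.barcode
WHERE b.result_id = 1;
```

With this sample data both statements return the same two rows, (1, 10, 100) and (1, 20, 100), in no guaranteed order. That is the result WITH DISTINCT is meant to reproduce in Cypher.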

PostgreSQL use function result in ORDER BY

Is there a way to use the results of a function call in the order by clause?
My current attempt (I've also tried some slight variations):
SELECT it.item_type_id, it.asset_tag, split_part(it.asset_tag, 'ASSET', 2)::INT as tag_num
FROM serials.item_types it
WHERE it.asset_tag LIKE 'ASSET%'
ORDER BY split_part(it.asset_tag, 'ASSET', 2)::INT;
While my general assumption is that this can't be done, I wanted to know if there was a way to accomplish this that I wasn't thinking of.
EDIT: The query above gives the following error [22P02] ERROR: invalid input syntax for integer: "******"
Your query is generally OK, the problem is that for some row the result of split_part(it.asset_tag, 'ASSET', 2) is the string ******. And that string cannot be cast to an integer.
You may want to remove the order by and the cast in the select list and add a where split_part(it.asset_tag, 'ASSET', 2) = '******', for instance, to narrow down that data issue.
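A broader version of that check lists every row whose suffix is not purely numeric, rather than testing for one known bad value. The sketch below uses a temp table as a stand-in for serials.item_types, with invented rows. The CASE guard in the second query is one possible way to keep sorting while the data is being cleaned up, not something taken from the question.

```sql
-- Hypothetical stand-in for serials.item_types, including one bad row.
CREATE TEMP TABLE item_types (item_type_id int, asset_tag text);
INSERT INTO item_types VALUES (1, 'ASSET10'), (2, 'ASSET2'), (3, 'ASSET******');

-- List every tag whose suffix is not made up entirely of digits.
SELECT it.item_type_id, it.asset_tag
FROM item_types it
WHERE it.asset_tag LIKE 'ASSET%'
  AND split_part(it.asset_tag, 'ASSET', 2) !~ '^[0-9]+$';

-- Sort numerically anyway: the CASE only casts suffixes that are all digits,
-- and rows with any other suffix sort last.
SELECT it.item_type_id, it.asset_tag
FROM item_types it
WHERE it.asset_tag LIKE 'ASSET%'
ORDER BY CASE WHEN split_part(it.asset_tag, 'ASSET', 2) ~ '^[0-9]+$'
              THEN split_part(it.asset_tag, 'ASSET', 2)::INT
         END NULLS LAST;
```

With the sample rows, the first query returns only the ASSET****** row, and the second returns ASSET2, ASSET10, ASSET****** in that order.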
Once that is resolved, having such a function in the order by list is perfectly fine. The quoted section of the documentation in the comments on the question is referring to applying an order by clause to the results of UNION, INTERSECT, etc. queries. In other words, the order by found in this query:
(select column1 as result_column1 from table1
union
select column2 from table2)
order by result_column1
can only refer to the accumulated result columns, not to expressions on individual rows.
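If an expression is needed when ordering a UNION, one common workaround is to move the UNION into a subquery and apply the ORDER BY outside it. A minimal sketch in PostgreSQL syntax, with invented sample rows; the alias u and the use of upper() are assumptions for illustration.

```sql
-- Hypothetical tables matching the names used above.
CREATE TEMP TABLE table1 (column1 text);
CREATE TEMP TABLE table2 (column2 text);
INSERT INTO table1 VALUES ('b'), ('C');
INSERT INTO table2 VALUES ('a');

-- Allowed: ORDER BY names a result column of the UNION.
(select column1 as result_column1 from table1
union
select column2 from table2)
order by result_column1;

-- Not allowed directly on the UNION: order by upper(result_column1).
-- Allowed: wrap the UNION in a subquery, then order by any expression.
select result_column1
from (select column1 as result_column1 from table1
      union
      select column2 from table2) as u
order by upper(result_column1);
```

With the sample rows, the last query returns a, b, C.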

DB2 to Netezza Migration

I have a DB2 query, shown below.
What would the equivalent syntax be in Netezza?
select distinct acct_num from GTD_demo_dim where ACCT_NUM fetch first 1 rows only);
First, I don't think your statement is valid.
select distinct acct_num from GTD_demo_dim where ACCT_NUM fetch first 1 rows only);
The where clause needs to be finished and you've used a closing parenthesis without an opening one.
fetch first is standard SQL syntax (it was added in SQL:2008), so it's quite possible that this will work. However, the usual way to do this in Netezza is with a limit clause. All that said, this is how I'd write the query to get the intended result (omitting your where since I can't infer the intent):
select distinct acct_num from gtd_demo_dim limit 1;