Why does LATERAL not work with values? - postgresql

It doesn't make sense; isn't a literal a valid column?
SELECT x, y FROM (select 1 as x) t, LATERAL CAST(2 AS FLOAT) AS y; -- fine
SELECT x, y FROM (select 1 as x) t, LATERAL 2.0 AS y; -- SYNTAX ERROR!
The same happens if you use a CASE clause, an x+1 expression, or (x+1)... it seems to be an error for any non-function.
The PostgreSQL Guide, about LATERAL expressions (not LATERAL subqueries), says:
LATERAL is primarily useful when the cross-referenced column is necessary for computing the row(s) to be joined (...)
NOTES
The question is about LATERAL single_column_expression, not LATERAL multicolumn_subquery. Example:
SELECT x, y, exp, z
FROM (select 3) t(x), -- subquery
LATERAL round(x*0.2+1.2) as exp, -- expression!
LATERAL (SELECT exp+2.0 AS y, x||'foo' as z) t2 --subquery
;
... After klin's comment showing that the Guide, in another place, says "only functions", the question "Why?" must be expressed in a more specific way, changing the scope of the question a little:
"Only functions" doesn't make sense: the syntax (x) or (x+1), encapsulating an expression in parentheses, would be fine, wouldn't it? Why only functions?
PS: perhaps there is a future plan, or perhaps a real problem in parsing generic expressions... As users we must show the PostgreSQL developers what makes sense and what we need.

It'll all work fine if you wrap it in its own subquery:
SELECT x, y FROM (select 1 as x) t, LATERAL (SELECT 2.0 AS y) z;

A literal is a valid value for a column, but as the docs you quoted say, LATERAL syntax is used
for computing the row(s) to be joined
A relation, such as a FROM, JOIN, or LATERAL subquery clause, always computes tuples of one or more columns. The alias you're assigning is not for an individual value, but for the whole tuple.

Answering "Why only functions?" by intuition.
Or "Why does the PostgreSQL spec use only functions?". Of course, it's not a question about the parser, because it complies with the specification.
The SELECT syntax Guide show the only occasions when we can use LATERAL:
[ LATERAL ] ( select ) [ AS ] alias [ ( column_alias [, ...] ) ] ...
[ LATERAL ] function_name ( [ argument [, ...] ] ) ...
[ LATERAL ] ROWS FROM( function_name ( [ argument [, ...] ] ) ...
So, there would be no conflict with
[ LATERAL ] (single_expression) [ AS ] alias
Bergi's guess is that a literal expression like LATERAL 2.0 AS y could be interpreted as LATERAL "2"."0", i.e. table "0" in schema "2"... But, as we saw above, it makes no sense to expect a table name after the LATERAL keyword, so in fact there is no ambiguity.
Conclusion: it looks like the specification of LATERAL could grow to allow the use of expressions. This is the great advantage of being able to discuss and participate in open community software!
Why LATERAL single_expression AS alias? Rationale:
to be orthogonal: any new user of PostgreSQL who sees that SELECT a, x, x+b AS y FROM t, LATERAL f(a) AS x is valid will naturally also try expressions instead of functions. That is expected in an "orthogonal system" and is intuitive for any programmer.
to reuse expressions: we use "chains of dependent expressions" in any language, things like a=b+c; x=a+y; z=a/2; .... It is ugly to write SELECT(SELECT(SELECT)) in SQL only to reuse expressions. A "chain of LATERALs" is more elegant and human-readable, and perhaps also better for query optimization (see the sketch below).
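For instance, the question's example above already works today if each single expression is wrapped in its own LATERAL subquery; a minimal sketch:
SELECT x, exp, y, z
FROM (select 3) t(x),
LATERAL (SELECT round(x*0.2+1.2) AS exp) e, -- the expression, wrapped in a one-column subquery
LATERAL (SELECT exp+2.0 AS y, x||'foo' AS z) t2
;
The wish expressed above is only to drop the wrapping (SELECT ...) boilerplate around the single-expression case.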

Related

How to unpivot a large AWS Redshift table

I am trying to run a query against a table in AWS Redshift (which is based on PostgreSQL). Below is a simplified definition of the table:
CREATE TABLE some_schema.some_table (
row_id int
,productid_level1 char(1)
,productid_level2 char(1)
,productid_level3 char(1)
)
;
INSERT INTO some_schema.some_table
VALUES
(1, 'a', 'b', 'c')
,(2, 'd', 'c', 'e')
,(3, 'c', 'f', 'g')
,(4, 'e', 'h', 'i')
,(5, 'f', 'j', 'k')
,(6, 'g', 'l', 'm')
;
I need to return a de-duped, single-column table of a given productid and all of its children. "Children" means any productid that has a "level" higher than the given product (for a given row), and also its grandchildren.
For example, for productid 'c', I expect to return...
'c' (because it's found in rows 1, 2, and 3)
'e' (because it's a child of 'c' in row 2)
'f' and 'g' (because they're children of 'c' in row 3)
'h' and 'i' (because they're children of 'e' in row 4)
'j' and 'k' (because they're children of 'f' in row 5)
and 'l' and 'm' (because they're children of 'g' in row 6)
Visually, I expect to return the following:
productid
---------
c
e
f
g
h
i
j
k
l
m
The actual table has about 3M rows and has about 20 "levels".
I think there are 2 parts to this query -- (1) a recursive CTE to build out the hierarchy and (2) an unpivot operation.
I have not attempted (1) yet. For (2), I have tried a query like the following, but it hasn't returned even after 3 minutes. As this will be used for an operational report, I need it to return in < 15 seconds.
select
b.productid
,b.product_level
from
some_schema.some_table as a
cross join lateral (
values
(a.productid_level1, 1)
,(a.productid_level2, 2)
...
,(a.productid_level20, 20)
) as b(productid, product_level)
How can I write the query to achieve (1) and (2) and be very performant?
I would avoid using the term hierarchy, as that "usually" implies every node having at most a single parent.
I admit I'm lost as to the nature of the graph/network this table represents. But you might benefit from a little brute force and code repetition.
Whatever eventually works for you, I think you'll need to persist/materialise/cache the results, as repeating this at report time is unlikely to ever be a good idea.
I'm a data engineer by trade, and I'm sure they have good reasons for what they've done (or, like me, they maybe screwed up). Either way, there are many good reasons to ask them to materialise the graph in more than just one form, each suited to different use cases. So, asking them for a traditional adjacency list, as well as the table you already have, is a reasonable request. Or, at the very least, a good starting point for a conversation.
So, a brute force approach?
WITH
adjacency AS
(
    SELECT productid_level1 AS parent, productid_level2 AS child FROM some_table WHERE productid_level2 IS NOT NULL
    UNION
    SELECT productid_level2, productid_level3 FROM some_table WHERE productid_level3 IS NOT NULL
    UNION
    ...
    UNION
    SELECT productid_level19, productid_level20 FROM some_table WHERE productid_level20 IS NOT NULL
)
The WHERE clause eliminates any sparse data before it enters the map.
The UNION (without ALL) ensures duplicate links are eliminated. You should also test UNION ALL with a SELECT DISTINCT wrapped around it (or similar).
Then you can use that adjacency list in the usual recursive walk, to find all children of a given node. (Taking care that there aren't any cyclic paths.)
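For example, a rough sketch of that walk as a recursive CTE, reusing the adjacency list built above and starting from the question's example node 'c'. This assumes your Redshift release supports WITH RECURSIVE (plain PostgreSQL does), and, as cautioned above, that there are no cyclic paths:
WITH RECURSIVE
adjacency AS
(
    ... -- the UNION of (parent, child) pairs built above
),
walk (productid) AS
(
    SELECT CAST('c' AS char(1))   -- the starting node
    UNION ALL
    SELECT a.child                -- follow every outgoing link
    FROM walk w
    JOIN adjacency a ON a.parent = w.productid
)
SELECT DISTINCT productid         -- de-dupe: a node can be reached along several paths
FROM walk;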

Postgres WHERE clause with infinity

I have the following WHERE clause on a table x, with column1 of type integer.
The SELECT is called in a function with parameters (a integer, b integer).
Calling the function:
SELECT * FROM function(0, 10)
Query in the function:
SELECT * FROM tablex x WHERE x.column1 between a and b
Now I miss the results where column1 is null, but in this case it is important to get them; it depends on who calls the function. What should parameter "a" be to also get the null values? Or is there a way to disable the WHERE clause depending on the parameters that are passed in?
I found a nice solution for this:
SELECT * FROM tablex x WHERE coalesce(x.column1, 0) between a and b
Add logic to admit null column1 values:
SELECT *
FROM tablex x
WHERE x.column1 BETWEEN a AND b OR x.column1 IS NULL;
To make the NULLs behave as actual values, invert the logic.
[and add a comment to the code, because the negation might confuse future readers/maintainers]:
SELECT * FROM tablex x WHERE NOT x.column1 < a and NOT x.column1 >= b;
Note that the above assumes you want NULLs in a and b to be treated as -inf and +inf, respectively.
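If the intent is simply to skip a bound whenever its parameter is NULL, an alternative sketch that spells the NULL handling out explicitly (rather than relying on inverted comparisons) would be:
SELECT *
FROM tablex x
WHERE (a IS NULL OR x.column1 >= a)  -- no lower bound when a is NULL
AND   (b IS NULL OR x.column1 <= b); -- no upper bound when b is NULL
Rows where column1 itself is NULL would still need the explicit OR x.column1 IS NULL shown in the answer above.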

How to use PostgreSQL ANY with jsonb data

Related
see this question
Question
I have a PostgreSQL table that has a column of type jsonb. The JSON data looks like this:
{
  "personal":{
    "gender":"male",
    "contact":{
      "home":{
        "email":"ceo@home.me",
        "phone_number":"5551234"
      },
      "work":{
        "email":"ceo@work.id",
        "phone_number":"5551111"
      }
    },
    ..
    "nationality":"Martian",
    ..
  },
  "employment":{
    "title":"Chief Executive Officer",
    "benefits":[
      "Insurance A",
      "Company Car"
    ],
    ..
  }
}
This query works perfectly well
select employees->'personal'->'contact'->'work'->>'email'
from employees
where employees->'personal'->>'nationality' in ('Martian','Terran')
I would like to fetch all employees who have benefits of type Insurance A OR Insurance B; this ugly query works:
select employees->'personal'->'contact'->'work'->>'email'
from employees
where employees->'employment'->'benefits' ? 'Insurance A'
OR employees->'employment'->'benefits' ? 'Insurance B';
I would like to use any instead like so:
select * from employees
where employees->'employment'->>'benefits' =
any('{Insurance A, Insurance B}'::text[]);
but this returns 0 results.. ideas?
What I've also tried
I tried the following syntaxes (all failed):
.. = any({'Insurance A','Insurance B'}::text[]);
.. = any('Insurance A'::text,'Insurance B'::text}::array);
.. = any({'Insurance A'::text,'Insurance B'::text}::array);
.. = any(['Insurance A'::text,'Insurance B'::text]::array);
employees->'employment'->'benefits' is a JSON array, so you should unnest it to use its elements in an ANY comparison.
Use the function jsonb_array_elements_text() in a lateral join:
select *
from
employees,
jsonb_array_elements_text(employees->'employment'->'benefits') benefits(benefit)
where
benefit = any('{Insurance A, Insurance B}'::text[]);
The syntax
from
employees,
jsonb_array_elements_text(employees->'employment'->'benefits')
is equivalent to
from
employees,
lateral jsonb_array_elements_text(employees->'employment'->'benefits')
The word lateral may be omitted. Per the documentation:
LATERAL can also precede a function-call FROM item, but in this case
it is a noise word, because the function expression can refer to
earlier FROM items in any case.
See also: What is the difference between LATERAL and a subquery in PostgreSQL?
The syntax
from jsonb_array_elements_text(employees->'employment'->'benefits') benefits(benefit)
is a form of aliasing, per the documentation
Another form of table aliasing gives temporary names to the columns of
the table, as well as the table itself:
FROM table_reference [AS] alias ( column1 [, column2 [, ...]] )
You can use the ?| operator to check whether the array contains any of the values you want.
select * from employees
where employees->'employment'->'benefits' ?| array['Insurance A', 'Insurance B']
If you happen to have a case where you want all of the values to be in the array, then there's the ?& operator to check for that.
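For example, a quick sketch using benefit names taken from the sample document:
select * from employees
where employees->'employment'->'benefits' ?& array['Insurance A', 'Company Car'];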

Greatest N per group in Open SQL

Selecting the rows from a table by (partial) key with the maximum value in a particular column is a common task in SQL. This question has some excellent answers that cover a variety of approaches to it. Unfortunately I'm struggling to replicate this in my ABAP program.
None of the commonly used approaches seem to be supported:
Joining on a subquery is not supported in syntax: SELECT * FROM X as x INNER JOIN ( SELECT ... ) AS y
Using IN for a composite key is not supported in syntax as far as I know: SELECT * FROM X WHERE (key1, key2) IN ( SELECT key1 key2 FROM ... )
Left join to itself with smaller-than comparison is not supported, outer joins only support EQ comparisons: SELECT * FROM X AS x LEFT JOIN X as xmax ON x-key1 = xmax-key1 AND x-key2 < xmax-key2 WHERE xmax-key IS INITIAL
After trying each of these solutions in turn only to discover that ABAP doesn't seem to support them and being unable to find any equivalents I'm starting to think that I'll have no choice but to dump the data of the subquery to an itab.
What is the best practice for this common programming requirement in ABAP development?
First of all, a specific requirement would get you a better answer. As it happens, I bumped into this question while working on a program that uses 3 distinct methods of pseudo-grouping (while looking for alternatives), and ALL 3 can be used to answer your question, depending on what exactly you need to do. I'm sure there are more ways to do it.
For instance, you can pull maximum values within a group by simply selecting max( your_field ) and grouping by some fields, if that's all you need.
select bname, nation, max( date_from ) from adrp group by bname, nation. "selects highest "from" date for each bname
If you need to use that max value as a filter condition within a query, you can do it by performing the pseudo-grouping with a subquery and max within that subquery, like this (notice how I move the BNAME check into the subquery, which means I don't have to check both fields using the IN (subquery) addition):
select ... from adrp as b_adrp "Pulls the latest person info for a user (some conditions are missing, but this is a part of an actual query)
where b_adrp~date_from in (
select max( date_from ) "Highest date_from where both dates are valid
from adrp where persnumber = b_adrp~persnumber and nation = b_adrp~nation and date_from <= @sy-datum )
The query above lets you select all user info from the base query (whereas the first one only lets you take aggregated and grouped data).
Finally, if you need to check based on a composite key and compare it to multiple aggregate function results, the implementation will heavily depend on the specifics of your requirement (and since your question has none, I'll provide a generic one). The easiest option is to use exists / not exists instead of in (subquery), in exactly the same way, and form the subquery to check for the existence of a specific key or condition rather than pull a list (you can nest subqueries if you have to):
select * from bkpf where exists ( select 1 from bkpf as b where belnr = bkpf~belnr and gjahr = bkpf~gjahr group by belnr, gjahr having max( budat ) = bkpf~budat ) "Took an available example, that I had in testing program.
All 3 queries will get you max value of a column within a group and in fact, all 3 can use joins to achieve identical results.
Please find my answers below your questions.
Joining on a subquery is not supported in syntax: SELECT * FROM X as x INNER JOIN ( SELECT ... ) AS y
Putting the subquery in your WHERE condition should do the job: SELECT * FROM X AS x INNER JOIN Y AS y ON x~a = y~b WHERE EXISTS ( SELECT * FROM y WHERE ... )
Using IN for a composite key is not supported in syntax as far as I know: SELECT * FROM X WHERE (key1, key2) IN ( SELECT key1 key2 FROM ... )
You have to split your WHERE clause: SELECT * FROM X WHERE key1 IN ( SELECT key1 FROM y ) AND key2 IN ( SELECT key2 FROM y )
Left join to itself with smaller-than comparison is not supported, outer joins only support EQ comparisons.
Yes, that's right at the moment.
Left join to itself with smaller-than comparison is not supported, outer joins only support EQ comparisons:
SELECT * FROM X AS x LEFT JOIN X as xmax ON x-key1 = xmax-key1 AND x-key2 < xmax-key2 WHERE xmax-key IS INITIAL
This is not true. This SELECT is perfectly valid:
SELECT b1~budat
INTO TABLE lt_bkpf
FROM bkpf AS b1
LEFT JOIN bkpf AS b2
ON b2~belnr < b1~belnr
WHERE b1~bukrs <> ''.
And it has been valid at least since 7.40 SP08, since July 2013, so it was valid at the time you asked this question as well.

Combine SUM and CAST - not working?

PostgreSQL Unicode 9.01 doesn't like:
SELECT table1.fielda,
SUM (CAST (table2.fielda AS INT)) AS header.specific
FROM etc
What is wrong with SUM-CAST?
Error Message:
Incorrect column expression: 'SUM (CAST
(specifics_nfl_3pl_work_order_item.delivery_quantity AS INT))
Query:
SELECT specifics_nfl_3pl_work_order.work_order_number,
specifics_nfl_3pl_work_order.goods_issue_date,
specifics_nfl_3pl_work_order.order_status_id,
SUM (CAST (specifics_nfl_3pl_work_order_item.delivery_quantity AS INT)) AS units
FROM public.specifics_nfl_3pl_work_order specifics_nfl_3pl_work_order,
public.specifics_nfl_3pl_work_order_item specifics_nfl_3pl_work_order_item,
public.specifics_nfl_order_status specifics_nfl_order_status
WHERE specifics_nfl_3pl_work_order.order_status_id In (3,17,14)
AND specifics_nfl_3pl_work_order_item.specifics_nfl_work_order_id=
specifics_nfl_3pl_work_order.id
AND ((specifics_nfl_3pl_work_order.sold_to_id<>'0000000000')
AND (specifics_nfl_3pl_work_order.goods_issue_date>={d '2013-08-01'}))
It would be really great if you could help.
If I were you, I would do these steps:
give your tables short aliases
format the query
use proper ANSI joins
remove the space between the function name and (
select
o.work_order_number,
o.goods_issue_date,
o.order_status_id,
sum(cast(oi.delivery_quantity as int)) as units
from public.specifics_nfl_3pl_work_order as o
inner join public.specifics_nfl_3pl_work_order_item as oi on
oi.specifics_nfl_work_order_id = o.id
-- inner join public.specifics_nfl_order_status os -- seems redundant
where
o.order_status_id In (3,17,14) and
o.sold_to_id <> '0000000000' and
o.goods_issue_date >= {d '2013-08-01'}
Actually, I really think you need a group by clause here:
select
o.work_order_number,
o.goods_issue_date,
o.order_status_id,
sum(cast(oi.delivery_quantity as int)) as units
from public.specifics_nfl_3pl_work_order as o
inner join public.specifics_nfl_3pl_work_order_item as oi on
oi.specifics_nfl_work_order_id = o.id
where
o.order_status_id In (3,17,14) and
o.sold_to_id <> '0000000000' and
o.goods_issue_date >= {d '2013-08-01'}
group by
o.work_order_number,
o.goods_issue_date,
o.order_status_id
If it still doesn't work, try commenting out the sum and see whether it works.
But do you have a table2, or only table1?
Try:
SELECT table1.fielda,
SUM (CAST (table1.fielda AS INT)) AS "header.specific"
FROM etc
In addition to what Roman already cleared up, there are more problems here:
SELECT o.work_order_number
,o.goods_issue_date
,o.order_status_id
,SUM(CAST(oi.delivery_quantity AS INT)) AS units -- suspicious
FROM public.specifics_nfl_3pl_work_order o
JOIN public.specifics_nfl_3pl_work_order_item oi
ON oi.specifics_nfl_work_order_id = o.id
CROSS JOIN public.specifics_nfl_order_status os -- probably wrong
WHERE o.order_status_id IN (3,17,14)
AND o.sold_to_id <> '0000000000' -- suspicious
AND o.goods_issue_date >= {d '2013-08-01'} -- nonsense
GROUP BY 1, 2, 3
o.goods_issue_date >= {d '2013-08-01'} is syntactical nonsense. Maybe you mean:
o.goods_issue_date >= '2013-08-01'
You have the table specifics_nfl_order_status in your FROM list, but without any expression connecting it to the rest. This effectively results in a CROSS JOIN, which results in a Cartesian product and is almost certainly wrong in a very expensive way: every row is combined with every row of the rest:
CROSS JOIN public.specifics_nfl_order_status os
Either remove the table (since you don't use it) or add a WHERE or ON clause to connect it to the rest. Note that it is not just redundant; it has a dramatic effect on the result as it is.
This WHERE clause is suspicious:
AND o.sold_to_id <> '0000000000'
Seems like you are storing numbers as strings or otherwise confusing the two.
Also, CAST(oi.delivery_quantity AS INT) should not be needed in the first place. The column should be of data type integer or some other appropriate numeric type to begin with. Be sure to use proper data types.
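As a side note, a hedged sketch of the kind of one-time cleanup that would make the cast unnecessary, assuming every existing value in that column is clean integer text (column and table names are taken from the query above; this rewrites the table, so test it on a copy first):
ALTER TABLE public.specifics_nfl_3pl_work_order_item
ALTER COLUMN delivery_quantity TYPE integer
USING delivery_quantity::integer;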
The default setting of search_path includes public, and you may not need to schema-qualify tables. Instead of public.specifics_nfl_3pl_work_order, it may suffice to use:
specifics_nfl_3pl_work_order
GROUP BY 1, 2, 3 uses ordinal (positional) references, just a notational shortcut for:
GROUP BY o.work_order_number, o.goods_issue_date, o.order_status_id
Details in the manual.
According to the comments, you are using MS Query to create the query. That is not the best of ideas; it produces the kind of inferior code you presented us with. You may want to get rid of it while you are working with PostgreSQL.