PonyORM: how to get rid of "subquery uses ungrouped column"?

Please assume this data model. This is a simplified model of what I really have here but all important fields are there. Database: Postgres. I want to calculate some stats on those models and I stumbled upon this problem:
c = Customer.select().random(1)[0] # example
query = left_join(
(
p,
count(i.order.customer == c),
count((b.is_used == True) for b in i.bonuses),
count(i)
) for p in Product
for i in p.order_items)
Now trying to show results (in reality I have some more filtering on the aggregations to do)
query.show()
yields this:
ProgrammingError: subquery uses ungrouped column "i.id" from outer query
LINE 4: WHERE "i"."id" = "b"."order_item"
What can I do to correct this?
Resulting SQL looks like this:
SELECT "p"."id", COUNT(case when "order"."customer" = ? then 1 else null end), (
SELECT COUNT(DISTINCT "b"."is_used" = 1)
FROM "Bonus" "b"
WHERE "i"."id" = "b"."order_item"
), COUNT(DISTINCT "i"."id")
FROM "Product" "p"
LEFT JOIN "OrderItem" "i"
ON "p"."id" = "i"."product"
LEFT JOIN "Order" "order"
ON "i"."order" = "order"."id"
GROUP BY "p"."id"
EDIT:
My real models are made to fit a db used by Django, so I have _table_ in each class and column=something_id in all foreign keys. Apart from that everything looks OK, to the point that I can write any simpler query with ease.
EDIT2:
Here's the gist with my test data.

It appears that I needed to rephrase the code to this form, moving the loop over i.bonuses out of the nested generator and into the main query:
query = left_join(
(
p,
count(i.order.customer == c),
count(b.is_used == True),
count(i)
) for p in Product
for i in p.order_items
for b in i.bonuses)
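
A rough way to see why this form avoids the error: once the bonuses are joined into the grouped query instead of being counted in a correlated subquery, every column is either in GROUP BY or inside an aggregate. The sketch below writes that shape by hand with Python's sqlite3 module. The table names, sample rows and the SQL are illustrative assumptions, not the SQL Pony actually emits, and SQLite is used only so the example runs standalone (it is more permissive than Postgres and would not raise the original error anyway).

```python
import sqlite3

# Hypothetical, simplified tables standing in for Product / OrderItem / Bonus.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY);
CREATE TABLE order_item (id INTEGER PRIMARY KEY, product INTEGER);
CREATE TABLE bonus (id INTEGER PRIMARY KEY, order_item INTEGER, is_used INTEGER);
INSERT INTO product VALUES (1), (2);
INSERT INTO order_item VALUES (10, 1), (11, 1);
INSERT INTO bonus VALUES (100, 10, 1), (101, 10, 0), (102, 11, 1);
""")

# Bonuses are joined, not sub-selected, so no ungrouped outer column is needed.
# DISTINCT keeps the counts correct even though the join multiplies rows.
rows = conn.execute("""
SELECT p.id,
       COUNT(DISTINCT CASE WHEN b.is_used = 1 THEN b.id END) AS used_bonuses,
       COUNT(DISTINCT i.id) AS items
FROM product p
LEFT JOIN order_item i ON i.product = p.id
LEFT JOIN bonus b ON b.order_item = i.id
GROUP BY p.id
ORDER BY p.id
""").fetchall()

print(rows)
```

Product 2 has no order items, so the left joins keep it in the result with zero counts.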


Is there any way to write this code avoiding intermediate steps/views in PostgreSQL coming from different tables?

I am working on a large query and would like to eliminate the intermediate
steps, so I am trying to combine the two queries below into one.
The first query (QUERY 1) selects the grid id from a table called
tiles. This gives me a UUID that corresponds to a row in a second
table, so to obtain the real value of the grid id I have to run a
second query (QUERY 2). I have tried to cast everything as I did
in the third query for other values, but this approach doesn't work.
Does anyone have an idea how I can do this (queries 1 and 2)
in a single query?
QUERY 1:
SELECT
jsonb_array_elements(grid_id_tile.tiledata -> '34cfea5d-c2c0-11ea-9026-02e7594ce0a0'::text) ->> 'resourceId'::text AS grid_id
FROM mv_geojson_geoms mv
LEFT JOIN tiles grid_id_tile ON tv1.resourceinstanceid = grid_id_tile.resourceinstanceid
WHERE (( SELECT resource_instances.graphid
FROM resource_instances
WHERE mv.resourceinstanceid = resource_instances.resourceinstanceid)) = '34cfe98e-c2c0-11ea-9026-02e7594ce0a0'::uuid;
QUERY 2:
SELECT grid_id.legacyid AS grid_id,
FROM table 1 (Where I have obtained the grid id)
LEFT JOIN resource_instances grid_id ON hb1.grid_id = grid_id.resourceinstanceid::text
QUERY 3:
( SELECT "values".value
FROM "values"
WHERE ((name_ft_tile.tiledata ->> '34cfea97-c2c0-11ea-9026-02e7594ce0a0'::text)::uuid) = "values".valueid) AS nametype,
FROM mv_geojson_geoms mv
LEFT JOIN tiles name_ft_tile ON mv.resourceinstanceid = name_ft_tile.resourceinstanceid AND (name_ft_tile.tiledata ->> '34cfea97-c2c0-11ea-9026-02e7594ce0a0'::text) <> ''::text
WHERE (( SELECT resource_instances.graphid
FROM resource_instances
WHERE mv.resourceinstanceid = resource_instances.resourceinstanceid)) = '34cfe98e-c2c0-11ea-9026-02e7594ce0a0'::uuid
Those are the type of data I am managing at the moment:
This is the tiles table, whose jsonb column contains the UUID of the feature for which I would like to get the grid id.
This is the resource_instances table, where the legacyid is stored.
From query 1 I get this result, where the grid id is a UUID.
And from query 2 I get this result, with the grid_id code.
This is what I obtain; I would like to get the grid_id value directly, without intermediate steps.
The third query is a sample of a similar approach I used elsewhere: in one query I get the value instead of the UUID, which is what I would like to do with the grid_id.
But when I run similar code here I get the following error, because I am extracting the elements from an array:
ERROR: cannot extract elements from a scalar
CONTEXT: parallel worker
SQL state: 22023
You can literally inline query 1 as a subquery where you've written "table 1 (Where I have obtained the grid id)":
SELECT grid_id.legacyid AS grid_id
FROM (
SELECT jsonb_array_elements(grid_id_tile.tiledata -> '34cfea5d-c2c0-11ea-9026-02e7594ce0a0'::text) ->> 'resourceId'::text AS grid_id
FROM mv_geojson_geoms mv
LEFT JOIN tiles grid_id_tile ON mv.resourceinstanceid = grid_id_tile.resourceinstanceid
JOIN resource_instances ON mv.resourceinstanceid = resource_instances.resourceinstanceid
WHERE resource_instances.graphid = '34cfe98e-c2c0-11ea-9026-02e7594ce0a0'::uuid
) AS hb1
LEFT JOIN resource_instances grid_id ON hb1.grid_id = grid_id.resourceinstanceid::text;
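
The same inlining pattern can be tried out on a toy schema. This is only a sketch using Python's sqlite3 module: the two tables are cut down to hypothetical columns (a plain text grid_ref column stands in for the jsonb_array_elements(...) expression), and the sample values are made up. It shows the first query placed as a derived table in FROM and joined to resource_instances in one statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified stand-ins for the real tables.
CREATE TABLE tiles (resourceinstanceid TEXT, grid_ref TEXT);
CREATE TABLE resource_instances (resourceinstanceid TEXT, legacyid TEXT);
INSERT INTO tiles VALUES ('site-1', 'uuid-a'), ('site-2', 'uuid-b');
INSERT INTO resource_instances VALUES ('uuid-a', 'GRID-001'), ('uuid-b', 'GRID-002');
""")

# "Query 1" sits inside the parentheses as a derived table named hb1;
# note there is no semicolon inside the subquery.
rows = conn.execute("""
SELECT grid.legacyid AS grid_id
FROM (
    SELECT t.grid_ref AS grid_id
    FROM tiles t
) AS hb1
LEFT JOIN resource_instances grid ON hb1.grid_id = grid.resourceinstanceid
ORDER BY grid.legacyid
""").fetchall()

print(rows)
```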

EFCore returning too many columns for a simple LEFT OUTER join

I am currently using EFCore 1.1 (preview release) with SQL Server.
I am doing what I thought was a simple OUTER JOIN between an Order and OrderItem table.
var orders = from order in ctx.Order
join orderItem in ctx.OrderItem
on order.OrderId equals orderItem.OrderId into tmp
from oi in tmp.DefaultIfEmpty()
select new
{
order.OrderDt,
Sku = (oi == null) ? null : oi.Sku,
Qty = (oi == null) ? (int?) null : oi.Qty
};
The actual data returned is correct (I know earlier versions had issues with OUTER JOINS not working at all). However the SQL is horrible and includes every column in Order and OrderItem which is problematic considering one of them is a large XML Blob.
SELECT [order].[OrderId], [order].[OrderStatusTypeId],
[order].[OrderSummary], [order].[OrderTotal], [order].[OrderTypeId],
[order].[ParentFSPId], [order].[ParentOrderId],
[order].[PayPalECToken], [order].[PaymentFailureTypeId] ....
...[orderItem].[OrderId], [orderItem].[OrderItemType], [orderItem].[Qty],
[orderItem].[SKU] FROM [Order] AS [order] LEFT JOIN [OrderItem] AS
[orderItem] ON [order].[OrderId] = [orderItem].[OrderId] ORDER BY
[order].[OrderId]
(There are many more columns not shown here.)
On the other hand - if I make it an INNER JOIN then the SQL is as expected with only the columns in my select clause:
SELECT [order].[OrderDt], [orderItem].[SKU], [orderItem].[Qty]
FROM [Order] AS [order]
INNER JOIN [OrderItem] AS [orderItem] ON [order].[OrderId] = [orderItem].[OrderId]
I tried reverting to EFCore 1.01, but got some horrible nuget package errors and gave up with that.
Not clear whether this is an actual regression issue or an incomplete feature in EFCore. But couldn't find any further information about this elsewhere.
Edit: EFCore 2.1 has addressed a lot of issues with grouping and also N+1 type issues where a separate query is made for every child entity. Very impressed with the performance in fact.
3/14/18 - 2.1 Preview 1 of EFCore isn't recommended because the GROUP BY SQL has some issues when using OrderBy() but it's fixed in nightly builds and Preview 2.
The following applies to EF Core 1.1.0 (release).
Although one shouldn't have to do such things, I tried several alternative query syntaxes (using a navigation property instead of a manual join, joining subqueries containing an anonymous type projection, using let / an intermediate Select, using Concat / Union to emulate a left join, alternative left join syntax etc.). The result was either the same as in the post, and/or more than one query being executed, and/or invalid SQL queries, and/or strange runtime exceptions like IndexOutOfRange, InvalidArgument etc.
What I can say based on tests is that most likely the problem is related to bug(s) (regression, incomplete implementation - does it really matter) in GroupJoin translation. For instance, #7003: Wrong SQL generated for query with group join on a subquery that is not present in the final projection or #6647 - Left Join (GroupJoin) always materializes elements resulting in unnecessary data pulling etc.
Until it gets fixed (when?), as a (far from perfect) workaround I could suggest using the alternative left outer join syntax (from a in A from b in B.Where(b => b.Key == a.Key).DefaultIfEmpty()):
var orders = from o in ctx.Order
from oi in ctx.OrderItem.Where(oi => oi.OrderId == o.OrderId).DefaultIfEmpty()
select new
{
OrderDt = o.OrderDt,
Sku = oi.Sku,
Qty = (int?)oi.Qty
};
which produces the following SQL:
SELECT [o].[OrderDt], [t1].[Sku], [t1].[Qty]
FROM [Order] AS [o]
CROSS APPLY (
SELECT [t0].*
FROM (
SELECT NULL AS [empty]
) AS [empty0]
LEFT JOIN (
SELECT [oi0].*
FROM [OrderItem] AS [oi0]
WHERE [oi0].[OrderId] = [o].[OrderId]
) AS [t0] ON 1 = 1
) AS [t1]
As you can see, the projection is ok, but instead of LEFT JOIN it uses strange CROSS APPLY which might introduce another performance issue.
Also note that you have to use casts for value types and nothing for strings when accessing the right joined table as shown above. If you use null checks as in the original query, you'll get ArgumentNullException at runtime (yet another bug).
Using "into" will create a temporary identifier to store the results.
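Reference: MSDN: into (C# Reference)
So removing the "into tmp from oi in tmp.DefaultIfEmpty()" will result in the clean SQL with the three columns. Note that without DefaultIfEmpty() this is an inner join, so orders that have no items are no longer returned.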
var orders = from order in ctx.Order
join orderItem in ctx.OrderItem
on order.OrderId equals orderItem.OrderId
select new
{
order.OrderDt,
// "oi" no longer exists here, and an inner join never yields a null orderItem
Sku = orderItem.Sku,
Qty = (int?) orderItem.Qty
};
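
A plain join without DefaultIfEmpty() is an inner join, so it is worth checking what that does to the rows before adopting it. The sketch below compares the two join types on made-up data with Python's sqlite3 module; the two-table schema is a hypothetical reduction of the Order and OrderItem tables from the question, and it says nothing about the SQL that EF Core generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical minimal schema: order 2 has no items.
CREATE TABLE "Order" (OrderId INTEGER PRIMARY KEY, OrderDt TEXT);
CREATE TABLE OrderItem (OrderId INTEGER, Sku TEXT, Qty INTEGER);
INSERT INTO "Order" VALUES (1, '2017-01-01'), (2, '2017-01-02');
INSERT INTO OrderItem VALUES (1, 'ABC', 3);
""")

query = """
SELECT o.OrderDt, oi.Sku, oi.Qty
FROM "Order" o
{join} JOIN OrderItem oi ON o.OrderId = oi.OrderId
ORDER BY o.OrderId
"""

inner_rows = conn.execute(query.format(join="INNER")).fetchall()
left_rows = conn.execute(query.format(join="LEFT")).fetchall()

print(inner_rows)  # only orders that have at least one item
print(left_rows)   # every order; Sku and Qty are NULL where no item exists
```

If orders without items must appear in the result, the inner join form is not a drop-in replacement for the original query.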

OrientDB SQL Check if multiple pairs of vertices are connected

I haven't been able to find an answer for the SQL for this.
Given pairs of vertices (record ids) and edge types between them, I want to check if all pairs exist.
V1 --E1--> V2
V3 --E2--> V4
... and so on. The answer I want is true / false or something equivalent. ALL connections must be present in order to evaluate to true, so at least one edge (of correct type) must exist for each pair.
In pseudocode, the question would be:
Does V1 have edge <E1EdgeType> to V2?
AND
Does V3 have edge <E2EdgeType> to V4?
AND
... and so on
Does anyone know what the orientDB SQL would be to achieve this?
UPDATE
I did already have one way of checking if one single edge exists between known vertices. It's perhaps not very pretty either, but it works:
SELECT FROM (
SELECT EXPAND(out('TestEdge')) FROM #12:0
) WHERE @rid=#12:1
This will return the destination record (#12:1) if an edge of type 'TestEdge' exists from #12:0 to #12:1. However, if I have two of those, how can I get one single result for both queries? Something like:
SELECT <something with $c> LET
$a = (SELECT FROM (SELECT EXPAND(out('TestEdge')) FROM #12:0) WHERE @rid=#12:1)
$b = (SELECT FROM (SELECT EXPAND(out('AnotherTestEdge')) FROM #12:2) WHERE @rid=#12:3)
$c = <something that checks that both a and b yield results>
That's what I am aiming for. Please tell me if I'm solving this the wrong way. I'm not even sure what is gained by merging queries like this compared to just running them one after another.
Given a pair of vertices, say #11:0 and #12:0, the following query will effectively check whether there is an edge of type E from #11:0 to #12:0:
select from (select @this, out(E) from #11:0 unwind out) where out = #12:0
----+------+-----+-----
#   |@CLASS|this |out
----+------+-----+-----
0   |null  |#11:0|#12:0
----+------+-----+-----
This is highly inelegant and I would encourage you to think about formulating an enhancement request accordingly at https://github.com/orientechnologies/orientdb/issues
One way to incorporate the boolean tests you have in mind is illustrated by the following:
select from
(select $a.size() as a, $b.size() as b
let a=(select count(*) as e from (select out(E) from #11:0 unwind out)
where out = #12:0),
b=(select count(*) as e from (select out(E) from #11:1 unwind out)
where out = #12:2))
where a > 0 and b > 0
Yes, inelegance again :-(
The following query might be useful to you:
SELECT eval('sum($a.size(),$b.size())==2') as existing_edges
let $a = ( SELECT from TestEdge where out = #12:0 and in = #12:1 limit 1),
$b = ( SELECT from AnotherTestEdge where out = #12:2 and in = #12:3 limit 1)
Hope it helps.
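
The logic all of these queries are after is "every required (source, edge type, target) triple exists". As a sanity check of that logic only, here is a plain-Python sketch over an in-memory set of edges. It does not use any OrientDB API; the Edge tuple layout, the function name all_pairs_connected and the sample record ids are assumptions made up for illustration.

```python
from typing import Iterable, Set, Tuple

# Hypothetical edge representation: (source rid, edge class, target rid).
Edge = Tuple[str, str, str]


def all_pairs_connected(edges: Set[Edge], required: Iterable[Edge]) -> bool:
    """Return True only if every required edge is present in the graph."""
    return all(req in edges for req in required)


edges = {
    ("#12:0", "TestEdge", "#12:1"),
    ("#12:2", "AnotherTestEdge", "#12:3"),
}

required = [
    ("#12:0", "TestEdge", "#12:1"),
    ("#12:2", "AnotherTestEdge", "#12:3"),
]

print(all_pairs_connected(edges, required))
# A pair joined by an edge of the wrong class does not count as connected.
print(all_pairs_connected(edges, [("#12:0", "AnotherTestEdge", "#12:1")]))
```

In the database, each membership test corresponds to one of the per-pair subqueries, and the final all() corresponds to the check that every subquery returned a row.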

PostgreSQL - select the results of two subqueries

I have two complex queries that are both subqueries in Postgres, the results of which are:
q1_results = id , delta , metric_1
q2_results = id , delta , metric_2
I'd like to combine the results of the queries, so the outer query can access either:
results_a = id , delta , metric_1 , metric_2
results_b = id , delta , combined_metric
I can't figure out how to do this. Online searches keep leading me to UNION, but that keeps the metrics in the same column. I need to keep them split.
It's not entirely clear what you're asking in the question and the comments, but it sounds like you might be looking for a full join with a bunch of coalesce statements, e.g.:
-- create view at your option, e.g.:
-- create view combined_query as
select coalesce(a.id, b.id) as id,
coalesce(a.delta, b.delta) as delta,
a.metric_1 as metric_1,
b.metric_2 as metric_2,
coalesce(a.metric_1, 0) + coalesce(b.metric_2, 0) as combined_metric
from (...) as a
full join (...) as b on a.id = b.id -- and a.delta = b.delta maybe?
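
To make the full join plus coalesce behaviour concrete, here is a small pure-Python sketch of the same combination. It is not database code: the two dicts stand in for the subquery results, the sample numbers are made up, and the function name full_join is hypothetical. It also assumes the join key is (id, delta), which is the "maybe" in the comment above.

```python
from typing import Dict, List, Tuple

Key = Tuple[int, int]  # (id, delta), assumed to be the join key


def full_join(q1: Dict[Key, float], q2: Dict[Key, float]) -> List[tuple]:
    """Emulate FULL JOIN on (id, delta) with coalesce for the combined metric."""
    rows = []
    # The union of keys plays the role of the full join: no row is lost.
    for key in sorted(q1.keys() | q2.keys()):
        m1 = q1.get(key)  # None where q1 has no matching row
        m2 = q2.get(key)  # None where q2 has no matching row
        combined = (m1 if m1 is not None else 0) + (m2 if m2 is not None else 0)
        rows.append((key[0], key[1], m1, m2, combined))
    return rows


q1 = {(1, 5): 10.0, (2, 3): 4.0}  # id, delta -> metric_1
q2 = {(2, 3): 1.5, (3, 7): 2.0}   # id, delta -> metric_2

result = full_join(q1, q2)
for row in result:
    print(row)
```

Each output row carries both metrics in separate columns, with None playing the role of NULL where only one side matched.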

Convert SQL to LINQ, nested select, top, "distinct" using group by and multiple order bys

I have the following SQL query, which I'm struggling to convert to LINQ.
Purpose: Get the top 10 coupons from the table, ordered by the date they expire (i.e. list the ones that are about to expire first) and then randomly choosing one of those for publication.
Notes: Because of the way the database is structured, there may be duplicate Codes in the Coupon table. Therefore, I am using a GROUP BY to enforce distinctness, because I can't use DISTINCT in the sub select query (which I think is correct). The SQL query works.
SELECT TOP 1
c1.*
FROM
Coupon c1
WHERE
Code IN (
SELECT TOP 10
c2.Code
FROM
Coupon c2
WHERE
c2.Published = 0
GROUP BY
c2.Code,
c2.Expires
ORDER BY
c2.Expires
)
ORDER BY NEWID()
Update:
This is as close as I have got, but in two queries:
var result1 = (from c in Coupons
where c.Published == false
orderby c.Expires
group c by new { c.Code, c.Expires } into coupon
select coupon.FirstOrDefault()).Take(10);
var result2 = (from c in result1
orderby Guid.NewGuid()
select c).Take(1);
Here's one possible way:
from c in Coupons
from cs in
((from c in coupons
where c.published == false
select c).Distinct()
).Take(10)
where cs.ID == c.ID
select c
Keep in mind that LINQ creates a strongly-typed data set, so an IN statement has no general equivalent. I understand trying to keep the SQL tight, but LINQ may not be the best answer for this. If you are using MS SQL Server (not SQL Server Compact) you might want to consider doing this as a Stored Procedure.
Using MercurioJ's slightly buggy response in combination with a random-row solution suggested in another SO answer, my solution was:
var result3 = (from c in _dataContext.Coupons
from cs in
((from c1 in _dataContext.Coupons
where
c1.IsPublished == false
select c1).Distinct()
).Take(10)
where cs.CouponId == c.CouponId
orderby _dataContext.NewId()
select c).Take(1);
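
For checking the intended selection logic independently of LINQ or SQL, here is a plain-Python sketch of the stated purpose: take the 10 unpublished codes that expire soonest, then pick one matching coupon at random. The Coupon tuple, the function name pick_coupon and its parameters are hypothetical. One deliberate difference from the SQL: this version de-duplicates by code (using each code's earliest expiry), whereas the SQL groups by (Code, Expires).

```python
import random
from datetime import date
from typing import Dict, NamedTuple, Optional, Sequence


class Coupon(NamedTuple):
    code: str
    expires: date
    published: bool


def pick_coupon(coupons: Sequence[Coupon], top_n: int = 10,
                rng: Optional[random.Random] = None) -> Optional[Coupon]:
    """Pick a random coupon among the top_n soonest-expiring unpublished codes."""
    rng = rng or random.Random()
    # Earliest expiry per distinct unpublished code (the GROUP BY step).
    earliest: Dict[str, date] = {}
    for c in coupons:
        if c.published:
            continue
        if c.code not in earliest or c.expires < earliest[c.code]:
            earliest[c.code] = c.expires
    # Order by expiry and keep the first top_n codes (the TOP 10 step).
    ordered = sorted(earliest, key=lambda code: (earliest[code], code))
    top_codes = set(ordered[:top_n])
    # Like the outer SQL query, match on code only, then choose at random.
    candidates = [c for c in coupons if c.code in top_codes]
    return rng.choice(candidates) if candidates else None


sample = [Coupon(f"C{i:02d}", date(2024, 1, i + 1), False) for i in range(12)]
sample.append(Coupon("PUB", date(2023, 1, 1), True))
print(pick_coupon(sample, rng=random.Random(0)))
```

Passing a seeded random.Random makes the choice repeatable, which is convenient when testing.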