Subquery in JPA - jpa

I am trying to write the following SQL query as a JPA query. The SQL query works (MySQL database), but I don't know how to translate it. I get a token error right after the first FROM. There are probably other errors too, because I could not find any guide on how to do subqueries in the FROM clause, aliasing, and so on.
SQL query
SELECT tbl.* from (
SELECT u.*, COUNT(u.id) AS question_count FROM app_user AS u
INNER JOIN question AS q ON u.id = q.user_id GROUP BY u.id
) AS tbl ORDER BY tbl.question_count DESC LIMIT 10;
JPA query:
SELECT tbl FROM (SELECT u, COUNT(u.id) question_count FROM User u
INNER JOIN u.questions q ON u.id = q.user_id GROUP BY u.id) tbl
ORDER BY tbl.question_count LIMIT 10

I can't test this with anything right now, but standard JPQL does not allow a subquery in the FROM clause and has no LIMIT keyword, so flatten the query and cap the result with setMaxResults. Something along the lines of:
final String queryStr = "SELECT u, COUNT(u.id) FROM User u JOIN u.questions q GROUP BY u.id ORDER BY COUNT(u.id) DESC";
Query query = em().createQuery(queryStr);
query.setMaxResults(10); // replaces LIMIT 10
List<Object[]> results = query.getResultList(); // Index [0] holds the User object, [1] holds a Long with the result of COUNT(u.id)
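If you want to convince yourself that dropping the derived table changes nothing, here is a rough sketch that does it outside JPA. Everything in it is an assumption for illustration: it uses Python's built-in sqlite3 instead of MySQL, the rows are made up, and the function name is hypothetical. It runs the nested SQL from the question and the flattened form side by side.

```python
import sqlite3

def top_users_by_question_count(limit=10):
    """Return (nested, flat) result lists for the two query shapes."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
    conn.executescript("""
        CREATE TABLE app_user (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE question (id INTEGER PRIMARY KEY, user_id INTEGER);
        -- made-up sample rows: ann has 1 question, bob 3, cat 2
        INSERT INTO app_user VALUES (1, 'ann'), (2, 'bob'), (3, 'cat');
        INSERT INTO question (user_id) VALUES (1), (2), (2), (2), (3), (3);
    """)
    # Shape from the question: aggregate in a derived table, sort outside.
    nested = conn.execute("""
        SELECT tbl.id, tbl.question_count FROM (
            SELECT u.id AS id, COUNT(u.id) AS question_count
            FROM app_user AS u
            INNER JOIN question AS q ON u.id = q.user_id
            GROUP BY u.id
        ) AS tbl
        ORDER BY tbl.question_count DESC LIMIT ?
    """, (limit,)).fetchall()
    # Flattened shape: sort directly on the aggregate, no derived table.
    flat = conn.execute("""
        SELECT u.id, COUNT(u.id)
        FROM app_user AS u
        INNER JOIN question AS q ON u.id = q.user_id
        GROUP BY u.id
        ORDER BY COUNT(u.id) DESC LIMIT ?
    """, (limit,)).fetchall()
    conn.close()
    return nested, flat
```

On this sample data both lists come back identical, which is why the JPQL version only needs the flat form plus setMaxResults.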

Related

Postgres string_agg function not recognized as aggregate function

I am attempting to run this query
SELECT u.*, string_agg(CAST(uar.roleid AS VARCHAR(100)), ',') AS roleids, string_agg(CAST(r.role AS VARCHAR(100)), ',') AS systemroles
FROM idpro.users AS u
INNER JOIN idpro.userapplicationroles AS uar ON u.id = uar.userid
INNER JOIN idpro.roles AS r ON r.id = uar.roleid
GROUP BY u.id, uar.applicationid
HAVING u.organizationid = '77777777-f892-4f4a-8328-c31df32bd6ba'
AND uar.applicationid = 'd88fbf05-c048-4697-8bf3-036f39897183'
AND (u.statusid = '7f9f0b75-44b7-4216-bf2a-03abc47dcff8')
AND uar.roleid IN ('cc9ada1c-fa21-400b-be98-c563ebb65a9c','de087148-4788-43da-89e2-dd7dff097735');
However, I'm getting an error stating that
ERROR: column "uar.roleid" must appear in the GROUP BY clause or be used in an aggregate function
LINE 9: AND uar.roleid IN ('cc9ada1c-fa21-400b-be98-c563ebb65a9c','...
string_agg() IS an aggregate function, is it not? My intent, if it isn't obvious, is to return each user record with the roleids and rolenames in comma-delimited lists. If I am doing everything wrong, could you please point me in the right direction?
You are filtering individual rows rather than aggregated values, so those conditions belong in a WHERE clause, which is applied before GROUP BY:
SELECT u.*,
string_agg(CAST(uar.roleid AS VARCHAR(100)), ',') AS roleids,
string_agg(CAST(r.role AS VARCHAR(100)), ',') AS systemroles
FROM idpro.users AS u
INNER JOIN idpro.userapplicationroles AS uar ON u.id = uar.userid
INNER JOIN idpro.roles AS r ON r.id = uar.roleid
WHERE u.organizationid = '77777777-f892-4f4a-8328-c31df32bd6ba'
AND uar.applicationid = 'd88fbf05-c048-4697-8bf3-036f39897183'
AND (u.statusid = '7f9f0b75-44b7-4216-bf2a-03abc47dcff8')
AND uar.roleid IN ('cc9ada1c-fa21-400b-be98-c563ebb65a9c','de087148-4788-43da-89e2-dd7dff097735')
GROUP BY u.id, uar.applicationid;
The HAVING clause is for filtering on aggregated values, that is, on whole groups.
Since you are grouping by u.id, the primary key of the table, you have access to every column of u, so you can filter on those columns in either a WHERE or a HAVING clause.
uar.applicationid is part of the GROUP BY, so you can also use either WHERE or HAVING for it.
uar.roleid is not part of the GROUP BY clause, so to use it in the HAVING clause you would have to filter on its aggregated value.
The following example keeps only the groups whose aggregated string is longer than 10 characters.
HAVING length(string_agg(CAST(uar.roleid AS VARCHAR(100)), ',')) > 10
A more common usage, on numeric fields, is to keep only the groups whose row count exceeds a threshold (having count(*) > 2) or whose sum does (having sum(vacation_days) > 21).
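To make the WHERE versus HAVING difference concrete, here is a small sketch. It is only an illustration under assumptions: it runs on Python's built-in sqlite3 rather than PostgreSQL, and the user_roles table, its rows, and the function name are made up.

```python
import sqlite3

def roles_per_user():
    """Return (where_rows, having_rows) to contrast the two clauses."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for PostgreSQL
    conn.executescript("""
        -- hypothetical table and rows, for illustration only
        CREATE TABLE user_roles (userid INTEGER, roleid TEXT);
        INSERT INTO user_roles VALUES
            (1, 'admin'), (1, 'editor'), (1, 'viewer'),
            (2, 'viewer'),
            (3, 'editor'), (3, 'viewer');
    """)
    # WHERE drops individual rows before the groups are formed.
    where_rows = conn.execute("""
        SELECT userid, COUNT(*) FROM user_roles
        WHERE roleid IN ('admin', 'editor')
        GROUP BY userid
        ORDER BY userid
    """).fetchall()
    # HAVING drops whole groups after aggregation.
    having_rows = conn.execute("""
        SELECT userid, COUNT(*) FROM user_roles
        GROUP BY userid
        HAVING COUNT(*) > 1
        ORDER BY userid
    """).fetchall()
    conn.close()
    return where_rows, having_rows
```

With WHERE, user 2 disappears because none of its rows match, and the counts only include matching rows. With HAVING, user 2 disappears because its group has a single row, while the surviving groups keep their full counts.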

Optimise With Query in PostgreSQL

I have a working PostgreSQL query, but it is taking a considerable amount of time to execute. I need help optimising it.
I have:
Removed inner queries as much as possible.
Removed the unnecessary data from the query.
Created a with query which gets the required data from the beginning
I need help to optimise this query
with data as (
select
e.id,
e.name,
t.barcode,
tt.variant,
t.cost_cents::decimal / 100 as ticket_cost,
t.fee_cents::decimal / 100 as booking_fee
from
tickets t
inner join events e on t.event_id = e.id
inner join ticket_types tt on t.ticket_type_id = tt.id
where
t.status = 2
and e.source in ('source1', 'source2')
)
select
d.name,
count(distinct d.barcode) as issued,
(select count(distinct d2.barcode) from data d2 where d2.id = d.id and d2.variant is null) as sold,
sum(d.ticket_cost) as ticket_revenue,
sum(d.booking_fee) as booking_fees
from
data d
group by
id,
name
It is better to find the slow parts with EXPLAIN (or EXPLAIN ANALYZE), which shows the cost of each step of the plan.
You can speed up the joins by creating proper indexes.
Also, remove the subquery
(select count(distinct d2.barcode) from data d2 where d2.id = d.id and d2.variant is null)
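from the SELECT clause. Note that a self-join of data on id would pair every row of an event with every matching row of the same event and inflate both sums, so a conditional aggregate is the safer replacement, for example with FILTER (PostgreSQL 9.4 or later):
select
d.name,
count(distinct d.barcode) as issued,
count(distinct d.barcode) filter (where d.variant is null) as sold,
sum(d.ticket_cost) as ticket_revenue,
sum(d.booking_fee) as booking_fees
from
data d
group by
d.id,
d.name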
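Whichever rewrite you pick, it is worth comparing it against the original correlated-subquery form on a tiny data set before trusting it. The sketch below is an assumption-heavy illustration: it uses Python's built-in sqlite3 in place of PostgreSQL, a made-up data table with integer costs, and a portable CASE expression as the conditional aggregate. It also shows what a self-join of data on id does to the sums.

```python
import sqlite3

# Hypothetical sample rows: event 1 has two sold tickets and one variant ticket.
SETUP = """
CREATE TABLE data (id INTEGER, name TEXT, barcode TEXT,
                   variant TEXT, ticket_cost INTEGER);
INSERT INTO data VALUES
    (1, 'gig',  'b1', NULL,   10),
    (1, 'gig',  'b2', NULL,   10),
    (1, 'gig',  'b3', 'comp',  0),
    (2, 'play', 'b4', 'comp',  5);
"""

# Original shape: correlated subquery in the SELECT list.
CORRELATED = """
SELECT d.name, COUNT(DISTINCT d.barcode),
       (SELECT COUNT(DISTINCT d2.barcode) FROM data d2
         WHERE d2.id = d.id AND d2.variant IS NULL),
       SUM(d.ticket_cost)
FROM data d GROUP BY d.id, d.name ORDER BY d.id
"""

# Self-join shape: each row of an event is repeated once per matching d2 row.
SELF_JOIN = """
SELECT d.name, COUNT(DISTINCT d.barcode), COUNT(DISTINCT d2.barcode),
       SUM(d.ticket_cost)
FROM data d
LEFT JOIN data d2 ON d2.id = d.id AND d2.variant IS NULL
GROUP BY d.id, d.name ORDER BY d.id
"""

# Conditional aggregate: one pass over data, no row multiplication.
CONDITIONAL = """
SELECT d.name, COUNT(DISTINCT d.barcode),
       COUNT(DISTINCT CASE WHEN d.variant IS NULL THEN d.barcode END),
       SUM(d.ticket_cost)
FROM data d GROUP BY d.id, d.name ORDER BY d.id
"""

def run(query):
    """Load the sample rows into a fresh in-memory database and run query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows

correlated = run(CORRELATED)
self_join = run(SELF_JOIN)
conditional = run(CONDITIONAL)
```

On this data the conditional aggregate matches the correlated subquery exactly, while the self-join reports the same distinct counts but doubles the revenue of the first event, because its three rows are each repeated for the two rows with a null variant.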

Replacing nested SELECT

How can I write a PostgreSQL query like this:
SELECT event_id, user_id FROM public."point"
WHERE user_id = (SELECT id FROM public."user"
WHERE email='test#gmail.com')
with a JOIN statement and without the nested SELECT statement? The above works, but I think it is not optimal. Thanks for your answers.
For your particular case, this should work:
SELECT p.event_id, p.user_id
FROM public."point" p JOIN
public."user" u
ON p.user_id = u.id
WHERE u.email = 'test#gmail.com';
In general, when switching between JOIN and IN, you need to be careful about duplicates. So the general solution would be:
SELECT p.event_id, p.user_id
FROM public."point" p JOIN
(SELECT DISTINCT u.id
FROM public."user" u
WHERE u.email = 'test#gmail.com'
) u
ON p.user_id = u.id ;
But the id is probably already unique in user.
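To see the duplicate issue in action, here is a small sketch. It is illustrative only: it runs on Python's built-in sqlite3 instead of PostgreSQL, the users table deliberately has no unique constraint, and the table names, rows, email address, and function name are all made up.

```python
import sqlite3

def join_vs_in(email="a@example.com"):
    """Return (in_rows, join_rows, distinct_join_rows) for the same lookup."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for PostgreSQL
    conn.executescript("""
        -- hypothetical tables; users.id is NOT unique on purpose
        CREATE TABLE users (id INTEGER, email TEXT);
        CREATE TABLE point (event_id INTEGER, user_id INTEGER);
        INSERT INTO users VALUES (1, 'a@example.com'), (1, 'a@example.com');
        INSERT INTO point VALUES (100, 1), (101, 1);
    """)
    # IN only tests membership, so duplicates in the subquery do not matter.
    in_rows = conn.execute("""
        SELECT event_id, user_id FROM point
        WHERE user_id IN (SELECT id FROM users WHERE email = ?)
        ORDER BY event_id
    """, (email,)).fetchall()
    # A plain JOIN emits one row per matching pair, so duplicates multiply.
    join_rows = conn.execute("""
        SELECT p.event_id, p.user_id
        FROM point p JOIN users u ON p.user_id = u.id
        WHERE u.email = ?
        ORDER BY p.event_id
    """, (email,)).fetchall()
    # De-duplicating the joined side first restores the IN semantics.
    distinct_join_rows = conn.execute("""
        SELECT p.event_id, p.user_id
        FROM point p JOIN
             (SELECT DISTINCT id FROM users WHERE email = ?) u
             ON p.user_id = u.id
        ORDER BY p.event_id
    """, (email,)).fetchall()
    conn.close()
    return in_rows, join_rows, distinct_join_rows
```

With a unique id, all three forms return the same rows, so the simpler JOIN is enough.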

Lateral query syntax

I'm trying to get lateral to work in a Postgres 9.5.3 query.
select b_ci."IdOwner",
ci."MinimumPlaces",
ci."MaximumPlaces",
(select count(*) from "LNK_Stu_CI" lnk
where lnk."FK_CourseInstanceId" = b_ci."Id") as "EnrolledStudents",
from "Course" c
join "DBObjectBases" b_c on c."Id" = b_c."Id"
join "DBObjectBases" b_ci on b_ci."IdOwner" = b_c."Id"
join "CourseInstance" ci on ci."Id" = b_ci."Id",
lateral (select ci."MaximumPlaces" - "EnrolledStudents") x
I want the right-most column to be the result of "MaximumPlaces" - "EnrolledStudents" for that row but am struggling to get it to work. At the moment PG is complaining that "EnrolledStudents" does not exist - which is exactly the point of "lateral", isn't it?
select b_ci."IdOwner",
ci."MinimumPlaces",
ci."MaximumPlaces",
(select count(*) from "LNK_Stu_CI" lnk
where lnk."FK_CourseInstanceId" = b_ci."Id") as "EnrolledStudents",
lateral (select "MaximumPlaces" - "EnrolledStudents") as "x"
from "Course" c
join "DBObjectBases" b_c on c."Id" = b_c."Id"
join "DBObjectBases" b_ci on b_ci."IdOwner" = b_c."Id"
join "CourseInstance" ci on ci."Id" = b_ci."Id"
If I try inlining the lateral clause (shown above) in the select it gets upset too and gives me a syntax error - so where does it go?
Thanks,
Adam.
You are missing the point of LATERAL. It can access columns of tables that appear earlier in the FROM clause, but not aliases defined in the SELECT clause.
If you want to access an alias defined in the SELECT clause, you need to add another query level, either with a subquery in the FROM clause (a derived table) or with a CTE (Common Table Expression). Because a CTE acts as an optimization fence in PostgreSQL (in versions before 12, which includes 9.5), I strongly recommend going with the subquery in this case, like:
select
-- get all columns on the inner query
t.*,
-- get your new expression based on the ones defined in the inner query
t."MaximumPlaces" - t."EnrolledStudents" AS new_alias
from (
select b_ci."IdOwner",
ci."MinimumPlaces",
ci."MaximumPlaces",
(select count(*) from "LNK_Stu_CI" lnk
where lnk."FK_CourseInstanceId" = b_ci."Id") as "EnrolledStudents"
from "Course" c
join "DBObjectBases" b_c on c."Id" = b_c."Id"
join "DBObjectBases" b_ci on b_ci."IdOwner" = b_c."Id"
join "CourseInstance" ci on ci."Id" = b_ci."Id"
) t
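The same pattern can be tried on a cut-down schema. This is only a sketch under assumptions: it uses Python's built-in sqlite3 instead of PostgreSQL, and the two tables, their columns, the rows, and the function name are simplified stand-ins rather than the schema from the question.

```python
import sqlite3

def remaining_places():
    """Compute max places minus enrolled students via a derived table."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for PostgreSQL
    conn.executescript("""
        -- hypothetical, simplified tables
        CREATE TABLE course_instance (id INTEGER PRIMARY KEY,
                                      maximum_places INTEGER);
        CREATE TABLE enrolment (course_instance_id INTEGER,
                                student_id INTEGER);
        INSERT INTO course_instance VALUES (1, 10), (2, 5);
        INSERT INTO enrolment VALUES (1, 1), (1, 2), (1, 3), (2, 1);
    """)
    rows = conn.execute("""
        SELECT t.id, t.maximum_places, t.enrolled,
               -- the alias "enrolled" is visible here because it was
               -- defined one query level down
               t.maximum_places - t.enrolled AS remaining
        FROM (
            SELECT ci.id, ci.maximum_places,
                   (SELECT COUNT(*) FROM enrolment e
                     WHERE e.course_instance_id = ci.id) AS enrolled
            FROM course_instance ci
        ) t
        ORDER BY t.id
    """).fetchall()
    conn.close()
    return rows
```

The inner query defines the alias and the outer query consumes it, which is exactly the extra query level that LATERAL alone cannot provide.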

Get Greatest date across multiple columns with entity framework

I have three entities: Group, Activity, and Comment. Each entity is represented in the db as a table. A Group has many Activities, and an Activity has many comments. Each entity has a CreatedDate field. I need to query for all groups + the CreatedDate of the most recent entity created on that Group's object graph.
I've constructed a SQL query that gives me what I need, but I'm not sure how to do this in Entity Framework. Specifically this line: (SELECT MAX(X) FROM (VALUES (g.CreatedDate), (a.CreatedDate), (c.CreatedDate)) AS AllDates(X)). Thanks in advance for your help. Here's the full query:
WITH GroupWithLastActivityDate AS (
SELECT DISTINCT
g.Id
,g.GroupName
,g.GroupDescription
,g.CreatedDate
,g.ApartmentComplexId
,(SELECT MAX(X)
FROM (VALUES (g.CreatedDate), (a.CreatedDate), (c.CreatedDate)) AS AllDates(X)) AS LastActivityDate
FROM Groups g
LEFT OUTER JOIN Activities a
on g.Id = a.GroupId
LEFT OUTER JOIN Comments c
on a.Id = c.ActivityId
WHERE g.IsActive = 1
)
SELECT
GroupId = g.Id
,g.GroupName
,g.GroupDescription
,g.ApartmentComplexId
,NumberOfActivities = COUNT(DISTINCT a.Id)
,g.CreatedDate
,LastActivityDate = Max(g.LastActivityDate)
FROM GroupWithLastActivityDate g
INNER JOIN Activities a
on g.Id = a.GroupId
WHERE a.IsActive = 1
GROUP BY g.Id
,g.GroupName
,g.GroupDescription
,g.CreatedDate
,g.ApartmentComplexId
I should add that for now I've constructed a view with this query (plus some other stuff) which I'm querying with a SqlQuery.