PostgreSQL get nested rows with limit

I have a self-referencing table that's only one level deep: comments and replies. A reply is just a comment with a parent id:
Comments (simplified):
- comment_id
- parentCommentId
Users scroll through the comments and replies, and typically 10 new rows are fetched each time. I'm trying out a recursive query for this:
WITH RECURSIVE included_childs(comment_id, parent_comment_id) AS (
    SELECT comment_id, parent_comment_id
    FROM comments
    UNION
    SELECT c.comment_id, c.parent_comment_id
    FROM included_childs ch, comments c
    WHERE c.comment_id = ch.parent_comment_id
)
SELECT *
FROM included_childs
LIMIT 10
Obviously, because of LIMIT 10 not all the children are included this way and conversations get cut off. What I actually want is a limit on the parents, with all their children included, regardless of how many total rows that yields.
Update
This is the actual query, now with a limit in the first branch:
WITH RECURSIVE included_childs(comment_id, from_user_id, fk_topic_id, comment_text, parent_comment_id, created) AS (
    (SELECT comment_id, from_user_id, fk_topic_id, comment_text, parent_comment_id, created
     FROM vw_comments
     WHERE fk_topic_id = 2
       AND parent_comment_id IS NULL
     LIMIT 1)
    UNION ALL
    SELECT c.comment_id, c.from_user_id, c.fk_topic_id, c.comment_text, c.parent_comment_id, c.created
    FROM included_childs ch, vw_comments c
    WHERE c.comment_id = ch.parent_comment_id
)
SELECT *
FROM included_childs
Still, this doesn't give me the expected results: I get 1 comment as a result, with no replies.
Update 2
Silly mistake in the WHERE clause:
WHERE c.comment_id = ch.parent_comment_id
should've been
WHERE ch.comment_id = c.parent_comment_id
it's working now.

I think the first branch in the UNION in the recursive CTE should be something like:
SELECT comment_id, parent_comment_id
FROM comments
WHERE parent_comment_id IS NULL
LIMIT 10
Then you'd get all replies for these 10 "root" comments.
I'd expect some sort of ORDER BY in there, unless you don't care for the order.
UNION ALL would be faster than UNION, and there cannot be cycles, right?
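Putting both pieces together, the limited anchor branch and the corrected join direction from update 2, a full sketch against the vw_comments view from the question might look like this (the ORDER BY on the root comments is an assumption):
WITH RECURSIVE included_childs(comment_id, from_user_id, fk_topic_id, comment_text, parent_comment_id, created) AS (
    -- anchor branch: one page of root comments (no parent)
    (SELECT comment_id, from_user_id, fk_topic_id, comment_text, parent_comment_id, created
     FROM vw_comments
     WHERE fk_topic_id = 2
       AND parent_comment_id IS NULL
     ORDER BY created       -- assumed ordering; use whatever the UI paginates by
     LIMIT 10)
    UNION ALL
    -- recursive branch: replies whose parent is already included
    SELECT c.comment_id, c.from_user_id, c.fk_topic_id, c.comment_text, c.parent_comment_id, c.created
    FROM included_childs ch
    JOIN vw_comments c ON c.parent_comment_id = ch.comment_id
)
SELECT *
FROM included_childs;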

Related

postgresql multiple Common Table Expressions

I have two tables: mutation and ann_commune. The columns code_postal and id_commune are in ann_commune; all the other ones are in mutation.
I need to compute the average of the top 3 values for several departments:
WITH six as (
SELECT commune as commune_six, AVG(valeur_fonciere) as moyenne_six
FROM mutation
LEFT JOIN ann_commune on mutation.id_commune = ann_commune.id_commune
WHERE code_postal LIKE '06%'
GROUP BY commune_six
ORDER BY moyenne_six DESC
LIMIT 3
),
treize as (
SELECT commune as commune_treize, AVG(valeur_fonciere) as moyenne_treize
FROM mutation
LEFT JOIN ann_commune on mutation.id_commune = ann_commune.id_commune
WHERE code_postal LIKE '13%'
GROUP BY commune_treize
ORDER BY moyenne_treize DESC
LIMIT 3
)
select AVG(moyenne_six)::NUMERIC(7,0) as Moyenne_top3_06,
AVG(moyenne_treize)::NUMERIC(7,0) as Moyenne_top3_13
from six, treize
I am sure there is a way to write nicer code, because I have to do the same with 10 averages. I did not find the solution, but I am sure it exists.
Thank you in advance :).
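One way to avoid writing one CTE per department is to rank the communes inside each department with a window function and then average the top 3 in a single pass. A rough sketch, assuming the department is the first two characters of code_postal and using only the tables and columns from the question:
WITH ranked AS (
    SELECT left(code_postal, 2)  AS departement,
           commune,
           AVG(valeur_fonciere)  AS moyenne,
           ROW_NUMBER() OVER (PARTITION BY left(code_postal, 2)
                              ORDER BY AVG(valeur_fonciere) DESC) AS rn
    FROM mutation
    LEFT JOIN ann_commune ON mutation.id_commune = ann_commune.id_commune
    WHERE left(code_postal, 2) IN ('06', '13')   -- list all 10 departments here
    GROUP BY left(code_postal, 2), commune
)
SELECT departement,
       AVG(moyenne)::NUMERIC(7,0) AS moyenne_top3
FROM ranked
WHERE rn <= 3
GROUP BY departement;
Each department then comes back as a row instead of a separate column, which is usually easier to extend to more departments.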

more than one row returned by a subquery used as an expression problem

I am trying to update a column in one table with a query, but I get the error from the title: "more than one row returned by a subquery used as an expression". I think it is impossible to assign a query to a field, but what is the solution for that, please?
= can be used when we are sure that the subquery returns only one value.
When we are not sure whether the subquery returns more than one value, we have to use IN to accommodate all the values, or limit the subquery to a single row. In SQL Server that would be TOP 1:
UPDATE mascir_fiche SET partner = (SELECT TOP 1 id FROM hr_employee WHERE parent_id IN (SELECT id FROM hr_employee));
In PostgreSQL, use LIMIT instead:
UPDATE mascir_fiche SET partner = (SELECT id FROM hr_employee WHERE parent_id IN (SELECT id FROM hr_employee) LIMIT 1);
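Note that without an ORDER BY, the row picked by LIMIT 1 is arbitrary. If a deterministic choice matters, something like the following makes it explicit (ordering by id here is just an assumption):
UPDATE mascir_fiche
SET partner = (
    SELECT id
    FROM hr_employee
    WHERE parent_id IN (SELECT id FROM hr_employee)
    ORDER BY id      -- assumed ordering; use whichever column defines "first"
    LIMIT 1
);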

Postgres query for report

I'm trying to solve this problem:
I have a query/view that joins ~10 tables to extract some fields for a report (if any). The query doesn't use any grouping function, only joins, and it filters out some unneeded data.
I have to take this one big view, group by the first column, take the max of the date in the second column, and take all the other fields from the record that holds that max value.
I haven't been able to do this in Postgres.
As a pseudo code I can give this:
select 1
, max(2)
, 3 referred to the record from max(2)
, 4 referred to the record from max(2)
, ...
, 20 referred to the record from max(2)
from (ViewWithAllJoins) a
group by 1
For privacy and business reasons I had to obfuscate some information; 1/2/3/4... are the names of the columns from the view "ViewWithAllJoins". I hope the problem is still understandable and solvable!
I've tried the WINDOW clause as shown in Convert keep dense_rank from Oracle query into postgres, but I couldn't use the GROUP BY that I need. Other attempts were based on dense_rank, as shown in Dense_rank first Oracle to Postgresql convert, but I can't make any assumption about the order of the data in any of the other fields except 1 and 2, so I can't use any aggregate function on them.
Any ideas? Possibly without adding too many subqueries.
Thank you!
EDIT:
As suggested I'll add some synthetic data to better understand the problem and what I want.
Start:
ID DATE COLUMN1 COLUMN2 COLUMN3
=====================================================================
88888888;"2016-04-02 09:00:00";"aaaaaaaaaaa";"TEXT89" ; 999999999
88888888;"2018-08-21 09:00:00";"a" ;"TEXT1" ; 988888888
88888888;"2017-11-09 09:00:00";"zzzz" ;"TEXT80000" ; 850580582
75858585;"2017-01-31 09:00:00";"~~~~~~~~~~~";"TEXT10" ; 101010101
75858585;"2018-04-02 09:00:00";"eeeeeeeeeee";"TEXT1000" ; 111111111
99999999;"2016-04-02 09:00:00";"8d2ecafd866";"TEXT808911"; 777777777
What I want:
ID DATE COLUMN1 COLUMN2 COLUMN3
===================================================================
88888888;"2018-08-21 09:00:00";"a" ;"TEXT1" ; 988888888
75858585;"2018-04-02 09:00:00";"eeeeeeeeeee";"TEXT1000" ; 111111111
99999999;"2016-04-02 09:00:00";"8d2ecafd866";"TEXT808911"; 777777777
So: group by id, take the max of the date, and the other fields from the row of that max date.
So you have duplicate records per ID, and for every ID you want to select the record with the most recent date?
Use NOT EXISTS:
SELECT id,zdate,column1,column2,column3 -- , ...
FROM queryview t
WHERE NOT EXISTS (
SELECT *
FROM queryview x
WHERE x.id=t.id
AND x.zdate > t.zdate
);
Or, use row_number() over a window, and pick only the row with the latest date:
SELECT id, zdate, column1, column2, column3 -- , ...
FROM ( SELECT *
            , row_number() OVER (PARTITION BY id ORDER BY zdate DESC) AS rn
       FROM queryview
     ) q
WHERE q.rn = 1
;
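In PostgreSQL this is also a textbook case for DISTINCT ON, which keeps only the first row per id according to the ORDER BY; a minimal sketch against the same queryview:
SELECT DISTINCT ON (id)
       id, zdate, column1, column2, column3 -- , ...
FROM queryview
ORDER BY id, zdate DESC;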

EXISTS in filter returning too many values

I need to write a query that uses EXISTS, rather than IN, so that it will run fast. The filter is being fed so many parameter values that EXISTS seems like the only option. The difference is between a 20+ minute query and a 5 second query.
This is the query I have:
SELECT DISTINCT d.GROUP_NAME
FROM [EMPLOYEE] e JOIN [DATA_FACT] d ON (e.KEY = d.KEY)
WHERE d.DATE BETWEEN #Start and #End
AND EXISTS
(
select '1234567' -- #ID
)
AND e.Location IN (#Location)
ORDER BY d.GROUP_NAME ASC
The problem is that it is returning too many records. Based on the values I'm passing to filter on, I should get 1 row back but instead I am getting 28.
If I remove the EXISTS and add the following then I get the 1 record I need:
AND e.ID IN ('1234567')
Is there a way to fix the query to work with EXISTS so that I get the correct results?
Your current EXISTS clause always evaluates to true: its subquery just selects a constant and is not correlated to the outer query, so it doesn't filter anything. This is essentially what you want if you are going to use EXISTS to filter your DATA_FACT table by parameters in your EMPLOYEE table. Not sure how much it's going to improve your performance, though, when you throw a massive number of employee IDs at it.
SELECT
d.GROUP_NAME
FROM [DATA_FACT] AS d
WHERE d.DATE BETWEEN #Start and #End
AND EXISTS
(
select 1
from EMPLOYEE AS e
WHERE d.[KEY] = e.[KEY]
AND e.[Location] IN (#Location)
AND e.ID IN ('1234567')
)
ORDER BY d.GROUP_NAME ASC

SQLITE : Optimize ORDER BY Query

All,
I am an iOS developer. Currently we have about 2.5 lakh (250,000) records stored in the database, and we have implemented search functionality on top of them. Below is the query we are using.
select CustomerMaster.CustomerName ,CustomerMaster.CustomerNumber,
CallActivityList.CallActivityID,CallActivityList.CustomerID,CallActivityList.UserID,
CallActivityList.ActivityType,CallActivityList.Objective,CallActivityList.Result,
CallActivityList.Comments,CallActivityList.CreatedDate,CallActivityList.UpdateDate,
CallActivityList.CallDate,CallActivityList.OrderID,CallActivityList.SalesPerson,
CallActivityList.GratisProduct,CallActivityList.CallActivityDeviceID,
CallActivityList.IsExported,CallActivityList.isDeleted,CallActivityList.TerritoryID,
CallActivityList.TerritoryName,CallActivityList.Hours,UserMaster.UserName,
(FirstName ||' '||LastName) as UserNameFull,UserMaster.TerritoryID as UserTerritory
from
CallActivityList
inner join CustomerMaster
ON CustomerMaster.DeviceCustomerID = CallActivityList.CustomerID
inner Join UserMaster
On UserMaster.UserID = CallActivityList.UserID
where
(CustomerMaster.CustomerName like '%T%' or
CustomerMaster.CustomerNumber like '%T%' or
CallActivityList.ActivityType like '%T%' or
CallActivityList.TerritoryName like '%T%' or
CallActivityList.SalesPerson like '%T%' )
and CallActivityList.IsExported!='2' and CallActivityList.isDeleted != '1'
order by
CustomerMaster.CustomerName
limit 50 offset 0
Without ORDER BY the query returns results in 0.5 seconds, but when I add ORDER BY the time increases to 2 seconds.
I have tried indexing, but it does not make any noticeable difference. Can anyone please help? If not through the query, how else can we make this fast?
Thanks in advance.
This is due to the LIMIT. Without ORDER BY, only 50 records have to be processed and any 50 can be returned. With ORDER BY, all the records have to be processed in order to determine which ones are the first 50 (in order).
The problem is that the ORDER BY is performed on a joined table. Otherwise you could apply the limit on the main table (I assume it is CallActivityList) first and then join.
SELECT ...
FROM
(SELECT ... FROM CallActivityList ORDER BY ... LIMIT 50 OFFSET 0) AS CAL
INNER JOIN CustomerMaster ON ...
INNER JOIN UserMaster ON ...
ORDER BY ...
This would reduce the costs for joining the tables. If this is not possible, try at least to join CallActivityList with CustomerMaster. Apply the limit to those and finally join with UserMaster.
SELECT ...
FROM
(SELECT ...
FROM
CallActivityList
INNER JOIN CustomerMaster ON ...
ORDER BY CustomerMaster.CustomerName
LIMIT 50 OFFSET 0) AS ActCust
INNER JOIN UserMaster ON ...
ORDER BY ...
Also, in order to make the ordering unambiguous, I would include more columns in the ORDER BY, like the call date and call ID. Otherwise this could result in inconsistent paging.
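For concreteness, here is a sketch of that second variant using the tables, filters and tie-breaker columns from the question and the note above; treat it as a starting point rather than a drop-in replacement:
SELECT ActCust.*,
       UserMaster.UserName,
       (UserMaster.FirstName || ' ' || UserMaster.LastName) AS UserNameFull,
       UserMaster.TerritoryID AS UserTerritory
FROM
    (SELECT CustomerMaster.CustomerName,
            CustomerMaster.CustomerNumber,
            CallActivityList.*          -- or list only the columns the screen really needs
     FROM CallActivityList
     INNER JOIN CustomerMaster
             ON CustomerMaster.DeviceCustomerID = CallActivityList.CustomerID
     WHERE (CustomerMaster.CustomerName LIKE '%T%'
            OR CustomerMaster.CustomerNumber LIKE '%T%'
            OR CallActivityList.ActivityType LIKE '%T%'
            OR CallActivityList.TerritoryName LIKE '%T%'
            OR CallActivityList.SalesPerson LIKE '%T%')
       AND CallActivityList.IsExported != '2'
       AND CallActivityList.isDeleted != '1'
     ORDER BY CustomerMaster.CustomerName, CallActivityList.CallDate, CallActivityList.CallActivityID
     LIMIT 50 OFFSET 0) AS ActCust
INNER JOIN UserMaster
        ON UserMaster.UserID = ActCust.UserID
ORDER BY ActCust.CustomerName, ActCust.CallDate, ActCust.CallActivityID;
The outer ORDER BY repeats the inner one because the final join does not guarantee that the 50 rows come back in the order the subquery produced them.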