Postgres count total matches per group - postgresql

Input data
I have the following association table:
AssociationTable
- Item ID: Integer
- Tag ID: Integer
Referring to the following example data
Item Tag
1 1
1 2
1 3
2 1
and some input list of tags T (e.g. [1, 2])
What I want
For each item, I would like to know how many of its tags were not provided in the input list T.
With our sample data, we'd get:
Item Num missing
1 1
2 0
My thoughts
The best I've done so far is:
select "ItemId", count("TagId") as "Num missing" from "AssociationTab" where "TagId" not in (1, 2) group by "ItemId";
The problem here is that items whose tags all appear in T are filtered out by the WHERE clause, so they are absent from the output instead of showing 0.

You could use a calendar table with anti-join approach:
WITH cte AS (
SELECT t1.Item, t2.Tag
FROM (SELECT DISTINCT Item FROM AssociationTable) t1
CROSS JOIN (SELECT 1 AS Tag UNION ALL SELECT 2) t2
)
SELECT
t1.Item,
COUNT(*) FILTER (WHERE t2.Item IS NULL) AS num_missing
FROM cte t1
LEFT JOIN AssociationTable t2
ON t1.Item = t2.Item AND
t1.Tag = t2.Tag AND
t2.Tag IN (1, 2)
GROUP BY
t1.Item;
The strategy here is to build a calendar/reference table in the first CTE which contains every combination of item and input tag. Then, we left join this CTE to your association table, aggregate by item, and count the combinations that found no match. Note that this counts, for each item, the input tags that the item does not have.
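To see what this query returns on the sample data, here is a minimal sketch that runs it in an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for PostgreSQL here, and SUM(CASE ...) is used in place of COUNT(*) FILTER (...) purely so the sketch does not depend on the SQLite version; both are assumptions of this example, not part of the answer.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AssociationTable (Item INTEGER, Tag INTEGER)")
conn.executemany(
    "INSERT INTO AssociationTable VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 1)],
)

query = """
WITH cte AS (
    -- every (item, input tag) combination for T = [1, 2]
    SELECT t1.Item, t2.Tag
    FROM (SELECT DISTINCT Item FROM AssociationTable) t1
    CROSS JOIN (SELECT 1 AS Tag UNION ALL SELECT 2) t2
)
SELECT t1.Item,
       -- portable equivalent of COUNT(*) FILTER (WHERE t2.Item IS NULL)
       SUM(CASE WHEN t2.Item IS NULL THEN 1 ELSE 0 END) AS num_missing
FROM cte t1
LEFT JOIN AssociationTable t2
       ON t1.Item = t2.Item AND t1.Tag = t2.Tag
GROUP BY t1.Item
ORDER BY t1.Item
"""

cross_join_counts = conn.execute(query).fetchall()
print(cross_join_counts)  # [(1, 0), (2, 1)]
```

Item 1 has both input tags, so nothing from T is missing; item 2 lacks tag 2. This is the count of input tags an item does not have, which is not the same number as the count of an item's tags that are absent from T.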

Simplest solution is
SELECT
ItemId,
count(*) FILTER (WHERE TagId NOT IN (1,2))
FROM AssociationTab
GROUP BY ItemId
Alternatively, if you already have an Items table with the item list, you could do this:
SELECT
i.ItemId,
count(a.TagId)
FROM Items i
LEFT JOIN AssociationTab a ON a.ItemId = i.ItemId AND a.TagId NOT IN (1,2)
GROUP BY i.ItemId
The key is that LEFT JOIN does not remove the Items row if no tags match.
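As a quick check of the conditional-aggregation idea, the following sketch runs it against the sample data in an in-memory SQLite database via Python's sqlite3 module. SQLite standing in for PostgreSQL and the SUM(CASE ...) spelling (instead of count(*) FILTER (...)) are assumptions made for portability.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AssociationTab (ItemId INTEGER, TagId INTEGER)")
conn.executemany(
    "INSERT INTO AssociationTab VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 1)],
)

query = """
SELECT ItemId,
       -- every row stays in its group; only rows outside T add to the count
       SUM(CASE WHEN TagId NOT IN (1, 2) THEN 1 ELSE 0 END) AS num_missing
FROM AssociationTab
GROUP BY ItemId
ORDER BY ItemId
"""

missing_per_item = conn.execute(query).fetchall()
print(missing_per_item)  # [(1, 1), (2, 0)]
```

Because the condition sits inside the aggregate rather than in WHERE, item 2 keeps its group and reports 0.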

Related

Restrict string_agg order by in postgres

While working with a Postgres database, I came across a situation where I have to display column names based on their ids, which are stored in a table as a comma-separated string. Here is a sample:
table1 name: labelprint
id field_id
1 2,1
table2 name: datafields
id field_name
1 Age
2 Name
3 Sex
Now, in order to display the field names by picking the ids from table1 (i.e. 2,1 from the field_id column), I want the field_name values to be displayed in the same order as their respective ids:
Expected result:
id field_id field_name
1 2,1 Name,Age
To achieve the above result, I have written the following query:
select l.id,l.field_id ,string_agg(d.field_name,',') as field_names
from labelprint l
join datafields d on d.id = ANY(string_to_array(l.field_id::text,','))
group by l.id
order by l.id
However, the string_agg() function appears to sort the final string in ascending order and displays the output as shown below:
id field_id field_name
1 2,1 Age, Name
As you can see the order is not maintained in the field_name column which I want to display as per field_id value order.
Any suggestion/help is highly appreciated.
Thanks in advance!
While this will probably be horrible for performance, as well as readability and maintainability, you can dynamically compute the order you want:
select l.id, l.field_id,
string_agg(d.field_name, ','
-- order by each id's position within the comma-separated list
order by array_position(string_to_array(l.field_id::text, ','), d.id::text)
) as field_names
from labelprint l
join datafields d on d.id::text = ANY(string_to_array(l.field_id::text, ','))
group by l.id, l.field_id
order by l.id;
You should at least store your array as an actual array, not as a comma delimited string. Or maybe use an intermediate table and don't store arrays at all.
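A minimal sketch of the intermediate-table idea follows. The table labelprint_fields and its position column are hypothetical names introduced for illustration, and SQLite (through Python's sqlite3) stands in for PostgreSQL. The query returns one row per field in the stored order and Python does the concatenation; in PostgreSQL the same ordering could be expressed as string_agg(d.field_name, ',' ORDER BY lf.position).

```python
import sqlite3
from itertools import groupby

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE datafields (id INTEGER PRIMARY KEY, field_name TEXT);
-- hypothetical junction table: one row per (label, field), with its position
CREATE TABLE labelprint_fields (label_id INTEGER, field_id INTEGER, position INTEGER);
INSERT INTO datafields VALUES (1, 'Age'), (2, 'Name'), (3, 'Sex');
INSERT INTO labelprint_fields VALUES (1, 2, 1), (1, 1, 2);
""")

rows = conn.execute("""
SELECT lf.label_id, d.field_name
FROM labelprint_fields lf
JOIN datafields d ON d.id = lf.field_id
ORDER BY lf.label_id, lf.position
""").fetchall()

# Concatenate the names per label, preserving the stored order.
field_names = {
    label_id: ",".join(name for _, name in group)
    for label_id, group in groupby(rows, key=lambda r: r[0])
}
print(field_names)  # {1: 'Name,Age'}
```

Storing the position explicitly removes the need to parse a delimited string at query time.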
With a small modification to your existing query, you could do it as follows:
select l.id, l.field_id, string_agg(d.field_name,',') as field_names
from labelprint l
join datafields d on d.id::varchar = ANY(string_to_array(l.field_id,','))
group by l.id, l.field_id
order by l.id

Postgresql recursive query

I have a table with a self-referencing foreign key and can't work out how to get the first child or descendant that meets a condition. The structure of my_table is:
id parent_id type
1 null union
2 1 group
3 2 group
4 3 depart
5 1 depart
6 5 unit
7 1 unit
For id 1 (union), I need all of its direct children, or the first non-group descendant on each branch, skipping any groups that sit between that descendant and the union. So in this example the result should be:
id type
4 depart
5 depart
7 unit
id 4 is included because it is connected to the union through the groups with ids 3 and 2; ids 5 and 7 are included because they are connected directly to the union.
I've tried to write a recursive query with this condition in the recursive part: parent_id = 1 or parent_type = 'depart', but it doesn't produce the expected result:
with recursive cte AS (
select b.id, p.type_id
from my_table b
join my_table p on p.id = b.parent_id
where b.id = 1
union
select c.id, cte.type_id
from my_table c
join cte on cte.id = c.parent_id
where c.parent_id = 1 or cte.type_id = 'group'
)
Here's my interpretation:
if type = 'group', then id and parent_id are considered to be in the same group
id#1 and id#2 are in the same group, so they are equivalent
id#2 and id#3 are in the same group, so they are equivalent
therefore id#1, id#2 and id#3 are all in the same group
If the above is correct, you want all the first descendants of id#1's group. The way to do that:
Get all the ids in the same group as id#1
Get all the first descendants of that group (type not in ('union', 'group'))
with recursive cte_group as (
select 1 as id
union all
select m.id
from my_table m
join cte_group g
on m.parent_id = g.id
and m.type = 'group')
select mt.id,
mt.type
from my_table mt
join cte_group cg
on mt.parent_id = cg.id
and mt.type not in ('union','group');
Result:
id|type |
--+------+
4|depart|
5|depart|
7|unit |
Sounds like you want to start with the row of id 1, then get its children, and continue recursively on rows of type group. To do that, use
WITH RECURSIVE tree AS (
SELECT b.id, b.type, TRUE AS skip
FROM my_table b
WHERE id = 1
UNION ALL
SELECT c.id, c.type, (c.type = 'group') AS skip
FROM my_table c
JOIN tree p ON c.parent_id = p.id AND p.skip
)
SELECT id, type
FROM tree
WHERE NOT skip
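
The skip-flag recursion can be tried on the sample rows with the sketch below. It uses an in-memory SQLite database through Python's sqlite3 module as a stand-in for PostgreSQL, and stores the flag as 1/0 instead of a boolean; both are assumptions of this example.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER PRIMARY KEY, parent_id INTEGER, type TEXT)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        (1, None, "union"),
        (2, 1, "group"),
        (3, 2, "group"),
        (4, 3, "depart"),
        (5, 1, "depart"),
        (6, 5, "unit"),
        (7, 1, "unit"),
    ],
)

query = """
WITH RECURSIVE tree(id, type, skip) AS (
    -- start at the union row; it is never part of the output
    SELECT id, type, 1 FROM my_table WHERE id = 1
    UNION ALL
    -- descend only through rows flagged skip (the root and groups)
    SELECT c.id, c.type, CASE WHEN c.type = 'group' THEN 1 ELSE 0 END
    FROM my_table c
    JOIN tree p ON c.parent_id = p.id AND p.skip = 1
)
SELECT id, type FROM tree WHERE skip = 0 ORDER BY id
"""

first_descendants = conn.execute(query).fetchall()
print(first_descendants)  # [(4, 'depart'), (5, 'depart'), (7, 'unit')]
```

Row 6 is not returned because its parent (id 5) is not a group, so the recursion stops there.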

How to use OPENJSON on multiple rows

I have a temp table with multiple rows in it, and each row has a column called Categories which contains a very simple JSON array of ids for categories in a different table.
A few example rows of the temp table:
Id Name Categories
---------------------------------------------------------------------------------------------
'539f7e28-143e-41bb-8814-a7b93b846007' Test 1 ["category1Id", "category2Id", "category3Id"]
'f29e2ecf-6e37-4aa9-aa56-4a351d298bfc' Test 2 ["category1Id", "category2Id"]
'34e41a0a-ad92-4cd7-bf5c-8df6bfd6ed5c' Test 3 NULL
Now what I would like to do is to select all of the category ids from all of the rows in the temp table.
What I have is the following, and it's not working; it gives me this error:
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
SELECT
c.Id
,c.[Name]
,c.Color
FROM
dbo.Category as c
WHERE
c.Id in (SELECT [value] FROM OPENJSON((SELECT Categories FROM #TempTable)))
and c.IsDeleted = 0
Which I guess makes sense: it fails because I'm selecting multiple rows and need to parse each row's respective JSON array of category ids. I'm just not sure what to change to get the results that I want. Thank you in advance for any help.
You'd need to use CROSS APPLY, which calls OPENJSON once per row of the temp table, like so:
SELECT Id ,
Name ,
t.Value AS category_id
FROM #TempTable
CROSS APPLY OPENJSON(Categories, '$') t;
And then, you can JOIN to your Category table using the category_id column, something like this:
SELECT Id ,
Name ,
t.Value AS category_id,
c.*
FROM #TempTable
CROSS APPLY OPENJSON(Categories, '$') t
LEFT JOIN dbo.Category c ON c.Id = t.Value
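
To illustrate what the per-row expansion produces, here is a small Python model of it (not T-SQL). The function name expand_categories is hypothetical; the rows are the three sample rows from the question. A row whose Categories value is NULL contributes no output rows, which is the behaviour CROSS APPLY gives when OPENJSON returns an empty set.

```python
import json

# The three sample rows from the question: (Id, Name, Categories).
temp_rows = [
    ("539f7e28-143e-41bb-8814-a7b93b846007", "Test 1",
     '["category1Id", "category2Id", "category3Id"]'),
    ("f29e2ecf-6e37-4aa9-aa56-4a351d298bfc", "Test 2",
     '["category1Id", "category2Id"]'),
    ("34e41a0a-ad92-4cd7-bf5c-8df6bfd6ed5c", "Test 3", None),
]


def expand_categories(rows):
    """Yield (id, name, category_id) once per JSON array element."""
    for row_id, name, categories in rows:
        if categories is None:
            continue  # no JSON to expand, so the row produces nothing
        for category_id in json.loads(categories):
            yield row_id, name, category_id


expanded = list(expand_categories(temp_rows))
distinct_ids = sorted({category_id for _, _, category_id in expanded})
print(len(expanded))  # 5
print(distinct_ids)   # ['category1Id', 'category2Id', 'category3Id']
```

The distinct list is what the original IN (...) filter on the Category table needs.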

Union which excludes values from the first table

The original problem I am attempting to solve is that I need to show all rows from a specific "joined" table. However, some of these have no matching data and so normally would not show (think categories and counts for each).
So what I am attempting to do is union to a "0 value" data set to show all categories. However, when I do the union, it shows a 0 value row as well as the normal data. Here is an example:
SELECT category_name, COUNT(files_number)
FROM files
LEFT JOIN categories ON categories.category_id = files.category_id
UNION
SELECT category_name, 0
FROM categories
This will give me a result set that looks similar to this:
category_name | value
----------------------
open file | 0
open file | 23
closed file | 0
Is there any way to remove duplicate zero value entries? Please note there is also a complex WHERE clause in the actual query, so avoiding duplication of it is preferred.
I don't get why you are doing both a left join and a union.
To remove the duplicates, you can wrap your query and group by the category name:
;with cte
as
(
SELECT category_name, COUNT(files_number) AS aggcol
FROM files
LEFT JOIN categories ON categories.category_id = files.category_id
GROUP BY category_name
UNION
SELECT category_name, 0
FROM categories
)
select category_name, sum(aggcol)
from cte
group by
category_name
One way is to select all categories from the categories table, and LEFT JOIN onto the file counts (grouped by category_id).
SELECT c.category_name, ISNULL(fc.FileCount, 0) AS FileCount
FROM categories c
LEFT JOIN (
SELECT category_id, COUNT(files_number) AS FileCount
FROM files
GROUP BY category_id
) fc ON c.category_id = fc.category_id
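Here is a minimal sketch of that query on made-up data, run in an in-memory SQLite database through Python's sqlite3 module. The sample rows are invented for illustration, SQLite is only a stand-in for SQL Server, and COALESCE replaces ISNULL because SQLite does not have the two-argument ISNULL.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE files (files_number INTEGER, category_id INTEGER);
-- made-up sample data: three open files, no closed files
INSERT INTO categories VALUES (1, 'open file'), (2, 'closed file');
INSERT INTO files VALUES (101, 1), (102, 1), (103, 1);
""")

query = """
SELECT c.category_name, COALESCE(fc.FileCount, 0) AS FileCount
FROM categories c
LEFT JOIN (
    -- aggregate first, so each category joins to at most one row
    SELECT category_id, COUNT(files_number) AS FileCount
    FROM files
    GROUP BY category_id
) fc ON c.category_id = fc.category_id
ORDER BY c.category_id
"""

category_counts = conn.execute(query).fetchall()
print(category_counts)  # [('open file', 3), ('closed file', 0)]
```

Each category appears exactly once, with 0 where no files exist, so no UNION or de-duplication step is needed.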
Edit
If you want to reverse the query, you could do it something like this, using a RIGHT OUTER JOIN, so that every category from the categories table is returned regardless of whether there are any files for it:
SELECT c.category_name, COUNT(f.category_id) AS FileCount
FROM files f
RIGHT JOIN categories c ON c.category_id = f.category_id
GROUP BY c.category_name

Query to get row from one table, else random row from another

tblUserProfile - I have a table which holds all the Profile Info (too many fields)
tblMonthlyProfiles - Another table which has just the ProfileID in it (the idea is that this table holds 2 profileids which sometimes become monthly profiles (on selection))
Now when I need to show monthly profiles, I simply do a select from this tblMonthlyProfiles and Join with tblUserProfile to get all valid info.
If there are no rows in tblMonthlyProfile, then monthly profile section is not displayed.
Now the requirement is to ALWAYS show Monthly Profiles. If there are no rows in monthlyProfiles, it should pick up 2 random profiles from tblUserProfile. If there is only one row in monthlyProfiles, it should pick up only one random row from tblUserProfile.
What is the best way to do all this in a single query?
I thought something like this
select top 2 * from tblUserProfile P
LEFT OUTER JOIN tblMonthlyProfiles M
on M.profileid = P.profileid
ORDER BY NEWID()
But this always gives me 2 random rows from tblUserProfile. How can I solve this?
Try something like this:
SELECT TOP 2 Field1, Field2, Field3
FROM
(
SELECT P.Field1, P.Field2, P.Field3, 1 AS FinalOrder -- monthly profiles first
FROM tblUserProfile P JOIN tblMonthlyProfiles M ON M.profileid = P.profileid
UNION ALL
SELECT R.Field1, R.Field2, R.Field3, 2 AS FinalOrder -- then random fillers
FROM (SELECT TOP 2 P.Field1, P.Field2, P.Field3
FROM tblUserProfile P
WHERE P.profileid NOT IN (SELECT profileid FROM tblMonthlyProfiles)
ORDER BY NEWID()) R
) X
ORDER BY FinalOrder
The idea is to take the monthly profiles (if any exist) and up to 2 random non-monthly profiles, and then UNION ALL them. You'll have between 2 and 4 records at that point. Grab the top two. The FinalOrder column is an easy way to make sure that the monthly profiles come first.
If you have control of the table structure, you might save yourself some trouble by simply adding a boolean field IsMonthlyProfile to the UserProfile table. Then it's a single-table query, ordered by IsMonthlyProfile DESC, NEWID().
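The single-table idea can be sketched as follows. The helper pick_two, the five made-up profile ids, and the use of SQLite (random() and LIMIT in place of NEWID() and TOP) are all assumptions for illustration; only the IsMonthlyProfile flag comes from the suggestion above.

```python
import sqlite3


def pick_two(monthly_ids, all_ids=(1, 2, 3, 4, 5)):
    """Return two profile ids: flagged monthly profiles first, then random fill."""
    # In-memory SQLite database used as a stand-in for SQL Server (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tblUserProfile ("
        "ProfileId INTEGER PRIMARY KEY, IsMonthlyProfile INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO tblUserProfile VALUES (?, ?)",
        [(pid, 1 if pid in monthly_ids else 0) for pid in all_ids],
    )
    rows = conn.execute(
        """
        SELECT ProfileId
        FROM tblUserProfile
        -- flagged rows sort first; ties are broken randomly
        ORDER BY IsMonthlyProfile DESC, random()
        LIMIT 2
        """
    ).fetchall()
    conn.close()
    return [pid for (pid,) in rows]


print(pick_two({3}))     # 3 first, then one random other profile
print(pick_two(set()))   # two random profiles
print(pick_two({2, 4}))  # profiles 2 and 4, in random order
```

Because the flag is the leading sort key, random rows are only used to fill whatever slots the monthly profiles leave empty.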
In SQL 2000+ compliant syntax you could do something like:
Select ...
From (
Select TOP 2 ...
From tblUserProfile As UP
Where Not Exists( Select 1 From tblMonthlyProfile As MP1 )
Order By NewId()
) As RandomProfile
Union All
Select MP....
From tblUserProfile As UP
Join tblMonthlyProfile As MP
On MP.ProfileId = UP.ProfileId
Where ( Select Count(*) From tblMonthlyProfile As MP1 ) >= 1
Union All
Select ...
From (
Select TOP 1 ...
From tblUserProfile As UP
Where ( Select Count(*) From tblMonthlyProfile As MP1 ) = 1
Order By NewId()
) As RandomProfile
Using SQL 2005+ CTE you can do:
With
TwoRandomProfiles As
(
-- number the rows in the same random order used to pick them, so Num = 1 always exists
Select TOP 2 ..., ROW_NUMBER() OVER ( ORDER BY NewId() ) As Num
From tblUserProfile As UP
Order By Num
)
Select MP.Col1, ...
From tblUserProfile As UP
Join tblMonthlyProfile As MP
On MP.ProfileId = UP.ProfileId
Where ( Select Count(*) From tblMonthlyProfile As MP1 ) >= 1
Union All
Select ...
From TwoRandomProfiles
Where Not Exists( Select 1 From tblMonthlyProfile As MP1 )
Union All
Select ...
From TwoRandomProfiles
Where ( Select Count(*) From tblMonthlyProfile As MP1 ) = 1
And Num = 1
The CTE has the advantage of only querying for the random profiles once and the use of the ROW_NUMBER() column.
Obviously, in all the UNION statements the number and type of the columns must match.