Postgres Pivot based on variable column to create a new id - postgresql

I have the following table
type attribute order
1 11 1
1 12 2
2 11 1
2 12 2
3 15 1
3 16 2
4 15 1
4 16 2
I need to find out which types have identical attributes and then assign them a new id. The order column can be used as well if it's helpful (each attribute can only have one order), but you don't need to use it.
Ideally the result set would be the following where you have a new id for each type that is based on the attributes in the first table.
type new_id
1 1
2 1
3 2
4 2
I was planning on trying to pivot the table based on the order column and concatenating the attribute IDs to create a new id, but I cannot use crosstab, and the number of attributes a type has can vary, so I need to account for that.
Any suggestions on what to do here?

This works. There may be a better way to do it, but it's what came to mind:
SELECT UNNEST(types) AS type, new_id
FROM (
SELECT ARRAY_AGG(type) AS types, ROW_NUMBER() OVER (ORDER BY attr) AS new_id
FROM (
SELECT type, ARRAY_AGG(attribute ORDER BY attribute) AS attr
FROM t
GROUP BY type
) x
GROUP BY attr
) y
Output:
1;1
2;1
3;2
4;2
First it builds the sorted list of attributes for each type. Then it groups the types by that list (this is what guarantees that every type in a group has exactly the same attributes) and gives each group a new id. Finally, UNNEST puts each type back on its own row, carrying its group's row number as new_id.
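To make the three steps concrete outside the database, here is a minimal Python sketch of the same grouping logic. It is only an illustration under the assumption that the rows fit in memory; `assign_group_ids` is a hypothetical helper name, and it numbers groups in order of the lowest type they contain rather than by ordering the attribute arrays.

```python
from collections import defaultdict

# Sample rows from the question: (type, attribute, order).
rows = [
    (1, 11, 1), (1, 12, 2),
    (2, 11, 1), (2, 12, 2),
    (3, 15, 1), (3, 16, 2),
    (4, 15, 1), (4, 16, 2),
]

def assign_group_ids(rows):
    """Give types with identical attribute sets the same new_id (hypothetical helper)."""
    # Step 1: collect the attributes of each type (like ARRAY_AGG ... GROUP BY type).
    attrs_by_type = defaultdict(list)
    for type_, attribute, _order in rows:
        attrs_by_type[type_].append(attribute)

    # Step 2: use the sorted attribute tuple as the grouping key (like GROUP BY attr)
    # and number each distinct key the first time it is seen.
    new_id_by_key = {}
    result = {}
    for type_ in sorted(attrs_by_type):
        key = tuple(sorted(attrs_by_type[type_]))
        if key not in new_id_by_key:
            new_id_by_key[key] = len(new_id_by_key) + 1
        # Step 3: each type carries its group's id (like UNNEST(types), new_id).
        result[type_] = new_id_by_key[key]
    return result

for type_, new_id in assign_group_ids(rows).items():
    print(type_, new_id)
```

Because the key is the whole sorted tuple, types with different numbers of attributes simply end up with different keys, which covers the "number of attributes can vary" requirement.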

Related

Postgres sequence that resets once the id is different

I am trying to make a Postgres sequence that will reset once the id of the item it is linked to changes, e.g.:
ID SEQUENCE_VALUE
1 1
2 1
1 2
1 3
2 2
3 1
I don't know PSQL or SQL in general very well, and I can't find a similar question. Any help is greatly appreciated!
Just use a normal sequence that does not reset and calculate the desired value in the query:
SELECT id,
row_number() OVER (PARTITION BY id
ORDER BY seq_col)
AS sequence_value
FROM mytable;
Here, seq_col is a column that is auto-generated from a sequence (an identity column).
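The query above can be tried out without a Postgres server. The sketch below is an assumption-laden stand-in: it uses Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer), and an INTEGER PRIMARY KEY plays the role of the identity column seq_col.

```python
import sqlite3

# In-memory SQLite database used purely for illustration; the answer targets
# Postgres, but row_number() OVER (PARTITION BY ...) behaves the same way here.
conn = sqlite3.connect(":memory:")
# seq_col auto-increments in insertion order, standing in for an identity column.
conn.execute("CREATE TABLE mytable (seq_col INTEGER PRIMARY KEY, id INTEGER NOT NULL)")
conn.executemany("INSERT INTO mytable (id) VALUES (?)",
                 [(1,), (2,), (1,), (1,), (2,), (3,)])

rows = conn.execute("""
    SELECT id,
           row_number() OVER (PARTITION BY id ORDER BY seq_col) AS sequence_value
    FROM mytable
    ORDER BY seq_col
""").fetchall()

for id_, sequence_value in rows:
    print(id_, sequence_value)
```

The per-id counter is recomputed on every query, so deleting a row renumbers the remaining rows for that id instead of leaving a gap.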

Merge selected group keys in KDB (Q) group by query

I have a query that essentially does counting by group key in KDB, in which I want to treat some of the groups as one for the purpose of this query. A simplified description of what I'm trying to do would be to count orders by customer in a month, where I have a couple of customers in the database that are actually subsidiaries of another customer, and I want to combine the counts of the subsidiaries with their parent organisation. The real scenario is much more complicated than that, and without getting into unnecessary detail, suffice it to say that I can't just group by customer and merge the counts after the query is executed: I need the "by" clause of my query to do the merging directly.
In SQL, I would do something like this:
select case when customer_id = 1 then 2 when customer_id = 3 then 4 else customer_id end as customer_id, count(*) as order_count
from orders
group by case when customer_id = 1 then 2 when customer_id = 3 then 4 else customer_id end
In the above example, customer 1 is a subsidiary of customer 2, customer 3 is a subsidiary of customer 4, and every other customer is treated normally.
Let's say the equivalent code in Q (without the manipulation of group keys) is:
select order_count:count i by customer_id from orders
How would I put in the equivalent select case statement to manipulate the group key? I tried this, but got a rank error:
select order_count:count i by $[customer_id=1;2;customer_id=3;4;customer_id] from orders
I'm terrible at Q so I'm probably making a very simple mistake. Any advice greatly appreciated.
One approach might be to have a dictionary of subsidiaries and use a lookup/re-map in your by clause:
q)dict:1 3!2 4
q)show t:([] order:1+til 10;customer:1+10?6)
order customer
--------------
1 1
2 1
3 6
4 2
5 3
6 4
7 5
8 5
9 3
10 5
q)select order_count:count i by customer^dict[customer] from t
customer| order_count
--------| -----------
2 | 3
4 | 3
5 | 3
6 | 1
You will lose information about which subsidiary actually placed each order, though; you'll only see the counts at the parent level.
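The remap-with-fallback idea behind customer^dict[customer] can be sketched in Python for readers less familiar with q. This is an analogy, not q semantics in full: dict.get(c, c) plays the role of "look up the parent, fill nulls with the original customer", and the customer list is copied from the sample table shown above (which came from a random draw, 10?6).

```python
from collections import Counter

# Parent lookup: customer 1 -> 2, customer 3 -> 4 (like dict:1 3!2 4 in q).
parent = {1: 2, 3: 4}

# Customer column from the sample table above (orders 1..10).
customers = [1, 1, 6, 2, 3, 4, 5, 5, 3, 5]

# Remap each customer to its parent, falling back to itself when it has no
# parent; this mirrors customer^dict[customer] (fill nulls with the original).
group_keys = [parent.get(c, c) for c in customers]

# Count orders per remapped key, like count i by ... in the q select.
order_count = Counter(group_keys)
for customer in sorted(order_count):
    print(customer, order_count[customer])
```

The counts match the q output above: 3 orders each for customers 2, 4 and 5, and 1 for customer 6.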

How do I select rows from one table that do not exist in another table with a specific value

I know this sounds simple and I have found many similar examples but I can't make sense of it to match my specific problem.
I have tried some nested selects and some left joins but they didn't work.
I have a Building table and a ComplianceItems Table
A Building can have many ComplianceItems.
I also have a ComplianceItemType table that contains all the possible types.
I want to find all the Buildings that don't have a ComplianceItem of Type 17 or 18.
I see lots of examples that select the parent record when there are no child records.
But I want to select all the parent records that don't have a Compliance Item of type 17 or 18
How can I add this condition to my query?
select OC.lOwnersCorporationID
From Strata.dbo.OwnersCorporation OC
Left Join ComplianceDEMO.dbo.ComplianceItem CI on OC.lOwnersCorporationID = CI.OwnersCorporationID
Where CI.OwnersCorporationID IS NULL
AND OC.bManaged = 'Y'
BTW I don't care about performance as this query will only be run once and used as part of an insert statement to create missing records.
UPDATE
Obviously my data is more complex than this but this should give you an idea.
Table Definition
Ownerscorporation Table
OwnersCorporationID INT PK
PlanNumber Varchar(10)
ComplianceItem Table
ComplianceItemID INT PK
ComplianceTypeID INT FK
OwnersCorporationID INT FK
ComplianceType Table
ComplianceTypeID INT PK
Name varchar(50)
Test Data
Owners Corporation Table
ID PlanNumber
===============================================
1 1001
2 1002
3 1003
Compliance Item Table
ComplianceItemID ComplianceTypeID OwnersCorporationID
==================================================================
1 1 1
2 2 1
3 3 1
4 4 1
5 5 1
6 1 2
7 2 2
8 4 2
9 1 3
10 2 3
11 3 3
Compliance Type Table
==================================================================
ComplianceTypeID Name
1 Asbestos Report
2 Capital Works Fund
3 Anchor Point Compliance
4 Window Lock Compliance
5 Pool Compliance
The Problem
=========================
Probably easier in two parts.
Find all the OwnersCorporations that don't have a Compliance type of 4
Find all the OwnersCorporations that don't have a Compliance type of 5
Expected Results
=========================
Part 1
OwnersCorporationID
3
Part 2
OwnersCorporationID
2
3
UPDATE 2
It looks like Matt's answer in the comments worked on the test data.
Select OwnersCorporationID
From OwnersCorporationTable
Where OwnersCorporationID Not In
(
Select OwnersCorporationID
From ComplianceItemTable
Where ComplianceTypeID = 5
)
Returned 2 and 3
And
Select OwnersCorporationID
From OwnersCorporationTable
Where OwnersCorporationID Not In
(
Select OwnersCorporationID
From ComplianceItemTable
Where ComplianceTypeID = 4
)
Returned 3
I will need to run it in two parts when I use it to create the missing records.
I'll try and convert it to run on my real data. Hopefully I haven't simplified it too much :)
Thanks for your help Matt. If you would like the points please post this as an answer.
You may be looking for this
SELECT OC.lOwnersCorporationID
From Strata.dbo.OwnersCorporation OC
LEFT JOIN ComplianceDEMO.dbo.ComplianceItem CI on OC.lOwnersCorporationID = CI.OwnersCorporationID AND
CI.ComplianceTypeID IN (17,18)
WHERE CI.OwnersCorporationID IS NULL AND
OC.bManaged = 'Y'
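As a sanity check against the test data from the question, the sketch below runs an anti-join (keep a parent only when no child row of the given type exists) in an in-memory SQLite database. It uses NOT EXISTS, which is one common way to write this pattern; the simplified table names follow the test data, and `missing_compliance_type` is a hypothetical helper. The same query shape should carry over to the real Strata and ComplianceDEMO tables, assuming the column names match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE OwnersCorporation (
        OwnersCorporationID INTEGER PRIMARY KEY,
        PlanNumber VARCHAR(10)
    );
    CREATE TABLE ComplianceItem (
        ComplianceItemID INTEGER PRIMARY KEY,
        ComplianceTypeID INTEGER,
        OwnersCorporationID INTEGER REFERENCES OwnersCorporation
    );
    INSERT INTO OwnersCorporation VALUES (1, '1001'), (2, '1002'), (3, '1003');
    INSERT INTO ComplianceItem VALUES
        (1, 1, 1), (2, 2, 1), (3, 3, 1), (4, 4, 1), (5, 5, 1),
        (6, 1, 2), (7, 2, 2), (8, 4, 2),
        (9, 1, 3), (10, 2, 3), (11, 3, 3);
""")

def missing_compliance_type(type_id):
    """Return IDs of owners corporations with no ComplianceItem of type_id."""
    rows = conn.execute("""
        SELECT OC.OwnersCorporationID
        FROM OwnersCorporation OC
        WHERE NOT EXISTS (
            SELECT 1
            FROM ComplianceItem CI
            WHERE CI.OwnersCorporationID = OC.OwnersCorporationID
              AND CI.ComplianceTypeID = ?
        )
        ORDER BY OC.OwnersCorporationID
    """, (type_id,)).fetchall()
    return [row[0] for row in rows]

print(missing_compliance_type(4))  # Part 1: [3]
print(missing_compliance_type(5))  # Part 2: [2, 3]
```

Unlike NOT IN, NOT EXISTS is not affected by NULL values in the subquery's OwnersCorporationID column, which is one reason it is often preferred for this kind of "missing child" query.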

SQL: How to prevent double summing

I'm not sure what the term for this is, but when a join between two tables is many-to-many and you sum one of the columns, the same values can get summed more than once.
What I want to accomplish is to prevent this from happening. How do I make sure that my sum function is returning the correct number?
I'm using PostgreSQL
Example:
Table 1
SampleID DummyName
1 John
1 John
2 Doe
3 Jake
3 Jake
Table 2
SampleID DummyItem
1 5
1 4
1 5
2 3
2 3
3 2
If I join these two tables ON SampleID, and I want to sum the DummyItem for each DummyName, how can I do this without double summing?
The solution is to first aggregate and then do the join:
select distinct t1.sampleid, t1.dummyname, t.total_items
from table_1 t1
join (
select t2.sampleid, sum(t2.dummyitem) as total_items
from table_2 t2
group by t2.sampleid
) t ON t.sampleid = t1.sampleid;
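The difference between "join, then sum" and "sum, then join" is easy to see on the sample data. The sketch below assumes Python's sqlite3 module as a stand-in for PostgreSQL (the SQL involved is portable) and uses the table and column names from the question in lowercase.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (sampleid INTEGER, dummyname TEXT);
    CREATE TABLE table_2 (sampleid INTEGER, dummyitem INTEGER);
    INSERT INTO table_1 VALUES (1, 'John'), (1, 'John'), (2, 'Doe'), (3, 'Jake'), (3, 'Jake');
    INSERT INTO table_2 VALUES (1, 5), (1, 4), (1, 5), (2, 3), (2, 3), (3, 2);
""")

# Naive version: join first, then sum. Each duplicate row in table_1
# repeats the matching table_2 rows, so the totals are inflated.
naive = conn.execute("""
    SELECT t1.dummyname, sum(t2.dummyitem)
    FROM table_1 t1
    JOIN table_2 t2 ON t2.sampleid = t1.sampleid
    GROUP BY t1.dummyname
    ORDER BY t1.dummyname
""").fetchall()

# Aggregate first, then join; DISTINCT drops the duplicated table_1 rows.
fixed = conn.execute("""
    SELECT DISTINCT t1.sampleid, t1.dummyname, t.total_items
    FROM table_1 t1
    JOIN (
        SELECT sampleid, sum(dummyitem) AS total_items
        FROM table_2
        GROUP BY sampleid
    ) t ON t.sampleid = t1.sampleid
    ORDER BY t1.sampleid
""").fetchall()

print(naive)  # [('Doe', 6), ('Jake', 4), ('John', 28)]
print(fixed)  # [(1, 'John', 14), (2, 'Doe', 6), (3, 'Jake', 2)]
```

John's naive total is 28 because his two rows in table_1 each match the same three items (5 + 4 + 5 = 14).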
The real question is however: why are the duplicates in table_1?
I would take a step back and try to assess the database design. Specifically, what rules allow such duplicate data?
To address your specific issue given your data, here's one option: create a temp table that contains unique rows from Table 1, then join the temp table with Table 2 to get the sums I think you are expecting.

Creating a view with a column not in the base table

I have a table with 4 columns. I am being asked to create a view that performs a calculation and then puts the results in a column not in the table.
Here it is: Create a view called v_count that shows the number of students working on each assignment. The view should have columns for the assignment number and the count.
The underlying table does not have a count column.
You need the COUNT function and a GROUP BY clause. Suppose your table has a student id and an assignment id:
sId AsnId
1 1
1 2
2 1
2 5
2 8
3 2
3 4
Then the following query gives you the number of students working on each assignment:
SELECT asnId [Assignment], COUNT(sid) [Students]
FROM Assignment
GROUP BY asnid
You can then use this query to create your view. It is worth reading the documentation for COUNT and GROUP BY.
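Wrapping the query in a view might look like the sketch below. It assumes Python's sqlite3 module as a convenient test bed and standard AS aliases instead of the SQL Server bracket syntax; the column names assignment and students are illustrative choices, and CREATE VIEW itself works the same way in most databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Assignment (sId INTEGER, AsnId INTEGER);
    INSERT INTO Assignment VALUES
        (1, 1), (1, 2), (2, 1), (2, 5), (2, 8), (3, 2), (3, 4);

    -- The count column exists only in the view; COUNT ... GROUP BY computes it.
    CREATE VIEW v_count AS
    SELECT AsnId AS assignment, COUNT(sId) AS students
    FROM Assignment
    GROUP BY AsnId;
""")

# Query the view like a table.
rows = conn.execute(
    "SELECT assignment, students FROM v_count ORDER BY assignment"
).fetchall()
print(rows)  # [(1, 2), (2, 2), (4, 1), (5, 1), (8, 1)]
```

If the same student could be recorded twice for one assignment, COUNT(DISTINCT sId) would avoid counting them twice.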