Merge selected group keys in KDB (Q) group by query - kdb

I have a query that essentially does counting by group key in KDB, in which I want to treat some of the groups as one for the purpose of this query. A simplified description of what I'm trying to do would be to count orders by customer in a month, where I have a couple of customers in the database that are actually subsidiaries of another customer, and I want to combine the counts of the subsidiaries with their parent organisation. The real scenario is much more complicated than that and, without getting into unnecessary detail, suffice it to say that I can't just group by customer and manipulate the results to merge counts after the query is executed. I need the "by" clause of my query to do the merging directly.
In SQL, I would do something like this:
select case when customer_id = 1 then 2 when customer_id = 3 then 4 else customer_id end as customer_id, count(*) as order_count
from orders
group by case when customer_id = 1 then 2 when customer_id = 3 then 4 else customer_id end
In the above example, customer 1 is a subsidiary of customer 2, customer 3 is a subsidiary of customer 4, and every other customer is treated normally.
Let's say the equivalent code in Q (without the manipulation of group keys) is:
select order_count:count i by customer_id from orders
How would I put in the equivalent select case statement to manipulate the group key? I tried this, but got a rank error:
select order_count:count i by $[customer_id=1;2;customer_id=3;4;customer_id] from orders
I'm terrible at Q so I'm probably making a very simple mistake. Any advice greatly appreciated.

One approach might be to have a dictionary of subsidiaries and use a lookup/re-map in your by clause:
q)dict:1 3!2 4
q)show t:([] order:1+til 10;customer:1+10?6)
order customer
--------------
1 1
2 1
3 6
4 2
5 3
6 4
7 5
8 5
9 3
10 5
q)select order_count:count i by customer^dict[customer] from t
customer| order_count
--------| -----------
2 | 3
4 | 3
5 | 3
6 | 1
You will lose some information about who actually owns the orders, though: you'll only know the counts at the parent level.
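For readers less familiar with q, the same lookup-with-fallback idea can be sketched in Python. This is only an illustration of the re-mapping logic, not a replacement for the q query; the names PARENT_OF and count_orders_by_parent are made up for this example, and the customer list is copied from the sample table t above.

```python
from collections import Counter

# Hypothetical subsidiary -> parent mapping, mirroring dict:1 3!2 4 above.
PARENT_OF = {1: 2, 3: 4}


def count_orders_by_parent(customers, parent_of=PARENT_OF):
    """Count orders per customer after re-mapping subsidiaries to parents."""
    # parent_of.get(c, c) plays the role of customer^dict[customer]:
    # use the mapped parent when one exists, otherwise keep the customer.
    return Counter(parent_of.get(c, c) for c in customers)


# Customer column of the sample table t shown in the answer.
customers = [1, 1, 6, 2, 3, 4, 5, 5, 3, 5]
order_counts = count_orders_by_parent(customers)
print(sorted(order_counts.items()))  # [(2, 3), (4, 3), (5, 3), (6, 1)]
```

The key point is that the key is re-mapped before grouping, so the merge happens in the grouping step itself rather than as a post-processing pass.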

Related

Query a history table to find state on a given date in postgresql

I have created a history table that is populated by triggers on another "live" table. I now want to be able to see how it looked on a given date. I am able to query a single product using a where clause which gives me the desired output for a single product.
SELECT * FROM test
WHERE productid = 1
AND updated < '2020-02-15'
ORDER BY updated DESC
LIMIT 1
But how do I get the last updated value before my given date (mid-Feb in this example) for each product in the table?
A simple version of my table looks like this:-
productid amount updated
1 5 01/01/2020
1 6 01/02/2020
1 7 01/03/2020
2 13 01/01/2020
2 14 01/02/2020
2 15 01/04/2020
and my desired outcome is:
productid amount updated
1 6 01/02/2020
2 14 01/02/2020
Many thanks
You can use distinct on:
select distinct on (productid) t.*
from test t
where updated < date '2020-02-15'
order by productid, updated desc
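DISTINCT ON is specific to PostgreSQL. As a rough, portable sketch of the same "latest row per product before a cutoff" idea, the example below swaps in a correlated subquery and runs it on an in-memory SQLite database from Python. It assumes dates are stored as ISO strings (so they compare correctly as text) and that a product never has two rows with the same updated value; the function name latest_before is made up for this example.

```python
import sqlite3


def latest_before(conn, cutoff):
    """Return each product's most recent row with updated < cutoff."""
    query = """
        SELECT t.productid, t.amount, t.updated
        FROM test AS t
        WHERE t.updated = (
            SELECT MAX(u.updated)
            FROM test AS u
            WHERE u.productid = t.productid
              AND u.updated < ?
        )
        ORDER BY t.productid
    """
    return conn.execute(query, (cutoff,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (productid INTEGER, amount INTEGER, updated TEXT)")
# Sample rows from the question, with dates rewritten as ISO strings.
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [
        (1, 5, "2020-01-01"), (1, 6, "2020-02-01"), (1, 7, "2020-03-01"),
        (2, 13, "2020-01-01"), (2, 14, "2020-02-01"), (2, 15, "2020-04-01"),
    ],
)
snapshot = latest_before(conn, "2020-02-15")
print(snapshot)  # [(1, 6, '2020-02-01'), (2, 14, '2020-02-01')]
```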

Inserting multiple rows from an array of arrays in postgreSQL where arrayA[0] =>arrayB[arr1[]], arrayA[1]=>arrayB[arr2[]]

I have this scenario I wish to implement using the query directly in postgresql.
I have user inputs in an array and the size of these arrays can vary. A sample user input is given below:
arrayA=[1,2,3];
arrayB=[[11,22,33],[12,23,34],[4,5,6]];
In the table I need to insert the above data in the following manner
Table A
id dataA dataB
1 1 11
2 1 22
3 1 33
4 2 12
5 2 23
6 2 34
7 3 4
8 3 5
9 3 6
I have tried using unnest() but I'm not able to get the output I want. I'm not sure how to use it to get the required output or if it is the right way to use it. Can someone please help me with this!!
This is a bit tricky due to the two-dimensional array, but the following works with your two sample arrays:
select row_number() over (order by a.idx, b.idx) as id,
a.data_a,
b.data_b
from unnest(array[1,2,3]) with ordinality as a(data_a, idx)
cross join lateral unnest( (array[ [11,22,33], [12,23,34], [4,5,6]])[a.idx:a.idx][1:]) with ordinality as b(data_b,idx);
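To make the intended pairing explicit, here is a small Python sketch of what the query is meant to produce: element n of arrayA is paired with every element of the n-th inner array of arrayB, and the rows are numbered in that order. It only illustrates the expected rows, not how PostgreSQL evaluates the query; expand_rows is a name made up for this example.

```python
def expand_rows(array_a, array_b):
    """Pair array_a[n] with each element of array_b[n], numbering the rows."""
    rows = []
    next_id = 1
    for data_a, inner in zip(array_a, array_b):
        for data_b in inner:
            rows.append((next_id, data_a, data_b))
            next_id += 1
    return rows


rows = expand_rows([1, 2, 3], [[11, 22, 33], [12, 23, 34], [4, 5, 6]])
for row in rows:
    print(row)  # first row is (1, 1, 11), last row is (9, 3, 6)
```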

How do I select rows from one table that do not exist in another table with a specific value

I know this sounds simple and I have found many similar examples but I can't make sense of it to match my specific problem.
I have tried some nested selects and some left joins but they didn't work.
I have a Building table and a ComplianceItems Table
A Building can have many ComplianceItems.
I also have a ComplianceItemType table that contains all the possible types.
I want to find all the Buildings that don't have a ComplianceItem of Type 17 or 18.
I see lots of examples that select the parent record when there are no child records.
But I want to select all the parent records that don't have a Compliance Item of type 17 or 18
How can I add this condition to my query?
select OC.lOwnersCorporationID
From Strata.dbo.OwnersCorporation OC
Left Join ComplianceDEMO.dbo.ComplianceItem CI on OC.lOwnersCorporationID = CI.OwnersCorporationID
Where CI.OwnersCorporationID IS NULL
AND OC.bManaged = 'Y'
BTW I don't care about performance as this query will only be run once and used as part of an insert statement to create missing records.
UPDATE
Obviously my data is more complex than this but this should give you an idea.
Table Definition
Ownerscorporation Table
OwnersCorporationID INT PK
PlanNumber Varchar(10)
ComplianceItem Table
ComplianceItemID INT PK
ComplianceTypeID INT FK
OwnersCorporationID INT FK
ComplianceType Table
ComplianceTypeID INT PK
Name varchar(50)
Test Data
Owners Corporation Table
ID PlanNumber
===============================================
1 1001
2 1002
3 1003
Compliance Item Table
ComplianceItemID ComplianceTypeID OwnersCorporationID
==================================================================
1 1 1
2 2 1
3 3 1
4 4 1
5 5 1
6 1 2
7 2 2
8 4 2
9 1 3
10 2 3
11 3 3
Compliance Type Table
ComplianceTypeID Name
==================================================================
1 Asbestos Report
2 Capital Works Fund
3 Anchor Point Compliance
4 Window Lock Compliance
5 Pool Compliance
The Problem
=========================
Probably easier in two parts.
Find all the OwnersCorporations That Don't have a Compliance type of 4
Find all the OwnersCorporations That Don't have a Compliance type of 5
Expected Results
=========================
Part 1
OwnersCorporationID
3
Part 2
OwnersCorporationID
2
3
UPDATE 2
It looks like Matt's answer in the comments worked on the test data.
Select OwnersCorporationID
From OwnersCorporationTable
Where OwnersCorporationID Not In
(
Select OwnersCorporationID
From ComplianceItemTable
Where ComplianceTypeID = 5
)
Returned 2 and 3
And
Select OwnersCorporationID
From OwnersCorporationTable
Where OwnersCorporationID Not In
(
Select OwnersCorporationID
From ComplianceItemTable
Where ComplianceTypeID = 4
)
Returned 3
I will need to run it in two parts when I use it to create the missing records.
I'll try and convert it to run on my real data. Hopefully I haven't simplified it too much :)
Thanks for your help Matt. If you would like the points please post this as an answer.
You may be looking for this
SELECT OC.lOwnersCorporationID
From Strata.dbo.OwnersCorporation OC
INNER JOIN ComplianceDEMO.dbo.ComplianceItem CI on OC.lOwnersCorporationID = CI.OwnersCorporationID AND
CI.ComplianceTypeID NOT IN (17,18) AND
OC.bManaged = 'Y'
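Note that an inner join filtered with NOT IN keeps any building that has at least one item of some other type, so on its own it does not exclude buildings that also have an item of the unwanted type. One common way to express "has no item of these types" is NOT EXISTS. The sketch below runs that on the question's simplified test data in an in-memory SQLite database; the simplified table names and the helper corporations_missing_types are assumptions for illustration, and the bManaged filter from the real schema is left out.

```python
import sqlite3


def corporations_missing_types(conn, type_ids):
    """Return IDs of corporations with no compliance item of any given type."""
    placeholders = ", ".join("?" for _ in type_ids)
    query = f"""
        SELECT oc.OwnersCorporationID
        FROM OwnersCorporation AS oc
        WHERE NOT EXISTS (
            SELECT 1
            FROM ComplianceItem AS ci
            WHERE ci.OwnersCorporationID = oc.OwnersCorporationID
              AND ci.ComplianceTypeID IN ({placeholders})
        )
        ORDER BY oc.OwnersCorporationID
    """
    return [row[0] for row in conn.execute(query, list(type_ids))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OwnersCorporation (OwnersCorporationID INTEGER PRIMARY KEY, PlanNumber TEXT)")
conn.execute("CREATE TABLE ComplianceItem (ComplianceItemID INTEGER PRIMARY KEY, ComplianceTypeID INTEGER, OwnersCorporationID INTEGER)")
conn.executemany("INSERT INTO OwnersCorporation VALUES (?, ?)",
                 [(1, "1001"), (2, "1002"), (3, "1003")])
# Test data from the question: (item id, type id, corporation id).
conn.executemany("INSERT INTO ComplianceItem VALUES (?, ?, ?)", [
    (1, 1, 1), (2, 2, 1), (3, 3, 1), (4, 4, 1), (5, 5, 1),
    (6, 1, 2), (7, 2, 2), (8, 4, 2),
    (9, 1, 3), (10, 2, 3), (11, 3, 3),
])
missing_type_4 = corporations_missing_types(conn, [4])
missing_type_5 = corporations_missing_types(conn, [5])
print(missing_type_4)  # [3]
print(missing_type_5)  # [2, 3]
```

Passing several types at once, for example [4, 5], returns only corporations that have none of them. If the goal is to create a missing record per type, running it once per type, as the question's second update suggests, is likely the better fit.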

SQL: How to prevent double summing

I'm not exactly sure what the term is for this but, when you have a many-to-many relationship when joining 2 tables and you want to sum up one of the variables, I believe that you can sum the same values over and over again.
What I want to accomplish is to prevent this from happening. How do I make sure that my sum function is returning the correct number?
I'm using PostgreSQL
Example:
Table 1
SampleID DummyName
1 John
1 John
2 Doe
3 Jake
3 Jake

Table 2
SampleID DummyItem
1 5
1 4
1 5
2 3
2 3
3 2
If I join these two tables ON SampleID, and I want to sum the DummyItem for each DummyName, how can I do this without double summing?
The solution is to first aggregate and then do the join:
select t1.sampleid, t1.dummyname, t.total_items
from table_1 t1
join (
select t2.sampleid, sum(dummyitem) as total_items
from table_2 t2
group by t2.sampleid
) t ON t.sampleid = t1.sampleid;
The real question, however, is: why are there duplicates in table_1?
I would take a step back and try to assess the database design. Specifically, what rules allow such duplicate data?
To address your specific issue given your data, here's one option: create a temp table that contains unique rows from Table 1, then join the temp table with Table 2 to get the sums I think you are expecting.
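The two suggestions above (aggregate Table 2 first, and de-duplicate Table 1) can be combined. The following is a minimal sketch run on an in-memory SQLite database from Python, using the sample rows from the question. It uses a DISTINCT subquery in place of a temp table, which is an assumption about what is acceptable here; summed_items is a name made up for this example.

```python
import sqlite3


def summed_items(conn):
    """Sum dummyitem per sample, then join to the de-duplicated names."""
    query = """
        SELECT d.sampleid, d.dummyname, t.total_items
        FROM (SELECT DISTINCT sampleid, dummyname FROM table_1) AS d
        JOIN (
            SELECT sampleid, SUM(dummyitem) AS total_items
            FROM table_2
            GROUP BY sampleid
        ) AS t ON t.sampleid = d.sampleid
        ORDER BY d.sampleid
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_1 (sampleid INTEGER, dummyname TEXT)")
conn.execute("CREATE TABLE table_2 (sampleid INTEGER, dummyitem INTEGER)")
conn.executemany("INSERT INTO table_1 VALUES (?, ?)",
                 [(1, "John"), (1, "John"), (2, "Doe"), (3, "Jake"), (3, "Jake")])
conn.executemany("INSERT INTO table_2 VALUES (?, ?)",
                 [(1, 5), (1, 4), (1, 5), (2, 3), (2, 3), (3, 2)])
totals = summed_items(conn)
print(totals)  # [(1, 'John', 14), (2, 'Doe', 6), (3, 'Jake', 2)]
```

Because each side of the join has at most one row per sampleid by the time the join happens, no value from Table 2 is counted more than once.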

Creating a view with a column not in the base table

I have a table with 4 columns. I am being asked to create a view that performs a calculation and then puts the results in a column not in the table.
Here it is: Create a view called v_count that shows the number of students working on each assignment. The view should have columns for the assignment number and the count.
The underlying table does not have a count column.
Well you have to make use of Count function and GROUP BY clause. Suppose you have student id and assignment id in your table:
sId AsnId
1 1
1 2
2 1
2 5
2 8
3 2
3 4
Then the following query will give you the count of students working on each assignment:
SELECT asnId [Assignment], COUNT(sid) [Students]
FROM Assignment
GROUP BY asnid
Now you can use this query to create your view. But do read the docs about Count and Group By.
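As a rough end-to-end sketch, the snippet below wraps that kind of query in a view named v_count and reads from it, using an in-memory SQLite database from Python. The table name Assignment and the columns sId and AsnId follow the example above; the view's column names (assignment, students) are assumptions, and standard AS aliases are used instead of the bracket syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Assignment (sId INTEGER, AsnId INTEGER)")
# Sample rows from the answer above: (student id, assignment id).
conn.executemany("INSERT INTO Assignment VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 1), (2, 5), (2, 8), (3, 2), (3, 4)])

# The count column exists only in the view; the base table is unchanged.
conn.execute("""
    CREATE VIEW v_count AS
    SELECT AsnId AS assignment, COUNT(sId) AS students
    FROM Assignment
    GROUP BY AsnId
""")

view_rows = conn.execute(
    "SELECT assignment, students FROM v_count ORDER BY assignment"
).fetchall()
print(view_rows)  # [(1, 2), (2, 2), (4, 1), (5, 1), (8, 1)]
```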