KDB/Q query for rows matching contiguous criteria? - kdb

I'm looking to find (contiguous?) transactions in a data set of shops which follow the pattern that a transaction is eventually completed within a day despite a few prior cancellations.
A valid batch transaction must meet a set of criteria.
They should be from the same shop.
They should eventually be completed, i.e. any number of cancellations followed by exactly one completion.
The pending batch transactions (cancelled and completion) should not exceed a certain time frame, for example, 1 day.
The transactions should have the same amount of cash tagged to be considered the 'same' transaction.
Transactions should be binned by days i.e. any pending batches should not be considered as continuity for the next day.
Cancelled transactions with amounts that are powers of ten, i.e. 10, 1000, 10000, should be ignored (a sketch of one possible check follows this list).
The query should retain all batches which meet the above criteria. The final table should have a batch column with a running batch number to differentiate them.
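Here is a minimal sketch of one way to express that power-of-ten check in q, assuming positive integer amounts (isPow10 is just an illustrative name, not part of the required query):
q)isPow10:{(10 xlog x)=`int$10 xlog x}  / 1b where the amount is an exact power of ten
q)isPow10 10 100 1234 1000 333
11010b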
Initial Table:
shop amount status date
------------------------------
A 1234 Cancelled 20101010
A 1234 Cancelled 20101010
A 1234 Completed 20101010
A 1234 Cancelled 20101010
A 1234 Completed 20101011
A 1000 Completed 20101011
B 100 Cancelled 20101011
B 100 Cancelled 20101011
B 4321 Cancelled 20101011
B 4321 Cancelled 20101011
C 333 Cancelled 20101012
C 333 Completed 20101012
C 333 Completed 20101012
D 111 Cancelled 20101013
D 155 Cancelled 20101013
D 111 Completed 20101013
D 155 Completed 20101013
Delineated By Days:
shop amount status date
------------------------------
A 1234 Cancelled 20101010
A 1234 Cancelled 20101010
A 1234 Completed 20101010
A 1234 Cancelled 20101010
------------------------------
A 1234 Completed 20101011
A 1000 Completed 20101011
B 100 Cancelled 20101011
B 100 Cancelled 20101011
B 4321 Cancelled 20101011
B 4321 Cancelled 20101011
------------------------------
C 333 Cancelled 20101012
C 333 Completed 20101012
C 333 Completed 20101012
------------------------------
D 111 Cancelled 20101013
D 155 Cancelled 20101013
D 111 Completed 20101013
D 155 Completed 20101013
Resultant Table :
shop amount status date batch
-------------------------------------
A 1234 Cancelled 20101010 1
A 1234 Cancelled 20101010 1
A 1234 Completed 20101010 1
-------------------------------------
A 1234 Completed 20101011 2
A 1000 Completed 20101011 3
-------------------------------------
C 333 Cancelled 20101012 4
C 333 Completed 20101012 4
C 333 Completed 20101012 5
-------------------------------------
D 111 Cancelled 20101013 6
D 155 Cancelled 20101013 7
D 111 Completed 20101013 6
D 155 Completed 20101013 7
Table Query:
([] shop:`A`A`A`A`A`A`B`B`B`B`C`C`C`D`D`D`D; amount: 1234 1234 1234 1234 1234 1000 100 100 4321 4321 333 333 333 111 155 111 155; status:`Cancelled`Cancelled`Completed`Cancelled`Completed`Completed`Cancelled`Cancelled`Cancelled`Cancelled`Cancelled`Completed`Completed`Cancelled`Cancelled`Completed`Completed; date: `20101010`20101010`20101010`20101010`20101011`20101011`20101011`20101011`20101011`20101011`20101012`20101012`20101012`20101013`20101013`20101013`20101013)
Explanation:
On the first day, A makes four transactions. The first three are batched together as they have the same amount [cancelled -> cancelled -> completed]. The last transaction is ignored because the day ends before it is completed.
On the second day, A makes a transaction of the same amount of 1234, but it does not take the previous day's transaction as part of its batch. A completes another transaction of 1000. B makes four transactions, but they are not tracked because they are either (a) cancelled without ever being completed or (b) powers of ten.
On the third day, C makes three transactions of the same amount. This is considered two batches, since the first cancellation and completion form the initial batch, and the final completed transaction is a batch on its own.
On the fourth day, D makes four transactions which form two batches. Note that the transactions are not contiguous here, since there are two cancelled transactions with differing amounts, but both are completed later on.
The table is ordered by timestamp and date, i.e. 23:59:59 to 00:00:00. The query need not be a one-liner and can be a multi-line query writing to any temp table/variable, etc.
Additionally, if there's a way to get the number of cancelled transactions per batch, that would be helpful.

So first count the number of completed transactions, since each completion marks the end of one batch.
q)n:count select from tab where status=`Completed
Then use the query below to assign a batch number to each Completed row:
q)btab:update batch:1+til n from tab where status=`Completed
q)btab
shop amount status date batch
------------------------------------
A 1234 Cancelled 20101010
A 1234 Cancelled 20101010
A 1234 Completed 20101010 1
A 1234 Cancelled 20101010
A 1234 Completed 20101011 2
A 1000 Completed 20101011 3
B 100 Cancelled 20101011
B 100 Cancelled 20101011
B 4321 Cancelled 20101011
B 4321 Cancelled 20101011
C 333 Cancelled 20101012
C 333 Completed 20101012 4
C 333 Completed 20101012 5
D 111 Cancelled 20101013
D 155 Cancelled 20101013
D 111 Completed 20101013 6
D 155 Completed 20101013 7
Then reverse the table, fill the nulls forwards by date, shop and amount, reverse back, and remove any cancellations whose amounts are powers of 10 (using the same logic as terrylynch):
q)ftab:reverse update fills batch by date,shop,amount from reverse btab where not (status=`Cancelled)&{x=`int$x}10 xlog amount
q)ftab
shop amount status date batch
------------------------------------
A 1234 Cancelled 20101010 1
A 1234 Cancelled 20101010 1
A 1234 Completed 20101010 1
A 1234 Cancelled 20101010
A 1234 Completed 20101011 2
A 1000 Completed 20101011 3
B 100 Cancelled 20101011
B 100 Cancelled 20101011
B 4321 Cancelled 20101011
B 4321 Cancelled 20101011
C 333 Cancelled 20101012 4
C 333 Completed 20101012 4
C 333 Completed 20101012 5
D 111 Cancelled 20101013 6
D 155 Cancelled 20101013 7
D 111 Completed 20101013 6
D 155 Completed 20101013 7
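To see why the reverse/fills combination behaves like a backward fill, here is a minimal standalone illustration on plain lists (the query above applies the same idea per date, shop and amount group):
q)fills 1 0N 0N 2 0N                    / fills carries the last non-null value forwards
1 1 1 2 2
q)reverse fills reverse 0N 0N 1 0N 2    / the double reverse turns it into a backward fill
1 1 1 2 2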
Then select from the table, pulling only the rows that have batch numbers:
q)stab:select from ftab where batch<>0N
q)stab
shop amount status date batch
------------------------------------
A 1234 Cancelled 20101010 1
A 1234 Cancelled 20101010 1
A 1234 Completed 20101010 1
A 1234 Completed 20101011 2
A 1000 Completed 20101011 3
C 333 Cancelled 20101012 4
C 333 Completed 20101012 4
C 333 Completed 20101012 5
D 111 Cancelled 20101013 6
D 155 Cancelled 20101013 7
D 111 Completed 20101013 6
D 155 Completed 20101013 7
Finally, here is a query to get the number of cancellations per batch:
q)select numberOfCancellations:-1+count batch by batch from stab
batch| numberOfCancellations
-----| ---------------------
1 | 2
2 | 0
3 | 0
4 | 1
5 | 0
6 | 1
7 | 1
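If preferred, an equivalent way to get the same counts from stab is to sum the Cancelled flag directly, rather than relying on each batch containing exactly one completion; a sketch:
q)select numberOfCancellations:sum status=`Cancelled by batch from stab
This should produce the same counts as above for this data.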

This is not the final query but a starting point at least:
q)select from tab where not (status=`Cancelled)&{x=`int$x}10 xlog amount, ({raze(reverse maxs reverse#)each`Completed=x[`status] group x`amount};([]amount;status)) fby ([]date;shop)
shop amount status date
------------------------------
A 1234 Cancelled 20101010
A 1234 Cancelled 20101010
A 1234 Completed 20101010
A 1234 Completed 20101011
A 1000 Completed 20101011
C 333 Cancelled 20101012
C 333 Completed 20101012
C 333 Completed 20101012
D 111 Cancelled 20101013
D 155 Cancelled 20101013
D 111 Completed 20101013
D 155 Completed 20101013
The batch logic could be done with a subsequent query, for example as sketched below.
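One could store the result of the fby query above in a variable (say res, an assumed name) and reuse the numbering-and-backward-fill idea from the first answer:
q)res:update batch:1+til count i from res where status=`Completed         / number each completed row
q)res:reverse update fills batch by date,shop,amount from reverse res     / back-fill each number onto the preceding cancellations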

Related

How to calculate the amount spent for the last month in SQL?

I have a table transaction_details:
transaction_id  customer_id  item_id  item_number  transaction_dttm
7765            1            23       1            2022-01-15
1254            2            12       4            2022-02-03
3332            3            56       2            2022-02-15
7658            1            43       1            2022-03-01
7231            4            56       1            2022-01-15
7231            2            23       2            2022-01-29
I need to form a table of the following type customer_aggr:
customer_id  amount_spent_lm  top_item_lm
1            700              glasses
2            20000            notebook
3            100              cup
When calculating, it is necessary to take into account the current price at the time of the transaction (dict_item_prices). Customers who have not made purchases in the last month are not included in the final table. The last month is defined as the last 30 days at the time the report is created.
There is also a table dict_item_prices:
item_id  item_name  item_price  valid_from_dt  valid_to_dt
23       phone 1    1000        2022-01-01     2022-12-31
12       notebook   5000        2022-01-02     2022-12-31
56       cup        50          2022-01-02     2022-12-31
43       glasses    700         2022-01-01     2022-12-31

How to remove duplicate values from a query

I have the following issue: I need to remove duplicate values from a specific column I query (ClassId). No deleting!
SchoolNo  Schoolyear  Schoolgrade  Classname  ClassId
65432     2001        5            ab         441
65432     2001        5            cd         442
65432     2001        6            a          443
65432     2001        6            b          444
56838     2001        5            ab         445
56838     2001        5            cd         446
56838     2001        6            ab         445
56838     2001        6            ef         447
12726     2001        5            ms         448
12726     2001        6            ms         448
If you look at the values of ClassId, I have repeated class numbers because some special schools sometimes put two classes together for both grades. The problem is that my query needs to show only one ClassId value, no repeats. Therefore we can remove any extra class that is repeated and only show it for grade 5.
In other words, my table should end up looking like this:
SchoolNo  Schoolyear  Schoolgrade  Classname  ClassId
65432     2001        5            ab         441
65432     2001        5            cd         442
65432     2001        6            a          443
65432     2001        6            b          444
56838     2001        5            ab         445
56838     2001        5            cd         446
56838     2001        6            ef         447
12726     2001        5            ms         448
The code generally looks like this.
select schoolno,schoolyear,schoolgrade,classname,classId
from classgroup cg
How should I approach this?
maybe you can do it like this:
select
first_value(schoolno) over w,
first_value(schoolyear) over w,
first_value(schoolgrade) over w,
first_value(classname) over w,
first_value(classId) over w
FROM
classgroup
WINDOW w AS (PARTITION BY schoolno, schoolyear, classId ORDER BY schoolgrade);
You partition the data by schoolno, schoolyear and classId and order by schoolgrade then take only the first row of each partition.
Note: the syntax may be a bit off since I couldn't test it
Try this
select cg1.* from classgroup cg1
left join classgroup cg2 on (cg1."ClassId"=cg2."ClassId" and cg1."Schoolgrade">cg2."Schoolgrade")
where cg2."Schoolgrade" is null
The output:

Selecting Rows within Separate IDs until a first occurrence of given value

I am trying to get all rows within each separate ID until the first occurrence of a given value, in this case "CR", but I have to reverse the order of the rows first. Here is how the data is stored in the table:
ID DebtMth YearMth Status Balance
1 5 2015-02 DR 10.00
1 4 2015-03 DR 10.00
1 3 2015-04 CR 00.00
1 2 2015-06 DR 10.00
1 1 2015-07 DR 10.00
2 10 2011-01 DR 20.00
2 9 2011-02 DR 20.00
2 8 2011-03 CR 20.00
3 11 2012-02 DR 30.00
3 10 2012-03 DR 30.00
3 8 2012-05 CR 00.00
3 7 2012-06 CR 00.00
3 6 2012-07 DR 30.00
I need to reverse the order so the last row within each ID group becomes the first and so on. So the table would be sorted as follows.
ID DebtMth YearMth Status Balance
1 1 2015-07 DR 10.00
1 2 2015-06 DR 10.00
1 3 2015-04 CR 00.00
1 4 2015-03 DR 10.00
1 5 2015-02 DR 10.00
2 8 2011-03 CR 20.00
2 9 2011-02 DR 20.00
2 10 2011-01 DR 20.00
3 6 2012-07 DR 30.00
3 7 2012-06 CR 00.00
3 8 2012-05 CR 00.00
3 10 2012-03 DR 30.00
3 11 2012-02 DR 30.00
Now I need to select rows within each ID group up until the Status is 'CR' and exclude any ID whose first row is 'CR'. So the output would look like this.
ID DebtMth YearMth Status Balance
1 1 2015-07 DR 10.00
1 2 2015-06 DR 10.00
3 6 2012-07 DR 30.00
I am using the Query Designer in Report Builder 3 connecting to an Microsoft SQL2012 Server.
I would very much appreciate any suggestions.
Martin
SELECT id, DebtMth, YearMth, Status, Balance
FROM (
    SELECT *,
           MAX(CASE WHEN status = 'CR' THEN YearMth END) OVER (PARTITION BY id) AS first_cr_yearMth
    FROM YourTable
) AS T
WHERE YearMth > first_cr_yearMth OR first_cr_yearMth IS NULL

Using a join instead of subquery

For this question:
Get the pids of products ordered through any agent who makes at least one order for a customer in Kyoto. Use joins this time; no sub-queries.
I was able to get the answer using a subquery:
select distinct pid
from orders
where aid in (
select aid
from orders
where cid in(
select cid
from customers
where city = 'Kyoto'
)
)
I cannot figure out how to do this using only joins, however.
This code returns the aids I need to get the pids, but I can't come up with a way to get them without using a subquery:
select distinct o.aid
from orders o, customers c
where o.cid = c.cid
and c.city = 'Kyoto'
Here are the two tables I am using:
Customers:
cid name city discount
c001 Tiptop Duluth 10.00
c002 Basics Dallas 12.00
c003 Allied Dallas 8.00
c004 ACME Duluth 8.00
c005 Weyland-Yutani Acheron 0.00
c006 ACME Kyoto 0.00
and Orders:
ordno mon cid aid pid qty dollars
1011 jan c001 a01 p01 1000 450.00
1013 jan c002 a03 p03 1000 880.00
1015 jan c003 a03 p05 1200 1104.00
1016 jan c006 a01 p01 1000 500.00
1017 feb c001 a06 p03 600 540.00
1018 feb c001 a03 p04 600 540.00
1019 feb c001 a02 p02 400 180.00
1020 feb c006 a03 p07 600 600.00
1021 feb c004 a06 p01 1000 460.00
1022 mar c001 a05 p06 400 720.00
1023 mar c001 a04 p05 500 450.00
1024 mar c006 a06 p01 800 400.00
1025 apr c001 a05 p07 800 720.00
1026 may c002 a05 p03 800 740.00

Selecting all data for records based on most recent date

MySQL client version: 5.0.24a
Hey Folks,
I have a table WorkOrders_errors that looks like this:
ID CO CAR NAME CAN BLN INDATE MODDATE EX
66897 461 57 KKLU KKLUSH9862088 AKLU6013312 1/27/2014 1:00 1/27/2014 1:00 -1
60782 461 57 KKLU KKLUHB21629300 AKLU6501153 1/26/2014 22:00 1/26/2014 22:00 1
74188 461 57 KKLU KKLUHB21629300 AKLU6501153 1/27/2014 10:00 1/27/2014 10:00 1
66645 461 57 KKLU KKLUSH8222080 AKLU6501744 1/26/2014 21:45 1/26/2014 21:45 1
63307 461 126 ZIMU ZIMUGOA321986 AMFU3037671 1/27/2014 1:15 1/27/2014 1:15 1
65081 461 24 CMDU CMDUAU1337382 AMFU3043761 1/26/2014 21:30 1/26/2014 21:30 1
72660 461 24 CMDU CMDUAU1337382 AMFU3043761 1/27/2014 9:30 1/27/2014 9:30 1
I need only the records with the most recent MODDATE, i.e. record ID 74188, not 60782.
I have tried this a few ways, but without success. Most recently I tried:
SELECT * FROM (
SELECT * FROM WorkOrders_errors ORDER BY ModDate DESC) as tmp
GROUP BY can
ORDER BY can
I also tried:
SELECT t1.*
FROM WorkOrders_errors t1
WHERE t1.Can = (SELECT t2.Can
FROM WorkOrders_errors t2
WHERE t2.Can = t1.Can
ORDER BY t2.Can DESC
LIMIT 1)
These both seem to take a Huge amount of resources/time. The table only has about 80,000 rows.
Thanks anyone!