KDB/Q: Get sum of a certain column value

I want to create a column that equals the sum of the values in one column (colB) over the rows where another column (colA) matches certain values.
Without the condition, I can get the sum of all values in colB per date by using:
update TotalVal: sum colB by date from myTable
I tried to achieve what I want by using:
update GOT: sum colB by date from myTable where colA in (`abc,`edf)
This creates the correct values for GOT, but the GOT column only has a value on the rows where colA matches the condition. This is not what I really want.
To visualise, what I want is the column WANTED:
date       colA colB GOT WANTED
2020.08.17 abc  5    13  13
2020.08.17 mom  7        13
2020.08.17 xyz  8    13  13
2020.08.17 tuf  9        13

I fixed it myself with:
update (GOT: sum colB where colA in (`abc,`edf)) by date from myTable

Related

query specific table columns

I have a table in which some column names have the prefix 'file_'.
For example:
Column Name  Value
name         somename
date         2000-01-01
size         15
file_type1   1
file_type2   34
.....        ....
file_typeN   12
Another team can add 'file_typeN' columns to the table (or even delete them).
So I want a SQL query that selects only the values of the columns with the 'file_' prefix.
It should be a single query for the table my_files_description_table, which can have a varying number of columns with the 'file_' prefix.
Something like:
select <only columns with 'file_' prefix> from my_files_description_table;
I can list the names of all columns with the 'file_' prefix:
SELECT column_name FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'my_files_description_table' and column_name like 'file_%';
But I don't know how to use that list to select the values.
I need a query that, for this table
Column Name  Value
name         somename
date         2000-01-01
size         15
file_type1   1
file_type2   34
should return
Column Name  Value
file_type1   1
file_type2   34
And for this table
Column Name  Value
name         somename
date         2000-01-01
size         15
file_type1   2
file_type2   5
file_type3   134
file_type4   12
should return
Column Name  Value
file_type1   2
file_type2   5
file_type3   134
file_type4   12
I use PostgreSQL 9.6.

How do I calculate cumulative sum for last 7 rows on a specific date in Postgresql?

I have a table that has these columns: user_id, day, valueA, valueB.
I'd like to calculate the running sum of the last 7 rows of valueA and valueB for each user that has data on a specific day, for example '2020-08-01'.
(Note: Users only have a row when their valueA and valueB are not zero, so some dates are missing from the table.)
I tried this query:
select user_id, day,
sum(valueA) over(partition by user_id rows between 7 preceding and current row) as last_7_A,
sum(valueB) over(partition by user_id rows between 7 preceding and current row) as last_7_B
from table where day='2020-08-01'
But this query doesn't calculate the running sum; it just returns valueA and valueB for 2020-08-01.
I could calculate the sum for every day and then select the date I want, but that would be really inefficient. Any ideas how to add the date constraint so that it computes the last-7-rows running sum only for that one row per user?
As per the question (sum of the last 7 rows for each user for a particular date), this might work:
select user_id, sum(valueA) "sum of valueA", sum(valueB) "sum of valueB"
from sample_table
where id in (
select id
from sample_table
where day='2020-08-08'
order by id desc limit 7)
group by user_id;
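A hedged sketch of another way to approach this. A WHERE clause is applied before window functions are evaluated, so filtering on day in the same SELECT leaves each window with a single row. Computing the window in a subquery and filtering in the outer query avoids that. The example runs the SQL through Python's sqlite3 module as a stand-in for PostgreSQL (an assumption made so the snippet is self-contained; it needs SQLite 3.25 or later for window functions), and the rows are made-up sample data. The frame 6 PRECEDING AND CURRENT ROW covers 7 rows in total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample_table (user_id INTEGER, day TEXT, valueA INTEGER, valueB INTEGER)"
)

# Made-up data: user 1 has 13 consecutive rows, user 2 has only 2 rows,
# user 3 has no row on the target day.
rows = [(1, f"2020-07-{d:02d}", 1, 10) for d in range(20, 32)]
rows.append((1, "2020-08-01", 1, 10))
rows += [(2, "2020-07-30", 5, 2), (2, "2020-08-01", 7, 3)]
rows += [(3, "2020-07-31", 9, 9)]
conn.executemany("INSERT INTO sample_table VALUES (?, ?, ?, ?)", rows)

query = """
SELECT user_id, day, last_7_A, last_7_B
FROM (
    -- Window over ALL rows of each user, ordered by day.
    SELECT user_id, day,
           SUM(valueA) OVER (PARTITION BY user_id ORDER BY day
                             ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS last_7_A,
           SUM(valueB) OVER (PARTITION BY user_id ORDER BY day
                             ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS last_7_B
    FROM sample_table
) AS t
WHERE day = '2020-08-01'   -- filter only after the windows are computed
ORDER BY user_id
"""
result = conn.execute(query).fetchall()
print(result)
```

This still evaluates the window for every row before filtering; restricting the inner query to a bounded date range would only be safe if you know how far back 7 rows can reach.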

how to obtain a column from a table based on columns from another table kdb

I have queried 3 columns from a table as follows:
lst: distinct select b_market_order_no,instrumentID,mkt from tb where event=`OvernightOrder
Based on these, I want to query another table and get its dates column:
select dates from tbp where
I am not quite sure how to apply the where clause or join clause here so values from lst get the corresponding dates column from tbp. Both tb and tbp tables have the same columns, they are created for different days from the same schema.
If I understand your use case correctly then you can use a table in your where clause as follows:
q)show tab1:([]a:1 2 3;b:4 5 6)
a b
---
1 4
2 5
3 6
q)show tab2:([]date:.z.d+1 2 3;a:2 3 4;b:5 6 7)
date a b
--------------
2020.04.29 2 5
2020.04.30 3 6
2020.05.01 4 7
q)select date from tab2 where([]a;b)in tab1
date
----------
2020.04.29
2020.04.30
This builds a table from columns a and b of tab2 and keeps the rows of tab2 whose (a;b) combination also appears as a row in tab1.
If the schema of the table being joined is variable another approach may be required, such as this:
q)select date from tab2 where(cols[tab1]#tab2)in tab1
date
----------
2020.04.29
2020.04.30
Or even using lj and adding an additional Boolean column to mark valid rows in tab1 to select from tab2:
select date from(tab2 lj cols[tab1]xkey update c:1b from tab1)where c

Get distinct rows based on one column with T-SQL

I have a table in the following format:
Time Value
17:27 2
17:27 3
I want to get the distinct rows based on one column: Time. So my expected result would be a single row, either 17:27 2 or 17:27 3.
Distinct
DISTINCT in T-SQL applies to the whole select list, not to a single column. It returns two rows here because each combination of Time and Value is unique (see below).
select distinct [Time], * from SAPQMDATA
would return
Time Value
17:27 2
17:27 3
instead of
Time Value
17:27 2
Group by
Also group by does not appear to work
select * from table group by [Time]
Will result in:
Column 'Value' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Questions
How can I select one row per unique 'Time' value without taking into account the other columns in the select list?
How can I remove duplicate entries?
This is where ROW_NUMBER will be your best friend. Using this as your sample data...
time                 value
-------------------- -----------
17:27                2
17:27                3
11:36                9
15:14                5
15:14                6
Below are two solutions that you can copy, paste, and run.
DECLARE @youtable TABLE ([time] VARCHAR(20), [value] INT);
INSERT @youtable VALUES ('17:27',2),('17:27',3),('11:36',9),('15:14',5),('15:14',6);
-- The most elegant way to solve this
SELECT TOP (1) WITH TIES t.[time], t.[value]
FROM @youtable AS t
ORDER BY ROW_NUMBER() OVER (PARTITION BY t.[time] ORDER BY (SELECT NULL));
-- A more efficient way to solve this
SELECT t.[time], t.[value]
FROM
(
SELECT t.[time], t.[value], ROW_NUMBER() OVER (PARTITION BY t.[time] ORDER BY (SELECT NULL)) AS RN
FROM @youtable AS t
) AS t
WHERE t.RN = 1;
Each returns:
time                 value
-------------------- -----------
11:36                9
15:14                5
17:27                2
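With ORDER BY (SELECT NULL) the row kept for each time is arbitrary. If a specific row is wanted, the window's ORDER BY can name a real column. The sketch below keeps the smallest value per time; it runs the same ROW_NUMBER pattern through Python's sqlite3 module as a stand-in for SQL Server (an assumption made so the snippet is self-contained; it needs SQLite 3.25 or later), and the table name youtable is just the sample name from above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE youtable ("time" TEXT, "value" INTEGER)')
conn.executemany(
    "INSERT INTO youtable VALUES (?, ?)",
    [("17:27", 2), ("17:27", 3), ("11:36", 9), ("15:14", 5), ("15:14", 6)],
)

query = """
SELECT "time", "value"
FROM (
    -- Number the rows inside each time group, smallest value first.
    SELECT "time", "value",
           ROW_NUMBER() OVER (PARTITION BY "time" ORDER BY "value") AS rn
    FROM youtable
) AS t
WHERE rn = 1          -- keep exactly one row per time
ORDER BY "time"
"""
dedup = conn.execute(query).fetchall()
print(dedup)
```

Ordering by "value" DESC instead would keep the largest value per time.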

SQL - how to sum groups of 15 rows and find the max sum

The purpose of this question is to optimize some SQL by using set-based operations vs iterative (looping, like I'm doing below):
Some Explanation -
I have a CTE whose rows are inserted into a temp table, #dataForPeak. Each row represents one minute and the value retrieved for that minute.
For every row, my code uses a while loop to add 15 rows at a time (the current row + the next 14 rows). These sums are inserted into another temp table #PeakDemandIntervals, which is my workaround for then finding the max sum of these groups of 15.
My end goal is the max sum of these groups of 15. My code achieves this, but it takes about 12 seconds for 26k rows. I'll be looking at much more data, so I know this is not fast enough for my use case.
My question is,
can anyone help me find a fast alternative to this loop?
It can include more tables, CTEs, nested queries, whatever. The while loop might not even be the issue, it's probably the inner code.
insert into #dataForPeak
select timestamp, value
from cte
order by timestamp;
while @@ROWCOUNT<>0
begin
declare @timestamp datetime = (select top 1 timestamp from #dataForPeak);
insert into #PeakDemandIntervals
select @timestamp, sum(interval.value) as peak
from (select * from #dataForPeak base
where base.timestamp >= @timestamp
and base.timestamp < DATEADD(minute,14,@timestamp)
) interval;
delete from #dataForPeak where timestamp = @timestamp;
end
select max(peak)
from #PeakDemandIntervals;
Edit
Here's an example of my goal, using groups of 3min instead of 15min.
Given the data:
Time | Value
1:50 | 2
1:51 | 4
1:52 | 6
1:53 | 8
1:54 | 6
1:55 | 4
1:56 | 2
the max sum (peak) I'm looking for is 20, because the group
1:52 | 6
1:53 | 8
1:54 | 6
has the highest sum.
Let me know if I need to clarify more than that.
Based on the example given, it seems you are trying to get the maximum value of a rolling sum. You can calculate the 15-minute rolling sum as follows:
SELECT [Time]
,[Value]
,SUM([Value]) OVER (ORDER BY [Time] ASC ROWS 14 PRECEDING) [RollingSum]
FROM #dataForPeak
The key here is the ROWS 14 PRECEDING clause. It tells SQL Server to sum the preceding 14 rows together with the current row, which gives you the 15-minute interval as long as there is exactly one row per minute.
Now you can simply take the max of the rolling sum. The full query looks as follows:
;WITH CTE_RollingSum
AS
(
SELECT [Time]
,[Value]
,SUM([Value]) OVER (ORDER BY [Time] ASC ROWS 14 PRECEDING) [RollingSum]
FROM #dataForPeak
)
SELECT MAX([RollingSum]) AS Peak
FROM CTE_RollingSum
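As a quick check of the approach, here is a minimal sketch that applies the same rolling-sum-then-max query to the 3-minute example from the question, using ROWS BETWEEN 2 PRECEDING AND CURRENT ROW (3 rows) in place of the 15-row window. It runs through Python's sqlite3 module as a stand-in for SQL Server (an assumption made so the snippet is self-contained; it needs SQLite 3.25 or later), with a plain table named dataForPeak in place of the temp table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE dataForPeak ("Time" TEXT, "Value" INTEGER)')
conn.executemany(
    "INSERT INTO dataForPeak VALUES (?, ?)",
    [("1:50", 2), ("1:51", 4), ("1:52", 6), ("1:53", 8),
     ("1:54", 6), ("1:55", 4), ("1:56", 2)],
)

query = """
WITH CTE_RollingSum AS (
    -- Sum of the current row and the 2 rows before it (a 3-row window).
    SELECT "Time", "Value",
           SUM("Value") OVER (ORDER BY "Time"
                              ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS RollingSum
    FROM dataForPeak
)
SELECT MAX(RollingSum) AS Peak
FROM CTE_RollingSum
"""
peak = conn.execute(query).fetchone()[0]
print(peak)  # 20, from the group 1:52, 1:53, 1:54
```

The window here looks backwards from each row while the loop in the question looks forwards, but both enumerate the same full-length groups, so the maximum is the same.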