Changing a functional qSQL query to involve multiple columns in a calculation (KDB+/Q)

I have a functional ? (select) query like so:
t:([]Quantity:1 2 3;Price:4 5 6;date:2020.01.01 2020.01.02 2020.01.03);
?[t;enlist(within;`date;(2020.01.01,2020.01.02));0b;(enlist `Quantity)!enlist (sum;(`Quantity))]
to get me the sum of the Quantity in the given date range. I want to adjust this to get the sum of the Notional, i.e. Quantity*Price, in the date range. So the result should be (1*4)+(2*5)=14.
I tried things like the following
?[t;enlist(within;`date;(2020.01.01,2020.01.02));0b;(enlist `Quantity)!enlist (sum;(`Price*`Quantity))]
but couldn't get it to work. Any advice would be greatly appreciated!

In such a scenario I would advise thinking about the qSQL-style query that you are looking for and then working from there.
So in this case you are looking, I believe, to do something like:
select sum Quantity*Price from t where date within 2020.01.01 2020.01.02
You can then run parse on this to break it into its functional form, i.e. the ? query you refer to.
q)parse"select sum Quantity*Price from t where date within 2020.01.01 2020.01.02"
?
`t
,,(within;`date;2020.01.01 2020.01.02)
0b
(,`Quantity)!,(sum;(*;`Quantity;`Price))
This is the functional form you need: table, where clause, by clause, and aggregation.
You can see the Quantity column here is just the sum of the product of the two columns.
q)?[t;enlist(within;`date;(2020.01.01;2020.01.02));0b;enlist[`Quantity]!enlist(sum;(*;`Quantity;`Price))]
Quantity
--------
14
You could also extend this to change the column as necessary and create a function for it too, if you so wish:
q)calcNtnl:{[sd;ed] ?[t;enlist(within;`date;(sd;ed));0b;enlist[`Quantity]!enlist(sum;(*;`Quantity;`Price))]}
q)calcNtnl[2020.01.01;2020.01.02]
Quantity
--------
14
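If you need this for other column pairs, you could go one step further and parametrise the columns themselves. A minimal sketch (the calcAgg name, the table parameter and the Notional result column are illustrative, not from the original):
q)calcAgg:{[tbl;sd;ed;c1;c2] ?[tbl;enlist(within;`date;(sd;ed));0b;enlist[`Notional]!enlist(sum;(*;c1;c2))]}
q)calcAgg[t;2020.01.01;2020.01.02;`Quantity;`Price]
Notional
--------
14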

Related

sort data in hdb by using dbmaint.q in kdb

I am trying to sort 1 or 2 columns in an HDB in kdb but failed. This is the code I have:
fncol[dbdir;`trade;`sym;xasc];
and got a length error when I called it. But I don't have a length error if I use this code:
fncol[dbdir;`trade;`sym;asc];
However, this only sorts the sym column itself. I want the data in the other columns to change according to the sym column as well.
In addition, I would like to apply the parted attribute to the sym column. I also tried to sort this way:
fncol[dbdir;`trade;`sym`ptime;xasc];
but that also failed.
You should always be careful with dbmaint.q if you are unsure what it is going to do. I gather from the fact that asc worked after xasc that you are using a test HDB each time.
fncol should be used with unary functions, i.e. functions taking 1 argument. Its use case is modifying individual columns. What you are trying to do is modify the entire table, as you want to sort the whole table relative to the sym column. Using .Q.dpft for each date is what you want, as outlined by Cathal in your follow-up question: using .Q.dpft function to resave table.
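For illustration, a rough sketch of that per-date approach (untested; the resortDate name is mine, it assumes each date partition fits in memory, and it relies on .Q.dpft sorting on the given field, applying the parted attribute and resaving the splayed table):
q)/ pull one date into memory (dropping the virtual date column), resave it
q)/ sorted on sym with `p# applied, then reload the db so the partitioned
q)/ trade table is mapped again for the next date
q)resortDate:{[dt] `trade set delete date from select from trade where date=dt; .Q.dpft[`:.;dt;`sym;`trade]; system"l ."}
q)resortDate each date;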
When you run fncol[dbdir;`trade;`sym;xasc]; you are saving down a projection in place of the sym column in each date partition.
fncol[`:.;`trades;`sym;xasc];
select from trades where date = 2014.04.21
'length
[0] select from trades where date = 2014.04.21
q)get `:2014.04.21/trades/sym
k){$[$[#x;~`s=-2!(0!.Q.v y)x;0];.Q.ft[#[;*x;`s#]].Q.ord[<:;x]y;y]}[`p#`sym$`A..
// This is the k definition of xasc with the sym column as the first parameter.
q)xasc
k){$[$[#x;~`s=-2!(0!.Q.v y)x;0];.Q.ft[#[;*x;`s#]].Q.ord[<:;x]y;y]}
// Had you needed to fix your HDB: I managed to undo this using value and indexing into the sym column data.
fncol[`:.;`trades;`sym;{(value x)[1]}];
q)select from trades where date = 2014.04.21
date sym time src price size
------------------------------------------------------------
2014.04.21 AAPL 2014.04.21D08:00:12.155000000 N 25.31 2450
2014.04.21 AAPL 2014.04.21D08:00:42.186000000 N 25.32 289
2014.04.21 AAPL 2014.04.21D08:00:51.764000000 O 25.34 3167
asc will not break the HDB, as it just takes 1 argument and saves down ONLY the sym column in ascending order, not the whole table.
Is there any indication of what date is failing with a length error? It could be something wrong with one of the partitions.
Perhaps if you try to load one of the dates into memory and sort it manually, i.e.
`sym xasc select from trade where date=last date
that might indicate if there's a specific partition causing issues.
FYI, if you're interested in applying the `p# attribute, you should try setattrcol in dbmaint.q. I think the data will need to be sorted first, though.
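For example, something along these lines; the signature here follows the usage notes in dbmaint.q, so double-check it against your copy, and note the data in each partition must already be sorted on sym:
q)setattrcol[`:.;`trade;`p#`sym]  / apply the parted attribute to sym in every partition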

How to dynamically pivot based on rows data and parameter value?

I am trying to pivot using the crosstab function and am unable to achieve the requirement. Is there a way to perform crosstab dynamically, with a dynamic result set as well?
I have tried using the crosstab built-in function but was unable to meet my requirement.
select * from crosstab ('select item,cd, type, parts, part, cnt
from item
order by 1,2')
AS results (item text,cd text, SUM NUMERIC, AVG NUMERIC);
Sample Data:
ITEM CD TYPE PARTS PART CNT
Item 1 A AVG 4 1 10
Item 1 B AVG 4 2 20
Item 1 C AVG 4 3 30
Item 1 D AVG 4 4 40
Item 1 A SUM 4 1 10
Item 1 B SUM 4 2 20
Item 1 C SUM 4 3 30
Item 1 D SUM 4 4 40
Expected Results:
ITEM CD PARTS TYPE_1 CNT_1 TYPE_1 CNT_1 TYPE_2 CNT_2 TYPE_2 CNT_2 TYPE_3 CNT_3 TYPE_3 CNT_3 TYPE_4 CNT_4 TYPE_4 CNT_4
Item 1 A 4 AVG 10 SUM 10 AVG 20 SUM 20 AVG 30 SUM 30 AVG 40 SUM 40
The PARTS value is based on a parameter passed by the user. If the user passes 2 for example, there will be 4 rows in the result set (2 parts for AVG and 2 parts of SUM).
Can I achieve this requirement using CROSSTAB function or is there a custom SQL statement that need to be developed?
I'm not following your data, so I can't offer examples based on it. But I have been looking at pivot/cross-tab features over the past few days, and I was looking at dynamic cross tabs just before seeing your post. I'm hoping that your question gets some good answers; I'll start off with a bit of background.
You can use the crosstab extension for standard cross tabs; what went wrong when you tried it? Here's an example I wrote for myself the other day with a bunch of comments and aliases for clarity. The pivot looks at item scans to see where the scans were "to", like the warehouse or the floor.
/* Basic cross-tab example for crosstab (text) format of pivot command.
Notice that the embedded query has to return three columns, see the aliases.
#1 is the row label, it shows up in the output.
#2 is the category, what determines how many columns there are. *You have to work this out in advance to declare them in the return.*
#3 is the cell data, what goes in the cross tabs. Note that this form of the crosstab command may return NULL, and coalesce does not work.
To get rid of the null count/sums/whatever, you need crosstab (text, text).
*/
select *
from crosstab ('select
specialty_name as row_label,
scanned_to as column_splitter,
count(num_inst)::numeric as cell_data
from scan_table
group by 1,2
order by 1,2')
as scan_pivot (
row_label citext,
"Assembly" numeric,
"Warehouse" numeric,
"Floor" numeric,
"QA" numeric);
As a manual alternative, you can use a series of FILTER clauses. Here's an example that summarizes error_log records by day of the week. The "down" is the error name, the "across" (columns) are the days of the week.
select "error_name",
count(*) as "Overall",
count(*) filter (where extract(dow from "updated_dts") = 0) as "Sun",
count(*) filter (where extract(dow from "updated_dts") = 1) as "Mon",
count(*) filter (where extract(dow from "updated_dts") = 2) as "Tue",
count(*) filter (where extract(dow from "updated_dts") = 3) as "Wed",
count(*) filter (where extract(dow from "updated_dts") = 4) as "Thu",
count(*) filter (where extract(dow from "updated_dts") = 5) as "Fri",
count(*) filter (where extract(dow from "updated_dts") = 6) as "Sat"
from error_log
where "error_name" is not null
group by "error_name"
order by 1;
You can do the same thing with CASE, but FILTER is easier to write.
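For comparison, here is the same pivot with CASE for a couple of the columns (a sketch based on the error_log example above):
select "error_name",
       count(*) as "Overall",
       sum(case when extract(dow from "updated_dts") = 0 then 1 else 0 end) as "Sun",
       sum(case when extract(dow from "updated_dts") = 1 then 1 else 0 end) as "Mon"
       -- ...and so on through "Sat"
from error_log
where "error_name" is not null
group by "error_name"
order by 1;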
It looks like you want something basic, maybe the FILTER solution appeals? It's easier to read than calls to crosstab(), since that was giving you trouble.
FILTER may be slower than crosstab. Probably. (The crosstab extension is written in C, and I'm not sure how smart FILTER is about reading off indexes.) But I'm not sure, as I haven't tested it out yet; it's on my to-do list. I'd be super interested if anyone can offer results. We're on 11.4.
I wrote a client-side tool to build FILTER-based pivots over the past few days. You have to supply the down and across fields, an aggregate formula and the tool spits out the SQL. With support for coalesce for folks who don't want NULL, ROLLUP, TABLESAMPLE, view creation, and some other stuff. It was a fun project. Why go to that effort? (Apart from the fun part.) Because I haven't found a way to do dynamic pivots that I actually understand. I love this quote:
"Dynamic crosstab queries in Postgres has been asked many times on SO all involving advanced level functions/types. Consider building your needed query in application layer (Java, Python, PHP, etc.) and pass it in a Postgres connected query call. Recall SQL is a special-purpose, declarative type while app layers are general-purpose, imperative types." – Parfait
So, I wrote a tool to pre-calculate and declare the output columns. But I'm still curious about dynamic options in SQL. If that's of interest to you, have a look at these two items:
https://postgresql.verite.pro/blog/2018/06/19/crosstab-pivot.html
Flatten aggregated key/value pairs from a JSONB field?
Deep magic in both.
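As a middle ground between hand-written pivots and fully dynamic SQL, you can also have Postgres generate the FILTER column list for you and splice it into the final query in your application layer. A sketch against the scan_table example above (the generated aliases are whatever distinct values scanned_to holds):
-- Build one "count(*) filter (...)" expression per distinct scanned_to value;
-- %L quotes the value as a literal, %I quotes it as an identifier.
select string_agg(
         format('count(*) filter (where scanned_to = %L) as %I', scanned_to, scanned_to),
         ', ' order by scanned_to)
from (select distinct scanned_to from scan_table) s;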

How to sum a calculated field on top of another calculated field?

The current issue I have may be a bit difficult to describe, but I will do my best.
Currently, in my workbook, I am experiencing duplicates at the product level, so I created a calculated field to work around that. I wanted to know which individual products have not been quoted in the past year, so the answer should be 1 for not quoted and 0 for quoted. I worked around that with an if statement on [% Not Quoted]; the formula looks as such:
IF [% Not Quoted] > 0
then 1
else 0
end
which I named Prod Not Quoted. That worked great for me;
however, now I want to do the count or sum(?) of products not quoted at the vendor level, which would mean my products not quoted need to be grouped by the vendor name. To be specific, my objective is to provide a table of all products not quoted by vendor name, without duplicates at the product level. What I tried to do is create a new calculated field using the previous one, which I calculated as the following:
IF [Prod Not Quoted] = 1
then sum(1)
else 0
end
The latter calculation, however, gave me the sum of all products quoted and not quoted, along with the duplicates I am trying to avoid. Why would IF [Prod Not Quoted] = 1 then sum(1) else 0 end not work?
Someone on the Tableau forum responded with the following suggestion:
SUM (IF [Prod Not Quoted] = 1 THEN 1 ELSE 0 END)
however, I got an error saying I cannot sum something that is not already aggregated.
Is there any kind of work around to the issue I am having?
If you have duplicates at the dimension level, you can use a FIXED level-of-detail calculation to select just a single row at the dimension level. For example:
{FIXED [Product]: MAX(quoted)}
What this is doing is saying "for each Product, what is the max quoted value?" So if you have duplicates, it'll return only one value.
Now that we have solved the duplicates problem, we can wrap it in SUM:
SUM({FIXED [Product]: MAX(quoted)})
To answer your question about why the sum isn't working: it's because you're trying to do a sum of a sum. If you still wanted to take the approach that was offered to you, you could change it to:
IF [Prod Not Quoted] = 1
then {FIXED [Product]: MAX(quoted)}
end
(I called the aggregation field quoted, but it'll be the measure you're working with)

pentaho distinct count over date

I am currently working on Pentaho and I have the following problem:
I want to get a "rolling" distinct count on a value, which ignores the "group by" performed by Business Analytics. For instance:
Date Field
2013-01-01 A
2013-02-05 B
2013-02-06 A
2013-02-07 A
2013-03-02 C
2013-04-03 B
When I use a classical "distinct count" aggregator in my schema, sum it, and then add "month" to the columns, I get:
Month Count Sum
2013-01 1 1
2013-02 2 3
2013-03 1 4
2013-04 1 5
What I would like to get would be:
Month Sum
2013-01 1
2013-02 2
2013-03 3
2013-04 3
which is the distinct count of all Fields seen so far. Does anyone have any idea on this topic?
My database is in Postgres, and I'm looking for any solution under PDI, PSW, PBA or PME.
Thank you!
A naive approach in PDI is the following:
Sort the rows by the Field column
Add a sequence for changing values in the Field column
Map all sequence values > 1 to zero
These first 3 effectively flag the first time a value was seen (no matter the date).
Sort the rows by year/month
Sum the mapped sequence values by year+month
Get a Cumulative Sum of all the previous sums
These 3 aggregate the distinct values per month, then keep a cumulative sum.
I posted a Gist of this transformation here.
A more efficient solution is to parallelize the two sorts, then join at the latest point possible. I posted this one as it is easier to explain, but it shouldn't be too difficult to take this transformation and make it more parallel.
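Since the data is in Postgres, the same figures can also be computed in SQL and used as a source query in Pentaho. A sketch, assuming a table t with columns date and field as in the example above (names are illustrative):
-- Count each field value only in the month it first appears,
-- then take a running sum across the months present in the data.
select to_char(m.month, 'YYYY-MM') as "Month",
       sum(coalesce(f.firsts, 0)) over (order by m.month) as "Sum"
from (select distinct date_trunc('month', date) as month from t) m
left join (
    select date_trunc('month', first_date) as month, count(*) as firsts
    from (select field, min(date) as first_date from t group by field) fd
    group by 1
) f on f.month = m.month
order by m.month;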

Return 0 value if Details duplicated

I need your help in creating a Crystal Report.
I have a formula in details section that computes working time.
How do I make the value return 0 if it is duplicated?
Here's the scenario
Name Time (Hours:Minutes)
John 1:20
........ 3:30
........ 3:30
Total Hours -> ?
My problem is I don't want to use the duplicated values (3:30) shown above. I want the total hours to be 4:50.
You have two options:
Check the option in the Database tab, Select Distinct Records, so that duplicate records will be eliminated.
If you don't want to use the first option, then calculate using a Running Total, so that you sum only those values that are distinct.
Create the running total as something like: do the sum only on change of the time value.
You can use the function "previous" to compare the current value with the previous value, but it works only with fields.
But I am not sure if I understood; you may want to be more precise about your question.
1) make a formula called "hours" or some other name
if not isnull(previous({Result.Time})) and {Result.Time} = previous({Result.Time})
then 0
else {Result.Time} /* you have to ensure the same return type */
2) let the "total hours" be a sum of the formula "hours"
Note that it will work only if the rows are ordered by the time value.
The result is the same as using a running total field, as proposed by Siva.