sum by groups in KDB - kdb

I have a PnL table with the columns date, region and product, plus the PnL values.
I'm trying to group all PnL rows by region and product. One way I've tried is to sum by region and product as follows:
select PnL : sum(PnL) by region, product from table where date within (d1;d2)
The issue is that I get unexpected results. For a given date range (d1;d2) I get the results I expect. However, for the date range (d1;d2+1) I get 0 everywhere.
I checked data availability for d2+1, and the data for that day is already available.
Please note that the server is stateless, so it is not possible to store intermediate results in variables.
What is the best way to achieve a grouping sum in KDB?
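For readers less familiar with q, the intended semantics of that query can be sketched in Python (an illustration only, not kdb code): within keeps rows whose date lies in the closed interval from d1 to d2, and by region, product sums PnL per (region, product) pair. The function name grouped_pnl, the row layout and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import date

def grouped_pnl(rows, d1, d2):
    """Sum PnL per (region, product) for rows with d1 <= date <= d2.
    Both bounds are inclusive, like q's `within`."""
    totals = defaultdict(float)
    for row in rows:
        if d1 <= row["date"] <= d2:
            totals[(row["region"], row["product"])] += row["PnL"]
    return dict(totals)

# Hypothetical sample rows
rows = [
    {"date": date(2020, 1, 1), "region": "EU", "product": "FX", "PnL": 10.0},
    {"date": date(2020, 1, 2), "region": "EU", "product": "FX", "PnL": -4.0},
    {"date": date(2020, 1, 2), "region": "US", "product": "Rates", "PnL": 7.5},
    {"date": date(2020, 1, 3), "region": "US", "product": "Rates", "PnL": 2.5},
]
print(grouped_pnl(rows, date(2020, 1, 1), date(2020, 1, 2)))
# {('EU', 'FX'): 6.0, ('US', 'Rates'): 7.5}
```

Widening the range to (d1;d2+1) only adds the rows dated d2+1 to the filter; it cannot remove rows that already matched (d1;d2).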

Related

Tableau measure count items if between dates

What I am trying to achieve is to get a count of people employed in a particular period.
I have 3 variables:
Employee ID (integer)
Hire date (date)
Termination date (date or null)
Example
The formula I am looking for is something like:
if termination_date is null
then
count employee_ID in
dates between Hire_date and max of either hire_date or termination_date
else
count employee ID in
dates between hire_date and termination_date
This aims to show how the staff level changes over time.
I am new to Tableau and not sure how to even start. Any suggestions are welcome.
This problem will be simpler if you reshape your data to have the following three columns:
Employee ID
Date
Action (which takes the value 'Hire' or 'Terminate')
Each data row represents one change in status for an employee. If an employee has a termination date, they will have two records in this new format; otherwise, just one record showing the hire date.
You can reshape your data by hand, or keep the original and reshape it in Tableau Prep or on the Tableau data source page using a self-union and a few simple calculated fields.
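A minimal Python sketch of that reshape, equivalent in spirit to the self-union. It assumes each input record is an (employee_id, hire_date, termination_date) tuple with None for a missing termination date; the function name to_events and the sample data are hypothetical.

```python
from datetime import date

def to_events(employees):
    """Reshape (employee_id, hire_date, termination_date) records into
    (employee_id, date, action) rows: one 'Hire' row per employee, plus a
    'Terminate' row only when a termination date exists."""
    events = []
    for emp_id, hired, terminated in employees:
        events.append((emp_id, hired, "Hire"))
        if terminated is not None:
            events.append((emp_id, terminated, "Terminate"))
    return events

employees = [
    (1, date(2020, 1, 6), None),               # still employed
    (2, date(2020, 2, 3), date(2020, 5, 29)),  # hired, later terminated
]
for row in to_events(employees):
    print(row)
```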
Define a calculated field called Staffing_Change as
if [Action] = 'Hire' then 1 else -1 end
Now you can plot the change in staff level over time by putting the exact date on Columns and SUM(Staffing_Change) on Rows. Use a quick table calculation, Running Sum, to see the net staffing level. For the line mark type, I’d use the step style by clicking the Path button on the Marks card; otherwise, the chart can give the impression of fractional numbers of employees.
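The Staffing_Change field combined with Running Sum can be mimicked in Python to sanity-check the numbers the chart should show. This is a sketch assuming the reshaped (employee_id, date, action) rows; staff_levels and the sample events are hypothetical, and changes on the same date are netted before the running sum, which corresponds to one point per exact date on the axis.

```python
from datetime import date
from itertools import accumulate

def staff_levels(events):
    """Net staff level per date from (employee_id, date, action) rows:
    +1 for 'Hire', -1 otherwise (the Staffing_Change calc), netted per date,
    then a running sum over the sorted dates."""
    change_by_date = {}
    for _emp_id, day, action in events:
        change = 1 if action == "Hire" else -1
        change_by_date[day] = change_by_date.get(day, 0) + change
    days = sorted(change_by_date)
    return list(zip(days, accumulate(change_by_date[d] for d in days)))

events = [
    (1, date(2020, 1, 6), "Hire"),
    (2, date(2020, 2, 3), "Hire"),
    (3, date(2020, 2, 3), "Hire"),
    (2, date(2020, 5, 29), "Terminate"),
]
print(staff_levels(events))
```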

Tableau Count Distinct when graphed shows chronological last date, when deduplicated, not first

I'm doing a break fix on a Tableau report visualization that shows the outcomes of clients by client ID for a given year, using a running sum of the distinct count of client IDs, i.e. RUNNING_SUM(COUNTD([ID])). The X axis of the visualization is the initial date of contact with the client. Occasionally, due to errors in the data or weird behavior, some clients have two initial dates: they appear as two separate data rows that share an ID but have different values in the Initial Date column.
Currently, the visualization shows such clients at their chronologically last Initial Date, and I need it to deduplicate them so that they appear at their chronologically first Initial Date.
I could create a calculated field that uses the first date whenever an ID has multiple non-identical Initial Dates, but I'm not sure how to write a calculated field that can group by ID or otherwise check multiple dates per ID.
In Python/pseudocode, it would be something like:
for ID in IDs:
    if len(groupby(ID)) > 1:
        Initial_Date = min(Initial_Date)
But I have to do the transformation in Tableau.
Keep everything the same, but create a calculated field named "Initial Contact Date" with the calculation:
{FIXED [ID]: MIN([Initial Date])}
Then replace the date field on the X axis (Columns) with this date field instead.
That FIXED level-of-detail (LOD) expression looks at all rows for each ID, computes the minimum Initial Date among them, and returns that single value on every row of that ID.
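To see what the FIXED expression does to the chart, here is a Python sketch under stated assumptions: rows are (client_id, initial_date) pairs, the function names are hypothetical, and the running distinct count is taken per first-contact date, mimicking RUNNING_SUM(COUNTD([ID])) once the FIXED date is on Columns.

```python
from datetime import date

def first_contact_dates(rows):
    """Like a FIXED [ID] MIN over the date: the earliest Initial Date per ID,
    repeated on every row that shares that ID."""
    earliest = {}
    for client_id, initial in rows:
        if client_id not in earliest or initial < earliest[client_id]:
            earliest[client_id] = initial
    return [(client_id, initial, earliest[client_id]) for client_id, initial in rows]

def running_distinct_by_first_contact(rows):
    """Distinct IDs per first-contact date, then a running sum over the dates."""
    ids_by_day = {}
    for client_id, _initial, first in first_contact_dates(rows):
        ids_by_day.setdefault(first, set()).add(client_id)
    result, total = [], 0
    for day in sorted(ids_by_day):
        total += len(ids_by_day[day])
        result.append((day, total))
    return result

rows = [
    (101, date(2021, 3, 1)),
    (101, date(2021, 3, 15)),  # same client with a later Initial Date
    (102, date(2021, 3, 15)),
    (103, date(2021, 4, 2)),
]
print(running_distinct_by_first_contact(rows))
```

Because every row of client 101 now carries 2021-03-01, that client is counted once, on its first contact date.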

Calculate a running total that works with relative date filters

I have a Union table of my various bank accounts to create a personal finance analysis dashboard.
I am trying to make a Running Total to show my total capital available at any given date. A Running Total table calculation works, and so does a RUNNING_SUM() calculated field, but both only work until I filter the dates. So I am trying to find a way to make the running calculation work without being thrown off by date filters (I would like to use relative dates for the visualisations in the dashboard).
My union table has the following relevant data columns:
Order ID: Descending number from 1 for each entry per account.
Date: Date of entry.
Item: Entry name.
Account: Name of bank account.
Amount: positive for credit, negative for debit.
Balance: the account's balance after the entry.
So the table can look like this:
So on 07/05/2019 the Running total should be 229.64.
The running sum formula mentioned above is currently RUNNING_SUM(SUM([Amount])), so if any dates are excluded via filter the running total doesn't add up to the right amount.
One way around the problem could be to take, for a given date, the last balance reading of each account and sum those over all accounts. The Balance column is already a running total per account, so summing each account's final entry up to that date should give the correct total. Would it be possible to make a calculated field that gets the last balance reading for each account at any given date and then sums them?
Or is there a simpler smarter way I am not aware of?
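The last-balance idea from the question can be sketched in Python to check it against the expected running total. This assumes entries are (date, account, balance) tuples listed in chronological order within each account; the function name total_capital_as_of and the sample values are hypothetical.

```python
from datetime import date

def total_capital_as_of(entries, as_of):
    """Sum over accounts of each account's most recent Balance on or before as_of.
    ASSUMPTION: entries are in chronological order within each account, so the
    last matching entry seen for an account is its latest balance."""
    last_balance = {}
    for day, account, balance in entries:
        if day <= as_of:
            last_balance[account] = balance
    return round(sum(last_balance.values()), 2)

entries = [
    (date(2019, 5, 1), "Current", 100.00),
    (date(2019, 5, 3), "Savings", 50.00),
    (date(2019, 5, 6), "Current", 80.25),
    (date(2019, 5, 8), "Savings", 75.00),
]
print(total_capital_as_of(entries, date(2019, 5, 7)))  # 130.25
```

Because each balance already carries the account's full history, this total does not depend on which earlier rows are visible.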
This comes down to Tableau's order of operations: dimension filters (such as a date filter) are applied before table calculations, so once you filter the dates, RUNNING_SUM only sees the rows that remain in the viz.
Your best approach would be to add the running sum to the data source before you bring it into Tableau. Then the running sum isn't a table calculation that depends on the data in the viz.
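A small Python sketch contrasting a running total precomputed in the data source with one computed after the date filter. It assumes (date, amount) entries already combined across accounts; the function names and sample values are hypothetical.

```python
from datetime import date
from itertools import accumulate

def add_running_total(entries):
    """Attach a running total (over all accounts) to each entry, in date order.
    This runs on the full data set, before any date filter."""
    ordered = sorted(entries, key=lambda e: e[0])  # stable: same-day order is kept
    totals = accumulate(amount for _day, amount in ordered)
    return [(day, amount, round(total, 2))
            for (day, amount), total in zip(ordered, totals)]

def filter_dates(rows, start, end):
    """A date filter applied afterwards; the precomputed totals are unchanged."""
    return [row for row in rows if start <= row[0] <= end]

entries = [
    (date(2019, 5, 1), 100.00),
    (date(2019, 5, 3), 50.00),
    (date(2019, 5, 6), -19.75),
    (date(2019, 5, 8), 25.00),
]
visible = filter_dates(add_running_total(entries), date(2019, 5, 5), date(2019, 5, 8))
# A running sum over only the visible rows, like RUNNING_SUM after the filter:
after_filter = list(accumulate(amount for _day, amount, _total in visible))
print(visible)
print(after_filter)
```

The precomputed column keeps 130.25 and 155.25 for the visible dates, while the post-filter running sum restarts from the first visible row.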

kdb/q: use function in a select from partitioned table

I'm trying to get max drawdown from a partitioned table across multiple dates. The query works fine when run with a date constrained to a specific day. E.g.
select {max neg x-maxs x} pnl from trades where date=last date
It's getting map-reduced over multiple dates so the above query no longer works. I can make the query run over multiple dates by adding another aggregation:
select max {max neg x-maxs x} pnl from trades
but then it doesn't return the max drawdown over the continuous sequence of trades, only the maximum of the daily drawdowns.
I wonder if there's a way to make it work with a single select without chaining selects like
select {max neg x-maxs x} pnl from select pnl from trades
I've got a rather big query to pull a lot of various metrics on the trades where max drawdown is just one of them. Using chained select means that I need to break the big query into two queries, map-reduced and non-map-reduced, and then join them back which would make the query look ugly.
Thanks!
On a partitioned database, a select query runs on each date partition, applies the function to that date's values, and then aggregates the partial results depending on the call (user-defined functions behave differently from built-in q functions).
So I don't think you can combine that into one query. But there are ways to make your query more generalized and reusable for different scenarios.
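The difference between the two results can be shown with a short Python sketch. max_drawdown mirrors the q lambda {max neg x-maxs x}, the largest drop below the running maximum with the pnl values treated as a cumulative series; the two day lists are made-up partitions.

```python
from itertools import accumulate

def max_drawdown(pnl):
    """Python equivalent of q's {max neg x-maxs x}: the largest drop below the running peak."""
    peaks = accumulate(pnl, max)          # like q's maxs
    return max(peak - x for peak, x in zip(peaks, pnl))

# Made-up pnl series for two consecutive date partitions
day1 = [0, 5, 3, 8]
day2 = [6, 2, 4]

per_day = max(max_drawdown(day1), max_drawdown(day2))  # max of daily drawdowns
whole = max_drawdown(day1 + day2)                       # drawdown of the continuous series
print(per_day, whole)  # 4 6
```

The peak of 8 on day 1 only matters once the two days are treated as one series, which a per-partition evaluation cannot see.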
For example, convert your query to functional form and use variables in it for the column name and the user function. Put this in one function that accepts the column name and the user function. You can then call this function with different (column; function) pairs. Something like:
runF:{[col;usrfunc] ?[?[`trades;();0b;(enlist col)!enlist col];();0b;(enlist col)!enlist(usrfunc;col)]}
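The reusable (column; function) idea behind runF can also be sketched in Python. Here partitions are modeled as a list of dicts mapping column names to value lists, which is only a stand-in for date partitions; the hypothetical run_f gathers the whole column first (the non-map-reduced step, and the one that costs memory) and then applies the function once.

```python
from itertools import accumulate

def run_f(partitions, col, usrfunc):
    """Concatenate `col` across all partitions, then apply usrfunc to the full series."""
    values = [v for part in partitions for v in part[col]]
    return usrfunc(values)

def max_drawdown(xs):
    # same logic as q's {max neg x-maxs x}
    return max(peak - x for peak, x in zip(accumulate(xs, max), xs))

partitions = [{"pnl": [0, 5, 3, 8]}, {"pnl": [6, 2, 4]}]
metrics = {name: run_f(partitions, "pnl", f)
           for name, f in [("count", len), ("max_dd", max_drawdown)]}
print(metrics)
```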
All this depends on your use cases. Also check memory usage, as you'll be pulling a lot of data into memory.

MS Access 03 Query Criteria

If I have a report that tracks data for several accounts for each month with rows labeled:
UNITS,
REVENUE,
AVG REV/UNIT
How would I create a query that filters the report to show only accounts where UNITS has increased or decreased by 25% and AVG REV/UNIT has increased or decreased by 10% from the previous month to the current month?
For example, for the month of June I have these numbers:
JUN
UNITS 3,271
Revenue $3,598.10
Avg R/U $1.08
So when I run the report at the end of July I only want accounts that have a 25% difference in UNITS and/or a 10% difference in AVG REV/UNIT to show on a report.
qryPharmacy
SELECT PHAR_REPORT.*, (IIf(u1 Is Null,0,u1)+IIf(u2 Is Null,0,u2)+IIf(u3 Is Null,0,u3)+IIf(u4 Is Null,0,u4)+IIf(u5 Is Null,0,u5)+IIf(u6 Is Null,0,u6)+IIf(u7 Is Null,0,u7)+IIf(u8 Is Null,0,u8)+IIf(u9 Is Null,0,u9)+IIf(u10 Is Null,0,u10)+IIf(u11 Is Null,0,u11)+IIf(u12 Is Null,0,u12)) AS USUM, (IIf(r1 Is Null,0,r1)+IIf(r2 Is Null,0,r2)+IIf(r3 Is Null,0,r3)+IIf(r4 Is Null,0,r4)+IIf(r5 Is Null,0,r5)+IIf(r6 Is Null,0,r6)+IIf(r7 Is Null,0,r7)+IIf(r8 Is Null,0,r8)+IIf(r9 Is Null,0,r9)+IIf(r10 Is Null,0,r10)+IIf(r11 Is Null,0,r11)+IIf(r12 Is Null,0,r12)) AS RSUM, RMonth.*, PG2.*, PG.pGroup
FROM PHAR_REPORT, RMonth, PG2, PG
WHERE (((PHAR_REPORT.PR) Like ([PCODE] & '*')) And ((PG.pID)=PG2.PID))
ORDER BY PG2.pID, PHAR_REPORT.PR;
You should do it with more than one query. In the first query, select the data for the first month. In a second query, select the data for the month you want to compare it with. Then create a third query that joins the first two (be careful to get the relationship right). Do the grouping and calculations in these queries.
In the third query, create two fields that calculate the increase/decrease for units and for rev/unit. You can then add a criterion to each of those fields in the query columns.
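The calculation in the third query can be sketched in Python to check the logic before writing the Access criteria. Each month is modeled as a dict mapping account to (units, revenue); the June figures for account A come from the question, while the July figures and the other accounts are made up. Because the question says "and/or", an account is kept if either threshold is met, and only accounts present in both months are compared, like an inner join. The function name flag_accounts is hypothetical.

```python
def flag_accounts(prev, curr, units_pct=25.0, avg_pct=10.0):
    """Return accounts whose UNITS changed by at least units_pct percent, or whose
    AVG REV/UNIT changed by at least avg_pct percent, from prev to curr."""
    flagged = []
    for account in sorted(curr):
        if account not in prev:        # inner join: no previous month, no comparison
            continue
        units_then, rev_then = prev[account]
        units_now, rev_now = curr[account]
        if units_then == 0:
            continue                   # no baseline for a percentage change
        units_change = abs(units_now - units_then) / units_then * 100
        avg_change = 0.0
        if units_now != 0 and rev_then != 0:
            avg_then = rev_then / units_then
            avg_now = rev_now / units_now
            avg_change = abs(avg_now - avg_then) / avg_then * 100
        if units_change >= units_pct or avg_change >= avg_pct:
            flagged.append(account)
    return flagged

june = {"A": (3271, 3598.10), "B": (1000, 1000.00), "C": (500, 1000.00)}
july = {"A": (2400, 2640.00), "B": (1050, 1200.00), "C": (510, 1030.00), "D": (100, 100.00)}
print(flag_accounts(june, july))  # ['A', 'B']
```

Account A is flagged on units (about 26.6% down), B on average revenue per unit (about 14.3% up), C moves too little, and D is dropped because it has no June row.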
The challenge here is to be sure how you handle the primary keys across months. For example, if a row A in the first query isn't in the second (because it had no activity in the second month, say), it will not be shown. In that case, the solution is to build the queries around a table or query that contains the entire set of records, forcing all the desired records to show whether or not they have occurrences.
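The point about missing months can be illustrated with a Python sketch that drives the comparison from the complete account list, which is what a LEFT JOIN from an accounts table (or query) gives you in Access. The function name, the units-only layout and the zero default for a missing month are assumptions.

```python
def compare_all_accounts(all_accounts, prev_units, curr_units):
    """Left-join style comparison: every account in all_accounts appears once,
    with 0 units for a month in which it had no records."""
    return [(account, prev_units.get(account, 0), curr_units.get(account, 0))
            for account in sorted(all_accounts)]

all_accounts = {"A", "B", "C"}
june = {"A": 100, "B": 40}   # C had no activity in June
july = {"A": 130, "C": 20}   # B had no activity in July
for row in compare_all_accounts(all_accounts, june, july):
    print(row)
```

Joining only june to july would keep account A alone; starting from the full account list keeps B and C visible with a zero for the missing month.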
