Drools, accumulate sum and then max

My Java class Task has the members groupId (String) and duration (long). I want to sum the task durations within each group. Say group1 has task1, task2 and task3 with durations d1, d2 and d3, so its sum is s1 = d1 + d2 + d3. I then want the largest sum across all groups: if group1's sum is s1 and group2's is s2, I want $smax = max(s1, s2, ...), and then I want to do something with $smax in "then". How can I do this? There is no group class; a String named groupId identifies the group. How do I specify "same-group-id"? Do I have to introduce a group class? And how do I get $smax? Many thanks.
when
accumulate(Task("same-group-id", $d: duration);
$s: sum($d))
accumulate($smax: max($s))
then
// do something with $smax
Further question: Roddy's answer works perfectly in Drools, but in my case it's used in OptaPlanner, so Task must appear in "when", otherwise the rule will not be triggered. Is there any way to merge Roddy's two rules into one? The goal is still to get the largest group duration in "then" so that something can be done with it. I can make changes in Java if needed.

You've got a bit of a wall of text going on here, but I'll try to parse out what you're trying to do.
You have Task objects in working memory with String groupId and long duration members
You want to group Tasks by the groupId and then sum up the durations for each group. --> basically get the total duration for each group.
Finally you want to find the longest of those summed durations. --> Effectively, you want to know which group has the longest duration.
If I'm avoiding using inline custom code in my accumulate, I'd probably do something like this:
declare GroupDuration
id : String
duration : long
end
rule "Find Group Durations"
salience 1
when
Task( $id: groupId )
not( GroupDuration( id == $id ))
accumulate( Task( groupId == $id, $d: duration);
$duration: sum($d))
then
GroupDuration d = new GroupDuration();
d.setId($id);
d.setDuration($duration.longValue()); // sum() yields a Number, so convert explicitly
insert(d);
end
rule "Find Longest Group Duration"
when
GroupDuration( $id: id, $duration: duration )
not( GroupDuration( duration > $duration ))
then
// $id is the longest
end
I start off by declaring a new type GroupDuration to track the id and total duration of the group. Then I populate working memory with those durations in the first rule, using accumulate to sum up all durations for tasks that have matching IDs. Finally, I find the instance with the largest duration and yank its id.
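The same two-step logic can be sketched outside the rule engine in plain Python: first sum the durations per groupId, then keep the group that no other group beats. This is only an illustration. The Task tuple and the group_totals/longest_group functions are hypothetical stand-ins for the Java class and the two rules, not Drools API.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class Task(NamedTuple):
    """Hypothetical stand-in for the Java Task class (groupId, duration)."""
    group_id: str
    duration: int


def group_totals(tasks: Iterable[Task]) -> Dict[str, int]:
    """Step 1, like rule "Find Group Durations": total duration per groupId."""
    totals: Dict[str, int] = defaultdict(int)
    for task in tasks:
        totals[task.group_id] += task.duration
    return dict(totals)


def longest_group(tasks: Iterable[Task]) -> Tuple[str, int]:
    """Step 2, like rule "Find Longest Group Duration": the group with the largest total.

    Raises ValueError when there are no tasks; on ties, max() keeps the first group seen.
    """
    totals = group_totals(tasks)
    return max(totals.items(), key=lambda item: item[1])


tasks = [Task("group1", 3), Task("group1", 4), Task("group2", 5),
         Task("group2", 1), Task("group3", 6)]
print(longest_group(tasks))  # ('group1', 7)
```

One difference is worth noting. When totals are tied, the second DRL rule fires once for each tied group, because none of them has a strictly larger rival. The max() call here returns only the first tied group.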

Related

Selecting max value grouped by specific column

Focused DB tables:
Task:
For a given location ID and culture ID, get max(crop_yield.value) * culture_price.price (let's call this product monetaryGain) grouped by year, something like:
[
{
"year":2014,
"monetaryGain":...
},
{
"year":2015,
"monetaryGain":...
},
{
"year":2016,
"monetaryGain":...
},
...
]
Attempt:
SELECT cp.price * max(cy.value) AS monetaryGain, EXTRACT(YEAR FROM cy.date) AS year
FROM culture_price AS cp
JOIN culture AS c ON cp.id_culture = c.id
JOIN crop_yield AS cy ON cy.id_culture = c.id
WHERE c.id = :cultureId AND cy.id_location = :locationId AND cp.year = year
GROUP BY year
ORDER BY year
The problem:
"columns "cp.price", "cy.value" and "cy.date" must appear in the GROUP BY clause or be used in an aggregate function"
If I put these three columns in GROUP BY, I won't get the expected result: obviously, it won't be grouped by year alone.
Does anyone have an idea how to fix this query, or write it better, to get the result described in the task?
Thanks in advance!

The fix
Rewrite monetaryGain to be:
max(cp.price * cy.value) AS monetaryGain
That way you don't need to group by cp.price, because it is no longer output as a grouping column but used inside an aggregate.
Why?
When you write a GROUP BY query, you can output only the columns in the GROUP BY list and aggregate function values. This is expected: you get a single row per group, but a column that is not in the grouping list may have several distinct values within that group.
For the same reason, you cannot use non-grouping columns in arithmetic or in any other non-aggregate function. That could produce several results for a single row, and there would be no way to display them.
This is a VERY loose explanation, but I hope it helps you grasp the concept.
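To make the rule concrete, here is a small Python emulation of the rewritten query. It runs on a few hypothetical joined rows, and the data is invented. Each year's group holds several price * value results, and only an aggregate such as max can reduce them to the single value that one output row can hold.

```python
from collections import defaultdict

# Hypothetical joined rows (year, culture_price.price, crop_yield.value); made-up numbers.
rows = [
    (2014, 2.0, 10.0),
    (2014, 2.0, 12.5),
    (2015, 3.0, 8.0),
    (2015, 3.0, 9.0),
]


def monetary_gain_by_year(rows):
    """Emulate SELECT year, max(price * value) AS monetaryGain ... GROUP BY year."""
    per_year = defaultdict(list)
    for year, price, value in rows:
        # price * value is evaluated per row, inside the aggregate
        per_year[year].append(price * value)
    # max() collapses each group to exactly one value, so each year yields one row
    return [{"year": year, "monetaryGain": max(gains)}
            for year, gains in sorted(per_year.items())]


print(monetary_gain_by_year(rows))
# [{'year': 2014, 'monetaryGain': 25.0}, {'year': 2015, 'monetaryGain': 27.0}]
```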
Aliases in GROUP BY
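Also, you should not rely on the alias in GROUP BY here. Use:
GROUP BY EXTRACT(YEAR FROM cy.date)
PostgreSQL resolves an ambiguous bare name in GROUP BY as an input column before it considers an output alias. Because culture_price has its own year column (cp.year), GROUP BY year can end up grouping by cp.year instead of by the extracted year. This link might explain more: https://www.postgresql.org/message-id/7608.1259177709%40sss.pgh.pa.us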

Optimize KDB query time to get rolling average price from each contributor

Each time a contributor gives an updated price I want to use this quote along with the latest prices of other quotes to calculate the total average at that moment.
t:`time xasc flip (`userID`time`price)!(`quote1`quote2`quote3`quote3`quote3`quote3`quote4`quote2`quote4`quote3`quote2`quote3`quote1`quote3`quote4`quote1`quote4`quote2`quote2`quote4;(21:11:37 03:13:29 15:35:39 09:59:13 04:34:15 13:09:01 21:21:55 16:54:39 04:03:04 18:22:39 17:05:44 05:08:40 07:35:50 15:46:15 17:32:29 19:42:47 03:28:48 04:20:03 14:16:55 09:02:12);86.4 84.4 54.26 7.76 63.75 97.61 53.97 71.63 38.86 52.23 87.25 65.69 96.25 37.15 17.45 58.97 95.51 61.59 70.25 35.5)
The query below produces the desired output:
delete userIDPriceList,userIDComps from t,'raze {[idx;tab] select avgPrice:avg price, userIDPriceList:price,userIDComps:userID from select last price by userID from t where i <= idx}[;t] each til count t
The userIDPriceList and userIDComps columns are not required in the final output.
Performance is slow, and I'm looking for a better way to calculate this.
q) \t do[200000;delete userIDPriceList,userIDComps from t,'raze {[idx;tab] select avgPrice:avg price, userIDPriceList:price,userIDComps:userID from select last price by userID from t where i <= idx}[;t] each til count t]
10152j
Thanks in advance

Based on your clarified requirements, another approach is to accumulate using scan:
update avgPrice:avg each{x,(1#y)!1#z}\[();userID;price] from t
Igor's solution is faster if the data is static (i.e., you can prep the table with the attribute once).
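For readers less familiar with q, here is a Python sketch of the idea behind the scan above: upsert each user's latest price into a dictionary, then average the dictionary after every row. It illustrates the technique and is not a translation of Igor's attribute-based solution. The function name is hypothetical, and the running total is a choice made here, not part of the q code.

```python
from typing import Dict, Iterable, List, Tuple


def rolling_latest_avg(rows: Iterable[Tuple[str, float]]) -> List[float]:
    """For each (userID, price) row, sorted by time, average the latest price of every
    userID seen so far.

    Mirrors the scan x,(1#y)!1#z: keep a dict of the last price per user and upsert on
    each row. A running total replaces re-summing the dict, so each row costs O(1).
    """
    latest: Dict[str, float] = {}
    total = 0.0
    averages = []
    for user, price in rows:
        total += price - latest.get(user, 0.0)  # swap this user's old quote for the new one
        latest[user] = price
        averages.append(total / len(latest))
    return averages


print(rolling_latest_avg([("quote1", 10.0), ("quote2", 20.0), ("quote1", 30.0)]))
# [10.0, 15.0, 25.0]
```

Over very long runs, the running total can accumulate floating-point rounding error. Re-summing latest.values() on each row avoids that at O(number of users) per row.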

The code below gives the average of all previous prices for the given userID, including the current row:
ungroup 0!select time, price, avgPrice: avgs price by userID from t
Just ensure that t is appropriately sorted by time before getting averages.
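Here is a Python sketch of the avgs-by-userID idea: a per-user running sum and count give the cumulative average for each row. This is a different quantity from the cross-user average the question asks for. The function name is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def per_user_running_avg(rows: Iterable[Tuple[str, float]]) -> List[float]:
    """Average of all prices so far for the row's own userID, current row included.

    Rows must already be sorted by time; the output keeps the input row order.
    """
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    out = []
    for user, price in rows:
        sums[user] = sums.get(user, 0.0) + price
        counts[user] = counts.get(user, 0) + 1
        out.append(sums[user] / counts[user])
    return out


print(per_user_running_avg([("quote1", 10.0), ("quote2", 20.0), ("quote1", 30.0)]))
# [10.0, 20.0, 20.0]
```

The q version groups by userID before ungrouping, so its rows come out together per userID rather than in time order. The sketch keeps the input order instead.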

According to your comment to one of the answers, you're "trying to take the average prices of each userID as of the time of the record while ignoring any future records."
This query will do exactly that:
select userID,time,price,avgPrice:(avgs;price)fby userID from t
Your query (delete userIDPriceList ...) produces something different, as @Anton Dovzhenko pointed out in his comment on your original question.
Update
After reading your comment, I think I understand your requirement. You could probably do this:
prices:exec `s#time!price by userID from t;
update avgPrice:avg each flip prices[;time] from t
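The `s# attribute makes each per-user dictionary behave as a step function. A lookup at time t returns the price at the last key at or before t, so prices[;time] is an as-of lookup for every user at once. Below is a rough Python equivalent of that lookup using bisect. It is a sketch under those assumptions, and the function name is hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import List, Tuple


def asof_avg(rows: List[Tuple[str, int, float]]) -> List[float]:
    """For each (userID, time, price) row, average every user's latest price at or before that time.

    Emulates prices[;time] on step (`s#) dictionaries: each user's quotes form a sorted
    time -> price series, and a lookup returns the value at the last time <= t.
    Users with no quote yet are skipped, much as q's avg ignores nulls.
    """
    series = defaultdict(lambda: ([], []))  # user -> (times, prices), times ascending
    for user, t, price in sorted(rows, key=lambda r: r[1]):
        times, prices = series[user]
        times.append(t)
        prices.append(price)
    averages = []
    for _, t, _ in rows:
        quotes = []
        for times, prices in series.values():
            i = bisect_right(times, t)  # number of this user's quotes at or before t
            if i:
                quotes.append(prices[i - 1])
        averages.append(sum(quotes) / len(quotes))
    return averages


print(asof_avg([("quote1", 1, 10.0), ("quote2", 2, 20.0), ("quote1", 3, 30.0)]))
# [10.0, 15.0, 25.0]
```

Unlike the scan, this lookup works by time rather than by row position. Rows that share a timestamp therefore all see each other's quotes.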

Indicate more than one record with matching fields

How can I indicate multiple records with the same invoice number but a different Sales Person ID? Our commissions can be split among multiple salespeople, so there can be two different salespeople per invoice.
For example:
Grouped by: Sales Person ID (this option cannot be changed)
These records are in the Group Footer.
Sales Person ID | Invoice | Invoice Amt | Commissions | Indicator
4433 | R100 | 20,000 | 3,025 | * More than one record on the same invoice with a different sales person
4450 | R096 | 1,987 | 320 |
4599 | R100 | 20,000 | 3,025 | * More than one record on the same invoice with a different sales person
4615 | R148 | 560 | 75 |
4777 | R122 | 2,574 | 356 |

If your report has fewer than 1,000 invoices, you may try something like this.
This will return true when a second occurrence of the invoice shows up. You can then use it to do something like setting the row background to red.
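Global StringVar Array invoices; // invoice numbers such as "R100" are strings
numbervar nextIndex := count(invoices) + 1;
if {Result.InvoiceNumber} in invoices then
    true // already seen: a second (or later) occurrence
else if nextIndex <= 1000 then (
    // first occurrence: remember it; Preserve keeps the earlier entries
    redim preserve invoices [nextIndex];
    invoices[nextIndex] := {Result.InvoiceNumber};
    false
)
else false;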
If you want to detect the first occurrence, you will need something more sophisticated.
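A small Python sketch shows why flagging the first occurrence needs more work. It is an illustration outside Crystal Reports, and the function names are hypothetical. A single pass only knows about invoices it has already seen, so it can flag repeats but not the first record. Flagging every record of a shared invoice requires looking at the whole dataset first, for example by collecting the salespeople per invoice.

```python
from collections import defaultdict
from typing import List, Tuple

# (Sales Person ID, Invoice) pairs taken from the example above.
records = [(4433, "R100"), (4450, "R096"), (4599, "R100"), (4615, "R148"), (4777, "R122")]


def flag_repeats(records: List[Tuple[int, str]]) -> List[bool]:
    """One pass, like the formula: True only from an invoice's second occurrence on."""
    seen = set()
    flags = []
    for _, invoice in records:
        flags.append(invoice in seen)
        seen.add(invoice)
    return flags


def flag_shared(records: List[Tuple[int, str]]) -> List[bool]:
    """Two passes: collect the salespeople per invoice, then flag every record whose
    invoice has more than one distinct salesperson (first occurrence included)."""
    people = defaultdict(set)
    for person, invoice in records:
        people[invoice].add(person)
    return [len(people[invoice]) > 1 for _, invoice in records]


print(flag_repeats(records))  # [False, False, True, False, False]
print(flag_shared(records))   # [True, False, True, False, False]
```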

I think a SQL Expression Field would be a good way to achieve the result you want. You already have an InvoiceNo in each row of data. You just need a SQL Expression Field that uses that InvoiceNo to run a query counting the salespersons who get a commission on that invoice.
Something along the lines of:
(
Select Count(Distinct t2.Sales_Person_Id)
From [Table] t2
Where t2.InvoiceNo = [Table].InvoiceNo
)
This will return an integer value that represents the number of salespersons who are associated with one invoice. You can either drop the SQL Expression Field in your Indicator column, or write some other formula to do something special.

MDX: Define dimension sub set and show the total

Since in MDX you can use the [All] member to get the aggregate of all the members of a dimension, I can show the total of a measure with a query like this:
SELECT {
[MyGroup].[MyDimension].[MyDimension].members,
[MyGroup].[MyDimension].[all]
} *
[Measures].[Quantity] on 0
FROM [MyDatabase]
Now I want to filter MyDimension for a bunch of values and show the total of the selected values, but of course if I generate the query
SELECT {
[MyGroup].[MyDimension].[MyDimension].&[MyValue1],
[MyGroup].[MyDimension].[MyDimension].&[MyValue2],
[MyGroup].[MyDimension].[all]
} *
[Measures].[Quantity] on 0
FROM [MyDatabase]
it shows the Quantity for MyValue1, MyValue2 and the total of all MyDimension members, not just the ones I selected.
I investigated a bit and came up with a solution that uses a subquery (sub-select) to filter my values:
SELECT {
[MyGroup].[MyDimension].[MyDimension].members, [MyGroup].[MyDimension].[all]
} * [Measures].[Quantity] ON 0
FROM (
SELECT {
[MyGroup].[MyDimension].[MyDimension].&[MyValue1],
[MyGroup].[MyDimension].[MyDimension].&[MyValue2]
} ON 0
FROM [MyDatabase]
)
Assuming this works, is there a simpler or more straightforward approach?
I tried to use the SET statement to define my custom tuple sets but then I couldn't manage to show the total.
Keep in mind that I kept my example as simple as possible. In real cases I could have multiple dimensions on both rows and columns, as well as multiple calculated measures defined with the MEMBER statement.
Thanks!

What you have done is standard - it is the simple way!
One thing to bear in mind when using a sub-select is that it is not a full filter: the original All is still available. I think this is related to how MDX processes the clauses of a query. Here is an example of what I mean:
WITH
MEMBER [Product].[Product Categories].[All].[All of the Products] AS
[Product].[Product Categories].[All]
SELECT
[Measures].[Internet Sales Amount] ON 0
,NON EMPTY
{
[Product].[Product Categories].[All] //X
,[Product].[Product Categories].[All].[All of the Products] //Y
,[Product].[Product Categories].[Category].MEMBERS
} ON 1
FROM
(
SELECT
{
[Product].[Product Categories].[Category].&[4]
,[Product].[Product Categories].[Category].&[1]
} ON 0
FROM [Adventure Works]
);
So the line marked X will be the sum of categories 4 and 1, but line Y will still refer to the whole of Adventure Works.
This behavior is useful although a little confusing when using All members in the WITH clause.
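Here is a toy Python model of the two rows, using invented category totals, to make the arithmetic of X versus Y explicit. It models the behavior described above and says nothing about how an MDX engine actually evaluates the query. The function names and numbers are hypothetical.

```python
# Hypothetical Internet Sales Amount per product category key; the numbers are invented.
sales_by_category = {1: 100.0, 2: 40.0, 3: 60.0, 4: 25.0}
subselect = [4, 1]  # the categories kept by the sub-select


def row_x(sales, members):
    """Row X: [All] evaluated inside the sub-select aggregates only the selected categories."""
    return sum(sales[m] for m in members)


def row_y(sales):
    """Row Y: per the answer, the WITH-clause member still totals the whole cube."""
    return sum(sales.values())


print(row_x(sales_by_category, subselect), row_y(sales_by_category))  # 125.0 225.0
```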

How can I order a group, based on a summary field of a subgroup

I have a report that essentially has this structure:
Order
OrderDetail 1
OrderDetail ..
OrderDetail n
These details can have parts and/or labour costs associated with them.
Currently, I group based on OrderId and then have the OrderDetail information in the details section of the report. This works perfectly.
However, I now need to group the Orders by two criteria: the OrderType and the LabourCost of the entire Order. I have put together a quick formula to determine the order:
if(Sum({order.Labour}, {order.OrderId})> 0) then
if({order.type} = "type1") then 1 else 2
else
if({order.type} = "type1") then 3 else 4
Basically, it should be sorted first on labour and then on type. (Sum({order.Labour}, {order.OrderId}) sums the labour for each OrderId group.)
However, when I go to the Group Expert, add a group based on this formula and then preview the report, it spins (I cancelled the preview after a minute). If I remove the Sum portion of the formula, the preview takes less than a second.
Is there a way to order this report?

How I would approach it:
First, create a SQL Expression field that calculates the labour total for each order:
// {%TOTAL_LABOR}
(
SELECT Sum(Labour)
FROM OrderDetail
WHERE OrderId=Order.OrderId
)
Next, create a formula field:
// {#OrderGroup}
if({%TOTAL_LABOR}> 0) then
if({order.type} = "type1") then 1 else 2
else
if({order.type} = "type1") then 3 else 4
Finally, create a new group based on the formula field, making sure it is placed before (outside) the Order group. You can suppress the group's header and footer if desired.
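The ordering logic can be sketched in Python with hypothetical order-detail rows. As in the answer, each order's labour total is computed once up front, which is the role {%TOTAL_LABOR} plays. The group number is then a plain function of that total and the order type, so the grouping key no longer depends on a group summary. All names and data here are illustrative.

```python
from collections import defaultdict

# Hypothetical order-detail rows: (order_id, order_type, labour_cost).
details = [
    (101, "type1", 0.0),
    (101, "type1", 0.0),
    (102, "type2", 50.0),
    (103, "type1", 20.0),
    (104, "type2", 0.0),
]


def order_group(total_labour, order_type):
    """Mirror of the {#OrderGroup} formula: orders with labour first, then type1 before others."""
    if total_labour > 0:
        return 1 if order_type == "type1" else 2
    return 3 if order_type == "type1" else 4


def group_orders(details):
    """Return order ids sorted by group number, then by order id."""
    totals, types = defaultdict(float), {}
    for order_id, order_type, labour in details:
        totals[order_id] += labour  # like {%TOTAL_LABOR}: one total per order
        types[order_id] = order_type
    keys = {oid: order_group(totals[oid], types[oid]) for oid in totals}
    return sorted(totals, key=lambda oid: (keys[oid], oid))


print(group_orders(details))  # [103, 102, 101, 104]
```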