I have a feeling there may not be an easy answer to this question.
Let's assume this is my decision table, which operates on an object instance called "input".
CONDITION      CONDITION      ACTION
a == $param    b != $param    input.setC($param)
1              5              11
1              6              11
My case is that if a is 1 and b is not in (5, 6), then c should be set to 11.
However, if b is 6, the first rule will still fire since b is not 5, thus setting c to 11.
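The row-by-row behaviour described above can be reproduced with a small Python model. This is only a sketch of the semantics, not Drools code: the `ROWS` tuples and both function names are hypothetical, and it assumes that each spreadsheet row compiles to an independent rule.

```python
# Hypothetical model of the decision table: each row is (a_value, b_value, c_value).
ROWS = [(1, 5, 11), (1, 6, 11)]


def fire_row_by_row(a, b, rows):
    """Each row is its own rule: it fires when a == a_value and b != b_value."""
    return [c for (a_value, b_value, c) in rows if a == a_value and b != b_value]


def fire_as_single_rule(a, b, rows):
    """Rows sharing the same a_value and c_value are merged into one rule
    that fires only when b differs from every listed b_value."""
    excluded = {}
    for a_value, b_value, c in rows:
        excluded.setdefault((a_value, c), set()).add(b_value)
    return [c for (a_value, c), b_values in excluded.items()
            if a == a_value and b not in b_values]


print(fire_row_by_row(1, 6, ROWS))      # [11]  (the "b != 5" row still fires)
print(fire_as_single_rule(1, 6, ROWS))  # []
print(fire_as_single_rule(1, 7, ROWS))  # [11]
```

The second function is the behaviour being asked for: the rows act as one combined "not in" test rather than as separate rules.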
I would like to keep the organization of the columns without having to put multiple values in a column.
QUESTION: Is there some sort of header I can use which basically turns the decision table into a single rule, where b will not be in any of the rows where a is 1? Or some alternative method?
I am tempted to go with the negation of the rule:
CONDITION      CONDITION      ACTION
a == $param    b == $param    input.setC($param)
1              1              11
1              2              11
1              3              11
1              4              11
1              7              11
1              8              11
There are many more rows in the real table, which makes this approach more difficult to maintain.
If you are using an XLS decision table, then a table similar to this one should work.
If you are familiar with the Drools or jBPM workbench, I can also provide a solution based on Guided Decision Tables.
Hope this helps and let me know.
Related
In SQL, JOIN and INNER JOIN mean the same thing. In Matlab, they are different commands. Just from perusing the documentation thus far, they appear on the surface to fulfill the same general function, with possible differences in the details, as controlled by parameters. I am slogging through the individual examples and may (or may not) find the fundamental difference. However, I feel that the difference should not be a subtlety that users have to ferret out of the examples. These are two separate commands, and the documentation should make it clear up front why they are both needed. Would anyone be able to chime in about the key difference? Perhaps it could become a request to place it front and centre in the documentation.
I've empirically characterized the difference between JOIN and INNERJOIN (some would refer to this as reverse engineering). I'll summarize from the perspective of one who is comfortable with SQL. As I am new to SQL-like operations in Matlab, I've only been able to test drive it to a limited degree, but the INNERJOIN appears to join records in the same manner as SQL. Since SQL is a pretty open language, the behavioural specification of INNERJOIN is readily available, and I won't dwell on that. It's Matlab's JOIN that I need to suss out.
In short, from my testing, Matlab's JOIN seems to "join" the rows in the two operand tables in a manner more like Excel's VLOOKUP than any of the JOINs in SQL. In general, the main differences from SQL joins seem to be (i) that the right hand table cannot have repeating values in the columns used for matching up rows between the two tables and (ii) that all combinations of values in the key columns of the left hand table must show up in the right hand table.
Here is the empirical testing. First, prepare the test tables:
a=array2table([
1 2
3 4
5 4
],'VariableNames',{'col1','col2'})
b=array2table([
4 7
4 8
6 9
],'VariableNames',{'col2','col3'})
c=array2table([
2 10
4 8
6 9
],'VariableNames',{'col2','col3'})
d=array2table([
2 10
4 8
6 9
6 11
],'VariableNames',{'col2','col3'})
a2=array2table([
1 2
3 4
5 4
20 99
],'VariableNames',{'col1','col2'})
Here are the tests:
>> join(a,b)
Error using table/join (line 111)
The key variable for B must have unique values.
>> join(a,c)
ans =
    col1    col2    col3
    ____    ____    ____
    1       2       10
    3       4        8
    5       4        8
>> join(a,d)
Error using table/join (line 111)
The key variable for B must have unique values.
>> join(a2,c)
Error using table/join (line 130)
The key variable for B must contain all values in the key
variable for A.
The first thing to notice is that JOIN is not a symmetric operation with respect to the two tables.
It seems that the 2nd table argument is used as a lookup table. Unlike SQL joins, Matlab throws an error if it can't find a match in the 2nd table [See join(a2,c)]. This is somewhat hinted at in the documentation, though not entirely clearly. For example, it says that the key values must be common to both tables, but join(a,c) clearly shows that not every key value in the 2nd table has to appear in the 1st (c contains the key value 6, which a does not). On the contrary, just as one would expect of a lookup table, entries in the 2nd table that aren't matched do not throw errors.
Another difference from SQL joins is that records that cause the key values to repeat in the 2nd table are not allowed in Matlab's join [See join(a,b) & join(a,d)]. In contrast, the fields used for matching records between tables aren't even referred to as keys in SQL, and hence can have non-unique values in either of the two tables. The disallowance of repeated key values in the 2nd table is consistent with the view of the 2nd table as a lookup table. On the other hand, repetition of key values is permitted in the 1st table [See join(a,c), where the key value 4 appears twice in a].
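The lookup-table reading of JOIN described above can be sketched in plain Python. This is a model of the observed behaviour only, not Matlab's implementation: `lookup_join` is a hypothetical name, tables are represented as lists of dicts, and the order of the two checks is an assumption.

```python
def lookup_join(left, right, key):
    """Model of the observed JOIN behaviour: 'right' acts as a lookup table."""
    index = {}
    for row in right:
        if row[key] in index:
            # Mirrors the error seen for join(a,b) and join(a,d).
            raise ValueError("The key variable for B must have unique values.")
        index[row[key]] = row
    joined = []
    for row in left:
        if row[key] not in index:
            # Mirrors the error seen for join(a2,c).
            raise ValueError("The key variable for B must contain all values "
                             "in the key variable for A.")
        joined.append({**row, **index[row[key]]})
    return joined


def table(names, rows):
    """Build a list-of-dicts table from column names and row tuples."""
    return [dict(zip(names, r)) for r in rows]


a = table(["col1", "col2"], [(1, 2), (3, 4), (5, 4)])
a2 = table(["col1", "col2"], [(1, 2), (3, 4), (5, 4), (20, 99)])
b = table(["col2", "col3"], [(4, 7), (4, 8), (6, 9)])
c = table(["col2", "col3"], [(2, 10), (4, 8), (6, 9)])
d = table(["col2", "col3"], [(2, 10), (4, 8), (6, 9), (6, 11)])

for row in lookup_join(a, c, "col2"):
    print(row)
```

Repeated keys in the left table are fine (the value 4 appears twice in `a`), unmatched keys in the right table are ignored (the value 6 in `c`), and the two error cases correspond to the two error messages in the tests.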
How would I represent the below tree structure with the values in each node in kdb?
a : 4
  b : 3
    c : 1
    d : 2
  e : 7
    f : 5
      g : 2
I would also need to set up a function to sum the values at the nodes.
Any tips appreciated.
There are different approaches you can try.
For example, a TreeTable: a table with a parent column and a child column.
A treetable is a table with additional properties.
Firstly, the records of the table are related hierarchically. Thus, a record may have one or more child-records, which may in turn have children. If a record has a parent, it has exactly one. A record without a parent is called a root record. A record without any children is called a leaf record. A record with children is called a node record.
The following paper explains TreeTable: http://archive.vector.org.uk/art10500340
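The parent/child idea can be sketched outside kdb as follows. This is a minimal Python illustration, not the TreeTable implementation from the paper: the `(node, parent, value)` row layout and the `subtree_sum` name are hypothetical, and the parent links are an assumption inferred from the nested-dictionary example in this thread.

```python
from collections import defaultdict

# Each record is (node, parent, value); the root record has parent None.
ROWS = [
    ("a", None, 4),
    ("b", "a", 3),
    ("c", "b", 1),
    ("d", "b", 2),
    ("e", "a", 7),
    ("f", "e", 5),
    ("g", "f", 2),
]


def subtree_sum(rows, root):
    """Sum the value of 'root' and of every record below it."""
    children = defaultdict(list)
    value = {}
    for node, parent, val in rows:
        value[node] = val
        children[parent].append(node)
    total = 0
    stack = [root]
    while stack:
        node = stack.pop()
        total += value[node]
        stack.extend(children[node])
    return total


print(subtree_sum(ROWS, "a"))  # 24
print(subtree_sum(ROWS, "e"))  # 14
```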
A nested-dictionary approach would be:
q)dict.a.b.c:1
q)dict.a.b.d:2
q)dict.a.b[`]:3
q)dict.a[`]:4
q)dict.a.e.f.g:2
q)dict.a.e.f[`]:5
q)dict.a.e[`]:7
/note that it is important that each node is defined starting at the deepest nodes and working backwards to the top level nodes.
/to see the hierarchy use dict.a or dict.a.e etc, or more dynamically use
q)#/[dict;`a]
| 4
b| ``c`d!3 1 2
e| ``f!(7;``g!5 2)
/to get the values at individual nodes
q)first #/[dict;`a`e`f]
5
q)first #/[dict;`a`e`f`g]
2
/to find all values under a node
q){raze $[99h=type x;.z.s each x;x]}dict.a
4 3 1 2 7 5 2
q){raze $[99h=type x;.z.s each x;x]}#/[dict;`a`e]
7 5 2
/to sum all values under a node
q)sum {raze $[99h=type x;.z.s each x;x]}dict.a
24
q)sum {raze $[99h=type x;.z.s each x;x]}#/[dict;`a`e]
14
Obviously these can all be wrapped up into nice neat functions if necessary.
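For readers less familiar with q, the same recursion can be sketched in Python. This is a rough analogue, not a translation of the q session: the empty-string key stands in for the null symbol that holds a node's own value, and `node_at` and `values_under` are hypothetical names.

```python
# Nested dictionaries; the "" key holds the value stored at the node itself.
TREE = {
    "a": {
        "": 4,
        "b": {"": 3, "c": 1, "d": 2},
        "e": {"": 7, "f": {"": 5, "g": 2}},
    }
}


def node_at(tree, path):
    """Walk down the tree following the keys in 'path'."""
    for key in path:
        tree = tree[key]
    return tree


def values_under(node):
    """Flatten every value at or below 'node' (recurses into dictionaries)."""
    if isinstance(node, dict):
        return [v for child in node.values() for v in values_under(child)]
    return [node]


print(values_under(node_at(TREE, ["a"])))            # [4, 3, 1, 2, 7, 5, 2]
print(sum(values_under(node_at(TREE, ["a"]))))       # 24
print(sum(values_under(node_at(TREE, ["a", "e"]))))  # 14
```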
IMO it depends on how you plan to "query" the tree and what you want out of it. Can you elaborate further? Do you want sub-trees? E.g. if you query for "a", should it give you "4" or the whole sub-tree (in this case, the whole tree)?
If memory isn't an issue, or your tree is small, you could have either a nested dictionary of dictionaries, with a function that recurses inside the dictionaries, or you could use the path of letter symbols as the key of a single dictionary:
q)d:(enlist enlist `a)!enlist 4
q)d,:(enlist `a`b)!enlist 3
q)d,:(enlist `a`b`c)!enlist 1
q)d
,`a | 4
`a`b | 3
`a`b`c| 1
q)d`a
4
q)d`a`b
3
q)d`a`b`c
1
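The single-dictionary layout also makes the summing function straightforward: a subtree is every key that starts with a given path. Below is a minimal Python sketch of that idea, assuming tuple keys in place of q symbol lists. The entries beyond the three shown above are filled in from the tree in the question, and `subtree_sum` is a hypothetical name.

```python
# One flat dictionary keyed by the full path to each node.
TREE = {
    ("a",): 4,
    ("a", "b"): 3,
    ("a", "b", "c"): 1,
    ("a", "b", "d"): 2,
    ("a", "e"): 7,
    ("a", "e", "f"): 5,
    ("a", "e", "f", "g"): 2,
}


def subtree_sum(tree, prefix):
    """Sum every value whose path starts with 'prefix'."""
    prefix = tuple(prefix)
    return sum(v for path, v in tree.items() if path[:len(prefix)] == prefix)


print(TREE[("a", "b")])               # 3
print(subtree_sum(TREE, ["a"]))       # 24
print(subtree_sum(TREE, ["a", "e"]))  # 14
```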
I am having difficulty with a conditional format of a form. I have three conditions and two work.
Conditions 2 & 3 work fine. Condition 1 is another story. I have added condition one after I knew 2 & 3 were working. I basically added a 3rd condition then swapped 1 for 3 out of fear that after condition one was met the conditions stopped. Now condition 1 is the primary. What have I done wrong for condition 1 not to show?
I am looking at this problem from a TSQL point of view, however any advice would be appreciated.
Scenario
I have 2 sets of criteria which identify items in a warehouse to be selected.
Query 1 returns 100 items
Query 2 returns 100 items
I need to pick any 25 of the 100 items returned in query 1.
I need to pick any 25 of the 100 items returned in query 2.
- The items in query 1/2 will not be the same, ever.
Each item is stored in a segment of the warehouse.
A segment of the warehouse may contain numerous items.
I wish to select the 50 items (25 from each query) in such a way as to reduce the number of segments I must visit to select the items.
Suggested Approach
My initial idea has been to combine the 2 result sets and produce a list of
Segment ID, NumberOfItemsRequiredInSegment
I would then select 25 items from each query, giving preference to those in the segments with the highest NumberOfItemsRequiredInSegment. I know this would not be optimal, but it would be an easy-to-implement heuristic.
Questions
1) I suspect this is a standard combinatorial problem, but I don't recognise it. Perhaps multiple knapsack? Does anyone recognise it?
2) Is there a better (easy-ish to implement) heuristic or solution, ideally in TSQL?
Many thanks.
This might not be optimal either, but I think it would at least perform fairly well.
Calculate this set for query 1.
Segment ID, NumberOfItemsRequiredInSegment
Take the top 25 items, simply by sorting by NumberOfItemsRequiredInSegment (descending). Call this subset A.
Take the top 25 from query 2 by joining to A and sorting (descending) by "case when A.segmentID is not null then 1 else 0 end, NumberOfItemsRequiredInSegmentFromQuery2".
Repeat this, but take the top 25 from query 2 first. Return the better performing of the two sets.
The one scenario where I think this fails would be if you got something like this.
Segment    Count Query 1    Count Query 2
A          10               1
B          5                1
C          5                1
D          5                4
E          5                4
F          4                4
G          4                5
H          1                5
J          1                5
K          1                10
You need to make sure you choose A, D and E when choosing the best segments from query 1. To deal with this you'd almost certainly still need to join to query 2, so you can get the count from there to use as a tie breaker.
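The combined-count heuristic from the question, which uses the other query's counts in exactly this tie-breaking role, can be sketched in Python before committing to TSQL. This is an illustration under stated assumptions, not an optimal solver: items are hypothetical `(item_id, segment_id)` pairs generated from the counts in the table above, and `pick_items` and `make_items` are made-up names.

```python
from collections import Counter


def pick_items(query1, query2, k):
    """Pick k items from each query, preferring segments that hold the
    most items across both queries combined (greedy, not optimal)."""
    combined = Counter(seg for _, seg in query1) + Counter(seg for _, seg in query2)

    def take(items):
        # Highest combined segment count first; ties broken by segment, then item id.
        ranked = sorted(items, key=lambda it: (-combined[it[1]], it[1], it[0]))
        return ranked[:k]

    return take(query1), take(query2)


def make_items(prefix, counts):
    """Generate hypothetical item ids such as 'q1-A-0' from per-segment counts."""
    return [(f"{prefix}-{seg}-{i}", seg) for seg, n in counts.items() for i in range(n)]


COUNTS_1 = {"A": 10, "B": 5, "C": 5, "D": 5, "E": 5, "F": 4, "G": 4, "H": 1, "J": 1, "K": 1}
COUNTS_2 = {"A": 1, "B": 1, "C": 1, "D": 4, "E": 4, "F": 4, "G": 5, "H": 5, "J": 5, "K": 10}

picked1, picked2 = pick_items(make_items("q1", COUNTS_1), make_items("q2", COUNTS_2), 25)
segments = {seg for _, seg in picked1 + picked2}
print(sorted(segments))  # ['A', 'D', 'E', 'F', 'G', 'K']
```

On this data the sketch visits six segments. Because both picks share one ranking of segments, it avoids choosing B and C for query 1, though it remains a heuristic and may miss the true minimum on other data.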
Business problem - understand process fallout using analytics data.
Here is what we have done so far:
Build a dictionary table with every possible process step
Find each process "start"
Find the last step for each start
Join dictionary table to last step to find path to final step
In the final report output we end up with a list of paths for each start to each final step:
User    Fallout Step HierarchyID.ToString()
A       1/1/1
B       1/1/1/1/1
C       1/1/1/1
D       1/1/1
E       1/1
What this means is that five users (A-E) started the process. Assume only User B finished; the other four did not. Since this is a simple example (without branching), we want the output to look as follows:
Step    Unique Users
1       5
2       5
3       4
4       2
5       1
The easiest solution I could think of is to take each hierarchyID.ToString(), parse that out into a set of subpaths, JOIN back to the dictionary table, and output using GROUP BY.
Given the volume of data, I'd like to use the built-in HierarchyID functions, e.g. IsAncestorOf.
Any ideas or thoughts how I could write this? Maybe a recursive CTE?
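The subpath idea can be prototyped outside SQL first. The following Python sketch expands each user's final path into all of its ancestor prefixes and counts distinct users per prefix. It does not use HierarchyID functions: `users_per_path` is a hypothetical name, and the path strings are taken as they appear in the sample above.

```python
from collections import defaultdict

# Final step reached by each user, as in the sample report above.
FALLOUT = {
    "A": "1/1/1",
    "B": "1/1/1/1/1",
    "C": "1/1/1/1",
    "D": "1/1/1",
    "E": "1/1",
}


def users_per_path(fallout):
    """Count distinct users that reached each prefix (ancestor) of their final path."""
    reached = defaultdict(set)
    for user, path in fallout.items():
        parts = [p for p in path.split("/") if p]
        for depth in range(1, len(parts) + 1):
            reached["/".join(parts[:depth])].add(user)
    return {path: len(users) for path, users in reached.items()}


for path, count in sorted(users_per_path(FALLOUT).items(), key=lambda kv: len(kv[0])):
    print(path, count)
```

Keying on the full prefix rather than on depth alone means the same counting should carry over to branching processes, where two different paths can share a depth.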
Restructuring the data may help with this. For example, structuring the data like this:
User  Step  Process#
----  ----  --------
A     1     1
A     2     1
A     3     1
B     1     2
B     2     2
B     3     2
B     4     2
B     5     2
E     1     3
E     2     3
E     1     4
E     2     4
E     3     4
Allows you to run the following query:
-- [user] is bracketed because USER is a reserved word in T-SQL
select step,
       count(distinct process#) as process_iterations,
       count(distinct [user]) as unique_users
from stepdata
group by step
order by step;
which returns:
Step  Process_Iterations  Unique_Users
----  ------------------  ------------
1     4                   3
2     4                   3
3     3                   3
4     1                   1
5     1                   1
I'm not familiar with hierarchyid, but splitting out that data into chunks for analysis looks like the sort of problem numbers tables are very good for. Join a numbers table against the individual substrings in the fallout and it shouldn't be too hard to treat the whole thing as a table and analyse it on the fly, without any non-set operations.