Crystal Reports Cross-tab with mix of Sums, Percentages and Computed values

Being new to Crystal, I am unable to figure out how to compute rows 3 and 4 below.
Rows 1 and 2 are simple percentages of the sum of the data.
Row 3 is a computed value (see below).
Row 4 is a sum of the data points (NOT a percentage as in rows 1 and 2).
Can someone give me some pointers on how to generate the display below?
My data:
2010/01/01 A 10
2010/01/01 B 20
2010/01/01 C 30
2010/02/01 A 40
2010/02/01 B 50
2010/02/01 C 60
2010/03/01 A 70
2010/03/01 B 80
2010/03/01 C 90
I want to display
                      2010/01/01  2010/02/01  2010/03/01
                      ==========  ==========  ==========
[ B/(A + B + C) ]     20/60       50/150      80/240      <=== percentage of sum
[ C/(A + B + C) ]     30/60       60/150      90/240      <=== percentage of sum
[ 1 - A/(A + B + C) ] 1 - 10/60   1 - 40/150  1 - 70/240  <=== computed
[ (A + B + C) ]       60          150         240         <=== sum

Assuming you are using a SQL data source, I suggest deriving each of the output rows' values (i.e. [B/(A + B + C)], [C/(A + B + C)], [1 - A/(A + B + C)] and [(A + B + C)]) per date in the SQL query, then using Crystal's cross-tab feature to pivot them into the desired output format.
Crystal's cross-tabs aren't particularly suited to deriving different calculations on different rows of output.
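To make the derivation concrete, here is a minimal sketch in plain Python (illustrative only; the rows list simply mirrors the sample data, and a SQL GROUP BY over the date column would produce the same four values per date):

from collections import defaultdict

# Sample data: (date, series, value), mirroring "My data" above.
rows = [
    ("2010/01/01", "A", 10), ("2010/01/01", "B", 20), ("2010/01/01", "C", 30),
    ("2010/02/01", "A", 40), ("2010/02/01", "B", 50), ("2010/02/01", "C", 60),
    ("2010/03/01", "A", 70), ("2010/03/01", "B", 80), ("2010/03/01", "C", 90),
]

by_date = defaultdict(dict)
for date, series, value in rows:
    by_date[date][series] = value

# One line per date: the four derived values the crosstab would pivot.
for date, d in sorted(by_date.items()):
    total = d["A"] + d["B"] + d["C"]
    print(date, d["B"] / total, d["C"] / total, 1 - d["A"] / total, total)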


How do I find the amount of levels in an Octree if I have N nodes?

If the octree level is 0, then I have 8 nodes. Now, if the octree level is 1, then I have 72 nodes. But if I have (for example) 500 nodes, how do I calculate what the level would be?
To calculate the max number of nodes at level n you would calculate:
8**1 + 8**2 + 8**3 + ... + 8**n
So at level 2, that's 8 + 64 = 72.
This is a geometric series; its sum (8**(h + 1) - 8)/7 rearranges to the closed form:
((8 ** (h + 1)) - 1)/7 - 1
In JavaScript you might write:
function maxNodes(h){
  return ((8 ** (h + 1)) - 1) / 7 - 1; // closed form of the series above
}
for (let i = 1; i < 4; i++){
  console.log(maxNodes(i)); // 8, 72, 584
}
To find the inverse you will need to use log base 8 and some algebra, and you'll arrive at the formula:
floor(log(base-8)(7 * n + 1))
In some languages, like Python, you can calculate math.floor(math.log(7*n + 1, 8)) directly, but JavaScript doesn't have logs to arbitrary bases, so you need to rely on the identity:
log(base-b)(n) == log(n)/log(b)
and calculate something like:
function height(n){
  return Math.floor(Math.log(7 * n + 1) / Math.log(8)); // log base 8 via the identity above
}
console.log(height(8))   // 1: last node count on level 1
console.log(height(72))  // 2: last on level 2
console.log(height(73))  // 3: first on level 3
console.log(height(584)) // 3: last on level 3
console.log(height(585)) // 4: first on level 4
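For reference, the same pair of functions in Python, which (as noted above) accepts an arbitrary log base directly; this also answers the 500-node example from the question:

import math

def max_nodes(h):
    return (8 ** (h + 1) - 1) // 7 - 1  # closed form of 8**1 + ... + 8**h

def height(n):
    return math.floor(math.log(7 * n + 1, 8))

print(height(72))    # 2: last node count on level 2
print(height(73))    # 3: first on level 3
print(height(500))   # 3: 500 nodes fit within level 3
print(max_nodes(3))  # 584: total capacity through level 3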

Functions with pre-millennium dates in q

I've built a function in q such that I can see how many Sundays fall on the 1st of the month between two dates:
\W 1
f3:{[sd;ed] count distinct `week$(sd + til 1 + ed - sd) where (`dd$distinct `week$sd + til 1 + ed - sd)=01}
How can I edit this to work with pre-2000 dates? Can I put a modulus around the negative dates? Or will that render my function incorrect?
You can also try this:
q) f:{sum 1=mod[`date$a[1] + til 1+(-). a:(0;1<`dd$x)+`month$(y;x);7]}
q) f[2018.01.01;2018.12.31] / 2
q) f[1998.01.02;1999.12.31] / 4
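As an independent cross-check of those two counts, the same calculation sketched in plain Python (weekday() == 6 is Sunday; the function name is just for illustration):

from datetime import date

def sundays_on_first(sd, ed):
    # Enumerate every month-start in the covered years, then keep those
    # inside the inclusive range that fall on a Sunday.
    firsts = [date(y, m, 1) for y in range(sd.year, ed.year + 1)
                            for m in range(1, 13)]
    return sum(1 for d in firsts if sd <= d <= ed and d.weekday() == 6)

print(sundays_on_first(date(2018, 1, 1), date(2018, 12, 31)))  # 2
print(sundays_on_first(date(1998, 1, 2), date(1999, 12, 31)))  # 4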

q/KDB - nprev function to get all the previous n elements

I am struggling to write an nprev function in KDB; the xprev function returns the nth previous element, but I need all of the previous n elements relative to the current element.
q)t:([] i:1+til 26; s:.Q.a)
q)update xp:xprev[3;]s,p:prev s from t
Any help is greatly appreciated.
You can achieve the desired result by applying prev repeatedly and flipping the result:
q)n:3
q)select flip 1_prev\[n;s] from t
s
-----
" "
"a "
"ba "
"cba"
"dcb"
"edc"
..
If n is much smaller than the row count, this will be faster than some of the more straightforward solutions.
The xprev function basically looks like this:
xprev1:{y til[count y]-x} //readable xprev
We can tweak it to get all n elements:
nprev:{y til[count y]-\:1+til x}
Using nprev in the query:
q)update np:nprev[3;s], xp1:xprev1[3;s], xp:xprev[3;s], p:prev[s] from t
i s np    xp1 xp p
------------------
1 a "   "
2 b "a  "        a
3 c "ba "        b
4 d "cba" a   a  c
5 e "dcb" b   b  d
6 f "edc" c   c  e
The k equivalent of nprev:
k)nprev:{$[0h>#y;'`rank;y(!#y)-\:1+!x]}
and similarly nnext would look like:
k)nnext:{$[0h>#y;'`rank;y(!#y)+\:1+!x]}
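For readers less familiar with q, the same indexing idea sketched in Python (None standing in for q's nulls; purely illustrative):

def nprev(n, xs):
    # For index i, collect elements i-1 .. i-n, most recent first,
    # padding with None where the index would run off the front.
    return [[xs[i - k] if i - k >= 0 else None for k in range(1, n + 1)]
            for i in range(len(xs))]

print(nprev(3, list("abcdef")))
# [[None, None, None], ['a', None, None], ['b', 'a', None],
#  ['c', 'b', 'a'], ['d', 'c', 'b'], ['e', 'd', 'c']]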

q - apply function on table rowwise

Given a table and a function:
t:([] c1:1 2 3; c2:`a`b`c; c3:13:00 13:01 13:02)
f:{[int;sym;date]
  symf:{$[x=`a;1;x=`b;2;3]};
  datef:{$[x=13:00;1;x=13:01;2;3]};
  r:int + symf[sym] + datef[date];
  r
  };
I noticed that when applying the function f to columns of t, the entire columns are passed into f; if they can be operated on atomically, the output has the same length as the inputs and a new column is produced. However, in our example this won't work:
update newcol:f[c1;c2;c3] from t / 'type error
because the inner functions symf and datef cannot be applied to the entire columns c2 and c3, respectively.
If I don't want to change the function f at all, how can I apply it row by row and collect the values into a new column in t?
What's the most q-style way to do this?
EDIT
If not changing f is really inconvenient, one could work around it like so:
f:{[arglist]
  int:arglist 0;
  sym:arglist 1;
  date:arglist 2;
  symf:{$[x=`a;1;x=`b;2;3]};
  datef:{$[x=13:00;1;x=13:01;2;3]};
  r:int + symf[sym] + datef[date];
  r
  };
f each (t`c1),'(t`c2),'(t`c3)
Still, I would be interested in how to get the same result when working with the original version of f.
Thanks!
You can use each-both for this, e.g.
q)update newcol:f'[c1;c2;c3] from t
c1 c2 c3    newcol
------------------
1  a  13:00 3
2  b  13:01 6
3  c  13:02 9
However you will likely get better performance by modifying f to be "vectorised", e.g.
q)f2
{[int;sym;date]
  symf:3^(`a`b!1 2)sym;  / dictionary lookup, nulls filled with 3
  datef:3^(13:00 13:01!1 2)date;
  r:int + symf + datef;
  r
  }
q)update newcol:f2[c1;c2;c3] from t
c1 c2 c3    newcol
------------------
1  a  13:00 3
2  b  13:01 6
3  c  13:02 9
q)\ts:1000 update newcol:f2[c1;c2;c3] from t
4 1664
q)\ts:1000 update newcol:f'[c1;c2;c3] from t
8 1680
In general in KDB, if you can avoid using any form of each and stick to vector operations, you'll get much better efficiency.

How to use groupByKey in Spark to calculate a nonlinear groupBy task

I have a table that looks like:
Time ID Value1 Value2
1    a  1      4
2    a  2      3
3    a  5      9
1    b  6      2
2    b  4      2
3    b  9      1
4    b  2      5
1    c  4      7
2    c  2      0
Here are the tasks and requirements:
I want to set the column ID as the key, not the column Time, but I don't want to delete the column Time. Is there a way in Spark to set a primary key?
The aggregation function is non-linear, which means you cannot use "reduceByKey". All the data must be shuffled to one single node before calculation. For example, the aggregation function may look like the Nth root of the sum of the values, where N is the number of records (count) for each ID:
output = root(sum(value1), count(*)) + root(sum(value2), count(*))
To make it clear, for ID="a", the aggregated output value should be
output = root(1 + 2 + 5, 3) + root(4 + 3 + 9, 3)
the trailing 3 is because we have 3 records for a. For ID='b', it is:
output = root(6 + 4 + 9 + 2, 4) + root(2 + 2 + 1 + 5, 4)
The combination is non-linear. Therefore, in order to get correct results, all the data with the same "ID" must be in one executor.
I checked UDF and Aggregator in Spark 2.0. Based on my understanding, they all assume a "linear combination".
Is there a way to handle such a nonlinear combination calculation? Especially, taking advantage of parallel computing with Spark?
The function you use doesn't require any special treatment. You can use plain SQL with a join:
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{count, lit, sum, pow}
def root(l: Column, r: Column) = pow(l, lit(1) / r)  // root(x, n) == x ^ (1/n)
val out = root(sum($"value1"), count("*")) + root(sum($"value2"), count("*"))
df.groupBy("id").agg(out.alias("outcome")).join(df, Seq("id"))
or window functions:
import org.apache.spark.sql.expressions.Window
val w = Window.partitionBy("id")
val outw = root(sum($"value1").over(w), count("*").over(w)) +
  root(sum($"value2").over(w), count("*").over(w))
df.withColumn("outcome", outw)
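As a quick sanity check of the aggregation logic, here it is in plain Python, independent of Spark (the data dict just mirrors the sample table grouped by id):

def root(x, n):
    return x ** (1.0 / n)  # nth root

data = {
    "a": ([1, 2, 5], [4, 3, 9]),
    "b": ([6, 4, 9, 2], [2, 2, 1, 5]),
    "c": ([4, 2], [7, 0]),
}

for key, (v1, v2) in data.items():
    n = len(v1)  # count(*) per id
    print(key, root(sum(v1), n) + root(sum(v2), n))
# a ~ 4.52, b ~ 3.92, c ~ 5.10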