Filters and conditions in window functions in Apache Spark - scala

I have a sample dataset like the one below:
Name  date      Category  transactionamount
Adam  1/1/2020  Mobile    100
Adam  1/1/2020  Tab       200
Bob   1/1/2020  Mobile    200
Adam  2/1/2020  Tab       200
Bob   2/1/2020  Mobile    200
Adam  3/1/2020  Tab       200
Bob   4/1/2020  Mobile    200
I want to sum the transactionamount column over a rolling period of the current and previous day, so my window frame code looks like this:
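import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.sum
import spark.implicits._ // enables $"col" and 'col syntax; assumes a SparkSession named spark
val windowspec = Window.partitionBy($"name").orderBy($"date".asc)
// RANGE frame: rows in the same name partition whose date is 1 unit before the current date, up to the current date
val range = windowspec.rangeBetween(-1, 0)
val aasum2 = sum('transactionAmount).over(range)
df.select('date, 'name, aasum2 as 'aasum).orderBy('date, 'name).show(100, false)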
This works fine for a plain sum without any condition.
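To see what the range frame above actually computes, here is a plain-Python emulation (not Spark code) of a sum over a window partitioned by name, ordered by date, with a range of -1 to 0. It assumes the dates are day/month/year, so that 1/1/2020 and 2/1/2020 are consecutive days, and converts them to integer day numbers, which is what a range of -1 to 0 needs in order to mean "previous and current day". The names `ROWS`, `to_day_number` and `rolling_sum` are hypothetical.

```python
from datetime import date

# Sample rows from the question: (name, date, category, amount).
# ASSUMPTION: dates are day/month/year, so 1/1/2020 and 2/1/2020 are consecutive days.
ROWS = [
    ("Adam", "1/1/2020", "Mobile", 100),
    ("Adam", "1/1/2020", "Tab", 200),
    ("Bob", "1/1/2020", "Mobile", 200),
    ("Adam", "2/1/2020", "Tab", 200),
    ("Bob", "2/1/2020", "Mobile", 200),
    ("Adam", "3/1/2020", "Tab", 200),
    ("Bob", "4/1/2020", "Mobile", 200),
]


def to_day_number(text):
    """Turn "d/m/yyyy" into an integer day count, so a range of -1..0 means
    "previous day and current day"."""
    day, month, year = (int(part) for part in text.split("/"))
    return date(year, month, day).toordinal()


def rolling_sum(rows, lower=-1, upper=0):
    """Emulate sum(amount) over a window partitioned by name, ordered by day,
    with a RANGE frame [day + lower, day + upper]. Like a Spark window
    function, it returns one value per input row."""
    result = []
    for name, text, _category, _amount in rows:
        day = to_day_number(text)
        total = sum(
            amount
            for other_name, other_text, _other_category, amount in rows
            if other_name == name
            and day + lower <= to_day_number(other_text) <= day + upper
        )
        result.append((text, name, total))
    return result


for row in rolling_sum(ROWS):
    print(row)
```

A window function returns one value per input row (both Adam rows on 1/1/2020 get 300), so getting one row per distinct date and name needs an extra step.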
However, I want output like the table below, with two new columns based on the value of the Category column.
Every output row should contain the summed values for one distinct date and name.
How can we apply a condition (based on the value of another column) when applying a window function over a column?
date      Name  Mobile_sum  Tab_sum
1/1/2020  Adam  100         200
1/1/2020  Bob   200         0
2/1/2020  Adam  0           400
2/1/2020  Bob   400         0
3/1/2020  Adam  0           600
3/1/2020  Bob   0           0
4/1/2020  Adam  0           0
4/1/2020  Bob   200         0

Add the additional column (Category) to the partitionBy() of your WindowSpec, so that each category is summed separately.
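One way to read this suggestion, sketched in plain Python rather than Spark: partition by name and Category, sum over the current and previous day, then pivot the categories into Mobile_sum and Tab_sum columns for every distinct date and name, filling missing combinations with 0. In Spark the conditional part is often written instead as a sum over a `when(...).otherwise(0)` expression; the Python below only illustrates the arithmetic. The day/month/year reading of the dates and all names (`ROWS`, `to_day_number`, `category_rolling_sums`) are assumptions. Under this rule a few cells differ from the expected table in the question: for example, Adam on 3/1/2020 gets a Tab_sum of 400 rather than 600, and Bob on 3/1/2020 gets a Mobile_sum of 200 carried over from 2/1/2020. Adjust the frame bounds if a different rule is intended.

```python
from datetime import date
from itertools import product

# Sample rows from the question: (name, date, category, amount).
# ASSUMPTION: dates are day/month/year (consecutive days in January 2020).
ROWS = [
    ("Adam", "1/1/2020", "Mobile", 100),
    ("Adam", "1/1/2020", "Tab", 200),
    ("Bob", "1/1/2020", "Mobile", 200),
    ("Adam", "2/1/2020", "Tab", 200),
    ("Bob", "2/1/2020", "Mobile", 200),
    ("Adam", "3/1/2020", "Tab", 200),
    ("Bob", "4/1/2020", "Mobile", 200),
]


def to_day_number(text):
    """Turn "d/m/yyyy" into an integer day count."""
    day, month, year = (int(part) for part in text.split("/"))
    return date(year, month, day).toordinal()


def category_rolling_sums(rows, categories=("Mobile", "Tab")):
    """For every distinct (date, name) pair, sum the amounts of each category
    over the current and previous day. This matches partitioning by
    (name, category) and pivoting categories into columns; pairs with no
    matching transactions get 0."""
    dates = sorted({text for _name, text, _cat, _amt in rows}, key=to_day_number)
    names = sorted({name for name, _text, _cat, _amt in rows})
    table = []
    for text, name in product(dates, names):
        day = to_day_number(text)
        sums = [
            sum(
                amount
                for other_name, other_text, other_category, amount in rows
                if other_name == name
                and other_category == category
                and day - 1 <= to_day_number(other_text) <= day
            )
            for category in categories
        ]
        table.append((text, name, *sums))
    return table


print("date, Name, Mobile_sum, Tab_sum")
for row in category_rolling_sums(ROWS):
    print(row)
```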

Related

How to roll numbers up in Tableau for aggregation?

I have a data structure issue: I need to roll up my data within Tableau so that aggregated numbers are not skewed by the repeated rows per ID.
Example of current data:
ID    Model_Number  Value
123   fff           2
123   ggg           2
123   hhh           2
123   uuu           2
124   yyy           5
124   qqq           5
124   eee           5
Avg:  NA            3.28
Ideal state of data and aggregation:
ID    Value
123   2
124   5
Avg   3.5
As you can see, since the data is at two different grains, the aggregated number (avg) differs. I would like to roll up my numbers to the distinct values of ID and then calculate my average, which will result in a different (and, in my context, correct) aggregated number.
Here is a calculated field that could help:
{ FIXED [ID] : AVG([Value]) }
This gives you the average Value per ID. You can then use a grand total computed as an average to get 3.5.
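The arithmetic behind the FIXED expression can be checked in plain Python (this mimics Tableau, it does not use it). Averaging all seven rows gives 23 / 7, about 3.286 (shown as 3.28 in the question). Averaging per ID first and then averaging those two results gives 3.5. `ROWS` and the function names are illustrative only.

```python
from collections import defaultdict

# (ID, Model_Number, Value) rows from the question's example.
ROWS = [
    (123, "fff", 2), (123, "ggg", 2), (123, "hhh", 2), (123, "uuu", 2),
    (124, "yyy", 5), (124, "qqq", 5), (124, "eee", 5),
]


def row_level_average(rows):
    """Average over every row, i.e. at the Model_Number grain."""
    return sum(value for _id, _model, value in rows) / len(rows)


def fixed_id_average(rows):
    """Mimic { FIXED [ID] : AVG([Value]) }: average per ID first, then
    average those per-ID results (like an average grand total over IDs)."""
    per_id = defaultdict(list)
    for row_id, _model, value in rows:
        per_id[row_id].append(value)
    id_averages = {row_id: sum(vals) / len(vals) for row_id, vals in per_id.items()}
    return id_averages, sum(id_averages.values()) / len(id_averages)


print(row_level_average(ROWS))  # 23 / 7
print(fixed_id_average(ROWS))
```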

Tableau: show the number of occurrences of each score on a fixed scale of 1-5

I have data with teacher and student scores (on a scale of 1-5). I want to show how many teacher grades were 1, 2, 3, 4 or 5, and the same for the students grading themselves.
I have the data in two formats: in the same row, and pivoted onto two separate rows, i.e.:
Same row, like this:
student name | Teacher Score | Student Score
----------------------------------------------------
Jack         | 4             | 3
Jim          | 2             | 3
Tom          | 5             | 4
Pivoted, like this:
student name | Pivot Field   | Score
-----------------------------------------------------
Jack         | student score | 3
Jack         | teacher score | 4
Jim          | student score | 3
Jim          | teacher score | 2
Tom          | student score | 4
Tom          | teacher score | 5
I'm looking to do this in Tableau: count how many of each score were given by teachers and by students, like this:
ALL 5 SCORES | TOTAL COUNT TEACHER | TOTAL COUNT STUDENT
-----------------------------------------------------------
1            | 0                   | 0   (no 1's were given out)
2            | 1                   | 0
3            | 0                   | 2   (i.e. both Jack and Jim gave themselves a 3)
4            | 1                   | 1
5            | 1                   | 0
Does that make sense? Please let me know if I need to explain further. I feel like I should know this, but I'm stumped and I'm having trouble putting the right words together to search for it. Thank you in advance!
Try creating a bin field from your scores. Right-click your score field and select Create > Bins. In the Size of bins box, change the value to 1. Then add the new field to Columns and Number of Records to Rows.
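For reference, the counts such a view should produce can be computed in plain Python from the same-row format. This only illustrates the target numbers and is not Tableau logic. `ROWS` and `score_counts` are hypothetical names.

```python
from collections import Counter

# Same-row format from the question: (student name, Teacher Score, Student Score).
ROWS = [("Jack", 4, 3), ("Jim", 2, 3), ("Tom", 5, 4)]


def score_counts(rows, scale=range(1, 6)):
    """Count how often each score on the fixed 1-5 scale was given by teachers
    and by students; scores nobody gave still appear with a count of 0."""
    teacher = Counter(t for _name, t, _s in rows)
    student = Counter(s for _name, _t, s in rows)
    return [(score, teacher[score], student[score]) for score in scale]


for line in score_counts(ROWS):
    print(line)
```

Iterating over the fixed scale instead of over the observed scores keeps the row for score 1 even though nobody gave it.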

Difference between rows in KDB/Q

I'm new to KDB/Q and have a question around getting the difference between two (not necessarily adjacent) rows.
I have only one table, which looks like this:
q)tickers:`ibm`bac`dis`gs`ibm`gs`dis`bac
q)pxs:100 50 30 250 110 240 45 48
q)dates:2013.05.01 2013.01.05 2013.02.03 2013.02.11 2013.06.17 2013.06.21 2013.04.24 2013.01.06
q)trades:([tickers;dates] pxs)
q)trades
tickers dates | pxs
------------------| ---
ibm 2013.05.01| 100
bac 2013.01.05| 50
dis 2013.02.03| 30
gs 2013.02.11| 250
ibm 2013.06.17| 110
gs 2013.06.21| 240
dis 2013.04.24| 45
bac 2013.01.06| 48
I would like to have either another column in the table that stores the difference between the current and the previous price of the same ticker, or another similar structure. The key question the result needs to answer is "by how much did the stock change compared to the previous time a price was recorded?"
So far I've tried something along the lines of:
select tickers, dates, pxs - pxs(dates bin (exec dates from trades where tickers = trades.tickers)) from trades
which doesn't really work (at all), most likely because I am writing SQL-like queries with a row-oriented mindset.
Below is an example of the desired result:
q)trades: do magic with trades
q)trades
tickers dates | pxs | delta
------------------| --- | -----
ibm 2013.05.01| 100 | 0
bac 2013.01.05| 50 | 0
dis 2013.02.03| 30 | 0
gs 2013.02.11| 250 | 0
ibm 2013.06.17| 110 | 10
gs 2013.06.21| 240 | -10
dis 2013.04.24| 45 | 15
bac 2013.01.06| 48 | -2
Thanks for your help,
Dan
q)update delta:{0,1_deltas x}pxs by tickers from trades
tickers dates | pxs delta
------------------| ---------
ibm 2013.05.01| 100 0
bac 2013.01.05| 50 0
dis 2013.02.03| 30 0
gs 2013.02.11| 250 0
ibm 2013.06.17| 110 10
gs 2013.06.21| 240 -10
dis 2013.04.24| 45 15
bac 2013.01.06| 48 -2
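Here is a plain-Python sketch of what the `update ... by tickers` statement above does. It assumes rows are processed in the table's stored order, which is the order q keeps within each group. Each ticker remembers its previous price, and its first occurrence gets 0, matching `{0,1_deltas x}`. `TRADES` and `add_delta` are hypothetical names.

```python
# Rows of the trades table in stored order: (ticker, date, px).
TRADES = [
    ("ibm", "2013.05.01", 100), ("bac", "2013.01.05", 50),
    ("dis", "2013.02.03", 30), ("gs", "2013.02.11", 250),
    ("ibm", "2013.06.17", 110), ("gs", "2013.06.21", 240),
    ("dis", "2013.04.24", 45), ("bac", "2013.01.06", 48),
]


def add_delta(trades):
    """Within each ticker (in table order), delta = px - previous px;
    the first row of each ticker gets 0."""
    last_px = {}
    out = []
    for ticker, day, px in trades:
        delta = px - last_px[ticker] if ticker in last_px else 0
        last_px[ticker] = px
        out.append((ticker, day, px, delta))
    return out


for row in add_delta(TRADES):
    print(row)
```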
If you do:
select pxs by tickers from trades
you will get a nested column (pxs) that holds the list of prices for each ticker. You can then apply deltas:
select deltas pxs by tickers from trades
which gives you the running difference. The first value is the original pxs, though, so you'll need to set the first one to 0.
EDIT
Having re-read your question and looked at your expected result, you'll need to join this back to your original trades table.
update delta:0N,(1_ pxs)-(-1_ pxs) by tickers from trades
Here is how it works:
select pxs by tickers from trades
creates a table in which each row contains a ticker and its list of pxs.
So in every row we have a list:
tickers| pxs
-------| -------
bac | 50 48
dis | 30 45
gs | 250 240
ibm | 100 110
Now we have to apply a function that calculates the delta. The best function is the one mentioned above, deltas, but my version does about the same thing.
If we use select, we get a table with tickers | list of pxs | list of deltas, but if we use update ... by, the grouped values are ungrouped back onto the original rows.
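The group, compute and ungroup steps described above can be mimicked in plain Python (an illustration, not q). The code groups prices by ticker, subtracts each price from the one after it (current minus previous), puts a null in front for the first row, and writes the results back onto the original rows. `None` stands in for q's `0N`. All names are hypothetical.

```python
from collections import defaultdict

# Rows of the trades table in stored order: (ticker, date, px).
TRADES = [
    ("ibm", "2013.05.01", 100), ("bac", "2013.01.05", 50),
    ("dis", "2013.02.03", 30), ("gs", "2013.02.11", 250),
    ("ibm", "2013.06.17", 110), ("gs", "2013.06.21", 240),
    ("dis", "2013.04.24", 45), ("bac", "2013.01.06", 48),
]


def group_pxs(trades):
    """Group step: one list of prices per ticker, in table order."""
    grouped = defaultdict(list)
    for ticker, _day, px in trades:
        grouped[ticker].append(px)
    return dict(grouped)


def pairwise_delta(pxs):
    """Compute step: current price minus previous price for each element;
    None stands in for q's null 0N on the first element."""
    return [None] + [cur - prev for prev, cur in zip(pxs, pxs[1:])]


def ungroup(trades, grouped_results):
    """Ungroup step: write each ticker's results back onto its original rows."""
    seen = defaultdict(int)
    out = []
    for ticker, day, px in trades:
        out.append((ticker, day, px, grouped_results[ticker][seen[ticker]]))
        seen[ticker] += 1
    return out


deltas = {ticker: pxs_delta for ticker, pxs_delta in
          ((t, pairwise_delta(p)) for t, p in group_pxs(TRADES).items())}
for row in ungroup(TRADES, deltas):
    print(row)
```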
You can get the same result using the prev function. One thing worth highlighting is that prev automatically inserts a null (0N) as the first element. This matters because no previous price is available for the first record, whereas a 0 there suggests that the price has not changed; which one you want depends on how you want to handle the first record.
q)update delta:pxs-prev[pxs] by tickers from trades
tickers dates | pxs delta
------------------| ---------
ibm 2013.05.01| 100
bac 2013.01.05| 50
dis 2013.02.03| 30
gs 2013.02.11| 250
ibm 2013.06.17| 110 10
gs 2013.06.21| 240 -10
dis 2013.04.24| 45 15
bac 2013.01.06| 48 -2
Using deltas to get the same result (0N instead of 0):
q)update delta:{0N,1_deltas x}pxs by tickers from trades

Take the sum of a similar column from multiple data tables based on a unique ID in Crystal Reports

I have four data tables like these:
Table 1
id name Afee Insfee
1 a 100 10
2 b 100 10
Table 2
id name Bfee Insfee
2 b 100 10
1 a 100 10
3 c 100 10
Table 3
id name Cfee Insfee
1 a 100 10
3 c 100 10
Table 4
id name Dfee Insfee
1 a 100 10
2 b 100 10
In the Crystal report I want to get this result:
Name Afee Bfee Cfee Dfee Insfee total
a 100 100 100 100 40 440
b 100 100 0 100 30 330
c 0 100 100 0 20 220
where Insfee should be the sum across all four tables for a particular ID, and total should be the sum of the row in the report.
How can I do this in SAP Crystal Reports?
To get the sum of Insfee, create a formula that adds the Insfee field from all four tables with the "+" operator, and place it next to Afee, Bfee, etc.
To get the total:
Create formulas for all the fields (Afee, Bfee, etc.); in the code below I named them a, a1 and a2.
Then create another formula for "total" with the code below:
EvaluateAfter({@a});
EvaluateAfter({@a1});
EvaluateAfter({@a2});
{@a}+{@a1}+{@a2}
Place the formulas in the details section and you will get the result.
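The result the report should produce can be cross-checked with a small plain-Python model of the four tables (this is not Crystal Reports syntax). It outer-joins the tables on id, treats a missing fee as 0, adds up Insfee across the tables and sums each row for the total. The in-memory table layout and the `combine` function are assumptions for illustration.

```python
# Hypothetical in-memory copies of the four tables: id -> (name, fee, Insfee).
TABLE_1 = {1: ("a", 100, 10), 2: ("b", 100, 10)}                     # Afee
TABLE_2 = {2: ("b", 100, 10), 1: ("a", 100, 10), 3: ("c", 100, 10)}  # Bfee
TABLE_3 = {1: ("a", 100, 10), 3: ("c", 100, 10)}                     # Cfee
TABLE_4 = {1: ("a", 100, 10), 2: ("b", 100, 10)}                     # Dfee


def combine(tables):
    """Outer-join the tables on id; a missing row counts as a fee of 0.
    Insfee is summed across tables, and total is fees plus Insfee."""
    ids = sorted(set().union(*tables))
    report = []
    for row_id in ids:
        name = next(t[row_id][0] for t in tables if row_id in t)
        fees = [t[row_id][1] if row_id in t else 0 for t in tables]
        insfee = sum(t[row_id][2] for t in tables if row_id in t)
        report.append((name, *fees, insfee, sum(fees) + insfee))
    return report


print("Name, Afee, Bfee, Cfee, Dfee, Insfee, total")
for row in combine([TABLE_1, TABLE_2, TABLE_3, TABLE_4]):
    print(row)
```

In the report itself, a table with no row for an ID yields a null field, so the formulas need to treat that case as 0 for the total to match.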

Crystal Report will not sum the duplicated record

I'm working in Crystal Reports 2008, and I have data like this:
Date KG MT SQM
--------------------------------------------
01-01-2013 25000 25
01-01-2013 15000 15
01-01-2013 15000 15 -----(duplicated)
01-02-2013 13000 13
01-02-2013 12000 12
01-03-2013 18000 18
01-03-2013 33000 33
Then I grouped it by date, so the output looks like this; the duplicate is already removed and the values are summed without a problem:
Date KG MT SQM
--------------------------------------------
01-01-2013 40000 40 55000
01-02-2013 25000 25 25000
01-03-2013 51000 51 51000
TOTAL 116,000 116 131,000
Then I want to add SQM, but when I sum it, it sums all the records in the group, including the duplicated record.
(The sum of the third field includes the duplicated record; it should be only 116,000, not 131,000.)
My question: how can I suppress the duplicated record and at the same time get a group summary that shows the correct value?
Thanks,
Captain16