More than Print Repeated Values - jasper-reports

Branch ID date tax total
banglore bang01 22-11-2013 450 5000
banglore bang01 22-11-2013 350 4300
banglore bang01 22-11-2013 450 5000
bangalore bang02 22-11-2013 350 4300
bangalore bang02 22-11-2013 250 2500
banglore bang03 24-11-2013 350 4500
When I turn off Print Repeated Values, I get:
Branch    ID      date        tax  total
banglore  bang01  22-11-2013  450  5000
                              350  4300
                              450  5000
          bang02              350  4300
                              250  2500
          bang03  24-11-2013  350  4500
But I want it like this:
Branch     ID      date        tax  total
banglore   bang01  22-11-2013  450  5000
                               350  4300
                               450  5000
bangalore  bang02  22-11-2013  350  4300
                               250  2500
banglore   bang03  24-11-2013  350  4500

Related

How do I window join across different dates in kdb

I am a beginner in kdb. As I was practicing window join on test data from NYSE, I came across issues with window join across different dates.
Basically, my table looks like:
t:([] sym:10#`AAPL;date:2021.03.21 2021.03.21 2021.03.21 2021.03.21 2021.03.21 2021.03.22 2021.03.22 2021.03.22 2021.03.22 2021.03.22;price:100 101 105 110 120 130 140 150 160 170;time:10:01 10:04 10:07 10:10 10:13 10:01 10:04 10:07 10:10 10:13)
I am trying to create a sliding window for every 3 minutes on each date and calculate the sum of price in that window. However, I am not sure how to do window join on different groups.
I tried:
w3:-3 0+\:t[`minute];
newdata:wj1[w3;`minute;t;(t;(sum;`price))]
but this does not give me the correct result. Could someone please help with this. Thank you!
To do a wj across dates you need a timestamp column which you can create from date and time:
t:update timeStamp:"P"$"D" sv/: flip string (date;time) from t
t
sym date price time timeStamp
---------------------------------------------------------
AAPL 2021.03.21 100 10:01 2021.03.21D10:01:00.000000000
AAPL 2021.03.21 101 10:04 2021.03.21D10:04:00.000000000
AAPL 2021.03.21 105 10:07 2021.03.21D10:07:00.000000000
AAPL 2021.03.21 110 10:10 2021.03.21D10:10:00.000000000
AAPL 2021.03.21 120 10:13 2021.03.21D10:13:00.000000000
You can then use the timeStamp column like so:
w3:-00:03 00:00 +\:t[`timeStamp]
wj1[w3;`timeStamp;t;(t;(sum;`price))]
sym date price time timeStamp
---------------------------------------------------------
AAPL 2021.03.21 100 10:01 2021.03.21D10:01:00.000000000
AAPL 2021.03.21 201 10:04 2021.03.21D10:04:00.000000000
AAPL 2021.03.21 206 10:07 2021.03.21D10:07:00.000000000
AAPL 2021.03.21 215 10:10 2021.03.21D10:10:00.000000000
AAPL 2021.03.21 230 10:13 2021.03.21D10:13:00.000000000
AAPL 2021.03.22 130 10:01 2021.03.22D10:01:00.000000000
AAPL 2021.03.22 270 10:04 2021.03.22D10:04:00.000000000
AAPL 2021.03.22 290 10:07 2021.03.22D10:07:00.000000000
AAPL 2021.03.22 310 10:10 2021.03.22D10:10:00.000000000
AAPL 2021.03.22 330 10:13 2021.03.22D10:13:00.000000000
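For readers less familiar with q, the semantics of the wj1 call above can be sketched in Python. This is only an illustration of the windowing logic, not how kdb+ implements it; the function name window_sums is made up for the example, and the data mirrors the table t from the question.

```python
from datetime import datetime, timedelta

# Rebuild the sample data: five timestamps per day on two days.
prices = [100, 101, 105, 110, 120, 130, 140, 150, 160, 170]
minutes = [1, 4, 7, 10, 13]
stamps = [datetime(2021, 3, day, 10, m) for day in (21, 22) for m in minutes]
rows = list(zip(stamps, prices))


def window_sums(rows, width=timedelta(minutes=3)):
    """For each row, sum the prices whose timestamp lies in [ts - width, ts].

    Only values strictly inside the window are used, which is the
    behaviour wj1 has in the answer above.
    """
    return [
        sum(p for (u, p) in rows if ts - width <= u <= ts)
        for (ts, _) in rows
    ]


print(window_sums(rows))
```

Because the window is defined on full timestamps, the first row of 2021.03.22 cannot reach back into the previous day, which is the reason for building the timeStamp column in the first place.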
If you have more than 1 sym in the table you should apply the parted attribute:
t:update `p#sym from `sym`timeStamp xasc t
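Then add sym before timeStamp in the second argument. Note that the example below calls wj rather than wj1, so each window also includes the prevailing price at the window start. That is why 2021.03.22D10:01 shows 250 (120 carried over from the previous day plus 130) instead of 130: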
q)wj[w3;`sym`timeStamp;select sym, timeStamp from t;(t;(sum;`price))]
sym timeStamp price
----------------------------------------
AAPL 2021.03.21D10:01:00.000000000 100
AAPL 2021.03.21D10:04:00.000000000 201
AAPL 2021.03.21D10:07:00.000000000 206
AAPL 2021.03.21D10:10:00.000000000 215
AAPL 2021.03.21D10:13:00.000000000 230
AAPL 2021.03.22D10:01:00.000000000 250
AAPL 2021.03.22D10:04:00.000000000 270
AAPL 2021.03.22D10:07:00.000000000 290
AAPL 2021.03.22D10:10:00.000000000 310
AAPL 2021.03.22D10:13:00.000000000 330
MSFT 2021.03.21D10:01:00.000000000 468
MSFT 2021.03.21D10:04:00.000000000 915
MSFT 2021.03.21D10:07:00.000000000 668
MSFT 2021.03.21D10:10:00.000000000 403
MSFT 2021.03.21D10:13:00.000000000 604
MSFT 2021.03.22D10:01:00.000000000 775
MSFT 2021.03.22D10:04:00.000000000 697
MSFT 2021.03.22D10:07:00.000000000 829
MSFT 2021.03.22D10:10:00.000000000 799
MSFT 2021.03.22D10:13:00.000000000 382

Sum values from the previous N number of days in KDB?

I have a table with following two columns:
Initial Table
Date Value
-------------------
2019.01.01 | 150
2019.01.02 | 100
2019.01.04 | 200
2019.01.07 | 300
2019.01.08 | 100
2019.01.10 | 150
2019.01.14 | 200
2019.01.15 | 100
For each row, I would like to sum values from the previous N number of days. In this case, N = 5.
Resultant Table
Date Value Sum
------------------------
2019.01.01 | 150 | 150 (01 -> ..)
2019.01.02 | 100 | 250 (02 -> 01)
2019.01.04 | 200 | 450 (04 -> 01)
2019.01.07 | 300 | 600 (07 -> 02)
2019.01.08 | 100 | 600 (08 -> 04)
2019.01.10 | 150 | 550 (10 -> 07)
2019.01.14 | 200 | 350 (14 -> 10)
2019.01.15 | 100 | 450 (15 -> 10)
Query
t:([] Date: 2019.01.01 2019.01.02 2019.01.04 2019.01.07 2019.01.08 2019.01.10 2019.01.14 2019.01.15; Value: 150 100 200 300 100 150 200 100)
How can I go about doing that?
One way you could go about this is to use an update statement like below:
q)N:5
q)update Sum:sum each Value where each Date within/:flip(Date-N;Date)from t
Date Value Sum
--------------------
2019.01.01 150 150
2019.01.02 100 250
2019.01.04 200 450
2019.01.07 300 600
2019.01.08 100 600
2019.01.10 150 550
2019.01.14 200 350
2019.01.15 100 450
The within keyword checks whether each date in the Date column falls inside the window that runs from the current date minus N to the current date. Each-right (/:) applies that check once per window:
q)flip(-5+t`Date;t`Date)
2018.12.27 2019.01.01
2018.12.28 2019.01.02
2018.12.30 2019.01.04
2019.01.02 2019.01.07
2019.01.03 2019.01.08
2019.01.05 2019.01.10
2019.01.09 2019.01.14
2019.01.10 2019.01.15
q)t[`Date]within/:flip(-5+t`Date;t`Date)
10000000b
11000000b
11100000b
01110000b
00111000b
00011100b
00000110b
00000111b
This returns a list of boolean lists, which where each turns into lists of indexes (each is needed because it is a list of lists). Those indexes are then used to index back into Value.
q)where each t[`Date]within/:flip(-5+t`Date;t`Date)
,0
0 1
0 1 2
1 2 3
2 3 4
3 4 5
5 6
5 6 7
q)t[`Value]where each t[`Date]within/:flip(-5+t`Date;t`Date)
,150
150 100
150 100 200
100 200 300
200 300 100
300 100 150
150 200
150 200 100
Then using sum each you can sum each of the list of numbers to get your desired result.
q)sum each t[`Value]where each t[`Date]within/:flip(-5+t`Date;t`Date)
150 250 450 600 600 550 350 450
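The same three steps (mask, select, sum) can be written out in Python to make the logic easier to follow. This is a sketch of the idea only; trailing_sums is a name chosen for the example, and the data is the table from the question.

```python
from datetime import date, timedelta

dates = [date(2019, 1, d) for d in (1, 2, 4, 7, 8, 10, 14, 15)]
values = [150, 100, 200, 300, 100, 150, 200, 100]


def trailing_sums(dates, values, n_days):
    """Sum the values whose date lies in [current - n_days, current]."""
    window = timedelta(days=n_days)
    result = []
    for current in dates:
        # Step 1: boolean mask, one flag per row (like `within/:`).
        mask = [current - window <= d <= current for d in dates]
        # Steps 2 and 3: keep the flagged values (like `where each`), then sum.
        result.append(sum(v for v, keep in zip(values, mask) if keep))
    return result


print(trailing_sums(dates, values, 5))
```

Like the q version, this compares every row with every window, so the work grows quadratically with the number of rows, but it does not need the dates to be sorted.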
You could also achieve this using an update statement like the one below. It doesn't require the flip and so should execute faster.
q)N:5
q)delete s from update runningSum:s-0^s[Date bin neg[1]+Date-N] from update s:sums Value from t
Date Value runningSum
---------------------------
2019.01.01 150 150
2019.01.02 100 250
2019.01.04 200 450
2019.01.07 300 600
2019.01.08 100 600
2019.01.10 150 550
2019.01.14 200 350
2019.01.15 100 450
This works by taking running sums of the Value column with sums, then using bin to look up the running sum as of the day just before each window starts and subtracting it.
The delete keyword then removes the helper column s to obtain your required result.
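A Python sketch of the same running-sum idea, assuming the dates are sorted ascending. bisect_right(...) - 1 plays the role of q's bin here (index of the last element not greater than the key, or -1 if there is none); the function name is made up for the example.

```python
from bisect import bisect_right
from datetime import date, timedelta
from itertools import accumulate

dates = [date(2019, 1, d) for d in (1, 2, 4, 7, 8, 10, 14, 15)]
values = [150, 100, 200, 300, 100, 150, 200, 100]


def trailing_sums_sorted(dates, values, n_days):
    """Trailing n_days sum via prefix sums; requires sorted dates."""
    running = list(accumulate(values))  # same role as `sums Value`
    out = []
    for d, total in zip(dates, running):
        # Last day that is already outside the window [d - n_days, d].
        cutoff = d - timedelta(days=n_days + 1)
        i = bisect_right(dates, cutoff) - 1  # -1 when nothing is outside
        out.append(total - (running[i] if i >= 0 else 0))
    return out


print(trailing_sums_sorted(dates, values, 5))
```

Each row costs one binary search instead of a scan over all rows, which is why this approach scales better than the mask-based one.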
q)\t:1000 delete s from update runningSum:s-0^s[Date bin neg[1]+Date-N] from update s:sums Value from t
7
While the time difference between this answer and Elliot's is negligible for small values of N, this one is faster for larger values, e.g. 1000:
q)\t:1000 update Sum:sum each Value where each Date within/:flip(Date-1000;Date)from t
11
q)\t:1000 delete s from update runningSum:s-0^s[Date bin neg[1]+Date-1000] from update s:sums Value from t
7
It should be noted that this answer requires the date field to be sorted, where Elliot's does not.
Another, slightly slower, way is to generate 0 values for all the dates between the min and max Date.
You can then use a moving sum, msum, to get the values for the past 5 days.
It first takes the min and max Date from the table and makes a list of the dates that span between them.
q)update t: 0^Value from ([]Date:{[x] x[0]+til 1+x[1]-x[0]} exec (min[Date], max Date) from t) lj `Date xkey t
Date Value t
--------------------
2019.01.01 150 150
2019.01.02 100 100
2019.01.03 0
2019.01.04 200 200
2019.01.05 0
2019.01.06 0
2019.01.07 300 300
2019.01.08 100 100
2019.01.09 0
2019.01.10 150 150
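Then it adds them to the table and fills in the empty values with 0, so the moving sum counts calendar days rather than rows and missing dates are taken into account. Note that 5 msum covers the current day plus the 4 days before it, so the rows for 2019.01.07 and 2019.01.15 below (500 and 300) differ from the requested 600 and 450; using 6 msum (that is, N+1) reproduces the requested table.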
q){[x] select from x where not null Value } update t: 5 msum 0^Value from ([]Date:{[x] x[0]+til 1+x[1]-x[0]} exec (min[Date], max Date) from t) lj `Date xkey t
Date Value t
--------------------
2019.01.01 150 150
2019.01.02 100 250
2019.01.04 200 450
2019.01.07 300 500
2019.01.08 100 600
2019.01.10 150 550
2019.01.14 200 350
2019.01.15 100 300
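The fill-then-moving-sum approach can be sketched in Python as follows. The function name and the width parameter are illustrative; width counts calendar days including the current one, so a width of 5 reproduces the table above while 6 reproduces the table requested in the question.

```python
from datetime import date, timedelta

dates = [date(2019, 1, d) for d in (1, 2, 4, 7, 8, 10, 14, 15)]
values = [150, 100, 200, 300, 100, 150, 200, 100]


def dense_moving_sums(dates, values, width):
    """Fill missing calendar days with 0, then take a moving sum."""
    by_date = dict(zip(dates, values))
    start, end = min(dates), max(dates)
    calendar = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    filled = [by_date.get(d, 0) for d in calendar]  # 0 for missing days
    sums = {}
    for i, d in enumerate(calendar):
        # Current day plus the (width - 1) days before it.
        sums[d] = sum(filled[max(0, i - width + 1): i + 1])
    # Keep only the dates that were present in the original table.
    return [sums[d] for d in dates]


print(dense_moving_sums(dates, values, 5))
print(dense_moving_sums(dates, values, 6))
```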
I would also be careful when using Value as a column name, as you can run into issues with the value keyword.
I hope this answers your question.
A window join is a pretty natural fit here. See: https://code.kx.com/v2/ref/wj/
q)wj1[-5 0+\:t`Date;`Date;t;(t;(sum;`Value))]
Date Value
----------------
2019.01.01 150
2019.01.02 250
2019.01.04 450
2019.01.07 600
2019.01.08 600
2019.01.10 550
2019.01.14 350
2019.01.15 450
To go back 5 observations rather than 5 calendar days you could do:
q)wj1[{(4 xprev x;x)}t`Date;`Date;t;(t;(sum;`Value))]
Date Value
----------------
2019.01.01 150
2019.01.02 250
2019.01.04 450
2019.01.07 750
2019.01.08 850
2019.01.10 850
2019.01.14 950
2019.01.15 850
You can use the moving window mwin function to achieve this:
mwin:{[f;w;l] f each {1_x,y}\[w#0n;`float$l]}
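You can then set the function f to sum and get the results over the last w:5 rows of the list of values l (here l:exec Value from t). Note that this counts observations rather than calendar days, so the sums from 2019.01.07 onward differ from the requested table: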
update Sum:(mwin[sum;5;] exec Value from t) from t
Date Value Sum
--------------------
2019.01.01 150 150
2019.01.02 100 250
2019.01.04 200 450
2019.01.07 300 750
2019.01.08 100 850
2019.01.10 150 850
2019.01.14 200 950
2019.01.15 100 850
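A rough Python counterpart of mwin, using a bounded deque as the sliding buffer. It is a sketch rather than a translation: the q version seeds the buffer with nulls that sum ignores, whereas here the buffer simply starts empty.

```python
from collections import deque

values = [150, 100, 200, 300, 100, 150, 200, 100]


def mwin(f, w, values):
    """Apply f to a sliding buffer holding at most the last w items."""
    buf = deque(maxlen=w)  # the oldest item drops out automatically
    out = []
    for v in values:
        buf.append(v)
        out.append(f(buf))
    return out


print(mwin(sum, 5, values))
```

The buffer is indexed by position only, so gaps between dates play no role in the result.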

Create PostgreSQL view to feed a chart generating tool having filter option

We need to create a PostgreSQL view to generate a chart. The chart-creating tool allows only a single SQL view as input. The chart can be filtered by studentname, coursecode and feecode. Besides the chart display, we need to show the sum of the total course fee and of the fee amount paid by all the students, from the same view.
table1: student
id name address
1 John USA
2 Robert UK
3 Tinger NZ
table2: student_course
id std_id coursecode fee
1 1 CHEM 3000
2 1 PHY 4000
3 1 BIO 2000
4 2 CHEM 3000
5 2 GEO 1500
6. 3 ENG 2000
table3: student_fees
id std_name coursecode feecode amount
1 1 CHEM BKFEE 100
2 1 CHEM SPFEE 140
3 1 CHEM MATFEE 250
4 1 PHY BKFEE 150
5 1 PHY SPFEE 200
6 1 BIO LBFEE 300
7 1 BIO MATFEE 350
9 1 BIO TECFEE 200
10 2 CHEM BKFEE 100
11 2 CHEM SPFEE 140
12 2 GEO BKFEE 150
13 3 ENG BKFEE 75
14 3 ENG SPFEE 140
15 3 ENG LBFEE 180
I am able to create a view like the one below, but it is not enough for my purpose: I cannot calculate the sum of the total course fee from it, because the course fee repeats on every fee row. Grouping will not work either, because the data needs to stay filterable by studentname, coursecode and feecode.
View:
id std_id coursecode course_fee feecode fee_amount
1 John CHEM 3000 BKFEE 100
2 John CHEM 3000 SPFEE 140
3 John CHEM 3000 MATFEE 250
4 John PHY 4000 BKFEE 150
5 John PHY 4000 SPFEE 200
6 John BIO 4000 LBFEE 300
7 John BIO 4000 MATFEE 350
8 John BIO 4000 TECFEE 200
9 Robert CHEM 3000 BKFEE 100
10 Robert CHEM 3000 SPFEE 140
11 Robert GEO 1500 BKFEE 150
12 Tinger ENG 2000 BKFEE 75
13 Tinger ENG 2000 SPFEE 140
14 Tinger ENG 2000 LBFEE 180
Is there any way to create a view like this?
View:
id std_id coursecode course_fee feecode fee_amount
1 John CHEM 3000 BKFEE 100
2 John CHEM 0 SPFEE 140
3 John CHEM 0 MATFEE 250
4 John PHY 4000 BKFEE 150
5 John PHY 0 SPFEE 200
6 John BIO 4000 LBFEE 300
7 John BIO 0 MATFEE 350
8 John BIO 0 TECFEE 200
9 Robert CHEM 3000 BKFEE 100
10 Robert CHEM 0 SPFEE 140
11 Robert GEO 1500 BKFEE 150
12 Tinger ENG 2000 BKFEE 75
13 Tinger ENG 0 SPFEE 140
14 Tinger ENG 0 LBFEE 180
Any help appreciated...
I guess you are looking for ROLLUP functionality in your view query. I am sharing two links: the first covers the basics of how ROLLUP works, and the second is specific to PostgreSQL.
first link, second link. I hope this helps.
I have worked out a demo for you; please check the rollup query.
This is not exactly the answer you are expecting, but you can explore GROUPING SETS:
select name, sf.coursecode, amount, sum(fee)
from student s, student_course sc, student_fees sf
where s.id = sc.std_id
  and sf.std_name = s.id
  and sf.coursecode = sc.coursecode
group by
  GROUPING SETS (
    (name, sf.coursecode, amount, fee),
    (name, sf.coursecode, fee),
    ()
  )
order by name, sf.coursecode asc
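One way to get the layout asked for in the question (course fee on the first fee row of each student and course, 0 on the others) is a ROW_NUMBER() window function inside a CASE expression. The sketch below is not from the answers above. It runs the query against an in-memory SQLite database from Python so it can be tried without a server; this assumes an SQLite build with window function support (3.25 or later), and only a few of the question's rows are loaded. The SELECT itself uses standard syntax that PostgreSQL also accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (id INTEGER, name TEXT);
CREATE TABLE student_course (std_id INTEGER, coursecode TEXT, fee INTEGER);
CREATE TABLE student_fees (id INTEGER, std_name INTEGER, coursecode TEXT,
                           feecode TEXT, amount INTEGER);
INSERT INTO student VALUES (1, 'John'), (2, 'Robert');
INSERT INTO student_course VALUES
  (1, 'CHEM', 3000), (1, 'PHY', 4000), (2, 'CHEM', 3000);
INSERT INTO student_fees VALUES
  (1, 1, 'CHEM', 'BKFEE', 100), (2, 1, 'CHEM', 'SPFEE', 140),
  (4, 1, 'PHY', 'BKFEE', 150), (10, 2, 'CHEM', 'BKFEE', 100);
""")

# Show the course fee only on the first fee row per (student, course).
query = """
SELECT s.name, sf.coursecode,
       CASE WHEN ROW_NUMBER() OVER (
                PARTITION BY sf.std_name, sf.coursecode ORDER BY sf.id
            ) = 1 THEN sc.fee ELSE 0 END AS course_fee,
       sf.feecode, sf.amount AS fee_amount
FROM student_fees sf
JOIN student s ON s.id = sf.std_name
JOIN student_course sc
  ON sc.std_id = sf.std_name AND sc.coursecode = sf.coursecode
ORDER BY sf.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

One caveat with this design: the row numbers are fixed before the chart tool applies its filters, so filtering on a feecode that is not the first row of a course would hide that course's fee. Whether that is acceptable depends on how the totals are meant to react to the feecode filter.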

Combine 2 data frames with different columns in spark

I have 2 dataframes:
df1 :
Id purchase_count purchase_sim
12 100 1500
13 1020 1300
14 1010 1100
20 1090 1400
21 1300 1600
df2:
Id click_count click_sim
12 1030 2500
13 1020 1300
24 1010 1100
30 1090 1400
31 1300 1600
I need to get the combined data frame with results as :
Id click_count click_sim purchase_count purchase_sim
12 1030 2500 100 1500
13 1020 1300 1020 1300
14 null null 1010 1100
24 1010 1100 null null
30 1090 1400 null null
31 1300 1600 null null
20 null null 1090 1400
21 null null 1300 1600
I can't use union because the column names differ. Can someone suggest a better way to do this?
All you need is a full outer join on the Id column.
df1.join(df2, Seq("Id"), "full_outer")
// Since the Id column name is the same in both dataframes, joining on Seq("Id") keeps a single Id column.
// If you join on a comparison such as df1("Id") === df2("Id") instead, you will get duplicate Id columns.
Please refer to the documentation below for future reference.
https://docs.databricks.com/spark/latest/faq/join-two-dataframes-duplicated-column.html
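To make the semantics of a full outer join concrete, here is a small plain-Python sketch using dictionaries keyed by Id. It does not use Spark at all; it only shows which rows come out and where the nulls (None here) appear. The function name and the tuple layout (purchase columns first, then click columns) are choices made for this example.

```python
# Id -> (purchase_count, purchase_sim)
df1 = {12: (100, 1500), 13: (1020, 1300), 14: (1010, 1100),
       20: (1090, 1400), 21: (1300, 1600)}
# Id -> (click_count, click_sim)
df2 = {12: (1030, 2500), 13: (1020, 1300), 24: (1010, 1100),
       30: (1090, 1400), 31: (1300, 1600)}


def full_outer_join(left, right, left_width=2, right_width=2):
    """Keep every key from either side; pad the missing side with None."""
    joined = {}
    for key in sorted(left.keys() | right.keys()):
        l = left.get(key, (None,) * left_width)
        r = right.get(key, (None,) * right_width)
        joined[key] = l + r
    return joined


for key, row in full_outer_join(df1, df2).items():
    print(key, row)
```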

How to match date and string from 2 lists (KDB)?

I have two lists:
data:
dt sym bid ask
2017.01.01D05:00:09.140745000 AAPL 101.20 101.30
2017.01.01D05:00:09.284281800 GOOG 801.00 802.00
2017.01.02D05:00:09.824847299 AAPL 101.30 101.40
info:
date sym shares divisor
2017.01.01 AAPL 500 2
2017.01.01 GOOG 100 1
2017.01.02 AAPL 200 2
I need to append from "info" the shares and divisor values for each ticker based on the date. How can I achieve this? Below is an example:
result:
dt sym bid ask shares divisor
2017.01.01D05:00:09.140745000 AAPL 101.20 101.30 500 2
2017.01.01D05:00:09.284281800 GOOG 801.00 802.00 100 1
2017.01.02D05:00:09.824847299 AAPL 101.30 101.40 200 2
If matching based on an exact date match then you can use lj. For this to work you will need to create a date column in the data table and key info by date and sym. Like so:
(update date:`date$dt from data)lj 2!info
dt sym price date shares divisor
---------------------------------------------------------------------
2018.02.04D17:25:06.658216000 AAPL 103.9275 2018.02.04 500 2
2018.02.04D17:25:06.658216000 GOOG 105.1709 2018.02.04 100 1
2018.02.05D17:25:06.658217000 AAPL 105.1598 2018.02.05 200 2
2018.02.05D17:25:06.658217000 GOOG 104.0666 2018.02.05
You can then delete the date column from this output.
It might be useful for you to use the stepped attribute: http://code.kx.com/q/cookbook/temporal-data/#stepped-attribute
This lets the info table have missing dates and uses the most recent earlier date instead (so you don't need data for every sym every day). For example, without the stepped attribute:
q)data:([] dt:(10?2017.01.01+til 2)+10?.z.t;sym:10?`AAPL`GOOG;bid:100+10?5;ask:105+10?5)
q)info:([] date:2017.01.01 2017.01.01 2017.01.02;sym:`AAPL`GOOG`AAPL;shares:500 100 200;divisor:2 1 2)
q)(update date:`date$dt from data) lj 2!info
dt sym bid ask date shares divisor
--------------------------------------------------------------------
2017.01.01D04:04:03.440000000 GOOG 104 105 2017.01.01 100 1
2017.01.01D14:00:02.748000000 GOOG 104 105 2017.01.01 100 1
2017.01.02D09:34:52.869000000 GOOG 102 106 2017.01.02
2017.01.02D16:44:16.648000000 AAPL 100 107 2017.01.02 200 2
2017.01.01D08:48:23.285000000 AAPL 102 108 2017.01.01 500 2
2017.01.02D02:31:11.038000000 AAPL 104 109 2017.01.02 200 2
2017.01.01D05:50:50.463000000 GOOG 104 109 2017.01.01 100 1
2017.01.02D02:13:45.275000000 AAPL 101 107 2017.01.02 200 2
2017.01.01D10:25:30.322000000 AAPL 104 109 2017.01.01 500 2
2017.01.01D14:51:12.687000000 AAPL 103 109 2017.01.01 500 2
Note the nulls for GOOG on 2017.01.02. With stepped attribute:
q)(update date:`date$dt from data) lj `s#2!`sym xasc `sym`date xcols info
dt sym bid ask date shares divisor
--------------------------------------------------------------------
2017.01.01D04:04:03.440000000 GOOG 104 105 2017.01.01 100 1
2017.01.01D14:00:02.748000000 GOOG 104 105 2017.01.01 100 1
2017.01.02D09:34:52.869000000 GOOG 102 106 2017.01.02 100 1
2017.01.02D16:44:16.648000000 AAPL 100 107 2017.01.02 200 2
2017.01.01D08:48:23.285000000 AAPL 102 108 2017.01.01 500 2
2017.01.02D02:31:11.038000000 AAPL 104 109 2017.01.02 200 2
2017.01.01D05:50:50.463000000 GOOG 104 109 2017.01.01 100 1
2017.01.02D02:13:45.275000000 AAPL 101 107 2017.01.02 200 2
2017.01.01D10:25:30.322000000 AAPL 104 109 2017.01.01 500 2
2017.01.01D14:51:12.687000000 AAPL 103 109 2017.01.01 500 2
Here, GOOG gets the values for 2017.01.01 as there is no new value on 2017.01.02
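The "most recent value" lookup that the stepped attribute provides can be pictured in Python as a binary search per sym. This is only a sketch of the lookup rule, with a hypothetical lookup function and the info table from the example stored as sorted lists per sym.

```python
from bisect import bisect_right
from datetime import date

# sym -> list of (date, shares, divisor), sorted by date.
info = {
    "AAPL": [(date(2017, 1, 1), 500, 2), (date(2017, 1, 2), 200, 2)],
    "GOOG": [(date(2017, 1, 1), 100, 1)],
}


def lookup(sym, day):
    """Return (shares, divisor) from the latest entry on or before day."""
    entries = info.get(sym, [])
    days = [e[0] for e in entries]
    i = bisect_right(days, day) - 1  # last entry with date <= day
    if i < 0:
        return (None, None)  # nothing known yet for this sym
    _, shares, divisor = entries[i]
    return (shares, divisor)


print(lookup("GOOG", date(2017, 1, 2)))
print(lookup("AAPL", date(2017, 1, 2)))
```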
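You could possibly use an aj as well. The last column listed is the one matched as-of, so sym must come first and date last; with `date`sym the GOOG row on 2017.01.02 would wrongly pick up AAPL's values.
aj[`sym`date;update date:`date$dt from data;info]
dt sym bid ask date shares divisor
--------------------------------------------------------------------
2017.01.02D07:57:14.764000000 GOOG 101 109 2017.01.02 100 1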
2017.01.02D02:31:39.330000000 AAPL 100 105 2017.01.02 200 2
2017.01.02D04:25:17.604000000 AAPL 102 107 2017.01.02 200 2
2017.01.01D01:47:51.333000000 GOOG 104 106 2017.01.01 100 1
2017.01.02D15:50:12.140000000 AAPL 101 107 2017.01.02 200 2
2017.01.01D02:59:16.636000000 GOOG 102 106 2017.01.01 100 1
2017.01.01D14:35:31.860000000 AAPL 100 107 2017.01.01 500 2
2017.01.01D16:36:29.214000000 GOOG 101 108 2017.01.01 100 1
2017.01.01D14:01:18.498000000 GOOG 101 107 2017.01.01 100 1
2017.01.02D08:31:52.958000000 AAPL 102 109 2017.01.02 200 2