Create a PostgreSQL view to feed a chart-generating tool with filter options - postgresql

We need to create a PostgreSQL view to generate a chart. The chart-creating tool allows only a single SQL view as input. The chart has filter options for student name, course code and fee code. Besides the chart display, we need to show the sum of the total course fees and of the fee amounts paid by all students, from the same view.
table1: student
id name address
1 John USA
2 Robert UK
3 Tinger NZ
table2: student_course
id std_id coursecode fee
1 1 CHEM 3000
2 1 PHY 4000
3 1 BIO 2000
4 2 CHEM 3000
5 2 GEO 1500
6 3 ENG 2000
table3: student_fees
id std_name coursecode feecode amount
1 1 CHEM BKFEE 100
2 1 CHEM SPFEE 140
3 1 CHEM MATFEE 250
4 1 PHY BKFEE 150
5 1 PHY SPFEE 200
6 1 BIO LBFEE 300
7 1 BIO MATFEE 350
9 1 BIO TECFEE 200
10 2 CHEM BKFEE 100
11 2 CHEM SPFEE 140
12 2 GEO BKFEE 150
13 3 ENG BKFEE 75
14 3 ENG SPFEE 140
15 3 ENG LBFEE 180
I am able to create a view like the one below, but it is not enough for my purposes, because from it I cannot calculate the sum of the total course fees: the course fee repeats on every fee row. Grouping will not work here either, because the data also needs to be filterable by student name, course code and fee code.
View:
id student_name coursecode course_fee feecode fee_amount
1 John CHEM 3000 BKFEE 100
2 John CHEM 3000 SPFEE 140
3 John CHEM 3000 MATFEE 250
4 John PHY 4000 BKFEE 150
5 John PHY 4000 SPFEE 200
6 John BIO 2000 LBFEE 300
7 John BIO 2000 MATFEE 350
8 John BIO 2000 TECFEE 200
9 Robert CHEM 3000 BKFEE 100
10 Robert CHEM 3000 SPFEE 140
11 Robert GEO 1500 BKFEE 150
12 Tinger ENG 2000 BKFEE 75
13 Tinger ENG 2000 SPFEE 140
14 Tinger ENG 2000 LBFEE 180
So, is there any way to create a view like this?
View:
id student_name coursecode course_fee feecode fee_amount
1 John CHEM 3000 BKFEE 100
2 John CHEM 0 SPFEE 140
3 John CHEM 0 MATFEE 250
4 John PHY 4000 BKFEE 150
5 John PHY 0 SPFEE 200
6 John BIO 2000 LBFEE 300
7 John BIO 0 MATFEE 350
8 John BIO 0 TECFEE 200
9 Robert CHEM 3000 BKFEE 100
10 Robert CHEM 0 SPFEE 140
11 Robert GEO 1500 BKFEE 150
12 Tinger ENG 2000 BKFEE 75
13 Tinger ENG 0 SPFEE 140
14 Tinger ENG 0 LBFEE 180
Any help appreciated...

I guess you are looking for ROLLUP functionality in your view query. I am sharing two links: the first covers the basics of how ROLLUP works, and the second is specific to PostgreSQL: first link, second link. Hope this helps.
I have worked out a demo for you; please check: rollup query
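To make the idea concrete, here is a minimal ROLLUP sketch against the tables above (table and column names are taken from the question; treat this as an illustration of the technique, not the linked demo):

-- Per-student, per-course totals of fees paid, with ROLLUP subtotals:
-- one subtotal row per student, plus a grand-total row at the end.
SELECT s.name,
       sf.coursecode,
       SUM(sf.amount) AS fees_paid
FROM student s
JOIN student_fees sf ON sf.std_name = s.id
GROUP BY ROLLUP (s.name, sf.coursecode)
ORDER BY s.name, sf.coursecode;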

This is not exactly the answer you are expecting, but you can explore GROUPING SETS:
SELECT s.name, sf.coursecode, sf.amount, SUM(sc.fee)
FROM student s
JOIN student_course sc ON sc.std_id = s.id
JOIN student_fees sf ON sf.std_name = s.id
                    AND sf.coursecode = sc.coursecode
GROUP BY GROUPING SETS (
    (s.name, sf.coursecode, sf.amount, sc.fee),
    (s.name, sf.coursecode, sc.fee),
    ()
)
ORDER BY s.name, sf.coursecode ASC
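If you want exactly the shape you sketched (course_fee carried on one row per student/course and 0 on the repeats), a window function inside the view is another option. A sketch, assuming the join keys shown in your tables (the view name is illustrative):

CREATE VIEW student_fee_view AS
SELECT row_number() OVER (ORDER BY s.id, sc.coursecode, sf.feecode) AS id,
       s.name AS student_name,
       sc.coursecode,
       -- carry the course fee only on the first fee row of each student/course
       CASE WHEN row_number() OVER (PARTITION BY s.id, sc.coursecode
                                    ORDER BY sf.feecode) = 1
            THEN sc.fee ELSE 0
       END AS course_fee,
       sf.feecode,
       sf.amount AS fee_amount
FROM student s
JOIN student_course sc ON sc.std_id = s.id
JOIN student_fees sf ON sf.std_name = s.id
                    AND sf.coursecode = sc.coursecode;

SUM(course_fee) and SUM(fee_amount) over any slice of this view then give the two totals, since each course fee is counted exactly once. One caveat: if the chart tool filters by feecode and drops the row that happens to carry the course_fee, the course-fee total will be undercounted, so it is worth checking how the tool applies its filters before relying on this.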

Related

Address and smooth noise in sensor data

I have sensor data as below, where under the Data column there are 6 rows containing the value 45 in between preceding and following rows containing the value 50. The requirement is to clean this data and impute 50 (the previous value) into the new_data column. Moreover, the number and position of the noise records (shown as 45 in the table) can vary.
Case 1 (sample data):
Sl.no  Timestamp         Data  New_data
1      1/1/2021 0:00:00  50    50
2      1/1/2021 0:15:00  50    50
3      1/1/2021 0:30:00  50    50
4      1/1/2021 0:45:00  50    50
5      1/1/2021 1:00:00  50    50
6      1/1/2021 1:15:00  50    50
7      1/1/2021 1:30:00  50    50
8      1/1/2021 1:45:00  50    50
9      1/1/2021 2:00:00  50    50
10     1/1/2021 2:15:00  50    50
11     1/1/2021 2:30:00  45    50
12     1/1/2021 2:45:00  45    50
13     1/1/2021 3:00:00  45    50
14     1/1/2021 3:15:00  45    50
15     1/1/2021 3:30:00  45    50
16     1/1/2021 3:45:00  45    50
17     1/1/2021 4:00:00  50    50
18     1/1/2021 4:15:00  50    50
19     1/1/2021 4:30:00  50    50
20     1/1/2021 4:45:00  50    50
21     1/1/2021 5:00:00  50    50
22     1/1/2021 5:15:00  50    50
23     1/1/2021 5:30:00  50    50
I am thinking these data need to be grouped, ordered by timestamp ascending (like below); then a condition could check group by group over the large sample: if group 1 is the same as group 3, replace group 2 with group 1's values.
Sl.no  Timestamp         Data  New_data  group
1      1/1/2021 0:00:00  50    50        1
2      1/1/2021 0:15:00  50    50        1
3      1/1/2021 0:30:00  50    50        1
4      1/1/2021 0:45:00  50    50        1
5      1/1/2021 1:00:00  50    50        1
6      1/1/2021 1:15:00  50    50        1
7      1/1/2021 1:30:00  50    50        1
8      1/1/2021 1:45:00  50    50        1
9      1/1/2021 2:00:00  50    50        1
10     1/1/2021 2:15:00  50    50        1
11     1/1/2021 2:30:00  45    50        2
12     1/1/2021 2:45:00  45    50        2
13     1/1/2021 3:00:00  45    50        2
14     1/1/2021 3:15:00  45    50        2
15     1/1/2021 3:30:00  45    50        2
16     1/1/2021 3:45:00  45    50        2
17     1/1/2021 4:00:00  50    50        3
18     1/1/2021 4:15:00  50    50        3
19     1/1/2021 4:30:00  50    50        3
20     1/1/2021 4:45:00  50    50        3
21     1/1/2021 5:00:00  50    50        3
22     1/1/2021 5:15:00  50    50        3
23     1/1/2021 5:30:00  50    50        3
Moreover, there is also a need for an exception: if the next group shows the same pattern, do not change it, but retain the data as it is.
Example below: if group 1 and group 3 are the same, impute group 2 with group 1's value.
But if group 2 and group 4 are the same, do not change group 3; retain the same data in New_data.
Case 2:
Sl.no  Timestamp         Data  New_data  group
1      1/1/2021 0:00:00  50    50        1
2      1/1/2021 0:15:00  50    50        1
3      1/1/2021 0:30:00  50    50        1
4      1/1/2021 0:45:00  50    50        1
5      1/1/2021 1:00:00  50    50        1
6      1/1/2021 1:15:00  50    50        1
7      1/1/2021 1:30:00  50    50        1
8      1/1/2021 1:45:00  50    50        1
9      1/1/2021 2:00:00  50    50        1
10     1/1/2021 2:15:00  50    50        1
11     1/1/2021 2:30:00  45    50        2
12     1/1/2021 2:45:00  45    50        2
13     1/1/2021 3:00:00  45    50        2
14     1/1/2021 3:15:00  45    50        2
15     1/1/2021 3:30:00  45    50        2
16     1/1/2021 3:45:00  45    50        2
17     1/1/2021 4:00:00  50    50        3
18     1/1/2021 4:15:00  50    50        3
19     1/1/2021 4:30:00  50    50        3
20     1/1/2021 4:45:00  50    50        3
21     1/1/2021 5:00:00  50    50        3
22     1/1/2021 5:15:00  50    50        3
23     1/1/2021 5:30:00  50    50        3
24     1/1/2021 5:45:00  45    45        4
25     1/1/2021 6:00:00  45    45        4
26     1/1/2021 6:15:00  45    45        4
27     1/1/2021 6:30:00  45    45        4
28     1/1/2021 6:45:00  45    45        4
29     1/1/2021 7:00:00  45    45        4
30     1/1/2021 7:15:00  45    45        4
31     1/1/2021 7:30:00  45    45        4
I am reaching out for help with PostgreSQL code to address the above scenario. Please feel free to suggest alternative approaches to this problem.
The query below should answer the need.
The first CTE identifies the rows that correspond to a change of data.
The second CTE groups the rows between two successive changes of data and sets up the corresponding timestamp range.
The third CTE is a recursive query that calculates new_data iteratively, in timestamp order.
The final query displays the expected result.
WITH RECURSIVE list As
(
SELECT no
, timestamp
, lag(data) OVER w AS previous
, data
, lead(data) OVER w AS next
, data IS DISTINCT FROM lag(data) OVER w AS first
, data IS DISTINCT FROM lead(data) OVER w AS last
FROM sensors
WINDOW w AS (ORDER BY timestamp ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)
), range_list AS
(
SELECT tsrange(timestamp, lead(timestamp) OVER w, '[]') AS range
, previous
, data
, lead(next) OVER w AS next
, first
FROM list
WHERE first OR last
WINDOW w AS (ORDER BY timestamp ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
), rec_list (range, previous, data, next, new_data, arr) AS
(
SELECT range
, previous
, data
, next
, data
, array[range]
FROM range_list
WHERE previous IS NULL
UNION ALL
SELECT c.range
, p.data
, c.data
, c.next
, CASE
WHEN p.new_data IS NOT DISTINCT FROM c.next
THEN p.data
ELSE c.data
END
, p.arr || c.range
FROM rec_list AS p
INNER JOIN range_list AS c
ON lower(c.range) = upper(p.range) + interval '15 minutes'
WHERE NOT array[c.range] <@ p.arr -- <@ (array contained-by) guards against revisiting a range
AND first
)
SELECT s.*, r.new_data
FROM sensors AS s
INNER JOIN rec_list AS r
ON r.range @> s.timestamp -- @> : range contains the row's timestamp
ORDER BY timestamp
See the test result in dbfiddle.
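For completeness, this is the table layout the query assumes, reconstructed from the identifiers used above (the fiddle presumably defines the same; quoting is only for safety since "no" and "timestamp" are also keywords):

-- Assumed layout; "no", "timestamp" and data match the column names in the query.
CREATE TABLE sensors (
    "no"        integer PRIMARY KEY,
    "timestamp" timestamp NOT NULL,
    data        integer
);
INSERT INTO sensors VALUES
    (1,  '2021-01-01 00:00:00', 50),
    (2,  '2021-01-01 00:15:00', 50),
    (11, '2021-01-01 02:30:00', 45); -- and so on, per the sample data above

Note that the join condition upper(p.range) + interval '15 minutes' hard-codes the 15-minute sampling cadence; if your sensors report at a different interval, that constant must change with it.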

Add unique rows for each group when similar group repeats after certain rows

Hi, can anyone please help me get a unique group number?
I need to assign a unique number to each group, even when the same group repeats after some other groups.
I have following data:
id version product startdate enddate
123 0 2443 2010/09/01 2011/01/02
123 1 131 2011/01/03 2011/03/09
123 2 131 2011/08/10 2012/09/10
123 3 3009 2012/09/11 2014/03/31
123 4 668 2014/04/01 2014/04/30
123 5 668 2014/05/01 2016/01/01
123 6 668 2016/01/02 2017/09/08
123 7 131 2017/09/09 2017/10/10
123 8 131 2018/10/11 2019/01/01
123 9 550 2019/01/02 2099/01/01
select *,
       dense_rank() over (partition by id order by id, product)
from table
Expected results:
id version product startdate enddate count
123 0 2443 2010/09/01 2011/01/02 1
123 1 131 2011/01/03 2011/03/09 2
123 2 131 2011/08/10 2012/09/10 2
123 3 3009 2012/09/11 2014/03/31 3
123 4 668 2014/04/01 2014/04/30 4
123 5 668 2014/05/01 2016/01/01 4
123 6 668 2016/01/02 2017/09/08 4
123 7 131 2017/09/09 2017/10/10 5
123 8 131 2018/10/11 2019/01/01 5
123 9 550 2019/01/02 2099/01/01 6
Try the following:
SELECT id, version, product, startdate, enddate,
       1 + SUM(v) OVER (PARTITION BY id ORDER BY version) AS n
FROM (
    SELECT *,
           IIF(LAG(product) OVER (PARTITION BY id ORDER BY version) <> product, 1, 0) AS v
    FROM TestTable
) q
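IIF is SQL Server syntax; since this page is tagged postgresql, the same idea translated directly to PostgreSQL (table name as in the answer above) would be:

SELECT id, version, product, startdate, enddate,
       1 + SUM(v) OVER (PARTITION BY id ORDER BY version) AS n
FROM (
    SELECT *,
           -- flag each row where product differs from the previous row for the same id
           CASE WHEN LAG(product) OVER (PARTITION BY id ORDER BY version) <> product
                THEN 1 ELSE 0 END AS v
    FROM TestTable
) q;

The running SUM of those change flags (plus 1) produces a number that increases every time the product changes, which is exactly the expected count column: repeated products in a row share a number, while a product returning later gets a new one.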

Combine 2 data frames with different columns in spark

I have 2 dataframes:
df1 :
Id purchase_count purchase_sim
12 100 1500
13 1020 1300
14 1010 1100
20 1090 1400
21 1300 1600
df2:
Id click_count click_sim
12 1030 2500
13 1020 1300
24 1010 1100
30 1090 1400
31 1300 1600
I need to get the combined data frame with results as :
Id click_count click_sim purchase_count purchase_sim
12 1030 2500 100 1500
13 1020 1300 1020 1300
14 null null 1010 1100
24 1010 1100 null null
30 1090 1400 null null
31 1300 1600 null null
20 null null 1090 1400
21 null null 1300 1600
I can't use a union because of the different column names. Can someone suggest a better way to do this?
All you require is a full outer join on the Id column.
df1.join(df2, Seq("Id"), "full_outer")
// Since the Id column name is the same in both dataframes, a comparison like
// df1($"Id") === df2($"Id") would instead give you duplicate Id columns
Please refer to the documentation below for future reference.
https://docs.databricks.com/spark/latest/faq/join-two-dataframes-duplicated-column.html
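If you prefer SQL syntax, the same join can be expressed in Spark SQL after registering the frames as temporary views (the view names here are illustrative):

-- after df1.createOrReplaceTempView("df1") and df2.createOrReplaceTempView("df2")
SELECT *
FROM df1
FULL OUTER JOIN df2 USING (Id)

USING (Id) merges the join key into a single Id column, which sidesteps the duplicate-column issue mentioned above.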

How to match date and string from 2 lists (KDB)?

I have two lists:
data:
dt sym bid ask
2017.01.01D05:00:09.140745000 AAPL 101.20 101.30
2017.01.01D05:00:09.284281800 GOOG 801.00 802.00
2017.01.02D05:00:09.824847299 AAPL 101.30 101.40
info:
date sym shares divisor
2017.01.01 AAPL 500 2
2017.01.01 GOOG 100 1
2017.01.02 AAPL 200 2
I need to append the shares and divisor values from "info" for each ticker based on the date. How can I achieve this? Below is an example:
result:
dt sym bid ask shares divisor
2017.01.01D05:00:09.140745000 AAPL 101.20 101.30 500 2
2017.01.01D05:00:09.284281800 GOOG 801.00 802.00 100 1
2017.01.02D05:00:09.824847299 AAPL 101.30 101.40 200 2
If matching on an exact date, then you can use lj. For this to work you will need to create a date column in the data table and key info by date and sym, like so:
(update date:`date$dt from data)lj 2!info
dt sym price date shares divisor
---------------------------------------------------------------------
2018.02.04D17:25:06.658216000 AAPL 103.9275 2018.02.04 500 2
2018.02.04D17:25:06.658216000 GOOG 105.1709 2018.02.04 100 1
2018.02.05D17:25:06.658217000 AAPL 105.1598 2018.02.05 200 2
2018.02.05D17:25:06.658217000 GOOG 104.0666 2018.02.05
You can then delete the date column from this output.
It might be useful for you to use the stepped attribute [ http://code.kx.com/q/cookbook/temporal-data/#stepped-attribute ]
This allows you to have missing dates in the info table and fall back on the "most recent" date instead (so you don't need data for every sym every day). For example, without the stepped attribute:
q)data:([] dt:(10?2017.01.01+til 2)+10?.z.t;sym:10?`AAPL`GOOG;bid:100+10?5;ask:105+10?5)
q)info:([] date:2017.01.01 2017.01.01 2017.01.02;sym:`AAPL`GOOG`AAPL;shares:500 100 200;divisor:2 1 2)
q)(update date:`date$dt from data) lj 2!info
dt sym bid ask date shares divisor
--------------------------------------------------------------------
2017.01.01D04:04:03.440000000 GOOG 104 105 2017.01.01 100 1
2017.01.01D14:00:02.748000000 GOOG 104 105 2017.01.01 100 1
2017.01.02D09:34:52.869000000 GOOG 102 106 2017.01.02
2017.01.02D16:44:16.648000000 AAPL 100 107 2017.01.02 200 2
2017.01.01D08:48:23.285000000 AAPL 102 108 2017.01.01 500 2
2017.01.02D02:31:11.038000000 AAPL 104 109 2017.01.02 200 2
2017.01.01D05:50:50.463000000 GOOG 104 109 2017.01.01 100 1
2017.01.02D02:13:45.275000000 AAPL 101 107 2017.01.02 200 2
2017.01.01D10:25:30.322000000 AAPL 104 109 2017.01.01 500 2
2017.01.01D14:51:12.687000000 AAPL 103 109 2017.01.01 500 2
Note the nulls for GOOG on 2017.01.02. With stepped attribute:
q)(update date:`date$dt from data) lj `s#2!`sym xasc `sym`date xcols info
dt sym bid ask date shares divisor
--------------------------------------------------------------------
2017.01.01D04:04:03.440000000 GOOG 104 105 2017.01.01 100 1
2017.01.01D14:00:02.748000000 GOOG 104 105 2017.01.01 100 1
2017.01.02D09:34:52.869000000 GOOG 102 106 2017.01.02 100 1
2017.01.02D16:44:16.648000000 AAPL 100 107 2017.01.02 200 2
2017.01.01D08:48:23.285000000 AAPL 102 108 2017.01.01 500 2
2017.01.02D02:31:11.038000000 AAPL 104 109 2017.01.02 200 2
2017.01.01D05:50:50.463000000 GOOG 104 109 2017.01.01 100 1
2017.01.02D02:13:45.275000000 AAPL 101 107 2017.01.02 200 2
2017.01.01D10:25:30.322000000 AAPL 104 109 2017.01.01 500 2
2017.01.01D14:51:12.687000000 AAPL 103 109 2017.01.01 500 2
Here, GOOG gets the values for 2017.01.01, as there is no new value on 2017.01.02.
You could possibly use an aj as well:
q)aj[`date`sym;update date:`date$dt from data;info]
dt sym bid ask date shares divisor
--------------------------------------------------------------------
2017.01.02D07:57:14.764000000 GOOG 101 109 2017.01.02 200 2
2017.01.02D02:31:39.330000000 AAPL 100 105 2017.01.02 200 2
2017.01.02D04:25:17.604000000 AAPL 102 107 2017.01.02 200 2
2017.01.01D01:47:51.333000000 GOOG 104 106 2017.01.01 100 1
2017.01.02D15:50:12.140000000 AAPL 101 107 2017.01.02 200 2
2017.01.01D02:59:16.636000000 GOOG 102 106 2017.01.01 100 1
2017.01.01D14:35:31.860000000 AAPL 100 107 2017.01.01 500 2
2017.01.01D16:36:29.214000000 GOOG 101 108 2017.01.01 100 1
2017.01.01D14:01:18.498000000 GOOG 101 107 2017.01.01 100 1
2017.01.02D08:31:52.958000000 AAPL 102 109 2017.01.02 200 2

Tableau Pivot Rows into Columns

I have a table structure like this:
Department Employee Class Peroid Qty1 Qty2 Qty3
----------------------------------------------------
Dept1 John 1 1st 1 2 3
Dept1 John 1 2nd 11 22 33
Dept1 Mary 1 1st 2 3 4
Dept1 Mary 1 2nd 22 33 44
Dept2 Joe 1 1st 3 4 5
Dept2 Joe 1 2nd 33 44 55
Dept2 Paul 1 1st 4 5 6
Dept2 Paul 1 2nd 44 55 66
In a view I'd like to display the format as such:
Class / Period
1
Department Employee 1st 2nd
----------------------------------------------
Dept1 John 1 2 3 11 22 33
Dept1 Mary 2 3 4 22 33 44
Dept2 Joe 3 4 5 33 44 55
Dept2 Paul 4 5 6 44 55 66
I can't seem to find a way to do this. I have Class and Period as Columns and Department and Employee as Rows, then drag Qty1, Qty2 and Qty3 to the Text mark, but the format becomes:
Class / Period
1
Department Employee 1st 2nd
----------------------------------------------
Dept1 John 1 11
2 22
3 33
Dept1 Mary 2 22
3 33
4 44
Dept2 Joe 3 33
4 44
5 55
Dept2 Paul 4 44
5 55
6 66
How do I turn those rows under each employee into sub-columns under Period?
I think this is what you're trying to achieve.
Often, when you see a repeating column in a database table (Qty1, Qty2, Qty3), it is a sign that you really want multiple rows, each with a single Qty (repeating the other information) -- at least when you are building reports. That way you can have rows with any number of Qty instances, and you can easily aggregate all the Qty values together when needed.
There are situations where you may want to stick with a repeating-field design. But if you do want to reshape the data, you can do so in Tableau's data connection window by selecting the columns you want to pull into a single field and choosing the pivot command.
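For intuition, the reshape that Tableau's pivot command performs corresponds to an unpivot in SQL terms: each Qty column becomes its own row with a label. A sketch against a hypothetical table t with the columns shown above:

-- Each Qty column becomes a (measure, qty) row pair.
SELECT department, employee, class, period, 'Qty1' AS measure, qty1 AS qty FROM t
UNION ALL
SELECT department, employee, class, period, 'Qty2', qty2 FROM t
UNION ALL
SELECT department, employee, class, period, 'Qty3', qty3 FROM t;

With the data in that long form, the new field-names column can sit on Columns under Period, giving the sub-columns you are after.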