I ran into a grouping problem in the Chart > Pivot Table view of Apache Superset (version 2023.01.1). I created a pivot table chart laid out as follows: Column A: product list, Column B: Jan, Column C: Feb, and so on through Dec.
In the Superset pivot table the settings are: Time = Months, Columns = date, Rows = Product list, and
Metric =
CASE
  WHEN Product_List = 'Product_A' AND to_char(date, 'Mon') = 'Jan'
  THEN SUM(sales) / (10000 * 0.01)
  WHEN Product_List = 'Product_B' AND to_char(date, 'Mon') = 'Feb'
  THEN SUM(sales) / (20000 * 0.01)
END
The goal is to compute each product's rate for each month against a different sales target. For example:
Sales for Jan = 100 units
Target sales for Jan = 1000 units
Rate = 100/(1000*0.01) = 10% for Jan
Sales for Feb = 150 units
Target sales for Feb = 2000 units
Rate = 150/(2000*0.01) = 7.5% for Feb
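The arithmetic in the example above, reading the formula as sales / (target * 0.01) to match the metric, can be sketched in Python. The function name `rate_percent` and the `monthly` dictionary are illustrative names, not part of Superset.

```python
def rate_percent(sales: float, target: float) -> float:
    """Return sales as a percentage of target, i.e. sales / (target * 0.01)."""
    if target == 0:
        raise ValueError("target must be non-zero")
    return sales / (target * 0.01)

# Monthly figures taken from the example above: (sales, target) in units.
monthly = {"Jan": (100, 1000), "Feb": (150, 2000)}
rates = {month: rate_percent(s, t) for month, (s, t) in monthly.items()}

for month, rate in rates.items():
    print(f"{month}: {rate:.1f}%")
```

This only checks the expected numbers; it does not address how Superset wraps the metric in a GROUP BY query.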
Pivot Table:
Product Name    Jan
Product A       20%
Product B       25%
Note: The SQL above works in SQL Lab in Apache Superset. However, when I apply it under Chart > Pivot Table > Metrics, it fails with the error 'must appear in the GROUP BY clause or be used in an aggregate function'.
intake            class      student_id
Sep 2022 - Eng    English    100
Sep 2022 - Eng    English    101
Nov 2022 - Sc     Science    100
Jan 2023 - Bio    Biology    101
Nov 2022 - Sc     Science    102
Sep 2022 - Eng    English    102
Jan 2023 - Bio    Biology    102
Jan 2023 - Bio    Biology    103
Jan 2023 - Bio    Biology    105
Feb 2023 - Eng    English    104
Feb 2023 - Eng    English    103
Hello everyone,
I have a table as shown above. Each row represents a student who is going to attend a class. For example, looking at the Sep 2022 English class, I know that students 100, 101, and 102 are going to attend it, that students 100 and 102 are going to attend the Nov 2022 Science class, and so on.
What I want to do is transform the table into another format that shows, among the students attending a given class, how many did not attend or are not going to attend each of the other classes. The table below is the expected output:
Here is how the values shown in the screenshot are derived:
For example:
When students 100, 101, and 102 are attending the Sep 2022 English class, among the three of them:
All of them attend the English class (they are attending it right now), so none is counted;
One of them did not attend or is not going to attend the Science class (student 101), since only students 100 and 102 are on the Science class list;
One of them did not attend or is not going to attend the Biology class (student 100), since only students 101 and 102 are on the Biology class list.
Hence, for Sep 2022 - Eng intake:
no_english = 0
no_science = 1
no_biology = 1
Another example:
When students 101, 102, 103, and 105 are attending the Jan 2023 Biology class, among the four of them:
One of them did not attend or is not going to attend the English class (student 105), since students 101 and 102 attended the Sep 2022 English class and student 103 is going to attend the Feb 2023 English class;
Three of them did not attend or are not going to attend the Science class (students 101, 103, and 105), since only student 102 is on the Science class list;
All of them attend the Biology class (they are attending it right now), so none is counted.
Hence, for Jan 2023 - Bio intake:
no_english = 1
no_science = 3
no_biology = 0
I have been struggling to transform the data into the desired format shown in the screenshot. In fact, I'm not sure whether it is possible at all using Power Query or DAX. Any help or advice will be greatly appreciated. Let me know if my question is not clear.
Add 3 measures to a table as follows:
no_science =
// Students in the current filter context (e.g. the current intake)
VAR ids = VALUES('Table'[student_id])
// Every student who appears in any Science intake, ignoring the current filters
VAR ids_sci = CALCULATETABLE(VALUES('Table'[student_id]), REMOVEFILTERS('Table'), 'Table'[class] = "Science")
// Count the current students who never appear in Science; +0 turns BLANK into 0
RETURN COUNTX(EXCEPT(ids, ids_sci), 'Table'[student_id]) + 0
no_english =
// Same pattern, checked against English
VAR ids = VALUES('Table'[student_id])
VAR ids_eng = CALCULATETABLE(VALUES('Table'[student_id]), REMOVEFILTERS('Table'), 'Table'[class] = "English")
RETURN COUNTX(EXCEPT(ids, ids_eng), 'Table'[student_id]) + 0
no_biology =
// Same pattern, checked against Biology
VAR ids = VALUES('Table'[student_id])
VAR ids_bio = CALCULATETABLE(VALUES('Table'[student_id]), REMOVEFILTERS('Table'), 'Table'[class] = "Biology")
RETURN COUNTX(EXCEPT(ids, ids_bio), 'Table'[student_id]) + 0
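The set-difference idea behind these measures (students in the current intake, minus students who appear in any intake of a given class) can be checked outside Power BI with a small Python sketch. The rows are copied from the question's table; the function name `missing_counts` is illustrative, and this is only a way to verify the expected numbers, not a replacement for the DAX.

```python
from collections import defaultdict

# Rows copied from the question's table: (intake, class, student_id).
ROWS = [
    ("Sep 2022 - Eng", "English", 100),
    ("Sep 2022 - Eng", "English", 101),
    ("Nov 2022 - Sc", "Science", 100),
    ("Jan 2023 - Bio", "Biology", 101),
    ("Nov 2022 - Sc", "Science", 102),
    ("Sep 2022 - Eng", "English", 102),
    ("Jan 2023 - Bio", "Biology", 102),
    ("Jan 2023 - Bio", "Biology", 103),
    ("Jan 2023 - Bio", "Biology", 105),
    ("Feb 2023 - Eng", "English", 104),
    ("Feb 2023 - Eng", "English", 103),
]

def missing_counts(rows):
    """For each intake, count its students who appear in no intake of each class."""
    by_intake = defaultdict(set)  # intake -> students in that intake
    by_class = defaultdict(set)   # class  -> students in any intake of that class
    for intake, cls, student in rows:
        by_intake[intake].add(student)
        by_class[cls].add(student)
    return {
        intake: {
            f"no_{cls.lower()}": len(students - members)
            for cls, members in by_class.items()
        }
        for intake, students in by_intake.items()
    }

result = missing_counts(ROWS)
for intake, counts in result.items():
    print(intake, counts)
```

The two intakes worked through in the question come out as described there: 0/1/1 for Sep 2022 - Eng and 1/3/0 for Jan 2023 - Bio (English/Science/Biology).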
For fun, an M version:
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Grouped Rows" = Table.Group(Source, {"intake", "student_id"}, {{"data", each _, type table}}),
    AllCombos = Table.ExpandListColumn(Table.AddColumn(#"Grouped Rows", "class", each List.Distinct(Source[class])), "class"),
    T1 = Table.ExpandListColumn(Table.AddColumn(Table.FromList(List.Distinct(Source[class]), null, {"class"}), "student_id", each List.Distinct(Source[student_id])), "student_id"),
    #"Merged Queries0" = Table.NestedJoin(T1, {"class", "student_id"}, Source, {"class", "student_id"}, "Table1", JoinKind.LeftOuter),
    StudentNo = Table.AddColumn(#"Merged Queries0", "No", each if Table.RowCount([Table1]) = 0 then 1 else 0),
    #"Merged Queries" = Table.NestedJoin(AllCombos, {"student_id", "class"}, StudentNo, {"student_id", "class"}, "Table2", JoinKind.LeftOuter),
    #"Expanded Table2" = Table.ExpandTableColumn(#"Merged Queries", "Table2", {"No"}, {"No"}),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded Table2", {"student_id", "data"}),
    #"Pivoted Column" = Table.Pivot(#"Removed Columns", List.Distinct(#"Removed Columns"[class]), "class", "No", List.Sum)
in
    #"Pivoted Column"
I have a set of prices over a time series as my data source, and I would like to create a calculated field that combines the two prices for each date, i.e.:
Price A * 5 - Price B.
Data source:
Date          Product    Price
01.01.2018    A          10
01.01.2018    B          15
02.01.2018    A          20
02.01.2018    B          30
03.01.2018    A          10
03.01.2018    B          30
I don't know how to write the formula correctly for the Calculated field.
What I expect is to build the following table:
Date          A     B     Combined Price (A * 5 - B)
01.01.2018    10    15    35
02.01.2018    20    30    70
03.01.2018    10    30    20
Thank you
The answer from Mohfooj can be found on the Tableau forum: https://community.tableau.com/message/900181#900181
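For readers who only want to check the expected numbers, the calculation the question describes (pivot the prices by date, then take A * 5 - B) can be sketched in plain Python. This is not the Tableau formula from the linked answer; the names `combined_price` and `factor` are illustrative.

```python
# Rows copied from the question's data source: (date, product, price).
PRICES = [
    ("01.01.2018", "A", 10),
    ("01.01.2018", "B", 15),
    ("02.01.2018", "A", 20),
    ("02.01.2018", "B", 30),
    ("03.01.2018", "A", 10),
    ("03.01.2018", "B", 30),
]

def combined_price(rows, factor=5):
    """Pivot prices by date, then compute A * factor - B for each date."""
    by_date = {}
    for date, product, price in rows:
        by_date.setdefault(date, {})[product] = price
    # Assumes every date has both an A row and a B row, as in the sample data.
    return {date: p["A"] * factor - p["B"] for date, p in by_date.items()}

combined = combined_price(PRICES)
for date, value in combined.items():
    print(date, value)
```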
I maintain a MongoDB collection called "human" with more than 300,000 news records, but when I query by date for documents dated on or after 2018-01-04, the count is always 0.
> db.human.count({"crawled_time":{"$gte":new Date("2018-01-04")}})
0
> db.human.count({"crawled_time":{"$gte":new Date("2018-01-04T00:00:00.000Z")}})
0
> db.human.count({"version_created":{"$gte":new Date("2018-01-04T00:00:00.000Z")}})
0
> db.human.count({"version_created":{$gte:new Date("2018-01-04T00:00:00.000Z")}})
0
> db.human.count({"version_created":{$gte:new Date("2018-01-04T00:00:00.000Z")}})
0
> db.human.count({"version_created":{$gte:new Date("2018-01-04T00:00:00Z")}})
0
> db.human.count({"version_created":{"$gte":ISODate("2018-01-04T00:00:00.0000Z")}})
0
A sample of the JSON documents in the collection looks like this:
{"_id":"21adb21dc225406182f031c8e67699cc","_class":"com.pats.reuters.pojo.NewsData","alt_id":"nWGB30349","audiences":["NP:MNI"],"body":"ISSUER: City of Spencer, IA\nAMOUNT: $1,500,000\nDESCRIPTION: General Obligation Corporate Purpose Bonds, Series 2018\n------------------------------------------------------------------------\nSELLING: Feb 5 TIME: 11:00 AM., EST\nFINANCIAL ADVISOR: PFM Fin Advisors\n------------------------------------------------------------------------\n ","first_created":"2018-01-30T06:12:05.000Z","headline":"SEALED BIDS: City of Spencer, IA, $1.5M Ult G.O. On Feb 5","instances_of":[],"language":"en","message_type":2,"mime_type":"text/plain","provider":"NS:RTRS","pub_status":"stat:usable","subjects":["A:R","E:T","E:5I","E:S","A:95","A:85","M:1QD","N2:MUNI","N2:PS1","N2:SL1","N2:CM1","N2:IA1","N2:GOS"],"takeSequence":1,"urgency":3,"version_created":"2018-01-30T06:12:05.000Z","source_id":"WGB30349__180130279kIQIcAh81BiGVmb/Js54Wg3naQC6GXEu9+H","crawled_time":"2018-01-30 14:12:05"}
{"_id":"8ba08c4af9464c6b23cc508645d5bf03","_class":"com.pats.reuters.pojo.NewsData","alt_id":"nWGB3034a","audiences":["NP:MNI"],"body":"ISSUER: City of Long Branch, NJ\nAMOUNT: $31,629,415\nDESCRIPTION: Bond Anticipation Notes, Consisting of $22,629,415 Bond Anticipation Notes, Series 2018A and\n------------------------------------------------------------------------\nSELLING: Feb 1 TIME: 11:30 AM., EST\nFINANCIAL ADVISOR: N.A.\n------------------------------------------------------------------------\n ","first_created":"2018-01-30T06:12:06.000Z","headline":"SEALED BIDS: City of Long Branch, NJ, $31.629M Ult G.O. On Feb 1","instances_of":[],"language":"en","message_type":2,"mime_type":"text/plain","provider":"NS:RTRS","pub_status":"stat:usable","subjects":["G:6J","A:R","E:T","E:5I","E:S","A:9M","E:U","M:1QD","N2:US","N2:MUNI","N2:PS1","N2:SL1","N2:CM1","N2:NJ1","N2:NT1"],"takeSequence":1,"urgency":3,"version_created":"2018-01-30T06:12:06.000Z","source_id":"WGB3034a__1801302ksv4Iy0zSP5cscas0FlZgu1TpQ4Zh25VKCtSt","crawled_time":"2018-01-30 14:12:06"}
{"_id":"537f70076ef056c9a43d30c89500353a","_class":"com.pats.reuters.pojo.NewsData","alt_id":"nWGB3034b","audiences":["NP:MNI"],"body":"ISSUER: Independent School District No. 76 of Canadian County (Calumet), OK\nAMOUNT: $1,630,000\nDESCRIPTION: Combined Purpose Bonds of 2018\n------------------------------------------------------------------------\nSELLING: Feb 12 TIME: 05:00 PM., EST\nFINANCIAL ADVISOR: Stephen H. McDonald\n------------------------------------------------------------------------\n ","first_created":"2018-01-30T06:12:07.000Z","headline":"SEALED BIDS: Independent School District No. 76 of Canadian County (Calumet), OK, $1.63M Ult G.O. On Feb 12","instances_of":[],"language":"en","message_type":2,"mime_type":"text/plain","provider":"NS:RTRS","pub_status":"stat:usable","subjects":["A:R","E:T","E:5I","E:S","A:9R","M:1QD","N2:MUNI","N2:PS1","N2:SL1","N2:CM1","N2:OK1"],"takeSequence":1,"urgency":3,"version_created":"2018-01-30T06:12:07.000Z","source_id":"WGB3034b__1801302ev7DqID2Wr/BAJHrC/plpNKBQhrfuHBnlSldz","crawled_time":"2018-01-30 14:12:07"}
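Your field values are strings, not Date objects, so a comparison against new Date("2018-01-04") never matches them. Because the timestamps are stored year-first and zero-padded, comparing them as strings gives chronological order, so query with a string instead: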
db.human.count({"crawled_time":{"$gte":"2018-01-04"}})
db.human.count({"version_created":{"$gte":"2018-01-04"}})
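Why a plain string comparison is enough here can be seen in a small Python sketch: timestamps written year-first with zero padding sort as text in the same order as they do as dates. Only the first value below comes from the sample documents; the other two are made up for illustration.

```python
from datetime import datetime

# crawled_time values in the "YYYY-MM-DD HH:MM:SS" format of the sample documents.
# The first comes from the sample; the other two are invented test values.
crawled = ["2018-01-30 14:12:05", "2018-01-03 23:59:59", "2017-12-31 08:00:00"]

cutoff = "2018-01-04"

# String comparison: valid because the format is year-first and zero-padded.
by_string = [t for t in crawled if t >= cutoff]

# Cross-check by parsing each value into a real datetime.
by_datetime = [
    t for t in crawled
    if datetime.strptime(t, "%Y-%m-%d %H:%M:%S") >= datetime(2018, 1, 4)
]

print(by_string)
print(by_string == by_datetime)
```

The shortcut would not hold for formats such as "30.01.2018" or "1/4/2018", where text order and date order differ.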
I need to append data to MongoDB using a Spark DataFrame. For example, let's say there are 100k stocks in a portfolio:
Stock A
  Jan 2018
  Profit: $30k
Stock B
  Jan 2018
  Profit: -$10k
MongoDB:
_id: ObjectId('XXX1')
stock: Stock A
monthlyProfit: Array
  0: Object
    Month: Jan 2018
    Profit: 30k
_id: ObjectId('XXX2')
stock: Stock B
monthlyProfit: Array
  0: Object
    Month: Jan 2018
    Profit: -10k
If I were to append the February profit, how do I add an element to the existing array and push it to MongoDB without a performance problem, given that the same update needs to happen to all 100k documents in the collection?
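One possible direction, sketched here under assumptions rather than offered as a tested answer: MongoDB's `$push` update operator appends an element to an array field, so each stock needs only a small (filter, update) pair rather than a rewrite of the whole document. The Python below builds those pairs as plain dictionaries and simulates `$push` in memory. The function names and the February figures are invented for illustration, and it does not cover the Spark connector side. With a driver such as PyMongo, each pair could be wrapped in `UpdateOne` and sent in batches with `bulk_write`, and an index on `stock` would presumably be needed for the filters to stay cheap.

```python
def build_push_updates(month, profit_by_stock):
    """Return one (filter, update) pair per stock; each appends a month entry via $push."""
    return [
        (
            {"stock": stock},
            {"$push": {"monthlyProfit": {"Month": month, "Profit": profit}}},
        )
        for stock, profit in profit_by_stock.items()
    ]

def apply_push(doc, update):
    """Tiny in-memory stand-in for what $push does to a single document."""
    for field, value in update["$push"].items():
        doc.setdefault(field, []).append(value)
    return doc

# Hypothetical February figures, invented for illustration only.
feb = {"Stock A": "12k", "Stock B": "5k"}
updates = build_push_updates("Feb 2018", feb)

# Document shaped like the example in the question.
doc_a = {"stock": "Stock A", "monthlyProfit": [{"Month": "Jan 2018", "Profit": "30k"}]}
apply_push(doc_a, updates[0][1])
print(doc_a)
```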