Is there a way to use the first value of the data within a group?
For example I have the following data:
1 12/1
1 12/5
2 1/2
3 5/6
3 6/6
I would like the output (with groups) to be:
Group Header for 1 start date: 12/1
[Data]
Group Header for 2 start date: 1/2
[Data]
Group Header for 3 start date: 5/6
[Data]
You can define a variable, set it to reset on each group, set the calculation type to 'First', and set its expression to the field you want to display.
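If the report tool is JasperReports (an assumption, since the question doesn't name the tool, but the "reset on each group" and calculation-type "first" terminology matches its variable options), the variable might look like this, with hypothetical names:
<!-- hypothetical names: GroupStartDate, IdGroup, start_date -->
<variable name="GroupStartDate" class="java.lang.String" resetType="Group" resetGroup="IdGroup" calculation="First">
    <variableExpression><![CDATA[$F{start_date}]]></variableExpression>
</variable>
In the group header band you would then reference it as $V{GroupStartDate}.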
I am trying to reorder the bars in a bar chart in Qlik Sense. Below is the order in which I need the bars sorted:
NEW
IN_PROGRESS_PL
ASSIGNED
IN_PROGRESS_SPOC
RESOLVED/CLOSED
FBS E2S BACKLOG
CANCELLED
DUPLICATE
But by default the bars are ordered alphabetically or by number. I have tried the solution below, but it still didn't rearrange the bars.
Default order:
Match('NEW','IN_PROGRESS_PL','ASSIGNED','IN_PROGRESS_SPOC','RESOLVED/CLOSED','FBS E2S BACKLOG' , 'CANCELLED','DUPLICATE' )
Can someone please help me fix this issue?
The first parameter of Match must be the targeted field; your expression omits it.
If we have data like:
RawData:
Load * inline [
Stage , IdeasCount
ASSIGNED , 63
CANCELLED , 11
INTERNAL_123, 2
IN_PROGRESS1, 20
IN_PROGRESS2, 47
];
Then the sorting expression will be:
=match(Stage, 'IN_PROGRESS1', 'ASSIGNED', 'IN_PROGRESS2', 'INTERNAL_123', 'CANCELLED')
Set this expression in the chart's sorting properties (sort by expression), and the resulting chart will have the correct sorting.
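Applied to the values from the question (assuming the dimension field is named Stage, since the question doesn't name it), the corrected expression would be:
=Match(Stage, 'NEW', 'IN_PROGRESS_PL', 'ASSIGNED', 'IN_PROGRESS_SPOC', 'RESOLVED/CLOSED', 'FBS E2S BACKLOG', 'CANCELLED', 'DUPLICATE')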
P.S. (1)
Note that if a value is not found, the Match function returns null and the corresponding bar is pushed to the front. In that case you can wrap the Match in an If statement and assign some large number when no match is found:
= if( match(Field, 'value1', 'value2' ...) > 0,
match(Field, 'value1', 'value2' ...),
1000
)
P.S. (2)
In your case you can also use the Dual function to create a "default" sort order for a specific field.
If we change the script to:
RawData:
Load
Dual(Stage, OrderId) as Stage,
IdeasCount
;
Load * inline [
Stage , IdeasCount, OrderId
ASSIGNED , 63 , 2
CANCELLED , 11 , 5
INTERNAL_123, 2 , 4
IN_PROGRESS1, 20 , 1
IN_PROGRESS2, 47 , 3
]
;
The result table will look the same, with only two fields: Stage and IdeasCount. But in the background each value of the Stage field will have a numeric representation (based on the initial OrderId field). As a side effect, the automatic sort option will sort the field by its internal numeric representation and the chart will "sort itself" correctly.
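Applied to the values from your question, the same Dual trick could pre-set the sort order before the real data is loaded. A sketch (the field name Stage is an assumption, since the question doesn't name the field):
// Loading these dual values first makes later loads of the same
// text values inherit the numeric sort order.
SortOrder:
Load Dual(Stage, OrderId) as Stage;
Load * inline [
Stage , OrderId
NEW , 1
IN_PROGRESS_PL , 2
ASSIGNED , 3
IN_PROGRESS_SPOC , 4
RESOLVED/CLOSED , 5
FBS E2S BACKLOG , 6
CANCELLED , 7
DUPLICATE , 8
];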
I am trying to get a result in my report which, I believe, requires a where clause, and the Select Expert did not work for me.
I have 2 tables; let's call them Table 1 and Table 2.
Table 1 contains unique records.
Table 2 contains multiple records for the same uniqueKey as Table 1.
There are 3 fields in Table 2 that play a role for each uniqueKey from Table 1:
QTY_ORD
QTY_SHIPPED
ITEM_CANCEL
Let's assume that for item #1 from Table 1 there are 5 records in Table 2, each with values for the 3 fields above. I need to display the SUM of QTY_SHIPPED - QTY_ORD over all the records where ITEM_CANCEL = 0.
It could be that 3 of the records have ITEM_CANCEL = 1 (we can ignore these), but for the other 2 records where ITEM_CANCEL = 0, I need the SUM of QTY_SHIPPED minus the SUM of QTY_ORD.
The current code I have is as follows:
if {current_order1.ITEM_CANCEL} = 0 then
sum({current_order1.QTY_ORD})-sum({current_order1.QTY_SHIPPED}) else
0
But this gives me the sum of ALL the records, including the ones where ITEM_CANCEL = 1.
If I use ITEM_CANCEL = 0 in the Select Expert, it removes ALL the results that have no value in Table 2. I even tried the code without the SUM function, but that returned only 1 of the records in Table 2 where ITEM_CANCEL = 0, not the total difference over the 2 records that I require.
Any suggestions on this?
Start with a detail-level formula (no SUM):
if {current_order1.ITEM_CANCEL} = 0 then {current_order1.QTY_ORD} - {current_order1.QTY_SHIPPED} else 0
Then SUM that formula at whatever group or report levels you require.
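For example, packaged as two pieces (the formula name @NetQty is hypothetical, as is placing the summary in the uniqueKey group):
// Detail-level formula field, e.g. @NetQty
if {current_order1.ITEM_CANCEL} = 0
then {current_order1.QTY_ORD} - {current_order1.QTY_SHIPPED}
else 0
// Group-level summary, placed in the group header or footer:
Sum({@NetQty}, {current_order1.uniqueKey})
Because cancelled records contribute 0 to @NetQty, the group sum reflects only the records where ITEM_CANCEL = 0.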
Say I have a dataframe like so:
ID Media
1 imgix.com/20830dk
2 imgix.com/202398pwe
3 imgix.com/lvw0923dk
4 imgix.com/082kldcm
4 imgix.com/lks032m
4 imgix.com/903248
I'd like to end up with:
ID Media
1 imgix.com/20830dk
2 imgix.com/202398pwe
3 imgix.com/lvw0923dk
4 imgix.com/082kldcm
Even though that causes me to lose 2 links for ID = 4, I don't care. Is there a simple way to do this in python/pyspark?
Group by on col('ID')
Use collect_list with agg to aggregate the list
Call getItem(0) to extract the first element from the aggregated list
from pyspark.sql.functions import collect_list
df.groupBy('ID').agg(collect_list('Media').getItem(0).alias('Media')).show()
Anton and pault are correct:
df.drop_duplicates(subset=['ID'])
does indeed work
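For completeness, a minimal runnable sketch of both approaches on the sample data (assuming a local SparkSession):
from pyspark.sql import SparkSession
from pyspark.sql.functions import collect_list

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(1, 'imgix.com/20830dk'), (2, 'imgix.com/202398pwe'),
     (3, 'imgix.com/lvw0923dk'), (4, 'imgix.com/082kldcm'),
     (4, 'imgix.com/lks032m'), (4, 'imgix.com/903248')],
    ['ID', 'Media'])

# Approach 1: collect the Media values per ID and keep the first element.
df.groupBy('ID').agg(collect_list('Media').getItem(0).alias('Media')).show(truncate=False)

# Approach 2: drop duplicate IDs (keeps one arbitrary row per ID).
df.drop_duplicates(subset=['ID']).show(truncate=False)
Neither approach guarantees which of the ID = 4 rows survives, which matches the "I don't care" requirement.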
I have a dataframe like this:
user_id items
1 item1
1 item2
1 item3
2 item1
2 item5
3 item4
3 item2
If I put user_id as rows and the count of items as columns, I get this:
user_id number_of_items
1 3
2 2
3 2
Now I would like to group this result again, like this :
number_of_user_id number_of_items
1 3
2 2
How can I do this, as a calculated field or in a graph (maybe a histogram)?
First create the following calculated field, called users_per_item
{ fixed items : countd(user_id) }
Then highlight the new measure you just created in the data pane, users_per_item, and right-click to create bins. Set the bin size to 1 or whatever value you like. That will create a dimension called users_per_item (bin).
Finally, you can now use the bin field to create the view you want, say place users_per_item (bin) on the columns shelf and CNTD(items) on the rows shelf.
A natural use of LOD calculations for a two-stage analysis.
I have the following problem: I need to remove from a dataframe the rows that have a unique value in column A.
In the example DF1 below, rows 0 and 3 should be removed:
A B C
0 5 100 5
1 1 200 5
2 1 150 4
3 3 500 5
The one solution I have thought of so far is:
groupby(A)
count rows in each group
keep only the groups where the count > 1
save result into DF2
DF1.intersect(DF2)
Any other ideas? A solution for RDDs could also help, but a DataFrame solution would be better.
Thanks!
A more condensed syntax (but following the same approach):
df = sqlContext.createDataFrame([[5,100,5],[1,200,5],[1,150,4],[3,500,5]], ['A','B','C'])
df.registerTempTable('df')  # making SQL queries possible
df_t = sqlContext.sql('select A, count(B) from df group by A having count(B) > 1')  # steps 1 to 4 in one statement: keep only the keys that occur more than once
df2 = df.join(df_t, df.A == df_t.A, 'leftsemi')  # only keep records that have a matching key
Some people refer to 'leftsemi' as 'left keep': it keeps the records of the left dataframe if the key also exists in df_t.
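For comparison, the same approach written with the DataFrame API instead of SQL (a sketch, reusing the df defined above):
from pyspark.sql import functions as F

# steps 1 to 4: keys that occur more than once
keys = df.groupBy('A').count().filter(F.col('count') > 1).select('A')
# step 5: keep only the rows whose A value is non-unique
df2 = df.join(keys, on='A', how='leftsemi')
df2.show()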