Filtering Options on PySpark

I have a DataFrame and I am trying to find the countries with the highest number of COVID cases. I thought this would be easy, but my code does not seem to work (please see the attached screenshot). How can I find the top 3 countries with the highest number of cases?

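You can try:
from pyspark.sql.functions import col
# head(3) returns a list of Row objects; use .limit(3) instead if you need a DataFrame
df_covid_3.orderBy(col("total_count").desc()).head(3)
Hope it helps.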
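If df_covid_3 holds more than one row per country (for example one row per day or per region), sorting the raw rows can return the same country several times, so it may help to aggregate per country first. The sketch below assumes a hypothetical "country" column next to "total_count". The PySpark chain is shown as a comment (groupBy, agg, F.sum, orderBy, limit), and the runnable part is a plain-Python model of the same steps, handy for sanity-checking the expected top 3 on a small sample.

```python
from collections import defaultdict

# PySpark equivalent (assumes hypothetical columns "country" and "total_count"):
#   from pyspark.sql import functions as F
#   top3 = (df_covid_3.groupBy("country")
#           .agg(F.sum("total_count").alias("total"))
#           .orderBy(F.col("total").desc())
#           .limit(3))
#   top3.show()


def top_n_countries(rows, n=3):
    """Sum total_count per country, then return the n largest (country, total) pairs."""
    totals = defaultdict(int)
    for row in rows:
        # Models groupBy("country").agg(F.sum("total_count"))
        totals[row["country"]] += row["total_count"]
    # Models orderBy(col("total").desc()).limit(n)
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n]


# Hypothetical sample data: several rows per country.
sample = [
    {"country": "A", "total_count": 50},
    {"country": "B", "total_count": 70},
    {"country": "A", "total_count": 40},
    {"country": "C", "total_count": 30},
    {"country": "D", "total_count": 10},
    {"country": "C", "total_count": 25},
]

print(top_n_countries(sample))  # [('A', 90), ('B', 70), ('C', 55)]
```

Sorting the raw rows of this sample without aggregating would return B, A, A, i.e. the same country twice.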

Related

Tableau finding and marking duplicate rows

I have used the level of detail expression outlined in another post to mark duplicate records as matched. The calculation seems to work sometimes but not all the time: as you can see in my results, it sometimes shows a match even though there is only one Server and Application Name combination. Here is the calculation formula I'm using, as well as a view of my results. Any insight would be much appreciated.
Thanks in advance.
This is the Matched column calculation:
IF {FIXED STR([Servers with Token Deployed]) + STR([App Name]) : COUNT([Number of Records])} > 1 THEN 'Yes' ELSE 'No' END
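One possible cause, offered as an assumption rather than a diagnosis: the FIXED key is built by gluing two strings together with no separator, so different Server/App pairs can collapse into the same key (server "ab" with app "c" and server "a" with app "bc" both become "abc"). Also, as far as I know, concatenating a NULL in Tableau gives NULL, so rows with a missing server or app name may all land in one shared NULL group. The Python sketch below is a rough model of what the FIXED count computes, comparing the concatenated key with a key that keeps both fields separate; the rows are hypothetical. In Tableau the analogous change would be to list both dimensions in the FIXED clause, e.g. {FIXED [Servers with Token Deployed], [App Name] : COUNT([Number of Records])}.

```python
from collections import Counter


def concat_key(server, app):
    """Model of STR(server) + STR(app): no separator, and NULL if either side
    is NULL (assumed Tableau behaviour)."""
    if server is None or app is None:
        return None
    return str(server) + str(app)


def pair_key(server, app):
    """Model of {FIXED [server], [app] : ...}: each field stays a separate dimension."""
    return (server, app)


def mark_matched(rows, key_fn):
    """Return 'Yes'/'No' per row: 'Yes' when more than one row shares the same key."""
    counts = Counter(key_fn(s, a) for s, a in rows)
    return ["Yes" if counts[key_fn(s, a)] > 1 else "No" for s, a in rows]


# Hypothetical (server, app name) rows; no pair actually repeats.
rows = [
    ("ab", "c"),
    ("a", "bc"),     # concatenates to the same "abc" as the row above
    ("srv1", None),  # NULL app name
    ("srv2", None),  # another NULL app name on a different server
    ("srv3", "app"),
]

print(mark_matched(rows, concat_key))  # ['Yes', 'Yes', 'Yes', 'Yes', 'No']
print(mark_matched(rows, pair_key))    # ['No', 'No', 'No', 'No', 'No']
```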

How to get the sum directly with one number?

I'm a beginner with Tableau. I want to get a single number for each row, but I get separate numbers instead. How can I achieve this?
I've tried an expression like COUNT("Implemented"), but I don't get the result I want.
For example, for the first row I want 3 10 10,
not 111 10 112111111.
Here is the worksheet.
My code:
EDIT:
Here is the screenshot of the implementation opportunities.
As you can see, the status depends on the date; I think this may be what causes the records to be counted one by one.
The situation now is this: the calculation I created depends on the date. If I remove the date from the Marks card, the calculation breaks (because it depends on the date), but if I leave it, Tableau always counts the records one by one. My code is not perfect, but I can't find an alternative that can replace it.
EDIT 2:
In short, what I want is the sum of the remaining opportunities: 10.

Remove DAY from the Marks card. That level of detail is what splits the total into those separate numbers.
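To see why DAY matters: every dimension on the Marks card becomes part of the grouping, so a row with three records on three different days shows three separate marks of 1 (which reads as "111") instead of one mark of 3. A rough Python model of that grouping, with hypothetical records and dates:

```python
from collections import defaultdict


def sum_by(records, keys):
    """Sum the 'value' field grouped by the given fields, like the dimensions in a Tableau view."""
    totals = defaultdict(int)
    for rec in records:
        group = tuple(rec[k] for k in keys)
        totals[group] += rec["value"]
    return dict(totals)


# Hypothetical records: one opportunity row with three records on different days.
records = [
    {"row": "Opportunity 1", "day": "2020-01-06", "value": 1},
    {"row": "Opportunity 1", "day": "2020-01-07", "value": 1},
    {"row": "Opportunity 1", "day": "2020-01-08", "value": 1},
]

with_day = sum_by(records, ["row", "day"])  # DAY on the Marks card: one mark per day
without_day = sum_by(records, ["row"])      # DAY removed: one mark per row

print(list(with_day.values()))  # [1, 1, 1]  shown side by side as "111"
print(without_day)              # {('Opportunity 1',): 3}
```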

Attaching a workbook with numbers similar to yours (but not exact, if the data is proprietary) is almost always advisable. You will get the right answer a lot sooner than with screenshots alone.
In any case, it looks as if the measure in the visualization is being summed per date. Try selecting the measure and manually choosing "Sum" from its drop-down menu. Here is a link for more detail.
Second, you can experiment with table calculations. Click this link and read up on option 3.

How to display one result each for max, min, avg using Aggregation in MongoDB with one query

So here are some questions from my homework that I'm struggling to understand:
Find out the average, maximum and minimum h and er per team_id.
Find out the average, maximum and minimum h and er per team_id and year.
Both are very similar; however, the second question also asks for the year. I'm not sure exactly how to set up my query to solve these questions using aggregations. If anyone could show me a solution to one of them, that would help greatly, as I learn better from examples than by working around the problem.
The data I use has the following structure (the first line is the header):
player_id,year,stint,team_id,league_id,w,l,g,gs,cg,sho,sv,ipouts,h,er,hr,bb,so,baopp,era,ibb,wp,hbp,bk,bfp,gf,r,sh,sf,g_idp
bechtge01,1871,1,PH1,,1,2,3,3,2,0,0,78.0,43,23,0,11,1,,7.96,,,,0,,,42,,,
brainas01,1871,1,WS3,,12,15,30,30,30,0,0,792.0,361,132,4,37,13,,4.5,,,,0,,,292,,,
fergubo01,1871,1,NY2,,0,0,1,0,0,0,0,3.0,8,3,0,0,0,,27.0,,,,0,,,9,,,
fishech01,1871,1,RC1,,4,16,24,24,22,1,0,639.0,295,103,3,31,15,,4.35,,,,0,,,257,,,
fleetfr01,1871,1,NY2,,0,1,1,1,1,0,0,27.0,20,10,0,3,0,,10.0,,,,0,,,21,,,
flowedi01,1871,1,TRO,,0,0,1,0,0,0,0,3.0,1,0,0,0,0,,0.0,,,,0,,,0,,,
mackde01,1871,1,RC1,,0,1,3,1,1,0,0,39.0,20,5,0,3,1,,3.46,,,,0,,,30,,,
mathebo01,1871,1,FW1,,6,11,19,19,19,1,0,507.0,261,97,5,21,17,,5.17,,,,2,,,243,,,
mcbridi01,1871,1,PH1,,18,5,25,25,25,0,0,666.0,285,113,3,40,15,,4.58,,,,0,,,223,,,
mcmuljo01,1871,1,TRO,,12,15,29,29,28,0,0,747.0,430,153,4,75,12,,5.53,,,,0,,,362,,,
The complete file can be found here.
Thanks to anyone that reads and helps!
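A minimal sketch, assuming the CSV has been imported into a collection (the database and collection names below are hypothetical) with h, er, and year stored as numbers: a single $group stage with $avg, $max, and $min accumulators returns one document per group holding all six values. For the second question, the only change is grouping on a compound _id of team_id and year. The pipelines are written as Python lists, as you would pass them to PyMongo's collection.aggregate(); the group_stats helper is a plain-Python model of the same $group, run on the relevant columns of the sample rows above so the expected numbers can be checked by hand.

```python
import csv
import io

# Accumulators shared by both questions: avg/max/min of h and er.
STATS = {
    "avg_h": {"$avg": "$h"},
    "max_h": {"$max": "$h"},
    "min_h": {"$min": "$h"},
    "avg_er": {"$avg": "$er"},
    "max_er": {"$max": "$er"},
    "min_er": {"$min": "$er"},
}

# Question 1: one result document per team_id.
per_team = [
    {"$group": {"_id": "$team_id", **STATS}},
    {"$sort": {"_id": 1}},
]

# Question 2: same accumulators, grouped on a compound key of team_id and year.
per_team_year = [
    {"$group": {"_id": {"team_id": "$team_id", "year": "$year"}, **STATS}},
    {"$sort": {"_id.team_id": 1, "_id.year": 1}},
]

# Running it with PyMongo (database and collection names are hypothetical):
#   from pymongo import MongoClient
#   coll = MongoClient()["baseball"]["pitching"]
#   for doc in coll.aggregate(per_team):
#       print(doc)

# Relevant columns of the sample rows from the question.
SAMPLE = """\
player_id,year,team_id,h,er
bechtge01,1871,PH1,43,23
brainas01,1871,WS3,361,132
fergubo01,1871,NY2,8,3
fishech01,1871,RC1,295,103
fleetfr01,1871,NY2,20,10
flowedi01,1871,TRO,1,0
mackde01,1871,RC1,20,5
mathebo01,1871,FW1,261,97
mcbridi01,1871,PH1,285,113
mcmuljo01,1871,TRO,430,153
"""


def group_stats(rows, key_fields):
    """Plain-Python model of the $group stage above, for checking results by hand."""
    groups = {}
    for row in rows:
        key = tuple(row[k] for k in key_fields)
        groups.setdefault(key, []).append(row)
    result = {}
    for key, members in groups.items():
        stats = {}
        for field in ("h", "er"):
            values = [float(m[field]) for m in members]
            stats["avg_" + field] = sum(values) / len(values)
            stats["max_" + field] = max(values)
            stats["min_" + field] = min(values)
        result[key] = stats
    return result


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
by_team = group_stats(rows, ["team_id"])
print(by_team[("PH1",)])
# {'avg_h': 164.0, 'max_h': 285.0, 'min_h': 43.0, 'avg_er': 68.0, 'max_er': 113.0, 'min_er': 23.0}
```

Because every sample row is from 1871, grouping by team_id and year gives the same groups here; on the full file the compound key splits each team by season.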

Drop down option for a graph in dashboard of grafana

The attached image shows a graph in my dashboard with two graphs overlapping.
I want to add a drop-down option to it so that I can select one at a time.
Sometimes I have more than 10 samples or graphs in one panel, so I would like to use this approach or something similar.
I tried the default option at the bottom of the graph, but with n graphs in one panel that won't help me. I couldn't figure out how to get this done.
Can anyone help me with this?

I think you are describing the templating feature. Take a look at this link.
As you can see in the screenshot of one of my dashboards, templating lets me select between different queries to show in the graph.
In your case, the query would look something like this (InfluxQL uses =~ for regex matching):
SELECT count("responsecode") FROM "samples" WHERE "label" =~ /^$YOUR_TEMPLATE_NAME$/ AND ...
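The template variable behind the drop-down can be defined in the dashboard settings or directly in the dashboard JSON. The sketch below builds such a "templating" entry as a Python dict and prints it as JSON. It assumes "label" is a tag key on the "samples" measurement and that the dashboard's default data source is used; the variable name "label" is hypothetical and would take the place of YOUR_TEMPLATE_NAME in the panel query.

```python
import json

# Hypothetical variable: one drop-down entry per value of the "label" tag.
label_variable = {
    "name": "label",   # referenced in panel queries as $label
    "type": "query",   # options come from a data-source query
    # InfluxQL: list every value of the "label" tag on the "samples" measurement
    "query": 'SHOW TAG VALUES FROM "samples" WITH KEY = "label"',
    "multi": False,       # select exactly one series at a time
    "includeAll": False,  # no "All" option in the drop-down
}

# Fragment of the dashboard JSON that holds template variables.
dashboard_fragment = {"templating": {"list": [label_variable]}}

print(json.dumps(dashboard_fragment, indent=2))
```

Setting "multi" to False matches the wish to show one graph at a time; switching it to True would let several series be selected together.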

how to hide overlay help datapoints outside range in grafana2

I have been using Grafana 2 these days, and it is a wonderful visualization tool (thanks to its creators). I wonder whether I can remove the "Datapoints outside time range" guide/help overlay from the graph.
Here is my query for all series, based on the examples. I'm not an InfluxDB expert yet, so my query could be wrong:
select temperature from /.*/ group by time(7d)

I had a similar issue, which was caused by an incorrect query:
https://github.com/grafana/grafana/issues/1913

We Keep Coding
