Filtering Options on PySpark - pyspark
I have a dataframe and I am trying to find the countries with the highest COVID case counts. I thought this would be easy, but my code does not seem to work (please see the attached screenshot). How can I find the top 3 countries with the highest number of cases?
You can try:
from pyspark.sql.functions import col
df_covid_3.orderBy(col("total_count").desc()).head(3)
Hope it helps.
Related
Tableau finding and marking duplicate rows
I used the level of detail expression outlined in another post to mark duplicate records as matched. The calculation seems to work sometimes but not all the time. As you can see in my results, it sometimes shows matched even though there is only one Server and Application name combination. Below is the calculation formula I'm using, along with a view of my results. Any insight would be much appreciated. Thanks in advance. This is the matched column calculation: '''if {fixed str([Servers with Token Deployed])+str([App Name]): count([Number of Records])}>1 then 'Yes' else 'No' End'''
How to get the sum directly with one number?
I'm a beginner with Tableau. I want a single number for each row, but the numbers I get are split up. How can I achieve this? I've tried an expression like count("Implemented"), but I don't get the result I want. For example, for the 1st row I want 3 10 10, not 111 10 112111111. EDIT: As you can see from the implementation opportunities, the status is related to the date, and I think that may be why the records are counted one by one. My calculation depends on the date: if I remove the date from the Marks shelf, the calculation breaks, but if I leave it, Tableau always counts the records one by one. My code is not perfect, but I can't find another approach to replace it. EDIT 2: In short, what I want is the sum of the remaining opportunity: 10.
Remove DAY from the Marks shelf. That detail is producing those separations.
Attaching a workbook with numbers similar to your real data (not the exact figures, if they are proprietary) is almost always advised. You will get the right answer a lot sooner than with screenshots alone. In any case, it seems the measure portion of the visualization is properly being summed by the date. Try selecting the measure and manually choosing "Sum" from the drop-down menu. Here is a link for more detail. Secondly, you can play around with table calculations. Click this link and read up on option 3.
How to display one result each for max, min, avg using Aggregation in MongoDB with one query
So here are some questions from my homework that I'm struggling to understand:
1. Find out the average, maximum and minimum h and er per team_id.
2. Find out the average, maximum and minimum h and er per team_id and year.
Both are very similar; however, the second question also asks for the year. I'm not sure how exactly to set up my query to solve these questions using aggregations. If anyone can show me a solution to one question, that would greatly help, as I learn better from examples.
The data I use has the following structure, the first line being the header:
player_id,year,stint,team_id,league_id,w,l,g,gs,cg,sho,sv,ipouts,h,er,hr,bb,so,baopp,era,ibb,wp,hbp,bk,bfp,gf,r,sh,sf,g_idp
bechtge01,1871,1,PH1,,1,2,3,3,2,0,0,78.0,43,23,0,11,1,,7.96,,,,0,,,42,,,
brainas01,1871,1,WS3,,12,15,30,30,30,0,0,792.0,361,132,4,37,13,,4.5,,,,0,,,292,,,
fergubo01,1871,1,NY2,,0,0,1,0,0,0,0,3.0,8,3,0,0,0,,27.0,,,,0,,,9,,,
fishech01,1871,1,RC1,,4,16,24,24,22,1,0,639.0,295,103,3,31,15,,4.35,,,,0,,,257,,,
fleetfr01,1871,1,NY2,,0,1,1,1,1,0,0,27.0,20,10,0,3,0,,10.0,,,,0,,,21,,,
flowedi01,1871,1,TRO,,0,0,1,0,0,0,0,3.0,1,0,0,0,0,,0.0,,,,0,,,0,,,
mackde01,1871,1,RC1,,0,1,3,1,1,0,0,39.0,20,5,0,3,1,,3.46,,,,0,,,30,,,
mathebo01,1871,1,FW1,,6,11,19,19,19,1,0,507.0,261,97,5,21,17,,5.17,,,,2,,,243,,,
mcbridi01,1871,1,PH1,,18,5,25,25,25,0,0,666.0,285,113,3,40,15,,4.58,,,,0,,,223,,,
mcmuljo01,1871,1,TRO,,12,15,29,29,28,0,0,747.0,430,153,4,75,12,,5.53,,,,0,,,362,,,
The complete file can be found here. Thanks to anyone that reads and helps!
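One possible way to approach both questions is a single `$group` stage per question, using the `$avg`, `$max` and `$min` accumulators; the two questions then differ only in the grouping key. The following is a minimal sketch, assuming the CSV was imported with `h` and `er` stored as numbers. The collection name `pitching`, the output field names, and the helper functions are hypothetical. The `summarize` function is not MongoDB code; it only mirrors what the stage computes so the result can be checked locally on a few rows from the question.

```python
from collections import defaultdict

FIELDS = ("h", "er")

def group_pipeline(group_key):
    """Build a one-stage pipeline with avg/max/min accumulators for h and er."""
    stage = {"_id": group_key}
    for field in FIELDS:
        stage[f"avg_{field}"] = {"$avg": f"${field}"}
        stage[f"max_{field}"] = {"$max": f"${field}"}
        stage[f"min_{field}"] = {"$min": f"${field}"}
    return [{"$group": stage}]

# Question 1: one result document per team_id.
per_team = group_pipeline("$team_id")
# Question 2: one result document per (team_id, year) pair.
per_team_year = group_pipeline({"team_id": "$team_id", "year": "$year"})

# With pymongo this would be run as, for example:
#   db.pitching.aggregate(per_team)   # "pitching" is a hypothetical name

def summarize(rows, key_fields):
    """Pure-Python mirror of the $group stage, for checking results locally."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in key_fields)].append(row)
    result = {}
    for key, members in groups.items():
        out = {}
        for field in FIELDS:
            values = [m[field] for m in members]
            out[f"avg_{field}"] = sum(values) / len(values)
            out[f"max_{field}"] = max(values)
            out[f"min_{field}"] = min(values)
        result[key] = out
    return result

# Four rows taken from the sample data in the question (only the needed columns).
rows = [
    {"team_id": "PH1", "year": 1871, "h": 43, "er": 23},
    {"team_id": "NY2", "year": 1871, "h": 8, "er": 3},
    {"team_id": "NY2", "year": 1871, "h": 20, "er": 10},
    {"team_id": "PH1", "year": 1871, "h": 285, "er": 113},
]
by_team = summarize(rows, ["team_id"])
by_team_year = summarize(rows, ["team_id", "year"])
```

Because every accumulator sits in the same `$group` stage, each question needs only one query, and each group comes back as one document carrying all six values.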
Drop down option for a graph in dashboard of grafana
The attached image shows a graph in my dashboard in which two graphs overlap. I want to add a drop-down so that I can select one at a time. Sometimes I have more than 10 samples or graphs in one, so I would like to use this approach or something similar. I tried the default option at the bottom of the graph, but that won't help if I have n graphs in one. I couldn't work out how to get this done. Can anyone help me with this?
I think you are describing the templating function. Take a look at this link. As you can see in the screenshot of one of my dashboards, templating lets me select between different queries to show in the graph. In your case the query would look something like: SELECT count("responsecode") FROM "samples" WHERE "label" =~ /^$YOUR_TEMPLATE_NAME$/ AND ...
how to hide overlay help datapoints outside range in grafana2
I have been using Grafana 2 these days and it is a wonderful tool for visualization; thanks to its creators. I wonder whether I can remove the "Datapoints outside time range" hint that is overlaid on the graph. My query selects from all series and is based on the examples. I'm not a pro in InfluxDB yet, so my query could be wrong: select temperature from /.*/ group by time(7d)
I had a similar issue, which was caused by an incorrect query: https://github.com/grafana/grafana/issues/1913