How to place a variable in a pyspark groupBy agg query - pyspark

Hi, I have a query in which I want to place variable data into the groupBy query.
I tried it like this, but it is not working:
dd2 = (dd1.groupBy("hours").agg({'%s': '%s'}) % (columnname1, input1))
columnname1 contains 'total' and input1 contains the kind of aggregation required, such as mean or stddev.
I want this query to be dynamic.

Try this. Since columnname1 and input1 are already strings, they can be used as the dict key and value directly:
dd2 = dd1.groupBy("hours").agg({columnname1: input1})
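The reason the original attempt fails can be shown without Spark at all: the % operator is string formatting and does not apply to a dict literal, while agg() just wants a dict mapping column name to aggregation name. A minimal pure-Python sketch, using the values mentioned in the question:

```python
# Values from the question: column to aggregate and the aggregation kind.
columnname1 = "total"
input1 = "mean"

# The original attempt applies '%' to a dict, which is not supported,
# so it raises a TypeError instead of substituting the variables.
try:
    {'%s': '%s'} % (columnname1, input1)
    failed = False
except TypeError:
    failed = True

# Building the dict from the variables directly gives agg() exactly
# the {column: aggregation} mapping it expects.
agg_spec = {columnname1: input1}
```

With this, `dd1.groupBy("hours").agg(agg_spec)` stays fully dynamic: change columnname1 or input1 and the same line produces a different aggregation.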

Related

InfluxDB Flux query how to get distinct values from query

I have written the following Flux query in Grafana, and I get two results for value. I would like to filter those two down to distinct values by scenario key.
I would expect to have "main_flow" and "persons_end_user" results at the end. How can I achieve this? I have tried distinct() and unique(), but they do not seem to work.

multiple aggregations on same column using agg in pyspark

I am not able to get multiple metrics using agg as below.
table.select("date_time")\
.withColumn("date",to_timestamp("date_time"))\
.agg({'date_time':'max', 'date_time':'min'}).show()
I see that the second aggregation overwrites the first.
Can someone help me get multiple aggregations on the same column?
I can't replicate this to make sure it works, but instead of using a dict for your aggregations, try it like this:
from pyspark.sql.functions import to_timestamp, min, max

table.select("date_time")\
.withColumn("date", to_timestamp("date_time"))\
.agg(min('date_time'), max('date_time')).show()
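The overwrite happens before Spark ever sees the spec: a Python dict literal cannot hold duplicate keys, so the later entry silently replaces the earlier one. A quick check in plain Python:

```python
# A dict literal with a repeated key keeps only the last value, so
# {'date_time': 'max', 'date_time': 'min'} asks Spark for a single
# aggregation (min) instead of two.
spec = {'date_time': 'max', 'date_time': 'min'}
```

That is why the column-expression form `.agg(min(...), max(...))` is needed: positional expressions can repeat a column, dict keys cannot.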

How to retrieve all values from a column in mongoDB

I am wondering how to retrieve all the values stored in one field in MongoDB and put them in a list. find() only returns whole documents, and specifying <field>: <value> isn't an option either.
If you are expecting to retrieve the unique values of a field, then distinct should work for you. The syntax is:
db.yourcollection.distinct("yourfield");
Learn more about distinct here
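If you are on pymongo, the same call is available as collection.distinct("yourfield"). To illustrate what distinct returns, here is a minimal pure-Python analogue over a list of documents (the field name and sample data below are invented for illustration):

```python
# Minimal stand-in for db.collection.distinct("field"): collect each
# document's value for the field, skipping duplicates and documents
# that lack the field entirely.
def distinct(documents, field):
    seen = []
    for doc in documents:
        if field in doc and doc[field] not in seen:
            seen.append(doc[field])
    return seen

docs = [{"name": "a"}, {"name": "b"}, {"name": "a"}, {"other": 1}]
names = distinct(docs, "name")
```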

can we use find query inside map in mongo

I need to perform some aggregation on an existing collection and then use the aggregated collection to perform map reduce.
The aggregated collection is a sort of temporary table, used only so that it can feed the map reduce. The record set in the temporary table reaches around 8M.
What could be a way to avoid the temporary table?
One way could be to write a find() query inside the map() function and emit the aggregated result (initially stored in the aggregation table).
However, I am not able to implement this.
Is there a way? Please help.
You can use the "query" parameter of MongoDB MapReduce. With this parameter, the data sent to the map function is filtered before processing, so you do not need a pre-aggregated temporary collection.
More info in the MapReduce documentation
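The effect of the "query" parameter can be sketched in plain Python: filter the documents first, then run map and reduce only over the matches. The collection data and field names below are invented for illustration:

```python
from collections import defaultdict

# Stand-in for mapReduce with a "query" parameter: only documents
# matching the predicate reach the map step, so no temporary
# pre-aggregated collection is needed.
def map_reduce(documents, query, map_fn, reduce_fn):
    grouped = defaultdict(list)
    for doc in documents:
        if query(doc):                  # the "query" filter
            key, value = map_fn(doc)    # emit(key, value)
            grouped[key].append(value)
    return {k: reduce_fn(vs) for k, vs in grouped.items()}

docs = [
    {"type": "a", "n": 1},
    {"type": "a", "n": 2},
    {"type": "b", "n": 5},
]
totals = map_reduce(
    docs,
    query=lambda d: d["type"] == "a",
    map_fn=lambda d: (d["type"], d["n"]),
    reduce_fn=sum,
)
```

In MongoDB itself the filtering happens server-side, before map() runs, which is exactly what makes the 8M-row temporary table avoidable.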

Is there a way to fetch max and min values in Sphinx?

I'm using Sphinx for document search. Each document has a list of integer parameters, like "length", "publication date (unix)", popularity, etc.
The search itself works fine. But is there a way to get the maximum and minimum field values for a specified search query?
The main purpose is to generate a search form containing filter fields so the user can select a document's length.
Or maybe there is another way to solve this problem?
It is possible if length, date etc are defined as attributes.
http://www.sphinxsearch.com/docs/current.html#attributes
Attributes are additional values associated with each document that can be used to perform additional filtering and sorting during search.
Try grouping by 'length' and selecting min(length), max(length).
In SphinxQL it looks like:
select min(length), max(length) from index_123 group by length
The same for other attributes.