Sometimes you need to calculate a SUM over unique TAG values in InfluxDB. How can you do it?
For example, I have multiple users who download software, and I want to find out how many unique users downloaded it.
The following query was tested in Grafana. It counts unique users and also takes the applied time filter into account.
To do this, we first need a subquery that calculates mean values. This yields a table with the value 1 associated with each user:
SELECT mean("count") FROM "autogen"."downloads" WHERE $timeFilter GROUP BY "username"
Here count is an integer field that equals 1 each time a user downloads the software, so its mean per user is always 1.
Next we sum these mean values. This is not cheap on a huge database, but it is a working solution:
SELECT SUM(mean) FROM (
SELECT mean("count") FROM "autogen"."downloads" WHERE $timeFilter GROUP BY "username"
)
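To see why the nested query returns the number of unique users, here is a minimal Python sketch of the same two steps. The function name and the sample rows are hypothetical and only for illustration; it assumes, as stated above, that count is always 1.

```python
from collections import defaultdict

def unique_users_via_mean_sum(rows):
    """rows: iterable of (username, count) pairs, where count is always 1."""
    per_user = defaultdict(list)
    for username, count in rows:
        per_user[username].append(count)
    # Inner query: mean("count") ... GROUP BY "username" gives 1 per user
    means = {user: sum(vals) / len(vals) for user, vals in per_user.items()}
    # Outer query: SUM(mean) adds one for every distinct username
    return sum(means.values())

# Hypothetical sample data: five downloads by three users
downloads = [("alice", 1), ("bob", 1), ("alice", 1), ("carol", 1), ("bob", 1)]
result = unique_users_via_mean_sum(downloads)
```

Because each per-user mean is exactly 1, the sum equals the number of distinct usernames. If count could take other values, this would no longer hold.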
Please go ahead and propose a better-performing or more native solution. It would be nice to have one for larger databases.
Related
I have volume data for specific customers. The customer names come from Salesforce and the volume comes from another table. When I add each of them in Tableau, I get a nice table that seems to be working.
We can see that there are 19 values ~500. My ultimate goal is to sum these based on filters.
One way I discovered to do that is to use the syntax
{ FIXED [Account Id]: count([Volume]) }
But when I do that,
I get
When I change my function to count([volume]) I get a count of all joined rows ~250k
My question is: how do I make this respect individual entries in the database and not all the joined values? Would a sum over distinct timestamps in another field also work? Any other advice from you Tableau experts would be helpful.
Thanks!
I think I got it. The database table I was trying to aggregate had 20 rows that needed to be calculated. When the data was joined with the Salesforce data, the rows were duplicated. The trick here was to take the sum of the max for each primary key:
SUM({ FIXED [Pk], [Name1] : MAX([Volume]) })
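As a rough illustration of why sum-of-max undoes the duplication caused by the join, here is a small Python sketch. The row layout (pk, name, volume), the function name, and the sample values are assumptions made up for this example.

```python
def sum_of_max_per_key(rows):
    """rows: iterable of (pk, name, volume) tuples, possibly duplicated by a join."""
    best = {}
    for pk, name, volume in rows:
        key = (pk, name)  # mirrors FIXED [Pk], [Name1]
        if key not in best or volume > best[key]:
            best[key] = volume  # mirrors MAX([Volume]) within the key
    return sum(best.values())  # mirrors the outer SUM

# Hypothetical joined data: each source row repeated by the join
joined = [
    (1, "Acme", 500), (1, "Acme", 500), (1, "Acme", 500),
    (2, "Beta", 480), (2, "Beta", 480),
]
naive_total = sum(volume for _, _, volume in joined)  # counts duplicates
deduped_total = sum_of_max_per_key(joined)            # one value per key
```

Taking MAX within each key collapses the duplicated rows to a single value, which is safe here only because every duplicate carries the same volume.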
I'm trying to create a Measure in Power BI using DAX that achieves the below.
The data set has four columns: Name, Month, Country and Value. It contains duplicates, so first I need to dedupe across all four columns, then group by Month and sum the Value. Finally, I need to average those monthly sums to arrive at a single value. How would I achieve this in DAX?
I figured it out. The reply by #OscarLar was very close, but a nested SUMMARIZE causes problems because it cannot aggregate values calculated dynamically within the query itself (https://www.sqlbi.com/articles/nested-grouping-using-groupby-vs-summarize/).
I kept the inner SUMMARIZE from #OscarLar's answer and replaced the outer SUMMARIZE with a GROUPBY. Here's the code that worked.
AVERAGEX(GROUPBY(SUMMARIZE(Data, Data[Name], Data[Month], Data[Country], Data[Value]), Data[Month], "Month_Value", sumx(CURRENTGROUP(), Data[Value])), [Month_Value])
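For readers who want to check the logic outside DAX, the three steps (dedupe, sum per month, average the sums) can be sketched in Python. The function name and sample rows below are hypothetical.

```python
from collections import defaultdict

def average_monthly_total(rows):
    """rows: iterable of (name, month, country, value) tuples."""
    distinct = set(rows)  # dedupe across all four columns (inner SUMMARIZE)
    totals = defaultdict(float)
    for name, month, country, value in distinct:
        totals[month] += value  # GROUPBY month with SUMX over the group
    if not totals:
        return None  # no data, nothing to average
    return sum(totals.values()) / len(totals)  # AVERAGEX over the monthly sums

# Hypothetical sample data with two exact duplicate rows
data = [
    ("Ann", "Jan", "US", 10),
    ("Ann", "Jan", "US", 10),  # duplicate
    ("Bob", "Jan", "UK", 20),
    ("Ann", "Feb", "US", 30),
    ("Bob", "Feb", "UK", 50),
    ("Bob", "Feb", "UK", 50),  # duplicate
]
avg = average_monthly_total(data)
```

With this sample the monthly totals are 30 for Jan and 80 for Feb, so the average is 55. Without the dedupe step the totals would be 40 and 130.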
I'm not sure I completely understood the question, since you didn't provide example data or any DAX code you've already tried. Please do so next time.
I'm assuming that, for some reason, parts of this cannot be done in Power Query, so you have to use DAX. In that case I think this will do what you described.
Create a temporary table called Data_reduced from which duplicate rows have been removed.
Data_reduced =
SUMMARIZE(
    'Data';
    [Name];
    [Month];
    [Country];
    [Value]
)
Then create the averaging measure like this:
AveragePerMonth =
AVERAGEX(
    SUMMARIZE(
        'Data_reduced';
        'Data_reduced'[Month];
        "Sum_month"; SUM('Data_reduced'[Value])
    );
    [Sum_month]
)
Where Data is the name of the table.
Thanks in advance for any advice you can offer! I'm building a Tableau dashboard to explore housing affordability and school quality in different neighborhoods in my area. A user will select their occupation and see a graph of neighborhoods plotted based on school quality and housing affordability. To explore housing affordability, I'm using county level assessor data with the valuation of every property matched to neighborhoods.
The goal is to display the percentage of homes in an area that are affordable given the median occupational wage for the job a user selected. Right now, I'm trying to use a calculated field with COUNT([Parcels]<[Occupation])/COUNT([Parcels]), but I need to find a way to count the number of properties in each specific neighborhood that fall below the cutoff value.
Does anyone know of a way to count elements of a particular group in this way in Tableau?
I'm on a Mac, using Tableau Desktop, and doing the back end analysis work in R. Thank you!
You seem to misunderstand what the function COUNT() does. You are certainly not alone. COUNT() behaves in Tableau almost identically to how it does in SQL.
COUNT([some field]) returns the number of data rows where the value for [some field] is not null. It does not return the number of rows where [some field] evaluates to true, or to a positive number, or anything else.
If [some field] always has a non-null value, then COUNT([some field]) is the same as SUM([Number of Records]). If [some field] is always null, then COUNT([some field]) is zero. COUNT() is not like Excel's COUNTIF function.
If you want to count data rows that meet a condition, you could try COUNT(if [condition] then 1 end). Since the missing ELSE case defaults to null, that expression counts only the rows where [condition] is true.
So one way to get the percentage of affordable homes is count(if [affordable] then 1 end) / count(1), which assumes each data row represents a home. Then format your field to display as a percentage. Another option is to learn to use quick table calcs.
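The difference between counting non-null values and counting rows that meet a condition can be sketched in Python, with None standing in for null. The helper names, home values, and cutoff are hypothetical.

```python
def count_non_null(values):
    """Mimics COUNT(): number of values that are not null (None)."""
    return sum(1 for v in values if v is not None)

def affordable_share(home_values, cutoff):
    # if [affordable] then 1 end: rows failing the condition become None
    flags = [1 if v < cutoff else None for v in home_values]
    return count_non_null(flags) / len(home_values)

# Hypothetical assessed values and a hypothetical affordability cutoff
homes = [150_000, 220_000, 310_000, 95_000]
cutoff = 200_000

# Counting the boolean itself counts every row, since False is not null
all_rows = count_non_null([v < cutoff for v in homes])
share = affordable_share(homes, cutoff)
```

Here all_rows is 4 because every comparison yields a non-null boolean, while share is 0.5 because only two homes fall below the cutoff.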
If you want to display the number of rows in a given visualized table, you could also use SIZE().
Source, official docs:
https://help.tableau.com/current/pro/desktop/en-us/functions_functions_tablecalculation.htm#size
How can I create a DAX query that retrieves the rows in a given range, in that order? Let's say I want the rows from row 1000 to row 2000. There is no unique ID in my database. Should I add one, or is it possible without it?
If you can't define a filter that produces the subset of rows you are targeting, then I would use a unique ID. I have not come across anything in DAX that lets you select rows by position in your Power Pivot data set. If there isn't anything unique about the data you are targeting, then I imagine you would need a unique ID.
That is, I normally have column values I can filter on to target or create the subset of data I want to use.
I hope I am wrong and there is a way, and I look forward to someone posting one.
I'm trying to get max drawdown from a partitioned table across multiple dates. The query works fine when run with a date constrained to a specific day. E.g.
select {max neg x-maxs x} pnl from trades where date=last date
Over multiple dates the query gets map-reduced, so the query above no longer works. I can make it run over multiple dates by adding another aggregation:
select max {max neg x-maxs x} pnl from trades
but that does not return the max drawdown over the continuous sequence of trades; it returns the maximum of the daily drawdowns.
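To make the difference concrete, here is a small Python sketch that mirrors the lambda {max neg x-maxs x} and compares the per-day result with the result over the concatenated series. The two daily pnl lists are made-up values, and the function name is hypothetical.

```python
from itertools import accumulate

def max_drawdown(pnl):
    """Mirrors {max neg x-maxs x}: largest drop below the running maximum."""
    running_max = accumulate(pnl, max)  # equivalent of maxs x
    return max(peak - x for peak, x in zip(running_max, pnl))

# Hypothetical pnl values for two consecutive dates
day1 = [10, 12, 11]
day2 = [9, 8, 13]

# What the map-reduced query computes: the max of the daily drawdowns
per_day = max(max_drawdown(day1), max_drawdown(day2))
# What is actually wanted: the drawdown over the continuous sequence
continuous = max_drawdown(day1 + day2)
```

In this example per_day is 1, while continuous is 4, because the peak of 12 on the first date is followed by the low of 8 on the second. The running maximum has to carry across date boundaries, which a per-partition aggregation cannot do.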
I wonder if there's a way to make it work with a single select, without chaining selects like
select {max neg x-maxs x} pnl from select pnl from trades
I've got a rather big query that pulls a lot of different metrics on the trades, and max drawdown is just one of them. Using a chained select means I need to break the big query into two queries, one map-reduced and one not, and then join them back together, which would make the query look ugly.
Thanks!
A select query runs on each date in a partitioned database, applies the function to each date's values, and finally aggregates the results depending on the call (a user-defined function behaves differently from plain q functions).
So I don't think you can combine that into one query. But there are ways to make your query more general and reusable for different scenarios.
For example, convert your query to functional form and use variables in it for the column name and the user function. Put this in a function that accepts the column name and the user function. Now you can call this function with different (column;function) pairs. Something like:
runF:{[col;usrfunc] functional_query_uses_col_userfunc }
All of this depends on your use case. Also check memory usage, as you'll be pulling a lot of data into memory.