How to bypass level of detail table calculation limitation - tableau-api

I want to use an LOD expression to get how many customers are spending over $1k and to flag all of the activity for each customer after they hit that threshold. I am using a table calculation to get a running total of their cumulative spend:
RUNNING_SUM(SUM(SPEND))
Ideally I would like to do something like this:
{FIXED year(date), cust_id: IF RUNNING_SUM(SUM(SPEND)) > 1000 THEN 1 END}
And then add that calculated field as a filter. However, Tableau does not support table calculations inside LOD expressions. Are there any good workarounds for this?

Quite interesting topic, but I'm not sure if I got it straight.
With the following solution, you can see just the transactions after 1K for each user.
Is this what you want?
I'm asking because if you need to count (transactions or users), I think there's no way around it, due to the problem you've already pointed out.
Otherwise, if you "just" need to track the activity starting from the transaction that triggers the 1k limit, this could be a solution.
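For reference, a minimal sketch of that kind of setup, assuming the SPEND, date and cust_id fields from the question. Create a table-calculation flag,
e.g.:
// flags every row once the customer's running total passes $1k
IF RUNNING_SUM(SUM([SPEND])) > 1000 THEN 1 END
Set its Compute Using to the date, partitioned by YEAR([date]) and [cust_id], then drag it to the Filters shelf and keep 1. Because filters on table calculations are applied after LOD expressions and the regular filters, the running total is still computed across all of the customer's rows before the earlier rows are hidden.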

Related

Why is the Postgres "mode" function so different from "avg", "max", and other aggregates?

In Postgres, I can say select avg(size) from images and select max(size) from images.
But when I want the mode, I may not do this:
select mode(uploaded_by_id) from images
Instead I must do this:
select mode() within group (order by uploaded_by_id desc) from images
The syntax seems a little funky to me. Does anyone know why the other syntax was not permitted?
NOTE: I know that allowing order by enables the user to define which mode to take in the case of a tie, but I don't see why that needs to prohibit the other syntax entirely.
Thanks!
There is no "machine formula" for computing the mode the way there is for those other aggregates. For the min or max, you just keep track of the min or max seen so far. For the average, you can just keep track of the sum and count seen so far, for example. With the mode, you need to have all the data at your fingertips.
Using an ordered-set aggregate provides for such a use case automatically, including spooling the data to temp files on disk as it becomes large.
You could instead write code to aggregate the data into memory and then process it from there (as the other answer references), but this would become slow and prone to crashing as the amount of memory needed starts to exceed the amount available.
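To make that concrete, the "manual" mode is essentially a full GROUP BY over the whole column, e.g. (a sketch using the images table from the question):
select uploaded_by_id
from images
group by uploaded_by_id
order by count(*) desc, uploaded_by_id desc
limit 1;
Unlike avg or max, there is no small per-row accumulator that could produce this incrementally; the whole distribution has to be materialized, which is what the ordered-set aggregate manages for you.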
After looking at the documentation, it appears they moved away from a simple mode() function in favour of the ordered-set aggregate; they cite speed advantages as a reason for this.
https://wiki.postgresql.org/wiki/Aggregate_Mode
If you wanted to, you could create an aggregate function yourself, but it seems the built-in ordered-set aggregate is the fastest way to get a non-NULL result back from the db.
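For completeness, the do-it-yourself version is an aggregate that collects the values into an array and picks the most frequent one in a final function, roughly along these lines (names here are invented; see the wiki page above for the exact definition):
create or replace function _final_mode(anyarray) returns anyelement as $$
  select a
  from unnest($1) a
  group by 1
  order by count(*) desc, 1
  limit 1;
$$ language sql immutable;

create aggregate my_mode(anyelement) (
  sfunc     = array_append,
  stype     = anyarray,
  finalfunc = _final_mode,
  initcond  = '{}'
);

-- usage: select my_mode(uploaded_by_id) from images;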

What is the "Tableau" way to deal with changing data?

As a background to this question: I've been using Tableau for some time now, but I've been using code (Python, Swift, etc) as a crutch for getting some of the more complicated things done. My employer is now making me move what I can away from custom code and into retail software packages because it will make things easier to maintain if I get hit by a bus or something.
The scenario: With code, I find it very easy to deal with constantly changing/growing data by using recursion. I know that this isn't something I can do with Tableau, but I've found that for many problems so far there is a "Tableau way" of thinking/doing that gets the job done. And, I'm not allowed to use Rserve/TabPy.
I have a batch of transactional data that grows every month by about 1.6mil records. What I would like to do is build something in Tableau that can let me track a complicated rolling total across the data without having to do it manually. In my code of choice, it would have been something like:
Import the data into a frame
For every unique date value in the 'transaction date' field, create a new column with that name
Total the number of transactions in each account for that day
Write the data to the applicable column
Move on to the next day
Then create new columns that store the sum total of transactions for that account over all of the 30 day periods available (date through date + 29 days)
Select the max value of the accounts for a customer for those 30-day sums
Dump all of that 30-day data into a new table based on the customer identifier
It's a lot of steps, but with a couple of nice recursive functions, it's done in a snap with a bit of code. Plus, it can handle the data as it changes.
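For reference, in pandas the steps above come out roughly like this (a sketch; the file and column names are invented, and these rolling windows end on each transaction date rather than starting on it):
import pandas as pd

# hypothetical file and column names; assumes each account belongs to exactly one customer
df = pd.read_csv("transactions.csv", parse_dates=["transaction_date"])

# count transactions per account per day
daily = (df.groupby(["customer_id", "account_id",
                     pd.Grouper(key="transaction_date", freq="D")])
           .size().rename("tx_count").reset_index()
           .sort_values("transaction_date"))

# rolling 30-day transaction total per account
rolling = (daily.set_index("transaction_date")
                .groupby("account_id")["tx_count"]
                .rolling("30D").sum()
                .rename("rolling_30d")
                .reset_index())
daily = daily.merge(rolling, on=["account_id", "transaction_date"])

# highest 30-day total per customer
result = daily.groupby("customer_id")["rolling_30d"].max().reset_index()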
The actual question: How in the world do I approach problems like this within Tableau since my brain goes straight to recursive function land? I can do this manually with Tableau Prep, but it takes manual tweaking every time the data changes. Is there a better way, or is this just not within the realm of what Tableau really does?

prometheus aggregate table data from offset; ie pull historical data from 2 weeks ago to present

so i am constructing a table within grafana with prometheus as a data source. right now, my queries are set to instant, and thus it's showing scrape data from the instant that the query is made (in my case, shows data from the past 2 days)
however, i want to see data from the past 14 days. i know that you can adjust time shift in grafana as well as use the offset <timerange> command to shift the moment when the query is run, however these only adjust query execution points.
using a range vector such as go_info[10h] does indeed go back that far, however the scrapes are done in 15s intervals and as such produce duplicate data, in addition to producing results for a query done at that instant (and not at an offset timepoint), which I don't want
I am wondering if there's a way to gather data from two weeks ago until today, essentially aggregating data from multiple offset time points.
i've tried writing multiple queries on my table to perform this,
e.g:
go_info offset 2d
go_info offset 3d
and so on..
however this doesn't seem very efficient and the values from each query end up in different columns (a problem i could probably alleviate with an alteration to the query, however that doesn't solve the issue of complexity in queries)
is there a more efficient, simpler way to do this? i understand that the latest version of Prometheus offers subqueries as a feature, but i am currently not able to upgrade Prometheus (at least not easily, with the way it's currently set up) and am also not sure it would solve my problem. if it is indeed the answer to my question, it'll be worth the upgrade; i just haven't had an environment to test it in.
thanks for any help anyone can provide :)
figured it out;
it's not pretty but i had to use offset <#>d for each query in a single metric.
e.g.:
something_metric offset 1d
something_metric offset 2d
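for anyone who can upgrade: the subquery feature i mentioned (Prometheus 2.7+) should let you collapse those per-offset queries into a single expression,
e.g.:
# max over daily samples of the metric across the last 14 days (just a sketch, untested on my setup)
max_over_time(something_metric[14d:1d])
swap max_over_time for avg_over_time or sum_over_time depending on what the table should show; whether this can replace the one-column-per-offset layout depends on how the grafana table panel is built.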

How to select column which are needed to create report in tableau

I have Tableau Desktop. I am creating a report using 5 tables, and 2 of the 5 tables are big. The tables are joined and a filter is applied. Extract creation is taking a long time (6-7 hours and still running). The big tables have 100+ columns, but I use only 12 columns to build my report.
Now, there is an option to use custom SQL, which takes less time to create the extract, but then I cannot use Tableau to its full potential.
Any suggestion is welcome. I am looking for a way to choose only the columns I need when creating the extract.
Follow this process:
Make the database connection.
Join the tables.
Go to a sheet and bring in the fields needed in the report.
Right-click on the connection and create an extract; don't forget to click "Hide unused fields", then apply the required filtering and create the extract.
This process should leave you with only the required fields out of all the fields.
Especially for very large extracts, you can also consider the option to aggregate to visible dimensions when making an extract. That can dramatically reduce the size of the extract and time to create and access it. But that option requires care to be sure you use the faster extract in a way that still gets accurate results. There are assumptions built in to that feature.
An extract is really a cached query result. If you perform aggregation when creating the extract, you can compute totals, mins, max, avg etc during extract creation, and then simply display the aggregate values in Tableau. This can save a lot of time. Of course, you can’t then further drill down past the level of detail in the extract in that case.
More importantly, if you perform further aggregation in Tableau, you have to be careful that the double aggregation gives the result you intend. Some functions are always safe: sums of sums, mins of mins, and maxes of maxes give the same answer as a single large aggregation operation; these are called additive operations. Other combinations may or may not give the result you intend: averages of averages, and especially COUNTD of COUNTD, can be unexpected, although repeated aggregation is sometimes well defined (averages of daily sums can make sense, for example).
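A quick made-up illustration of the averages-of-averages trap: if day 1 has one $100 sale (daily average $100) and day 2 has nine $10 sales (daily average $10), the average of the daily averages is $55, while the true average over all ten sales is $19.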
So performing aggregation during extract creation can lead to huge performance gains at visualization time - you effectively precompute much or all of the information you need to display. You just have to understand how it works and use accordingly. Experiment.
By the way, that feature uses the default aggregation defined for each measure in the data source. Usually SUM(). You can change that in the data pane.

Speedup database on webpage viewcount

How can we optimize the viewcount calculation on MongoDB?
We have a huge number of pages that are almost static apart from the viewcount. We've tried calculating it from the logs, without triggering a DB operation while users are viewing the page, and processing the logs during off-peak hours. Is there a more elegant way to optimize this viewcount calculation?
You could use Google Analytics or something similar to do it for you. Plus you'd get a whole lot of other useful metrics.
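If the counts need to stay in MongoDB, one common pattern for the log-based approach described in the question is to aggregate the log offline and apply all of the increments in a single batched write during off-peak hours, roughly like this in the mongo shell (collection and field names are invented):
// counts computed from the access log, applied as one bulk operation
db.pages.bulkWrite([
  { updateOne: { filter: { _id: "page-1" }, update: { $inc: { viewCount: 120 } } } },
  { updateOne: { filter: { _id: "page-2" }, update: { $inc: { viewCount: 45 } } } }
])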