How to create continuous views for leaderboards? - pipelinedb

I have a stream of events with the structure player_id, score, timestamp. I want to build cycle-based leaderboards on it so that I can see the daily, weekly, monthly and yearly player leaderboards. What kind of aggregations should I use? Could I use ordered-set aggregates with rank? And is it possible to store past leaderboards as well, so that I could also see last month's leaderboard?

You can use fss_agg_weighted to build filtered space saving top-ks, and then extract the top-k player scores by calling fss_topk on the column built by fss_agg_weighted. For example, to continuously compute the daily top 10 player scores:
CREATE CONTINUOUS VIEW daily_top_scores AS
SELECT day(timestamp), fss_agg_weighted(player_id, 10, score)
FROM events_stream -- placeholder: substitute the name of your own stream
GROUP BY day;
And to extract the top-10 at a given point in time,
SELECT day, fss_topk(fss_agg_weighted) FROM daily_top_scores;
You can also combine the top-k results over wider date ranges without losing any information. To compute the top-10 scores over the entire history of the continuous view:
SELECT fss_topk(combine(fss_agg_weighted)) FROM daily_top_scores;
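To make the idea of per-day summaries that can later be combined more concrete, here is a minimal in-memory sketch in Python. It is an exact analogue built on Counter, not the approximate filtered space saving structure PipelineDB uses, and it assumes the weight simply adds to each player's running total. The function names and the sample events are made up for illustration.

```python
from collections import Counter
from datetime import datetime


def daily_totals(events):
    """Group (player_id, score, timestamp) events into per-day score totals."""
    days = {}
    for player_id, score, ts in events:
        days.setdefault(ts.date(), Counter())[player_id] += score
    return days


def top_k(counter, k=10):
    """Return the k highest (player_id, total) pairs, highest first."""
    return counter.most_common(k)


def combine(counters):
    """Merge several per-day summaries into one summary over a wider range."""
    merged = Counter()
    for c in counters:
        merged.update(c)  # adds the totals player by player
    return merged


# Hypothetical sample events
events = [
    ("alice", 50, datetime(2024, 1, 1, 9)),
    ("bob", 70, datetime(2024, 1, 1, 10)),
    ("alice", 40, datetime(2024, 1, 2, 9)),
    ("carol", 60, datetime(2024, 1, 2, 11)),
]

per_day = daily_totals(events)                      # one summary per day
overall = top_k(combine(per_day.values()), k=2)     # top 2 across all days
```

Because each day keeps its own summary, a past leaderboard is just the summary for that day, and weekly or monthly boards come from combining the relevant days.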

Related

Cumulative sum with TimescaleDB continuous aggregate view

Let's say I have a continuous aggregate view that tracks the warehouse inventory change daily. The example below is not real, but I tried to simplify it for the purpose of the question.
CREATE MATERIALIZED VIEW inventory_daily
WITH (timescaledb.continuous) AS
SELECT item,
time_bucket(INTERVAL '1 day', time) AS bucket,
SUM(item_delta) as daily_change
FROM conditions
GROUP BY item, bucket;
This gives you the daily inventory changes nicely. But what would be the best, or the most CPU-efficient, way to get the cumulative sum of all inventory changes over the whole lifespan of the items? If you sum all the changes together, you should get the count of how many items are left in the inventory, for double accounting, for each day.
Can the cumulative sum be done in a continuous aggregate view, or is there a better way to do a breakdown of inventory totals, assuming you have just the change as an input? What I hope to accomplish is:
-- Don't know how to do CUMULATIVE SUM
CREATE MATERIALIZED VIEW inventory_daily
WITH (timescaledb.continuous) AS
SELECT item,
time_bucket(INTERVAL '1 day', time) AS bucket,
SUM(item_delta) as daily_change,
CUMULATIVE_SUM(item_total) as total_at_the_end_day
FROM conditions
GROUP BY item, bucket;
If this cannot be done in an aggregated view (as it looks like based on the comment) what would be the next best option? Manually calculate values for each day?
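One option, assuming the view itself cannot hold the running total, is to keep the daily view as it is and accumulate daily_change per item when reading from it. In PostgreSQL this is usually written as a window function such as SUM(daily_change) OVER (PARTITION BY item ORDER BY bucket). The Python sketch below shows the same computation on rows shaped like the view's output; the function name and the sample rows are hypothetical.

```python
from collections import defaultdict
from itertools import accumulate


def running_totals(daily_rows):
    """daily_rows: iterable of (item, bucket, daily_change) as read from the daily view.

    Returns {item: [(bucket, daily_change, total_at_end_of_day), ...]},
    ordered by bucket within each item.
    """
    by_item = defaultdict(list)
    for item, bucket, change in daily_rows:
        by_item[item].append((bucket, change))

    result = {}
    for item, rows in by_item.items():
        rows.sort()  # ISO date strings sort chronologically
        totals = accumulate(change for _, change in rows)
        result[item] = [(b, c, t) for (b, c), t in zip(rows, totals)]
    return result


# Hypothetical output of the daily view
rows = [
    ("bolt", "2024-01-01", 10),
    ("bolt", "2024-01-02", -3),
    ("bolt", "2024-01-03", 5),
    ("nut", "2024-01-01", 4),
    ("nut", "2024-01-02", -1),
]
totals = running_totals(rows)
```

The accumulation runs over one row per item per day rather than over the raw events, so it stays cheap even when the underlying table is large.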

Aggregate on day, month and week level

I am making a dashboard with clicks on a daily level and a month level on a certain campaign.
If I have say 1 customer clicking on 2 days then at a daily level that customer is counted twice. However, when I look at the aggregate monthly level this person will be counted once.
My SQL code that I am pulling into Tableau is at a daily level. How do I get a monthly-level view in the dashboard? When I create a parameter with month and day, selecting month just adds up the day-level numbers to give me the month.
Any advice?
Sounds like a count distinct thing. To get around this the COUNTD would need to happen in Tableau. That would mean you need the Contact ID (or whatever it is you want to count) within the data source. Obviously that would mean your data source is much bigger but is the only way to get an accurate unique count over a custom time period.
Another alternative is to restrict the available time periods for the user and pre-aggregate for those time periods.
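To see why the customer identifier has to stay in the data source, here is a small Python sketch with made-up click rows. Summing the daily unique counts counts a returning customer once per day, while a distinct count over the whole month counts them once.

```python
from datetime import date

# Hypothetical click rows: (customer_id, click_date)
clicks = [
    ("cust_1", date(2024, 3, 1)),
    ("cust_1", date(2024, 3, 2)),
    ("cust_2", date(2024, 3, 2)),
]

# Daily level: unique customers per day
daily_unique = {}
for customer, day in clicks:
    daily_unique.setdefault(day, set()).add(customer)

# Adding up the daily numbers double counts cust_1
sum_of_daily = sum(len(customers) for customers in daily_unique.values())

# A distinct count over the month needs the raw customer ids
monthly_unique = len({customer for customer, _ in clicks})
```

Here sum_of_daily is 3 while monthly_unique is 2, which is the gap the question describes.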

Calculate a running total that works with relative date filters

I have a Union table of my various bank accounts to create a personal finance analysis dashboard.
I am trying to make a Running Total to show my total capital available at any given date. A Running Total table calculation works, and so does a RUNNING_SUM() calculated field, but only until I filter the dates. So I am trying to find a way to make the running calculation work without being thrown off by Date Filters (I would like to implement relative dates for visualisation in the dashboard).
My union table has the following relevant data columns:
Order ID: Descending number from 1 for each entry per account.
Date: Date of entry.
Item: Entry name.
Account: Name of bank account.
Amount: +ive for credit or -ive for debit.
Balance: balance after entry value for each given account.
So the table can look like this:
So on 07/05/2019 the Running total should be 229.64.
The running sum formula mentioned above is currently RUNNING_SUM(SUM([Amount])), so if any dates are excluded via filter the running total doesn't add up to the right amount.
A way I can see around the problem could be to get the sum over all accounts of the last balance reading at a given date. The balance is a running total but only if the final entry per time period for all accounts are summed would it work. Would it be possible to make a calculated field that gets the last balance reading for each account at any given date and then sums them?
Or is there a simpler smarter way I am not aware of?
This comes down to an Order of Operations problem. Once you filter the dates the viz doesn't have access to the data anymore.
Your best approach would be to add the running sum to the data source before you bring it into Tableau. Then the running sum isn't a calculated field dependent on the data in the Viz.
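A minimal Python sketch of that idea, with made-up entries rather than the data from the question: the running total is computed over the full history first and stored as a column, so a later date filter only hides rows and does not change the totals.

```python
from datetime import date
from itertools import accumulate

# Hypothetical entries: (entry_date, amount)
entries = [
    (date(2019, 5, 1), 100.0),
    (date(2019, 5, 3), -20.0),
    (date(2019, 5, 7), 50.0),
    (date(2019, 5, 9), -10.0),
]

# Step 1: compute the running total over ALL rows, before any filtering
entries.sort()
running = list(accumulate(amount for _, amount in entries))
with_total = [(d, a, t) for (d, a), t in zip(entries, running)]

# Step 2: filter afterwards; the stored total is unaffected
cutoff = date(2019, 5, 5)
visible = [row for row in with_total if row[0] >= cutoff]

# For contrast: a running sum computed after the filter restarts from zero
naive = list(accumulate(a for _, a, _ in visible))
```

The first visible row still carries a stored total of 130.0, whereas the running sum recomputed after the filter starts at 50.0.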

Combining two separate date fields to one in Tableau

I’m trying to combine two separate date fields into one so that I can calculate a defect rate between the two.
I have two date fields:
1. EndDate
2. FundingDate
The EndDate field is used to capture the # of units for a particular ‘project’ for a particular month. The FundingDate is used to capture the total volume generated for a particular month.
If I create a worksheet using just EndDate and filter to a ‘project’ I’m interested in and COUNTD the # of units, those figures turn out to be accurate for their respective months.
Same goes for the FundingDate, separate worksheet, COUNTD the # of units, figures are accurate for their respective month.
If I try to view the COUNTD of units from a project using the FundingDate, the #’s are off. Same goes for the total volume if I use EndDate trying to find the total volume.
How do I create a Date Dimension that both fields can pull from and that reflects the correct COUNTD for each?
This was resolved by creating an external .TXT file with a Master Date list. I then duplicated the original data source and did a cross-database join from the two date fields to the Master Date.
https://onlinehelp.tableau.com/current/pro/desktop/en-us/multipleconnections_troubleshooting.html
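As a rough illustration of what the Master Date list buys you, here is a Python sketch. It assumes one row per unit with both dates, uses hypothetical unit ids and project names, and reads the "defect rate" as project units by EndDate month divided by total volume by FundingDate month. Each measure is counted against the shared month key through its own date field.

```python
from datetime import date

# Hypothetical rows: (unit_id, project, end_date, funding_date)
records = [
    ("u1", "P", date(2024, 1, 15), date(2024, 1, 20)),
    ("u2", "P", date(2024, 2, 3), date(2024, 1, 25)),
    ("u3", "Q", date(2024, 2, 10), date(2024, 2, 12)),
]


def month_key(d):
    return (d.year, d.month)


# Master date list: every month that appears in either date field
master_months = sorted(
    {month_key(r[2]) for r in records} | {month_key(r[3]) for r in records}
)

summary = {}
for m in master_months:
    # Units of project "P", matched to the month through EndDate
    units = {r[0] for r in records if r[1] == "P" and month_key(r[2]) == m}
    # Total volume, matched to the month through FundingDate
    volume = {r[0] for r in records if month_key(r[3]) == m}
    rate = len(units) / len(volume) if volume else None
    summary[m] = (len(units), len(volume), rate)
```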

Tableau Filter on field which contains MAX of another field

I have a table in Tableau that contains football teams, their top goal scorers and the number of goals these players scored. I would like to filter the table to show the team which has the player who has scored the most goals.
For example, if my table has Team A and Team B, and Team B has the player which has scored the most goals out of every player (in all the teams), then I would like the filter to include only Team B (but show every player in Team B).
This is a good use case for a top filter.
Place Team on the filter shelf.
When defining the filter, choose the Top tab.
Select By field, Top 1, Number of Goals and Max.
This tells Tableau to determine the maximum value for the [Number of Goals] field for each Team, and then filter to only include the Team with the top value.
(Note, this approach assumes that there is a single data row per player showing the total number of goals that player achieved. If your data is structured differently, say one data row per player per game, then you might need to revise the approach slightly, perhaps using an LOD calc too)
In SQL, this typically leads to a HAVING clause.
The only downside is if two teams tie for the top position, I believe you will only see one of them in that case.
If that case is important to you, you can get a similar effect using a table calc to rank teams by their max [Number of Goals], setting the tie breaking rule of your choice for the quick table calc, and then using that calc on the filter shelf to only show teams with the top rank. This will show multiple teams if they are tied for top rank.
The table calc approach is more flexible but can be less efficient, especially for large data sets, since the data is fetched from the data source to Tableau for the ranking calculation, and then only some of it is displayed. (Table calcs come very late in the processing pipeline.) The top filter approach performs the calculations and filters at the data source, and only sends the filtered results back to the Tableau client.
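The logic behind both approaches can be sketched in a few lines of Python, using made-up rows with one row per player: take the maximum goals per team, find the best value, and keep every player of the team or teams that reach it. Keeping all tied teams corresponds to the rank-based variant.

```python
# Hypothetical rows: (team, player, goals), one row per player
rows = [
    ("Team A", "Player 1", 12),
    ("Team A", "Player 2", 7),
    ("Team B", "Player 3", 15),
    ("Team B", "Player 4", 3),
    ("Team C", "Player 5", 15),
]

# Max goals per team, like aggregating [Number of Goals] with Max per Team
team_max = {}
for team, _, goals in rows:
    team_max[team] = max(goals, team_max.get(team, 0))

# Keep every team tied for the top value (the rank-based behaviour)
best = max(team_max.values())
top_teams = {team for team, goals in team_max.items() if goals == best}

# Show all players of the selected team(s)
filtered = [row for row in rows if row[0] in top_teams]
```

With this sample data Team B and Team C tie at 15, so both are kept along with all of their players.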