Generating Running Sum of Ratings in SQL - postgresql

I have a rating table. It boils down to:
rating_value created
+2 april 3rd
-5 april 20th
So, every time someone gets rated, I track that rating event in the database.
I want to generate a rating history/time graph where the rating is the sum of all ratings up to that point in time on a graph.
I.e., a person's rating on April 5th would be select sum(rating_value) from ratings where created <= april 5th
The only problem with this approach is I have to run this day by day across the interval I'm interested in. Is there some trick to generating a running total using this sort of data?
Otherwise, I'm thinking the best approach is to create a denormalized "rating history" table alongside the individual ratings.

If you have PostgreSQL 8.4 or later, you can use a window function to calculate a running sum:
steve@steve@[local] =# select rating_value, created,
       sum(rating_value) over (order by created)
from rating;
 rating_value |  created   | sum
--------------+------------+-----
            2 | 2010-04-03 |   2
           -5 | 2010-04-20 |  -3
(2 rows)
See http://www.postgresql.org/docs/current/static/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS

Try adding a GROUP BY clause. That gives you the rating value for each day (e.g. in an array). As you output the rating value over time, you can just add the previous array elements together.
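For readers who want to see the arithmetic outside the database, here is a minimal Python sketch of the same idea: group the rating events by day, then accumulate. The two sample rows come from the question; the function name running_totals is only illustrative.

```python
from collections import defaultdict
from datetime import date
from itertools import accumulate

# Sample rows mirroring the question: (rating_value, created).
ratings = [(2, date(2010, 4, 3)), (-5, date(2010, 4, 20))]

def running_totals(rows):
    """Return [(day, running_sum)] ordered by day."""
    per_day = defaultdict(int)
    for value, created in rows:
        per_day[created] += value  # the GROUP BY step: one total per day
    days = sorted(per_day)
    totals = accumulate(per_day[d] for d in days)  # the running sum
    return list(zip(days, totals))

history = running_totals(ratings)
for day, total in history:
    print(day, total)
```

Each output row is the sum of all ratings up to and including that day, which is the same quantity sum(rating_value) over (order by created) returns.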

Related

Calculate data with overlapping dates in SQL

I'm using PostgreSQL 13 (I can update to 14 if that helps) and I'd like to return some rows based on data I've got.
The data I've got is a bit complex and comes from a few different tables but I don't think it matters here.
Currently I was able to create a query that returns data that looks like this:
Product ID   Start        End          AvailableAmount
-----------  -----------  -----------  ---------------
1            null         null         2
1            2022-07-20   2022-07-22   1
1            2022-07-24   2022-07-27   1
2            null         null         1
3            null         null         5
Where Start is the start of a time period, End is the end of the period, AvailableAmount is the amount of product available in that time period. Available amount is calculated based on some other data.
I've tried summing up the AvailableAmount column but that does not return valid data because for time period from 2022-07-20 to 2022-07-24 AvailableAmount should be 1 but it's 2.
I think I'd need to somehow separate these dates and list the amount per day, not per time period but I don't know how.
Basically, going day by day, AvailableAmount for product with ID 1 should be:
2022-07-20: 1
2022-07-21: 1
2022-07-22: 1
2022-07-23: 2
2022-07-24: 1
2022-07-25: 1
2022-07-26: 1
2022-07-27: 1
2022-07-28: 2
...
So if I were to query for the product over the period 2022-07-20 to 2022-07-25, I should be able to request 1 unit of the product. Currently my implementation makes that impossible because it sums the amounts, so if my request spans two different time periods, the available amount is lower than it should be.
I've tried the gaps-and-islands approach but I don't think it would work here. I've also read about multiranges, which were introduced in v14, but I haven't tested them yet; I'm working on it. I've also tried using generate_series but that did not help me.
I don't know if that is enough information but I can provide more if needed.
Thanks!
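One way to picture the "amount per day" idea from the question is the Python sketch below. It is not a PostgreSQL solution. It assumes that the row with null dates is the default amount and that a dated row overrides the default on the days it covers; those assumptions and the function names are illustrative, not taken from the original schema.

```python
from datetime import date, timedelta

# Rows for product 1 from the question: (start, end, available_amount).
# Assumption: the row without dates is the default amount for uncovered days.
rows = [
    (None, None, 2),
    (date(2022, 7, 20), date(2022, 7, 22), 1),
    (date(2022, 7, 24), date(2022, 7, 27), 1),
]

def daily_availability(rows, first_day, last_day):
    """Map each day in [first_day, last_day] to its available amount."""
    default = next(amount for start, _end, amount in rows if start is None)
    periods = [r for r in rows if r[0] is not None]
    result = {}
    day = first_day
    while day <= last_day:
        amounts = [a for s, e, a in periods if s <= day <= e]
        # Assumption: if several periods cover a day, the smallest amount wins.
        result[day] = min(amounts) if amounts else default
        day += timedelta(days=1)
    return result

def available_for_request(rows, first_day, last_day):
    """A request can take at most the minimum daily amount in its range."""
    return min(daily_availability(rows, first_day, last_day).values())

print(available_for_request(rows, date(2022, 7, 20), date(2022, 7, 25)))
```

Taking the minimum over the days of the request, rather than summing period rows, is what keeps a request that spans two periods from being undercounted.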

Why can't my calculated field be applied in my filters in Tableau?

this is my tableau workbook
I want to calculate the day difference between consecutive transactions for each user. The users in this case are those passing the filter PUL: True, which uses this calculation:
{Fixed [User Id]: sum(
if [Created At]<=[END_DATE] then 1 else 0 end)}>=2
AND
{FIXED [User Id]: sum(
IF [Created At]<=[END_DATE] AND
[Created At] >= [START_DATE] THEN 1 ELSE 0 END)}>=1
This means that users with at least 2 transactions on or before the end of the range, and at least 1 transaction within the date range, are on the list.
After that I made a calculated field [CF] to count the day difference, with this formula:
DATEDIFF('day',LOOKUP(MIN([Created At]),-1), MIN([Created At]))
I also made a date range filter:
lookup(min(([Created At])),0) >= [START_DATE] and
lookup(min(([Created At])),0) <= [END_DATE]
So it not only counts the day difference within the date range for the selected users, but for each user it also counts the day difference from the last transaction before the date range (if any).
These are the results (please take a look at user_id 86886).
I didn't understand why user_id 86886 shows only 1 transaction, given that I already made the filter that only takes users with more than 1 transaction. After I checked, user_id 86886 made more than 1 transaction in a single day. This is the screenshot.
My questions are:
Why can't Tableau visualize all of the transactions on the same day (but at different hours) like this,
and how can I visualize it so that 2 records appear with the same day but different hours?
Also, why is the day difference for a transaction on the same day NULL? It should be 0, because there is no day difference.
EXPECTED RESULTS (let's take a case in user_id 86886)
+---------+-----------------------+-------------+
| user_Id | dayOfCreatedAt | CF diff day |
+---------+-----------------------+-------------+
| 86886 | 1/25/2020 11:25:28 AM | |
| | 1/25/2020 11:39:42 AM | 0 |
+---------+-----------------------+-------------+
Explanation: the first CF diff day is NULL because it is the first transaction of this user, with no transaction before it. The second is 0 because there is no day difference: the two transactions differ only in hour within the same day, so it counts 0.
Based on @Anil's advice, this is the link to my Tableau workbook: https://public.tableau.com/profile/fachry.dzaky#!/vizhome/simulation_data/Sheet14
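The expected result can be reproduced outside Tableau with a small Python sketch. It assumes the day difference is taken between the calendar dates of consecutive transactions, which is how the expected table reads; the function name day_diffs is illustrative.

```python
from datetime import datetime

# The two transactions of user_id 86886 from the expected-results table.
transactions = [
    datetime(2020, 1, 25, 11, 25, 28),
    datetime(2020, 1, 25, 11, 39, 42),
]

def day_diffs(timestamps):
    """Return (ordered timestamps, day difference to the previous one)."""
    ordered = sorted(timestamps)
    diffs = [None] if ordered else []  # the first row has no previous row
    for prev, cur in zip(ordered, ordered[1:]):
        # Compare calendar dates only, so same-day transactions give 0.
        diffs.append((cur.date() - prev.date()).days)
    return ordered, diffs

ordered, diffs = day_diffs(transactions)
for ts, diff in zip(ordered, diffs):
    print(ts, diff)
```

Both rows only survive if the timestamps are kept at full precision. Once they are truncated to the day, the two transactions collapse into a single row and there is no previous row to look up.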
I think your PUL field has some errors. You should probably change the calculation of this field to:
{Fixed [User Id]: sum(
if [Created At]<[START_DATE] then 1 else 0 end)}>=1
AND
{FIXED [User Id]: sum(
IF [Created At]<=[END_DATE] AND
[Created At] >= [START_DATE] THEN 1 ELSE 0 END)}>=1
Because if you want a non-null difference, there must be at least one transaction before [START_DATE]. Please check it.
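As a rough illustration of what the revised PUL condition selects, here is a Python sketch of the same two tests applied to one user's transactions. The dates below are hypothetical (the workbook's parameter values are not given in the thread), and is_pul is an illustrative name.

```python
from datetime import datetime

def is_pul(created_ats, start_date, end_date):
    """True if the user has >= 1 transaction before the range and >= 1 in it."""
    before = sum(1 for t in created_ats if t < start_date)
    in_range = sum(1 for t in created_ats if start_date <= t <= end_date)
    return before >= 1 and in_range >= 1

# Hypothetical parameter values, for illustration only.
start = datetime(2020, 1, 20)
end = datetime(2020, 1, 31)

# Two same-day transactions inside the range and none before it: excluded.
same_day_only = [datetime(2020, 1, 25, 11, 25, 28), datetime(2020, 1, 25, 11, 39, 42)]
print(is_pul(same_day_only, start, end))

# One transaction before the range and one inside it: included.
returning = [datetime(2020, 1, 10, 8, 0), datetime(2020, 1, 25, 11, 25, 28)]
print(is_pul(returning, start, end))
```

Under the original condition (at least 2 transactions up to END_DATE), the first user would pass even though there is no earlier transaction to take a difference against.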
Now please follow these steps carefully:
Step-1: Drag user id and created at to the Rows shelf.
Step-2: Change created at to exact date and then to discrete. (Both are important; your view shows that you displayed it as day of created_at rather than as exact date.)
Step-3: Set your date parameters.
Step-4: Drag all three desired fields to the Filters shelf.
Step-5: Add PUL to context (again, an important step).
Step-6: Double-click the CF field and check that its calculation is
DATEDIFF('day', LOOKUP(MIN([Created At]),-1), MIN([Created At]))
Step-7: Change the table calculation option of CF to table down (check this; it is also important).
Step-8: Double-click the other CF_max/Min fields to add them to Measure Values.
Step-9: Change the table calculation options in each of these four fields as discussed earlier (i.e. the nested calculation of CF as specific dimensions restarting at every user_id, and the nested calculation of CF_Max (as the case may be) as table down).
Note: Regarding your specific problem with user_id 86886: if I remove the revised PUL condition as TRUE, I get the same view as desired. Have a look, please.

Calculated Field to Count While Between Dates

I am creating a Tableau visualization for floor stock in our plant. We have a column for incoming date, quantity, and outgoing date. I am trying to create a visualization that sums the quantity but only while between the 2 columns.
So for example, if we have 9 parts in stock that arrived on 9/1 and are scheduled to ship out on 9/14, I would like this visualization to include these 9 parts in the sum only while they are in our stock between those 2 dates. Here is an example of some of the data I am working with.
4/20/2018 006 5/30/2018
4/20/2018 017 5/30/2018
4/20/2018 008 5/30/2018
6/29/2018 161 9/7/2018
Create a new calculation:
if [ArrivalDate] >= #2018-09-01# and [ArrivalDate] < #2018-09-15#
and [Shipdate] < #2018-09-15#
then [MEASUREofStock] else 0 end
Here is a solution using UNIONs, "Volume of an Incident Queue at a Point in Time", written before Tableau added support for Unions (so it required custom SQL).
For several years now, Tableau has supported Union directly, so now it is possible to get the same effect without writing custom SQL, but the concept is the same.
The main thing to understand is that you need a data row per event (per arrival or per departure) and a single date column, not two. That will let you calculate the net change in quantity per day, and you can then use a running total if you want to see the absolute quantity at the close of each day
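The event-per-row idea can be sketched in Python as follows, using the four sample rows from the question (read as incoming date, quantity, outgoing date). The function name stock_levels is illustrative, and the sketch assumes stock leaves at the start of its outgoing day.

```python
from collections import defaultdict
from datetime import date
from itertools import accumulate

# Sample rows from the question: (incoming date, quantity, outgoing date).
stock = [
    (date(2018, 4, 20), 6, date(2018, 5, 30)),
    (date(2018, 4, 20), 17, date(2018, 5, 30)),
    (date(2018, 4, 20), 8, date(2018, 5, 30)),
    (date(2018, 6, 29), 161, date(2018, 9, 7)),
]

def stock_levels(rows):
    """Turn each row into two events, then keep a running total per day."""
    net_change = defaultdict(int)
    for arrived, quantity, shipped in rows:
        net_change[arrived] += quantity   # arrival event
        net_change[shipped] -= quantity   # departure event
    days = sorted(net_change)
    return list(zip(days, accumulate(net_change[d] for d in days)))

levels = stock_levels(stock)
for day, on_hand in levels:
    print(day, on_hand)
```

The output only has a row for days on which something changed; the level stays constant between those days.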
There is no simple way to display the total quantity between the two dates without changing the input table structure. If you want to show all dates and the "eligible" quantity on each day, you should:
Create a calendar table that has all dates from 1990-01-01 to 2029-12-31. (You can limit the dates displayed in the dashboard later by applying a date filter, but here you want to be safe and include all dates that may exist in your stock table.) Here is how to create the date table quickly.
Left join the date table to the stock table and calculate the eligible quantity on each day.
SELECT
a.date,
SUM(CASE WHEN b.quantity IS NULL THEN 0 ELSE b.quantity END) AS quantity
FROM date a
LEFT JOIN
stock b on a.date BETWEEN b.Incoming_Date AND b.Outgoing_Date
GROUP BY a.date
Import the output table to Tableau, and simply add dates and quantity to the chart.
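To check what the join produces, here is a Python sketch of the same logic: generate every calendar day, then sum the quantity of each stock row whose date range covers that day. It reuses the sample rows from the question; the function names are illustrative.

```python
from datetime import date, timedelta

# Sample rows from the question: (incoming date, quantity, outgoing date).
stock = [
    (date(2018, 4, 20), 6, date(2018, 5, 30)),
    (date(2018, 4, 20), 17, date(2018, 5, 30)),
    (date(2018, 4, 20), 8, date(2018, 5, 30)),
    (date(2018, 6, 29), 161, date(2018, 9, 7)),
]

def calendar(first_day, last_day):
    """Yield every date from first_day to last_day, inclusive."""
    day = first_day
    while day <= last_day:
        yield day
        day += timedelta(days=1)

def quantity_per_day(rows, first_day, last_day):
    """Equivalent of the LEFT JOIN ... BETWEEN ... GROUP BY date query."""
    return {
        day: sum(q for incoming, q, outgoing in rows if incoming <= day <= outgoing)
        for day in calendar(first_day, last_day)
    }

per_day = quantity_per_day(stock, date(2018, 4, 19), date(2018, 9, 8))
print(per_day[date(2018, 5, 30)], per_day[date(2018, 5, 31)])
```

Like SQL's BETWEEN, the comparison is inclusive at both ends, so a part still counts on its outgoing date. Days with no matching stock get 0, which is what the CASE WHEN ... IS NULL branch does in the query.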

TABLEAU Calculating a Running DISTINCT COUNT on usernames for last 3 months

Issue:
Need to show RUNNING DISTINCT users per 3-month interval^^. (See goal table as reference). However, “COUNTD” does not help even after table calculation or “WINDOW_COUNT” or “WINDOW_SUM” function.
^^RUNNING DISTINCT user means DISTINCT users in a period of time (Jan - Mar, Feb – Apr, etc.). The COUNTD option only COUNT DISTINCT users in a window. This process should go over 3-month window to find the DISTINCT users.
Original Table
Date Username
1/1/2016 A
1/1/2016 B
1/2/2016 C
2/1/2016 A
2/1/2016 B
2/2/2016 B
3/1/2016 B
3/1/2016 C
3/2/2016 D
4/1/2016 A
4/1/2016 C
4/2/2016 D
4/3/2016 F
5/1/2016 D
5/2/2016 F
6/1/2016 D
6/2/2016 F
6/3/2016 G
6/4/2016 H
Goal Table
Tried Methods:
Step-by-step:
Tried to distribute the problem into steps, but due to columnar nature of tableau, I cannot successfully run COUNT or SUM (any aggregate command) on the LAST STEP of the solution.
STEP 0 Raw Data
This table shows the structure of the data, as it is in the original table.
STEP 1 COUNT usernames by MONTH
The table shows the count of users by month. You will notice that because user B had 2 entries, he is counted twice. In the next step we use DISTINCT COUNT to fix this issue.
STEP 2 DISTINCT COUNT by MONTH
Now we can see who all were present in a month, next step would be to see running DISTINCT COUNT by MONTH for 3 months
STEP 3 RUNNING DISTINCT COUNT for 3 months
Now we can see the SUM of DISTINCT COUNT of usernames for running 3 months. If you turn the MONTH INTERVAL to 1 from 3, you can see STEP 2 table.
LAST STEP Issue Step
GOAL: Need the GRAND TOTAL to be the SUM of MONTH column.
Request:
I want to calculate the SUM of '1' by MONTH. However, I am using a WINDOW function and aggregating the data, which gave me an error.
WHAT I NEED
Jan Feb March April May Jun
3 3 4 5 5 6
WHAT I GOT
Jan Feb March April May Jun
1 1 1 1 1 1
My Output after tried methods: Attached twbx file. DISTINCT_count_running_v1
HELP taken:
https://community.tableau.com/thread/119179 ; Tried this method but stuck at last step
https://community.tableau.com/thread/122852 ; Used some parts of this solution
The way I approached the problem was identifying the minimum login date for each user and then using that date to count the distinct number of users. For example, I have data in this format. I created a calculated field called Min User Login Date as { FIXED [User]:MIN([Date])} and then did a CNTD(USER) on Min User Login Date to get the unique user count by date. If you want running total, then you can do quick table calculation on Running Total on CNTD(USER) field.
You need to put Month(date) and count(username) in the columns; then you will get the result you expect.
See screen below
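For reference, the numbers in the question's "WHAT I NEED" row can be reproduced with a short Python sketch. It is not a Tableau solution, only a way to check the target values. It reads the dates as month/day/year and treats the window as the current month plus the two before it; rolling_distinct is an illustrative name.

```python
from collections import defaultdict
from datetime import date

# The original table from the question, read as month/day/year.
logins = [
    (date(2016, 1, 1), "A"), (date(2016, 1, 1), "B"), (date(2016, 1, 2), "C"),
    (date(2016, 2, 1), "A"), (date(2016, 2, 1), "B"), (date(2016, 2, 2), "B"),
    (date(2016, 3, 1), "B"), (date(2016, 3, 1), "C"), (date(2016, 3, 2), "D"),
    (date(2016, 4, 1), "A"), (date(2016, 4, 1), "C"), (date(2016, 4, 2), "D"),
    (date(2016, 4, 3), "F"), (date(2016, 5, 1), "D"), (date(2016, 5, 2), "F"),
    (date(2016, 6, 1), "D"), (date(2016, 6, 2), "F"), (date(2016, 6, 3), "G"),
    (date(2016, 6, 4), "H"),
]

def rolling_distinct(rows, window=3):
    """Count distinct users over each month plus the window-1 months before it."""
    users_by_month = defaultdict(set)
    for day, user in rows:
        users_by_month[day.year * 12 + day.month - 1].add(user)  # month index
    result = {}
    for month in sorted(users_by_month):
        seen = set()
        for m in range(month - window + 1, month + 1):
            seen |= users_by_month.get(m, set())  # union, not a sum of counts
        result[(month // 12, month % 12 + 1)] = len(seen)
    return result

counts = rolling_distinct(logins)
print(list(counts.values()))
```

The key point is that the sets of users are unioned across the window before counting. Summing the monthly distinct counts would count a user once per month in which they appear.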

Grouping by date difference/range

How would I write a statement that creates specific GROUP BYs based on the monthly date range/difference? Example:
org_group | date       | second_group_by
A         | 30.10.2013 | 1
A         | 29.11.2013 | 1
A         | 31.12.2013 | 1
A         | 30.01.2015 | 2
A         | 27.02.2015 | 2
A         | 31.03.2015 | 2
A         | 30.04.2015 | 2
As long as there isn't a monthly date_diff > 1, rows should be in the same second_group_by. I hope it's clear enough to understand: the column second_group_by should be generated by the query; it doesn't exist in the table.
Date diff between which rows, though?
If you just want to separate years (or months or weeks) use
GROUP BY DATEPART(....)
That's Sybase or SQL Server, but other SQL dialects will have an equivalent.
If you have specific data ranges, get them into a table with start and end date-time and a monotonically increasing integer, join to that with a BETWEEN and GROUP BY the integer.
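If the intended comparison is between consecutive rows (an assumption, since the question does not say), the grouping can be sketched in Python like this: walk the dates in order and start a new group whenever the gap in calendar months exceeds 1. The function name assign_groups is illustrative.

```python
from datetime import date

# The dates for org_group A from the question.
dates = [
    date(2013, 10, 30), date(2013, 11, 29), date(2013, 12, 31),
    date(2015, 1, 30), date(2015, 2, 27), date(2015, 3, 31), date(2015, 4, 30),
]

def assign_groups(days, max_gap_months=1):
    """Return [(day, group_id)], starting a new group after a gap > max_gap_months."""
    groups = []
    group_id = 0
    previous_index = None
    for day in sorted(days):
        month_index = day.year * 12 + day.month  # calendar month as one integer
        if previous_index is None or month_index - previous_index > max_gap_months:
            group_id += 1
        groups.append((day, group_id))
        previous_index = month_index
    return groups

grouped = assign_groups(dates)
for day, group_id in grouped:
    print(day, group_id)
```

In SQL the same pattern is usually written with a lag over the ordered dates and a running sum of the "new group" flags, which yields the generated second_group_by column.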