Date difference between pairs of dates per ID - tableau-api

I have data with 2 columns, in the following format:
ID  Date
1   1/1/2020
1   27/7/2020
1   15/3/2021
2   18/1/2020
3   1/1/2020
3   3/8/2020
3   18/9/2021
2   23/8/2020
2   30/2/2021
Now I would like to create a calculated field in Tableau that finds, per ID, the difference between consecutive dates, in any unit (e.g. days).
For example, for ID 1 the difference between the first two dates is 208 calendar days, and the difference between the second and third dates for the same ID is 231 days.

A table calc like the following should work if you get the partitioning, addressing, and ordering right, for example by setting “Compute Using” to Date.
If first() < 0 then min([Date]) - lookup(min([Date]), -1) end
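To sanity-check what the table calc should return, the same logic (sort each ID's dates, then subtract the previous date from the current one) can be sketched outside Tableau. This is only an illustrative Python sketch, not Tableau code; the function name gaps_per_id is made up, and ID 2 is left out because 30/2/2021 in the sample is not a valid calendar date.

```python
from collections import defaultdict
from datetime import datetime


def gaps_per_id(rows):
    """Return {id: [days between consecutive dates]} for (id, 'D/M/YYYY') rows."""
    by_id = defaultdict(list)
    for id_, text in rows:
        by_id[id_].append(datetime.strptime(text, "%d/%m/%Y").date())
    result = {}
    for id_, dates in by_id.items():
        dates.sort()  # ordering matters, just like in the table calc
        # Pair each date with the one before it (the LOOKUP(..., -1) idea).
        result[id_] = [(b - a).days for a, b in zip(dates, dates[1:])]
    return result


rows = [
    (1, "1/1/2020"), (1, "27/7/2020"), (1, "15/3/2021"),
    (3, "1/1/2020"), (3, "3/8/2020"), (3, "18/9/2021"),
]
gaps = gaps_per_id(rows)
print(gaps[1])  # the two differences for ID 1
```

For ID 1 this reproduces the 208 and 231 days mentioned in the question.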

Related

How can I calculate the number of publications per month?

There is a table of posts on social networks with the date and title of the publication.
id | created_at | title
1 | 2022-01-17 08:50:58 | Sberbank is the best bank
2 | 2022-01-17 18:36:41 | Visa vs MasterCard
3 | 2022-01-17 16:16:17 | Visa vs UnionPay
4 | 2022-01-17 18:01:00 | Mastercard vs UnionPay
5 | 2022-01-16 16:44:36 | Hadoop or Greenplum: pros and cons
6 | 2022-01-16 14:57:32 | NFC: wireless payment
I need to calculate the number of publications per month, showing the first date of the month and the percentage increase in the number of posts (publications) relative to the previous month. The rows of the resulting table should be in chronological order. The percentage increase can be negative, and the result should be rounded to one decimal place with a % sign appended.
Table results
dt | count | prent_growth
2022-02-01 | 175 | null
2022-03-01 | 338 | 93.1%
2022-04-01 | 345 | 2.1%
2022-05-01 | 295 | -14.5%
2022-06-01 | 330 | 11.9%
I read the documentation, but I don't understand how to do that.
step-by-step demo: db<>fiddle
SELECT
    *,
    -- 100.0 forces numeric division; round() keeps one decimal place
    round(count * 100.0 / prev_count - 100, 1)::text || '%' AS prent_growth -- 4
FROM (
    SELECT
        *,
        lag(count) OVER (ORDER BY pub_month) AS prev_count -- 3
    FROM (
        SELECT
            date_trunc('month', pub_date)::date AS pub_month, -- 1
            COUNT(*) -- 2
        FROM mytable
        GROUP BY 1
    ) s
) s
ORDER BY pub_month
1. Normalize all dates to the first day of the month (this "truncates" the day part, if you like to see it that way).
2. Group by the normalized dates and count all entries per normalized date/month.
3. Use the lag() window function to shift the previous month's count onto the current row. Now you can directly compare the previous and current month counts.
4. Calculate the percentage. The result is a numeric type, so cast it to text in order to append the percent sign.
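The four steps can also be traced outside the database. The following is a minimal Python sketch of the same idea, not a replacement for the query; monthly_growth is a hypothetical helper, and the timestamps are synthetic, built only so that the monthly counts match the expected result table from the question.

```python
from collections import Counter
from datetime import datetime


def monthly_growth(timestamps):
    """Return (first_of_month, count, growth_text) rows in chronological order."""
    # Steps 1 and 2: normalize to the first day of the month, then count.
    counts = Counter(ts.strftime("%Y-%m-01") for ts in timestamps)
    rows, prev = [], None
    for month in sorted(counts):  # ISO-style strings sort chronologically
        n = counts[month]
        # Steps 3 and 4: compare with the previous month, format to one decimal.
        growth = None if prev is None else f"{n * 100 / prev - 100:.1f}%"
        rows.append((month, n, growth))
        prev = n
    return rows


# Synthetic data: only the per-month totals are taken from the question.
stamps = (
    [datetime(2022, 2, 10)] * 175
    + [datetime(2022, 3, 10)] * 338
    + [datetime(2022, 4, 10)] * 345
    + [datetime(2022, 5, 10)] * 295
    + [datetime(2022, 6, 10)] * 330
)
report = monthly_growth(stamps)
for row in report:
    print(row)
```

The first month has no predecessor, so its growth stays None, mirroring the null that lag() produces for the first row.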

After performing dropDuplicates() am getting different counts when taking the count

I called dropDuplicates on a DataFrame with the subset Region, store, and id.
The DataFrame contains some other columns such as latitude, longitude, address, Zip, Year, Month...
When I count the derived DataFrame I get a constant value,
but when I count a selected year, say 2018, df.count() returns a different number on each run.
Could anyone please explain why this is happening?
Df.dropDuplicates("region","store","id")
Df.createOrReplaceTempView(Df)
spark.sql("select * from Df").count()
The count above is constant whenever I run it,
but if I add a where clause on Year or Month, I get varying counts.
E.g.:
spark.sql("select * from Df where Year =2018").count()
This statement returns a different value on each execution.
Intermediate output
Region store objectnr latitude longitude newid month year uid
Abc 20 4572 46.6383 8.7383 1 4 2018 0
Sgs 21 1425 47.783 6.7282 2 5 2019 1
Efg 26 1277 48.8293 8.2727 3 7 2019 2
Output
Region store objectnr latitude longitude newid month year uid
Abc 20 4572 46.6383 8.7383 1277 4 2018 0
Sgs 21 1425 47.783 6.7282 1425 5 2019 1
Efg 26 1277 48.8293 8.2727 1277 7 2019 2
So here newid gets the value of objectnr.
When newid is the same for several rows, I need to assign the latest objectnr to newid, considering the year and month.
The line
Df.dropDuplicates("region","store","id")
creates a new DataFrame; it does not modify the existing one, because DataFrames are immutable.
To solve your issue, save the output of the dropDuplicates statement into a new DataFrame as shown below:
val Df2 = Df.dropDuplicates("region","store","id")
Df2.createOrReplaceTempView("Df2")
spark.sql("select * from Df2").count()
In addition, you may get different counts when applying the filter Year=2018 because the Year column is not one of the three columns you used to drop the duplicates. Apparently you have data in your DataFrame that shares the same values in those three columns but differs in Year. Dropping duplicates is not a deterministic process, as it depends on the ordering of your data, which can vary with every run of your code.
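The order dependence can be illustrated without Spark. The sketch below is a plain-Python stand-in, not Spark's implementation: it assumes a "keep the first row seen per key" rule, and the rows are hypothetical. The total count stays the same for both orderings, while the 2018 count changes depending on which duplicate survives.

```python
def drop_duplicates(rows, keys):
    """Keep the first row seen for each combination of key columns."""
    seen, kept = set(), []
    for row in rows:
        key = tuple(row[c] for c in keys)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


def count_year(rows, year):
    return sum(1 for r in rows if r["year"] == year)


# Hypothetical rows: the first two share region/store/id but differ in year.
rows = [
    {"region": "Abc", "store": 20, "id": 1, "year": 2018},
    {"region": "Abc", "store": 20, "id": 1, "year": 2019},
    {"region": "Sgs", "store": 21, "id": 2, "year": 2018},
]
keys = ("region", "store", "id")

run_a = drop_duplicates(rows, keys)                  # rows arrive in one order
run_b = drop_duplicates(list(reversed(rows)), keys)  # same rows, other order

total_a, total_b = len(run_a), len(run_b)
count_a, count_b = count_year(run_a, 2018), count_year(run_b, 2018)
print(total_a, total_b, count_a, count_b)
```

If a stable result is needed, one option is to decide explicitly which row per key should win (for example the latest year and month) before removing duplicates.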

Use one date filter on multiple columns in tableau

I have a data set with multiple date columns whose values are spread across all months and years. I want to create a report in which, when I select a year, the count of dates in each month of that year is listed for every date column. Based on a single Year selection, how can I apply the filter across the different date fields to display the counts for that particular year?
Let's say we have a data set like this:
Date 1 Date 2
1/3/2017 NA
1/23/2017 1/23/2017
1/14/2017 1/16/2017
2/2/2017 2/3/2017
NA 2/21/2017
3/1/2017 NA
3/3/2017 3/21/2017
.
.
.
12/1/2017 12/12/2017
My result should look like this when I pick the year 2017
Date 1 Date 2
Jan 3 2
Feb 1 2
Mar 2 1
.
.
Dec 1 1
I was able to apply the filter on one column, but when I try to apply it to the other columns I do not get the desired result.
Assuming you want to interact with your dashboard using a parameter, you can create one string parameter to input the year you want to analyze.
After that you just need to create 2 calculated fields, one per date column, that check whether that year is "contained" in your dates (the one for Date 1 is shown; the one for Date 2 is analogous):
if contains(str([Date 1]),[Parameter]) then 1 else 0 end
Keep in mind that there's no guarantee you'll get all the available months in the calendar unless you have data for all of them.
In order to handle blank dates as well, I created a Date Global calculated field as follows:
ifnull([Date 1],[Date 2])
Once you've created these fields and the parameter (show the parameter control), you can simply add them to your worksheet as I did in the image:
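To check the numbers the worksheet should show, the per-column counting can be sketched in Python. This is an illustrative sketch only, not Tableau syntax; monthly_counts is a made-up helper, dates are assumed to be M/D/YYYY, and only the sample rows listed before the elided part of the question's table are used.

```python
from collections import Counter
from datetime import datetime


def monthly_counts(values, year):
    """Count valid M/D/YYYY strings per month number for the selected year."""
    counts = Counter()
    for value in values:
        if value == "NA":  # blank dates are skipped, not counted
            continue
        d = datetime.strptime(value, "%m/%d/%Y")
        if d.year == year:
            counts[d.month] += 1
    return counts


date1 = ["1/3/2017", "1/23/2017", "1/14/2017", "2/2/2017", "NA", "3/1/2017", "3/3/2017"]
date2 = ["NA", "1/23/2017", "1/16/2017", "2/3/2017", "2/21/2017", "NA", "3/21/2017"]

selected_year = 2017  # plays the role of the year parameter
counts1 = monthly_counts(date1, selected_year)
counts2 = monthly_counts(date2, selected_year)
for month in (1, 2, 3):
    print(month, counts1[month], counts2[month])
```

For January to March this gives the same counts as the expected result in the question (3 and 2, 1 and 2, 2 and 1).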

TABLEAU Calculating a Running DISTINCT COUNT on usernames for last 3 months

Issue:
Need to show RUNNING DISTINCT users per 3-month interval^^ (see the goal table for reference). However, “COUNTD” does not help, even when combined with a table calculation or the “WINDOW_COUNT” or “WINDOW_SUM” functions.
^^RUNNING DISTINCT users means the DISTINCT users in a period of time (Jan - Mar, Feb - Apr, etc.). The COUNTD option only counts DISTINCT users within a single window. The calculation should slide over 3-month windows to find the DISTINCT users in each.
Original Table
Date Username
1/1/2016 A
1/1/2016 B
1/2/2016 C
2/1/2016 A
2/1/2016 B
2/2/2016 B
3/1/2016 B
3/1/2016 C
3/2/2016 D
4/1/2016 A
4/1/2016 C
4/2/2016 D
4/3/2016 F
5/1/2016 D
5/2/2016 F
6/1/2016 D
6/2/2016 F
6/3/2016 G
6/4/2016 H
Goal Table
Tried Methods:
Step-by-step:
I tried to break the problem into steps, but due to the columnar nature of Tableau, I cannot successfully run COUNT or SUM (or any aggregate) on the LAST STEP of the solution.
STEP 0 Raw Data
This table shows the structure of the data, as it is in the original table.
STEP 1 COUNT usernames by MONTH
The table shows the count of users by month. You will notice that because user B had 2 entries, he is counted twice. In the next step we use DISTINCT COUNT to fix this issue.
STEP 2 DISTINCT COUNT by MONTH
Now we can see who was present in each month. The next step is to see the running DISTINCT COUNT by MONTH over 3 months.
STEP 3 RUNNING DISTINCT COUNT for 3 months
Now we can see the SUM of the DISTINCT COUNT of usernames for a running 3 months. If you change the MONTH INTERVAL from 3 to 1, you get the STEP 2 table.
LAST STEP Issue Step
GOAL: Need the GRAND TOTAL to be the SUM of MONTH column.
Request:
I want to calculate the SUM of '1' by MONTH. However, I am using a WINDOW function and aggregating the data, which gave me an error.
WHAT I NEED
Jan Feb March April May Jun
3 3 4 5 5 6
WHAT I GOT
Jan Feb March April May Jun
1 1 1 1 1 1
My Output after tried methods: Attached twbx file. DISTINCT_count_running_v1
HELP taken:
https://community.tableau.com/thread/119179 ; Tried this method but stuck at last step
https://community.tableau.com/thread/122852 ; Used some parts of this solution
The way I approached the problem was to identify the minimum login date for each user and then use that date to count the distinct number of users. For example, I have data in this format. I created a calculated field called Min User Login Date as { FIXED [User] : MIN([Date]) } and then did a COUNTD([User]) on Min User Login Date to get the unique user count by date. If you want a running total, you can apply the Running Total quick table calculation to the COUNTD([User]) field.
You need to put MONTH(date) and COUNT(username) on Columns; then you will get the result you expect.
See screen below
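For reference, the target numbers in "WHAT I NEED" can be reproduced with a small Python sketch that takes, for each month, the union of usernames over that month and the two before it. This is not a Tableau solution, only a way to verify the expected output; rolling_distinct is a hypothetical helper and the dates are read as M/D/YYYY.

```python
from collections import defaultdict
from datetime import datetime


def rolling_distinct(rows, window=3):
    """Return {(year, month): distinct users in that month and the window-1 before it}."""
    users_by_month = defaultdict(set)
    for text, user in rows:
        d = datetime.strptime(text, "%m/%d/%Y")
        users_by_month[d.year * 12 + d.month - 1].add(user)  # linear month index
    result = {}
    for m in sorted(users_by_month):
        seen = set()
        for k in range(m - window + 1, m + 1):
            seen |= users_by_month.get(k, set())  # union, so repeats count once
        result[(m // 12, m % 12 + 1)] = len(seen)
    return result


rows = [
    ("1/1/2016", "A"), ("1/1/2016", "B"), ("1/2/2016", "C"),
    ("2/1/2016", "A"), ("2/1/2016", "B"), ("2/2/2016", "B"),
    ("3/1/2016", "B"), ("3/1/2016", "C"), ("3/2/2016", "D"),
    ("4/1/2016", "A"), ("4/1/2016", "C"), ("4/2/2016", "D"), ("4/3/2016", "F"),
    ("5/1/2016", "D"), ("5/2/2016", "F"),
    ("6/1/2016", "D"), ("6/2/2016", "F"), ("6/3/2016", "G"), ("6/4/2016", "H"),
]
three_month = rolling_distinct(rows, window=3)
one_month = rolling_distinct(rows, window=1)
print(list(three_month.values()))
```

With window=1 the same function gives the plain distinct count per month (the STEP 2 table).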

Grouping by date difference/range

How would I write a statement that forms groups based on the monthly date range/difference? Example:
org_group | date       | second_group_by
A         | 30.10.2013 | 1
A         | 29.11.2013 | 1
A         | 31.12.2013 | 1
A         | 30.01.2015 | 2
A         | 27.02.2015 | 2
A         | 31.03.2015 | 2
A         | 30.04.2015 | 2
As long as there isn't a monthly date_diff > 1, rows should be in the same second_group_by. I hope it's clear enough to understand: the column second_group_by should be generated by the user... it doesn't exist in the table.
Date diff between which rows, though?
If you just want to separate years (or months or weeks) use
GROUP BY DATEPART(....)
That's Sybase or SQL Server, but other SQL dialects will have an equivalent.
If you have specific date ranges, get them into a table with a start and end date-time and a monotonically increasing integer, join to that with a BETWEEN, and GROUP BY the integer.
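If the intended comparison is between consecutive rows (each date against the one before it), the grouping rule from the question can be sketched as follows. This is a Python illustration under that assumption, not SQL and not necessarily what the answer above had in mind; assign_groups is a made-up name and it handles the dates of a single org_group.

```python
from datetime import datetime


def assign_groups(dates, max_gap_months=1):
    """Start a new group whenever consecutive dates are more than max_gap_months apart."""
    parsed = sorted(datetime.strptime(d, "%d.%m.%Y") for d in dates)
    groups, current, prev_index = [], 0, None
    for d in parsed:
        month_index = d.year * 12 + d.month  # months on one linear scale
        if prev_index is None or month_index - prev_index > max_gap_months:
            current += 1  # gap too large: open the next group
        groups.append((d.strftime("%d.%m.%Y"), current))
        prev_index = month_index
    return groups


dates = [
    "30.10.2013", "29.11.2013", "31.12.2013",
    "30.01.2015", "27.02.2015", "31.03.2015", "30.04.2015",
]
grouped = assign_groups(dates)
for row in grouped:
    print(row)
```

On the sample data this yields group 1 for the three 2013 dates and group 2 for the four 2015 dates, matching the second_group_by column in the question.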