How to group previously-denormalized-data from a row - postgresql

I have a table containing courses run by teachers. I want to get the number of taught days, split by year and by teacher status.
The table contains the following fields:
id  teacher_id  course_name  course_date  course_duration  teacher_status
--------------------------------------------------------------------------
1   Teacher_01  Course_AA    2012-02-01   2                volunteer
2   Teacher_02  Course_BB    2012-02-01   7                employee
3   Teacher_03  Course_BB    2013-02-01   7                contractor
4   Teacher_01  Course_AA    2014-02-01   2                paid volunteer
5   Teacher_04  Course_AA    2014-06-01   2                paid volunteer
Teachers may run a course under various statuses: volunteer, paid volunteer, contractor, employee, etc. The status of a given teacher can change through time. The duration of a course is expressed in days.
I can already get the sum of taught days split by status, using this query:
SELECT
teacher_status,
sum(course_duration) AS "Taught days"
FROM
my_table
GROUP BY
teacher_status
;
But the data is not normalized, and different families of statuses have been mixed. So I want to gather the same information (number of taught days), split:
by 3 statuses: volunteer, paid volunteer, and all other statuses,
and by year.
What is expected is:
Year  Teacher_status  Taught_days
---------------------------------------
2012  volunteer       2
2012  employee        7
2013  contractor      7
2014  paid volunteer  4
I've tried various combinations of aggregate functions, GROUP BY / HAVING / ROLLUP statements but without success. How should I achieve this?

You'll want to select a complex expression and then GROUP BY that, not just by a raw column value. You could either repeat the expression or, in Postgres, also refer to the column alias:
SELECT
EXTRACT(year FROM course_date) as year,
(CASE teacher_status
WHEN 'volunteer' THEN 'volunteer'
WHEN 'paid volunteer' THEN 'paid'
ELSE 'other'
END) AS status,
SUM(course_duration) AS "Taught days"
FROM
my_table
GROUP BY
year,
status;
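As a rough check of the approach above, here is a minimal sketch in Python that runs the same grouping against the sample rows from the question. It makes two assumptions that are not in the original answer: an in-memory SQLite database stands in for Postgres, and `strftime('%Y', ...)` replaces `EXTRACT(year FROM ...)`, which SQLite does not provide. The `GROUP BY` repeats the expressions rather than using the aliases, which is the portable variant the answer mentions.

```python
import sqlite3

# In-memory SQLite is only a stand-in for Postgres (assumption for portability).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table ("
    "id INTEGER, teacher_id TEXT, course_name TEXT, "
    "course_date TEXT, course_duration INTEGER, teacher_status TEXT)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Teacher_01", "Course_AA", "2012-02-01", 2, "volunteer"),
        (2, "Teacher_02", "Course_BB", "2012-02-01", 7, "employee"),
        (3, "Teacher_03", "Course_BB", "2013-02-01", 7, "contractor"),
        (4, "Teacher_01", "Course_AA", "2014-02-01", 2, "paid volunteer"),
        (5, "Teacher_04", "Course_AA", "2014-06-01", 2, "paid volunteer"),
    ],
)

# strftime('%Y', ...) replaces EXTRACT(year FROM ...), which SQLite lacks.
# The grouping expressions are repeated instead of relying on column aliases.
query = """
SELECT
    CAST(strftime('%Y', course_date) AS INTEGER) AS year,
    CASE teacher_status
        WHEN 'volunteer' THEN 'volunteer'
        WHEN 'paid volunteer' THEN 'paid'
        ELSE 'other'
    END AS status,
    SUM(course_duration) AS taught_days
FROM my_table
GROUP BY
    CAST(strftime('%Y', course_date) AS INTEGER),
    CASE teacher_status
        WHEN 'volunteer' THEN 'volunteer'
        WHEN 'paid volunteer' THEN 'paid'
        ELSE 'other'
    END
ORDER BY year, status
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# (2012, 'other', 7)
# (2012, 'volunteer', 2)
# (2013, 'other', 7)
# (2014, 'paid', 4)
```

Note that "employee" and "contractor" both collapse into "other" here, which matches the three-status split described in the question rather than the raw statuses shown in its expected-output table.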

To get your example result exactly (grouping by year and by the raw status), you can use this query:
SELECT extract (year from course_date),
teacher_status,
sum(course_duration) AS "Taught days"
FROM
my_table
GROUP BY
extract (year from course_date),
teacher_status;

Related

PostgreSQL LAG records one year apart by partitioning

I have a db table with a bunch of records stored snapshot-style, i.e. daily captures of product unit availability over many years:
product    units  category  expire_date  report_date
pineapple  10     common    12/25/2021   12/01/2021
pineapple  8      common    12/25/2021   12/02/2021
pineapple  8      deluxe    12/28/2021   12/02/2021
grapes     45     deluxe    11/30/2022   12/01/2021
...
pineapple  21     common    12/12/2022   12/01/2022
...
What I'm trying to get from that data is something like this "lagged" version, partitioning by product and category:
product    units  category  report_date  prev_year_units_atreportdate
pineapple  10     common    12/01/2021   NULL
pineapple  21     common    12/01/2022   10
pineapple  16     common    12/01/2023   21
...
It's important to know that from time to time the cron snapshot task fails and no records are stored for days. This leads to a different number of records by product.
I've been using LAG() to no avail, since I can only get the previous day/month when partitioning by product and category.
Can anyone help me with this?
I think I would use a subselect rather than a window function.
select *,
(
select units from t t2
where t2.report_date=t1.report_date-interval '1 year' and t2.product=t1.product and t2.category=t1.category
) lagged_units
from t as t1
I'm not sure what you want to happen in a leap year, though, or in the year after one.
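To illustrate why the correlated subselect tolerates missing snapshot days where LAG() does not, here is a small sketch in Python. The assumptions are mine, not the answerer's: SQLite stands in for Postgres, `date(report_date, '-1 year')` replaces `report_date - interval '1 year'`, and the dates are stored as ISO `YYYY-MM-DD` strings rather than the `MM/DD/YYYY` shown in the question. Leap-day handling may differ between engines, so treat that edge case as untested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t ("
    "product TEXT, units INTEGER, category TEXT, "
    "expire_date TEXT, report_date TEXT)"
)
# Sample rows from the question, with dates rewritten as ISO strings (assumption).
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        ("pineapple", 10, "common", "2021-12-25", "2021-12-01"),
        ("pineapple", 8, "common", "2021-12-25", "2021-12-02"),
        ("pineapple", 8, "deluxe", "2021-12-28", "2021-12-02"),
        ("grapes", 45, "deluxe", "2022-11-30", "2021-12-01"),
        ("pineapple", 21, "common", "2022-12-12", "2022-12-01"),
    ],
)

# The subselect looks up the row exactly one year earlier for the same
# product and category; it yields NULL when that snapshot is missing.
query = """
SELECT
    t1.product, t1.units, t1.category, t1.report_date,
    (
        SELECT t2.units
        FROM t AS t2
        WHERE t2.report_date = date(t1.report_date, '-1 year')
          AND t2.product = t1.product
          AND t2.category = t1.category
    ) AS lagged_units
FROM t AS t1
ORDER BY t1.product, t1.category, t1.report_date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Only the 2022-12-01 pineapple/common row finds a match (10 units from 2021-12-01); every other row gets NULL because no snapshot exists one year earlier. The lookup matches on the date value itself, so gaps left by a failed cron run shift nothing, unlike a row-offset LAG().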

TABLEAU Calculating a Running DISTINCT COUNT on usernames for last 3 months

Issue:
I need to show RUNNING DISTINCT users per 3-month interval (see the goal table for reference). However, "COUNTD" does not help, even combined with a table calculation or the "WINDOW_COUNT" or "WINDOW_SUM" functions.
By RUNNING DISTINCT users I mean the DISTINCT users within a period of time (Jan - Mar, Feb - Apr, etc.). The COUNTD option only counts DISTINCT users within a single window, whereas this calculation should move over a 3-month window to find the DISTINCT users.
Original Table
Date Username
1/1/2016 A
1/1/2016 B
1/2/2016 C
2/1/2016 A
2/1/2016 B
2/2/2016 B
3/1/2016 B
3/1/2016 C
3/2/2016 D
4/1/2016 A
4/1/2016 C
4/2/2016 D
4/3/2016 F
5/1/2016 D
5/2/2016 F
6/1/2016 D
6/2/2016 F
6/3/2016 G
6/4/2016 H
Goal Table
Tried Methods:
Step-by-step:
I tried to break the problem into steps, but due to the columnar nature of Tableau, I cannot successfully run COUNT or SUM (or any aggregate) on the LAST STEP of the solution.
STEP 0 Raw Data
This table shows the structure of the data as it is in the original table.
STEP 1 COUNT usernames by MONTH
This table shows the count of users by month. Notice that because user B had 2 entries, he is counted twice. In the next step we use DISTINCT COUNT to fix this issue.
STEP 2 DISTINCT COUNT by MONTH
Now we can see who all were present in a month, next step would be to see running DISTINCT COUNT by MONTH for 3 months
STEP 3 RUNNING DISTINCT COUNT for 3 months
Now we can see the SUM of DISTINCT COUNT of usernames for running 3 months. If you turn the MONTH INTERVAL to 1 from 3, you can see STEP 2 table.
LAST STEP Issue Step
GOAL: Need the GRAND TOTAL to be the SUM of MONTH column.
Request:
I want to calculate the SUM of '1' by MONTH. However, I am using a WINDOW function, and aggregating its result gave me an error.
WHAT I NEED
Jan  Feb  March  April  May  Jun
3    3    4      5      5    6
WHAT I GOT
Jan  Feb  March  April  May  Jun
1    1    1      1      1    1
My Output after tried methods: Attached twbx file. DISTINCT_count_running_v1
HELP taken:
https://community.tableau.com/thread/119179 ; Tried this method but stuck at last step
https://community.tableau.com/thread/122852 ; Used some parts of this solution
The way I approached the problem was identifying the minimum login date for each user and then using that date to count the distinct number of users. For example, I have data in this format. I created a calculated field called Min User Login Date as { FIXED [User]:MIN([Date])} and then did a CNTD(USER) on Min User Login Date to get the unique user count by date. If you want running total, then you can do quick table calculation on Running Total on CNTD(USER) field.
You need to put Month(date) and count(username) on the columns; then you will get the result you expect.
See screen below
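For reference, the "WHAT I NEED" row in the question is consistent with counting distinct usernames over a trailing 3-month window (the current month plus the two before it). The sketch below checks that interpretation in plain Python rather than Tableau. The trailing-window reading, and the helper names `month_index` and `WINDOW_MONTHS`, are assumptions introduced for illustration; this does not show how to build the calculation in Tableau.

```python
from datetime import date

# Rows from the question's "Original Table" (read as month/day/year).
logins = [
    (date(2016, 1, 1), "A"), (date(2016, 1, 1), "B"), (date(2016, 1, 2), "C"),
    (date(2016, 2, 1), "A"), (date(2016, 2, 1), "B"), (date(2016, 2, 2), "B"),
    (date(2016, 3, 1), "B"), (date(2016, 3, 1), "C"), (date(2016, 3, 2), "D"),
    (date(2016, 4, 1), "A"), (date(2016, 4, 1), "C"), (date(2016, 4, 2), "D"),
    (date(2016, 4, 3), "F"),
    (date(2016, 5, 1), "D"), (date(2016, 5, 2), "F"),
    (date(2016, 6, 1), "D"), (date(2016, 6, 2), "F"), (date(2016, 6, 3), "G"),
    (date(2016, 6, 4), "H"),
]

WINDOW_MONTHS = 3  # hypothetical parameter: size of the trailing window


def month_index(d):
    """Map a date to a running month number so windows can cross years."""
    return d.year * 12 + (d.month - 1)


# Collect the set of distinct users seen in each month.
users_by_month = {}
for day, user in logins:
    users_by_month.setdefault(month_index(day), set()).add(user)

# For each month, union the user sets of that month and the two before it.
running_distinct = {}
for m in sorted(users_by_month):
    window_users = set()
    for back in range(WINDOW_MONTHS):
        window_users |= users_by_month.get(m - back, set())
    running_distinct[m] = len(window_users)

counts = [running_distinct[m] for m in sorted(running_distinct)]
print(counts)  # [3, 3, 4, 5, 5, 6]
```

The key point is that the distinct count has to be taken over the union of the raw usernames in the window; summing per-month distinct counts would count a user such as B once for every month they appear in.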

LOD Workaround with Tableau 8.3

I'm new to Tableau. I have a customer-event table that shows which customers attended which events (webinars, etc.). One of the fields is sales, which is the sales for that customer in the 30 days following the date of the event.
custid  eventid  eventdt  30daysales
1       aa       jan 1    $100
1       ab       jan 1    $100
2       aa       jan 2    $150
Note that customer 1 attended 2 events on the same day. So the sales number is duplicated. If I were building a report for a single event, it's no problem. But when I build a monthly report, I want sum(Sales) = $250 and not $350.
My report sample:
Month  eventcount  customercount  30daysales
Jan    2           2              $250
I read that with Tableau 9, an LOD formula would allow me to sum sales on a per-customer basis. But I'm on Tableau 8.3, and I'm wondering what the manual workaround is.
How do I write the calculated field to compute the 30daysales without duplicating?

Grouping by date difference/range

How would I write a statement that builds a specific GROUP BY based on the monthly date range/difference? Example:
org_group | date       | second_group_by
A           30.10.2013   1
A           29.11.2013   1
A           31.12.2013   1
A           30.01.2015   2
A           27.02.2015   2
A           31.03.2015   2
A           30.04.2015   2
As long as there isn't a monthly date_diff > 1, the row should be in the same second_group_by. I hope it's clear enough to understand. The column second_group_by should be generated by the user; it doesn't exist in the table.
Date diff between which rows, though?
If you just want to separate years (or months or weeks), use
GROUP BY DATEPART(....)
That's Sybase or SQL Server, but other SQL dialects will have an equivalent.
If you have specific data ranges, get them into a table with start and end date-time and a monotonically increasing integer, join to that with a BETWEEN and GROUP BY the integer.
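If the date diff is meant between consecutive rows of the same org_group, the expected second_group_by column can be reproduced by starting a new group whenever the gap to the previous row exceeds one month. The sketch below shows that logic in plain Python rather than SQL. The consecutive-row reading, the calendar-month definition of the gap, and the names `month_diff` and `assign_groups` are assumptions for illustration, not something stated in the thread.

```python
from datetime import date

# Sample rows from the question: (org_group, date).
rows = [
    ("A", date(2013, 10, 30)),
    ("A", date(2013, 11, 29)),
    ("A", date(2013, 12, 31)),
    ("A", date(2015, 1, 30)),
    ("A", date(2015, 2, 27)),
    ("A", date(2015, 3, 31)),
    ("A", date(2015, 4, 30)),
]


def month_diff(earlier, later):
    """Difference in calendar months, ignoring the day of the month."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def assign_groups(rows):
    """Number consecutive runs per org_group; a gap > 1 month starts a new run."""
    result = []
    last_date = {}  # org_group -> date of the previous row
    counter = {}    # org_group -> current group number
    for org, d in sorted(rows):
        prev = last_date.get(org)
        if prev is None:
            counter[org] = 1          # first row of this org_group
        elif month_diff(prev, d) > 1:
            counter[org] += 1         # gap found: open a new group
        last_date[org] = d
        result.append((org, d, counter[org]))
    return result


grouped = assign_groups(rows)
for org, d, grp in grouped:
    print(org, d.strftime("%d.%m.%Y"), grp)
```

In SQL the same idea is usually expressed by flagging each row whose gap to the previous row is too large and taking a running sum of the flags; the generated number can then be used in the GROUP BY.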

Crystal Reports formulas on a group

crystal report grouped by company and invoiced
company name invoiceNo which has max(dayslate)? (maxdaysLate)
------------- records -------------------------------
there is a chance of one invoice is assigned to multiple companies,
for example
company1 inv1 30 days late (max(dayslate) col below)
invoiceNo duedate dayslate (cal field today - duedate)
inv1 2016-01-01 30
inv2 2016-01-01 30
Company2 inv3 26 days late(inv1 below has max but already in comp1, so 2nd max)
invoiceNo duedate dayslate
inv1 2016-01-01 30
inv3 2016-01-04 26
I need help writing a formula for the max-days-late record in a group (e.g. inv1),
and working out how to find the second-highest value when the maximum has already been used by another group, and so on.
Edit:
Days late is a calculated column (today - due date).
For each group, I want to display the maximum days late of that group and its related invoice number.
One invoice may be assigned to multiple companies. If that invoice has the maximum days late in two groups, it should be counted in only one company group, and the other group should use its second-highest value.