How can I calculate the number of publications per month? - PostgreSQL

There is a table of posts on social networks with the date and title of the publication.
id  created_at           title
-----------------------------------------------------------
1   2022-01-17 08:50:58  Sberbank is the best bank
2   2022-01-17 18:36:41  Visa vs MasterCard
3   2022-01-17 16:16:17  Visa vs UnionPay
4   2022-01-17 18:01:00  Mastercard vs UnionPay
5   2022-01-16 16:44:36  Hadoop or Greenplum: pros and cons
6   2022-01-16 14:57:32  NFC: wireless payment
I need to calculate the number of publications per month, showing the first date of the month and the percentage increase in the number of posts (publications) relative to the previous month. The rows of the result should be in chronological order. The percentage increase can be negative, and it should be rounded to one decimal place with a % sign appended.
Expected result:
dt          count  percent_growth
----------------------------------
2022-02-01  175    null
2022-03-01  338    93.1%
2022-04-01  345    2.1%
2022-05-01  295    -14.5%
2022-06-01  330    11.9%
I have read the documentation, but I don't understand how to do this.

step-by-step demo: db<>fiddle
SELECT
    pub_month AS dt,
    count,
    round(count * 100.0 / prev_count - 100, 1)::text || '%' AS percent_growth -- 4
FROM (
    SELECT
        *,
        lag(count) OVER (ORDER BY pub_month) AS prev_count -- 3
    FROM (
        SELECT
            date_trunc('month', created_at)::date AS pub_month, -- 1
            COUNT(*) AS count -- 2
        FROM mytable
        GROUP BY 1
    ) s
) s
ORDER BY dt;
1. Normalize all dates to the first day of the month (date_trunc() "truncates" the day and time parts, if you like to see it that way).
2. Group by the normalized date and count all entries per month.
3. Use the lag() window function to shift the previous month's count into the current row. Now you can directly compare the previous and current month's counts.
4. Calculate the percentage. Multiplying by 100.0 instead of 100 makes the division numeric rather than integer, and round(..., 1) keeps one decimal place. The result is numeric, so cast it to text to append the % character. For the first month prev_count is NULL, so the whole expression is NULL.
The final ORDER BY dt returns the months in chronological order.
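To sanity-check the arithmetic outside the database, here is a minimal Python sketch of the same four steps. It assumes the posts are available as a list of `datetime` values; the function name `monthly_growth` is hypothetical. It uses `Decimal` with half-away-from-zero rounding to mimic PostgreSQL's `round(numeric, 1)`, and, as in the SQL, months without any posts simply do not appear.

```python
from collections import Counter
from datetime import date, datetime
from decimal import Decimal, ROUND_HALF_UP


def monthly_growth(created_at):
    """Return (first day of month, count, growth string or None) in chronological order."""
    # Steps 1-2: truncate each timestamp to the first day of its month and count per month
    counts = Counter(date(ts.year, ts.month, 1) for ts in created_at)
    rows = []
    prev = None  # step 3: plays the role of lag(count)
    for month in sorted(counts):  # chronological order, like ORDER BY dt
        count = counts[month]
        if prev is None:
            growth = None  # first month has no previous count, like SQL NULL
        else:
            # Step 4: exact decimal arithmetic, rounded to one decimal place
            pct = (Decimal(count * 100) / Decimal(prev) - 100).quantize(
                Decimal("0.1"), rounding=ROUND_HALF_UP
            )
            growth = f"{pct}%"
        rows.append((month, count, growth))
        prev = count
    return rows


sample = [datetime(2022, 1, 17, 8, 50, 58), datetime(2022, 1, 16, 14, 57, 32)]
print(monthly_growth(sample))
```

With the six sample posts from the question, which all fall in January 2022, the result is a single row for 2022-01-01 with a count of 6 and no growth value.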

Related

Oracle SQL Developer: how to get the min Unix timestamp for the max order per line

I'm running Oracle Database 18c Enterprise Edition Release 18.0.0.0.0.
I have a table like the following:
Unix_Timestamp  Line  Order_Number
----------------------------------
1660496421      1     299
1670496421      1     299
1660456421      1     298
1660473051      1     298
1660573526      2     300
1660473044      2     300
Unix_Timestamp is a unique column.
I want to get the min Unix_Timestamp value for the max Order_Number value per Line,
i.e. find the most recent order per line and then get the min Unix_Timestamp for that Order_Number.
Order_Number values increase over time, and each number belongs to only one Line.
I do not need the Order_Number value in my result set, but it has to be considered in my script.
So far I can get the min Unix_Timestamp value per line, but I'm struggling to factor in the max Order_Number:
select Line,
Min(Unix_Timestamp) as Unix_Timestamp
from..
join..
where..
group by Line
Any help would be appreciated, thank you!
You can use KEEP (DENSE_RANK LAST ...). From the Oracle documentation:
When you need a value from the first or last row of a sorted group, but the needed value is not the sort key, the FIRST and LAST functions eliminate the need for self-joins or views and enable better performance.
In your case you can do:
select line,
min(unix_timestamp) keep (dense_rank last order by order_number) as unix_timestamp
from your_table
group by line
LINE UNIX_TIMESTAMP
---- --------------
1 1660496421
2 1660473044
db<>fiddle
(The difference between the outputs is hard to spot - 1660496421 looks very similar to 1660456421...)
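The KEEP (DENSE_RANK LAST ORDER BY order_number) clause first narrows each line's group down to the rows with the highest order_number, and only then applies MIN. A minimal Python sketch of that two-step logic, assuming the rows are (unix_timestamp, line, order_number) tuples; the function name is hypothetical:

```python
from collections import defaultdict


def min_ts_for_last_order(rows):
    """Per line: keep only rows with the highest order_number, then take the min timestamp."""
    groups = defaultdict(list)
    for unix_ts, line, order_no in rows:
        groups[line].append((order_no, unix_ts))
    result = {}
    for line, pairs in groups.items():
        # DENSE_RANK LAST ORDER BY order_number: the top-ranked order in this line
        last_order = max(order_no for order_no, _ in pairs)
        # MIN(unix_timestamp) over the kept rows only
        result[line] = min(ts for order_no, ts in pairs if order_no == last_order)
    return result


rows = [
    (1660496421, 1, 299), (1670496421, 1, 299),
    (1660456421, 1, 298), (1660473051, 1, 298),
    (1660573526, 2, 300), (1660473044, 2, 300),
]
print(min_ts_for_last_order(rows))
```

This also shows why a plain MIN per line is wrong for line 1: it would pick 1660456421, which belongs to the older order 298.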

Date difference between pairs of dates per ID

I have data with 2 columns, in the following format:
ID  Date
-------------
1   1/1/2020
1   27/7/2020
1   15/3/2021
2   18/1/2020
3   1/1/2020
3   3/8/2020
3   18/9/2021
2   23/8/2020
2   30/2/2021
Now I would like to create a calculated field in Tableau that finds, per ID, the difference between consecutive dates, in any unit (e.g. days).
For example, for ID 1 the difference between the first two dates is 208 calendar days, and the difference between the second and third dates is 231 days.
A table calculation like the following should work if you get the partitioning, addressing and ordering right, for example by setting "Compute Using" to Date:
If first() < 0 then min([Date]) - lookup(min([Date]), -1) end
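The table calculation partitions by ID, orders by Date, and subtracts each row's date from the previous one. A minimal Python sketch of the same idea, assuming dates arrive as d/m/yyyy strings; the function name is hypothetical. Note that 30/2/2021 in the sample data is not a valid calendar date, so `strptime` would raise a `ValueError` for it; the example below therefore uses only IDs 1 and 3.

```python
from collections import defaultdict
from datetime import datetime


def gaps_per_id(rows):
    """rows: (id, 'd/m/yyyy') pairs; returns {id: [days between consecutive dates]}."""
    by_id = defaultdict(list)
    for id_, text in rows:
        # Partition by ID; impossible dates such as 30/2/2021 raise ValueError here
        by_id[id_].append(datetime.strptime(text, "%d/%m/%Y").date())
    result = {}
    for id_, dates in by_id.items():
        dates.sort()  # order by Date within each ID
        # Each date minus the previous one, like MIN([Date]) - LOOKUP(MIN([Date]), -1)
        result[id_] = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    return result


rows = [(1, "1/1/2020"), (1, "27/7/2020"), (1, "15/3/2021"),
        (3, "1/1/2020"), (3, "3/8/2020"), (3, "18/9/2021")]
print(gaps_per_id(rows))
```

For ID 1 this reproduces the 208 and 231 days from the question.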

How to group previously denormalized data from a row

I have a table containing courses run by teachers. I want to get the number of taught days, split by year and by teacher status.
The table contains the following fields:
id teacher_id course_name course_date course_duration teacher_status
--------------------------------------------------------------------------
1 Teacher_01 Course_AA 2012-02-01 2 volunteer
2 Teacher_02 Course_BB 2012-02-01 7 employee
3 Teacher_03 Course_BB 2013-02-01 7 contractor
4 Teacher_01 Course_AA 2014-02-01 2 paid volunteer
5 Teacher_04 Course_AA 2014-06-01 2 paid volunteer
Teachers may run a course under various statuses: volunteer, paid volunteer, contractor, employee, etc. The status of a given teacher can change through time. The duration of a course is expressed in days.
I can already gather the sum of taught days by teachers, split by status. This is done by
SELECT
teacher_status,
sum(course_duration) AS "Taught days"
FROM
my_table
GROUP BY
teacher_status
;
But the data is not normalized, and different families of statuses have been mixed. So I want to gather the same info (number of taught days) split:
by 3 statuses: volunteer, paid volunteer, all other statuses,
and by years.
What is expected is:
Year Teacher_status Taught_days
---------------------------------------
2012 volunteer 2
2012 employee 7
2013 contractor 7
2014 paid volunteer 4
I've tried various combinations of aggregate functions, GROUP BY / HAVING / ROLLUP statements but without success. How should I achieve this?
You'll want to select a complex expression and then GROUP BY that, not just by a raw column value. You could either repeat the expression or, in Postgres, also refer to the column alias:
SELECT
EXTRACT(year FROM course_date) as year,
(CASE teacher_status
WHEN 'volunteer' THEN 'volunteer'
WHEN 'paid volunteer' THEN 'paid'
ELSE 'other'
END) AS status,
SUM(course_duration) AS "Taught days"
FROM
my_table
GROUP BY
year,
status;
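The key point is that the grouping key is a computed value (year plus bucketed status), not a raw column. A minimal Python sketch of the same aggregation on the sample rows, assuming each row is a (course_date, course_duration, teacher_status) tuple; the helper names are hypothetical, and the bucket labels follow the CASE expression above:

```python
from collections import defaultdict
from datetime import date


def bucket(status):
    # Same mapping as the CASE expression: two named statuses, everything else lumped together
    if status == "volunteer":
        return "volunteer"
    if status == "paid volunteer":
        return "paid"
    return "other"


def taught_days(rows):
    """GROUP BY (year, bucketed status) and SUM(course_duration)."""
    totals = defaultdict(int)
    for course_date, duration, status in rows:
        totals[(course_date.year, bucket(status))] += duration
    return dict(totals)


rows = [
    (date(2012, 2, 1), 2, "volunteer"),
    (date(2012, 2, 1), 7, "employee"),
    (date(2013, 2, 1), 7, "contractor"),
    (date(2014, 2, 1), 2, "paid volunteer"),
    (date(2014, 6, 1), 2, "paid volunteer"),
]
print(taught_days(rows))
```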
To get your example result, I have this query
SELECT extract (year from course_date),
teacher_status,
sum(course_duration) AS "Taught days"
FROM
my_table
GROUP BY
extract (year from course_date),
teacher_status;

A T-SQL process to identify the total duration or number of days of all "cases" within a specified time period. This is a challenge

I could really do with some help, and I intend to be active in this community and help others in return. I have been a SQL developer using MS SQL Server for the last two years, but I've hit a roadblock on this one. Imagine a scenario in which you have a number of "Accommodation Providers", each with a certain "Service Capacity". We have a dataset with a number of concurrent "Placements", which can last anywhere from a day to several years. We would like to know the "Occupancy Rate", calculated as:
Occupancy (%) = Placement Days / (Capacity x Days in Period) x 100
where Placement Days = all days in all placements that fall within the period.
I have changed names of fields/tables and am showing some made-up sample data here.
We have one dataset in a table (tPL) for "Placements". There are many thousands of records, going back 7 years, e.g.:
tbl_Placements tPL:
[Provider Name]  [Name of Client]  [Vacancy Filled Date]  [Vacancy End Date]  [Placement_Length_in_Days]
Accommodation1   John Smith        2018-08-04             2018-08-12          8
Accommodation1   Jane Smith        2019-01-28             2019-04-09          294
tbl_Month_Year tMY:
Month_Year
2018-03-01
2018-04-01
2018-05-01
2018-06-01
2018-07-01
2018-08-01
2018-09-01
2018-10-01
2018-11-01
2018-12-01
2019-01-01
2019-02-01
2019-03-01
2019-04-01
2019-05-01
and lastly
tbl_Service_Capacity tSC:
[Provider Name] [Service Capacity]
Accommodation 1 12
Accommodation 2 4
Dividing by the service capacity is the easy part. Where I'm struggling is calculating the total number of "Placement Days" in a given period such as a month or quarter.
If you consider that Accommodation1, 2 and 3 can have multiple concurrent and overlapping placements of different lengths which can start and finish at any time, how can I calculate the total number of days in all placements, that fall within a given time period e.g. quarter or a month, to then calculate the occupancy percentage? The code below is an attempt. I'm presuming all months to be 30 days here, which I know is wrong. I know the logic is wrong here about calculating the number of days. To be honest, I'm almost totally fried and I just can't seem to get this done, hence I'm asking for help.
Am I going about this the wrong way by joining on a date table? Has anyone come up against this before? Also, if you would like me to give more information or clarify anything, I'm happy to do so.
Any help you can give will be hugely appreciated!
Please see the code below. I've tried it a few different ways, but sadly did not save the older versions to show. They didn't work, though. I've done something similar in the past to see how many "open cases" there were at any given point in time. That inspired the code here and went like this:
SELECT TOP (1000) tMY.Month_Year, COUNT(*) AS ActiveCases
FROM tbl_Casework AS tblCW LEFT OUTER JOIN
tbl_Month_Year AS tMY ON tMY.Month_Year >= tblCW.Start_Date AND tMY.Month_Year <= DATEADD(day, 31 - DATEPART(day,
ISNULL(tblCW.End_Date, GETDATE())), ISNULL(tblCW.End_Date, GETDATE()))
GROUP BY tMY.Month_Year
This definitely worked, but was just a count of "how many cases were open at some point during each month?"
SELECT tMY.Month_Year
,tPL.[Accommodation Provider]
,tSC.[Service_capacity_Total]
-- if started before month began and closed at or after end of month / or still open
,(sum(case when (datediff(day, tPL.[Vacancy Filled Date], [tMY].[MonthYear])<0 AND
(datediff(day, [tMY].[Month_Year], tPL.[Vacancy End Date])>=30) OR tPL.[Vacancy End Date] is null) then 30
-- if started after month began and closed during month
,sum(case when (datediff(day, tPL.[Vacancy Filled Date], [tMY].[MonthYear])>=0 AND
datediff(day, [tMY].[Month_Year], tPL.[Vacancy End Date])<=30) then tPL.[Placement_Length_in_Days]
-- if started before and closed after month - take filled date to end of month
,sum(case when datediff(day, [tMY].[Month_Year], tPL.[Vacancy End Date])>=30 AND datediff(day, tPL.[Vacancy Filled Date], [tMY].[Month_Year])<0 then
datediff(day, tPL.[Vacancy Filled Date], DATEADD(DAY, 30, tMY.Month_Year)) END) / (tSC.[Service_capacity]*30)*100 As [Occupancy Rate]
FROM [tbl_Placements] tPL
inner join tbl_Service_Capacity tSC on tSC.[Service Name] = tPL.[Accommodation Provider]
left outer join tbl_Month_Year tMY ON tMY.MonthYear >= [Vacancy Filled Date] and tMY.MonthYear <= DATEADD(day, 30, tPL.[Vacancy Filled Date])
WHERE tPL.[Vacancy Filled Date] >= '20160501' and tMY.MonthYear < (getdate()-30) AND tSC.[Service Capacity] IS NOT NULL
group by tMY.MonthYear, tPL.[Service Name], tSC.[Service Capacity]--, tPL.[Client Name]
order by tMY.MonthYear Asc
The code runs but I get crazy occupancy rates at 300% or 3% so the figures must be incorrect. The only part I'm sure of is taking the [Placement_Length_in_Days] when it starts and finishes within the time period. The calculations here are wrong, I'm sure of that.
To give you a quick shot, you might try this:
-- a table variable needs the @ prefix (#name is for temporary tables)
DECLARE @tbl_Placements TABLE
(
    [Provider Name] VARCHAR(100),
    [Name of Client] VARCHAR(100),
    [Vacancy Filled Date] DATE,
    [Vacancy End Date] DATE
);
INSERT INTO @tbl_Placements
VALUES ('Accommodation1', 'John Smith', '2018-08-04', '2018-08-12'),
       ('Accommodation1', 'Jane Smith', '2019-01-28', '2019-04-09');
SELECT
    p.[Provider Name], p.[Name of Client],
    DATEADD(DAY, A.Nmbr - 1, p.[Vacancy Filled Date]) AS OccupiedAt
FROM
    @tbl_Placements p
CROSS APPLY
    -- one row per occupied day; a still-open placement (NULL end date) runs until today
    (SELECT TOP (DATEDIFF(DAY, p.[Vacancy Filled Date],
                          ISNULL(p.[Vacancy End Date], CAST(GETDATE() AS DATE))) + 1)
        ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
     FROM
        master..spt_values a
        CROSS JOIN master..spt_values b -- cross join so multi-year placements still get enough rows
    ) A(Nmbr);
The idea in short:
We use CROSS APPLY to create a joined set per row.
We use a computed TOP clause to get the right number of rows back.
We create a numbers table on the fly, simply by querying any table with enough rows (here I took master..spt_values). We do not need the actual table's content, just a counter we get from ROW_NUMBER().
We return the set together with a running day starting with the first day of occupation and ending with the last day of occupation.
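Once every placement is expanded into one row per occupied day, the placement days per month are just a count grouped by provider and month. A minimal Python sketch of that expand-then-count idea, assuming placements are (provider, start, end) tuples and counting both ends inclusively like the TOP (DATEDIFF(...) + 1) above; the function names are hypothetical:

```python
from collections import Counter
from datetime import date, timedelta


def occupied_days(start, end):
    """One date per occupied day, both ends inclusive (like TOP (DATEDIFF(...) + 1))."""
    return [start + timedelta(days=n) for n in range((end - start).days + 1)]


def placement_days_per_month(placements):
    """Count occupied days per (provider, first day of month)."""
    counts = Counter()
    for provider, start, end in placements:
        for day in occupied_days(start, end):
            counts[(provider, day.replace(day=1))] += 1
    return counts


placements = [
    ("Accommodation1", date(2018, 8, 4), date(2018, 8, 12)),
    ("Accommodation1", date(2019, 1, 28), date(2019, 4, 9)),
]
print(placement_days_per_month(placements))
```

Dividing each count by capacity times the real number of days in that month then gives the occupancy rate without assuming 30-day months.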
Hint: This would be much easier if you had an existing physical numbers/date table in your database. You would simply inner join that table with a BETWEEN in the ON clause.
You might read this.
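A different technique, not the one used in the answer above, avoids expanding rows altogether: clip each placement to the period and count the overlap, max(0, min(end, period_end) - max(start, period_start) + 1). A minimal Python sketch of that interval-clipping approach, plugged into the occupancy formula from the question. It assumes both ends are inclusive, treats a NULL/None end date as still open, and uses the sample capacity of 12; the function names are hypothetical.

```python
import calendar
from datetime import date


def overlap_days(start, end, period_start, period_end):
    """Days of the placement [start, end] inside [period_start, period_end], both ends inclusive."""
    first = max(start, period_start)
    last = min(end, period_end)
    return max(0, (last - first).days + 1)  # 0 when the two ranges do not overlap


def monthly_occupancy(placements, capacity, year, month):
    """Occupancy % = placement days / (capacity * days in period) * 100 for one calendar month."""
    days_in_month = calendar.monthrange(year, month)[1]  # real month length instead of 30
    period_start = date(year, month, 1)
    period_end = date(year, month, days_in_month)
    placement_days = sum(
        # a still-open placement (end is None) is treated as running through the period end
        overlap_days(start, end or period_end, period_start, period_end)
        for start, end in placements
    )
    return placement_days / (capacity * days_in_month) * 100


placements = [(date(2018, 8, 4), date(2018, 8, 12)), (date(2019, 1, 28), date(2019, 4, 9))]
print(monthly_occupancy(placements, capacity=12, year=2019, month=2))
```

For a quarter, the same overlap function works with the quarter's first and last day as the period bounds.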

Grouping by date difference/range

How would I write a statement that creates specific groups based on the monthly date range/difference? Example:
org_group | date | second_group_by
A 30.10.2013 1
A 29.11.2013 1
A 31.12.2013 1
A 30.01.2015 2
A 27.02.2015 2
A 31.03.2015 2
A 30.04.2015 2
As long as there isn't a monthly date difference greater than 1, the rows should be in the same second_group_by. I hope this is clear enough: the column second_group_by should be generated by the query; it doesn't exist in the table.
Date difference between which rows, though?
If you just want to separate years (or months or weeks) use
GROUP BY DATEPART(....)
That's Sybase or SQL Server syntax, but other SQL dialects have equivalents.
If you have specific date ranges, put them into a table with a start and end date-time and a monotonically increasing integer, join to that with a BETWEEN, and GROUP BY the integer.
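If the groups are instead defined by consecutive months, as in the question, this is the classic "gaps and islands" pattern: in SQL it is usually solved by comparing each row with the previous one (e.g. with LAG()) and taking a running SUM of the "gap" flags. A minimal Python sketch of that logic, assuming (org_group, date) input rows and assuming that numbering restarts for each org_group; the function name is hypothetical:

```python
from datetime import date


def second_group_by(rows):
    """rows: (org_group, date) pairs; returns (org_group, date, group number) sorted by org and date."""
    result = []
    prev_org = prev_month = None
    group = 0
    for org, d in sorted(rows):
        month_index = d.year * 12 + d.month  # consecutive months differ by exactly 1
        if org != prev_org:
            group = 1  # assumption: numbering restarts for each org_group
        elif month_index - prev_month > 1:
            group += 1  # a gap of more than one month starts a new group
        result.append((org, d, group))
        prev_org, prev_month = org, month_index
    return result


rows = [("A", date(2013, 10, 30)), ("A", date(2013, 11, 29)), ("A", date(2013, 12, 31)),
        ("A", date(2015, 1, 30)), ("A", date(2015, 2, 27)), ("A", date(2015, 3, 31)),
        ("A", date(2015, 4, 30))]
for row in second_group_by(rows):
    print(row)
```

This reproduces the second_group_by column from the example: the jump from December 2013 to January 2015 starts group 2.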