Show the count based on some condition - postgresql

I asked a question a few days ago:
Count() corresponding to max()
Now, with the same set of tables (SQL Fiddle), I would like to check a different condition.
While the first question was about a count related to the max of a status, this question is about showing the count based on the next status of every project.
Explanation
As you can see in the table user_approval, appr_prjt_id = 1 has 3 different statuses, namely 10, 20 and 30. The next status will be 40 (with every approval the status is increased by 10), and so on. So is it possible to show that there is a project whose status is waiting to become 40? Its count must be shown only under status 40 in the output (not under the statuses 10, 20, 30, etc.).
Desired Output:
          10 | 20 | 30 | 40
location1  0 |  0 |  0 |  1

Not sure what "the next status will be 40" means. But assuming that the status is increased by 10 with every approval, the following should work:
SELECT *
FROM user_projects pr
WHERE EXISTS (
    SELECT * FROM user_approval ex
    WHERE ex.appr_prjt_id = pr.proj_id
    AND ex.appr_status = 30
)
AND NOT EXISTS (
    SELECT * FROM user_approval nx
    WHERE nx.appr_prjt_id = pr.proj_id
    AND nx.appr_status >= 40
);

You can get the counts for each of the next status requirements with a query that looks more like:
select
    sum(case when ua.appr_status = 10 then 1 else 0 end) as app_waiting_20,
    sum(case when ua.appr_status = 20 then 1 else 0 end) as app_waiting_30,
    sum(case when ua.appr_status = 30 then 1 else 0 end) as app_waiting_40
from
    user_approval ua;
The nice thing about this solution is that it needs only one table scan, and you can add all kinds of other counts/sums to the query result as well.
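One caveat: counting rows of user_approval directly counts every status a project has already passed through, so a project with statuses 10, 20 and 30 is counted under all three. If each project should appear only under its next status, one option is to reduce each project to its highest status first. A minimal sketch, run here through Python's sqlite3 module; the sample rows (including project 2) are hypothetical, and the location column from the desired output is left out because its table is not shown:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_approval (appr_prjt_id INTEGER, appr_status INTEGER);
    -- Hypothetical rows: project 1 was approved three times, project 2 once.
    INSERT INTO user_approval VALUES (1, 10), (1, 20), (1, 30), (2, 10);
""")

# Reduce every project to its highest status, then count projects per status.
query = """
    SELECT
        SUM(CASE WHEN latest.max_status = 10 THEN 1 ELSE 0 END) AS app_waiting_20,
        SUM(CASE WHEN latest.max_status = 20 THEN 1 ELSE 0 END) AS app_waiting_30,
        SUM(CASE WHEN latest.max_status = 30 THEN 1 ELSE 0 END) AS app_waiting_40
    FROM (
        SELECT appr_prjt_id, MAX(appr_status) AS max_status
        FROM user_approval
        GROUP BY appr_prjt_id
    ) AS latest
"""
waiting_counts = conn.execute(query).fetchone()
print(waiting_counts)  # (1, 0, 1): project 2 waits for 20, project 1 waits for 40
```

The inner query costs an extra aggregation step, but each project is then counted exactly once.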

select * from user_approval where appr_status
= (select max(appr_status) from user_approval where appr_status < 40);
SQL Fiddle: http://www.sqlfiddle.com/#!11/f5243/10

Related

MySQL SELECT MIN and MAX RIGHT JOIN numeric value of the last 30 days

I need a query to return the initial and final numeric value of the number of listeners of some artists of the last 30 days ordered from the highest increase of listeners to the lowest.
To better understand what I mean, here are the tables involved.
artist table saves the information of a Spotify artist.
id  name      Spotify_id
1   Shakira   0EmeFodog0BfCgMzAIvKQp
2   Bizarrap  716NhGYqD1jl2wI1Qkgq36
platform_information table saves which information I want to get from the artists, and on which platform.
id  platform  information
1   spotify   monthly_listeners
2   spotify   followers
platform_information_artist table stores information for each artist on a platform and information on a specific date.
id  platform_information_id  artist_id  date        value
1   1                        1          2022-11-01  100000
2   1                        1          2022-11-15  101000
3   1                        1          2022-11-30  102000
4   1                        2          2022-11-02  85000
5   1                        2          2022-11-06  90000
6   1                        2          2022-11-26  100000
Right now I have this query:
SELECT (SELECT value
        FROM platform_information_artist
        WHERE artist_id = 1
          AND platform_information_id =
              (SELECT id from platform_information WHERE platform = 'spotify' AND information = 'monthly_listeners')
          AND DATE(date) >= DATE(NOW()) - INTERVAL 30 DAY
        ORDER BY date ASC
        LIMIT 1) as month_start,
       (SELECT value
        FROM platform_information_artist
        WHERE artist_id = 1
          AND platform_information_id =
              (SELECT id from platform_information WHERE platform = 'spotify' AND information = 'monthly_listeners')
          AND DATE(date) >= DATE(NOW()) - INTERVAL 30 DAY
        ORDER BY date DESC
        LIMIT 1) as month_end,
       (SELECT month_end - month_start) as difference
ORDER BY month_start;
Which returns the following:
month_start  month_end  difference
100000       102000     2000
The problem is that this query only returns the artist I specify.
And I need the information like this:
artist_id  name      platform_information_id  month_start_value  month_end_value  difference
2          Bizarrap  1                        85000              100000           15000
1          Shakira   1                        100000             102000           2000
The query should return the 5 artists that have grown the most in number of monthly listeners over the last 30 days, along with the starting value 30 days ago, and the current value.
Thanks for the help.
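One possible direction, offered as a sketch rather than a tested MySQL answer: rank each artist's rows by date in both directions with ROW_NUMBER(), then pick the first and last value per artist. It is run here through Python's sqlite3 module (assuming SQLite 3.25 or later for window functions); MySQL 8 also supports window functions, but the query has not been run there. The fixed cutoff date stands in for the question's `DATE(NOW()) - INTERVAL 30 DAY` so the sample data stays in range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT, spotify_id TEXT);
    CREATE TABLE platform_information (id INTEGER PRIMARY KEY, platform TEXT, information TEXT);
    CREATE TABLE platform_information_artist (
        id INTEGER PRIMARY KEY, platform_information_id INTEGER,
        artist_id INTEGER, date TEXT, value INTEGER);
    INSERT INTO artist VALUES
        (1, 'Shakira', '0EmeFodog0BfCgMzAIvKQp'), (2, 'Bizarrap', '716NhGYqD1jl2wI1Qkgq36');
    INSERT INTO platform_information VALUES
        (1, 'spotify', 'monthly_listeners'), (2, 'spotify', 'followers');
    INSERT INTO platform_information_artist VALUES
        (1, 1, 1, '2022-11-01', 100000), (2, 1, 1, '2022-11-15', 101000),
        (3, 1, 1, '2022-11-30', 102000), (4, 1, 2, '2022-11-02', 85000),
        (5, 1, 2, '2022-11-06', 90000),  (6, 1, 2, '2022-11-26', 100000);
""")

query = """
    WITH ranked AS (
        SELECT pia.artist_id, pia.platform_information_id, pia.value,
               ROW_NUMBER() OVER (PARTITION BY pia.artist_id ORDER BY pia.date ASC)  AS rn_first,
               ROW_NUMBER() OVER (PARTITION BY pia.artist_id ORDER BY pia.date DESC) AS rn_last
        FROM platform_information_artist pia
        JOIN platform_information pi ON pi.id = pia.platform_information_id
        WHERE pi.platform = 'spotify'
          AND pi.information = 'monthly_listeners'
          AND pia.date >= ?  -- stand-in for the 30-day cutoff
    )
    SELECT a.id AS artist_id, a.name, r.platform_information_id,
           MAX(CASE WHEN r.rn_first = 1 THEN r.value END) AS month_start_value,
           MAX(CASE WHEN r.rn_last = 1 THEN r.value END)  AS month_end_value,
           MAX(CASE WHEN r.rn_last = 1 THEN r.value END)
             - MAX(CASE WHEN r.rn_first = 1 THEN r.value END) AS difference
    FROM ranked r
    JOIN artist a ON a.id = r.artist_id
    GROUP BY a.id, a.name, r.platform_information_id
    ORDER BY difference DESC
    LIMIT 5
"""
growth = conn.execute(query, ("2022-11-01",)).fetchall()
for row in growth:
    print(row)
```

On the sample rows this lists Bizarrap (85000 to 100000) ahead of Shakira (100000 to 102000), matching the desired output in the question.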

Postgresql query first and last in every range

I have a table:
id  machineid  reset
1   1          false
2   1          false
3   1          false
4   1          true
5   1          false
15  1          true
17  1          false
20  2          false
21  2          false
25  2          false
30  2          false
I can't figure out how to find the first and last id of every range for each machine. A reset starts a new range for the rows that follow. The result should look like:
machineid  startid  endid
1          1        3
1          4        5
1          15       17
2          20       30
You can start by grouping your records into groups or ranges. Because the order of your records matters, you can make use of window functions. You have to decide how to uniquely label these groups. I suggest using the number of resets up to and including the current record. This results in the following statement:
SELECT *
    , SUM(case when reset then 1 else 0 end) over (partition by machineid order by id) as reset_group
FROM
    test;
After that finding the start and end ids is a simple GROUP BY statement:
SELECT
    machineid, MIN(id) as startid, MAX(id) as endid
FROM (
    SELECT machineid, id
        , SUM(case when reset then 1 else 0 end) over (partition by machineid order by id) as reset_group
    FROM
        test
) as grouped
GROUP BY
    machineid, reset_group
ORDER BY
    machineid, startid;
Please try it out: db<>fiddle
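To see how the running sum labels the ranges, here is a small sketch that loads the question's rows into an in-memory SQLite database through Python's sqlite3 module (assuming SQLite 3.25 or later for window functions; the boolean reset column is stored as 0/1 because SQLite has no boolean type):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, machineid INTEGER, reset INTEGER)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [(1, 1, 0), (2, 1, 0), (3, 1, 0), (4, 1, 1), (5, 1, 0), (15, 1, 1),
     (17, 1, 0), (20, 2, 0), (21, 2, 0), (25, 2, 0), (30, 2, 0)],
)

# Step 1: the running count of resets labels each row with its range.
labels = conn.execute("""
    SELECT id,
           SUM(CASE WHEN reset THEN 1 ELSE 0 END)
               OVER (PARTITION BY machineid ORDER BY id) AS reset_group
    FROM test
    WHERE machineid = 1
    ORDER BY id
""").fetchall()
print(labels)  # [(1, 0), (2, 0), (3, 0), (4, 1), (5, 1), (15, 2), (17, 2)]

# Step 2: MIN/MAX per (machineid, reset_group) gives the start and end ids.
ranges = conn.execute("""
    SELECT machineid, MIN(id) AS startid, MAX(id) AS endid
    FROM (
        SELECT machineid, id,
               SUM(CASE WHEN reset THEN 1 ELSE 0 END)
                   OVER (PARTITION BY machineid ORDER BY id) AS reset_group
        FROM test
    ) AS grouped
    GROUP BY machineid, reset_group
    ORDER BY machineid, startid
""").fetchall()
print(ranges)  # [(1, 1, 3), (1, 4, 5), (1, 15, 17), (2, 20, 30)]
```

A reset row gets the incremented label itself, which is why id 4 and id 15 open their ranges rather than closing the previous ones.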

Find last occurring value within record in PostgreSQL

I'm not new to SQL, but I am new to PostgreSQL and am really struggling to adapt my current knowledge in a different environment.
I am trying to create a variable that captures whether someone stays active, skips, or churns within a 0/1 time series variable. For example, in the data below, my dataset would include the variables id, time, and voted, and I would create the variable "skipped":
id time voted skipped
1 1 1 active
1 2 0 skipped
1 3 1 active
2 1 1 active
2 2 0 churned
2 3 0 churned
3 1 1 active
3 2 1 active
3 3 0 churned
The rule for coding "skipped" is pretty simple: if 1 is the last record, the person is "active" and any zeroes count as "skipped", but if 0 is the last record, the zeroes count as "churned".
The person with id = 1 has a skip because voted is non-zero at time 3 after being 0 at time 2. In the other two cases, 0 is the final value, so they are "churned". Can anyone help? I've been noodling on it all day, and am hitting a wall.
This isn't particularly elegant, but it should meet your needs:
with votes as (
    select
        id, time, voted,
        max(time) over (partition by id) as max_time
    from voter_data
)
select
    v1.id, v1.time, v1.voted,
    case
        when v1.voted = 1 then 'active'
        when v2.voted = 1 then 'skipped'
        else 'churned'
    end as skipped
from
    votes v1
    join votes v2 on
        v1.id = v2.id and
        v1.max_time = v2.time
In a nutshell, we first figure out which is the last record for each voter id, and then we do a self-join on the resulting table to pick up only that last record.
There is a chance this could produce multiple results if it's possible for the same ID to vote twice at the same time. If that's the case, you want row_number() instead of max().
Results on your data:
1 1 1 'active'
1 2 0 'skipped'
1 3 1 'active'
2 1 1 'active'
2 2 0 'churned'
2 3 0 'churned'
3 1 1 'active'
3 2 1 'active'
3 3 0 'churned'
Window functions can help for readability when working with self-referential joins.
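WITH
add_last_voted_status AS (
    SELECT
        *
        -- The explicit frame matters: with ORDER BY alone, the default frame ends
        -- at the current row, so LAST_VALUE would return the current row's vote.
        , LAST_VALUE(voted) OVER (
            PARTITION BY id
            ORDER BY time
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        ) AS last_voted_status
    FROM voter_data -- table name borrowed from the answer above; TABLE is a reserved word
)
SELECT
    id
    , time
    , voted
    , CASE
        -- A vote is always 'active', as in the question's expected output.
        WHEN voted = 1
            THEN 'active'
        WHEN last_voted_status = 1
            THEN 'skipped'
        WHEN last_voted_status = 0
            THEN 'churned'
        ELSE '?'
    END AS skipped
FROM add_last_voted_status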
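One detail worth checking with LAST_VALUE is the window frame. When a window has an ORDER BY but no explicit frame, the default frame runs from the start of the partition to the current row, so LAST_VALUE returns the current row's own value. A small sketch comparing both forms on the rows for id = 1, run through Python's sqlite3 module (assuming SQLite 3.25 or later; PostgreSQL applies the same default frame):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE voter_data (id INTEGER, time INTEGER, voted INTEGER)")
conn.executemany("INSERT INTO voter_data VALUES (?, ?, ?)",
                 [(1, 1, 1), (1, 2, 0), (1, 3, 1)])

frames = conn.execute("""
    SELECT time, voted,
           -- Default frame: ends at the current row.
           LAST_VALUE(voted) OVER (PARTITION BY id ORDER BY time) AS default_frame,
           -- Explicit frame: covers the whole partition.
           LAST_VALUE(voted) OVER (
               PARTITION BY id ORDER BY time
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS full_frame
    FROM voter_data
    ORDER BY time
""").fetchall()
print(frames)  # [(1, 1, 1, 1), (2, 0, 0, 1), (3, 1, 1, 1)]
```

At time 2 the default frame reports 0 (the row's own vote), while the full frame reports 1, the voter's actual last vote.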

SELECT record based upon dates

Assuming data such as the following:
ID EffDate Rate
1 12/12/2011 100
1 01/01/2012 110
1 02/01/2012 120
2 01/01/2012 40
2 02/01/2012 50
3 01/01/2012 25
3 03/01/2012 30
3 05/01/2012 35
How would I find the rate for ID 2 as of 1/15/2012?
Or, the rate for ID 1 for 1/15/2012?
In other words, how do I do a query that finds the correct rate when the date falls between the EffDate for two records? (Rate should be for the date prior to the selected date).
Thanks,
John
How about this:
SELECT Rate
FROM Table1
WHERE ID = 1 AND EffDate = (
    SELECT MAX(EffDate)
    FROM Table1
    WHERE ID = 1 AND EffDate <= '2012-01-15');
Here's an SQL Fiddle to play with. I assume here that the ID/EffDate pair is unique across the table (the opposite wouldn't make sense).
SELECT TOP 1 Rate FROM the_table
WHERE ID=whatever AND EffDate <='whatever'
ORDER BY EffDate DESC
if I read you right.
(edited to suit my idea of ms-sql which I have no idea about).
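Both answers express the same "latest row on or before the date" lookup. A minimal sketch of that lookup against the question's data, run through Python's sqlite3 module; the table name `rates`, the column names, and the helper `rate_as_of` are hypothetical, and the dates are assumed to be MM/DD/YYYY and are stored as ISO strings so that string comparison orders them correctly:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rates (id INTEGER, eff_date TEXT, rate INTEGER)")
conn.executemany("INSERT INTO rates VALUES (?, ?, ?)", [
    (1, "2011-12-12", 100), (1, "2012-01-01", 110), (1, "2012-02-01", 120),
    (2, "2012-01-01", 40), (2, "2012-02-01", 50),
    (3, "2012-01-01", 25), (3, "2012-03-01", 30), (3, "2012-05-01", 35),
])

def rate_as_of(conn, rate_id, as_of):
    """Return the rate in effect on as_of, or None if no row is that old."""
    row = conn.execute(
        "SELECT rate FROM rates WHERE id = ? AND eff_date <= ? "
        "ORDER BY eff_date DESC LIMIT 1",
        (rate_id, as_of),
    ).fetchone()
    return row[0] if row else None

print(rate_as_of(conn, 2, "2012-01-15"))  # 40
print(rate_as_of(conn, 1, "2012-01-15"))  # 110
print(rate_as_of(conn, 3, "2011-12-01"))  # None, no rate was in effect yet
```

`ORDER BY ... DESC LIMIT 1` plays the role of `TOP 1` in the SQL Server variant.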

How to check if the sum of some records equals the difference between two other records in t-sql?

I have a view that contains bank account activity.
ACCOUNT BALANCE_ROW AMOUNT SORT_ORDER
111 1 0.00 1
111 0 10.00 2
111 0 -2.50 3
111 1 7.50 4
222 1 100.00 5
222 0 25.00 6
222 1 125.00 7
ACCOUNT = account number
BALANCE_ROW = 1 for a starting or ending balance row, otherwise 0
AMOUNT = the amount
SORT_ORDER = simple order to return the records in the order of start balance, activity, and end balance
I need to figure out a way to see if the sum of the non-balance_row rows equals the difference between the ending balance and the starting balance. The result for each account (1 for yes, 0 for no) would simply be added to the result set.
Example:
Account 111 had a starting balance of 0.00. There were two account activity records of 10.00 and -2.5. That resulted in the ending balance of 7.50.
I've been playing around with temp tables, but I was not sure if there is a more efficient way of accomplishing this.
Thanks for any input you may have!
I would use ranking, then group rows by ACCOUNT calculating totals along the way:
;WITH ranked AS (
    SELECT
        *,
        rnk = ROW_NUMBER() OVER (PARTITION BY ACCOUNT ORDER BY SORT_ORDER)
    FROM data
),
grouped AS (
    SELECT
        ACCOUNT,
        -- The starting balance (rnk = 1) is negated, so the sum is end minus start.
        BALANCE_DIFF = SUM(CASE BALANCE_ROW WHEN 1 THEN AMOUNT END
                         * CASE rnk WHEN 1 THEN -1 ELSE 1 END),
        ACTIVITY_SUM = SUM(CASE BALANCE_ROW WHEN 0 THEN AMOUNT ELSE 0 END)
    FROM ranked -- must read from ranked: rnk is not a column of data
    GROUP BY
        ACCOUNT
)
SELECT *
FROM grouped
WHERE BALANCE_DIFF <> ACTIVITY_SUM
Ranking is only used here to make it easier to calculate the starting/ending balance difference. If starting and ending balance rows had, for instance, different BALANCE_ROW codes (like 1 for the starting balance, 2 for the ending one), it would be possible to avoid ranking.
Untested code, but should be really close for comparing the summed balance with the balance_row as you've defined in your question.
SELECT
    Account, /* Account Number */
    (select sum(B.amount) from yourview B
     where B.balance_row = 0 and
           B.account = A.account and
           /* BETWEEN needs the lower bound first: previous balance row, then this one */
           B.sort_order BETWEEN
               (select max(sort_order) /* previous sort order value on account */
                from yourview C where
                    C.balance_row = 1 and
                    C.account = A.account and
                    C.sort_order < A.sort_order)
               and A.sort_order
    ) AS Test_Balance, /* Test_Balance = sum of amounts since last balance row */
    Balance_Row /* Value of balance row */
FROM yourview A
WHERE A.Balance_Row = 1
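The question asks for a 1/0 flag per account rather than a list of mismatches. A sketch of one way to produce that flag, run through Python's sqlite3 module (assuming SQLite 3.25 or later for ROW_NUMBER). The table name `activity` is hypothetical, amounts are stored as integer cents to avoid floating-point comparison issues, and account 333 is a made-up account that does not balance:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE activity (
    account INTEGER, balance_row INTEGER, amount_cents INTEGER, sort_order INTEGER)""")
conn.executemany("INSERT INTO activity VALUES (?, ?, ?, ?)", [
    (111, 1, 0, 1), (111, 0, 1000, 2), (111, 0, -250, 3), (111, 1, 750, 4),
    (222, 1, 10000, 5), (222, 0, 2500, 6), (222, 1, 12500, 7),
    # Hypothetical account: balances differ by 20.00 but activity is only 10.00.
    (333, 1, 5000, 8), (333, 0, 1000, 9), (333, 1, 7000, 10),
])

flags = conn.execute("""
    WITH ranked AS (
        SELECT account, balance_row, amount_cents,
               ROW_NUMBER() OVER (PARTITION BY account ORDER BY sort_order) AS rnk
        FROM activity
    )
    SELECT account,
           CASE WHEN
               -- ending balance minus starting balance (first row per account)
               SUM(CASE WHEN balance_row = 1 AND rnk = 1 THEN -amount_cents
                        WHEN balance_row = 1 THEN amount_cents
                        ELSE 0 END)
               -- compared with the sum of the activity rows
               = SUM(CASE WHEN balance_row = 0 THEN amount_cents ELSE 0 END)
           THEN 1 ELSE 0 END AS is_balanced
    FROM ranked
    GROUP BY account
    ORDER BY account
""").fetchall()
print(flags)  # [(111, 1), (222, 1), (333, 0)]
```

This assumes exactly one starting and one ending balance row per account, with the starting row sorted first.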