PostgreSQL query: first and last in every range

I have a table:
id machineid reset
1 1 false
2 1 false
3 1 false
4 1 true
5 1 false
15 1 true
17 1 false
20 2 false
21 2 false
25 2 false
30 2 false
I can't figure out how to find the first and last id of every range for each machine. A row with reset = true starts a new range. The result should look like:
machineid startid endid
1 1 3
1 4 5
1 15 17
2 20 30

You can start by grouping your records into groups, or ranges. Because the order of your records matters, this is a job for window functions. You have to decide how to uniquely name these groups; I suggest using the number of resets up to and including each record. That results in this statement:
SELECT *
, SUM(case when reset then 1 else 0 end) over (partition by machineid order by id) as reset_group
FROM
test;
After that, finding the start and end ids is a simple GROUP BY statement:
SELECT
    machineid, MIN(id) as startid, MAX(id) as endid
FROM (
    SELECT machineid, id
        , SUM(case when reset then 1 else 0 end) over (partition by machineid order by id) as reset_group
    FROM
        test
) as grouped
GROUP BY
    machineid, reset_group
ORDER BY
    machineid, startid;
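To check the running-count idea without a PostgreSQL server, here is a minimal sketch that runs the same query in SQLite through Python's bundled `sqlite3` module. It assumes SQLite 3.25 or later (for window functions) and stores `reset` as 0/1 because SQLite has no BOOLEAN type; the table is the sample data from the question.

```python
import sqlite3

# In-memory copy of the question's sample table; reset is stored as 0/1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, machineid INTEGER, reset INTEGER)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [(1, 1, 0), (2, 1, 0), (3, 1, 0), (4, 1, 1), (5, 1, 0), (15, 1, 1),
     (17, 1, 0), (20, 2, 0), (21, 2, 0), (25, 2, 0), (30, 2, 0)],
)

# Step 1: label every row with the running number of resets per machine.
# Step 2: MIN/MAX id per (machineid, reset_group) gives each range.
ranges = conn.execute("""
    SELECT machineid, MIN(id) AS startid, MAX(id) AS endid
    FROM (
        SELECT machineid, id,
               SUM(CASE WHEN reset THEN 1 ELSE 0 END)
                   OVER (PARTITION BY machineid ORDER BY id) AS reset_group
        FROM test
    ) AS grouped
    GROUP BY machineid, reset_group
    ORDER BY machineid, startid
""").fetchall()

for machineid, startid, endid in ranges:
    print(machineid, startid, endid)
```

The four printed rows match the expected result in the question. A reset row gets the incremented count itself, which is why id 4 opens the second range instead of closing the first.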

Related

How can I evaluate data over time in Postgresql?

I need to find users who have posted three times or more, three months in a row. I wrote this query:
select count(id), owneruserid, extract(month from creationdate) as postmonth from posts
group by owneruserid, postmonth
having count(id) >=3
order by owneruserid, postmonth
And I get this:
count owneruserid postmonth
36 -1 1
23 -1 2
45 -1 3
41 -1 4
18 -1 5
24 -1 6
31 -1 7
78 -1 8
83 -1 9
17 -1 10
88 -1 11
127 -1 12
3 6 11
3 7 12
4 8 1
8 8 12
4 12 4
3 12 5
3 22 2
4 22 4
(truncated)
Which is great. How can I query for users who posted three times or more, three months or more in a row? Thanks.
This is called the Islands and Gaps problem; specifically, it's an islands problem over a date range. You should fix this question up and flag it to be sent to dba.stackexchange.com.
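To solve this:
1. Start from the months in which each user posted at least three times (the grouped query in the question).
2. Create a pseudo-column with a window function that is 1 if the row preceding it does not correspond to the preceding month.
3. Create groups out of that with a running COUNT().
4. Check that count(*) for each group is greater than or equal to three.
Query: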
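SELECT l.owneruserid, min(l.postmonth) AS firstmonth, count(*) AS months
FROM (
  SELECT t.owneruserid,
         t.postmonth,
         count(t.range_reset) OVER (PARTITION BY t.owneruserid ORDER BY t.postmonth) AS creationdaterange
  FROM (
    SELECT m.owneruserid,
           m.postmonth,
           -- 1 when this month does not directly follow the user's previous qualifying month
           CASE
             WHEN lag(m.postmonth) OVER (PARTITION BY m.owneruserid ORDER BY m.postmonth)
                  IS DISTINCT FROM m.postmonth - interval '1 month'
             THEN 1
           END AS range_reset
    FROM (
      -- months in which the user posted at least three times
      SELECT owneruserid, date_trunc('month', creationdate) AS postmonth
      FROM posts
      GROUP BY owneruserid, date_trunc('month', creationdate)
      HAVING count(*) >= 3
    ) AS m
  ) AS t
) AS l
GROUP BY l.owneruserid, l.creationdaterange
HAVING count(*) >= 3;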
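The steps above can be sketched end to end in SQLite via Python's `sqlite3` module (SQLite 3.25+ assumed). SQLite has no `date_trunc` or `interval`, so this sketch swaps in an integer month number (`year * 12 + month`), which keeps December to January consecutive. The post data is hypothetical, made up only to exercise one streak, one gap, and one year boundary.

```python
import sqlite3

# Hypothetical sample: (owneruserid, year-month, posts in that month).
SPEC = [
    (7, "2021-01", 3), (7, "2021-02", 3), (7, "2021-03", 4),  # three months in a row
    (8, "2021-01", 3), (8, "2021-03", 3), (8, "2021-04", 5),  # gap after January
    (9, "2020-11", 3), (9, "2020-12", 3), (9, "2021-01", 3),  # streak crosses a year
    (9, "2021-02", 2),                                        # below the threshold
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, owneruserid INTEGER, creationdate TEXT)")
for user, year_month, n in SPEC:
    for day in range(1, n + 1):
        conn.execute("INSERT INTO posts (owneruserid, creationdate) VALUES (?, ?)",
                     (user, f"{year_month}-{day:02d}"))

QUERY = """
WITH post_months AS (      -- stand-in for date_trunc: year * 12 + month
    SELECT owneruserid,
           CAST(strftime('%Y', creationdate) AS INTEGER) * 12
             + CAST(strftime('%m', creationdate) AS INTEGER) AS month_no
    FROM posts
),
busy_months AS (           -- months with at least three posts
    SELECT owneruserid, month_no
    FROM post_months
    GROUP BY owneruserid, month_no
    HAVING COUNT(*) >= 3
),
flagged AS (               -- 1 when the month does not follow the previous one
    SELECT owneruserid, month_no,
           CASE WHEN LAG(month_no) OVER (PARTITION BY owneruserid ORDER BY month_no)
                     = month_no - 1
                THEN NULL ELSE 1 END AS range_reset
    FROM busy_months
),
islands AS (               -- running COUNT of resets numbers each island
    SELECT owneruserid,
           COUNT(range_reset) OVER (PARTITION BY owneruserid ORDER BY month_no)
             AS creationdaterange
    FROM flagged
)
SELECT owneruserid, COUNT(*) AS months_in_row
FROM islands
GROUP BY owneruserid, creationdaterange
HAVING COUNT(*) >= 3
ORDER BY owneruserid
"""

streaks = conn.execute(QUERY).fetchall()
print(streaks)
```

Users 7 and 9 qualify; user 8's qualifying months split into two islands (January alone, then March and April), so neither island reaches three months.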

Find last occurring value within record in PostgreSQL

I'm not new to SQL, but I am new to PostgreSQL and am really struggling to adapt my current knowledge in a different environment.
I am trying to create a variable that captures whether someone stays active, skips, or churns within a 0/1 time series variable. For example, in the data below, my dataset would include the variables id, time, and voted, and I would create the variable "skipped":
id time voted skipped
1 1 1 active
1 2 0 skipped
1 3 1 active
2 1 1 active
2 2 0 churned
2 3 0 churned
3 1 1 active
3 2 1 active
3 3 0 churned
The rule for coding "skipped" is pretty simple: If 1 is the last record, the person is "active" and any zeroes count as "skipped", but if 0 is the last record, the person is "churned".
The record with id = 1 is a skip because voted is non-zero at time 3 after being 0 at time 2. In the other two cases, 0 is the final value, so they are "churned". Can anyone help? I've been noodling on it all day and am hitting a wall.
This isn't particularly elegant, but it should meet your needs:
with votes as (
    select
        id, time, voted,
        max(time) over (partition by id) as max_time
    from voter_data
)
select
    v1.id, v1.time, v1.voted,
    case
        when v1.voted = 1 then 'active'
        when v2.voted = 1 then 'skipped'
        else 'churned'
    end as skipped
from
    votes v1
    join votes v2 on
        v1.id = v2.id and
        v1.max_time = v2.time
In a nutshell, we first figure out which is the last record for each voter id, and then we self-join the resulting table to isolate only that last record.
This could produce duplicate results if the same id can vote twice at the same time. If that's the case, you want row_number() instead of max().
Results on your data:
1 1 1 'active'
1 2 0 'skipped'
1 3 1 'active'
2 1 1 'active'
2 2 0 'churned'
2 3 0 'churned'
3 1 1 'active'
3 2 1 'active'
3 3 0 'churned'
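The row_number() variant mentioned above could look like the following minimal sketch, run in SQLite 3.25+ through Python's `sqlite3` module on the question's sample rows. Numbering rows by `time DESC` and keeping `rn = 1` picks exactly one latest row per id; if two rows tie on time, which one wins is arbitrary unless you add a tiebreaker column to the ORDER BY.

```python
import sqlite3

VOTES = [(1, 1, 1), (1, 2, 0), (1, 3, 1),
         (2, 1, 1), (2, 2, 0), (2, 3, 0),
         (3, 1, 1), (3, 2, 1), (3, 3, 0)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE voter_data (id INTEGER, time INTEGER, voted INTEGER)")
conn.executemany("INSERT INTO voter_data VALUES (?, ?, ?)", VOTES)

rows = conn.execute("""
    WITH ranked AS (
        -- rn = 1 marks exactly one latest row per id, even if times tie
        SELECT id, time, voted,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY time DESC) AS rn
        FROM voter_data
    )
    SELECT v1.id, v1.time, v1.voted,
           CASE
               WHEN v1.voted = 1 THEN 'active'
               WHEN v2.voted = 1 THEN 'skipped'
               ELSE 'churned'
           END AS skipped
    FROM voter_data AS v1
    JOIN ranked AS v2 ON v1.id = v2.id AND v2.rn = 1
    ORDER BY v1.id, v1.time
""").fetchall()

for row in rows:
    print(*row)
```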
Window functions can improve readability compared with a self-referential join.
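WITH
    add_last_voted_status AS (
        SELECT
            *
            -- the explicit frame lets LAST_VALUE see the whole partition;
            -- with ORDER BY, the default frame stops at the current row
            , LAST_VALUE(voted) OVER (
                PARTITION BY id
                ORDER BY time
                ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
            ) AS last_voted_status
        FROM voter_data
    )
SELECT
    id
    , time
    , voted
    , CASE
        WHEN last_voted_status = 0
            THEN 'churned'
        WHEN last_voted_status = 1 AND voted = 1
            THEN 'active'
        WHEN last_voted_status = 1 AND voted = 0
            THEN 'skipped'
        ELSE '?'
    END AS skipped
FROM add_last_voted_status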

How to combine multiple queries into one single query

I have three queries, shown below, and I need to combine them into one. Does anybody know how to do that?
select COUNT(*) from dbo.VWAnswer where questionId =2 and answer =1
select COUNT(*) from dbo.VWAnswer where questionId =3 and answer =4
select COUNT(*) from dbo.VWAnswer where questionId =5 and answer =2
I want to find the total count of people whose gender = 1, education = 4 and marital status = 2.
Here are the table's columns, with example rows:
questionId questionText anwser AnserSheetID
1 Gender 1 1
2 Qualification 4 1
3 Marital Status 2 1
1 Gender 2 2
2 Qualification 1 2
3 Marital Status 2 2
1 Gender 1 3
2 Qualification 3 3
3 Marital Status 1 3
Basically, these are questions answered by different people whose answers are stored in this table.
So, for the table entries above, I should get a total count of 1 based on the three conditions, i.e. gender = 1, education = 4 and marital status = 2.
Can someone tell me what I need to do to get this to work?
If you want to combine your three count queries into one, you can use conditional aggregation:
select
sum(case when questionId =2 and anwser=1 then 1 else 0 end) as FCount,
sum(case when questionId =3 and anwser=4 then 1 else 0 end) as SCount,
sum(case when questionId =5 and anwser=2 then 1 else 0 end) as TCount
from dbo.VWAnswer
Update 1:
select
Sum(case when questionText='Gender' and anwser='1' then 1 else 0 end) as GenderCount,
Sum(case when questionText='Qualification' and anwser='4' then 1 else 0 end) as EducationCount,
Sum(case when questionText='Marital Status' and anwser='2' then 1 else 0 end) as MaritalCount
from VWAnswer
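These counts are computed row by row, so each one checks its condition independently; they cannot tell you how many answer sheets meet all three conditions at once, because each condition lives on a different row.
To count the answer sheets that meet all your conditions, you can join the view to itself once per condition on AnserSheetID and count the rows that survive.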
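A quick check of that point on the question's nine sample rows, as a minimal sketch in SQLite via Python's `sqlite3` (the `dbo.` schema prefix is dropped because SQLite has none). The per-condition sums come out as 2, 1 and 2, not the single matching sheet the question expects.

```python
import sqlite3

ROWS = [(1, "Gender", 1, 1), (2, "Qualification", 4, 1), (3, "Marital Status", 2, 1),
        (1, "Gender", 2, 2), (2, "Qualification", 1, 2), (3, "Marital Status", 2, 2),
        (1, "Gender", 1, 3), (2, "Qualification", 3, 3), (3, "Marital Status", 1, 3)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE VWAnswer "
             "(questionId INTEGER, questionText TEXT, anwser INTEGER, AnserSheetID INTEGER)")
conn.executemany("INSERT INTO VWAnswer VALUES (?, ?, ?, ?)", ROWS)

# Each SUM counts rows matching one condition, independently of the others.
counts = conn.execute("""
    SELECT
        SUM(CASE WHEN questionId = 1 AND anwser = 1 THEN 1 ELSE 0 END) AS GenderCount,
        SUM(CASE WHEN questionId = 2 AND anwser = 4 THEN 1 ELSE 0 END) AS EducationCount,
        SUM(CASE WHEN questionId = 3 AND anwser = 2 THEN 1 ELSE 0 END) AS MaritalCount
    FROM VWAnswer
""").fetchone()
print(counts)
```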
Select COUNT(*) as cnt from
(
    Select a.AnserSheetID
    from VWAnswer a
    Join VWAnswer b on a.AnserSheetID=b.AnserSheetID and b.questionId = 2 and b.anwser=4
    Join VWAnswer c on a.AnserSheetID=c.AnserSheetID and c.questionId = 3 and c.anwser=2
    where a.questionId=1 and a.anwser=1
) hlp
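The self-join can be verified on the sample rows with this minimal SQLite sketch (Python `sqlite3`). For comparison it also runs a GROUP BY ... HAVING formulation; that alternative is not part of the answer above, it is a common single-pass way to require several answers per sheet.

```python
import sqlite3

ROWS = [(1, "Gender", 1, 1), (2, "Qualification", 4, 1), (3, "Marital Status", 2, 1),
        (1, "Gender", 2, 2), (2, "Qualification", 1, 2), (3, "Marital Status", 2, 2),
        (1, "Gender", 1, 3), (2, "Qualification", 3, 3), (3, "Marital Status", 1, 3)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE VWAnswer "
             "(questionId INTEGER, questionText TEXT, anwser INTEGER, AnserSheetID INTEGER)")
conn.executemany("INSERT INTO VWAnswer VALUES (?, ?, ?, ?)", ROWS)

# The answer's approach: one join per extra condition, matched on AnserSheetID.
self_join = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT a.AnserSheetID
        FROM VWAnswer a
        JOIN VWAnswer b ON a.AnserSheetID = b.AnserSheetID AND b.questionId = 2 AND b.anwser = 4
        JOIN VWAnswer c ON a.AnserSheetID = c.AnserSheetID AND c.questionId = 3 AND c.anwser = 2
        WHERE a.questionId = 1 AND a.anwser = 1
    ) AS hlp
""").fetchone()[0]

# Alternative: one group per answer sheet, keep sheets that satisfy every condition.
grouped = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT AnserSheetID
        FROM VWAnswer
        GROUP BY AnserSheetID
        HAVING SUM(CASE WHEN questionId = 1 AND anwser = 1 THEN 1 ELSE 0 END) > 0
           AND SUM(CASE WHEN questionId = 2 AND anwser = 4 THEN 1 ELSE 0 END) > 0
           AND SUM(CASE WHEN questionId = 3 AND anwser = 2 THEN 1 ELSE 0 END) > 0
    ) AS sheets
""").fetchone()[0]

print(self_join, grouped)
```

Both return 1: only answer sheet 1 has gender 1, qualification 4 and marital status 2.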

Count valid values per user

I have a table with a list of values. -1 is a blank value:
ID FieldType1A FieldType1B FieldType2A FieldType2B Person
1 15 14 10 -1 1
2 16 -1 12 10 1
3 17 -1 5 6 1
4 6 -1 7 -1 2
...
So the result should be:
Person FieldType1 FieldType2
1 4 5
2 1 1
There is a users table with a list of user IDs. Is there a way to iterate over that list to generate the person list in the result set (0 for the field types is perfectly valid, as these are merely counts)? I think the answer to T-SQL Column Values Count is a step in the direction I'm attempting to go, but I'm unsure how to combine columns that are the same (the A/B columns allow for a list of answers). Also, I want to count all valid values together, not the number of each valid response.
You can use a CASE expression to map every value other than -1 to 1 and every -1 to 0, and then sum them up.
SELECT Person,
SUM(CASE WHEN FieldType1A <> -1 THEN 1 ELSE 0 END) +
SUM(CASE WHEN FieldType1B <> -1 THEN 1 ELSE 0 END) AS FieldType1,
SUM(CASE WHEN FieldType2A <> -1 THEN 1 ELSE 0 END) +
SUM(CASE WHEN FieldType2B <> -1 THEN 1 ELSE 0 END) AS FieldType2
FROM YourTable
GROUP BY Person
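As a sanity check, this minimal sketch runs the CASE query in SQLite through Python's `sqlite3`, using only the four rows shown in the question (the rows elided by "..." are not reconstructed).

```python
import sqlite3

# The four visible rows: (ID, FieldType1A, FieldType1B, FieldType2A, FieldType2B, Person).
ROWS = [(1, 15, 14, 10, -1, 1), (2, 16, -1, 12, 10, 1),
        (3, 17, -1, 5, 6, 1), (4, 6, -1, 7, -1, 2)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (ID INTEGER, FieldType1A INTEGER, FieldType1B INTEGER, "
             "FieldType2A INTEGER, FieldType2B INTEGER, Person INTEGER)")
conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# Each CASE turns a valid value into 1 and -1 into 0; the A and B sums are added.
result = conn.execute("""
    SELECT Person,
           SUM(CASE WHEN FieldType1A <> -1 THEN 1 ELSE 0 END) +
           SUM(CASE WHEN FieldType1B <> -1 THEN 1 ELSE 0 END) AS FieldType1,
           SUM(CASE WHEN FieldType2A <> -1 THEN 1 ELSE 0 END) +
           SUM(CASE WHEN FieldType2B <> -1 THEN 1 ELSE 0 END) AS FieldType2
    FROM YourTable
    GROUP BY Person
    ORDER BY Person
""").fetchall()
print(result)
```

The output matches the expected table: person 1 has 4 and 5 valid values, person 2 has 1 and 1.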
Alternatively, NULLIF turns -1 into NULL, and COUNT ignores NULLs:
SELECT Person,
count(nullif(FieldType1A, -1)) + count(nullif(FieldType1B, -1)) as FieldType1,
count(nullif(FieldType2A, -1)) + count(nullif(FieldType2B, -1)) as FieldType2
FROM yourtable
GROUP BY person
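The question also asks how to list every user, including users with no rows. A minimal sketch, assuming a hypothetical `users` table with an `id` column: LEFT JOIN from the users table, and the NULLIF/COUNT form naturally yields 0 for users without matches, since COUNT of an all-NULL group is 0. Run here in SQLite via Python's `sqlite3`; user 3 is made up to show the zero case.

```python
import sqlite3

ROWS = [(1, 15, 14, 10, -1, 1), (2, 16, -1, 12, 10, 1),
        (3, 17, -1, 5, 6, 1), (4, 6, -1, 7, -1, 2)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (ID INTEGER, FieldType1A INTEGER, FieldType1B INTEGER, "
             "FieldType2A INTEGER, FieldType2B INTEGER, Person INTEGER)")
conn.executemany("INSERT INTO yourtable VALUES (?, ?, ?, ?, ?, ?)", ROWS)
# Hypothetical users table; user 3 has no rows in yourtable.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO users VALUES (?)", [(1,), (2,), (3,)])

result = conn.execute("""
    SELECT u.id AS Person,
           COUNT(NULLIF(t.FieldType1A, -1)) + COUNT(NULLIF(t.FieldType1B, -1)) AS FieldType1,
           COUNT(NULLIF(t.FieldType2A, -1)) + COUNT(NULLIF(t.FieldType2B, -1)) AS FieldType2
    FROM users AS u
    LEFT JOIN yourtable AS t ON t.Person = u.id
    GROUP BY u.id
    ORDER BY u.id
""").fetchall()
print(result)
```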

Conditional summarizing columns

I have the following situation
ID Value
1 50
1 60
2 70
2 80
1 0
2 50
I need to run a query that would return summed value, grouped by ID. The catch is if the value is 0, then the entire sum should be 0.
Query results would be
ID Value
1 0
2 200
I tried
select ID, case
when Value> 0 then sum(Value) * 1
when Value= 0 then sum(value) * 0
end
from table
but that did not work.
select ID,
sum(value)*sign(min(abs(value))) as [sum(value)]
from YourTable
group by ID
Or, with a CASE expression if you prefer:
select ID,
case sign(min(abs(value)))
when 0 then 0
else sum(value)
end as [sum(value)]
from YourTable
group by ID
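Both answers rely on MIN(ABS(value)) being 0 exactly when a group contains a zero. This minimal sketch checks that on the question's data in SQLite via Python's `sqlite3`; it compares MIN(ABS(Value)) with 0 directly instead of calling SIGN, so it does not depend on SQLite's newer math functions. Like the original, it assumes Value is never NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (ID INTEGER, Value INTEGER)")
conn.executemany("INSERT INTO YourTable VALUES (?, ?)",
                 [(1, 50), (1, 60), (2, 70), (2, 80), (1, 0), (2, 50)])

# A group's smallest absolute value is 0 exactly when it contains a 0.
totals = conn.execute("""
    SELECT ID,
           CASE WHEN MIN(ABS(Value)) = 0 THEN 0 ELSE SUM(Value) END AS total
    FROM YourTable
    GROUP BY ID
    ORDER BY ID
""").fetchall()
print(totals)
```

ID 1 contains a 0 and collapses to 0, while ID 2 sums to 200, matching the expected result.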