Postgres sequence that resets once the id is different - postgresql

I am trying to make a Postgres sequence that will reset once the id of the item it is linked to changes, e.g:
ID SEQUENCE_VALUE
1 1
2 1
1 2
1 3
2 2
3 1
I don't know PSQL or SQL in general very well, and I can't find a similar question. Any help is greatly appreciated!

Just use a normal sequence that does not reset and calculate the desired value in the query:
SELECT id,
       row_number() OVER (PARTITION BY id
                          ORDER BY seq_col)
          AS sequence_value
FROM mytable;
Here, seq_col is a column that is auto-generated from a sequence (an identity column).
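To try the idea without a PostgreSQL server, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module. It assumes SQLite 3.25 or newer (for window functions); the AUTOINCREMENT column stands in for the identity column, and the inserted ids are taken from the example in the question.

```python
import sqlite3

# SQLite is only a stand-in for PostgreSQL here; the window-function
# syntax used in the query is the same in both.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable ("
    " seq_col INTEGER PRIMARY KEY AUTOINCREMENT,"  # plays the role of the identity column
    " id INTEGER NOT NULL)"
)
# Insert the ids in the order shown in the question.
conn.executemany("INSERT INTO mytable (id) VALUES (?)",
                 [(1,), (2,), (1,), (1,), (2,), (3,)])

rows = conn.execute("""
    SELECT id,
           row_number() OVER (PARTITION BY id ORDER BY seq_col) AS sequence_value
    FROM mytable
    ORDER BY seq_col
""").fetchall()

for row in rows:
    print(row)
```

Because the per-id counter is computed at query time, nothing has to be reset when a new id appears, and deleting a row simply renumbers the remaining rows of that id.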


User Sessions | Months Since Last Active Using SQL

UserID  CalMonth  ActiveFlag  Months_since_last_active
A       1/1/2021  1           0
A       2/1/2021              1
A       3/1/2021              2
A       4/1/2021  1           0
B       1/1/2021  1           0
B       2/1/2021              1
B       3/1/2021  1           0
Problem --> The first 3 columns are given. Generate the last one, 'Months_since_last_active', by adding 1 until the user is active again.
My solution is below:
With active_sessions as (
Select
User_Id
, CalMonth
, active flag as current_flag
, LAG (ActiveFlag,1) over (partition by User_Id order by CalMonth) as previous_flag
)
Select User_Id, CalMonth, current_flag, sum(case when current_flag =1 then 0
when current_flag IS NULL then Months_since_last_active + 1
END
) as Months_since_last_active
from active_sessions
order by 1,2
I was asked the above question in an interview and told that my proposed solution would not work because:
When it comes to 3/1/2021 and beyond, the previous values of 'Months_since_last_active' are not in the table yet -- they are only in the code
If I wanted to use LAG function, then it'd take innumerable LAG functions to achieve what I was trying to achieve
I would appreciate it if someone could comment on my solution.
Your solution has three major problems; two of them may be copy/paste errors. The active_sessions CTE is missing its FROM clause, so it has no data source. The main query then uses the aggregate function SUM but has no GROUP BY, which the other selected columns require. These are easily corrected. The third issue concerns the LAG function and your use of it.
First, in the CTE you alias the LAG result as previous_flag, but the main query references Months_since_last_active, which does not exist yet. I think this is the source of the interviewer's first point.
The interviewer's second point also stems from the LAG function. As written, it always looks back exactly 1 row from the current row, yet it needs to look back 2 rows for (userid, calmonth) = ('A', 2021-03-01), and further for any longer run of inactive months. Basically, you need to look back to the last row with activeflag = 1. This leads directly to the "it'd take innumerable LAG functions" objection, as you do not know how far back you need to look. Suppose you had 30-40 or more inactive rows between active rows: you would need a LAG(activeflag, n) ... for each possibility.
A solution. I dislike the problem statement: it should not contain "by adding 1 until the user is active again" (is that wording yours or theirs?). Either way, this is an XY problem. If theirs, they should be telling you what to solve, i.e. find the number of months since last active. If yours, you have created the problem for yourself. The problem statement should not say anything about how to solve it. I will ignore that portion of the problem (in a real interview I would ignore it, and have, but be prepared to explain why).
What you have is a version of a Gaps and Islands problem (google it; you will find more than enough to think about). In this version, let's consider each row with activeflag = 1 as an island, and anything else as a gap. Now what you are looking for is the length of the gaps between islands. In the following, the island_num CTE does 2 things: it assigns a sequence number to each row within a userid, ordered by calmonth, and it generates a boolean marking each island. The gap_points CTE then joins those results with themselves, selecting the row number of the latest island whose calmonth is earlier than the current row's calmonth. In the main part, Months_since_last_active is assigned 0 if the current row is an island, and the difference between the two row numbers if it is a gap.
with island_num (userid, cal_month, active_flag, is_island, row_num) as
   ( select am.*
          , case when am.activeflag = 1 then true else false end is_island
          , row_number() over (partition by am.userid order by am.calmonth) rn
       from active_month am
   ) -- select * from island_num
   , gap_points(userid, cal_month, active_flag, is_island, row_num, island_row) as
   ( select *
       from island_num i1
       join lateral
            (select max(row_num)
               from island_num i2
              where i1.userid = i2.userid
                and i2.cal_month < i1.cal_month
                and i2.is_island
            ) s0
         on true
   ) --select * from gap_points;
select userid "User Id"
     , cal_month "Cal Month"
     , active_flag "Active Flag"
     , case when is_island then 0
            else row_num - island_row
       end "Months_since_last_active"
  from gap_points;
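As a cross-check, the same result can be sketched with a running window maximum instead of the lateral self-join. This is a different technique from the query above, not a rewrite of it: for each row it takes the highest row number seen so far among active rows and subtracts it from the current row number. The sketch runs in Python's sqlite3 (SQLite 3.25+ assumed) as a stand-in for PostgreSQL; the table layout active_month(userid, calmonth, activeflag) and the ISO date strings are assumptions, and like the original it counts rows, so it presumes one row per user per month.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed layout: activeflag is 1 for active months and NULL otherwise.
conn.execute("CREATE TABLE active_month (userid TEXT, calmonth TEXT, activeflag INTEGER)")
conn.executemany(
    "INSERT INTO active_month VALUES (?, ?, ?)",
    [("A", "2021-01-01", 1), ("A", "2021-02-01", None),
     ("A", "2021-03-01", None), ("A", "2021-04-01", 1),
     ("B", "2021-01-01", 1), ("B", "2021-02-01", None),
     ("B", "2021-03-01", 1)],
)

rows = conn.execute("""
    WITH numbered AS (
        SELECT userid, calmonth, activeflag,
               row_number() OVER (PARTITION BY userid ORDER BY calmonth) AS rn
        FROM active_month
    )
    SELECT userid, calmonth, activeflag,
           -- running max of the row numbers of active rows up to the current row
           rn - max(CASE WHEN activeflag = 1 THEN rn END)
                OVER (PARTITION BY userid ORDER BY calmonth) AS months_since_last_active
    FROM numbered
    ORDER BY userid, calmonth
""").fetchall()

for row in rows:
    print(row)
```

A user whose first months are inactive gets NULL for those rows, since there is no earlier active month to measure from.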

Put all the elements of a column into the same array (postgresql)

My question is the following:
After a first query, I have a table with a single column of bigints, for example:
id
----
1
2
3
4
I would like to convert this column into a postgresql array, which would give - according to the example - {1,2,3,4}.
Any ideas about how to do that?
Thank you for all your answers and have a nice day,
best regards
Use aggregation:
select array_agg(id)
from the_table;
If you need a specific sort order:
select array_agg(id order by id)
from the_table;

Postgres Pivot based on variable column to create a new id

I have the following table
type attribute order
1 11 1
1 12 2
2 11 1
2 12 2
3 15 1
3 16 2
4 15 1
4 16 2
I need to understand which types have identical attributes and then assign them a new id. The order column can be used as well if it's helpful, because each attribute can only have one order, but you don't need to use it.
Ideally the result set would be the following where you have a new id for each type that is based on the attributes in the first table.
type new_id
1 1
2 1
3 2
4 2
I was planning to pivot the table on the order column and concatenate the attribute ids to create a new id, but I cannot use crosstab, and the number of attributes a type has can vary, which I need to account for.
Any suggestions on what to do here?
This works. There's possibly a better way to do it, but it's what came to mind:
SELECT UNNEST(types) AS type, new_id
FROM (
    SELECT ARRAY_AGG(type) AS types, ROW_NUMBER() OVER() AS new_id
    FROM (
        SELECT type, ARRAY_AGG(attribute ORDER BY attribute) AS attr
        FROM t
        GROUP BY type
    ) x
    GROUP BY attr
) y
Output:
1;1
2;1
3;2
4;2
So first it gets the sorted list of attributes for each type. Then it gets the list of types for each distinct attribute list (this is what ensures that grouped types share the same attributes) and assigns a row number to each group of types. Finally, it unnests the list of types to put each type on its own row; the group's row number is the new id.
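The same grouping idea can be tried in Python's sqlite3 as a rough sketch. SQLite has no ARRAY_AGG or UNNEST, so this variant swaps in group_concat as a window function to build an ordered attribute signature per type, then uses dense_rank over that signature as the new id. It assumes SQLite 3.25+ and a table t(type, attribute); the order column is left out since it is not needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (type INTEGER, attribute INTEGER)")
# Type 2 is inserted in reverse attribute order on purpose, to show that
# the signature does not depend on insertion order.
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, 11), (1, 12), (2, 12), (2, 11),
     (3, 15), (3, 16), (4, 15), (4, 16)],
)

rows = conn.execute("""
    WITH sig AS (
        -- One ordered, comma-separated attribute list per type.
        SELECT DISTINCT type,
               group_concat(attribute, ',') OVER (
                   PARTITION BY type ORDER BY attribute
                   ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
               ) AS attrs
        FROM t
    )
    -- Types with the same attribute list share a rank, which serves as new_id.
    SELECT type, dense_rank() OVER (ORDER BY attrs) AS new_id
    FROM sig
    ORDER BY type
""").fetchall()

for row in rows:
    print(row)
```

Unlike ROW_NUMBER() OVER() with no ordering, dense_rank ordered by the signature gives the same ids on every run, at the cost of tying the numbering to the text sort order of the attribute lists.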

CASE WHEN with COLLECT_SET

I have a toy table:
hive> SELECT * FROM ds.forgerock;
OK
forgerock.id forgerock.productname forgerock.description
1 OpenIDM Platform for building enterprise provisioning solutions
2 OpenAM Full-featured access management
3 OpenDJ Robust LDAP server for Java
4 OpenDJ desc2
4 OpenDJ desc2
Time taken: 0.083 seconds, Fetched: 5 row(s)
I am trying to get a table like:
id flag
1 0
2 0
3 1
4 1
I am using the toy table to iterate and develop working code.
SELECT id, CASE WHEN "OpenDJ" IN COLLECT_SET(productname) THEN 1 ELSE 0 END AS flag,
GROUP BY id FROM ds.forgerock;
Note that in the toy data set, every id has only one distinct value, so COLLECT_SET doesn't seem necessary. However, the actual data set has more than one distinct value per id, and there what I am trying to do makes more sense.
Use max() for flag aggregation by id:
SELECT id, max(CASE WHEN productname='OpenDJ' THEN 1 ELSE 0 END) AS flag
FROM ds.forgerock
GROUP BY id;
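A quick way to check the max(CASE ...) pattern locally is the sketch below, which runs the same query in Python's sqlite3 instead of Hive. The first five rows are the toy table from the question (descriptions omitted); id 5 is a hypothetical addition with two different product names, to mimic the "more than one distinct value per id" case in the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE forgerock (id INTEGER, productname TEXT)")
conn.executemany(
    "INSERT INTO forgerock VALUES (?, ?)",
    [(1, "OpenIDM"), (2, "OpenAM"), (3, "OpenDJ"),
     (4, "OpenDJ"), (4, "OpenDJ"),
     # Hypothetical extra id, not in the original toy table.
     (5, "OpenAM"), (5, "OpenDJ")],
)

# Each row contributes 1 if it matches and 0 otherwise; max() per id is
# therefore 1 as soon as any row of that id matches.
flags = conn.execute("""
    SELECT id,
           max(CASE WHEN productname = 'OpenDJ' THEN 1 ELSE 0 END) AS flag
    FROM forgerock
    GROUP BY id
    ORDER BY id
""").fetchall()

for row in flags:
    print(row)
```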

SQL: How to prevent double summing

I'm not exactly sure what the term is for this but, when you have a many-to-many relationship when joining 2 tables and you want to sum up one of the variables, I believe that you can sum the same values over and over again.
What I want to accomplish is to prevent this from happening. How do I make sure that my sum function is returning the correct number?
I'm using PostgreSQL
Example:
Table 1                  Table 2
SampleID  DummyName      SampleID  DummyItem
1         John           1         5
1         John           1         4
2         Doe            1         5
3         Jake           2         3
3         Jake           2         3
                         3         2
If I join these two tables ON SampleID, and I want to sum the DummyItem for each DummyName, how can I do this without double summing?
The solution is to first aggregate and then do the join:
select t1.sampleid, t1.dummyname, t.total_items
from table_1 t1
  join (
    select t2.sampleid, sum(t2.dummyitem) as total_items
    from table_2 t2
    group by t2.sampleid
  ) t ON t.sampleid = t1.sampleid;
The real question, however, is: why are there duplicates in table_1?
I would take a step back and try to assess the database design. Specifically, what rules allow such duplicate data?
To address your specific issue given your data, here's one option: create a temp table that contains unique rows from Table 1, then join the temp table with Table 2 to get the sums I think you are expecting.
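Both suggestions can be combined in one query. The sketch below is an illustration under assumptions rather than a definitive fix: it uses Python's sqlite3 as a stand-in for PostgreSQL, the table and column names table_1(sampleid, dummyname) and table_2(sampleid, dummyitem), and a DISTINCT subquery in place of a temp table. A naive join is included for contrast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_1 (sampleid INTEGER, dummyname TEXT)")
conn.execute("CREATE TABLE table_2 (sampleid INTEGER, dummyitem INTEGER)")
conn.executemany("INSERT INTO table_1 VALUES (?, ?)",
                 [(1, "John"), (1, "John"), (2, "Doe"), (3, "Jake"), (3, "Jake")])
conn.executemany("INSERT INTO table_2 VALUES (?, ?)",
                 [(1, 5), (1, 4), (1, 5), (2, 3), (2, 3), (3, 2)])

# Naive join: every duplicate row in table_1 repeats the table_2 rows,
# so the sums for John and Jake are doubled.
naive = conn.execute("""
    SELECT t1.sampleid, t1.dummyname, sum(t2.dummyitem) AS total_items
    FROM table_1 t1
    JOIN table_2 t2 ON t2.sampleid = t1.sampleid
    GROUP BY t1.sampleid, t1.dummyname
    ORDER BY t1.sampleid
""").fetchall()

# Deduplicate table_1 and aggregate table_2 first, then join one row to one row.
fixed = conn.execute("""
    SELECT t1.sampleid, t1.dummyname, t.total_items
    FROM (SELECT DISTINCT sampleid, dummyname FROM table_1) AS t1
    JOIN (SELECT sampleid, sum(dummyitem) AS total_items
          FROM table_2
          GROUP BY sampleid) AS t
      ON t.sampleid = t1.sampleid
    ORDER BY t1.sampleid
""").fetchall()

print("naive:", naive)
print("fixed:", fixed)
```

The DISTINCT subquery hides the duplicates in table_1 but does not explain them, so the design question raised above still applies.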