How to get the last known contiguous value in a Postgres ltree field? - postgresql

I have a child table called wbs_numbers. The primary key id is an ltree.
A typical example is:
id              series_id
-------------------------
abc.xyz.00001   1
abc.xyz.00002   1
abc.xyz.00003   1
abc.xyz.00101   1
The parent table is called series. It has a field called last_contigous_max.
Given the above example, I want series 1 to have a last_contigous_max of 3.
You can always assume that the ltree id in wbs_numbers has exactly 3 fragments separated by dots, and that the last fragment is always a 5-digit numeric string left-padded with zeros. You can also assume that the first child always ends with 00001 and that the theoretical total number of children in a series will never exceed 9999.
If you think of it as gaps and islands, the wbs_numbers within a series will never start with a gap; they always start with an island.
In other words, this is not possible:
id              series_id
-------------------------
abc.xyz.00010   1
abc.xyz.00011   1
abc.xyz.00012   1
abc.xyz.00101   1
This is possible:
id              series_id
-------------------------
abc.xyz.00001   1
abc.xyz.00004   1
abc.xyz.00005   1
abc.xyz.00051   1
abc.xyz.00052   1
abc.xyz.00100   1
abc.xyz.10001   2
abc.xyz.10002   2
abc.xyz.10003   2
abc.xyz.10051   2
abc.xyz.10052   2
abc.xyz.10100   2
abc.xyz.20001   3
abc.xyz.20002   3
abc.xyz.20003   3
abc.xyz.20004   3
abc.xyz.20052   3
abc.xyz.20100   3
So the last contiguous max in this case is:
for series id 1 => 1
for series id 2 => 3
for series id 3 => 4
What's the query to calculate last_contigous_max for any given series_id?
I also don't mind having another table just to store "islands".
Also, you can safely assume that wbs_numbers records will never be deleted once created, and that the id in the wbs_numbers table will never be altered once filled in.
In other words, islands will only grow and never shrink.
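Since rows are never deleted and ids never change, the stored value can only move forward. That suggests an incremental alternative to recomputing it from scratch: start from the current last_contigous_max and advance while the next position exists. The Python sketch below models only that logic, under stated assumptions: a set of in-series positions (the last 4 digits of each id) stands in for the wbs_numbers rows, and advance_last_contiguous_max is a hypothetical helper name, not part of the schema.

```python
from typing import Set


def advance_last_contiguous_max(current_max: int, positions: Set[int]) -> int:
    """Return the new end of the first island of a series.

    current_max: the previously stored last_contigous_max (0 for an empty series).
    positions:   in-series positions that exist, e.g. {1, 2, 3, 101}.
    Because islands only grow (no deletes, no id changes), the scan can
    resume at current_max + 1 instead of re-reading the whole series.
    """
    candidate = current_max
    while candidate + 1 in positions:
        candidate += 1
    return candidate


# Hypothetical positions for series 1 from the first example (00001..00003, 00101).
positions = {1, 2, 3, 101}
print(advance_last_contiguous_max(0, positions))
positions.add(4)  # a new child fills the gap right after the island
print(advance_last_contiguous_max(3, positions))
```

In a database this check would run after new children are inserted; the loop touches only the positions directly after the old maximum.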

You can solve this with the following steps:
1. extract the integer value from the "id" field
2. compute a ranking value alongside the id value
3. filter out rows where the ranking value does not match the id value (only the first island survives, because it always starts at 1)
4. keep the last remaining row of each series with FETCH FIRST ... WITH TIES
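WITH cte AS (
    -- the last 4 digits give the child's position within its series
    SELECT *, CAST(RIGHT(id::text, 4) AS INTEGER) AS idval
    FROM wbs_numbers
), ranked AS (
    SELECT *,
           ROW_NUMBER() OVER(PARTITION BY series_id ORDER BY idval) AS rn
    FROM cte
)
SELECT series_id, idval
FROM ranked
WHERE idval = rn  -- holds only inside the first island, which starts at 1
ORDER BY ROW_NUMBER() OVER(PARTITION BY series_id ORDER BY idval DESC)
FETCH FIRST ROWS WITH TIES;  -- WITH TIES requires PostgreSQL 13 or later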
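To check the logic without a PostgreSQL instance, here is a rough port of the same query run through Python's sqlite3 module on the "This is possible" sample data. Assumptions: SQLite has no ltree type, so id is stored as TEXT and substr(id, -4) stands in for RIGHT(id::text, 4); SQLite also has no FETCH FIRST ... WITH TIES, so a GROUP BY with MAX() picks the last row of the first island instead. Window functions need SQLite 3.25 or later.

```python
import sqlite3

ROWS = [
    ("abc.xyz.00001", 1), ("abc.xyz.00004", 1), ("abc.xyz.00005", 1),
    ("abc.xyz.00051", 1), ("abc.xyz.00052", 1), ("abc.xyz.00100", 1),
    ("abc.xyz.10001", 2), ("abc.xyz.10002", 2), ("abc.xyz.10003", 2),
    ("abc.xyz.10051", 2), ("abc.xyz.10052", 2), ("abc.xyz.10100", 2),
    ("abc.xyz.20001", 3), ("abc.xyz.20002", 3), ("abc.xyz.20003", 3),
    ("abc.xyz.20004", 3), ("abc.xyz.20052", 3), ("abc.xyz.20100", 3),
]

QUERY = """
WITH cte AS (
    -- last 4 digits = position of the child inside its series
    SELECT *, CAST(substr(id, -4) AS INTEGER) AS idval
    FROM wbs_numbers
), ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY series_id ORDER BY idval) AS rn
    FROM cte
)
SELECT series_id, MAX(idval) AS last_contiguous_max
FROM ranked
WHERE idval = rn          -- only rows of the first island satisfy this
GROUP BY series_id
ORDER BY series_id
"""


def last_contiguous_max(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE wbs_numbers (id TEXT PRIMARY KEY, series_id INTEGER)")
    con.executemany("INSERT INTO wbs_numbers VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


print(last_contiguous_max(ROWS))  # [(1, 1), (2, 3), (3, 4)]
```

The GROUP BY form returns one row per series without relying on WITH TIES, which may also be easier to read in PostgreSQL.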

Related

Create Row Number based column and repeating for another

I want to create a Row Number column that counts up within each Product ID and restarts at 1 for the next Product ID.
Pack ID   Product ID   Row Number
---------------------------------
A001      P001         1
A002      P001         2
A003      P001         3
A004      P002         1
A005      P002         2
A006      P003         1
A007      P004         1
A008      P004         2
What query should I write?
Use the ROW_NUMBER window function, partitioned by product_id so that the numbering restarts for each product:
select pack_id
, product_id
, row_number() over (partition by product_id
order by pack_id) "row number"
from test;
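A quick way to see the numbering restart is to run the same query against SQLite from Python. The table name test and the columns pack_id and product_id follow the answer; the outer ORDER BY pack_id is an addition so that the printed order is predictable. SQLite 3.25 or later is assumed for window functions.

```python
import sqlite3

PACKS = [
    ("A001", "P001"), ("A002", "P001"), ("A003", "P001"),
    ("A004", "P002"), ("A005", "P002"),
    ("A006", "P003"),
    ("A007", "P004"), ("A008", "P004"),
]


def numbered(packs):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE test (pack_id TEXT, product_id TEXT)")
    con.executemany("INSERT INTO test VALUES (?, ?)", packs)
    rows = con.execute(
        """
        SELECT pack_id,
               product_id,
               -- the counter restarts at 1 for every new product_id
               row_number() OVER (PARTITION BY product_id ORDER BY pack_id) AS "row number"
        FROM test
        ORDER BY pack_id
        """
    ).fetchall()
    con.close()
    return rows


for row in numbered(PACKS):
    print(row)
```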

Replace content in 'order' column with sequential numbers

I have an 'order' column in a table in a postgres database that has a lot of missing numbers in the sequence. I am having a problem figuring out how to replace the numbers currently in the column, with new ones that are incremental (see examples).
What I have:
id order name
---------------
1 50 Anna
2 13 John
3 2 Bruce
4 5 David
What I want:
id order name
---------------
1 4 Anna
2 3 John
3 1 Bruce
4 2 David
The row containing the lowest order number in the old version of the column should get the new order number '1', the next after that should get '2' etc.
You can use the window function row_number() to calculate the new numbers. The result of that can be used in an update statement:
update the_table
set "order" = t.rn
from (
select id, row_number() over (order by "order") as rn
from the_table
) t
where t.id = the_table.id;
This assumes that id is the primary key of that table.
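The same renumbering can be tried in SQLite from Python. SQLite's UPDATE ... FROM only exists in newer versions, so this sketch swaps in a two-step variant: it first stores the row_number() ranks in a temporary table (named ranks here, an illustrative choice), then writes them back with a correlated subquery matched on the primary key. Computing the ranks up front ensures every new value is based on the original "order" values.

```python
import sqlite3


def renumber(rows):
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE the_table (id INTEGER PRIMARY KEY, "order" INTEGER, name TEXT)')
    con.executemany("INSERT INTO the_table VALUES (?, ?, ?)", rows)
    # Step 1: rank every row by its old "order" value (lowest -> 1).
    con.execute(
        'CREATE TEMP TABLE ranks AS '
        'SELECT id, row_number() OVER (ORDER BY "order") AS rn FROM the_table'
    )
    # Step 2: write each rank back to its row, matching on the primary key.
    con.execute(
        'UPDATE the_table SET "order" = '
        '(SELECT rn FROM ranks WHERE ranks.id = the_table.id)'
    )
    result = con.execute('SELECT id, "order", name FROM the_table ORDER BY id').fetchall()
    con.close()
    return result


print(renumber([(1, 50, "Anna"), (2, 13, "John"), (3, 2, "Bruce"), (4, 5, "David")]))
# [(1, 4, 'Anna'), (2, 3, 'John'), (3, 1, 'Bruce'), (4, 2, 'David')]
```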

How to find end point of same value with interval?

I have a table like the one below.
ID Time State
1 1 "active"
1 2 "active"
1 3 "active"
1 4 "inactive"
2 2 "inactive"
2 3 "active"
3 1 "active"
3 3 "active"
3 4 "inactive"
I want to collapse the table into start and end times for each run of the same state.
It might need the lag() window function, but I don't know how to find the end point of a run of the same state.
My expected table should look like this.
ID Start End State
1 1 4 "active"
1 4 NULL "inactive"
2 2 3 "inactive"
2 3 NULL "active"
3 1 4 "active"
3 4 NULL "inactive"
SELECT DISTINCT ON (sum) -- 5
id,
-- 4
first_value(time) OVER (PARTITION BY sum ORDER BY time) as start,
first_value(lead) OVER (PARTITION BY sum ORDER BY time DESC) as end,
state
FROM (
SELECT
*,
-- 3
SUM(CASE WHEN is_prev_state THEN 0 ELSE 1 END) OVER (ORDER BY id, time)
FROM (
SELECT
*,
-- 1
lead(time) OVER (PARTITION BY id ORDER BY time),
-- 2
state = lag(state) OVER (PARTITION BY id ORDER BY time) as is_prev_state
FROM states
)s
)s
1. lead() takes the value of the next row onto the current row. E.g. for id == 1, time == 4 is moved onto the row with time == 3. The idea is to get the possible end of a group onto the right row.
2. lag() does the opposite: it takes the value of the previous row. With that I can check whether the state has changed, i.e. whether the current state is the same as the previous one.
3. This line creates a group number for every run of a state: if the state changed, add 1 to the running sum; if not, keep the same value (add 0).
4. Now I have the possible last value per group (from (1)) and can get the first value. This is done with the window function first_value(), which gives you the first value of an ordered group. To get the last value, you just order the group descending. (last_value() would not work with the default window frame, which ends at the current row; it would need an explicit ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.)
5. DISTINCT ON keeps only the very first row of each group generated by the SUM() function.
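For comparison, here is a variant of the same gaps-and-islands idea that avoids DISTINCT ON, which SQLite does not have. Steps (2) and (3) are kept, each run is collapsed with GROUP BY, and the end time is taken as the start of the next run of the same id via lead(). It is run through Python's sqlite3 module on the sample data (SQLite 3.25 or later assumed for window functions); it is a sketch of an alternative formulation, not the answer's exact query.

```python
import sqlite3

STATES = [
    (1, 1, "active"), (1, 2, "active"), (1, 3, "active"), (1, 4, "inactive"),
    (2, 2, "inactive"), (2, 3, "active"),
    (3, 1, "active"), (3, 3, "active"), (3, 4, "inactive"),
]

QUERY = """
WITH marked AS (
    SELECT id, time, state,
           -- 1 when the state differs from the previous row of the same id
           CASE WHEN state = lag(state) OVER (PARTITION BY id ORDER BY time)
                THEN 0 ELSE 1 END AS is_new
    FROM states
), grouped AS (
    -- running total of is_new gives every run of equal states its own number
    SELECT id, time, state, SUM(is_new) OVER (ORDER BY id, time) AS grp
    FROM marked
), runs AS (
    SELECT id, state, MIN(time) AS start_time
    FROM grouped
    GROUP BY grp, id, state
)
-- a run ends where the next run of the same id starts
SELECT id, start_time,
       lead(start_time) OVER (PARTITION BY id ORDER BY start_time) AS end_time,
       state
FROM runs
ORDER BY id, start_time
"""


def state_runs(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE states (id INTEGER, time INTEGER, state TEXT)")
    con.executemany("INSERT INTO states VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in state_runs(STATES):
    print(row)
```

Taking the next run's start as the end is equivalent to lead(time) on the run's last row, because both name the first time of the following run for the same id.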

How can 'brand new, never before seen' IDs be counted per month in redshift?

A fair amount of material is available detailing methods that use dense_rank() and the like to count distinct values per month. However, I've been unable to find anything that counts distinct ids per month while also discounting any ids that have been seen in earlier months.
The data can be imagined like so:
id (int8 type) | observed time (timestamp utc)
------------------
1 | 2017-01-01
2 | 2017-01-02
1 | 2017-01-02
1 | 2017-02-02
2 | 2017-02-03
3 | 2017-02-04
1 | 2017-03-01
3 | 2017-03-01
4 | 2017-03-01
5 | 2017-03-02
The process of the count can be seen as:
1: in 2017-01 we saw devices 1 and 2 so the count is 2
2: in 2017-02 we saw devices 1, 2 and 3. We know already about devices 1 and 2, but not 3, so the count is 1
3: in 2017-03 we saw devices 1, 3, 4 and 5. We already know about 1 and 3, but not 4 or 5, so the count is 2.
with the desired output being something like:
observed time | count of new id
--------------------------
2017-01 | 2
2017-02 | 1
2017-03 | 2
Explicitly, I am looking to have a new table, with an aggregated month per row, with a count of how many new ids occur within that month that have not been seen at all before.
The real-life case allows devices to be seen more than once in a month, but this shouldn't affect the count. It also stores the id as an integer (both positive and negative values), and the observation times are true timestamps with second precision. The data set is also large.
My initial attempt is along the lines of:
WITH records_months AS (
SELECT *,
date_trunc('month', observed_time) AS month_group
FROM my_table
WHERE observed_time >= '2017-01-01'),
id_months AS (
SELECT DISTINCT
month_group,
id
FROM records_months
GROUP BY month_group, id)
SELECT *
FROM id_months
However, I'm stuck on the next part, i.e. counting the number of new IDs that were not seen in prior months. I believe the solution might be a window function, but I'm having trouble working out which one or how.
This is the first thing I thought of. The idea is to:
(innermost query) calculate the earliest month that each id was seen,
(next level up) join that back to the main my_table dataset, and then
(outer query) count distinct ids by month after nulling out the already-seen ids.
I tested it out and got the desired result set. Joining the earliest month back to the original table seemed like the most natural thing to do (vs. a window function). Hopefully this is performant enough for your Redshift!
select observed_month,
-- Null out the id if the observed_month that we're grouping by
-- is NOT the earliest month that the id was seen.
-- Then count distinct id
count(distinct(case when observed_month != earliest_month then null else id end)) as num_new_ids
from (
select t.id,
date_trunc('month', t.observed_time) as observed_month,
earliest.earliest_month
from my_table t
join (
-- What's the earliest month an id was seen?
select id,
date_trunc('month', min(observed_time)) as earliest_month
from my_table
group by 1
) earliest
on t.id = earliest.id
) as obs
group by 1
order by 1;
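A shorter formulation, offered as an alternative rather than the answer's method: once each id's earliest month is known, the number of new ids in a month is simply how many ids have that month as their earliest. The sketch runs it in SQLite through Python, with strftime('%Y-%m', ...) standing in for Redshift's date_trunc('month', ...) and ISO date strings standing in for timestamps. Unlike the query above, it only returns months in which at least one new id appeared.

```python
import sqlite3

OBSERVATIONS = [
    (1, "2017-01-01"), (2, "2017-01-02"), (1, "2017-01-02"),
    (1, "2017-02-02"), (2, "2017-02-03"), (3, "2017-02-04"),
    (1, "2017-03-01"), (3, "2017-03-01"), (4, "2017-03-01"), (5, "2017-03-02"),
]

QUERY = """
SELECT first_month, COUNT(*) AS num_new_ids
FROM (
    -- earliest month in which each id was seen
    SELECT id, strftime('%Y-%m', MIN(observed_time)) AS first_month
    FROM my_table
    GROUP BY id
) AS firsts
GROUP BY first_month
ORDER BY first_month
"""


def new_ids_per_month(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE my_table (id INTEGER, observed_time TEXT)")
    con.executemany("INSERT INTO my_table VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


print(new_ids_per_month(OBSERVATIONS))  # [('2017-01', 2), ('2017-02', 1), ('2017-03', 2)]
```

Because each id is reduced to a single row before counting, repeated sightings within a month cannot inflate the result.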

T-SQL table variable data order

I have a UDF which returns a table variable like this:
--
--
RETURNS @ElementTable TABLE
(
ElementID INT IDENTITY(1,1) PRIMARY KEY NOT NULL,
ElementValue VARCHAR(MAX)
)
AS
--
--
Is the order of data in this table variable guaranteed to be the same as the order in which the data is inserted into it? E.g. if I issue
INSERT INTO @ElementTable(ElementValue) VALUES ('1')
INSERT INTO @ElementTable(ElementValue) VALUES ('2')
INSERT INTO @ElementTable(ElementValue) VALUES ('3')
I expect the data will always be returned in that order when I say
select ElementValue from @ElementTable --Here I don't use order by
EDIT:
If order is not guaranteed, then the following query
SELECT T1.ElementValue,T2.ElementValue FROM dbo.MyFunc() T1
Cross Apply dbo.MyFunc() T2
order by t1.elementid
will not consistently produce the 9-row (3 x 3) result
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3
Is there any possibility that it could be like this instead:
1 2
1 1
1 3
2 3
2 2
2 1
3 1
3 2
3 3
How can I get the ordered result using the function above?
No, the order is not guaranteed to be the same, unless, of course, you are using ORDER BY. Then it is guaranteed.
Given your update, you obtain it in the obvious way - you ask the system to give you the results in the order you want:
SELECT T1.ElementValue,T2.ElementValue FROM dbo.MyFunc() T1
Cross join dbo.MyFunc() T2
order by t1.elementid, t2.elementid
If you use (inefficient) single-row INSERT statements within your UDF, you are guaranteed that the IDENTITY values will match the order in which the individual INSERT statements were specified.
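The shape of that query can be exercised in SQLite from Python. SQLite has no table-valued UDFs, so a plain table named element_table (a hypothetical stand-in) replaces dbo.MyFunc(), with INTEGER PRIMARY KEY playing the role of the IDENTITY column. This only illustrates the explicit ORDER BY on both ids; it cannot demonstrate SQL Server's own ordering behavior.

```python
import sqlite3


def ordered_pairs():
    con = sqlite3.connect(":memory:")
    # Stand-in for the rows returned by dbo.MyFunc().
    con.execute("CREATE TABLE element_table (ElementID INTEGER PRIMARY KEY, ElementValue TEXT)")
    for value in ("1", "2", "3"):
        con.execute("INSERT INTO element_table (ElementValue) VALUES (?)", (value,))
    rows = con.execute(
        """
        SELECT t1.ElementValue, t2.ElementValue
        FROM element_table AS t1
        CROSS JOIN element_table AS t2
        ORDER BY t1.ElementID, t2.ElementID  -- without this, row order is unspecified
        """
    ).fetchall()
    con.close()
    return rows


for left, right in ordered_pairs():
    print(left, right)
```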
Order is not guaranteed.
But if all you want is to get your records back in the same order you inserted them, then just order by your primary key. Since you already have that column set up as an IDENTITY column, it should suffice.
...or generate the pairs deterministically with ROW_NUMBER():
SELECT TOP 9
M1 = (ROW_NUMBER() OVER(ORDER BY id) + 2) / 3,
M2 = (ROW_NUMBER() OVER(ORDER BY id) + 2) % 3 + 1
FROM
sysobjects
ORDER BY id
M1 M2
1 1
1 2
1 3
2 1
2 2
2 3
3 1
3 2
3 3
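To see why the two expressions enumerate every pair exactly once, here is the same integer arithmetic in Python for row numbers 1 to 9 (row_to_pair is a hypothetical helper name). For positive operands, Python's // matches T-SQL integer division, and % behaves the same way.

```python
from typing import List, Tuple


def row_to_pair(rn: int) -> Tuple[int, int]:
    # M1 = (rn + 2) / 3 with integer division; M2 = (rn + 2) % 3 + 1
    return (rn + 2) // 3, (rn + 2) % 3 + 1


pairs: List[Tuple[int, int]] = [row_to_pair(rn) for rn in range(1, 10)]
print(pairs)
# [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3)]
```

M1 advances once every three rows while M2 cycles through 1, 2, 3, which is exactly the ordered 3 x 3 listing.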