Making a dynamic custom series in PostgreSQL (avoiding a loop if possible)

I'm new to PostgreSQL and I'm trying to do something that would require a loop in T-SQL.
I did some research on how to loop in PostgreSQL and found that I would have to write some sort of function first, which I'm trying to avoid.
I have a main table of
SELECT 17 as employeecount, 'Aug-2020' as month
UNION
SELECT 22, 'Sep-2020'
UNION
SELECT 27, 'Oct-2020'
I need an output that increments from 1 to x (the employeecount) per month, like below:
SELECT 1 as employeecount, 'Aug-2020' as month
UNION
SELECT 2, 'Aug-2020'
UNION
SELECT 3, 'Aug-2020'
........... up to 17, 'Aug-2020'
UNION
SELECT 1, 'Sep-2020'
UNION
... up to 22, 'Sep-2020' and so on
or, as a table:

increment | month
----------+----------
        1 | Aug-2020
        2 | Aug-2020
        3 | Aug-2020
        4 | Aug-2020
        . | Aug-2020
        . | Aug-2020
       17 | Aug-2020
        1 | Sep-2020
        2 | Sep-2020
        . | Sep-2020
        . | Sep-2020
       22 | Sep-2020
I'm trying to avoid looping, but if there's no other way, that's fine too.
Thanks in advance!

Use a lateral join to generate_series():
with main (employeecount, month) as (
values (17, 'Aug-2020'), (22, 'Sep-2020'), (27, 'Oct-2020')
)
select increment, month
from main
cross join lateral generate_series(1, employeecount) as gs(increment);
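Note that the CROSS JOIN keyword is optional here: in Postgres, a set-returning function in the FROM list gets an implicit LATERAL, so this shorthand is equivalent:
with main (employeecount, month) as (
    values (17, 'Aug-2020'), (22, 'Sep-2020'), (27, 'Oct-2020')
)
select gs.increment, m.month
from main m, generate_series(1, m.employeecount) as gs(increment);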

Related

Get unique values across two columns

I have a table that looks like this:
 id | col_1 | col_2
----+-------+-------
  1 |    12 |    15
  2 |    12 |    16
  3 |    12 |    17
  4 |    13 |    18
  5 |    14 |    18
  6 |    14 |    19
  7 |    15 |    19
  8 |    16 |    20
I know if I do something like this, it will return all unique values from col_1:
select distinct(col_1) from table;
Is there a way I can get the distinct values across two columns? So my output would only be:
12
13
14
15
16
17
18
19
20
That is, it would take the distinct values from col_1, add col_2's distinct values, and remove any duplicates between the two lists (such as 15, which appears in both col_1 and col_2).
You can use a UNION
select col_1
from the_table
union
select col_2
from the_table;
UNION implies a distinct operation; the above is the same as:
select distinct col
from (
    select col_1 as col
    from the_table
    union all
    select col_2 as col
    from the_table
) x;
You will need to use UNION:
select col_1 from the_table
union
select col_2 from the_table;
You will not need DISTINCT here, because UNION automatically deduplicates for you (as opposed to UNION ALL).
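For a quick self-contained check, the sample rows can be put into a CTE (a sketch; table and column names assumed from the question):
with the_table (id, col_1, col_2) as (
    values (1, 12, 15), (2, 12, 16), (3, 12, 17), (4, 13, 18),
           (5, 14, 18), (6, 14, 19), (7, 15, 19), (8, 16, 20)
)
select col_1 as col from the_table
union
select col_2 from the_table
order by col;
-- returns one row each for the values 12 through 20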

Find unique entities with multiple UUID identifiers in redshift

Having an event table with multiple types of UUIDs per user, we would like to come up with a way to stitch all those UUIDs together to get the most complete definition of a single user.
For example:
UUID1 | UUID2
------+------
    1 | a
    1 | a
    2 | a
    2 | b
    3 | c
    4 | c
There are 2 users here, the first one with uuid1={1,2} and uuid2={a,b}, the second one with uuid1={3,4} and uuid2={c}. These chains could potentially be very long. There are no intersections (i.e. 1c doesn't exist) and all rows are timestamp ordered.
Is there a way in redshift to generate these unique "guest" identifiers without creating an immense query with many joins?
Thanks in advance!
Create a test data table:
-- DROP TABLE uuid_test;
CREATE TEMP TABLE uuid_test AS
SELECT 1 row_id, 1::int uuid1, 'a'::char(1) uuid2
UNION ALL SELECT 2 row_id, 1::int uuid1, 'a'::char(1) uuid2
UNION ALL SELECT 3 row_id, 2::int uuid1, 'a'::char(1) uuid2
UNION ALL SELECT 4 row_id, 2::int uuid1, 'b'::char(1) uuid2
UNION ALL SELECT 5 row_id, 3::int uuid1, 'c'::char(1) uuid2
UNION ALL SELECT 6 row_id, 4::int uuid1, 'c'::char(1) uuid2
UNION ALL SELECT 7 row_id, 4::int uuid1, 'd'::char(1) uuid2
UNION ALL SELECT 8 row_id, 5::int uuid1, 'e'::char(1) uuid2
UNION ALL SELECT 9 row_id, 6::int uuid1, 'e'::char(1) uuid2
UNION ALL SELECT 10 row_id, 6::int uuid1, 'f'::char(1) uuid2
UNION ALL SELECT 11 row_id, 7::int uuid1, 'f'::char(1) uuid2
UNION ALL SELECT 12 row_id, 8::int uuid1, 'g'::char(1) uuid2
UNION ALL SELECT 13 row_id, 8::int uuid1, 'h'::char(1) uuid2
;
The actual problem is solved by using the strict ordering to find every place where the unique user changes, capturing that as a lookup table, and then applying it to the original data. Note that a new user starts only where both UUIDs change at once; a change in just one of them continues the chain.
-- Create lookup table with a from-to range of IDs for each unique user
WITH unique_user AS (
    -- Calculate the end of the id range using LEAD() to look ahead
    -- Use an inline MAX() to find the ending ID for the last entry
    SELECT row_id AS from_id
         , NVL(LEAD(row_id,1) OVER (ORDER BY row_id)-1, (SELECT MAX(row_id) FROM uuid_test)) AS to_id
         , unique_uuid
    -- Mark a unique user change only when there is discontinuity in both UUIDs
    FROM (SELECT row_id
               , CASE WHEN NVL(LAG(uuid1,1) OVER (ORDER BY row_id), 0) <> uuid1
                       AND NVL(LAG(uuid2,1) OVER (ORDER BY row_id), '') <> uuid2
                      THEN MD5(uuid1||uuid2)
                      ELSE NULL END AS unique_uuid
          FROM uuid_test) t
    WHERE unique_uuid IS NOT NULL
    ORDER BY row_id
)
-- Apply the unique user value to each row using a range join to the lookup table
SELECT a.row_id, a.uuid1, a.uuid2, b.unique_uuid
FROM uuid_test AS a
JOIN unique_user AS b
  ON a.row_id BETWEEN b.from_id AND b.to_id
ORDER BY a.row_id
;
Here's the output
row_id | uuid1 | uuid2 | unique_uuid
--------+-------+-------+----------------------------------
1 | 1 | a | efaa153b0f682ae5170a3184fa0df28c
2 | 1 | a | efaa153b0f682ae5170a3184fa0df28c
3 | 2 | a | efaa153b0f682ae5170a3184fa0df28c
4 | 2 | b | efaa153b0f682ae5170a3184fa0df28c
5 | 3 | c | 5fcfcb7df376059d0075cb892b2cc37f
6 | 4 | c | 5fcfcb7df376059d0075cb892b2cc37f
7 | 4 | d | 5fcfcb7df376059d0075cb892b2cc37f
8 | 5 | e | 18a368e1052b5aa0388ef020dd9a1e20
9 | 6 | e | 18a368e1052b5aa0388ef020dd9a1e20
10 | 6 | f | 18a368e1052b5aa0388ef020dd9a1e20
11 | 7 | f | 18a368e1052b5aa0388ef020dd9a1e20
12 | 8 | g | 321fcc2447163a81d470b9353e394121
13 | 8 | h | 321fcc2447163a81d470b9353e394121
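Since the lookup CTE ends up with exactly one row per stitched user, counting the change markers gives the number of distinct users. A condensed, self-contained sanity check using the same change-detection logic as above:
WITH new_user AS (
    SELECT row_id
         , CASE WHEN NVL(LAG(uuid1,1) OVER (ORDER BY row_id), 0) <> uuid1
                 AND NVL(LAG(uuid2,1) OVER (ORDER BY row_id), '') <> uuid2
                THEN 1 END AS is_new
    FROM uuid_test
)
SELECT COUNT(*) AS distinct_users
FROM new_user
WHERE is_new = 1;
-- Expected: 4 for the test data above (rows 1-4, 5-7, 8-11, and 12-13)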

How to retrieve top 3 results for each column in postgresql?

I have been given a question. The table looks like this:
STATE          | year1 | ... | year10
AP             |   100 | ... |    120
assam          |    13 | ... |     42
madhya pradesh |   214 | ... |    421
Now, I need to get the top 3 states for each year. I have tried everything I can think of, but I am not able to filter results per column.
You have a design problem: enumerated columns are almost always a sign of bad design.
For now, you can unpivot using unnest() and then use the window function row_number() to get the top 3 states per year:
with unpivoted as (
    select state,
           unnest(array[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]) as year,
           unnest(array[
               year_1, year_2, year_3,
               year_4, year_5, year_6,
               year_7, year_8, year_9,
               year_10
           ]) as value
    from your_table
)
select *
from (
    select t.*,
           row_number() over (
               partition by year
               order by value desc
           ) as seqnum
    from unpivoted t
) t
where seqnum <= 3;
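An alternative unpivot that avoids the parallel-unnest trick is a lateral join against a VALUES list (a sketch; column names assumed to be year_1 through year_10 as above):
select u.state, v.year, v.value
from your_table u
cross join lateral (
    values (1, u.year_1), (2, u.year_2), (3, u.year_3), (4, u.year_4),
           (5, u.year_5), (6, u.year_6), (7, u.year_7), (8, u.year_8),
           (9, u.year_9), (10, u.year_10)
) as v(year, value);
The same row_number() filter can then be applied on top of this subquery.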

First and second time appearing row id in PostgreSQL

Suppose we have a list of ids with dates, and we want to know when each id appeared for the first and the second time. For the first time, I have created this query:
SELECT year, mon, COUNT(id) AS sum_first_id
FROM (
    SELECT DISTINCT ON (id) DATE, id
    FROM TABLE
    GROUP BY 2, 1
) AS foo
GROUP BY 2, 1
ORDER BY 1, 2;
I think that this works. But how could I find when the ids appear for the second time?
Let's say you have the table table_x:
select *
from table_x
order by 1, 2
id | date
----+------------
1 | 2015-06-04
1 | 2015-06-05
1 | 2015-06-14
2 | 2015-06-05
2 | 2015-06-08
2 | 2015-06-10
2 | 2015-06-17
2 | 2015-06-22
(8 rows)
To select the first n elements per group, use the row_number() window function:
select id, date
from (
    select id, date, row_number() over (partition by id order by date) as rn
    from table_x
    order by 1, 2
) sub
where rn <= 2
id | date
----+------------
1 | 2015-06-04
1 | 2015-06-05
2 | 2015-06-05
2 | 2015-06-08
(4 rows)
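To get back to the original question (when each id appears for the second time), keep only rn = 2 and aggregate by month, in the spirit of the asker's first query (a sketch; assumes date is a date or timestamp column):
select extract(year from date) as year,
       extract(month from date) as mon,
       count(id) as sum_second_id
from (
    select id, date,
           row_number() over (partition by id order by date) as rn
    from table_x
) sub
where rn = 2
group by 1, 2
order by 1, 2;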
It does not appear that your query is correct.
SELECT year, mon, COUNT(id) AS sum_first_id -- what is year, mon?
FROM (
    SELECT DISTINCT ON (id) DATE, id
    FROM TABLE
    GROUP BY 2, 1 -- should be ORDER BY 2, 1
) AS foo
GROUP BY 2, 1
ORDER BY 1, 2;

Joining many tables on same data and returning all rows

UPDATE:
My original attempt to use FULL OUTER JOIN did not work correctly. I have updated the question to reflect the true issue. Sorry for presenting a classic XY problem.
I'm trying to retrieve a dataset from multiple tables in one query, grouped by the year and month of the data.
The final result should look like this:
| Year | Month | Col1 | Col2 | Col3 |
|------+-------+------+------+------|
| 2012 | 11 | 231 | - | - |
| 2012 | 12 | 534 | 12 | 13 |
| 2013 | 1 | - | 22 | 14 |
Coming from data that looks like this:
Table 1:
| Year | Month | Data |
|------+-------+------|
| 2012 | 11 | 231 |
| 2012 | 12 | 534 |
Table 2:
| Year | Month | Data |
|------+-------+------|
| 2012 | 12 | 12 |
| 2013 | 1 | 22 |
Table 3:
| Year | Month | Data |
|------+-------+------|
| 2012 | 12 | 13 |
| 2013 | 1 | 14 |
I tried using FULL OUTER JOIN, but this doesn't quite work, because no matter which table I select Year and Month from in the SELECT clause, there are null values.
SELECT
    COALESCE(t1.year, t2.year, t3.year)
    , COALESCE(t1.month, t2.month, t3.month)
    , t1.data AS col1
    , t2.data AS col2
    , t3.data AS col3
FROM t1
FULL OUTER JOIN t2
    ON t1.year = t2.year AND t1.month = t2.month
FULL OUTER JOIN t3
    ON t1.year = t3.year AND t1.month = t3.month
The result is something like this (it's too confusing to reproduce exactly what I would get with this demo data):
| Year | Month | Col1 | Col2 | Col3 |
|------+-------+------+------+------|
| 2012 | 11 | 231 | - | - |
| 2012 | 12 | 534 | 12 | 13 |
| 2013 | 1 | - | 22 | - |
| - | 1 | - | - | 14 |
If your data allows it (not 100 columns), this is usually a clean way of doing it:
select year, month, sum(col1) as col1, sum(col2) as col2, sum(col3) as col3
from (
    select t1.year, t1.month, t1.data as col1, 0 as col2, 0 as col3
    from t1
    union all
    select t2.year, t2.month, 0 as col1, t2.data as col2, 0 as col3
    from t2
    union all
    select t3.year, t3.month, 0 as col1, 0 as col2, t3.data as col3
    from t3
) as data
group by year, month
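Note that this approach produces 0 rather than NULL (the "-" in the desired output) for months that are missing from a table. For a quick self-contained test, the three sample tables can be faked with CTEs (a sketch, Postgres syntax):
with t1 (year, month, data) as (values (2012, 11, 231), (2012, 12, 534)),
     t2 (year, month, data) as (values (2012, 12, 12), (2013, 1, 22)),
     t3 (year, month, data) as (values (2012, 12, 13), (2013, 1, 14))
select year, month,
       sum(col1) as col1, sum(col2) as col2, sum(col3) as col3
from (
    select year, month, data as col1, 0 as col2, 0 as col3 from t1
    union all
    select year, month, 0, data, 0 from t2
    union all
    select year, month, 0, 0, data from t3
) as data
group by year, month
order by year, month;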
If you are using SQL Server 2005 or later version, you could also try this PIVOT solution:
SELECT
Year,
Month,
Col1,
Col2,
Col3
FROM (
SELECT Year, Month, 'Col1' AS Col, Data FROM t1
UNION ALL
SELECT Year, Month, 'Col2' AS Col, Data FROM t2
UNION ALL
SELECT Year, Month, 'Col3' AS Col, Data FROM t3
) f
PIVOT (
SUM(Data) FOR Col IN (Col1, Col2, Col3)
) p
;
This query can be tested and played with at SQL Fiddle.
Perhaps you are looking for the COALESCE keyword? It takes a list of expressions and returns the first one that is NOT NULL, or NULL if all arguments are null. In your example, you would do something like this:
SELECT COALESCE(t1.data, t2.data)
You would still need to join the tables in this case; it would just cut down on the CASE expressions.
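For completeness, COALESCE can also repair the asker's chained FULL OUTER JOIN directly: join each subsequent table on the keys coalesced so far, so the t3 join no longer fails when the t1 side is null (a sketch):
SELECT
    COALESCE(t1.year, t2.year, t3.year) AS year,
    COALESCE(t1.month, t2.month, t3.month) AS month,
    t1.data AS col1,
    t2.data AS col2,
    t3.data AS col3
FROM t1
FULL OUTER JOIN t2
    ON t1.year = t2.year AND t1.month = t2.month
FULL OUTER JOIN t3
    ON COALESCE(t1.year, t2.year) = t3.year
   AND COALESCE(t1.month, t2.month) = t3.month;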
You could derive the complete list of years and months from all the tables, then join every table to that list (using a left join):
SELECT
f.Year,
f.Month,
t1.data AS col1,
t2.data AS col2,
t3.data AS col3
FROM (
SELECT Year, Month FROM t1
UNION
SELECT Year, Month FROM t2
UNION
SELECT Year, Month FROM t3
) f
LEFT JOIN t1 ON f.year = t1.year and f.month = t1.month
LEFT JOIN t2 ON f.year = t2.year and f.month = t2.month
LEFT JOIN t3 ON f.year = t3.year and f.month = t3.month
;
You can see a live demonstration of this query at SQL Fiddle.
If you are looking for the non-null values from either table, then you will have to add t1.data IS NOT NULL as well. I hope that I understand your question.
CREATE VIEW joined_sales AS
SELECT t1.year, t1.month, t1.data AS data1, t2.data AS data2
FROM table1 t1
JOIN table2 t2
  ON t1.year = t2.year
 AND t1.month = t2.month
WHERE t1.data IS NOT NULL;
This might be a better way, especially if you are going to do something with the data before returning it. Basically you are translating the table the data came from into a typeId.
declare @temp table (
    [year] int,
    [month] int,
    typeId int,
    data decimal
)

insert into @temp
select t1.year, t1.month, 1, sum(t1.data)
from t1
group by t1.year, t1.month

insert into @temp
select t2.year, t2.month, 2, sum(t2.data)
from t2
group by t2.year, t2.month

insert into @temp
select t3.year, t3.month, 3, sum(t3.data)
from t3
group by t3.year, t3.month

select t.[year], t.[month],
    sum(case when t.typeId = 1 then t.data end) as col1,
    sum(case when t.typeId = 2 then t.data end) as col2,
    sum(case when t.typeId = 3 then t.data end) as col3
from @temp t
group by t.[year], t.[month]