TSQL: Find combinations of rows within a group without cross joins - tsql

I'm trying to develop a T-SQL routine (SQL Server 2014) that will allow me to find all combinations of records within a group.
Given the following data:
ID_COMBINATION | ID_POSITION | MULTIPLY_FACTOR
-----------------------------------------------
1 | 1 | 1
1 | 1 | 2
1 | 1 | 3
1 | 2 | 1
1 | 2 | 2
1 | 2 | 3
I would like to calculate every combination of MULTIPLY_FACTOR values across the full set of ID_POSITIONs for a given ID_COMBINATION, taking one factor per position.
The result should be:
1 | 1 | 1
1 | 2 | 1
1 | 1 | 1
1 | 2 | 2
1 | 1 | 1
1 | 2 | 3
...
1 | 1 | 3
1 | 2 | 3
For the moment I prefer a closed routine definition (rather than using dynamic SQL to generate the cross-join code at run time, depending on the number of unique ID_POSITIONs within a group).
Thank you very much for your help!
EDIT:
The following TSQL code calculates combinations of unique ID_POSITION for a given ID_COMBINATION 1:
declare @Samples as Table ( Id_Combination Int, Id_Position Int, Multiply_Factor Int );
INSERT INTO @Samples (Id_Combination, Id_Position, Multiply_Factor)
VALUES (1, 1, 1), (1, 1, 2), (1, 1, 3)
, (1, 2, 1), (1, 2, 2), (1, 2, 3);
SELECT
S1.Id_Combination
,S1.Id_Position AS s1_idpos
,S1.Multiply_Factor AS s1_mufac
,S2.Id_Position AS s2_idpos
,S2.Multiply_Factor AS s2_mufac
FROM @Samples AS S1
INNER JOIN @Samples AS S2
ON S1.Id_Combination = S2.Id_Combination
AND S1.Id_Position < S2.Id_Position;
However, if I add a new ID_POSITION key with its own MULTIPLY_FACTOR values, I have to modify the join conditions and the SELECT list to cover the new scenario, like this:
declare @Samples as Table ( Id_Combination Int, Id_Position Int, Multiply_Factor Int );
INSERT INTO @Samples (Id_Combination, Id_Position, Multiply_Factor)
VALUES (1, 1, 1), (1, 1, 2), (1, 1, 3)
,(1, 2, 1), (1, 2, 2), (1, 2, 3)
,(1, 3, 1), (1, 3, 2), (1, 3, 3);
SELECT
S1.Id_Combination
,S1.Id_Position AS s1_idpos
,S1.Multiply_Factor AS s1_mufac
,S2.Id_Position AS s2_idpos
,S2.Multiply_Factor AS s2_mufac
,S3.Id_Position AS s3_idpos
,S3.Multiply_Factor AS s3_mufac
FROM @Samples AS S1
INNER JOIN @Samples AS S2
ON S1.Id_Combination = S2.Id_Combination
AND S1.Id_Position < S2.Id_Position
INNER JOIN @Samples AS S3
ON S2.Id_Combination = S3.Id_Combination
AND S2.Id_Position < S3.Id_Position;
Getting back to the general idea of my question: how can I write "generic" T-SQL code that covers all possible future values from the ID_POSITION domain and presents the values vertically, rather than adding new fields to the SELECT clause?
For sure, some SUB_COMBINATION key will have to be introduced to make those combinations distinct from each other inside a parent ID_COMBINATION...

Since I can't figure out your comment re: "a size of ID_POSITION", I'll just ask why it isn't this easy:
-- Sample data.
declare @Samples as Table ( Id_Combination Int, Id_Position Int, Multiply_Factor Int );
insert into @Samples ( Id_Combination, Id_Position, Multiply_Factor ) values
( 1, 1, 1 ), ( 1, 1, 2 ), ( 1, 1, 3 ), -- ( 1, 1, 4 ), -- Try me.
( 1, 2, 1 ), ( 1, 2, 2 ), ( 1, 2, 3 );
select * from @Samples;
-- Generate all possible combinations of all values.
select distinct S1.Id_Combination, S2.Id_Position, S3.Multiply_Factor
from @Samples as S1 cross join @Samples as S2 cross join @Samples as S3
order by Id_Combination, Id_Position, Multiply_Factor;
Note that if you uncomment the extra sample data row you will get two more result rows.
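To make the target layout from the question concrete, here is a minimal Python sketch (outside the database, so not a T-SQL solution). It picks one MULTIPLY_FACTOR per ID_POSITION and emits one block of rows per combination. The function name `expand_combinations` and the `sub` counter are hypothetical names introduced for illustration; the counter plays the role of the SUB_COMBINATION key the question mentions.

```python
from collections import defaultdict
from itertools import product


def expand_combinations(rows):
    """rows: iterable of (id_combination, id_position, multiply_factor)."""
    # factors[id_combination][id_position] -> list of factors for that position
    factors = defaultdict(lambda: defaultdict(list))
    for comb, pos, fac in rows:
        factors[comb][pos].append(fac)

    result = []
    for comb in sorted(factors):
        positions = sorted(factors[comb])
        pools = [sorted(factors[comb][p]) for p in positions]
        # product() picks exactly one factor per position, for any number of positions
        for sub, choice in enumerate(product(*pools), start=1):
            for pos, fac in zip(positions, choice):
                # (id_combination, sub_combination, id_position, multiply_factor)
                result.append((comb, sub, pos, fac))
    return result


samples = [(1, 1, 1), (1, 1, 2), (1, 1, 3),
           (1, 2, 1), (1, 2, 2), (1, 2, 3)]
rows = expand_combinations(samples)
for row in rows:
    print(row)
```

With the sample data this yields 9 sub-combinations of 2 rows each. Adding a third ID_POSITION needs no code change, which is the "generic" behaviour the question is after.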

Related

How to write a select query for displaying data on a table in another way using Postgresql?

I want to write a select query to pick data from the table shown in the image below (PICTURE_1):
1. Table Containing Data
and display it as shown in the image below (PICTURE_2):
2. Result of the query
About the data: the first picture shows data logged into a table for 2 seconds from 3 IDs (1, 2 and 3), each having 2 sub-IDs (aa and bb). Values and timestamps are also shown in the picture. The table contains only 3 columns, as shown in PICTURE_1. Could you help me write a query that displays the data as shown in the second image using PostgreSQL? The ID can be extracted from the name using a substring function. The language I'm using is PL/pgSQL. Any ideas or logic would also be welcome. Thank you for your time.
Please try this. It turns the row values into columns with conditional aggregation and uses a CTE.
-- PostgreSQL(v11)
WITH cte_t AS (
SELECT LEFT(name, 1) id
, RIGHT(name, POSITION('.' IN REVERSE(name)) - 1) t_name
, value
, time_stamp
FROM test
)
SELECT id
, time_stamp :: DATE "date"
, time_stamp :: TIME "time"
, MAX(CASE WHEN t_name = 'aa' THEN value END) "aa"
, MAX(CASE WHEN t_name = 'bb' THEN value END) "bb"
FROM cte_t
GROUP BY id, time_stamp
ORDER BY date, time, id;
Please check the fiddle: https://dbfiddle.uk/?rdbms=postgres_11&fiddle=6d35047560b3f83e6c906584b23034e9
Check this query (dbfiddle):
with cte (name, value, timeStamp) as (values
('1.aa', 1, '2021-08-20 10:10:01'),
('2.aa', 2, '2021-08-20 10:10:01'),
('3.aa', 3, '2021-08-20 10:10:01'),
('1.bb', 4, '2021-08-20 10:10:01'),
('2.bb', 5, '2021-08-20 10:10:01'),
('3.bb', 6, '2021-08-20 10:10:01'),
('1.aa', 7, '2021-08-20 10:10:02'),
('2.aa', 8, '2021-08-20 10:10:02'),
('3.aa', 9, '2021-08-20 10:10:02'),
('1.bb', 0, '2021-08-20 10:10:02'),
('2.bb', 1, '2021-08-20 10:10:02'),
('3.bb', 2, '2021-08-20 10:10:02')
), sub_cte as (
select split_name[1] as id, split_name[2] as name, value, tt::date as date, tt::time as time from (
select
regexp_split_to_array(name, '\.') split_name,
value,
to_timestamp(timestamp, 'YYYY-MM-DD HH24:MI:SS') as tt
from cte
) foo
)
select id, date, time, a.value as aa, b.value as bb from sub_cte a
left join (
select * from sub_cte where name = 'bb'
) as b using (id, date, time)
where a.name = 'aa'
Result
id | date | time | aa | bb
----+------------+----------+----+----
1 | 2021-08-20 | 10:10:01 | 1 | 4
2 | 2021-08-20 | 10:10:01 | 2 | 5
3 | 2021-08-20 | 10:10:01 | 3 | 6
1 | 2021-08-20 | 10:10:02 | 7 | 0
2 | 2021-08-20 | 10:10:02 | 8 | 1
3 | 2021-08-20 | 10:10:02 | 9 | 2
(6 rows)
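The reshaping both answers perform (split the name into an ID and a sub-ID, then collect the values of one ID and timestamp into a single row) can be sketched in plain Python. This is only an illustration of the logic, not a replacement for the SQL; the function name `pivot_rows` is hypothetical, and it assumes every name has the form `<id>.<sub-id>` with a single dot.

```python
from collections import defaultdict


def pivot_rows(rows):
    """rows: iterable of (name, value, time_stamp) with name like '1.aa'."""
    grid = defaultdict(dict)
    for name, value, time_stamp in rows:
        item_id, sub_id = name.split(".", 1)
        # one output row per (time_stamp, id); each sub-id becomes a column
        grid[(time_stamp, item_id)][sub_id] = value
    return [
        (item_id, time_stamp, cols.get("aa"), cols.get("bb"))
        for (time_stamp, item_id), cols in sorted(grid.items())
    ]


data = [
    ("1.aa", 1, "2021-08-20 10:10:01"), ("2.aa", 2, "2021-08-20 10:10:01"),
    ("3.aa", 3, "2021-08-20 10:10:01"), ("1.bb", 4, "2021-08-20 10:10:01"),
    ("2.bb", 5, "2021-08-20 10:10:01"), ("3.bb", 6, "2021-08-20 10:10:01"),
    ("1.aa", 7, "2021-08-20 10:10:02"), ("2.aa", 8, "2021-08-20 10:10:02"),
    ("3.aa", 9, "2021-08-20 10:10:02"), ("1.bb", 0, "2021-08-20 10:10:02"),
    ("2.bb", 1, "2021-08-20 10:10:02"), ("3.bb", 2, "2021-08-20 10:10:02"),
]
pivoted = pivot_rows(data)
for row in pivoted:
    print(row)
```

A missing sub-ID simply shows up as `None`, which corresponds to the NULL that `MAX(CASE ...)` or the LEFT JOIN would return.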

Get last row from group, limit number of results in PostgreSQL

I have a table with records representing a log; I omit the rest of the columns in this example.
The id column is auto-incremented, and item_id represents an item in the app.
I need to get the latest item_ids, for example the last two or three.
CREATE TABLE "log" (
"id" INT,
"item_id" INT
);
-- TRUNCATE TABLE "log";
INSERT INTO "log" ("id", "item_id") VALUES
(1, 1),
(2, 2),
(3, 1),
(4, 1),
(5, 3),
(6, 3);
Basic query will list all results, latest at the top:
SELECT *
FROM "log"
ORDER BY "id" DESC
id item_id
6 3
5 3
4 1
3 1
2 2
1 1
I would like to have just two (LIMIT 2) last item_ids with their id. Last means - inserted last (ORDER BY id).
id item_id
6 3
4 1
Last three would be
id item_id
6 3
4 1
2 2
Once an item_id is returned, it is not returned again. So LIMIT 4 would return only three rows because there are only three unique item_id.
I am probably missing something. I already tried various combinations of DISTINCT ON, GROUP BY, LIMIT etc.
UPDATE #1:
After I tested the query by S-man (below), I found out that it works for the data I provided; however, it does not work in general, for another set of data (a sequence of item_id A, B and A again). Here is another data set:
TRUNCATE TABLE "log";
INSERT INTO "log" ("id", "item_id") VALUES
(1, 1),
(2, 2),
(3, 3),
(4, 3),
(5, 1),
(6, 3);
Data in DB, ordered by id desc:
id item_id
6 3
5 1
4 3
3 3
2 2
1 1
Expected result for last three item_id
6 3
5 1
2 2
Well, after three changes, now we come back to the very first idea:
Just take DISTINCT ON:
SELECT
*
FROM (
SELECT DISTINCT ON (item_id) -- 1
*
FROM log
ORDER BY item_id, id DESC
) s
ORDER BY id DESC -- 2
LIMIT 2
1. DISTINCT ON returns exactly one record per ordered group. Your group is the item_id and the order is id DESC, so you get the highest id for each item_id.
2. Reorder by id DESC (instead of the previous ordering by item_id) and limit your query output.
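The two steps can be sketched in plain Python to check them against both data sets from the question. This is only an illustration of the logic; the function name `last_items` is hypothetical.

```python
def last_items(rows, limit):
    """rows: iterable of (id, item_id). Returns the newest row per item_id."""
    latest = {}
    # Step 1 (DISTINCT ON): keep the highest id seen for each item_id
    for row_id, item_id in rows:
        if item_id not in latest or row_id > latest[item_id]:
            latest[item_id] = row_id
    # Step 2: reorder by id descending and apply the limit
    ranked = sorted(((row_id, item_id) for item_id, row_id in latest.items()),
                    reverse=True)
    return ranked[:limit]


first_set = [(1, 1), (2, 2), (3, 1), (4, 1), (5, 3), (6, 3)]
second_set = [(1, 1), (2, 2), (3, 3), (4, 3), (5, 1), (6, 3)]

print(last_items(first_set, 2))
print(last_items(second_set, 3))
```

Asking for more rows than there are distinct item_ids returns only one row per item_id, as required.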

Improve performance on CTE with sub-queries

I have a table with this structure:
WorkerID Value GroupID Sequence Validity
1 '20%' 1 1 2018-01-01
1 '10%' 1 1 2017-06-01
1 'Yes' 1 2 2017-06-01
1 '2018-01-01' 2 1 2017-06-01
1 '17.2' 2 2 2017-06-01
2 '10%' 1 1 2017-06-01
2 'No' 1 2 2017-06-01
2 '2016-03-01' 2 1 2017-06-01
2 '15.9' 2 2 2017-06-01
This structure was created so that the client can create customized data for a worker. For example, Group 1 can be something like "Salary", and Sequence is one value that belongs to that Group, like "Overtime Compensation". The column Value is a VARCHAR(150) field; the correct validation and conversion are done in another part of the application.
The Validity column exists mainly for historical reasons.
Now I would like to show, for the different workers, the information in a grid where each row should be one worker (displaying the one with the most recent Validity):
Worker 1_1 1_2 2_1 2_2
1 20% Yes 2018-01-01 17.2
2 10% No 2016-03-01 15.9
To accomplish this I created a CTE that looks like this:
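WITH CTE_worker_grid
AS
(
SELECT
wk.WorkerID AS worker,
/* 1 */
(
SELECT TOP 1 w.Value
FROM worker_values AS w
WHERE w.WorkerID = wk.WorkerID
AND w.GroupID = 1
AND w.Sequence = 1
ORDER BY w.Validity DESC
) AS [1_1],
(
SELECT TOP 1 w.Value
FROM worker_values AS w
WHERE w.WorkerID = wk.WorkerID
AND w.GroupID = 1
AND w.Sequence = 2
ORDER BY w.Validity DESC
) AS [1_2],
/* 2 */
(
SELECT TOP 1 w.Value
FROM worker_values AS w
WHERE w.WorkerID = wk.WorkerID
AND w.GroupID = 2
AND w.Sequence = 1
ORDER BY w.Validity DESC
) AS [2_1],
(
SELECT TOP 1 w.Value
FROM worker_values AS w
WHERE w.WorkerID = wk.WorkerID
AND w.GroupID = 2
AND w.Sequence = 2
ORDER BY w.Validity DESC
) AS [2_2]
-- Assumed source of the worker list; the original post omitted the FROM clause.
FROM (SELECT DISTINCT WorkerID FROM worker_values) AS wk
)
SELECT * FROM CTE_worker_grid;
GO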
This produces the correct result, but it's very slow, as it creates this grid for over 18,000 workers with almost 30 Groups and up to 20 Sequences in each Group.
How could one speed up a CTE of this magnitude? Should a CTE even be used? Can the sub-queries be changed or refactored to speed up the execution?
Use a PIVOT!
+----------+---------+---------+------------+---------+
| WorkerId | 001_001 | 001_002 | 002_001 | 002_002 |
+----------+---------+---------+------------+---------+
| 1 | 20% | Yes | 2018-01-01 | 17.2 |
| 2 | 10% | No | 2016-03-01 | 15.9 |
+----------+---------+---------+------------+---------+
SQL Fiddle: http://sqlfiddle.com/#!18/6e768/1
CREATE TABLE WorkerAttributes
(
WorkerID INT NOT NULL
, [Value] VARCHAR(50) NOT NULL
, GroupID INT NOT NULL
, [Sequence] INT NOT NULL
, Validity DATE NOT NULL
)
INSERT INTO WorkerAttributes
(WorkerID, Value, GroupID, Sequence, Validity)
VALUES
(1, '20%', 1, 1, '2018-01-01')
, (1, '10%', 1, 1, '2017-06-01')
, (1, 'Yes', 1, 2, '2017-06-01')
, (1, '2018-01-01', 2, 1, '2017-06-01')
, (1, '17.2', 2, 2, '2017-06-01')
, (2, '10%', 1, 1, '2017-06-01')
, (2, 'No', 1, 2, '2017-06-01')
, (2, '2016-03-01', 2, 1, '2017-06-01')
, (2, '15.9', 2, 2, '2017-06-01')
;WITH CTE_WA_RANK
AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY WorkerID, GroupID, [Sequence] ORDER BY Validity DESC) AS VersionNumber
, WA.WorkerID
, WA.GroupID
, WA.[Sequence]
, WA.[Value]
FROM
WorkerAttributes AS WA
),
CTE_WA
AS
(
SELECT
WA_RANK.WorkerID
, RIGHT('000' + CAST(WA_RANK.GroupID AS VARCHAR(3)), 3)
+ '_'
+ RIGHT('000' + CAST(WA_RANK.[Sequence] AS VARCHAR(3)), 3) AS SMART_KEY
, WA_RANK.[Value]
FROM
CTE_WA_RANK AS WA_RANK
WHERE
WA_RANK.VersionNumber = 1
)
SELECT
WorkerId
, [001_001] AS [001_001]
, [001_002] AS [001_002]
, [002_001] AS [002_001]
, [002_002] AS [002_002]
FROM
(
SELECT
CTE_WA.WorkerId
, CTE_WA.SMART_KEY
, CTE_WA.[Value]
FROM
CTE_WA
) AS WA
PIVOT
(
MAX([Value])
FOR
SMART_KEY IN
(
[001_001]
, [001_002]
, [002_001]
, [002_002]
)
) AS PVT
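The reason this scales better than one sub-query per cell is that the table is read once: first keep the newest row per (WorkerID, GroupID, Sequence), then spread those rows into columns. A minimal Python sketch of the same two steps, purely to illustrate the logic (the function name `build_grid` is hypothetical, and Validity is assumed to be an ISO date string so that string comparison matches date order):

```python
def build_grid(rows):
    """rows: iterable of (worker_id, value, group_id, sequence, validity)."""
    latest = {}
    # Step 1 (ROW_NUMBER ... = 1): keep the most recent value per cell
    for worker, value, group, seq, validity in rows:
        key = (worker, group, seq)
        if key not in latest or validity > latest[key][0]:
            latest[key] = (validity, value)
    # Step 2 (PIVOT): one dict per worker, keyed by a zero-padded "GGG_SSS" label
    grid = {}
    for (worker, group, seq), (_, value) in latest.items():
        grid.setdefault(worker, {})[f"{group:03d}_{seq:03d}"] = value
    return grid


worker_rows = [
    (1, "20%", 1, 1, "2018-01-01"), (1, "10%", 1, 1, "2017-06-01"),
    (1, "Yes", 1, 2, "2017-06-01"), (1, "2018-01-01", 2, 1, "2017-06-01"),
    (1, "17.2", 2, 2, "2017-06-01"), (2, "10%", 1, 1, "2017-06-01"),
    (2, "No", 1, 2, "2017-06-01"), (2, "2016-03-01", 2, 1, "2017-06-01"),
    (2, "15.9", 2, 2, "2017-06-01"),
]
grid = build_grid(worker_rows)
for worker_id in sorted(grid):
    print(worker_id, grid[worker_id])
```

Unlike the static PIVOT column list, the dictionary picks up new Group/Sequence pairs without any change; in T-SQL that flexibility would need dynamic SQL.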

Postgres query: using a previous dynamically created column value in the next row

I'm trying to implement what I have in code as a postgres query.
The following example isn't exactly what we're trying to do but I hope it shows how I'm trying to use the value from a previously calculated row in the next.
A sample table to help me demonstrate what I'm trying to do :
test=# select * from test ;
id | field1 | field2 | field3 | score
----+--------+--------+--------+-------
1 | 1 | 3 | 2 | 1.25
2 | 1 | -1 | 1 |
3 | 2 | 1 | 5 |
4 | 3 | -2 | 4 |
Here's the query in progress:
select id,
coalesce (
score,
case when lag_field3 = 2 then 0.25*(3*field1+field2) end
) as new_score
from (
select id, field1, field2, field3, score,
lag (field3) over (order by id) as lag_field3
from test
) inner1 ;
Which returns what I want so far ...
id | new_score
----+-----------
1 | 1.25
2 | 0.5
3 |
4 |
The next iteration of the query:
select id,
coalesce (
score,
case when lag_field3 = 2 then 0.25*(3*field1+field2) end,
case when field1 = 2 then 0.75 * lag (new_score) end
) as new_score
from (
select id, field1, field2, field3, score,
lag (field3) over (order by id) as lag_field3
from test
) inner1 ;
The difference is this :
case when field1 = 2 then 0.75 * lag (new_score) end
I know and understand why this won't work.
I've aliased the calculated field as new_score and when field1 = 2, I want 0.75 * the previous row's new_score value.
I understand that new_score is an alias and can't be used.
Is there some way I can accomplish this? I could try to copy that expression, wrap a lag around it, alias that as something else and try to work with that but that would get very messy.
Any ideas?
Many thanks.
Postgres lets you use window functions inside CASE expressions. You were probably missing the OVER (ORDER BY id) part. You can also define different windows. Window functions can be combined with GROUP BY, but they are evaluated after grouping, so they only see the grouped rows. Window function calls cannot be nested, though, so you have to compute the intermediate value in a subquery or CTE first.
Here's the query:
SELECT id, COALESCE(tmp_score,
CASE
WHEN field1 = 2
THEN 0.75 * LAG(tmp_score) OVER (ORDER BY id)
-- missing ELSE statement here
END
) AS new_score
FROM (
SELECT id, field1,
COALESCE (
score,
CASE
WHEN LAG(field3) OVER (ORDER BY id) = 2
THEN 0.25*(3*field1+field2)
END
) AS tmp_score
FROM test
) inner1
The code to create and populate the table:
CREATE TABLE test(
id int,
field1 int,
field2 int,
field3 int,
score numeric
);
INSERT INTO test VALUES
(1, 1, 3, 2, 1.25),
(2, 1, -1, 1, NULL),
(3, 2, 1, 5, NULL),
(4, 3, -2, 4, NULL);
The query returns this output:
id | new_score
----+-----------
1 | 1.25
2 | 0.50
3 | 0.3750
4 |
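The query works in two passes: the inner SELECT computes tmp_score, and the outer SELECT looks one row back at tmp_score (not at new_score itself). A minimal Python sketch of the same two passes, only to make the data flow explicit; the function name `score_rows` is hypothetical.

```python
def score_rows(rows):
    """rows: list of (id, field1, field2, field3, score), ordered by id."""
    # Pass 1 (inner SELECT): existing score, else the lag(field3) = 2 rule
    tmp_scores = []
    prev_field3 = None
    for _id, field1, field2, field3, score in rows:
        tmp = score
        if tmp is None and prev_field3 == 2:
            tmp = 0.25 * (3 * field1 + field2)
        tmp_scores.append(tmp)
        prev_field3 = field3

    # Pass 2 (outer SELECT): fall back to 0.75 * previous row's tmp_score
    result = []
    for index, (_id, field1, _f2, _f3, _score) in enumerate(rows):
        new_score = tmp_scores[index]
        previous = tmp_scores[index - 1] if index > 0 else None
        if new_score is None and field1 == 2 and previous is not None:
            new_score = 0.75 * previous
        result.append((_id, new_score))
    return result


test_rows = [
    (1, 1, 3, 2, 1.25),
    (2, 1, -1, 1, None),
    (3, 2, 1, 5, None),
    (4, 3, -2, 4, None),
]
scored = score_rows(test_rows)
for row in scored:
    print(row)
```

Because pass 2 reads tmp_score, a value that only exists after pass 2 is not visible to the following row. If the rule has to chain over several rows, each level needs another nesting step (or a recursive query).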

Calculate total spread covered by several ranges

I have a table where each record has an indicator and a range, and I want to know the total spread covered by the ranges for each indicator -- but not double-counting when ranges overlap for a certain indicator.
I can see that the wording is hard to follow, but the concept is pretty simple. Let me provide an illustrative example.
CREATE TABLE records(id int, spread int4range);
INSERT INTO records VALUES
(1, int4range(1, 4)),
(1, int4range(2, 7)),
(1, int4range(11, 15)),
(2, int4range(3, 5)),
(2, int4range(6, 10));
SELECT * FROM records;
Yielding the output:
id | spread
----+---------
1 | [1,4)
1 | [2,7)
1 | [11,15)
2 | [3,5)
2 | [6,10)
(5 rows)
I would now like a query which gives the following output:
id | total
----+-------
1 | 10
2 | 6
Where did the numbers 10 and 6 come from? For ID 1, we have ranges that include 1, 2, 3, 4, 5, 6, 11, 12, 13, and 14; a total of 10 distinct integers. For ID 2, we have ranges that include 3, 4, 6, 7, 8, and 9; a total of six distinct integers.
If it helps you understand the problem, you might imagine it as something like "if these records represent the day and time range for meetings on my calendar, how many total hours in each day are there where I'm booked at least once?"
Postgres version is 9.4.8, in case that matters.
select id, count(*)
from (
select distinct id, generate_series(lower(spread), upper(spread) - 1)
from records
) s
group by id
;
id | count
----+-------
1 | 10
2 | 6
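The query expands every range into its integers and counts the distinct ones per id. The same idea in a minimal Python sketch, only as an illustration (the function name `covered_counts` is hypothetical; ranges are given as half-open `[lower, upper)` pairs, matching how int4range is displayed above):

```python
def covered_counts(records):
    """records: iterable of (id, lower, upper) with half-open [lower, upper)."""
    covered = {}
    for rec_id, lower, upper in records:
        # a set removes the double counting where ranges overlap
        covered.setdefault(rec_id, set()).update(range(lower, upper))
    return {rec_id: len(values) for rec_id, values in covered.items()}


records = [(1, 1, 4), (1, 2, 7), (1, 11, 15), (2, 3, 5), (2, 6, 10)]
totals = covered_counts(records)
print(totals)
```

Like generate_series, this materializes every integer in every range, so it is fine for small ranges such as hours in a day but gets expensive for very wide ones.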