Joining 2 subsets of same table with different conditions - hiveql

ID   Timestamp            type  account
212  2021-01-06 14:47:35  019   ALA058748
212  2021-01-07 18:34:44  021   API305575
212  2021-01-07 22:34:48  021   XYZ565656
212  2021-01-08 00:31:25  021   API305575
212  2021-01-08 00:31:31  021   API305575
212  2021-01-08 00:34:44  020   API305575
123  2021-05-21 03:34:44  021   API305575
123  2021-05-21 05:34:44  019   API305575
123  2021-05-21 09:34:44  021   API305575
123  2021-05-21 03:34:44  020   PQR464646
I have a table like the one above.
I need to choose only those IDs for which:
Step 1) the MINIMUM(Timestamp) with type = 021 for an ID --- say X
Step 2) the (Timestamp) with type = 020 and with the same ID and account as in X --- say Y
WHERE (Y - X) in minutes > 30
In this example only ID 212 will be selected, since for ID 123 the account with MIN(Timestamp) where type = 021 <> the account with type = 020.
Thank You

Schema:
create table t
(
ID int,
Timestamp datetime,
type int,
account varchar(50)
);
insert into t values(212, '2021-01-06 14:47:35', 019, 'ALA058748');
insert into t values(212, '2021-01-07 18:34:44', 021, 'API305575');
insert into t values(212, '2021-01-07 22:34:48', 021, 'XYZ565656');
insert into t values(212, '2021-01-08 00:31:25', 021, 'API305575');
insert into t values(212, '2021-01-08 00:31:31', 021, 'API305575');
insert into t values(212, '2021-01-08 00:34:44', 020, 'API305575');
insert into t values(123, '2021-05-21 03:34:44', 021, 'API305575');
insert into t values(123, '2021-05-21 05:34:44', 019, 'API305575');
insert into t values(123, '2021-05-21 09:34:44', 021, 'API305575');
insert into t values(123, '2021-05-21 03:34:44', 020, 'PQR464646');
Query #1 for MySQL:
select id
from t
group by id
having TIMESTAMPDIFF(minute, min(case when type = '021' then timestamp end),
min(case when type = '020' then timestamp end))>30
Query #2 for SQL Server:
select id
from t
group by id
having datediff(minute, min(case when type = '021' then timestamp end),
min(case when type = '020' then timestamp end))>30
Output:
id
212
db<>fiddle here

You can do what you want using aggregation and filtering. Date functions are notoriously database dependent, so the "30 minutes" logic probably differs on your database:
select id, account
from t
group by id, account
having min(timestamp) = min(case when type = '021' then timestamp end) and
min(timestamp) + interval '30 minute' < min(case when type = '020' then timestamp end)
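
Since the question is tagged hiveql while the queries above target MySQL and SQL Server, here is a hedged HiveQL sketch of the same two-step logic, including the same-account rule from the question. It assumes the table and columns from the schema above and that Timestamp can be parsed by unix_timestamp() (i.e. it is a string or timestamp in 'yyyy-MM-dd HH:mm:ss' form); Hive has no TIMESTAMPDIFF, so the 30-minute gap is computed from epoch seconds:
-- sketch only: step 1 finds the earliest type-021 row per ID (keeping its account);
-- step 2 joins it to a type-020 row for the same ID and account more than 30 minutes later
with x as (
  select id, account, `timestamp` as ts
  from (
    select id, account, `timestamp`,
           row_number() over (partition by id order by `timestamp`) as rn
    from t
    where type = '021'
  ) s
  where rn = 1
)
select distinct x.id
from x
join t y
  on y.id = x.id
 and y.account = x.account
where y.type = '020'
  and (unix_timestamp(y.`timestamp`) - unix_timestamp(x.ts)) / 60 > 30;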

Related

PostgreSQL: GROUP BY and ORDER BY, whole dataset as a result

In a Postgres database I have a table with the following columns:
ID (Primary Key)
Code
Date
I'm trying to extract data ordered by Date and grouped by Code so that the most recent date will determine what code rows should be grouped first and so forth (if it makes sense). An example:
007 2022-01-04
007 2022-01-01
007 2021-12-19
002 2022-01-03
002 2021-12-02
002 2021-11-15
035 2022-01-01
035 2021-11-30
035 2021-05-03
001 2021-12-31
022 2021-12-07
076 2021-11-19
I thought I could achieve this with the following query:
SELECT * FROM Table
GROUP BY Table.Code
ORDER BY Table.Date DESC
but this gives me
ERROR: column "Table.ID" must appear in the GROUP BY clause or be used in an aggregate function
And if I add the column ID to the GROUP BY the result I get is just a list ordered by Date with all the Codes mixed.
Is there any way to achieve what I want?
Edit 3
A more elegant solution, using max() over (partition by):
SELECT
"Code",
"Date"
FROM
"Table"
ORDER BY
max("Date") over (partition by "Code") DESC,
"Table"."Date" DESC
;
Output:
Code  Date
007   2022-01-04T00:00:00Z
007   2022-01-01T00:00:00Z
007   2021-12-19T00:00:00Z
002   2022-01-03T00:00:00Z
002   2021-12-02T00:00:00Z
002   2021-11-15T00:00:00Z
035   2022-01-01T00:00:00Z
035   2021-11-30T00:00:00Z
035   2021-05-03T00:00:00Z
001   2021-12-31T00:00:00Z
022   2021-12-07T00:00:00Z
076   2021-11-19T00:00:00Z
Edit 2:
I join a subquery "b" with the entire dataset. The subquery "b" is used only for sorting and is essentially what you tried.
With "b" as
( select
"Code",
max("Date") as "Date"
from
"Table"
group by
"Code"
)
SELECT
"Table"."Code",
"Table"."Date"
FROM
"Table" left join "b" on "Table"."Code" = "b"."Code"
ORDER BY
"b"."Date" desc,
"Table"."Date" DESC;
Output:
Code  Date
007   2022-01-04T00:00:00Z
007   2022-01-01T00:00:00Z
007   2021-12-19T00:00:00Z
002   2022-01-03T00:00:00Z
002   2021-12-02T00:00:00Z
002   2021-11-15T00:00:00Z
035   2022-01-01T00:00:00Z
035   2021-11-30T00:00:00Z
035   2021-05-03T00:00:00Z
001   2021-12-31T00:00:00Z
022   2021-12-07T00:00:00Z
076   2021-11-19T00:00:00Z
Edit 1
Every column in the SELECT list must either appear in the GROUP BY clause or be wrapped in an aggregate function.
The example below shows a way to fix the error on your data.
Table with ID:
CREATE TABLE "Table" (
"ID" serial not null primary key,
"Code" varchar,
"Date" timestamp
);
INSERT INTO "Table"
("Code", "Date")
VALUES
('007', '2022-01-04 00:00:00'),
('007', '2022-01-01 00:00:00'),
('007', '2021-12-19 00:00:00'),
('002', '2022-01-03 00:00:00'),
('002', '2021-12-02 00:00:00'),
('002', '2021-11-15 00:00:00'),
('035', '2022-01-01 00:00:00'),
('035', '2021-11-30 00:00:00'),
('035', '2021-05-03 00:00:00'),
('001', '2021-12-31 00:00:00'),
('022', '2021-12-07 00:00:00'),
('076', '2021-11-19 00:00:00')
;
Select:
SELECT * FROM "Table" ORDER BY "Code", "Date" DESC;
Output:
ID  Code  Date
10  001   2021-12-31T00:00:00Z
4   002   2022-01-03T00:00:00Z
5   002   2021-12-02T00:00:00Z
6   002   2021-11-15T00:00:00Z
1   007   2022-01-04T00:00:00Z
2   007   2022-01-01T00:00:00Z
3   007   2021-12-19T00:00:00Z
11  022   2021-12-07T00:00:00Z
7   035   2022-01-01T00:00:00Z
8   035   2021-11-30T00:00:00Z
9   035   2021-05-03T00:00:00Z
12  076   2021-11-19T00:00:00Z
Original Answer
First, select the column that you want to group by (e.g. Code) and the column that you want to apply an aggregate function to (Date).
Second, list the columns that you want to group by in the GROUP BY clause.
In the ORDER BY clause, use the same logic as in the SELECT clause.
https://www.postgresqltutorial.com/postgresql-group-by/
Tables:
CREATE TABLE "Table"
("Code" int, "Date" timestamp)
;
INSERT INTO "Table"
("Code", "Date")
VALUES
(007, '2022-01-04 00:00:00'),
(007, '2022-01-01 00:00:00'),
(007, '2021-12-19 00:00:00'),
(002, '2022-01-03 00:00:00'),
(002, '2021-12-02 00:00:00'),
(002, '2021-11-15 00:00:00'),
(035, '2022-01-01 00:00:00'),
(035, '2021-11-30 00:00:00'),
(035, '2021-05-03 00:00:00'),
(001, '2021-12-31 00:00:00'),
(022, '2021-12-07 00:00:00'),
(076, '2021-11-19 00:00:00')
;
Select
SELECT
"Table"."Code",
max("Table"."Date")
FROM
"Table"
GROUP BY
"Table"."Code"
ORDER BY
max("Table"."Date") DESC
Output:
Code  max
7     2022-01-04T00:00:00Z
2     2022-01-03T00:00:00Z
35    2022-01-01T00:00:00Z
1     2021-12-31T00:00:00Z
22    2021-12-07T00:00:00Z
76    2021-11-19T00:00:00Z

postgresql select rows from same table twice

I want to compare the deposit for each person in the table
and return all the rows where the deposit field has decreased.
Here is what I have done so far.
The customer table is:
person_id employee_id deposit ts
101 201 44 2021-09-30 10:12:19+00
100 200 45 2021-09-30 10:12:19+00
101 201 47 2021-09-30 09:12:19+00
100 200 21 2021-09-29 10:12:19+00
104 203 54 2021-09-27 10:12:19+00
and as a result what I want is:
person_id employee_id deposit ts
101 201 44 2021-09-30 10:12:19+00
SELECT person_id,
employee_id,
deposit,
ts,
lag(deposit) over client_window as pre_deposit,
lag(ts) over client_window as pre_ts
FROM customer
WINDOW client_window as (partition by person_id order by ts)
ORDER BY person_id , ts
so it returns a table with the following results:
person_id employee_id deposit ts pre_deposit pre_ts
101 201 44 2021-09-30 10:12:19+00 47 2021-09-30 09:12:19+00
100 200 45 2021-09-30 10:12:19+00 21 2021-09-29 10:12:19+00
101 201 47 2021-09-30 09:12:19+00 null null
100 200 21 2021-09-29 10:12:19+00 null null
104 203 54 2021-09-27 10:12:19+00 null null
SELECT person_id,
employee_id,
deposit,
ts,
lag(deposit) over client_window as pre_deposit,
lag(ts) over client_window as pre_ts
FROM customer
WHERE pre_deposit > deposit -- this returns column not found for pre_deposit
WINDOW client_window as (partition by person_id order by ts)
ORDER BY person_id , ts
It seems that somehow I need to select from the same table again to be able to apply this condition:
where pre_deposit > deposit
What makes sense here?
A union? An outer join? A left join? A right join?
Window functions are evaluated after the WHERE clause, so pre_deposit cannot be referenced there. Use your query as a subquery and filter the results:
SELECT person_id, employee_id, deposit, ts
FROM (
SELECT *, lag(deposit) over client_window as pre_deposit
FROM customer
WINDOW client_window as (partition by person_id order by ts)
) t
WHERE deposit < pre_deposit
ORDER BY person_id, ts;
See the demo.

bulk update one table using value in another table

Say I find a max value of LIM50177 in my database:
lim_id
LIM50172
LIM50173
LIM50174
LIM50175
LIM50176
LIM50177
How can I loop through another table and, for every base_id, bulk-replace the temp_id with a new lim_id?
temp_id  base_id  desc
1008     720      GP
1009     721      GT
1010     722      GA
1021     723      P
1021     724      G
1021     725      X
In other words
The data will be updated as follows:
temp_id   base_id  desc
LIM50178  720      GP
LIM50179  721      GT
LIM50180  722      GA
LIM50181  723      P
LIM50182  724      G
LIM50183  725      X
Use a sequence every time you generate the lim_id values so you get unique values.
(Don't derive the next value from the maximum already stored in the other table: if two sessions update the table at the same time and each bases its "next" value on the maximum it can see, neither session sees the other's uncommitted rows and both can generate identical "next" values. Instead, always take the next value from the sequence.)
Oracle Setup:
CREATE SEQUENCE lim_id_seq START WITH 50178;
CREATE TABLE temp_data ( temp_id, base_id, "desc" ) AS
SELECT CAST( 1008 AS VARCHAR2(10) ), 720, 'GP' FROM DUAL UNION ALL
SELECT CAST( 1009 AS VARCHAR2(10) ), 721, 'GT' FROM DUAL UNION ALL
SELECT CAST( 1010 AS VARCHAR2(10) ), 722, 'GA' FROM DUAL UNION ALL
SELECT CAST( 1021 AS VARCHAR2(10) ), 723, 'P' FROM DUAL UNION ALL
SELECT CAST( 1021 AS VARCHAR2(10) ), 724, 'G' FROM DUAL UNION ALL
SELECT CAST( 1021 AS VARCHAR2(10) ), 725, 'X' FROM DUAL
Update using the Sequence:
UPDATE temp_data
SET temp_id = 'LIM' || lim_id_seq.NEXTVAL;
Result:
SELECT * FROM temp_data;
TEMP_ID | BASE_ID | desc
:------- | ------: | :---
LIM50178 | 720 | GP
LIM50179 | 721 | GT
LIM50180 | 722 | GA
LIM50181 | 723 | P
LIM50182 | 724 | G
LIM50183 | 725 | X
db<>fiddle here

SQL Server - Renumber in Order

I have a table in which I need to renumber a column while keeping the original order by date.
TABLE_1
id num_seq DateTimeStamp
fb4e1683-7035-4895-b2c8-d084d9b42ce3 111 08-02-2005
e40e4c3e-65e4-47b7-b13a-79e8bce2d02d 114 10-07-2017
49e261a8-a855-4844-a0ac-37b313da2222 113 01-30-2010
6c4bffb7-a056-4a20-ae1c-5a31bdf683f2 112 04-15-2006
I want to reorder num_seq starting with 1001 through 1004 and keep the numbering in order. So 111 = 1001 and 112 = 1002 and so forth.
This is what I have so far:
DECLARE @num INT
SET @num = 0
UPDATE Table_1
SET @num = num_seq = @num + 1
GO
I know that UPDATE doesn't let me use the keyword ORDER BY. Is there a way to do this in SQL 2008 R2?
Stage the new num_seq in a CTE, then leverage that in your update statement:
declare @Table_1 table (id uniqueidentifier, num_seq int, DateTimeStamp datetime);
insert into @Table_1
values
('fb4e1683-7035-4895-b2c8-d084d9b42ce3', 111, '08-02-2005'),
('e40e4c3e-65e4-47b7-b13a-79e8bce2d02d', 114, '10-07-2017'),
('49e261a8-a855-4844-a0ac-37b313da2222', 113, '01-30-2010'),
('6c4bffb7-a056-4a20-ae1c-5a31bdf683f2', 112, '04-15-2006');
;with stage as
(
select *,
num_seq_new = 1000 + row_number()over(order by DateTimeStamp asc)
from @Table_1
)
update stage
set num_seq = num_seq_new;
select * from @Table_1
Returns:
id num_seq DateTimeStamp
FB4E1683-7035-4895-B2C8-D084D9B42CE3 1001 2005-08-02 00:00:00.000
E40E4C3E-65E4-47B7-B13A-79E8BCE2D02D 1004 2017-10-07 00:00:00.000
49E261A8-A855-4844-A0AC-37B313DA2222 1003 2010-01-30 00:00:00.000
6C4BFFB7-A056-4A20-AE1C-5A31BDF683F2 1002 2006-04-15 00:00:00.000
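Applied to the actual Table_1 from the question (a hedged sketch; the table and column names are taken from the post above), the same pattern is:
;with stage as
(
select num_seq,
num_seq_new = 1000 + row_number()over(order by DateTimeStamp asc)
from Table_1
)
-- updating through the CTE writes the new numbering back to Table_1
update stage
set num_seq = num_seq_new;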

TSQL: How to apply Condition to sub grouping

Imagine I have the following table, with multiple codes for a single person over different periods (id is the primary key):
id code Name Start Finish
325 1353 Bob NULL 2012-07-03 16:21:16.067
1742 1353 Bob 2012-07-03 16:21:16.067 2012-08-03 15:56:29.897
1803 1353 Bob 2012-08-03 15:56:29.897 NULL
17 575 Bob NULL NULL
270 834 Bob NULL 2012-07-20 15:51:19.913
1780 834 Bob 2012-07-20 15:51:19.913 2012-07-26 16:26:54.413
1789 834 Bob 2012-07-26 16:26:54.413 2012-08-21 15:36:58.940
1830 834 Bob 2012-08-21 15:36:58.940 2012-08-24 14:26:05.890
1835 834 Bob 2012-08-24 14:26:05.890 2012-08-30 12:01:05.313
1838 123 Bob 2012-08-30 12:01:05.313 2012-09-05 09:29:02.497
1844 900 Bob 2012-09-05 09:29:02.497 NULL
What I want to do is update the table so that the code is taken from the latest person.
id code Name Start Finish
325 900 Bob NULL 2012-07-03 16:21:16.067
1742 900 Bob 2012-07-03 16:21:16.067 2012-08-03 15:56:29.897
1803 900 Bob 2012-08-03 15:56:29.897 NULL
17 900 Bob NULL NULL
270 900 Bob NULL 2012-07-20 15:51:19.913
1780 900 Bob 2012-07-20 15:51:19.913 2012-07-26 16:26:54.413
1789 900 Bob 2012-07-26 16:26:54.413 2012-08-21 15:36:58.940
1830 900 Bob 2012-08-21 15:36:58.940 2012-08-24 14:26:05.890
1835 900 Bob 2012-08-24 14:26:05.890 2012-08-30 12:01:05.313
1838 900 Bob 2012-08-30 12:01:05.313 2012-09-05 09:29:02.497
1844 900 Bob 2012-09-05 09:29:02.497 NULL
The latest person is defined as the row with the latest (max?) Start AND (Finish IS NULL OR Finish >= GETDATE()) within the group of rows with the same Name and Code.
In the example above that is the row where id = 1844 (within the group for Bob it has the latest Start and its Finish is NULL).
I'm pretty sure this is possible with a single statement, but I can't see how to define 'latest person' such that I can join it back to get the rows I want to update.
Edit: Please note that I cannot rely on the ordering of the Id column, only the date columns.
Something like this will do:
update this set code = (
select top (1) that.code from table1 that
where that.name = this.name -- match on name
and (that.Finish is null or that.Finish >= getdate()) -- filter for current rows only
order by that.Start desc, that.id desc -- rank by start, break ties with id
)
from table1 this
I hope your table is well indexed, and/or not too big, because this is expensive to do in one step.
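If it helps, a hypothetical supporting index along these lines (names assumed, not from the original post) would let the correlated lookup seek on name and read rows already ordered by Start:
-- hypothetical index; adjust to your real table and column names
create index ix_table1_name_start on table1 (name, Start desc) include (Finish, code);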
Alternate form, using OUTER APPLY, and more easily extensible:
update this set code = that.code
from table1 this
outer apply (
select top (1) that.code from table1 that
where that.name = this.name -- match on name
and (that.Finish is null or that.Finish >= getdate()) -- filter for current rows
order by that.Start desc, that.id desc -- rank by start, break ties with id
) that
Alternate method using windowing functions, without a join:
update this set code = _latest_code
from (
-- identify the latest code per name
select *, _latest_code = max(
case
when (finish is null or finish >= getdate())
and _row_number = 1
then code else null
end
) over (partition by name)
from (
-- identify the latest row per name
select *, _row_number = row_number() over (
partition by name order by
case when finish is null or finish >= getdate() then 0 else 1 end
, start desc, id desc)
from table1
) this
) this