Limiting/Aggregating Results When Selecting on an ID followed by a CASE - tsql

Trying to pull some data from a Log table.
Simplified version of the query I'm using below:
SELECT
type_id AS type_id
, CASE WHEN LogOperation = 'Insert' THEN CreateDT ELSE NULL END AS create_date
, CASE WHEN (AuditID > 0 OR AuditID IS NOT NULL) AND OldVal_AuditID IS NULL THEN LogTimeStamp ELSE NULL END AS audit_date
FROM TypeLOG
ORDER BY type_id
Sample result set:
|---------|-------------|------------|
| type_id | create_date | audit_date |
|---------|-------------|------------|
| 176 | 2019-06-01 | NULL |
| 177 | 2019-06-01 | NULL |
| 177 | NULL | 2019-06-03 |
| 178 | 2019-06-03 | NULL |
| 178 | NULL | 2019-06-04 |
| 178 | NULL | NULL |
How could I pull this data so only 1 row per type_id is shown? This is a log table so naturally other operations such as updates and deletes create multiple rows per type_id.
Related to this, there can theoretically be multiple 'Inserts' per type_id... I would want the first 'Insert' per type_id, I suppose based on the earliest LogTimeStamp.
How do I need to modify my query to reflect this?
Thanks!

You gotta do the CTE's or self joins it should look like that
WITH CTE1 AS (
SELECT TYPE_ID,
CASE WHEN LogOperation = 'Insert' THEN CreateDT ELSE NULL END AS create_date
FROM TypeLOG
), CTE2 (
SELECT TYPE_ID,
CASE WHEN (AuditID > 0 OR AuditID IS NOT NULL) AND OldVal_AuditID IS NULL THEN LogTimeStamp ELSE NULL END AS audit_date
FROM TypeLOG
)
SELECT CTE1.TYPE_ID, CREATE_DATE, AUDIT_DATE
FROM CTE1 JOIN CTE2
ON CTE1.TYPE_ID = CTE2.TYPE_ID
ORDER BY TYPE_ID

Related

how to drop rows if a variale is less than x, in sql

I have the following query code
query = """
with double_entry_book as (
SELECT to_address as address, value as value
FROM `bigquery-public-data.crypto_ethereum.traces`
WHERE to_address is not null
AND block_timestamp < '2022-01-01 00:00:00'
AND status = 1
AND (call_type not in ('delegatecall', 'callcode', 'staticcall') or call_type is null)
union all
-- credits
SELECT from_address as address, -value as value
FROM `bigquery-public-data.crypto_ethereum.traces`
WHERE from_address is not null
AND block_timestamp < '2022-01-01 00:00:00'
AND status = 1
AND (call_type not in ('delegatecall', 'callcode', 'staticcall') or call_type is null)
union all
)
SELECT address,
sum(value) / 1000000000000000000 as balance
from double_entry_book
group by address
order by balance desc
LIMIT 15000000
"""
In the last part, I want to drop rows where "balance" is less than, let's say, 0.02 and then group, order, etc. I imagine this should be a simple code. Any help will be appreciated!
We can delete on a CTE and use returning to get the id's of the rows being deleted, but they still exist until the transaction is comitted.
CREATE TABLE t (
id serial,
variale int);
insert into t (variale) values
(1),(2),(3),(4),(5);
✓
5 rows affected
with del as
(delete from t
where variale < 3
returning id)
select
t.id,
t.variale,
del.id ids_being_deleted
from t
left join del
on t.id = del.id;
id | variale | ids_being_deleted
-: | ------: | ----------------:
1 | 1 | 1
2 | 2 | 2
3 | 3 | null
4 | 4 | null
5 | 5 | null
select * from t;
id | variale
-: | ------:
3 | 3
4 | 4
5 | 5
db<>fiddle here

postgresql pivot using crosstab

I have trouble using crosstab() in postgresql-11.
Here is my table,
CREATE TABLE monitor(tz timestamptz, level int, table_name text, status text);
The table monitors events on other tables. It contains
table_name (table on which the event occurred)
timestamp(time at which the event occurred)
level (level of the event)
status of the event (start/end of the event)
Here is the sample data to it.
tz | level | status | table_name
----------------------------------+-------+--------+--------------
2019-10-24 16:18:34.89435+05:30 | 2 | start | test_table_2
2019-10-24 16:18:58.922523+05:30 | 2 | end | test_table_2
2019-11-01 10:31:08.948459+05:30 | 3 | start | test_table_3
2019-11-01 10:41:22.863529+05:30 | 3 | end | test_table_3
2019-11-01 10:51:44.009129+05:30 | 3 | start | test_table_3
2019-11-01 12:35:23.280294+05:30 | 3 | end | test_table_3
Given a timestamp, I want to list out all current events at that time. It could be done using the criteria,
start_time >= 'given_timestamp' and end_time <= 'given_timestamp'
So I tried to use crosstab() to pivot the table over columns table_name,status and timestamp. My query is,
with q1 (table_name, start_time,end_time) as
(select * from crosstab
('select table_name, status, tz from monitor ')
as finalresult (table_name text, start_time timestamptz, end_time timestamptz)),
q2 (level,start_time,end_time) as
(select * from crosstab('select level, status, tz from monitor ')
as finalresult (level int, start_time timestamptz, end_time timestamptz))
select q1.table_name,q2.level,q1.start_time,q1.end_time
from q1,q2
where q1.start_time=q2.start_time;
The output of the query is,
table_name | level | start_time | end_time
--------------+-------+----------------------------------+----------------------------------
test_table_2 | 2 | 2019-10-24 16:18:34.89435+05:30 | 2019-10-24 16:18:58.922523+05:30
test_table_3 | 3 | 2019-11-01 10:31:08.948459+05:30 | 2019-11-01 10:41:22.863529+05:30
But my expected output is,
table_name | level | start_time | end_time
--------------+-------+----------------------------------+----------------------------------
test_table_2 | 2 | 2019-10-24 16:18:34.89435+05:30 | 2019-10-24 16:18:58.922523+05:30
test_table_3 | 3 | 2019-11-01 10:31:08.948459+05:30 | 2019-11-01 10:41:22.863529+05:30
test_table_3 | 3 | 2019-11-01 10:51:44.009129+05:30 | 2019-11-01 12:35:23.280294+05:30
How do I achieve the expected output? Or is there any better way other than crosstab?
I would use a self join for this. To keep the rows on the same level and table together you can use a window function to assign numbers to them so they can be distinguished.
with numbered as (
select tz, level, table_name, status,
row_number() over (partition by table_name, status order by tz) as rn
from monitor
)
select st.table_name, st.level, st.tz as start_time, et.tz as end_time
from numbered as st
join numbered as et on st.table_name = et.table_name
and et.status = 'end'
and et.level = st.level
and et.rn = st.rn
where st.status = 'start'
order by st.table_name, st.level;
This assumes that there will never be a row with status = 'end' and an earlier timestamp then the corresponding row with status = 'start'
Online example: https://rextester.com/QYJK57764

Hierarchy trees in database and web app

I want to create web app which will use tree data structures. Users will be able to create, update and delete trees. I have the following table in PostgreSQL called nodes in database:
id INTEGER PRIMARY KEY,
name VARCHAR(50) NOT NULL UNIQUE,
parent_id INTEGER NULL REFERENCE nodes(id)
Getting data
I want to get data in the following form:
id | name | children
---|------|--------------
1 | a | [2,3]
2 | b | []
3 | c | [4]
4 | d | []
I created query which returns data in form
id | name | parent_id
---|------|--------------
1 | a |
2 | b | 1
3 | c | 1
4 | d | 3
And here is code:
WITH RECURSIVE nodes_cte(id, name, parent_id, level) AS (
SELECT nodes.id, nodes.name, nodes.parent_id, 0 AS level
FROM nodes
WHERE name = 'a'
UNION ALL
SELECT nodes.id, nodes.name, nodes.parent_id, level+1
FROM nodes
JOIN nodes_cte
ON nodes_cte.id = nodes.parent_id
)
SELECT * FROM nodes_cte;
Can I change SQL code to get what I want or should I do that in app??
Inserting data
I want to know what are the ways to insert data into the table. I think that following approach will work for me:
create sequence in database
increase sequence for number of elements in tree
manually compute ids in app and insert elements in the table
Are there better ways?
CREATE TABLE nodes
( id INTEGER PRIMARY KEY
, name VARCHAR(50) NOT NULL UNIQUE
, parent_id INTEGER NULL REFERENCES nodes(id)
);
-- I created query which returns data in form
INSERT INTO nodes(id,name,parent_id)VALUES
( 1 , 'a' , NULL)
,( 2 , 'b' , 1)
,( 3 , 'c' , 1)
,( 4 , 'd' , 3)
;
SELECT p.id, p.name
, array_agg(c.id) AS children
FROM nodes p
LEFT JOIN nodes c ON c.parent_id = p.id
GROUP BY p.id, p.name
;
Result:
id | name | children
----+------+----------
1 | a | {2,3}
2 | b | {NULL}
3 | c | {4}
4 | d | {NULL}
(4 rows)
Extra: using generate_series() to insert a bunch of records. Each record having id/3 as parent, (except when zero).
INSERT INTO nodes(id,name,parent_id)
SELECT gs, 'zzz_'|| gs::text, NULLIF(gs/3 , 0)
FROM generate_series ( 5,25) gs
;
INSERTING/UPDATING DATA
Normally, your front-end should not mess with sequences, but leave that to the DBMS. You already have a UNIQUE constraint on name, because it is a natural key . So, your front-end should use that key to address rows in the nodes table, like in:
CREATE TABLE nodes2
( id SERIAL NOT NULL PRIMARY KEY
, name VARCHAR(50) NOT NULL UNIQUE
, parent_id INTEGER NULL REFERENCES nodes(id)
);
INSERT INTO nodes2(name,parent_id)
SELECT 'Omg_'|| gs::text, NULLIF(gs/3 , 0)
FROM generate_series ( 1,15) gs
;
PREPARE upd (text, text) AS
-- child, parent
UPDATE nodes2 c
SET parent_id = p.id
FROM nodes2 p
WHERE p.name = $2 -- parent
AND c.name = $1 -- child
;
EXECUTE upd( 'Omg_12', 'Omg_11');
EXECUTE upd( 'Omg_15', 'Omg_11');
Result:
CREATE TABLE
INSERT 0 15
PREPARE
UPDATE 1
UPDATE 1
id | name | children
----+--------+-----------
1 | Omg_1 | {3,4,5}
2 | Omg_2 | {6,7,8}
3 | Omg_3 | {9,10,11}
4 | Omg_4 | {13,14}
5 | Omg_5 | {NULL}
6 | Omg_6 | {NULL}
7 | Omg_7 | {NULL}
8 | Omg_8 | {NULL}
9 | Omg_9 | {NULL}
10 | Omg_10 | {NULL}
11 | Omg_11 | {15,12}
12 | Omg_12 | {NULL}
13 | Omg_13 | {NULL}
14 | Omg_14 | {NULL}
15 | Omg_15 | {NULL}
(15 rows)

Can window function LAG reference the column which value is being calculated?

I need to calculate value of some column X based on some other columns of the current record and the value of X for the previous record (using some partition and order). Basically I need to implement query in the form
SELECT <some fields>,
<some expression using LAG(X) OVER(PARTITION BY ... ORDER BY ...) AS X
FROM <table>
This is not possible because only existing columns can be used in window function so I'm looking way how to overcome this.
Here is an example. I have a table with events. Each event has type and time_stamp.
create table event (id serial, type integer, time_stamp integer);
I wan't to find "duplicate" events (to skip them). By duplicate I mean the following. Let's order all events for given type by time_stamp ascending. Then
the first event is not a duplicate
all events that follow non duplicate and are within some time frame after it (that is their time_stamp is not greater then time_stamp of the previous non duplicate plus some constant TIMEFRAME) are duplicates
the next event which time_stamp is greater than previous non duplicate by more than TIMEFRAME is not duplicate
and so on
For this data
insert into event (type, time_stamp)
values
(1, 1), (1, 2), (2, 2), (1,3), (1, 10), (2,10),
(1,15), (1, 21), (2,13),
(1, 40);
and TIMEFRAME=10 result should be
time_stamp | type | duplicate
-----------------------------
1 | 1 | false
2 | 1 | true
3 | 1 | true
10 | 1 | true
15 | 1 | false
21 | 1 | true
40 | 1 | false
2 | 2 | false
10 | 2 | true
13 | 2 | false
I could calculate the value of duplicate field based on current time_stamp and time_stamp of the previous non-duplicate event like this:
WITH evt AS (
SELECT
time_stamp,
CASE WHEN
time_stamp - LAG(current_non_dupl_time_stamp) OVER w >= TIMEFRAME
THEN
time_stamp
ELSE
LAG(current_non_dupl_time_stamp) OVER w
END AS current_non_dupl_time_stamp
FROM event
WINDOW w AS (PARTITION BY type ORDER BY time_stamp ASC)
)
SELECT time_stamp, time_stamp != current_non_dupl_time_stamp AS duplicate
But this does not work because the field which is calculated cannot be referenced in LAG:
ERROR: column "current_non_dupl_time_stamp" does not exist.
So the question: can I rewrite this query to achieve the effect I need?
Naive recursive chain knitter:
-- temp view to avoid nested CTE
CREATE TEMP VIEW drag AS
SELECT e.type,e.time_stamp
, ROW_NUMBER() OVER www as rn -- number the records
, FIRST_VALUE(e.time_stamp) OVER www as fst -- the "group leader"
, EXISTS (SELECT * FROM event x
WHERE x.type = e.type
AND x.time_stamp < e.time_stamp) AS is_dup
FROM event e
WINDOW www AS (PARTITION BY type ORDER BY time_stamp)
;
WITH RECURSIVE ttt AS (
SELECT d0.*
FROM drag d0 WHERE d0.is_dup = False -- only the "group leaders"
UNION ALL
SELECT d1.type, d1.time_stamp, d1.rn
, CASE WHEN d1.time_stamp - ttt.fst > 20 THEN d1.time_stamp
ELSE ttt.fst END AS fst -- new "group leader"
, CASE WHEN d1.time_stamp - ttt.fst > 20 THEN False
ELSE True END AS is_dup
FROM drag d1
JOIN ttt ON d1.type = ttt.type AND d1.rn = ttt.rn+1
)
SELECT * FROM ttt
ORDER BY type, time_stamp
;
Results:
CREATE TABLE
INSERT 0 10
CREATE VIEW
type | time_stamp | rn | fst | is_dup
------+------------+----+-----+--------
1 | 1 | 1 | 1 | f
1 | 2 | 2 | 1 | t
1 | 3 | 3 | 1 | t
1 | 10 | 4 | 1 | t
1 | 15 | 5 | 1 | t
1 | 21 | 6 | 1 | t
1 | 40 | 7 | 40 | f
2 | 2 | 1 | 2 | f
2 | 10 | 2 | 2 | t
2 | 13 | 3 | 2 | t
(10 rows)
An alternative to a recursive approach is a custom aggregate. Once you master the technique of writing your own aggregates, creating transition and final functions is easy and logical.
State transition function:
create or replace function is_duplicate(st int[], time_stamp int, timeframe int)
returns int[] language plpgsql as $$
begin
if st is null or st[1] + timeframe <= time_stamp
then
st[1] := time_stamp;
end if;
st[2] := time_stamp;
return st;
end $$;
Final function:
create or replace function is_duplicate_final(st int[])
returns boolean language sql as $$
select st[1] <> st[2];
$$;
Aggregate:
create aggregate is_duplicate_agg(time_stamp int, timeframe int)
(
sfunc = is_duplicate,
stype = int[],
finalfunc = is_duplicate_final
);
Query:
select *, is_duplicate_agg(time_stamp, 10) over w
from event
window w as (partition by type order by time_stamp asc)
order by type, time_stamp;
id | type | time_stamp | is_duplicate_agg
----+------+------------+------------------
1 | 1 | 1 | f
2 | 1 | 2 | t
4 | 1 | 3 | t
5 | 1 | 10 | t
7 | 1 | 15 | f
8 | 1 | 21 | t
10 | 1 | 40 | f
3 | 2 | 2 | f
6 | 2 | 10 | t
9 | 2 | 13 | f
(10 rows)
Read in the documentation: 37.10. User-defined Aggregates and CREATE AGGREGATE.
This feels more like a recursive problem than windowing function. The following query obtained the desired results:
WITH RECURSIVE base(type, time_stamp) AS (
-- 3. base of recursive query
SELECT x.type, x.time_stamp, y.next_time_stamp
FROM
-- 1. start with the initial records of each type
( SELECT type, min(time_stamp) AS time_stamp
FROM event
GROUP BY type
) x
LEFT JOIN LATERAL
-- 2. for each of the initial records, find the next TIMEFRAME (10) in the future
( SELECT MIN(time_stamp) next_time_stamp
FROM event
WHERE type = x.type
AND time_stamp > (x.time_stamp + 10)
) y ON true
UNION ALL
-- 4. recursive join, same logic as base
SELECT e.type, e.time_stamp, z.next_time_stamp
FROM event e
JOIN base b ON (e.type = b.type AND e.time_stamp = b.next_time_stamp)
LEFT JOIN LATERAL
( SELECT MIN(time_stamp) next_time_stamp
FROM event
WHERE type = e.type
AND time_stamp > (e.time_stamp + 10)
) z ON true
)
-- The actual query:
-- 5a. All records from base are not duplicates
SELECT time_stamp, type, false
FROM base
UNION
-- 5b. All records from event that are not in base are duplicates
SELECT time_stamp, type, true
FROM event
WHERE (type, time_stamp) NOT IN (SELECT type, time_stamp FROM base)
ORDER BY type, time_stamp
There are a lot of caveats with this. It assumes no duplicate time_stamp for a given type. Really the joins should be based on a unique id rather than type and time_stamp. I didn't test this much, but it may at least suggest an approach.
This is my first time to try a LATERAL join. So there may be a way to simplify that moe. Really what I wanted to do was a recursive CTE with the recursive part using MIN(time_stamp) based on time_stamp > (x.time_stamp + 10), but aggregate functions are not allowed in CTEs in that manner. But it seems the lateral join can be used in the CTE.

Select multiple row values into single row with multi-table clauses

I've searched the forums and while I see similar posts, they only address pieces of the full query I need to formulate (array_aggr, where exists, joins, etc.). If the question I'm posting has been answered, I will gladly accept references to those threads.
I did find this thread ...which is very similar to what I need, except it is for MySQL, and I kept running into errors trying to get it into psql syntax. Hoping someone can help me get everything together. Here's the scenario:
Attribute
attrib_id | attrib_name
UserAttribute
user_id | attrib_id | value
Here's a small example of what the data looks like:
Attribute
attrib_id | attrib_name
-----------------------
1 | attrib1
2 | attrib2
3 | attrib3
4 | attrib4
5 | attrib5
UserAttribute -- there can be up to 15 attrib_id's/value's per user_id
user_id | attrib_id | value
----------------------------
101 | 1 | valueA
101 | 2 | valueB
102 | 1 | valueC
102 | 2 | valueD
103 | 1 | valueA
103 | 2 | valueB
104 | 1 | valueC
104 | 2 | valueD
105 | 1 | valueA
105 | 2 | valueB
Here's what I'm looking for
Result
user_id | attrib1_value | attrib2_value
--------------------------------------------------------
101 | valueA | valueB
102 | valueC | valueD
103 | valueA | valueB
104 | valueC | valueD
105 | valueA | valueB
As shown, I'm looking for single rows that contain:
- user_id from the UserAttribute table
- attribute values from the UserAttribute table
Note: I only need attribute values from the UserAttribute table for two specific attribute names in the Attribute table
Again, any help or reference to an existing solution would be greatly appreciated.
UPDATE:
#ronin provided a query that gets the results desired:
SELECT ua.user_id
,MAX(CASE WHEN a.attrib_name = 'attrib1' THEN ua.value ELSE NULL END) AS attrib_1_val
,MAX(CASE WHEN a.attrib_name = 'attrib2' THEN ua.value ELSE NULL END) AS attrib_2_val
FROM UserAttribute ua
JOIN Attribute a ON (a.attrib_id = ua.attrib_id)
WHERE a.attrib_name IN ('attrib1', 'attrib2')
GROUP BY ua.user_id;
To build on that, I tried to add some 'LIKE' pattern matching within the 'WHEN' condition (against the ua.value), but everything ends up as the 'FALSE' value. Will start a new question to see if that can be incorporated if I cannot figure it out. Thanks all for the help!!
If each attribute only has a single value for a user, you can start by making a sparse matrix:
SELECT user_id
,CASE WHEN attrib_id = 1 THEN value ELSE NULL END AS attrib_1_val
,CASE WHEN attrib_id = 2 THEN value ELSE NULL END AS attrib_2_val
FROM UserAttribute;
Then compress the matrix using an aggregate function:
SELECT user_id
,MAX(CASE WHEN attrib_id = 1 THEN value ELSE NULL END) AS attrib_1_val
,MAX(CASE WHEN attrib_id = 2 THEN value ELSE NULL END) AS attrib_2_val
FROM UserAttribute
GROUP BY user_id;
In response to the comment, searching by attribute name rather than id:
SELECT ua.user_id
,MAX(CASE WHEN a.attrib_name = 'attrib1' THEN ua.value ELSE NULL END) AS attrib_1_val
,MAX(CASE WHEN a.attrib_name = 'attrib2' THEN ua.value ELSE NULL END) AS attrib_2_val
FROM UserAttribute ua
JOIN Attribute a ON (a.attrib_id = ua.attrib_id)
WHERE a.attrib_name IN ('attrib1', 'attrib2')
GROUP BY ua.user_id;
Starting with Postgres 9.4 you can use the simpler aggregate FILTER clause:
SELECT user_id
,MAX(value) FILTER (WHERE attrib_id = 1) AS attrib_1_val
,MAX(value) FILTER (WHERE attrib_id = 2) AS attrib_2_val
FROM UserAttribute
WHERE attrib_id IN (1,2)
GROUP BY 1;
For more than a few attributes or for top performance, look to crosstab() from the additional module tablefunc (Postgres 8.3+). Details here:
PostgreSQL Crosstab Query
What about something like this:
select ua.user_id, a.attrib_name attrib_value1, a2.attrib_name attrib_value2
from user_attribute ua
left join attribute a on a.atribute_id=ua.attribute_id and a.attribute_id in (1,2)
left join user_attribute ua2 on ua2.user_id=ua.user_id and ua2.attribute_id > ua.attribute_id
left join attribute a2 on a2.attribute_id=ua2.attribute_id and a2.attribute_id in (1,2)