Postgres, update statement from jsonb array with sorting - postgresql

I have a jsonb column in my table that contains an array of JSON objects; one of the fields in these objects is a date.
I have now added a new column of type timestamp to the table, and I need a statement that updates the new column with the most recent date value from the jsonb array column of the same record.
The following statement works fine for selecting the most recent date from the jsonb array column of a single record:
select history.date
from document,
jsonb_to_recordset(document.history) as history(date date)
where document.id = 'd093d6b0-702f-11eb-9439-0242ac130002'
order by history.date desc
limit 1;
On UPDATE I have tried the following:
update document
set status_recent_change_date = subquery.history.date
from (
select id, history.date
from document,
jsonb_to_recordset(document.history) as history(date date)
) as subquery
where document.id = subquery.id
order by history.date desc
limit 1;
The last statement does not work.

demo:db<>fiddle
UPDATE document d
SET status_recent_change_date = s.date
FROM (
SELECT DISTINCT ON (id)
*
FROM document,
jsonb_to_recordset(document.history) AS history(date date)
ORDER BY id, history.date DESC
) s
WHERE d.id = s.id;
Using LIMIT would not work here, because it limits the entire output of the SELECT statement, whereas you want to limit the output per document.id. That is exactly what DISTINCT ON (id) does.
This result can then be used to update each record via its id.
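To make the DISTINCT ON logic concrete, here is a small Python sketch with an invented history array showing what "keep only the greatest date per document" means; ISO-8601 date strings compare correctly as plain strings:

```python
import json

# Invented example of what one document's jsonb `history` column might hold.
history = json.loads("""
[{"date": "2021-02-01", "status": "draft"},
 {"date": "2021-02-16", "status": "sent"},
 {"date": "2021-02-10", "status": "reviewed"}]
""")

# DISTINCT ON (id) ... ORDER BY id, history.date DESC keeps, per document,
# the entry with the greatest date; that is the same as taking the max here.
most_recent = max(entry["date"] for entry in history)
print(most_recent)  # 2021-02-16
```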

You most likely don't need LIMIT at all.
Sorting inside the subquery is usually enough:
UPDATE document SET status_recent_change_date = subquery.hdate
FROM (
SELECT id, history.date AS hdate
FROM document, jsonb_to_recordset(document.history) AS history(date date)
ORDER BY history.date DESC
) AS subquery
WHERE document.id = subquery.id
Note that when the subquery yields several rows per id, PostgreSQL does not specify which of them wins in UPDATE ... FROM, so the DISTINCT ON variant above is the deterministic choice.

Related

remove duplicate items from postgres

I need help writing a query to SELECT the rows which have duplicate product ids.
The table has 4 columns:
id,property_id,status,price
20,13356,sold,200000
24,78436,sold,730000
12504,13356,sold,200000
...
I currently have the following Python script:
from psycopg2.extensions import AsIs
import psycopg2
import psycopg2.extras

conn = psycopg2.connect(...)
cursor = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)

def get_dict_sql(cur, query, single=False):
    cur.execute(query)
    if single:
        return dict(cur.fetchone())
    z = cur.fetchall()
    return [dict(row) for row in z]

columns = ['property_id', 'status', 'price']
seen = set()
rows = get_dict_sql(cursor, "SELECT * FROM listings")
insert_statement = 'insert into listings_temp (%s) values %s'
for row in rows:
    if row['product_id'] in seen:
        continue
    seen.add(row['product_id'])
    values = [row[column] for column in columns]
    q2 = cursor.mogrify(insert_statement, (AsIs(','.join(columns)), tuple(values)))
    cursor.execute(q2)
conn.commit()
I created a new table to store the de-duplicated data and started this script 26 hours ago; it still hasn't finished. Is there a way to SELECT only the rows where product_id is duplicated, or, even better, a query which does this directly in Postgres?
or even better a query which does directly in Postgres?
The PostgreSQL way to fetch duplicates:
demo:db<>fiddle
This gives you duplicates:
SELECT
*
FROM (
SELECT
*,
row_number() OVER (PARTITION BY product_id)
FROM
listings
) s
WHERE row_number >= 2
The row_number() window function adds a running count to every element of a group (the PARTITION, which is product_id here). With that you can fetch only the rows whose count is >= 2.
To remove the fetched records directly, you can combine the SELECT statement with a DELETE statement:
step-by-step demo:db<>fiddle
DELETE FROM t
WHERE id IN
(
SELECT
id
FROM (
SELECT
*,
row_number() OVER (PARTITION BY product_id)
FROM
t
) s
WHERE row_number >= 2
);
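If it helps to experiment locally, the same row_number() de-duplication can be sketched with an in-memory SQLite database (the rows below are invented; an ORDER BY id is added inside OVER so the row that is kept is deterministic):

```python
import sqlite3

# Invented sample data mirroring the question's id/product_id/status/price shape.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER, product_id INTEGER, status TEXT, price INTEGER);
    INSERT INTO t VALUES
        (20, 13356, 'sold', 200000),
        (24, 78436, 'sold', 730000),
        (12504, 13356, 'sold', 200000);
""")

# Delete every row whose row number within its product_id group is >= 2,
# keeping exactly one row (the lowest id) per product_id.
conn.execute("""
    DELETE FROM t
    WHERE id IN (
        SELECT id FROM (
            SELECT id,
                   row_number() OVER (PARTITION BY product_id ORDER BY id) AS rn
            FROM t
        ) s
        WHERE rn >= 2
    )
""")

remaining = conn.execute("SELECT id, product_id FROM t ORDER BY id").fetchall()
print(remaining)  # [(20, 13356), (24, 78436)]
```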

Update column with more than one value

I have a table tableA which looks something like this:
issue_id start_date end_date
issue1 2019-11-07 2020-04-30
issue2 2019-11-07 2020-01-28
I have to update the end_date based on the results of the query.
UPDATE tableA SET end_date =
(
SELECT max_end_date from update_end_date
)
WHERE issue_id = (SELECT issue_id FROM update_end_date);
It works when the query returns one result. However, it fails when more than one result is returned, which makes sense. I cannot predetermine the results of the query, so it might return more than one result. Is there any way I can update the column with multiple values?
You could use a correlated subquery:
UPDATE tableA
SET end_date = (SELECT max_end_date
from update_end_date
WHERE update_end_date.issue_id = tableA.issue_id)
WHERE issue_id IN (SELECT issue_id FROM update_end_date);
An alternative to @Lukas's solution is PostgreSQL's proprietary UPDATE ... FROM syntax:
UPDATE tablea
SET end_date = max_end_date
FROM update_end_date
WHERE tablea.issue_id = update_end_date.issue_id
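As a rough illustration, the correlated-subquery form can be tried against an in-memory SQLite database (the rows below are invented):

```python
import sqlite3

# Invented data mirroring the question: issue2 has no match in update_end_date.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (issue_id TEXT, start_date TEXT, end_date TEXT);
    CREATE TABLE update_end_date (issue_id TEXT, max_end_date TEXT);
    INSERT INTO tableA VALUES
        ('issue1', '2019-11-07', '2020-04-30'),
        ('issue2', '2019-11-07', '2020-01-28');
    INSERT INTO update_end_date VALUES ('issue1', '2020-06-30');
""")

# Each row looks up its own max_end_date; the WHERE ... IN guard leaves
# rows without a match (issue2) untouched instead of setting them to NULL.
conn.execute("""
    UPDATE tableA
    SET end_date = (SELECT max_end_date
                    FROM update_end_date
                    WHERE update_end_date.issue_id = tableA.issue_id)
    WHERE issue_id IN (SELECT issue_id FROM update_end_date)
""")

result = conn.execute(
    "SELECT issue_id, end_date FROM tableA ORDER BY issue_id"
).fetchall()
print(result)  # [('issue1', '2020-06-30'), ('issue2', '2020-01-28')]
```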

How to get a row for each occurrence of Id in IN clause?

Given that I have a list of repeating ids for which I need to fetch some additional data to populate an xls spreadsheet, how can I do that? An IN clause returns only one match per id, but I need a row for each occurrence of an id. I looked at PIVOT, thinking I could create a select list and then do an inner join.
Select m.Id, m.LegalName, m.OtherId
from MyTable m
where m.OtherId in (1,2,1,1,3,1,4,4,2,1)
You can use a VALUES clause:
SELECT t.id as OtherId, m.id, m.LegalName
FROM ( VALUES (1),(2),(1),(1),(3),(1),(4),(4),(2),(1)
) t(ID) INNER JOIN
MyTable m
ON m.OtherId = t.id;
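The same pattern can be sketched with an in-memory SQLite database (invented rows; note that SQLite does not accept a t(ID) column alias list, and instead names VALUES columns column1, column2, ...):

```python
import sqlite3

# Invented lookup table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTable (Id INTEGER, LegalName TEXT, OtherId INTEGER);
    INSERT INTO MyTable VALUES (10, 'Acme', 1), (11, 'Beta', 2);
""")

# Joining against a VALUES list yields one output row per occurrence in
# the list, duplicates included.
rows = conn.execute("""
    SELECT t.column1, m.LegalName
    FROM (VALUES (1),(2),(1)) t
    JOIN MyTable m ON m.OtherId = t.column1
""").fetchall()
print(sorted(rows))  # [(1, 'Acme'), (1, 'Acme'), (2, 'Beta')]
```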

How to compare two values with SQL in Google Big Query?

I am trying to get from the Google BigQuery database all records which have the same value in different columns. Say that, when sending an event from the phone, I set the variable machine_name in the Firebase user_properties and then send the event event_notification_send. When querying the table, I want to fetch all events named event_notification_send which have a parameter machine_name with some value X1 and, at the same time, a user_properties entry with key Last_notification and that same value X1.
How can I do that SQL query?
Thanks.
Here is sample of my code:
#standardSQL
SELECT *
FROM
`myProject.analytics_159820162.events_*`
WHERE
_TABLE_SUFFIX BETWEEN '20180725' AND '20180727'
AND event_name in ("event_notification_received", "event_notification_dissmissed")
AND platform = "ANDROID"
AND
(SELECT COUNTIF((key = "machine_name"))
FROM UNNEST(event_params)
) > 0 -- to see if specified event has such key
AND
(SELECT COUNTIF((key = "Last_notification"))
FROM UNNEST(user_properties)
) > 0 -- to see if specified event has such key
ORDER BY event_timestamp ASC
To check whether a row/event has the parameters "machine_name" and "Last_notification" with the same value, you can use the statement below:
SELECT COUNT(DISTINCT key) cnt
FROM UNNEST(event_params)
WHERE key IN ("machine_name", "Last_notification")
GROUP BY value
ORDER BY cnt DESC
LIMIT 1
Assuming that the rest of your query in question is correct - below adds your criteria to it
#standardSQL
SELECT *
FROM `myProject.analytics_159820162.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20180725' AND '20180727'
AND event_name IN ("event_notification_received", "event_notification_dissmissed")
AND platform = "ANDROID"
AND (
SELECT COUNT(DISTINCT key) cnt
FROM UNNEST(event_params)
WHERE key IN ("machine_name", "Last_notification")
GROUP BY value
ORDER BY cnt DESC
LIMIT 1
) = 2
ORDER BY event_timestamp ASC
Note: using the below is just to be on the safe side in case an event has multiple parameters with the same key but different values:
ORDER BY cnt DESC
LIMIT 1
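The idea behind the COUNT(DISTINCT key) ... GROUP BY value trick can be sketched in plain Python (the event data here is invented): group the relevant key/value pairs by value and check whether some value is carried by both keys:

```python
from collections import defaultdict

# Invented flattened key/value pairs for one event.
event_params = [("machine_name", "X1"), ("other_key", "foo")]
user_properties = [("Last_notification", "X1")]

# Equivalent of GROUP BY value over the two keys of interest.
keys_by_value = defaultdict(set)
for key, value in event_params + user_properties:
    if key in ("machine_name", "Last_notification"):
        keys_by_value[value].add(key)

# Equivalent of COUNT(DISTINCT key) ... ORDER BY cnt DESC LIMIT 1:
# the best value is carried by 2 distinct keys iff both parameters match.
best = max((len(keys) for keys in keys_by_value.values()), default=0)
print(best)  # 2
```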

db2 - How to get the min date and the next from the same table

I have a table with a date attribute, and I need a query that gets the MIN date and the next date after the MIN.
Here is what I tried:
select min(SC.TIMESTAMP) as minDate, result.TIMESTAMP
from Event SC
INNER JOIN
(SELECT TIMESTAMP from Event
HAVING TIMESTAMP > min(SC.TIMESTAMP)
) as result on result.BUSINESSID1 = SC.BUSINESSID1
where SC.BUSINESSSTEP = 'CONTAINER_PLACING_EVENT'
and SC.LOCATIONCODE = '1';
Could you please advise how to do that?
Thanks in advance.
Perhaps you can rearrange your query into this form:
select
min(TS), min(TS2)
from
event,
(select TS as TS2 from event where TS > (select min(TS) from event))
Add extra criteria as desired. I would try to rewrite yours, but it isn't entirely clear what the criteria are supposed to be. If you are expecting more than one row (for example, the min and the next-min of each LOCATIONCODE), you will probably want a GROUP BY in there.
Also, I wouldn't call a column TIMESTAMP as it is a reserved word.
You can use the ROW_NUMBER() OLAP Function:
SELECT *
FROM (
SELECT
TIMESTAMP
,ROW_NUMBER() OVER (
PARTITION BY BUSINESSSTEP, LOCATIONCODE
ORDER BY TIMESTAMP ASC
) AS RN
FROM EVENT
WHERE BUSINESSSTEP = 'CONTAINER_PLACING_EVENT'
AND LOCATIONCODE = '1'
) A
WHERE RN < 3
This returns the two dates as rows instead of columns, but it should get you what you want. If you think your original query would have returned multiple rows (for multiple entities), you can change the PARTITION BY clause to include the column that distinguishes them.
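The ROW_NUMBER() pattern is portable across databases; here is a sketch against an in-memory SQLite database (invented rows; the column is named ts here to avoid the reserved word TIMESTAMP):

```python
import sqlite3

# Invented event rows for one BUSINESSSTEP/LOCATIONCODE group.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event (businessstep TEXT, locationcode TEXT, ts TEXT);
    INSERT INTO event VALUES
        ('CONTAINER_PLACING_EVENT', '1', '2021-03-03 10:00:00'),
        ('CONTAINER_PLACING_EVENT', '1', '2021-03-01 09:00:00'),
        ('CONTAINER_PLACING_EVENT', '1', '2021-03-02 08:30:00');
""")

# Number the rows per group by ascending timestamp, then keep the first two:
# the MIN date and the one right after it.
rows = conn.execute("""
    SELECT ts FROM (
        SELECT ts,
               row_number() OVER (PARTITION BY businessstep, locationcode
                                  ORDER BY ts ASC) AS rn
        FROM event
        WHERE businessstep = 'CONTAINER_PLACING_EVENT' AND locationcode = '1'
    ) s
    WHERE rn < 3
    ORDER BY ts
""").fetchall()
print(rows)  # [('2021-03-01 09:00:00',), ('2021-03-02 08:30:00',)]
```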