Postgres select and merge rows - postgresql

I am working on a table which looks something like this:
user_id | key         | scope   | value
--------+-------------+---------+-------
      1 | someSetting | user    | false
      1 | someSetting | group   | true
      1 | someSetting | company | false
      2 | someSetting | user    | false
      2 | someSetting | group   | true
      3 | someSetting | user    | false
      4 | someSetting | group   | true
The settings are in a hierarchy: company -> group -> user, with the user scope overriding the group, which in turn overrides the company. When querying by user_id, I want to merge the settings along this hierarchy, taking the most specific scope that exists. For the sample above, I want to see this as the result:
user_id | key         | value
--------+-------------+-------
      1 | someSetting | false
      2 | someSetting | true
      3 | someSetting | false
      4 | someSetting | true
I am currently doing the merge after the rows are retrieved from Postgres, but it would be more efficient if this could be done in the query itself. I looked at aggregate functions, but none of them seem to fit my requirement.
This seems simple enough that I'm sure it can be done using Postgres. Any pointers appreciated!

You can use the ROW_NUMBER() window function with a PARTITION BY and a pretty cool ORDER BY.
Idea:
Get a ROW_NUMBER for every record with the same user_id, ordered by a custom sort order.
SELECT everything you want from the CTE a WHERE the row number is 1.
Example:
WITH a AS
(
  SELECT user_id
       , key
       , value
       -- rank the scopes per user: 'user' first, then 'group', then 'company'
       , ROW_NUMBER() OVER(PARTITION BY user_id
                           ORDER BY array_position(array['user','group','company'], scope)) AS rno
  FROM test
)
SELECT user_id
     , key
     , value
FROM a
WHERE rno = 1;
DBFiddle to show it working.
Bonus:
If you were to make a function to do this you could even pass in other arrays for setting a custom sort order.
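For instance, a minimal sketch of such a function (the name resolve_settings, the default array, and the column types are my assumptions, not part of the original answer):
-- Hypothetical helper: resolve settings using a caller-supplied scope precedence.
CREATE OR REPLACE FUNCTION resolve_settings(scope_order text[] DEFAULT array['user','group','company'])
RETURNS TABLE (user_id int, key text, value boolean)
LANGUAGE sql STABLE AS
$$
  SELECT ranked.user_id, ranked.key, ranked.value
  FROM (
    SELECT t.user_id, t.key, t.value,
           ROW_NUMBER() OVER (PARTITION BY t.user_id
                              ORDER BY array_position(scope_order, t.scope)) AS rno
    FROM test t
  ) ranked
  WHERE ranked.rno = 1;
$$;

-- Usage: company wins over group, which wins over user.
SELECT * FROM resolve_settings(array['company','group','user']);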

What you want to do is to change your scope settings from separate rows to separate columns, so your recordset looks like this (note I'm using 0 for false and 1 for true):
+---------+-------------+--------------+---------------+-----------------+
| user_id | key         | user_setting | group_setting | company_setting |
+---------+-------------+--------------+---------------+-----------------+
| 1       | someSetting | 0            | 1             | 0               |
| 2       | someSetting | 0            | 1             | NULL            |
| 3       | someSetting | 0            | NULL          | NULL            |
| 4       | someSetting | NULL         | 1             | NULL            |
+---------+-------------+--------------+---------------+-----------------+
To do this, you have a few options. Here's one of them, using conditional aggregation. Basically, you group by user_id and key, then combine an aggregate function (either MIN or MAX will do, since each group has at most one row per scope) with a CASE expression:
WITH settings_pivot AS
(
    SELECT
        [user_id],
        [key],
        MIN(CASE WHEN [scope] = 'user'    THEN [value] ELSE NULL END) AS user_setting,
        MIN(CASE WHEN [scope] = 'group'   THEN [value] ELSE NULL END) AS group_setting,
        MIN(CASE WHEN [scope] = 'company' THEN [value] ELSE NULL END) AS company_setting
    FROM settings
    GROUP BY
        [user_id],
        [key]
)
SELECT
    [user_id],
    [key],
    COALESCE(user_setting, group_setting, company_setting) AS derived_setting
FROM settings_pivot
If you just SELECT * from the settings_pivot CTE, you'll get the pivoted data shown at the beginning. The COALESCE, however, applies the precedence you describe.
Note: I'm using SQL Server, since Postgres on my machine doesn't want to boot up. So you'll have to replace the square brackets with double quotes: "user_id" instead of [user_id].
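For reference, a sketch of the Postgres rendering, using the more idiomatic FILTER clause in place of CASE (untested here; it assumes value is text, since MIN() is not defined for boolean in Postgres, in which case you'd cast or use bool_or()):
-- Sketch of the same conditional-aggregation pivot in Postgres syntax.
WITH settings_pivot AS
(
    SELECT
        "user_id",
        "key",
        MIN("value") FILTER (WHERE "scope" = 'user')    AS user_setting,
        MIN("value") FILTER (WHERE "scope" = 'group')   AS group_setting,
        MIN("value") FILTER (WHERE "scope" = 'company') AS company_setting
    FROM settings
    GROUP BY "user_id", "key"
)
SELECT
    "user_id",
    "key",
    COALESCE(user_setting, group_setting, company_setting) AS derived_setting
FROM settings_pivot;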

Related

Counting consecutive days in postgres

I'm trying to count the number of consecutive days in two tables with the following structure:
| id | email              | timestamp           |
|----|--------------------|---------------------|
| 1  | hello@example.com  | 2021-10-22 00:35:22 |
| 2  | hello2@example.com | 2021-10-21 21:17:41 |
| 1  | hello@example.com  | 2021-10-19 00:35:22 |
| 1  | hello@example.com  | 2021-10-18 00:35:22 |
| 1  | hello@example.com  | 2021-10-17 00:35:22 |
I would like to count the number of consecutive days of activity. The data above would show:
| id | email              | length |
|----|--------------------|--------|
| 1  | hello@example.com  | 1      |
| 2  | hello2@example.com | 1      |
| 1  | hello@example.com  | 3      |
This is made more difficult because I need to join the two tables using a UNION (or something similar) and then run the grouping. I tried to build on this query (Finding the length of a series in postgres), but I'm unable to group by consecutive days.
select max(id) as max_id, email, count(*) as length
from (
    select *, row_number() over wa - row_number() over wp as grp
    from began_playing_video
    window
        wp as (partition by email order by id desc),
        wa as (order by id desc)
) s
group by email, grp
order by 1 desc
Any ideas on how I could do this in Postgres?
First create an aggregate function that counts the adjacent dates within an ascending ordered list. The jsonb data type is used for the state because it allows mixing various data types inside the same array:
-- x: running state (NULL on the first call); y: the caller-supplied initial value; d: the next date.
-- State layout: [last_date, run_1, run_2, ...]
CREATE OR REPLACE FUNCTION count_date(x jsonb, y jsonb, d date)
RETURNS jsonb LANGUAGE sql AS
$$
SELECT CASE
       -- NULL dates don't change the state
       WHEN d IS NULL
       THEN COALESCE(x, y)
       ELSE
         to_jsonb(d :: text)
         || CASE
            -- first date seen: start a run of length 1
            WHEN COALESCE(x, y) = '[]' :: jsonb
            THEN '[1]' :: jsonb
            -- d is the day after the previous date: increment the current run
            WHEN COALESCE(x->>0, y->>0) :: date + 1 = d
            THEN jsonb_set(COALESCE(x-0, y-0), '{-1}', to_jsonb(COALESCE(x->>-1, y->>-1) :: integer + 1))
            -- gap: append a new run of length 1
            ELSE COALESCE(x-0, y-0) || to_jsonb(1)
            END
       END;
$$ ;
DROP AGGREGATE IF EXISTS count_date(jsonb, date);
CREATE AGGREGATE count_date(jsonb, date)
(
    sfunc = count_date,  -- state transition function defined above
    stype = jsonb        -- state type
);
Then apply the count_date aggregate to your table, grouped by id and email:
WITH list AS (
    SELECT id, email,
           -- '[]' is the initial state; cast to date to match the aggregate's signature
           count_date('[]', timestamp :: date ORDER BY timestamp :: timestamp) AS count_list
    FROM your_table
    GROUP BY id, email
)
SELECT id, email,
       -- "- 0" drops the leading date, leaving only the run lengths
       jsonb_array_elements(count_list - 0) AS length
FROM list
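For comparison, the row_number idea from the question can be adapted to dates without a custom aggregate, using a classic gaps-and-islands query (a sketch against the columns shown above, untested on your schema):
-- Consecutive days collapse into one group because
-- (day - row_number) is constant within a run of adjacent dates.
SELECT id, email, count(*) AS length
FROM (
    SELECT id, email,
           timestamp::date
             - (row_number() OVER (PARTITION BY id, email
                                   ORDER BY timestamp::date))::int AS grp
    FROM your_table
    GROUP BY id, email, timestamp::date   -- one row per active day
) runs
GROUP BY id, email, grp;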

Reset column with numeric value that represents the order when destroying a row

I have a table of users that has a column called order, which represents the order in which they will be elected.
So, for example, the table might look like:
| id | name | order |
|----|------|-------|
| 1  | John | 2     |
| 2  | Mike | 0     |
| 3  | Lisa | 1     |
Say that Lisa now gets destroyed. In the same transaction that destroys Lisa, I would like to update the table so the order stays consistent. The expected result would be:
| id | name | order |
|----|------|-------|
| 1  | John | 1     |
| 2  | Mike | 0     |
Or, if Mike were the one to be deleted, the expected result would be:
| id | name | order |
|----|------|-------|
| 1  | John | 1     |
| 3  | Lisa | 0     |
How can I do this in PostgreSQL?
If you are just deleting one row, one option uses a CTE and the RETURNING clause to trigger an update. (Note that order is a reserved word in SQL, so the code below assumes the column is actually named ord; otherwise it needs to be double-quoted as "order".) Since this is a single statement, it is atomic, which satisfies the same-transaction requirement:
with del as (
    delete from mytable
    where name = 'Lisa'
    returning ord
)
update mytable
set ord = ord - 1
from del d
where mytable.ord > d.ord
As a more general approach, though, I would recommend against trying to renumber the whole table after every delete: it is inefficient, and it gets tedious for multi-row deletes.
Instead, you could build a view on top of the table:
create view myview as
select id, name,
       row_number() over(order by ord) - 1 as ord  -- "- 1" keeps the numbering zero-based, as in the sample data
from mytable
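A quick usage sketch (hypothetical, against the view above): deletes no longer need a compensating UPDATE, because the view recomputes the numbering on every read.
BEGIN;
DELETE FROM mytable WHERE name = 'Lisa';
COMMIT;

-- The view renumbers on read:
SELECT id, name, ord
FROM myview
ORDER BY ord;
-- With the sample data: (2, 'Mike', 0), (1, 'John', 1)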

Postgresql use more than one row as expression in sub query

As the title says, I need to create a query where I SELECT all items from one table and use those items as expressions in another query. Suppose I have the main table that looks like this:
main_table
---------------------------------------
id | name | location | // more columns
---|------|----------|-----------------
1  | me   | pluto    | //
2  | them | mercury  | //
3  | we   | jupiter  | //
And the subquery table looks like this:
some_table
---------------
id | item
---|-----------
1  | sub-col-1
2  | sub-col-2
3  | sub-col-3
where each item in some_table has a price which is in an amount_table like so:
amount_table
------------
id | price
---|-------
1  | 1000
2  | 2000
3  | 3000
So that the query returns results like this:
name | location | sub-col-1 | sub-col-2 | sub-col-3 |
-----|----------|-----------|-----------|-----------|
me   | pluto    | 1000      |           |           |
them | mercury  |           | 2000      |           |
we   | jupiter  |           |           | 3000      |
My query currently looks like this:
SELECT name, location, (SELECT item FROM some_table)
FROM main_table
INNER JOIN amount_table -- WHERE: match the ids
But I'm running into the error: "more than one row returned by a subquery used as an expression".
How can I formulate this query to return the desired results?
You should decide on the expected result.
To get a one-to-many relation:
SELECT name, location, some_table.item
FROM main_table
JOIN some_table on true -- or id if they match
INNER JOIN amount_table --WHERE match the id's
To get one-to-one, with all items aggregated into an array:
SELECT name, location, (SELECT array_agg(item) FROM some_table)
FROM main_table
INNER JOIN amount_table --WHERE //match the id's
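Neither variant produces the pivoted layout from the question, though. A sketch of that shape using conditional aggregation (the join keys and the price column name are assumptions; the question never states how the three tables relate):
SELECT m.name,
       m.location,
       -- one column per item; NULL where a row has no such item
       MAX(CASE WHEN s.item = 'sub-col-1' THEN a.price END) AS "sub-col-1",
       MAX(CASE WHEN s.item = 'sub-col-2' THEN a.price END) AS "sub-col-2",
       MAX(CASE WHEN s.item = 'sub-col-3' THEN a.price END) AS "sub-col-3"
FROM main_table m
JOIN some_table   s ON s.id = m.id   -- assumed join key
JOIN amount_table a ON a.id = s.id   -- assumed join key
GROUP BY m.name, m.location;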

NULL help in T-SQL script

On SQL Server 2008R2, I am using this script:
SELECT a.id,
       a.ea1,
       b.ea2
FROM database1table1 AS a
WHERE a.id LIKE N'Active';
The result set looks like this:
+-----+-----+---------------+---------------+
| Row | ID | EA1 | EA2 |
+-----+-----+---------------+---------------+
| 1 | 1 | wf#email.co | NULL |
| 2 | 1 | NULL | wf2#email.co |
| 3 | 1 | NULL | NULL |
| 4 | 2 | NULL | NULL |
| 5 | 3 | wf3#email.co | NULL |
+-----+-----+---------------+---------------+
etc.
ID = business number.
EA = email address.
In the above output, there are three rows where ID=1, but only two of those have email addresses.
I want my result to output the rows where there is no email address. So for this example, the output should only include rows where ID=2.
I have tried adding this WHERE clause:
AND (a.EA1 IS NULL) AND (a.EA2 IS NULL);
It's still returning rows where ID=1, because one of the rows there has no email address.
Can anyone please suggest an amendment to my script which would only return the row where ID=2?
Many thanks
Try with NOT EXISTS
SELECT *
FROM Tbl T
WHERE T.EA1 IS NULL
  AND T.EA2 IS NULL
  AND NOT EXISTS
      (
          SELECT 1
          FROM Tbl IT
          WHERE IT.ID = T.ID
            AND (IT.EA1 IS NOT NULL OR IT.EA2 IS NOT NULL)
      )
Alternatively, collapse each ID to a single row and keep only the IDs where no email survives the aggregation:
;WITH CTE AS
(
    SELECT ID, MAX(ROW) AS RW, MAX(EA1) AS EA1, MAX(EA2) AS EA2
    FROM #TEMP
    GROUP BY ID
)
SELECT * FROM CTE WHERE EA1 IS NULL AND EA2 IS NULL
Output:
ID   RW   EA1    EA2
2    4    NULL   NULL
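The same idea also fits in a single GROUP BY with HAVING, since COUNT(column) counts only non-NULL values (a sketch against the same Tbl columns):
-- IDs for which no row carries any email address
SELECT ID
FROM Tbl
GROUP BY ID
HAVING COUNT(EA1) = 0
   AND COUNT(EA2) = 0;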

Updating multiple rows with a certain value from the same table

So, I have the following table:
time     | name   | ID   |
---------|--------|------|
12:00:00 | access | 1    |
12:05:00 | select | null |
12:10:00 | update | null |
12:15:00 | insert | null |
12:20:00 | out    | null |
12:30:00 | access | 2    |
12:35:00 | select | null |
The real table is bigger (approx. 1-1.5 million rows); there will be IDs equal to 2, 3, 4, etc., with rows in between.
The following should be the result:
time     | name   | ID |
---------|--------|----|
12:00:00 | access | 1  |
12:05:00 | select | 1  |
12:10:00 | update | 1  |
12:15:00 | insert | 1  |
12:20:00 | out    | 1  |
12:30:00 | access | 2  |
12:35:00 | select | 2  |
What is the simplest method to update the rows without filling up the transaction log? Something like one ID at a time.
You can do it with a subquery (SQL Server syntax shown; see the Postgres sketch below):
UPDATE t
SET t.ID = (SELECT TOP 1 s.ID
            FROM YourTable s
            WHERE s.time < t.time
              AND s.name = 'access'
            ORDER BY s.time DESC)
FROM YourTable t
WHERE t.name <> 'access'
An index on (ID, time, name) will help.
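Since TOP 1 is SQL Server syntax, a Postgres rendering of the same idea would use LIMIT instead (a sketch, assuming the table and columns shown in the question):
-- Carry each 'access' row's ID forward to the rows that follow it.
UPDATE YourTable t
SET ID = (SELECT s.ID
          FROM YourTable s
          WHERE s.time < t.time
            AND s.name = 'access'
          ORDER BY s.time DESC
          LIMIT 1)
WHERE t.name <> 'access';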
You can do it using a CTE, as below:
;WITH myCTE AS
(
    SELECT time
         , name
         , ROW_NUMBER() OVER (PARTITION BY name ORDER BY time) AS [rank]
         , ID
    FROM YourTable
)
UPDATE myCTE
SET ID = [rank]
SELECT *
FROM YourTable ORDER BY ID