PostgreSQL - Optimize query with multiple subqueries

I have 2 tables, users and sessions. The tables look like this:
users - id (int), name (varchar)
sessions - id (int), user_id (int), ip (inet), cookie_identifier (varchar)
All columns have an index.
Now, I am trying to query all users that have a session with the same ip or cookie_identifier as a specific user.
Here is my query:
SELECT *
FROM "users"
WHERE "id" IN
(SELECT "user_id"
FROM "sessions"
WHERE "user_id" <> 1234
AND ("ip" IN
(SELECT "ip"
FROM "sessions"
WHERE "user_id" = 1234
GROUP BY "ip")
OR "cookie_identifier" IN
(SELECT "cookie_identifier"
FROM "sessions"
WHERE "user_id" = 1234
GROUP BY "cookie_identifier"))
GROUP BY "user_id")
The users table has ~200,000 rows, the sessions table has ~1.5 million rows. The query takes around 3-5 seconds.
Is it possible to optimize this query?

As a first experiment, I would suggest removing all the GROUP BY clauses; they do not change the result of an IN subquery:
SELECT
*
FROM users
WHERE id IN (
SELECT
user_id
FROM sessions
WHERE user_id <> 1234
AND (ip IN (
SELECT
ip
FROM sessions
WHERE user_id = 1234
)
OR cookie_identifier IN (
SELECT
cookie_identifier
FROM sessions
WHERE user_id = 1234
)
)
)
;
If that doesn't help, try rewriting the above to use EXISTS instead of IN:
SELECT
*
FROM users u
WHERE EXISTS (
SELECT
NULL
FROM sessions s
WHERE s.user_id <> 1234
AND u.id = s.user_id
AND EXISTS (
SELECT
NULL
FROM sessions s2
WHERE s2.user_id = 1234
AND (s.ip = s2.ip
OR s.cookie_identifier = s2.cookie_identifier
)
)
)
;
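Another rewrite that may be worth testing is to split the OR into two branches, since an OR across two different columns often keeps the planner from using the per-column indexes efficiently. The following is only a sketch under that assumption; it keeps the hard-coded user id 1234 from the question and should be compared against the versions above with EXPLAIN ANALYZE:

```sql
-- Sketch: one branch per matching column, combined with UNION.
-- Each branch can use the index on sessions.ip or sessions.cookie_identifier.
SELECT u.*
FROM users u
WHERE u.id IN (
    -- other users who share an ip with user 1234
    SELECT s.user_id
    FROM sessions s
    JOIN sessions t ON t.ip = s.ip
    WHERE t.user_id = 1234
      AND s.user_id <> 1234
    UNION
    -- other users who share a cookie_identifier with user 1234
    SELECT s.user_id
    FROM sessions s
    JOIN sessions t ON t.cookie_identifier = s.cookie_identifier
    WHERE t.user_id = 1234
      AND s.user_id <> 1234
);
```

UNION already removes duplicate user ids, so no GROUP BY is needed.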

Related

How do you find the number of users whose first/last visits are the same website

Given a table of timestamp,user_id,country,site_id.
How do you find the number of users whose first/last visits are the same website?
/* first site per user */
SELECT SWE."timestamp", SWE.site_id, SWE.user_id
FROM SWE
WHERE SWE."timestamp" = (
SELECT MIN(t."timestamp")
FROM SWE t
WHERE
t.user_id = SWE.user_id
);
/* last site per user */
SELECT SWE."timestamp", SWE.site_id, SWE.user_id
FROM SWE
WHERE SWE."timestamp" = (
SELECT MAX(t."timestamp")
FROM SWE t
WHERE
t.user_id = SWE.user_id
);
I am not sure how to count the users for whom these two sites are equal.

I'd use DISTINCT ON to pick out the first and last visit for each user, then aggregate over these to check whether they are the same site. Something like this (the table is called user_visits here):
WITH first_visits AS (
SELECT DISTINCT ON (user_id) * FROM user_visits
ORDER BY user_id, timestamp
), last_visits AS (
SELECT DISTINCT ON (user_id) * FROM user_visits
ORDER BY user_id, timestamp DESC
)
SELECT user_id,
array_to_string(array_agg(DISTINCT site_id), ', ') AS sites,
MIN(timestamp) AS first_visit, MAX(timestamp) as last_visit
FROM (
SELECT * FROM first_visits
UNION ALL
SELECT * FROM last_visits) x
GROUP BY user_id
HAVING COUNT(DISTINCT site_id) = 1;
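Since the question asks for a count rather than a list of users, the same DISTINCT ON idea can be reduced to a single number. A minimal sketch, assuming the same user_visits table with user_id, site_id and timestamp columns:

```sql
-- Sketch: join each user's first visit to their last visit and count the matches.
WITH first_visits AS (
    SELECT DISTINCT ON (user_id) user_id, site_id
    FROM user_visits
    ORDER BY user_id, "timestamp"
), last_visits AS (
    SELECT DISTINCT ON (user_id) user_id, site_id
    FROM user_visits
    ORDER BY user_id, "timestamp" DESC
)
SELECT count(*) AS users_with_same_first_and_last_site
FROM first_visits f
JOIN last_visits l USING (user_id)
WHERE f.site_id = l.site_id;
```

Users with a single visit are counted too, because their first and last visit are the same row.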

Aggregate function for corresponding row

I have the following table with a combined primary key of id and ts to implement historization:
create table "author" (
"id" bigint not null,
"ts" timestamp not null default now(),
"login" text unique not null,
primary key ("id", "ts")
);
Now I am interested only in the latest login value. Therefore I group by id:
select "id", max("ts"), "login" from "author" group by "id";
But this throws an error: login should be used in an aggregate function.
id and max("ts") uniquely identify a row because the tuple (id, ts) is the primary key. I need the login which matches the row identified by id and max("ts").
I can write a sub-select to find the login:
select ao."id", max(ao."ts"),
(select ai.login from "author" ai
where ao."id" = ai."id" and max(ao."ts") = ai.ts)
from "author" ao
group by "id";
This works but it is quite noisy and not very clever, because it searches the whole table although searching the group would be sufficient.
Does an aggregate function exist, which avoids the sub-select and gives me the remaining login, which belongs to id and max("ts")?

First identify the key of the row you want from the table.
That key is:
select "id", max("ts") from "author" group by "id";
Then join back on it to get the matching login:
select a1."id", a1.ts, a1.login
from "author" a1
inner join (select "id", max("ts") as maxts from "author" group by "id") a2
ON a1.id = a2.id AND a1.ts = a2.maxts;
Alternatively, using window functions:
SELECT "id", "ts", login
FROM (
select "id", "ts", CASE WHEN "ts" = max("ts") OVER (PARTITION BY "id") THEN 1 ELSE 0 END as isMax, "login" from "author"
) dt
WHERE isMax = 1;
There are a few other ways to skin this cat, but that's the gist.
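One of those other ways, sketched here on the assumption that PostgreSQL-specific syntax is acceptable, is DISTINCT ON, which keeps only the first row of each id group in the given sort order:

```sql
-- Sketch: for each id, keep the row with the greatest ts.
SELECT DISTINCT ON ("id") "id", "ts", "login"
FROM "author"
ORDER BY "id", "ts" DESC;
```

The primary key index on ("id", "ts") matches this sort order, so the planner may be able to use it instead of sorting.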

calculate the percentage of users from CTE

I have two queries built on CTEs. The first counts a number of users, and the second does the same with different filters. I need to calculate the percentage ratio between the two counts.
How can this be done?
WITH count AS ( SELECT user_id
from users u
where u.status = 'Over'),
users as (Select user_id
from users u
where u.status LIKE 'LR'
and user_id IN (select * from count))
Select COUNT(*) From users
WITH count AS ( SELECT user_id
from users u
where u.description = 'Track'),
users as (Select user_id
from users u
where u.status NOT LIKE 'LR'
and user_id IN (select * from count))
Select COUNT(*) From users

You can do it without CTEs, using a single SELECT with two counts (the cast avoids integer division):
SELECT count( CASE WHEN description = 'Over' AND status LIKE 'LR' THEN 1 END )::numeric
/
count( CASE WHEN description = 'Track' AND status NOT LIKE 'LR' THEN 1 END )
As Ratio
FROM users

With minimal changes, you can just do one bigger CTE:
WITH count_1 AS
(
SELECT user_id
FROM users u
WHERE u.status = 'Over'
),
users_1 AS
(
SELECT user_id
FROM users u
WHERE u.status LIKE 'LR'
AND user_id IN (SELECT user_id FROM count_1)
),
count_2 AS
(
SELECT user_id
FROM users u
WHERE u.description = 'Track'
),
users_2 AS
(
SELECT user_id
FROM users u
WHERE u.status NOT LIKE 'LR'
AND user_id IN (select user_id from count_2)
)
SELECT
CAST( (SELECT count(*) FROM users_1) AS FLOAT) /
(SELECT count(*) FROM users_2) AS ratio
NOTE 1: The query as written doesn't make sense, so I guess there is a misspelling or some columns got mixed up. count_1 selects users with status = 'Over', and users_1 then keeps only those that also have status LIKE 'LR', so the result is necessarily empty (assuming one row per user_id).
NOTE 2: You wouldn't normally write queries this way. Assuming the first filter was meant to be on description, the following query gives the same result and is much simpler (and faster):
WITH
count_1 AS
(
SELECT count(user_id) AS c
FROM users u
WHERE u.description = 'Over'
AND u.status = 'LR'
),
count_2 AS
(
SELECT count(user_id) AS c
FROM users u
WHERE u.description = 'Track'
AND u.status <> 'LR'
)
SELECT
(count_1.c + 0.0) / count_2.c AS ratio
FROM
count_1, count_2 ;

Yet another version, using FILTER (PostgreSQL 9.4+; the cast avoids integer division):
SELECT count(*) FILTER (WHERE description = 'Over' AND status LIKE 'LR')::numeric
/
count(*) FILTER (WHERE description = 'Track' AND status NOT LIKE 'LR')
As Ratio
FROM users
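The question asks for a percentage, and the denominator may be zero. A hedged variant of the query above that scales the ratio to a percentage and returns NULL instead of raising a division-by-zero error (the column alias pct is made up for illustration):

```sql
-- Sketch: 100.0 forces numeric arithmetic; NULLIF turns a zero denominator into NULL.
SELECT 100.0
       * count(*) FILTER (WHERE description = 'Over' AND status LIKE 'LR')
       / NULLIF(count(*) FILTER (WHERE description = 'Track' AND status NOT LIKE 'LR'), 0)
       AS pct
FROM users;
```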

Postgres id to name mapping in an array while creating CSV file

I have a table with id to group name mapping.
1. GroupA
2. GroupB
3. GroupC
.
.
.
15 GroupO
And I have user table with userId to group ID mapping, group ID is defined as array in user table
User1 {1,5,7}
User2 {2,5,9}
User3 {3,5,11,15}
.
.
.
I want to combine the two tables so that I can export the userID to group name mapping to a CSV file.
For example: User1 {GroupA, GroupE, GroupG}
Essentially, each group ID should be replaced by its group name when the CSV file is created.

Setup:
create table mapping(id int, group_name text);
insert into mapping
select i, format('Group%s', chr(i+ 64))
from generate_series(1, 15) i;
create table users (user_name text, user_ids int[]);
insert into users values
('User1', '{1,5,7}'),
('User2', '{2,5,9}'),
('User3', '{3,5,11,15}');
Step by step (to understand the query, see SqlFiddle):
Use unnest() to list each element of user_ids in its own row:
select user_name, unnest(user_ids) user_id
from users
Replace user_id with group_name by joining to mapping:
select user_name, group_name
from (
select user_name, unnest(user_ids) id
from users
) u
join mapping m on m.id = u.id
Aggregate group_name into array for user_name:
select user_name, array_agg(group_name)
from (
select user_name, group_name
from (
select user_name, unnest(user_ids) id
from users
) u
join mapping m on m.id = u.id
) m
group by 1
Use the last query in copy command:
copy (
select user_name, array_agg(group_name)
from (
select user_name, group_name
from (
select user_name, unnest(user_ids) id
from users
) u
join mapping m on m.id = u.id
) m
group by 1
)
to 'c:/data/example.txt' (format csv)
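array_agg() without an ORDER BY does not guarantee that the names come out in the same order as the ids in the array. If that order matters, a sketch along the following lines may help; it assumes PostgreSQL 9.4 or later for WITH ORDINALITY and reuses the users and mapping tables from the setup above (the aliases g, ord and group_names are made up for illustration):

```sql
-- Sketch: unnest with the element position, then aggregate in that position order.
SELECT u.user_name,
       array_agg(m.group_name ORDER BY g.ord) AS group_names
FROM users u
CROSS JOIN LATERAL unnest(u.user_ids) WITH ORDINALITY AS g(id, ord)
JOIN mapping m ON m.id = g.id
GROUP BY u.user_name;
```

This query can be wrapped in the same copy (...) to ... (format csv) command. Note that COPY writes the file on the database server; from psql, \copy writes it on the client instead.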

Say you have two tables in this form:
Table groups
Column | Type
-----------+---------
groupname | text
groupid | integer
Table users
Column | Type
----------+----------
username | text
groupids | integer[] <-- group ids as inserted in table groups
You can query the users replacing the group id with group names with this code:
WITH users_subquery AS (select username,unnest (groupids) AS groupid FROM users)
SELECT username,array_agg(groupname) AS groups
FROM users_subquery JOIN groups ON users_subquery.groupid = groups.groupid
GROUP BY username
If you need the groups as a string (useful for the CSV export), wrap the query in an array_to_string call:
SELECT username, array_to_string(groups,',') FROM
(
WITH users_subquery AS (select username,unnest (groupids) AS groupid FROM users)
SELECT username,array_agg(groupname) AS groups
FROM users_subquery JOIN groups ON users_subquery.groupid = groups.groupid
GROUP BY username
) as foo;
Result:
username | groups
----------+-----------------
user1 | group1,group2
user2 | group2,group3

Query to get last conversations for user inbox

I need a specific SQL query to select last 10 conversations for user inbox.
Inbox shows only conversations(threads) with every user - it selects the last message from the conversation and shows it in inbox.
Edited.
Expecting result: to extract latest message from each of 10 latest conversations. Facebook shows latest conversations in the same way
And one more question. How to make a pagination to show next 10 latest messages from previous latest conversations in the next page?
Private messages in the database looks like:
| id | user_id | recipient_id | text
| 1 | 2 | 3 | Hi John!
| 2 | 3 | 2 | Hi Tom!
| 3 | 2 | 3 | How are you?
| 4 | 3 | 2 | Thanks, good! You?

As per my understanding, you need to get the latest message of the conversation on per-user basis (of the last 10 latest conversations)
Update: I have modified the query to get the latest_conversation_message_id for every user conversation
The below query gets the details for user_id = 2, you can modify, users.id = 2 to get it for any other user
SQLFiddle, hope this solves your purpose
SELECT
user_id,
users.name,
users2.name as sent_from_or_sent_to,
subquery.text as latest_message_of_conversation
FROM
users
JOIN
(
SELECT
text,
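-- partition by the unordered pair of users; user_id + recipient_id would merge pairs such as (2, 5) and (3, 4)
row_number() OVER ( PARTITION BY LEAST(user_id, recipient_id), GREATEST(user_id, recipient_id) ORDER BY id DESC) AS row_num,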
user_id,
recipient_id,
id
FROM
private_messages
GROUP BY
id,
recipient_id,
user_id,
text
) AS subquery ON ( ( subquery.user_id = users.id OR subquery.recipient_id = users.id) AND row_num = 1 )
JOIN users as users2 ON ( users2.id = CASE WHEN users.id = subquery.user_id THEN subquery.recipient_id ELSE subquery.user_id END )
WHERE
users.id = 2
ORDER BY
subquery.id DESC
LIMIT 10
Info: The query returns the latest message of every conversation with any other user. If user_id 2 sends a message to user_id 3, that message is displayed too, since it marks the start of a conversation.

To solve groupwise-max in pg you can use DISTINCT ON. Like this:
SELECT
DISTINCT ON(pm.user_id)
pm.user_id,
pm.text
FROM
private_messages AS pm
WHERE pm.recipient_id= <my user id>
ORDER BY pm.user_id, pm.id DESC;
http://sqlfiddle.com/#!12/4021d/19
To get the latest X however we will have to use it in a subselect:
SELECT
q.user_id,
q.id,
q.text
FROM
(
SELECT
DISTINCT ON(pm.user_id)
pm.user_id,
pm.id,
pm.text
FROM
private_messages AS pm
WHERE pm.recipient_id=2
ORDER BY pm.user_id, pm.id DESC
) AS q
ORDER BY q.id DESC
LIMIT 10;
http://sqlfiddle.com/#!12/4021d/28
To get both sent and received threads:
SELECT
q.user_id,
q.recipient_id,
q.id,
q.text
FROM
(
SELECT
DISTINCT ON(pm.user_id,pm.recipient_id)
pm.user_id,
pm.recipient_id,
pm.id,
pm.text
FROM
private_messages AS pm
WHERE pm.recipient_id=2 OR pm.user_id=2
ORDER BY pm.user_id,pm.recipient_id, pm.id DESC
) AS q
ORDER BY q.id DESC
LIMIT 10;
http://sqlfiddle.com/#!12/4021d/42
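The query above treats (2, 3) and (3, 2) as separate threads, so a conversation with messages in both directions appears twice. A possible adjustment, offered as a sketch rather than a tested replacement, is to group on the unordered pair of users with LEAST and GREATEST (user id 2 is kept from the example above):

```sql
-- Sketch: one row per pair of users, regardless of who sent the latest message.
SELECT q.user_id, q.recipient_id, q.id, q.text
FROM (
    SELECT DISTINCT ON (LEAST(pm.user_id, pm.recipient_id),
                        GREATEST(pm.user_id, pm.recipient_id))
           pm.user_id, pm.recipient_id, pm.id, pm.text
    FROM private_messages AS pm
    WHERE pm.recipient_id = 2 OR pm.user_id = 2
    ORDER BY LEAST(pm.user_id, pm.recipient_id),
             GREATEST(pm.user_id, pm.recipient_id),
             pm.id DESC
) AS q
ORDER BY q.id DESC
LIMIT 10;
```

For the next page, one option is to add OFFSET 10 after LIMIT 10; another is to filter the outer query on q.id being lower than the last id already shown.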

Add this after your WHERE clause:
ORDER BY "ColumnName" [ASC | DESC]
UNION (see the description at W3Schools) combines the results of two statements:
SELECT "ColumnName" FROM "TableName"
UNION
SELECT "ColumnName" FROM "TableName"

For large data sets I think you might like to try running the two statements and then consolidating the results, as an index scan on (user_id and id) or (recipient_id and id) ought to be very efficient at getting the 10 most recent conversations of each type.
with sent_messages as (
SELECT *
FROM private_messages
WHERE user_id = my_user_id
ORDER BY id desc
LIMIT 10),
received_messages as ( SELECT *
FROM private_messages
WHERE recipient_id = my_user_id
ORDER BY id desc
LIMIT 10),
all_messages as (
select *
from sent_messages
union all
select *
from received_messages)
select *
from all_messages
order by id desc
limit 10
Edit: Another query worth trying might be:
select *
from private_messages
where id in (
select id
from (
(SELECT id
FROM private_messages
WHERE user_id = my_user_id
ORDER BY id desc
LIMIT 10)
union all
(SELECT id
FROM private_messages
WHERE recipient_id = my_user_id
ORDER BY id desc
LIMIT 10)) all_ids
order by id desc
limit 10)
order by id desc
This might be better in 9.2+, where index-only scans could be used to get the ids, or in cases where the number of recent messages to retrieve is very large. I'm still a bit unclear on that, though. If in doubt, I'd go for the former version.
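Both queries above rely on composite indexes on (user_id, id) and (recipient_id, id). If they do not exist yet, they could be created roughly as follows; the index names are hypothetical:

```sql
-- Sketch: composite indexes so that "WHERE user_id = ? ORDER BY id DESC LIMIT 10"
-- (and the recipient_id equivalent) can be answered by a short backward index scan.
CREATE INDEX private_messages_user_id_id_idx
    ON private_messages (user_id, id);
CREATE INDEX private_messages_recipient_id_id_idx
    ON private_messages (recipient_id, id);
```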
