I've got a table that's roughly like this:

id     | account
-------|----------
111111 | 333-333-2
111111 | 333-333-1
222222 | 444-444-1
222222 | 555-555-1
222222 | 555-555-2
and I'm trying to aggregate everything up to look like this:

id     | account
-------|--------------------------
111111 | 333-333-1, -2
222222 | 444-444-1, 555-555-1, -2
So far I've got this:

SELECT
    id,
    CONCAT(STRING_AGG(DISTINCT SUBSTRING(account FROM '^(([^-]*-){2})'), ', '),
           STRING_AGG(DISTINCT SUBSTRING(account FROM '[^-]*$'), ', ')) account
FROM table
GROUP BY id
but this produces:

id     | account
-------|------------------------
111111 | 333-333-1, 2
222222 | 444-444-, 555-555-1, 2
I ended up figuring it out, and this worked for me :))

WITH A AS (
    SELECT id,
           SUBSTRING(account FROM '^(([^-]*-){2})') first_account,
           STRING_AGG(DISTINCT SUBSTRING(account FROM '[^-]*$'), ', ') second_account
    FROM table
    GROUP BY id, first_account
)
SELECT id, STRING_AGG(DISTINCT first_account || second_account, ', ')
FROM A
GROUP BY id
I would suggest a different approach: first split each account number into a main part and a suffix, then do two levels of grouping on them:
SELECT
    id,
    string_agg(accounts, ', ') AS account
FROM (
    SELECT
        id,
        concat(account_main, string_agg(account_suffix, ', ')) AS accounts
    FROM (
        SELECT
            id,
            substr(account, 1, 7) AS account_main,
            substr(account, 8, 9) AS account_suffix
        FROM example
    ) AS t1
    GROUP BY id, account_main
) AS t2
GROUP BY id;
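If it helps to experiment with this split-then-aggregate idea outside Postgres, here's a rough sketch using Python's sqlite3, with SQLite's group_concat standing in for string_agg. The table name and the fixed 7-character prefix are just assumptions taken from the sample data.

```python
import sqlite3

# Two-level aggregation: group suffixes under each main part,
# then group the combined strings under each id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE example (id INTEGER, account TEXT);
    INSERT INTO example VALUES
        (111111, '333-333-2'), (111111, '333-333-1'),
        (222222, '444-444-1'), (222222, '555-555-1'), (222222, '555-555-2');
""")

rows = conn.execute("""
    SELECT id, group_concat(accounts, ', ') AS account
    FROM (
        SELECT id,
               substr(account, 1, 7) || group_concat(substr(account, 8), ', ') AS accounts
        FROM example
        GROUP BY id, substr(account, 1, 7)
    ) AS t
    GROUP BY id
""").fetchall()

for id_, account in rows:
    print(id_, account)
```

Note that group_concat doesn't guarantee element order, so the suffixes may come out as `-2, -1` rather than `-1, -2`.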
My initial question is posted here (In psql how to run a Loop for a Select query with CTEs and get the output shown in read-only db?), but it isn't well defined, so I am creating a new question here.
I want to know how I can use a loop variable (or something similar) inside a SELECT query with CTEs.
I hope the following is a minimal reproducible example:
CREATE TABLE Persons (
PersonID int,
LastName varchar(255),
FirstName varchar(255),
Address varchar(255),
City varchar(255)
);
insert into persons values (4,'Smith','Eric','713 Louise Circle','Paris');
insert into persons values (5,'Smith2','Eric2','715 Louise Circle','London');
insert into persons values (8,'Smith3','Eric3','718 Louise Circle','Madrid');
Now I run the following for the different values 1, 2, and 3 substituted for <ROWNUMBER>:
WITH params AS (
    SELECT <ROWNUMBER> AS rownumber
),
person AS (
    SELECT personid, lastname, firstname, address
    FROM params, persons
    ORDER BY personid DESC
    LIMIT 1
    OFFSET (SELECT rownumber - 1 FROM params)
),
filtered AS (
    SELECT *
    FROM person
    WHERE address ~ (SELECT rownumber::text FROM params)
)
SELECT *
FROM filtered;
and get these outputs for 1, 2, and 3 respectively:
| personid | lastname | firstname | address
|----------|----------|-----------|-------------------
| 8 | Smith3 | Eric3 | 718 Louise Circle
(1 row)
| personid | lastname | firstname | address
|----------|----------|-----------|---------
(0 rows)
| personid | lastname | firstname | address
|----------|----------|-----------|-------------------
| 4 | Smith | Eric | 713 Louise Circle
(1 row)
My goal is to have a single query, with a loop or any other means, that gets the union of all 3 SELECT runs above. I only have read-only access to the db, so I can't output into a new table. The GUI software I use has options to output to an internal window or to export to a plain text file. The desired result would be:
|personid | lastname | firstname | address
|----------|----------|-----------|-------------------
| 4 | Smith | Eric | 713 Louise Circle
| 8 | Smith3 | Eric3 | 718 Louise Circle
(2 rows)
In reality the loop variable is used in a more complicated way.
If I decipher this right, you basically want to select all people whose row number, according to descending ID order, appears in their address. The final result should then be limited to certain of these row numbers.
Then you don't need that cumbersome LIMIT/OFFSET construct at all. You can simply use the row_number() window function.
To filter for the row numbers you can simply use IN. Depending on what you want, you can either use a list of literals (especially if the numbers aren't consecutive), or you can use generate_series() to generate a list of consecutive numbers. Of course you can also use a subquery when the numbers are stored in another table.
With a list of literals that would look something like this:
SELECT pn.personid,
pn.lastname,
pn.firstname,
pn.address,
pn.city
FROM (SELECT p.personid,
p.lastname,
p.firstname,
p.address,
p.city,
row_number() OVER (ORDER BY p.personid DESC) n
FROM persons p) pn
WHERE pn.address LIKE concat('%', pn.n, '%')
AND pn.n IN (1, 2, 4);
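As a quick cross-check of the row_number() idea, here is a sketch using Python's sqlite3 on the question's sample data (SQLite has supported window functions since 3.25, which any recent Python build includes; `||` stands in for concat()):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE persons (PersonID INT, LastName TEXT, FirstName TEXT, Address TEXT, City TEXT);
    INSERT INTO persons VALUES
        (4, 'Smith',  'Eric',  '713 Louise Circle', 'Paris'),
        (5, 'Smith2', 'Eric2', '715 Louise Circle', 'London'),
        (8, 'Smith3', 'Eric3', '718 Louise Circle', 'Madrid');
""")

# Number the rows by descending PersonID, keep the rows whose address
# contains their own row number, and restrict to row numbers 1..3.
rows = conn.execute("""
    SELECT pn.PersonID, pn.LastName, pn.FirstName, pn.Address
    FROM (SELECT p.*, row_number() OVER (ORDER BY p.PersonID DESC) AS n
          FROM persons p) AS pn
    WHERE pn.Address LIKE '%' || pn.n || '%'
      AND pn.n IN (1, 2, 3)
    ORDER BY pn.PersonID
""").fetchall()

print(rows)   # PersonID 4 ('713' contains its row number 3) and 8 ('718' contains 1)
```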
If you want to use generate_series() an example would be:
SELECT pn.personid,
pn.lastname,
pn.firstname,
pn.address,
pn.city
FROM (SELECT p.personid,
p.lastname,
p.firstname,
p.address,
p.city,
row_number() OVER (ORDER BY p.personid DESC) n
FROM persons p) pn
WHERE pn.address LIKE concat('%', pn.n, '%')
AND pn.n IN (SELECT s.n
FROM generate_series(1, 3) s (n));
And a subquery of another table could be used like so:
SELECT pn.personid,
pn.lastname,
pn.firstname,
pn.address,
pn.city
FROM (SELECT p.personid,
p.lastname,
p.firstname,
p.address,
p.city,
row_number() OVER (ORDER BY p.personid DESC) n
FROM persons p) pn
WHERE pn.address LIKE concat('%', pn.n, '%')
AND pn.n IN (SELECT t.nmuloc
FROM elbat t);
For larger sets of numbers you can also consider using an INNER JOIN on the numbers instead of IN.
Using generate_series():
SELECT pn.personid,
pn.lastname,
pn.firstname,
pn.address,
pn.city
FROM (SELECT p.personid,
p.lastname,
p.firstname,
p.address,
p.city,
row_number() OVER (ORDER BY p.personid DESC) n
FROM persons p) pn
INNER JOIN generate_series(1, 1000000) s (n)
ON s.n = pn.n
WHERE pn.address LIKE concat('%', pn.n, '%');
Or when the numbers are in another table:
SELECT pn.personid,
pn.lastname,
pn.firstname,
pn.address,
pn.city
FROM (SELECT p.personid,
p.lastname,
p.firstname,
p.address,
p.city,
row_number() OVER (ORDER BY p.personid DESC) n
FROM persons p) pn
INNER JOIN elbat t
ON t.nmuloc = pn.n
WHERE pn.address LIKE concat('%', pn.n, '%');
Note that I also changed the regular-expression pattern matching to a simple LIKE, which makes the queries a bit more portable. But you can of course replace that with any expression you really need.
Given a table of (timestamp, user_id, country, site_id):
How do you find the number of users whose first and last visits are to the same site?
/* unique users' first site */
SELECT SWE.timestamp, SWE.site_id, SWE.user_id
FROM SWE
WHERE SWE.timestamp = (
    SELECT MIN(t.timestamp)
    FROM SWE t
    WHERE t.user_id = SWE.user_id
)

/* unique users' last site */
SELECT SWE.timestamp, SWE.site_id, SWE.user_id
FROM SWE
WHERE SWE.timestamp = (
    SELECT MAX(t.timestamp)
    FROM SWE t
    WHERE t.user_id = SWE.user_id
)
I am not sure how to count the users for which these are equal.
I'd use DISTINCT ON to pick out the first/last visit for each user, then aggregate over these to check whether they match. Something like:
WITH first_visits AS (
SELECT DISTINCT ON (user_id) * FROM user_visits
ORDER BY user_id, timestamp
), last_visits AS (
SELECT DISTINCT ON (user_id) * FROM user_visits
ORDER BY user_id, timestamp DESC
)
SELECT user_id,
array_to_string(array_agg(DISTINCT site_id), ', ') AS sites,
MIN(timestamp) AS first_visit, MAX(timestamp) as last_visit
FROM (
SELECT * FROM first_visits
UNION ALL
SELECT * FROM last_visits) x
GROUP BY user_id
HAVING COUNT(DISTINCT site_id) = 1;
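DISTINCT ON is Postgres-only, but the same first-equals-last check can be sanity-tested with Python's sqlite3 by using correlated MIN/MAX subqueries instead (the table and sample data here are invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_visits (ts INTEGER, user_id INTEGER, country TEXT, site_id TEXT);
    INSERT INTO user_visits VALUES
        (1, 1, 'US', 'apple.com'),
        (2, 1, 'US', 'huawei.com'),
        (3, 1, 'US', 'apple.com'),   -- user 1: first and last visit hit the same site
        (1, 2, 'DE', 'apple.com'),
        (2, 2, 'DE', 'huawei.com');  -- user 2: first and last differ
""")

# Keep only each user's earliest and latest visit rows, then require that
# those rows reference a single distinct site.
matching = conn.execute("""
    SELECT user_id
    FROM user_visits v
    WHERE v.ts = (SELECT MIN(ts) FROM user_visits WHERE user_id = v.user_id)
       OR v.ts = (SELECT MAX(ts) FROM user_visits WHERE user_id = v.user_id)
    GROUP BY user_id
    HAVING COUNT(DISTINCT site_id) = 1
""").fetchall()

print(len(matching), "user(s) whose first and last visits match")   # 1 user
```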
I have a table in PostgreSQL, something like this:
ID NAME
450 China
525 Germany
658 Austria
I'd like to query all names where ID < 500, and at the same time all names where ID > 500, and retrieve the result in two columns using
array_to_string(array_agg(NAME), ', ').
I need the following result:
column1 (ID < 500) | column2 (ID > 500)
-------------------|-------------------
China              | Germany, Austria
Try using conditional aggregation:
SELECT
STRING_AGG(CASE WHEN ID < 500 THEN NAME END, ', ') AS ID_lt_500,
STRING_AGG(CASE WHEN ID >= 500 THEN NAME END, ', ') AS ID_gt_500
FROM yourTable;
Edit:
If you are using a version of Postgres which does not support STRING_AGG, then do as you were already doing:
SELECT
ARRAY_TO_STRING(ARRAY_AGG(CASE WHEN ID < 500 THEN NAME END), ', ') AS ID_lt_500,
ARRAY_TO_STRING(ARRAY_AGG(CASE WHEN ID >= 500 THEN NAME END), ', ') AS ID_gt_500
FROM yourTable;
Something like:
select (select string_agg(name, ', ')
from the_table
where id <= 500) as column1,
(select string_agg(name, ', ')
from the_table
where id > 500) as column2;
Alternatively:
select string_agg(name, ', ') filter (where id <= 500) as column1,
string_agg(name, ', ') filter (where id > 500) as column2
from the_table;
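The conditional-aggregation pattern is easy to sanity-check outside Postgres too. Here is a sketch with Python's sqlite3, using group_concat in place of STRING_AGG (SQLite only supports an aggregate FILTER clause from 3.30 on, so the CASE form is used here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE the_table (ID INTEGER, NAME TEXT);
    INSERT INTO the_table VALUES (450, 'China'), (525, 'Germany'), (658, 'Austria');
""")

# Aggregates ignore NULL, so each CASE expression keeps only
# the rows belonging to its column.
row = conn.execute("""
    SELECT group_concat(CASE WHEN ID < 500 THEN NAME END, ', ') AS column1,
           group_concat(CASE WHEN ID >= 500 THEN NAME END, ', ') AS column2
    FROM the_table
""").fetchone()

print(row)
```

As with string_agg, group_concat's element order isn't guaranteed, so the second column may list Germany and Austria in either order.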
I'm returning a unique list of ids from the users table, where specific columns in a related table (positions) contain a matching string.
The related table may have multiple records for each user record.
The query is taking a really, really long time (it's not scalable), so I'm wondering whether I'm structuring the query wrong in some fundamental way.
Users Table:
id | name
-----------
1 | frank
2 | kim
3 | jane
Positions Table:
id | user_id | title | company | description
--------------------------------------------------
1 | 1 | manager | apple | 'Managed a team of...'
2 | 1 | assistant | apple | 'Assisted the...'
3 | 2 | developer | huawei | 'Build a feature that...'
For example: I want to return the user's id if a related positions record contains "apple" in either the title, company or description columns.
Query:
select
distinct on (users.id) users.id,
users.name,
...
from users
where (
select
string_agg(distinct users.description, ', ') ||
string_agg(distinct users.title, ', ') ||
string_agg(distinct users.company, ', ')
from positions
where positions.users_id::int = users.id
group by positions.users_id::int) like '%apple%'
UPDATE
I like the idea of moving this into a join clause, but what I'm looking to do is filter users on both conditions below, and I'm not sure how to do both in a join:
1) finding the keyword in title, company, or description,
or
2) finding the keyword with a full-text search against an aggregated string version of the user's documents in another table:
(select
    to_tsvector(string_agg(distinct documents.content, ', '))
 from documents
 where users.id = documents.user_id
 group by documents.user_id) @@ to_tsquery('apple')
So I was originally thinking it might look like:

select
    distinct on (users.id) users.id,
    users.name,
    ...
from users
where (
    (select
        string_agg(distinct positions.description, ', ') ||
        string_agg(distinct positions.title, ', ') ||
        string_agg(distinct positions.company, ', ')
     from positions
     where positions.user_id::int = users.id
     group by positions.user_id::int) like '%apple%'
    or
    (select
        to_tsvector(string_agg(distinct documents.content, ', '))
     from documents
     where users.id = documents.user_id
     group by documents.user_id) @@ to_tsquery('apple')
)

But then it was really slow. I can confirm the slowness comes from the first condition, not the full-text search.
Might not be the best solution, but a quick option is:
SELECT DISTINCT ON ( u.id ) u.id,
u.name
FROM users u
JOIN positions p ON (
p.user_id = u.id
AND ( description || title || company )
LIKE '%apple%'
);
Basically I got rid of the subquery, the unnecessary string_agg usage, the grouping on the positions table, etc.
It does a conditional join, and removing duplicates is covered by the DISTINCT ON.
PS! I used table aliases u and p to shorten the example.
EDIT: adding also WHERE example as requested
SELECT DISTINCT ON ( u.id ) u.id,
u.name
FROM users u
JOIN positions p ON ( p.user_id = u.id )
WHERE ( p.description || p.title || p.company ) LIKE '%apple%'
OR ...your other conditions...;
EDIT2: new details revealed new requirements for the original question, so here is an example for the updated ask:
Since you are doing lookups into 2 different tables (positions and uploads) with an OR condition, a simple JOIN wouldn't work.
But both lookups are verification-type lookups: you only check whether %apple% exists, so you do not need to aggregate, group, and convert the data.
EXISTS, which returns TRUE for the first match found, is what you seem to need anyway. So removing all the unnecessary parts, and using LIMIT 1 to return a positive value if a first match is found and NULL if not (the latter makes EXISTS become FALSE), will give you the same result.
So here is how you could solve it:
SELECT DISTINCT ON ( u.id ) u.id,
u.name
FROM users u
WHERE EXISTS (
SELECT 1
FROM positions p
WHERE p.user_id = u.id::int
AND ( description || title || company ) LIKE '%apple%'
LIMIT 1
)
OR EXISTS (
SELECT 1
FROM uploads up
WHERE up.user_id = u.id::int -- your query referenced a 'documents' table that doesn't exist in your example schema, so this assumes an 'uploads' table with a 'content' column
AND up.content LIKE '%apple%'
LIMIT 1
);
NB! Your example queries reference tables/aliases like documents which don't appear anywhere in a FROM clause. So either you trimmed your real query with inconsistent naming, or there is some other typo; verify which, and adjust my example query accordingly.
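To illustrate the EXISTS shape concretely, here is a sketch with Python's sqlite3. The uploads table and its content column are the same assumptions as in the query above, and all the data is invented. One caveat that applies in both engines: concatenating a NULL column yields NULL, so the LIKE would never match such a row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER, name TEXT);
    CREATE TABLE positions (id INTEGER, user_id INTEGER, title TEXT, company TEXT, description TEXT);
    CREATE TABLE uploads (user_id INTEGER, content TEXT);
    INSERT INTO users VALUES (1, 'frank'), (2, 'kim'), (3, 'jane');
    INSERT INTO positions VALUES
        (1, 1, 'manager',   'apple',  'Managed a team of...'),
        (2, 1, 'assistant', 'apple',  'Assisted the...'),
        (3, 2, 'developer', 'huawei', 'Built a feature that...');
    INSERT INTO uploads VALUES (2, 'resume mentioning apple');
""")

# EXISTS stops at the first matching row, so no aggregation is needed.
rows = conn.execute("""
    SELECT u.id, u.name
    FROM users u
    WHERE EXISTS (SELECT 1 FROM positions p
                  WHERE p.user_id = u.id
                    AND (p.description || p.title || p.company) LIKE '%apple%')
       OR EXISTS (SELECT 1 FROM uploads up
                  WHERE up.user_id = u.id AND up.content LIKE '%apple%')
    ORDER BY u.id
""").fetchall()

print(rows)   # frank matches via positions, kim via uploads; jane matches neither
```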
I have a table ProductNumberDuplicates_backups, which has two columns named ProductID and ProductNumber. There are some duplicate ProductNumbers. How can I count the distinct number of products, then print out the outcome like "() products were backed up."? Because this is inside a stored procedure, I have to use a variable @numrecord for the distinct number of rows. I put my code like this:
set @numrecord = select distinct ProductNumber
from ProductNumberDuplicates_backups where COUNT(*) > 1
group by ProductID
having Count(ProductNumber) > 1

Print cast(@numrecord as varchar) + ' product(s) were backed up.'
Obviously the error was after the = sign, as a SELECT cannot follow it. I've searched for similar cases, but they are just SELECT statements. Please help. Many thanks!
Try
select @numrecord = count(distinct ProductNumber)
from ProductNumberDuplicates_backups

Print cast(@numrecord as varchar) + ' product(s) were backed up.'
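The variable assignment is T-SQL-specific, but the counting logic itself is easy to verify anywhere. A quick sketch with Python's sqlite3, fetching the count into an ordinary variable (the sample values are assumed):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ProductNumberDuplicates_backups (ProductNumber INTEGER);
    INSERT INTO ProductNumberDuplicates_backups VALUES (1), (2), (1), (3), (2);
""")

# count(DISTINCT ...) collapses the duplicates: 1, 2, 1, 3, 2 -> 3 distinct numbers.
num_record = conn.execute(
    "SELECT count(DISTINCT ProductNumber) FROM ProductNumberDuplicates_backups"
).fetchone()[0]

print(f"{num_record} product(s) were backed up.")   # 3 product(s) were backed up.
```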
begin tran
create table ProductNumberDuplicates_backups (
ProductNumber int
)
insert ProductNumberDuplicates_backups(ProductNumber)
select 1
union all
select 2
union all
select 1
union all
select 3
union all
select 2
select * from ProductNumberDuplicates_backups
declare @numRecord int

select @numRecord = count(ProductNumber) from
    (select ProductNumber, ROW_NUMBER()
        over (partition by ProductNumber order by ProductNumber) RowNumber
     from ProductNumberDuplicates_backups) p
where p.RowNumber > 1

print cast(@numRecord as varchar) + ' product(s) were backed up.'
rollback