postgres concat in group by - postgresql

A question about PostgreSQL SELECT queries. This works as it should:
SELECT
name,SUM(cash)
FROM
costumers
GROUP BY (name)
But how can I concatenate two (or more) fields in the GROUP BY clause?
This is what I tried:
SELECT
name,SUM(cash)
FROM
costumers
GROUP BY (name || ' ' || nickname)

That will work, except that you need to select the expression you group by:
SELECT
(name || ' ' || nickname) AS name_and_nickname,
SUM(cash) AS total_cash
FROM costumers
GROUP BY (name || ' ' || nickname)
Another option is to group by two fields by separating them with a comma:
SELECT
name, nickname, SUM(cash) AS total_cash
FROM costumers
GROUP BY name, nickname
Note that these two are not exactly equivalent. In particular, these two rows end up in the same group in the first version and in different groups in the second version:
name | nickname | cash
--------+-----------+----
foo | bar baz | 10
foo bar | baz | 20
The second option is probably what you mean.
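To see the difference concretely, here is a minimal sketch that runs both queries against an in-memory SQLite database through Python's standard sqlite3 module. SQLite is only a stand-in for PostgreSQL here; both use || for concatenation, and in both a NULL operand makes the whole concatenation NULL. The two 'foo' rows come from the answer above; the rows with a NULL nickname are hypothetical additions that show a second difference: their concatenated key is NULL, so they all fall into one group.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE costumers (name TEXT, nickname TEXT, cash INTEGER)")
conn.executemany(
    "INSERT INTO costumers VALUES (?, ?, ?)",
    [
        ("foo", "bar baz", 10),   # rows from the answer above
        ("foo bar", "baz", 20),
        ("qux", None, 5),         # hypothetical rows with a NULL nickname
        ("quux", None, 7),
    ],
)

# Version 1: group by the concatenated expression
concat_groups = dict(conn.execute("""
    SELECT name || ' ' || nickname AS name_and_nickname, SUM(cash) AS total_cash
    FROM costumers
    GROUP BY name || ' ' || nickname
""").fetchall())

# Version 2: group by the two columns separately
column_groups = conn.execute("""
    SELECT name, nickname, SUM(cash) AS total_cash
    FROM costumers
    GROUP BY name, nickname
""").fetchall()

print(concat_groups)       # both 'foo bar baz' rows merged; NULL-nickname rows merged
print(len(column_groups))  # every (name, nickname) pair kept apart
```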

Related

SQL WHERE condition that one field's string can be found in another field

Here's some sample data:
ID | sys1lname | sys2lname
------------------------------------
1 | JOHNSON | JOHNSON
2 | FULTON | ANDERS-FULTON
3 | SMITH | SMITH-DAVIDS
4 | HARRISON | JONES
The goal is to find records where the last names do NOT match, BUT allow when sys1lname can be found somewhere within sys2lname, which may or may not be a hyphenated name. So from the above data, only record 4 should return.
When I put this (SUBSTRING(sys2lname, CHARINDEX(sys2lname, ccm.NAME_LAST), LEN(sys1lname))) in the SELECT statement it will properly return the part of sys2lname that matches sys1lname.
But when I use that in the WHERE clause
WHERE 1=1
AND sys1lname <> sys2lname
OR sys1lname not in ('%' + (SUBSTRING(sys2lname, CHARINDEX(sys1lname, sys2lname), LEN(sys1lname))))
the records with hyphenated names are in the result set.
And I can't figure out why.
Just use a NOT LIKE:
SELECT ID
FROM dbo.YourTable
WHERE sys2lname NOT LIKE '%' + sys1lname + '%';
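A minimal sketch of the NOT LIKE filter, run against the sample data in SQLite via Python's sqlite3 module. SQLite concatenates strings with || where T-SQL uses +, so the pattern is written '%' || sys1lname || '%'; the table name names is a hypothetical stand-in for dbo.YourTable. Keep in mind that any % or _ inside sys1lname would act as a wildcard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "names" is a hypothetical stand-in for dbo.YourTable
conn.execute("CREATE TABLE names (id INTEGER, sys1lname TEXT, sys2lname TEXT)")
conn.executemany(
    "INSERT INTO names VALUES (?, ?, ?)",
    [
        (1, "JOHNSON", "JOHNSON"),
        (2, "FULTON", "ANDERS-FULTON"),
        (3, "SMITH", "SMITH-DAVIDS"),
        (4, "HARRISON", "JONES"),
    ],
)

# Keep only rows whose sys2lname does not contain sys1lname anywhere
mismatched = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM names "
        "WHERE sys2lname NOT LIKE '%' || sys1lname || '%' "
        "ORDER BY id"
    )
]
print(mismatched)  # [4]
```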
If you could have a name like 'Smith' in sys1lname and 'BlackSmith' (or even 'Green-Blacksmith') in sys2lname and don't want them to match, I would use STRING_SPLIT (available from SQL Server 2016) and a NOT EXISTS:
SELECT ID
FROM dbo.YourTable YT
WHERE NOT EXISTS (SELECT 1
FROM STRING_SPLIT(YT.sys2lname,'-') SS
WHERE SS.[value] = YT.sys1lname);
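SQLite has no STRING_SPLIT, so this sketch (run via Python's sqlite3) emulates the split with a recursive CTE that peels off one hyphen-separated part per step, then applies the same NOT EXISTS test. Row 5 ('SMITH' vs 'GREEN-BLACKSMITH') is a hypothetical addition showing where the two answers differ: NOT LIKE treats it as a match because 'SMITH' is a substring of 'BLACKSMITH', while the part-by-part comparison returns it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (id INTEGER, sys1lname TEXT, sys2lname TEXT)")
conn.executemany(
    "INSERT INTO names VALUES (?, ?, ?)",
    [
        (1, "JOHNSON", "JOHNSON"),
        (2, "FULTON", "ANDERS-FULTON"),
        (3, "SMITH", "SMITH-DAVIDS"),
        (4, "HARRISON", "JONES"),
        (5, "SMITH", "GREEN-BLACKSMITH"),  # hypothetical extra row
    ],
)

query = """
WITH RECURSIVE parts(id, part, rest) AS (
    -- seed: empty part, whole name plus a trailing '-' sentinel
    SELECT id, '', sys2lname || '-' FROM names
    UNION ALL
    -- each step takes the text before the next '-' as one part
    SELECT id,
           substr(rest, 1, instr(rest, '-') - 1),
           substr(rest, instr(rest, '-') + 1)
    FROM parts
    WHERE rest <> ''
)
SELECT n.id
FROM names n
WHERE NOT EXISTS (SELECT 1
                  FROM parts p
                  WHERE p.id = n.id AND p.part = n.sys1lname)
ORDER BY n.id
"""
no_exact_part = [row[0] for row in conn.execute(query)]
print(no_exact_part)  # [4, 5]
```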

LIKE search of joined and concatenated records is really slow (PostgreSQL)

I'm returning a unique list of ids from the users table, where specific columns in a related table (positions) contain a matching string.
The related table may have multiple records for each user record.
The query takes a really, really long time (it's not scalable), so I'm wondering if I'm structuring the query wrong in some fundamental way.
Users Table:
id | name
-----------
1 | frank
2 | kim
3 | jane
Positions Table:
id | user_id | title | company | description
--------------------------------------------------
1 | 1 | manager | apple | 'Managed a team of...'
2 | 1 | assistant | apple | 'Assisted the...'
3 | 2 | developer | huawei | 'Build a feature that...'
For example: I want to return the user's id if a related positions record contains "apple" in either the title, company or description columns.
Query:
select
distinct on (users.id) users.id,
users.name,
...
from users
where (
select
string_agg(distinct users.description, ', ') ||
string_agg(distinct users.title, ', ') ||
string_agg(distinct users.company, ', ')
from positions
where positions.users_id::int = users.id
group by positions.users_id::int) like '%apple%'
UPDATE
I like the idea of moving this into a join clause. But what I'm looking to do is filter users on either of the conditions below, and I'm not sure how to do both in a join:
1) finding the keyword in title, company, description
or
2) finding the keyword with full-text search in an associated string version of a document in another table.
select
to_tsvector(string_agg(distinct documents.content, ', '))
from documents
where users.id = documents.user_id
group by documents.user_id) @@ to_tsquery('apple')
So I was originally thinking it might look like this:
select
distinct on (users.id) users.id,
users.name,
...
from users
where (
(select
string_agg(distinct users.description, ', ') ||
string_agg(distinct users.title, ', ') ||
string_agg(distinct users.company, ', ')
from positions
where positions.users_id::int = users.id
group by positions.users_id::int) like '%apple%')
or
(select
to_tsvector(string_agg(distinct documents.content, ', '))
from documents
where users.id = documents.user_id
group by documents.user_id) @@ to_tsquery('apple'))
But then it was really slow - I can confirm the slowness is from the first condition, not the full-text search.
Might not be the best solution, but a quick option is:
SELECT DISTINCT ON ( u.id ) u.id,
u.name
FROM users u
JOIN positions p ON (
p.user_id = u.id
AND ( description || title || company )
LIKE '%apple%'
);
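A minimal sketch of the conditional join against the sample users and positions tables, run in SQLite through Python's sqlite3 module. SQLite has no DISTINCT ON, so plain SELECT DISTINCT collapses the duplicate rows here (equivalent in this case because only u.id and u.name are selected). Also note that SQLite's LIKE is case-insensitive for ASCII letters, whereas PostgreSQL's LIKE is case-sensitive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE positions (id INTEGER PRIMARY KEY, user_id INTEGER,
                            title TEXT, company TEXT, description TEXT);
    INSERT INTO users VALUES (1, 'frank'), (2, 'kim'), (3, 'jane');
    INSERT INTO positions VALUES
        (1, 1, 'manager',   'apple',  'Managed a team of...'),
        (2, 1, 'assistant', 'apple',  'Assisted the...'),
        (3, 2, 'developer', 'huawei', 'Build a feature that...');
""")

# The join alone yields one row per matching position (frank twice)
joined = conn.execute("""
    SELECT u.id
    FROM users u
    JOIN positions p ON p.user_id = u.id
                    AND (p.description || p.title || p.company) LIKE '%apple%'
""").fetchall()

# DISTINCT removes the duplicates, leaving one row per user
matches = conn.execute("""
    SELECT DISTINCT u.id, u.name
    FROM users u
    JOIN positions p ON p.user_id = u.id
                    AND (p.description || p.title || p.company) LIKE '%apple%'
    ORDER BY u.id
""").fetchall()

print(len(joined))  # 2
print(matches)      # [(1, 'frank')]
```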
Basically, I got rid of the subquery, the unnecessary string_agg calls, the grouping on the positions table, etc.
It performs a conditional join, and DISTINCT ON takes care of removing the duplicates.
PS: I used the table aliases u and p to shorten the example.
EDIT: adding a WHERE example as well, as requested:
SELECT DISTINCT ON ( u.id ) u.id,
u.name
FROM users u
JOIN positions p ON ( p.user_id = u.id )
WHERE ( p.description || p.title || p.company ) LIKE '%apple%'
OR ...your other conditions...;
EDIT2: new details in the question set new requirements, so here is a new example for the updated question:
Since you are doing lookups in two different tables (positions and uploads) combined with an OR condition, a simple JOIN won't work.
But both lookups are only existence checks (does %apple% occur?), so you do not need to aggregate, group or convert the data.
EXISTS, which returns TRUE as soon as the first matching row is found, is what you need anyway. Removing all the unnecessary parts gives the same result. If no row matches, the subquery returns no rows and EXISTS becomes FALSE. (The LIMIT 1 below is harmless but not strictly needed, because EXISTS already stops at the first row.)
So here is how you could solve it:
SELECT DISTINCT ON ( u.id ) u.id,
u.name
FROM users u
WHERE EXISTS (
SELECT 1
FROM positions p
WHERE p.users_id = u.id::int
AND ( description || title || company ) LIKE '%apple%'
LIMIT 1
)
OR EXISTS (
SELECT 1
FROM uploads up
WHERE up.user_id = u.id::int -- you referenced a table 'document' here, but it doesn't exist in your example query, so I used the 'uploads' table that you have in FROM instead, assuming a 'content' column exists there
AND up.content LIKE '%apple%'
LIMIT 1
);
NB: your example queries reference tables/aliases such as documents that do not appear anywhere in the FROM clause. Either the example was cut from your real query with the wrong names, or there is a typo somewhere else; please verify this and adjust my example query accordingly.
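Here is a minimal sketch of the two EXISTS checks combined with OR, run in SQLite via Python's sqlite3. The uploads row and the extra positions row for jane (with a NULL description) are hypothetical. They illustrate one caveat of the ( description || title || company ) concatenation: in both PostgreSQL and SQLite, text || NULL yields NULL, so a single NULL column hides a match in the other columns, whereas testing each column separately does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE positions (id INTEGER PRIMARY KEY, user_id INTEGER,
                            title TEXT, company TEXT, description TEXT);
    CREATE TABLE uploads (id INTEGER PRIMARY KEY, user_id INTEGER, content TEXT);
    INSERT INTO users VALUES (1, 'frank'), (2, 'kim'), (3, 'jane');
    INSERT INTO positions VALUES
        (1, 1, 'manager',   'apple',  'Managed a team of...'),
        (2, 1, 'assistant', 'apple',  'Assisted the...'),
        (3, 2, 'developer', 'huawei', 'Build a feature that...'),
        (4, 3, 'intern',    'apple',  NULL);          -- hypothetical row
    INSERT INTO uploads VALUES (1, 2, 'notes about apple pie');  -- hypothetical
""")

def matching_user_ids(position_test):
    # Each EXISTS stops at its first matching row; OR combines the two lookups
    sql = f"""
        SELECT u.id
        FROM users u
        WHERE EXISTS (SELECT 1 FROM positions p
                      WHERE p.user_id = u.id AND {position_test})
           OR EXISTS (SELECT 1 FROM uploads up
                      WHERE up.user_id = u.id AND up.content LIKE '%apple%')
        ORDER BY u.id
    """
    return [row[0] for row in conn.execute(sql)]

# Concatenation as in the answer: jane's NULL description makes the whole string NULL
concat_ids = matching_user_ids(
    "(p.description || p.title || p.company) LIKE '%apple%'")

# Testing each column separately still finds jane's 'apple' company
per_column_ids = matching_user_ids(
    "(p.description LIKE '%apple%' OR p.title LIKE '%apple%' "
    "OR p.company LIKE '%apple%')")

print(concat_ids)      # [1, 2]
print(per_column_ids)  # [1, 2, 3]
```

Because each users row is tested once in this form, no DISTINCT is needed to remove duplicates.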

string_agg: more than two attributes concatenation

I am using postgresql 9.0
I am wondering if it's possible to concatenate three attributes together.
This is how I concatenate two attributes (book and the comma):
SELECT string_agg(book, ',') FROM authors where id = 1;
| book1,book2,book3|
--------------------
how can I do something like below:
SELECT string_agg(name, ':', book, ',') FROM authors where id = 1;
| Ahmad: book1,book2,book3|
----------------
Can someone help? Thanks.
Just concatenate the fields like this:
SELECT name || ':' || string_agg(book, ',') FROM authors where id = 1;
Edit:
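If your SQL returns multiple names, you need to group by name (if you have multiple authors with the same name, it gets a bit more complicated; I won't cover that case in this answer). In fact, PostgreSQL requires the GROUP BY here even for a single name, because name is selected alongside an aggregate: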
SELECT name || ':' || string_agg(book, ',')
FROM authors where id = 1
GROUP BY name;
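A minimal sketch of the grouped query in SQLite via Python's sqlite3, where the equivalent of string_agg is group_concat; the authors rows are hypothetical. Older SQLite versions do not accept an ORDER BY inside an aggregate, so the check compares the books as a sorted list rather than relying on their order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (id INTEGER, name TEXT, book TEXT)")
conn.executemany(
    "INSERT INTO authors VALUES (?, ?, ?)",
    [(1, "Ahmad", "book1"), (1, "Ahmad", "book2"), (1, "Ahmad", "book3"),
     (2, "Sara", "book4")],  # hypothetical sample rows
)

# PostgreSQL requires name in GROUP BY because it sits next to an aggregate
# (SQLite is laxer, but the query is kept portable)
(result,) = conn.execute("""
    SELECT name || ':' || group_concat(book, ',')
    FROM authors
    WHERE id = 1
    GROUP BY name
""").fetchone()

print(result)  # e.g. Ahmad:book1,book2,book3 (book order not guaranteed)
```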
If you want the books in alphabetical order, you can add an ORDER BY inside the aggregate call:
SELECT name || ':' || string_agg(book, ',' ORDER BY book)
FROM authors where id = 1
GROUP BY name;
SELECT name || ': ' || string_agg(book, ',') FROM authors where id = 1 group by name ;

SELECT DISTINCT and TRIM on the fly

I need to select some string from DB. The problem is that those strings are stored in DB in some inconvenient way. For example I have:
| "Kraków"
| "Kraków "
| "KRAKÓW"
I have to get only single name of the city - in this case: "Kraków". City names are stored in a few tables.
I tried something like this:
SELECT DISTINCT(LOWER(TRIM(city_name, ' '))) FROM cities_object UNION DISTINCT
SELECT DISTINCT(LOWER(TRIM(city_name, ' '))) FROM cities_firms UNION DISTINCT
SELECT DISTINCT(LOWER(TRIM(city_name, ' '))) FROM cities_other UNION DISTINCT
WHERE
published = '1'
But this doesn't work. I think this is because SQL can't do it "on the fly". Any ideas?
I've just created a test DB, and your code almost works!
Check a few things:
inside the trim function you use &nbsp, but your data example shows ;nbsp
the WHERE condition (published = '1') applies only to the third table (cities_other)
too many DISTINCT keywords ;)
Did you mean:
SELECT DISTINCT(LOWER(TRIM(city_name, '&nbsp'))) FROM cities_object WHERE published = '1'
UNION
SELECT DISTINCT(LOWER(TRIM(city_name, '&nbsp'))) FROM cities_firms WHERE published = '1'
UNION
SELECT DISTINCT(LOWER(TRIM(city_name, '&nbsp'))) FROM cities_other WHERE published = '1';
?
...and if this still doesn't help, post the error message you get. :-)
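One caveat about TRIM(city_name, '&nbsp'): the second argument is a set of characters, not a substring, so TRIM strips any run of &, n, b, s and p from both ends. That is harmless for 'Kraków', but it eats real letters from a city such as Lublin, whose name ends in n. A quick check in SQLite (whose two-argument trim() has the same character-set semantics) via Python's sqlite3, compared with replace(), which removes the exact substring:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
trimmed, replaced = conn.execute(
    # trim() treats '&nbsp' as the set {&, n, b, s, p}; replace() as a literal substring
    "SELECT trim('Lublin&nbsp', '&nbsp'), replace('Lublin&nbsp', '&nbsp', '')"
).fetchone()

print(trimmed)   # Lubli  (the final 'n' of Lublin is stripped as well)
print(replaced)  # Lublin
```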
select distinct
lower(
regexp_replace(city_name, '^ | $', '', 'g')
) city_name
from (
select city_name from cities_object
where published = '1'
union
select city_name from cities_firms
where published = '1'
union
select city_name from cities_other
where published = '1'
) s
SELECT replace(lower(city_name), ';nbsp', '') AS city_name
FROM (
SELECT city_name FROM cities_object WHERE published = '1'
UNION ALL
SELECT city_name FROM cities_firms WHERE published = '1'
UNION ALL
SELECT city_name FROM cities_other WHERE published = '1'
) sub
GROUP BY 1
replace() removes every occurrence of ;nbsp anywhere in the string. It's not as powerful as regexp_replace(), but a lot faster. Apply it after lower() so that ;NBSP is removed as well.
But are you sure your artefact is ;nbsp, not &nbsp;?
While UNION makes sense to collect data from three source tables, since you want to eliminate duplicates anyway, it may be faster to use UNION ALL and eliminate duplicates once in the final GROUP BY (or DISTINCT) step. Depends on existing indices, the number of duplicates and data distribution.
You can test performance with EXPLAIN ANALYZE.
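A minimal sketch of the UNION ALL plus GROUP BY approach, run in SQLite via Python's sqlite3 with hypothetical rows. SQLite's built-in lower() only folds ASCII letters, so 'KRAKÓW' would keep its 'Ó'; the sketch therefore registers a hypothetical helper, unicode_lower, backed by Python's str.lower. (In PostgreSQL, lower() follows the database locale instead.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical helper: Unicode-aware lowercasing (SQLite's lower() is ASCII-only)
conn.create_function("unicode_lower", 1,
                     lambda s: s.lower() if s is not None else None)
conn.executescript("""
    CREATE TABLE cities_object (city_name TEXT, published TEXT);
    CREATE TABLE cities_firms  (city_name TEXT, published TEXT);
    CREATE TABLE cities_other  (city_name TEXT, published TEXT);
    INSERT INTO cities_object VALUES ('Kraków', '1'), ('Kraków;nbsp', '1');
    INSERT INTO cities_firms  VALUES ('KRAKÓW', '1'), ('Warszawa', '0');
    INSERT INTO cities_other  VALUES ('Kraków', '1');
""")

# UNION ALL keeps every row; GROUP BY 1 removes duplicates once, at the end
cities = [row[0] for row in conn.execute("""
    SELECT replace(unicode_lower(city_name), ';nbsp', '') AS city_name
    FROM (
        SELECT city_name FROM cities_object WHERE published = '1'
        UNION ALL
        SELECT city_name FROM cities_firms WHERE published = '1'
        UNION ALL
        SELECT city_name FROM cities_other WHERE published = '1'
    ) sub
    GROUP BY 1
""")]

print(cities)  # ['kraków']
```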
Use a wildcard.
WHERE LOWER(FirstName) LIKE LOWER('Kraków%')

Converting Access Pivot Table to SQL Server

I'm having trouble converting an MS Access pivot table to SQL Server. I was hoping someone might help.
TRANSFORM First(contacts.value) AS FirstOfvalue
SELECT contacts.contactid
FROM contacts RIGHT JOIN contactrecord ON contacts.[detailid] = contactrecord.[detailid]
GROUP BY contacts.contactid
PIVOT contactrecord.wellknownname
;
Edit: Responding to some of the comments
The contacts table has three fields:
contactid | detailid | value
1 | 1 | Scott
contactrecord has something like:
detailid | wellknownname
1 | FirstName
2 | Address1
3 | foobar
contactrecord is dynamic in that the user can at any time create a field to be added to contacts.
The Access query produces:
contactid | FirstName | Address1 | foobar
1 | Scott | null | null
which is the pivot on wellknownname. The key here is that the number of columns is dynamic, since the user can at any time create another field for the contact. Since I'm new to pivot tables altogether, I'm wondering how I can recreate this Access query in SQL Server.
As for TRANSFORM... that's a built-in Access statement. More information about it can be found here. First() just takes the first value found for that row.
I hope this helps and appreciate all the help.
A quick search for dynamic pivot tables turns up this article.
After renaming things in his last query on the page I came up with this:
DECLARE @PivotColumnHeaders NVARCHAR(max);
-- Build a bracket-quoted, comma-separated column list, e.g. [FirstName],[Address1],[foobar]
SELECT @PivotColumnHeaders = COALESCE(@PivotColumnHeaders + ',', '') + QUOTENAME(wellknownname)
FROM contactrecord;
DECLARE @PivotTableSQL NVARCHAR(max);
-- Generate the PIVOT statement with that column list spliced in
SET @PivotTableSQL = N'
SELECT *
FROM (
SELECT
c.contactid,
cr.wellknownname,
c.value
FROM contacts c
RIGHT JOIN contactrecord cr
on c.detailid = cr.detailid
) as pivotData
pivot(
min(value)
for wellknownname in (' + @PivotColumnHeaders + ')
) as pivotTable
'
;
-- Run the generated statement
execute(@PivotTableSQL);
which, despite its ugliness, does the job.
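For comparison, here is a minimal sketch of the same dynamic pivot using conditional aggregation (one MAX(CASE ...) per column) instead of PIVOT, run in SQLite via Python's sqlite3 because SQLite has no PIVOT operator. As in the answer, the column list is read from contactrecord first and the statement is generated afterwards. quote_ident is a hypothetical helper, and an inner JOIN is used here so that only detail records that actually have a contact value contribute rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (contactid INTEGER, detailid INTEGER, value TEXT);
    CREATE TABLE contactrecord (detailid INTEGER, wellknownname TEXT);
    INSERT INTO contacts VALUES (1, 1, 'Scott');
    INSERT INTO contactrecord VALUES (1, 'FirstName'), (2, 'Address1'), (3, 'foobar');
""")

def quote_ident(name):
    # Hypothetical helper: double-quote an identifier, doubling embedded quotes
    return '"' + name.replace('"', '""') + '"'

# Step 1: read the (user-defined, hence dynamic) column names
names = [row[0] for row in conn.execute(
    "SELECT wellknownname FROM contactrecord "
    "GROUP BY wellknownname ORDER BY MIN(detailid)")]

# Step 2: one MAX(CASE ...) column per name; the names are bound as parameters
columns = ",\n".join(
    f"MAX(CASE WHEN cr.wellknownname = ? THEN c.value END) AS {quote_ident(n)}"
    for n in names)
sql = f"""
    SELECT c.contactid AS contactid, {columns}
    FROM contacts c
    JOIN contactrecord cr ON c.detailid = cr.detailid
    GROUP BY c.contactid
"""

# Step 3: run the generated statement
cursor = conn.execute(sql, names)
headers = [d[0] for d in cursor.description]
rows = cursor.fetchall()
print(headers)  # ['contactid', 'FirstName', 'Address1', 'foobar']
print(rows)     # [(1, 'Scott', None, None)]
```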