SELECT DISTINCT and TRIM on the fly - postgresql

I need to select some string from DB. The problem is that those strings are stored in DB in some inconvenient way. For example I have:
| "Kraków"
| "Kraków "
| "KRAKÓW"
I have to get only single name of the city - in this case: "Kraków". City names are stored in a few tables.
I tryied something like that:
SELECT DISTINCT(LOWER(TRIM(city_name, ' '))) FROM cities_object UNION DISTINCT
SELECT DISTINCT(LOWER(TRIM(city_name, ' '))) FROM cities_firms UNION DISTINCT
SELECT DISTINCT(LOWER(TRIM(city_name, ' '))) FROM cities_other UNION DISTINCT
WHERE
published = '1'
But this don't work. I think this is becouse SQL cant do it "on the fly". Any ideas?

I've just created test DB and your code almost works!
Try to check several things:
inside trim function: &nbsp, but in your data example: ;nbsp
WHERE condition (published = '1') only for third table (cities_other)
too much "distinct" statements ;)
Did you mean:
SELECT DISTINCT(LOWER(TRIM(city_name, '&nbsp'))) FROM cities_object WHERE published = '1'
UNION
SELECT DISTINCT(LOWER(TRIM(city_name, '&nbsp'))) FROM cities_firms WHERE published = '1'
UNION
SELECT DISTINCT(LOWER(TRIM(city_name, '&nbsp'))) FROM cities_other WHERE published = '1';
?
...and if this still won't help, try to give us error message you get. :-)

select distinct
lower(
regexp_replace(city_name, '^ | $', '', 'g')
) city_name
from (
select city_name from cities_object
where published = '1'
union
select city_name from cities_firms
where published = '1'
union
select city_name from cities_other
where published = '1'
) s

SELECT replace(lower(city_name), ';nbsp', '') AS city_name
FROM (
SELECT city_name FROM cities_object WHERE published = '1'
UNION ALL
SELECT city_name FROM cities_firms WHERE published = '1'
UNION ALL
SELECT city_name FROM cities_other WHERE published = '1'
) sub
GROUP BY 1
replace() removes any occurrence of ;nbsp anywhere in the string. It's not as powerful as regexp_replace(), but a lot faster. Place it after lower() to replace ;NBSP also.
But are you sure your artefact is ;nbsp, not ?
While UNION makes sense to collect data from three source tables, since you want to eliminate duplicates anyway, it may be faster to use UNION ALL and eliminate duplicates once in the final GROUP BY (or DISTINCT) step. Depends on existing indices, the number of duplicates and data distribution.
You can test performance with EXPLAIN ANALYZE.

Use a wildcard.
WHERE FirstName LIKE LOWER('Kraków%')

Related

How to avoid duplicates in the STRING_AGG function

My query is below:
select
u.Id,
STRING_AGG(sf.Naziv, ', ') as 'Ustrojstvena jedinica',
ISNULL(CONVERT(varchar(200), (STRING_AGG(TRIM(p.Naziv), ', ')), 121), '')
as 'Partner',
from Ugovor as u
left join VezaUgovorPartner as vup
on vup.UgovorId = u.Id AND vup.IsDeleted = 'false'
left join [TEST_MaticniPodaci2].dbo.Partner as p
on p.PartnerID = vup.PartnerId
left join [dbo].[VezaUgovorUstrojstvenaJedinica] as vuu
on vuu.UgovorId = u.Id
left join [TEST_MaticniPodaci2].hcphs.SifZavod as sf
on sf.Id = vuu.UstrojstvenaJedinicaId
left join [dbo].[SifVrstaUgovora] as vu
on u.VrstaUgovoraId = vu.Id
group by u.Id, sf.Naziv
My problem is that I can have more sf.Naziv and also only one sf.Naziv so I have to check if there is one and then show only one result and if there is two or more to show more results. But for now the problem is when I have only one sf.Naziv, query returns two sf.Naziv with the same name because in first STRING_AGG i have more records about p.Naziv.
I have no idea how to implement DISTINCT into STRING_AGG function
Any other solutions are welcome, but I think it should work with DISTINCT function.
It looks like distinct won't work, so what you should do is put your whole query in a subquery, remove the duplicates there, then do STRING_AGG on the data that has no duplicates.
SELECT STRING_AGG(data)
FROM (
SELECT DISTINCT FROM ...
)
I like this format for distinct values:
(d is required but you can use any variable name there)
SELECT STRING_AGG(LoadNumber, ',') as LoadNumbers FROM (SELECT DISTINCT LoadNumber FROM [ASN]) d
A sample query to remove duplicates while using STRING_AGG().
WITH cte AS (
SELECT DISTINCT product
FROM activities
)
SELECT STRING_AGG(product, ',') products
FROM cte;
Or you can use the following query. The result is same -
SELECT STRING_AGG(product, ',') as products
from (
SELECT product
FROM Activities
GROUP BY product
) as _ ;

LIKE search of joined and concatenated records is really slow (PostgreSQL)

I'm returning a unique list of id's from the users table, where specific columns in a related table (positions) contain a matching string.
The related table may have multiple records for each user record.
The query is taking a really really long time (its not scaleable), so I'm wondering if I'm structuring the query wrong in some fundamental way?
Users Table:
id | name
-----------
1 | frank
2 | kim
3 | jane
Positions Table:
id | user_id | title | company | description
--------------------------------------------------
1 | 1 | manager | apple | 'Managed a team of...'
2 | 1 | assistant | apple | 'Assisted the...'
3 | 2 | developer | huawei | 'Build a feature that...'
For example: I want to return the user's id if a related positions record contains "apple" in either the title, company or description columns.
Query:
select
distinct on (users.id) users.id,
users.name,
...
from users
where (
select
string_agg(distinct users.description, ', ') ||
string_agg(distinct users.title, ', ') ||
string_agg(distinct users.company, ', ')
from positions
where positions.users_id::int = users.id
group by positions.users_id::int) like '%apple%'
UPDATE
I like the idea of moving this into a join clause. But what I'm looking to do is filter users conditional on below. And I'm not sure how to do both in a join.
1) finding the keyword in title, company, description
or
2) finding the keyword with full-text search in an associated string version of a document in another table.
select
to_tsvector(string_agg(distinct documents.content, ', '))
from documents
where users.id = documents.user_id
group by documents.user_id) ## to_tsquery('apple')
So I was originally thinking it might look like,
select
distinct on (users.id) users.id,
users.name,
...
from users
where (
(select
string_agg(distinct users.description, ', ') ||
string_agg(distinct users.title, ', ') ||
string_agg(distinct users.company, ', ')
from positions
where positions.users_id::int = users.id
group by positions.users_id::int) like '%apple%')
or
(select
to_tsvector(string_agg(distinct documents.content, ', '))
from documents
where users.id = documents.user_id
group by documents.user_id) ## to_tsquery('apple'))
But then it was really slow - I can confirm the slowness is from the first condition, not the full-text search.
Might not be the best solution, but a quick option is:
SELECT DISTINCT ON ( u.id ) u.id,
u.name
FROM users u
JOIN positions p ON (
p.user_id = u.id
AND ( description || title || company )
LIKE '%apple%'
);
Basically got rid of the subquery, unnecessary string_agg usage, grouping on position table etc.
What it does is doing conditional join and removing duplicate is covered by distinct on.
PS! I used table aliases u and p to shorten the example
EDIT: adding also WHERE example as requested
SELECT DISTINCT ON ( u.id ) u.id,
u.name
FROM users u
JOIN positions p ON ( p.user_id = u.id )
WHERE ( p.description || p.title || p.company ) LIKE '%apple%'
OR ...your other conditions...;
EDIT2: new details revealed setting new requirements of the original question. So adding new example for updated ask:
Since you doing lookups to 2 different tables (positions and uploads) with OR condition then simple JOIN wouldn't work.
But both lookups are verification type lookups - only looking does %apple% exists, then you do not need to aggregate and group by and convert the data.
Using EXISTS that returns TRUE for first match found is what you seem to need anyway. So removing all unnecessary part and using with LIMIT 1 to return positive value if first match found and NULL if not (latter will make EXISTS to become FALSE) will give you same result.
So here is how you could solve it:
SELECT DISTINCT ON ( u.id ) u.id,
u.name
FROM users u
WHERE EXISTS (
SELECT 1
FROM positions p
WHERE p.users_id = u.id::int
AND ( description || title || company ) LIKE '%apple%'
LIMIT 1
)
OR EXISTS (
SELECT 1
FROM uploads up
WHERE up.user_id = u.id::int -- you had here reference to table 'document', but it doesn't exists in your example query, so I just added relation to 'upoads' table as you have in FROM, assuming 'content' column exists there
AND up.content LIKE '%apple%'
LIMIT 1
);
NB! in your example queries have references to tables/aliases like documents which doesn't reflect anywhere in the FROM part. So either you have cut in your example real query with wrong naming or you have made other way typo is something you need to verify and adjust my example query accordingly.

How to sort this data inside the field based on the recent Date

I have a data in the field as " Date: 03-21-13 12/13/14/15 Date:04-21-13 39/12/34/14 Date:04-19-13 19/45/65/12 ".How to sort this data inside the field based on the recent Date.
It should Look like
Date:04-21-13 39/12/34/14
Date:04-19-13 19/45/65/12
Date: 03-21-13 12/13/14/15
Because you are storing it as text, you cannot correctly sort directly on the column (as you appear to have discovered). You will need to split the column, and then sort on that. Something like:
Declare #tvTable Table (
TextColumn varchar(max)
)
Insert #tvTable
Select '04-19-13 19/45/65/12'
Union All
Select '04-21-13 39/12/34/14'
Union All
Select '03-21-13 12/13/14/15'
Union All
Select '03-25-13 17/18/19/20'
Union All
Select '05-01-13 99/88/77/66'
Union All
Select '02-01-13 11/22/33/44'
Select t.TextColumn
From #tvTable t
Cross Apply dbo.fncDelimitedSplit8k(TextColumn, ' ') split
Where split.ItemNumber = 1
Order By Cast(split.Item As DateTime) Desc
The split function taken from Jeff Moden Tally OH!

T-SQL how to count the number of duplicate rows then print the outcome?

I have a table ProductNumberDuplicates_backups, which has two columns named ProductID and ProductNumber. There are some duplicate ProductNumbers. How can I count the distinct number of products, then print out the outcome like "() products was backup." ? Because this is inside a stored procedure, I have to use a variable #numrecord as the distinct number of rows. I put my codes like this:
set #numrecord= select distinct ProductNumber
from ProductNumberDuplicates_backups where COUNT(*) > 1
group by ProductID
having Count(ProductNumber)>1
Print cast(#numrecord as varchar)+' product(s) were backed up.'
obviously the error was after the = sign as the select can not follow it. I've search for similar cases but they are just select statements. Please help. Many thanks!
Try
select #numrecord= count(distinct ProductNumber)
from ProductNumberDuplicates_backups
Print cast(#numrecord as varchar)+' product(s) were backed up.'
begin tran
create table ProductNumberDuplicates_backups (
ProductNumber int
)
insert ProductNumberDuplicates_backups(ProductNumber)
select 1
union all
select 2
union all
select 1
union all
select 3
union all
select 2
select * from ProductNumberDuplicates_backups
declare #numRecord int
select #numRecord = count(ProductNumber) from
(select ProductNumber, ROW_NUMBER()
over (partition by ProductNumber order by ProductNumber) RowNumber
from ProductNumberDuplicates_backups) p
where p.RowNumber > 1
print cast(#numRecord as varchar) + ' product(s) were backed up.'
rollback

Filter union result

I'm making select with a union.
SELECT * FROM table_1
UNION
SELECT * FROM table_2
Is it possible to filter query results by column values?
Yes, you can enclose your entire union inside another select:
select * from (
select * from table_1 union select * from table_2) as t
where t.column = 'y'
You have to introduce the alias for the table ("as t"). Also, if the data from the tables is disjoint, you might want to consider switching to UNION ALL - UNION by itself works to eliminate duplicates in the result set. This is frequently not necessary.
A simple to read solution is to use a CTE (common table expression). This takes the form:
WITH foobar AS (
SELECT foo, bar FROM table_1
UNION
SELECT foo, bar FROM table_2
)
Then you can refer to the CTE in subsequent queries by name, as if it were a normal table:
SELECT foo,bar FROM foobar WHERE foo = 'value'
CTEs are quite powerful, I recommend further reading here
One tip that you will not find in that MS article is; if you require more than one CTE put a comma between the expression statements. eg:
WITH foo AS (
SELECT thing FROM place WHERE field = 'Value'
),
bar AS (
SELECT otherthing FROM otherplace WHERE otherfield = 'Other Value'
)
If you want to filter the query based on some criteria then you could do this -
Select * from table_1 where table_1.col1 = <some value>
UNION
Select * from table_2 where table_2.col1 = <some value>
But, I would say if you want to filter result to find the common values then you can use joins instead
Select * from table_1 inner join table_2 on table_1.col1 = table_2.col1