PostgreSQL: Selecting one address from almost but not exactly duplicate rows

I have a big table that I'm trying to join another table to, however the table has entries such as:
ID | Name     | Address       | Priority
---+----------+---------------+---------
1  | Jane Doe | 123 Baker St  | 1
2  | Jane Doe | 345 Clay Dr   | 2
3  | Jeff Boe | 231 Street St | 1
4  | Karen Al | 4232 Elm St   | 1
5  | Karen Al | 5632 Pine Ct  | 2
What I really want to select is one single address per person. The address I want is the priority 2 one. However, some people don't have a priority 2 address, so I can't join only on priority 2.
I've tried the following test query:
SELECT DISTINCT n.ID, LastName, FirstName, MAX(Address), MAX(Address2), City, State, PostalCode, n.Phone
FROM NormalTable n
JOIN Contracts cn ON n.ID = cn.ID
This returns the table that I sketched out above, with the same person and same ID but different addresses.
Is there a way to do this in one query? I can think of maybe doing one INSERT statement into my final table where I do all the priority 2 addresses and then ANOTHER INSERT statement for IDs that aren't in the table yet, and use the priority 1 address for those. But I'd much prefer if there's a way to do this all in one go where I end up with only the address I want.

You could choose the address you need by joining to a subquery that finds the max priority per person:
select m.LastName, m.FirstName, m.Address, m.Address2, m.City, m.State, m.PostalCode, m.Phone
from my_table m
inner join (
select LastName, FirstName, max(priority) max_priority
from my_table
group by LastName, FirstName
) t on t.LastName = m.LastName
AND t.FirstName = m.FirstName
AND t.max_priority = m.priority
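The max-priority join above can be tried on the sample rows from the question. This is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, and the table name `addresses` and the single `name` column are assumptions made for illustration (the answer's query uses separate LastName and FirstName columns).

```python
import sqlite3

# Hypothetical schema; the rows are the ones sketched in the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE addresses (id INTEGER, name TEXT, address TEXT, priority INTEGER)"
)
conn.executemany(
    "INSERT INTO addresses VALUES (?, ?, ?, ?)",
    [
        (1, "Jane Doe", "123 Baker St", 1),
        (2, "Jane Doe", "345 Clay Dr", 2),
        (3, "Jeff Boe", "231 Street St", 1),
        (4, "Karen Al", "4232 Elm St", 1),
        (5, "Karen Al", "5632 Pine Ct", 2),
    ],
)

# The subquery finds each person's highest priority; the join keeps only
# the row whose priority equals that maximum.
best = conn.execute(
    """
    SELECT a.name, a.address, a.priority
    FROM addresses a
    JOIN (
        SELECT name, MAX(priority) AS max_priority
        FROM addresses
        GROUP BY name
    ) t ON t.name = a.name
       AND t.max_priority = a.priority
    ORDER BY a.name
    """
).fetchall()

for row in best:
    print(row)
```

People with a priority 2 address get that one, and people with only a priority 1 address fall back to it. If two rows for the same person shared the maximum priority, both would be returned.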

I think you want something like this:
SELECT DISTINCT ON (Name) Name, Address, Priority
FROM my_table
ORDER BY Name, Priority DESC
DISTINCT ON (Name) returns only one row per name: the first row of each group in the ORDER BY order. Because the rows are sorted by Priority DESC within each name, that first row is the one with the highest priority. Note that the ORDER BY has to start with the DISTINCT ON expression.

Related

order by case alphabetical ordering

The question is not too specific, but I'm not sure how to explain it well.
I have a table in my db with names in a field.
I would like to order the names so that names starting with a certain letter come first, then names starting with another letter, and so on.
What I have now is
SELECT
(T.firstname||' '||T.lastname) as Full_Name
FROM
TABLE T
ORDER BY
CASE
WHEN LPAD(T.firstname, 1) = 'J' THEN T.firstname
WHEN LPAD(T.firstname, 1) = 'B' THEN T.firstname
END DESC,
Full_Name ASC
This returns almost what I would like to see: names starting with 'J' are ordered first, then 'B', then the rest.
However, the result looks like
What I get | What I want
Full_Name  | Full_Name
-----------+------------
Junior MR  | James A
John Doe   | Joe Bob
Joe Bob    | John Doe
James A    | Junior MR
Brad T     | B Test
Bob Joe    | Bb Test
Bb Test    | Bob Joe
B Test     | Brad T
A Test     | A Test
Aa Test    | Aa Test
AFLKJASDFJ | AFLKJASDFJ
Ann Doe    | Ann Doe
But what I want is for the J and B names to be sorted in alphabetical order as well; right now they come out in reverse alphabetical order.
How can I specify the order inside of the CASE?
I tried having 2 separate CASE expressions for names starting with 'J' and 'B', but it just shows me the same result.
Just add one extra column, either materialized using triggers or virtual using an expression that is only evaluated when the select is run, and then use it for sorting.
For secondary sorting use the original components of the names, not the expression that glues both names together and thus destroys the information about which part was which.
Examples: https://dbfiddle.uk/?rdbms=firebird_3.0&fiddle=fbf89b3903d3271ae6c55589fd9cfe23
create table T (
firstname varchar(10),
lastname varchar(10),
fullname computed by (
Coalesce(firstname, '-') || ' ' || Coalesce(T.lastname, '-')
),
sorting_helper computed by (
CASE WHEN firstname starting with 'J' then 100
WHEN firstname starting with 'B' then 50
ELSE 0
END
)
)
Notice the important distinction: my helper expression is a "ranking" one. It yields one of several pre-defined ranks, thus putting "James" and "Joe" into the same bin with exactly the same ranking value. Your expression still yields the names themselves, so it keeps the difference between those names and sorts them in descending order. But you do NOT want that difference: you said you want all names starting with J to be moved to the top and then sorted among themselves by the usual rules. So do just that, and make an expression that pulls all J-names together WITHOUT making a distinction between them.
insert into T
select
'John', 'Doe'
from rdb$database union all select
'James', 'A'
from rdb$database union all select
'Aa ', 'Test'
from rdb$database union all select
'Ann', 'Doe'
from rdb$database union all select
'Bob', 'Joe'
from rdb$database union all select
'Brad', 'Test'
from rdb$database union all select
NULL, 'Smith'
from rdb$database union all select
'Ken', NULL
from rdb$database
8 rows affected
select * from T
FIRSTNAME | LASTNAME | FULLNAME | SORTING_HELPER
:-------- | :------- | :---------- | -------------:
John | Doe | John Doe | 100
James | A | James A | 100
Aa | Test | Aa Test | 0
Ann | Doe | Ann Doe | 0
Bob | Joe | Bob Joe | 50
Brad | Test | Brad Test | 50
null | Smith | - Smith | 0
Ken | null | Ken - | 0
Select FullName from T order by sorting_helper desc, firstname asc, lastname asc
| FULLNAME |
| :---------- |
| James A |
| John Doe |
| Bob Joe |
| Brad Test |
| - Smith |
| Aa Test |
| Ann Doe |
| Ken - |
Or without computed-by column
Select FullName from T order by (CASE WHEN firstname starting with 'J' then 0
WHEN firstname starting with 'B' then 1
ELSE 2
END) asc, firstname asc, lastname asc
| FULLNAME |
| :---------- |
| James A |
| John Doe |
| Bob Joe |
| Brad Test |
| - Smith |
| Aa Test |
| Ann Doe |
| Ken - |
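The ranking idea can also be sketched outside Firebird. The snippet below is an illustration under assumptions, not the answer's code: it runs on SQLite through Python's sqlite3 module, replaces Firebird's STARTING WITH by a `substr` comparison, and leaves out the NULL rows. The CASE expression only decides which bin a row falls into; the ordering inside each bin comes from the plain firstname and lastname columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (firstname TEXT, lastname TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        ("John", "Doe"),
        ("James", "A"),
        ("Aa", "Test"),
        ("Ann", "Doe"),
        ("Bob", "Joe"),
        ("Brad", "Test"),
    ],
)

# Rank 0 for J-names, 1 for B-names, 2 for everything else;
# ties within a rank are broken by the original name columns.
ordered = [
    row[0]
    for row in conn.execute(
        """
        SELECT firstname || ' ' || lastname AS full_name
        FROM t
        ORDER BY CASE
                     WHEN substr(firstname, 1, 1) = 'J' THEN 0
                     WHEN substr(firstname, 1, 1) = 'B' THEN 1
                     ELSE 2
                 END ASC,
                 firstname ASC,
                 lastname ASC
        """
    )
]

print(ordered)
```

Because every J-name gets the same rank, the secondary sort keys are free to order them alphabetically, which is what the original CASE (returning the name itself, sorted DESC) prevented.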
For extra tuning of the positioning of the rows lacking a name or surname you can also use the NULLS FIRST or NULLS LAST option as described in the Firebird docs at https://firebirdsql.org/file/documentation/reference_manuals/user_manuals/html/nullguide-sorts.html
The problem with this approach, however, is that on big enough tables you won't be able to use indices built over names and surnames for sorting. Instead you would have to resort to an unsorted read of the data (a NATURAL read followed by a SORT in the query plan) and then sort it in temporary files on disk, which might become very slow and space-consuming on large enough data.
You can try to improve this by creating an index on the expression, using your ranking expression there, and hope that the FB optimizer will use it (it is quite tricky with verbose expressions like CASE). Frankly, you would probably still be left without it (at least I did not manage to make FB 2.1 use an index on the CASE expression there).
You can "materialize" the ranking expression into a regular SmallInt Not Null column instead of a COMPUTED BY one, and use a BEFORE INSERT OR UPDATE trigger to keep that column populated with proper data. Then you can create a regular index over that regular column. It will add two bytes to each row, which is not much growth.
But even then, an index with very few distinct values does not add much value; it will have low selectivity. Also, an index on an expression cannot be a compound one (that is, it cannot include other columns after the expression).
So for large data you'd practically be better off using THREE different queries fused together. Add the scaffolding, if you have not done so already:
create index i58647579_names on T58647579 ( firstname, lastname )
Then you can do triple-select like this:
WITH S1 as (
select FullName from T58647579
where firstname starting with 'J'
order by firstname asc, lastname asc
), S2 as (
select FullName from T58647579
where firstname starting with 'B'
order by firstname asc, lastname asc
), S3 as (
select FullName from T58647579
where (firstname is null)
or ( (firstname not starting with 'J')
and (firstname not starting with 'B')
)
order by firstname asc, lastname asc
)
SELECT * FROM S1
UNION ALL
SELECT * FROM S2
UNION ALL
SELECT * FROM S3
And while you would traverse the table thrice - you would do it by pre-sorted index:
PLAN (S1 T58647579 ORDER I58647579_NAMES INDEX (I58647579_NAMES))
PLAN (S2 T58647579 ORDER I58647579_NAMES INDEX (I58647579_NAMES))
PLAN (S3 T58647579 ORDER I58647579_NAMES)

Postgresql how to concatenate strings but only when values are different?

I want to create a view that is aggregating my column "country". My table looks like this:
project_ref | country
----------------------
1 | Italy
1 | Italy
2 | France
2 | Italy
Currently, I run the following query:
CREATE VIEW a AS
SELECT project_ref,
string_agg(country, ', ') AS country
FROM b GROUP BY project_ref ORDER BY project_num ASC;
and I get the following table as a result:
project_ref | country
----------------------------
1 | Italy, Italy
2 | France, Italy
Is there a way to remove the duplicated values "Italy, Italy" in order to have "Italy" mentioned only once?
I would like to have the following table instead:
project_ref | country
---------------------------
1 | Italy
2 | France, Italy
But I can't find the way to get there... Any ideas?
I'm using PostgreSQL 9.4.5 version.
Thanks a lot in advance!
Just add distinct inside string_agg:
string_agg(distinct country, ', ')
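The effect of putting DISTINCT inside the aggregate can be checked with a small runnable sketch. It is an approximation under assumptions: it runs on SQLite via Python's sqlite3 module, where `group_concat(DISTINCT ...)` plays the role of PostgreSQL's `string_agg(DISTINCT ..., ', ')`, uses a plain comma as separator, and does not guarantee the order of the concatenated values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE b (project_ref INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO b VALUES (?, ?)",
    [(1, "Italy"), (1, "Italy"), (2, "France"), (2, "Italy")],
)

# DISTINCT inside the aggregate removes duplicates per group
# before the values are concatenated.
rows = conn.execute(
    """
    SELECT project_ref, group_concat(DISTINCT country) AS country
    FROM b
    GROUP BY project_ref
    ORDER BY project_ref
    """
).fetchall()

# Split and sort on the Python side, because the concatenation order
# is not guaranteed here.
countries = {ref: sorted(value.split(",")) for ref, value in rows}
print(countries)
```

In PostgreSQL the order can be fixed inside the aggregate itself, for example `string_agg(DISTINCT country, ', ' ORDER BY country)`.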
You can use a subquery to remove the duplicate records and create an array with them. If you want to store the country collections as text separated by comma, use the function ARRAY_TO_STRING as follows:
CREATE VIEW a AS
SELECT project_ref,
ARRAY_TO_STRING(ARRAY(SELECT DISTINCT country
FROM b q2
WHERE q1.project_ref = q2.project_ref),',') AS country
FROM b q1
GROUP BY project_ref
And here is your view without the duplicates:
db=# SELECT * FROM a;
project_ref | country
-------------+-----------------
1 | Italy
2 | France,Italy
(2 rows)
An advantage of this approach is that you can run your DISTINCT over more than one column by using DISTINCT ON (column1, column2, ...).

How to use COUNT() in more than one column?

Let's say I have these 3 tables
Countries:
Id | CountryName
---+------------
1  | USA
ProvOrStates:
Id | CId | Name
---+-----+-----
1  | 1   | NY
MajorCities:
Id | POSId | Name
---+-------+-----
1  | 1     | NYC
How do you get something like
CountryName | ProvinceOrState (Count) | MajorCities (Count)
------------+-------------------------+--------------------
USA         | 50                      | 200
Canada      | 10                      | 57
So far, the way I see it:
Run the first SELECT COUNT (GROUP BY Countries.Id) on Countries JOIN ProvOrStates,
store the result in a table variable,
Run the second SELECT COUNT (GROUP BY Countries.Id) on ProvOrStates JOIN MajorCities,
Update the table variable based on the Countries.Id
Join the table variable with Countries table ON Countries.Id = Id of the table variable.
Is there a possibility to run just one query instead of multiple intermediary queries? I don't know if it's even feasible as I've tried with no luck.
Thanks for helping
Use a subquery, derived tables, or views.
Basically, if you have 3 tables:
select * from [TableOne] as T1
join
(
select T2.Column, T3.Column
from [TableTwo] as T2
join [TableThree] as T3
on T2.ConditionColumn = T3.ConditionColumn
) AS DerivedTable
on T1.DepName = DerivedTable.DepName
And when you are 100% sure it's working, you can create a view that contains your three-table join and call it whenever you want.
PS: in case of identical column names, or when you get this message
"The column 'ColumnName' was specified multiple times for 'Table'. "
you can use aliases to solve the problem.
This answer comes from #lotzInSpace.
SELECT ct.[CountryName], COUNT(DISTINCT p.[Id]), COUNT(DISTINCT c.[Id])
FROM dbo.[Countries] ct
LEFT JOIN dbo.[Provinces] p
ON ct.[Id] = p.[CountryId]
LEFT JOIN dbo.[Cities] c
ON p.[Id] = c.[ProvinceId]
GROUP BY ct.[CountryName]
It's working. I'm using LEFT JOIN instead of INNER JOIN because with an INNER JOIN, a country that doesn't have provinces, or a province that doesn't have cities, would not be displayed.
Thanks again #lotzInSpace.
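A small runnable sketch of the COUNT(DISTINCT ...) query with LEFT JOINs follows. It uses SQLite through Python's sqlite3 module rather than SQL Server, and the lowercase table names, column names, and sample rows (including a country without provinces) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE countries (id INTEGER, country_name TEXT);
    CREATE TABLE provinces (id INTEGER, country_id INTEGER, name TEXT);
    CREATE TABLE cities    (id INTEGER, province_id INTEGER, name TEXT);

    INSERT INTO countries VALUES (1, 'USA'), (2, 'Canada'), (3, 'Monaco');
    INSERT INTO provinces VALUES (1, 1, 'NY'), (2, 1, 'CA'), (3, 2, 'Ontario');
    INSERT INTO cities    VALUES (1, 1, 'NYC'), (2, 1, 'Buffalo'), (3, 2, 'LA');
    """
)

# The joins repeat a province once per city (NY appears twice), so the
# province count needs DISTINCT. LEFT JOIN keeps countries and provinces
# that have no children; COUNT ignores the resulting NULLs and yields 0.
counts = conn.execute(
    """
    SELECT ct.country_name,
           COUNT(DISTINCT p.id) AS province_count,
           COUNT(DISTINCT c.id) AS city_count
    FROM countries ct
    LEFT JOIN provinces p ON ct.id = p.country_id
    LEFT JOIN cities c    ON p.id = c.province_id
    GROUP BY ct.country_name
    ORDER BY ct.country_name
    """
).fetchall()

for row in counts:
    print(row)
```

Without DISTINCT, COUNT(p.id) would report 3 provinces for the USA in this data, because NY is joined to two cities.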

Select Query from postgres with count

I have two tables: one contains customers, the other one the bookings.
Now I want to see how many bookings come from each person, but display it with their name instead of the id.
SELECT booking.id, COUNT(booking.id) AS idcount
FROM booking
GROUP BY booking.id ORDER BY idcount DESC;
The output is (correct count):
id | idcount
----------+--------
2 | 8
1 | 4
My attempt at getting the name displayed instead of the id was:
SELECT customer.lastn, customer.firstn, COUNT(booking.id) AS idcount
FROM booking, customer
GROUP BY customer.lastn, customer.firstn ORDER BY idcount DESC;
The output (wrong count):
lastn | firstn | idcount
----------+---------+--------
Adam | Michael | 13
Jackson | Leo | 13
13 is the total number of bookings (I just cut the output off), so that is where the number comes from; however, I can't work out how to get the right count together with the name.
You need to use a JOIN in your FROM clause:
SELECT customer.lastn, customer.firstn, COUNT(booking.id) AS idcount
FROM booking
JOIN customer ON booking.id = customer.id
GROUP BY customer.lastn, customer.firstn
ORDER BY idcount DESC;
The JOIN here tells how the booking table relates to your customer table.
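The difference between the comma-separated FROM list and the explicit JOIN can be reproduced with a few rows. This is a sketch on SQLite via Python's sqlite3 module; the sample data is invented, and, as in the queries above, booking.id is assumed to hold the id of the customer who made the booking.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (id INTEGER, lastn TEXT, firstn TEXT);
    CREATE TABLE booking  (id INTEGER);  -- assumed: id of the booking customer

    INSERT INTO customer VALUES (1, 'Jackson', 'Leo'), (2, 'Adam', 'Michael');
    INSERT INTO booking  VALUES (2), (2), (2), (1), (1);
    """
)

# No join condition: every customer is paired with every booking,
# so each name gets the total number of bookings.
cross = conn.execute(
    """
    SELECT customer.lastn, customer.firstn, COUNT(booking.id) AS idcount
    FROM booking, customer
    GROUP BY customer.lastn, customer.firstn
    ORDER BY customer.lastn
    """
).fetchall()

# With the join condition, only a customer's own bookings are counted.
joined = conn.execute(
    """
    SELECT customer.lastn, customer.firstn, COUNT(booking.id) AS idcount
    FROM booking
    JOIN customer ON booking.id = customer.id
    GROUP BY customer.lastn, customer.firstn
    ORDER BY idcount DESC
    """
).fetchall()

print(cross)
print(joined)
```

The first query reports 5 for both customers, which mirrors the "13 for everyone" symptom in the question; the second reports 3 and 2.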

Update Count column in Postgresql

I have a single table laid out as such:
id | name | count
1 | John |
2 | Jim |
3 | John |
4 | Tim |
I need to fill out the count column such that the result is the number of times the specific name shows up in the column name.
The result should be:
id | name | count
1 | John | 2
2 | Jim | 1
3 | John | 2
4 | Tim | 1
I can get the count of occurrences of unique names easily using:
SELECT COUNT(name)
FROM table
GROUP BY name
But that doesn't fit into an UPDATE statement due to it returning multiple rows.
I can also get it narrowed down to a single row by doing this:
SELECT COUNT(name)
FROM table
WHERE name = 'John'
GROUP BY name
But that doesn't allow me to fill out the entire column, just the 'John' rows.
You can do that with a common table expression:
with counted as (
select name, count(*) as name_count
from the_table
group by name
)
update the_table
set "count" = c.name_count
from counted c
where c.name = the_table.name;
Another (slower) option would be to use a correlated subquery:
update the_table
set "count" = (select count(*)
from the_table t2
where t2.name = the_table.name);
But in general it is a bad idea to store values that can easily be calculated on the fly:
select id,
name,
count(*) over (partition by name) as name_count
from the_table;
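The correlated-subquery update is the most portable of these variants, so it is easy to try on the sample rows. The sketch below runs it on SQLite through Python's sqlite3 module as a stand-in for PostgreSQL; the table name `the_table` follows the answer, and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE the_table (id INTEGER PRIMARY KEY, name TEXT, "count" INTEGER)'
)
conn.executemany(
    "INSERT INTO the_table (id, name) VALUES (?, ?)",
    [(1, "John"), (2, "Jim"), (3, "John"), (4, "Tim")],
)

# For every row, count how many rows share its name and store the result.
conn.execute(
    """
    UPDATE the_table
    SET "count" = (SELECT count(*)
                   FROM the_table t2
                   WHERE t2.name = the_table.name)
    """
)

filled = conn.execute(
    'SELECT id, name, "count" FROM the_table ORDER BY id'
).fetchall()

for row in filled:
    print(row)
```

The stored values go stale as soon as rows are inserted or deleted, which is the reason given above for preferring to compute the count at query time.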
Another method: using a derived table
UPDATE tb
SET count = t.count
FROM (
SELECT count(NAME)
,NAME
FROM tb
GROUP BY 2
) t
WHERE t.NAME = tb.NAME