Parsing addresses from varchar in PostgreSQL - postgresql

Could you please advise me on the best way to parse an address from a string? I have a table of addresses exported as OSM points (city, street, house number, country code, postcode, geometry column, ...) and a text parameter entered by the user, for example:
'Prague Letna 15'
I need to parse this string (city, street name, street number, ...) and, based on those parts, select the most similar point from the Points table. I will be grateful for any advice.
I tried this:
select *
from parse_address('Prague Letna 15')
but the result is not good.

Related

QGIS: is there a way to specify labels for only some countries?

In QGIS, I want to display only the names of countries which have a data value of more than 1 (the data range is 1-18), and not display all country names on a world map.
I'm on a Mac.
I don't think 'priority' will work on its own, and it might need a string?
In the attributes, I have a column with no. projects (so 1-18 for <15 countries), and wondered if this could be part of the string. Something like 'if more than 1, label with country name'.

Atomic values / divisibility to reach 1NF

After reading about normalization I am unsure of how to interpret the 1NF requirements.
According to Wikipedia, something is in first normal form if the "domain of each attribute contains only atomic indivisible values".
My question is: Who decides what is indivisible or not?
You may divide a date datatype into year, month, day, second, nanoseconds. You may as well divide an address into exact latitude coordinates. When can you really be sure that you have reached 1NF?
Would this table be considered 1NF?
| fullName | fullAddress |
--------------------------------------------------------------
| Joe Zowesson | 87th Victoria Street London EC96 1MB, 14584 |
| Mason Hamburg | 47th Jeremy Street London EC26 1MB, 13584 |
| Dedrik Terry | 27th Burger Street London EC16 1MB, 17584 |
My interpretation here is that the value Joe Zowesson is indivisible with regard to the column fullName, and that zip code, street number and street name are all atomic in relation to the column fullAddress.
I am almost certain that I am wrong, but I cannot yet understand why.
The question relates to an upcoming exam, where I will need to "prove" which normal form something is currently in. I find that very hard, because it depends on how you interpret the word atomic.
You have basically misunderstood the concept of 1NF. An atomic value means that a column intended for Name should hold only the name and no other values alongside it. In other words, the Name column should not store an ID, an address or anything else together with the name, so that when you query Name you get only the name, and not the name with an ID or address. The name itself can take any form you want, whether First name + Last name or First name + Last name + Middle name + Previous name.
The decision of whether you need separate columns for the related data should be made during design. Let's suppose you have table Student:
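| StudentId | FullName | Address | AverageGrade |
----------------------------------------------------------
| 1 | John Done | New York, US | 3.4 |
| 2 | Robert Bored | New York, US | 0 |
| 3 | Student LName | Dallas, US | 1 |
| 4 | Another LName | Munich, Germany | 2 |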
In this case, your queries never need the first name and last name separately; you always need the whole name at once, for example:
SELECT FullName
FROM Student
WHERE StudentId = 1;
John Done
And when you need First name, Last name separately, you decompose them into several columns, for example:
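| StudentId | FirstName | LastName | Address | AverageGrade |
--------------------------------------------------------------------
| 1 | John | Done | New York, US | 3.4 |
| 2 | Robert | Bored | New York, US | 0 |
| 3 | Student | LName | Dallas, US | 1 |
| 4 | Another | LName | Munich, Germany | 2 |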
And your queries might look like this:
SELECT LastName, AverageGrade
FROM Student
WHERE AverageGrade >= 1 AND FirstName != 'John';
The result will be:
| LastName | AverageGrade |
---------------------------
| LName | 1 |
| LName | 2 |
Or something like this maybe:
UPDATE Student
SET AverageGrade = 4
WHERE LastName = 'LName' AND FirstName != 'Student'
Basically, the decision depends on how you manipulate the data and in which form you need it.
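As a minimal sketch of the split-column design above, the following uses Python's built-in sqlite3 module as a stand-in database. That choice is an assumption made only so the example is self-contained; the answer itself does not name a DBMS. Table and column names follow the answer, with AverageGrade written as one word as in its queries.

```python
import sqlite3

# In-memory SQLite database, used here only as a runnable stand-in.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Student (
        StudentId    INTEGER PRIMARY KEY,
        FirstName    TEXT NOT NULL,
        LastName     TEXT NOT NULL,
        Address      TEXT,
        AverageGrade REAL
    )
    """
)
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?, ?, ?)",
    [
        (1, "John", "Done", "New York, US", 3.4),
        (2, "Robert", "Bored", "New York, US", 0),
        (3, "Student", "LName", "Dallas, US", 1),
        (4, "Another", "LName", "Munich, Germany", 2),
    ],
)

# The same filter as in the answer: it only works because FirstName
# is stored in its own column.
rows = conn.execute(
    """
    SELECT LastName, AverageGrade
    FROM Student
    WHERE AverageGrade >= 1 AND FirstName != 'John'
    ORDER BY StudentId
    """
).fetchall()
print(rows)
```

With a single FullName column, the same filter would need string matching on part of the value, which is the practical sign that the column is not atomic for this use.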
To sum up: whether the relation is in 1NF depends on what values you are trying to store in the table. As I mentioned above, one column should store only one type of value, e.g. ID, Address, Name, etc. How your columns' values look depends on the design and on how you NEED TO STORE the data. If you do not need to query first name, middle name, last name and second name separately, you can save all of them in one FullName column and it will still be in 1NF. If you need them separately, you can store them in separate columns, and again it will still be in 1NF, but it might violate other rules.
Here is a tutorial you might find useful: https://www.studytonight.com/dbms/first-normal-form.php
Let the application, and how it will be used, guide you as to what data should be split further into additional fields (or not).
For example:
If, in your application, you are constantly splitting first name from last name so that you can say "Hi Joe" on correspondence, you should split fullName into two fields. Conversely, if you had two fields firstName and lastName, and were always concatenating them so that you could correctly address an envelope, it would make more sense to have those two fields stored in a single column in your table.
In practice, it is not uncommon for a database to show some de-normalization here and store both forms, given how common both scenarios are. The risk is that the columns get out of sync if someone updates first name (for example) but doesn't update fullName.
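One way to have both forms without the synchronization risk is to store the parts and derive the combined name when reading. The sketch below assumes SQLite (through Python's sqlite3 module) purely for runnability; the person table, the person_full view and the full_name helper are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (
        id        INTEGER PRIMARY KEY,
        firstName TEXT NOT NULL,
        lastName  TEXT NOT NULL
    );
    -- fullName is computed on read, so it cannot drift from its parts.
    CREATE VIEW person_full AS
        SELECT id, firstName, lastName,
               firstName || ' ' || lastName AS fullName
        FROM person;
    INSERT INTO person (id, firstName, lastName) VALUES (1, 'Joe', 'Smith');
    """
)


def full_name(person_id):
    """Read the derived fullName for one person from the view."""
    row = conn.execute(
        "SELECT fullName FROM person_full WHERE id = ?", (person_id,)
    ).fetchone()
    return row[0]


before = full_name(1)
# Only the part is updated; the derived value follows automatically.
conn.execute("UPDATE person SET firstName = 'Joseph' WHERE id = 1")
after = full_name(1)
print(before, "->", after)
```

The trade-off is that the concatenation runs on every read, which is usually cheap but is still a design choice rather than a rule.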
Consider things like how you will force your users to follow a certain pattern if you decide to go with a single column fullName. How would you prevent "Smith, Joe" if your application needed "Joe Smith"?
Dates are another good example and again, whether you split the parts into separate columns depends on how they will be used.
A datetime field which indicates when a row was inserted probably doesn't need to be split out, but if you had many queries which were only interested in the year (for example), it might make sense to split it out.
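For the year-only case, a query can often extract the part it needs from a single stored timestamp before you commit to a separate column. A minimal sketch, assuming SQLite via Python's sqlite3 and a hypothetical event table with ISO-8601 text timestamps:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event (id INTEGER PRIMARY KEY, inserted_at TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO event (inserted_at) VALUES (?)",
    [
        ("2021-03-14 09:26:53",),
        ("2021-11-02 17:00:00",),
        ("2022-01-01 00:00:01",),
    ],
)

# strftime('%Y', ...) pulls the year out of the stored timestamp at query time.
rows_per_year = conn.execute(
    """
    SELECT strftime('%Y', inserted_at) AS year, COUNT(*) AS n
    FROM event
    GROUP BY year
    ORDER BY year
    """
).fetchall()
print(rows_per_year)
```

If such queries dominate and become slow, that is the point at which a dedicated year column may be worth considering.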
This only scratches the surface, which is why this answer is more about how to think about the underlying problem. Yes, normalizing your database is important for all kinds of reasons, but how far you go with it depends on how your data will be used at the end of the day.

Getting address from db by fuzzy match POSTGRESQL

I have an address database with 1 million rows. Users will enter arbitrary address text (with no specific structure, and spelling mistakes are acceptable). I must split the address into parts such as region, city, town, village and so on. I have almost done it with the trigram algorithm, but it is very slow. How can I optimize my query? For now I have this:
SELECT *
FROM adresses_1
ORDER BY SIMILARITY(CONCAT(region, district, city, town, area, street, building), **address_text**) DESC
LIMIT 1;
You could run the addresses they enter through an address standardization API (like smartystreets) to validate each address and pick out the components you want (to store in discrete fields). This will make future retrieval, filtering, proximity searching, etc. very accurate. I have used smartystreets on millions of records in the past.
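Your expression as written is not indexable. If you build a GiST trigram index (gist_trgm_ops) on the expression CONCAT(region, district, city, town, area, street, building) (PostgreSQL marks CONCAT as STABLE rather than IMMUTABLE, so in practice the expression likely has to be wrapped in an IMMUTABLE function or rewritten with ||, and the query must then use that same expression), then you could use: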
ORDER BY CONCAT(region, district, city, town, area, street, building) <-> **address_text** ASC
LIMIT 1
Or if you build the GIN trigram index instead, the ORDER BY wouldn't be directly indexable; but instead you could use the index to efficiently filter out anything "obviously" not close, then sort the remaining ones.
WHERE CONCAT(region, district, city, town, area, street, building) % **address_text**
ORDER BY SIMILARITY(CONCAT(region, district, city, town, area, street, building), **address_text**) DESC
LIMIT 1
Or you could do as Jake proposes, and use software specially written for standardizing addresses.
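To make the % filter and the SIMILARITY ordering above more concrete, here is a rough Python sketch of how pg_trgm's similarity is documented to work: each word is lowercased and padded with two leading spaces and one trailing space, and the score is the number of shared trigrams divided by the number of distinct trigrams in either string. This is an illustration only, not the extension's actual code. The function names and sample addresses are made up, and 0.3 is assumed as the threshold because it is the documented default of pg_trgm.similarity_threshold.

```python
import re


def trigrams(text):
    """Return the set of padded, lowercased trigrams for every word in text."""
    grams = set()
    # Words are runs of letters/digits; everything else separates words.
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    """Shared trigrams divided by distinct trigrams in either string."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def best_match(query, candidates, threshold=0.3):
    """Mimic 'WHERE col % query ORDER BY similarity DESC LIMIT 1'."""
    scored = [(similarity(c, query), c) for c in candidates]
    close = [pair for pair in scored if pair[0] >= threshold]
    return max(close)[1] if close else None


candidates = ["Prague Letna 15", "Prague Letenska 5", "Brno Lidicka 15"]
match = best_match("Prage Letna 15", candidates)  # query with a typo
print(match)
```

One thing the sketch makes visible: CONCAT joins the columns with no separator, so neighbouring values fuse into a single word. Here similarity("PragueLetna15", "Prague Letna 15") comes out at 0.5 rather than 1, which suggests putting a space between the columns before comparing.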

Dynamically search a column up to a static name and extract the address to a new column in Postgres/GIS

I'm totally new to this and just hacking around, but I'm trying to pull out an address that comes before the city name and put it in a new column. I'm trying to use substring dynamically, but I'm getting lost with the syntax. Any ideas on where to point me?
You should probably be looking at substring with a regular expression.
Let's say your table has a column 'full_address', with the street, city, and state info. You add a new column simply called street_address, and you want to extract the street address from each row and place it into the street_address column.
First, consider how to get the street address out of the full_address column. Perhaps use this regular expression, which breaks the address into comma-separated <street_address>,<city>,<state>.
(.*),[^,]*,[^,]*
Which simply says:
capture anything -- the street address
comma -- exact match on a comma
anything not a comma -- the city, assuming no commas allowed in city
comma -- exact match on a second comma
anything not a comma -- the state.
(Clearly, you need to figure out a useful regex for your data).
Given a regex, you can view your street addresses as:
SELECT SUBSTRING(full_address FROM '(.*),[^,]*,[^,]*') FROM my_table;
Assuming the regex captures what you want, you can do an update using:
UPDATE my_table SET street_address = SUBSTRING(full_address FROM '(.*),[^,]*,[^,]*');
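Before running the UPDATE, it can help to try the pattern on a few sample strings. The sketch below does that with Python's re module, whose regex flavor is close to but not identical to PostgreSQL's; the sample addresses and the street_address helper are made up for illustration.

```python
import re

# Same pattern as in the answer: capture everything before the last two
# comma-separated fields (assumed to be city and state).
ADDRESS_RE = re.compile(r"(.*),[^,]*,[^,]*")


def street_address(full_address):
    """Return the captured street part, or None when the pattern fails."""
    match = ADDRESS_RE.search(full_address)
    return match.group(1) if match else None


simple = street_address("123 Main St, Springfield, IL")
with_unit = street_address("Apt 4, 123 Main St, Springfield, IL")
no_commas = street_address("no commas here")
print(simple, "|", with_unit, "|", no_commas)
```

Because the leading (.*) is greedy, a comma inside the street part stays in the capture, and a string with fewer than two commas yields no match, which corresponds to NULL from SUBSTRING in PostgreSQL.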

Switch column data where a column contains a number?

I have a table that has 3 columns:
id, company and adress
I found a bug today that SOMETIMES saved the address in the company column and the company in the adress column. I have corrected the bug and now I'm trying to put the data in the right places.
Every address has a number in it, so my guess is that the easiest way is to switch the adress and company columns if there is a number in the company column (if a real company name happens to contain a number, it won't matter that much :p).
How should I write this in T-SQL?
I'm not sure this is right thing to do here but as I can't think of any other alternative this should do it.
Update dbo.MyTable
Set Company = Address,
Address = Company
Where Company like '%[0-9]%'
You can try this; I added a simple guard that skips the swap if the address column already contains a number:
insert into COMPANY (NAME, ADDRESS)
VALUES ('2 bld d''Italie' , 'CA') ,
('Take 2' , 'anselmo street 234') ,
('Microsoft' , '1 Microsoft Way Redmond'),
('lake street 14' , 'Norton'),
('lake street 17' , 'trendMicro');
SELECT * FROM COMPANY
UPDATE COMPANY set NAME = ADDRESS, ADDRESS = NAME
WHERE NAME like '%[0-9]%' and (ADDRESS not like '%[0-9]%')
SELECT * FROM COMPANY
Note that the 'Take 2' row is not swapped, because its address already contains a number.
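The guarded swap can be tried out without a SQL Server instance. The sketch below reruns the same data in SQLite through Python's sqlite3 module, which is an assumption made for runnability only. SQLite's LIKE has no [0-9] character class, so GLOB '*[0-9]*' stands in for T-SQL's LIKE '%[0-9]%'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE COMPANY (NAME TEXT, ADDRESS TEXT)")
conn.executemany(
    "INSERT INTO COMPANY (NAME, ADDRESS) VALUES (?, ?)",
    [
        ("2 bld d'Italie", "CA"),
        ("Take 2", "anselmo street 234"),
        ("Microsoft", "1 Microsoft Way Redmond"),
        ("lake street 14", "Norton"),
        ("lake street 17", "trendMicro"),
    ],
)

# Swap only when NAME contains a digit and ADDRESS does not.
# Both right-hand sides are evaluated from the row's old values,
# so a single UPDATE swaps the columns.
conn.execute(
    """
    UPDATE COMPANY
    SET NAME = ADDRESS, ADDRESS = NAME
    WHERE NAME GLOB '*[0-9]*' AND ADDRESS NOT GLOB '*[0-9]*'
    """
)

result = conn.execute(
    "SELECT NAME, ADDRESS FROM COMPANY ORDER BY rowid"
).fetchall()
for row in result:
    print(row)
```

Rows where both columns contain a digit, such as 'Take 2', are left alone and would still need a manual check.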