I have a table in my database which contains addresses. There are separate fields for street number, unit number, street name, street direction, and province. I want to merge all these columns into one in order to make a single address, and then compare it with the incoming addresses from a CSV data file. If the addresses match, then assign them the same address id as in the database table for the respective addresses.
I am new to Talend, and I'm not sure how to tackle this problem.
I don't know Talend.
However, how about creating a view (in the database) that presents the address the way you want, and then comparing the view's FULL_ADDRESS column to the CSV file? Something like this (mostly guessing blindly, as I have no idea what the columns you mentioned really represent):
This is a table you currently have:
SQL> select * from address;
STREET_NUMBER UNIT_NUMBER STREET_NAME STREE PROVINC
------------- ----------- ----------- ----- -------
1 2 Main street North Central
2 2 Zara street South West
Create a view; its full_address column format should match the CSV file's values as closely as possible. I probably failed to do so, but as you didn't provide any sample data, I hope the idea itself will be a good starting point:
SQL> create or replace view v_address as
     select street_number,
            unit_number,
            -- concatenate next 3 columns
            street_name ||', '|| street_direction ||', '|| province as full_address
     from address;
View created.
SQL> select * From v_address;
STREET_NUMBER UNIT_NUMBER FULL_ADDRESS
------------- ----------- ---------------------------
1 2 Main street, North, Central
2 2 Zara street, South, West
SQL>
Finally, you'd compare full_address to CSV file's values.
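The comparison step itself could be sketched outside the database, for example in Python. This is only a minimal sketch under assumptions: it presumes the view also exposes an address id (the question mentions one, but the sample view does not select it), that the CSV has a column named address, and that lowercasing plus collapsing whitespace is enough normalization. The names normalize and assign_address_ids, the ids 101/102, and the sample rows are all hypothetical.

```python
import csv
import io

def normalize(address):
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(address.lower().split())

def assign_address_ids(db_rows, csv_text):
    """Return the CSV rows with an address_id copied from the matching database row.

    db_rows:  iterable of (address_id, full_address) tuples, e.g. fetched from v_address.
    csv_text: CSV content whose header contains an 'address' column (assumed name).
    Rows without a match get address_id None.
    """
    lookup = {normalize(full): addr_id for addr_id, full in db_rows}
    result = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["address_id"] = lookup.get(normalize(row["address"]))
        result.append(row)
    return result

# Hypothetical data: the ids 101 and 102 are made up for illustration.
db_rows = [(101, "Main street, North, Central"), (102, "Zara street, South, West")]
csv_text = 'address\n"main street,  north, central"\n"Elm road, East, Coast"\n'
matched = assign_address_ids(db_rows, csv_text)
```

Building the lookup dictionary once keeps each CSV row to a single dictionary access; unmatched rows keep None so they can be reviewed separately.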
Related
I'm dealing with a lot of unique data that has the same type of columns, but each group of rows have different attributes about them and I'm trying to see if PostgreSQL has a way of storing metadata about groups of rows in a database or if I would be better off adding custom columns to my current list of columns to track these different attributes. Microsoft Excel for instance has a way you can merge multiple columns into a super-column to group multiple columns into one, but I don't know how this would translate over to a PostgreSQL database. Thoughts anyone?
Right, can't upload files. Hope this turns out well.
Section 1 | Section 2 | Section 3
=================================
Num1|Num2 | Num1|Num2 | Num1|Num2
=================================
132 | 163 | 334 | 1345| 343 | 433
......
......
......
To have a "super group" of columns (in SQL in general, not just PostgreSQL), the easiest approach is to use multiple tables.
Example:
Person table can have columns of
person_ID, first_name, last_name
employee table can have columns of
person_id, department, manager_person_id, salary
customer table can have columns of
person_id, addr, city, state, zip
That way, you can join them together to do whatever you like.
Example:
select *
from person p
left outer join customer c on c.person_id=p.person_id
left outer join employee e on e.person_id=p.person_id;
Or any variation. This separates the data into different types and PERHAPS saves a little disk space in the process (for example, if most "people" are "customers", they don't need a bunch of employee data floating around in nullable columns).
That's how I normally handle this type of situation, but without a practical example, it's hard to say what's best in your scenario.
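To make the multi-table idea concrete, here is a minimal sketch that builds the person, employee, and customer tables in an in-memory SQLite database (an assumption made purely so the example is runnable; the question is about PostgreSQL) and joins them. The column subset and the sample people are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person   (person_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE employee (person_id INTEGER REFERENCES person, department TEXT, salary INTEGER);
    CREATE TABLE customer (person_id INTEGER REFERENCES person, city TEXT);
    INSERT INTO person   VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
    INSERT INTO employee VALUES (1, 'Sales', 50000);
    INSERT INTO customer VALUES (2, 'Oslo');
""")

# Each group of columns lives in its own table; a person that does not
# belong to a group simply has no row there, so the join yields NULLs.
rows = conn.execute("""
    SELECT p.first_name, e.department, c.city
    FROM person p
    LEFT OUTER JOIN employee e ON e.person_id = p.person_id
    LEFT OUTER JOIN customer c ON c.person_id = p.person_id
    ORDER BY p.person_id
""").fetchall()
```

Ann is only an employee and Bob is only a customer, so each row carries a NULL (None in Python) for the group that does not apply.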
I have two tables: one containing distinct persons and another table containing place names. Every person is coupled to a place name ID - and the place name ID gives more information about the place (for example the name, longitude and latitude).
The place name table is skewed, there are a lot of semi-duplicates (names written a bit differently e.g. London/Londen). For every place name I now also have the 'real' place name via Google API.
Persons:
ID Name Birthplace
1 John 1
2 Sarah 2
3 Jane 3
4 Tom 4
Place names:
ID PlaceName GooglePlaceName
1 New York City New York, NY, USA
2 Amsterdam Amsterdam, Netherlands
3 Londen London, UK
4 London London, UK
So when looking at this data, Jane and Tom are actually from the same place.
I already have a query which gets the duplicate IDs from the place name table:
SELECT id FROM placenames WHERE googleplacename IN (SELECT googleplacename FROM placenames GROUP BY googleplacename HAVING COUNT (googleplacename) > 1);
This returns
ID
1 3
2 4
Now I'm wondering if it's possible to update the person table, so Jane and Tom both get the same Birthplace ID (doesn't matter if it's 3 or 4) and afterwards remove the duplicate rows from the place name table so either the place name with ID 3 or the place name with ID 4 remains, depending on which one has stayed in the persons table.
If I'm going in totally the wrong direction by trying to solve this with SQL, I'd also like to know. I'm using Java and Spring to access the database.
Since it doesn't matter which id is used as the replacement, let's take the first id in each list of duplicates.
i.e.
birthplace
3
4
becomes
birthplace
3
3
To do this, first create a table mapping original and replacement id values.
Your select statement already has the original ids; to that you can add the replacement ids using the window function first_value partitioned by googleplacename.
Use this mapping table in the FROM clause of the UPDATE persons statement, joining on records where birthplace equals an original_id but not a replacement_id:
UPDATE persons
SET birthplace = replacement_id
FROM (
    SELECT id AS original_id,
           -- ORDER BY id makes the chosen replacement deterministic
           FIRST_VALUE(id) OVER (PARTITION BY googleplacename ORDER BY id) AS replacement_id
    FROM placenames
    WHERE googleplacename IN (
        SELECT googleplacename FROM placenames GROUP BY googleplacename HAVING COUNT(*) > 1
    )
) replacement_table
WHERE birthplace = original_id
AND birthplace != replacement_id;
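The question also asks about removing the duplicate place names afterwards. The following is a rough sketch of both steps, not the statement above: to stay runnable anywhere it uses an in-memory SQLite database instead of PostgreSQL, picks the replacement with MIN(id) instead of FIRST_VALUE, and uses a correlated subquery instead of UPDATE ... FROM. The table layouts are assumed from the sample data in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE placenames (id INTEGER PRIMARY KEY, placename TEXT, googleplacename TEXT);
    CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT, birthplace INTEGER);
    INSERT INTO placenames VALUES
        (1, 'New York City', 'New York, NY, USA'),
        (2, 'Amsterdam', 'Amsterdam, Netherlands'),
        (3, 'Londen', 'London, UK'),
        (4, 'London', 'London, UK');
    INSERT INTO persons VALUES (1, 'John', 1), (2, 'Sarah', 2), (3, 'Jane', 3), (4, 'Tom', 4);
""")

with conn:  # run both steps in one transaction
    # Step 1: point every person at the smallest id sharing the same googleplacename.
    conn.execute("""
        UPDATE persons
        SET birthplace = (
            SELECT MIN(keep.id)
            FROM placenames AS keep
            JOIN placenames AS old ON old.googleplacename = keep.googleplacename
            WHERE old.id = persons.birthplace
        )
        WHERE birthplace IN (SELECT id FROM placenames)
    """)
    # Step 2: delete every place name that is not the smallest id of its group.
    conn.execute("""
        DELETE FROM placenames
        WHERE id NOT IN (SELECT MIN(id) FROM placenames GROUP BY googleplacename)
    """)

birthplaces = conn.execute("SELECT name, birthplace FROM persons ORDER BY id").fetchall()
remaining = [row[0] for row in conn.execute("SELECT id FROM placenames ORDER BY id")]
```

Updating persons before deleting from placenames matters: if the order were reversed, Tom would briefly reference a place name that no longer exists.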
How do I create a permanent table in PGSQL - for example: Say I have the following mapping table called 'Cars_Mapping_Table' (this table currently resides in an excel doc)
FULL_NAME   ABBREV
---------   ------
ford        fd
chevy       ch
nissan      ni
I want to create this table in the database and be able to make updates to it if a new car brand comes out. Pretend all my tables in the database always use the ABBREV field and I want the easy 'Cars_Mapping_Table' available so that I can always convert the abbreviation to the Full_name.
I feel like this should be fairly simple, but I can't find how to do it. THANKS
This will create a table that you can update when you have new data. Note that the distkey and sortkey clauses are Amazon Redshift syntax; on plain PostgreSQL, leave them out:
drop table if exists Cars_Mapping_Table;
create table Cars_Mapping_Table
(
FULL_NAME varchar(50),
ABBREV varchar(2)
)
distkey(ABBREV)
sortkey(ABBREV);
insert into Cars_Mapping_Table
select 'ford', 'fd' union
select 'chevy','ch' union
select 'nissan','ni';
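As a rough illustration of how such a mapping table is used afterwards, the sketch below creates it in an in-memory SQLite database (an assumption for runnability, not the asker's setup), adds a new brand later, and converts an abbreviation back to the full name. The primary key on abbrev, the full_name helper, and the brand 'tesla'/'ts' are hypothetical additions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The primary key on abbrev is an added assumption: it keeps abbreviations unique.
conn.execute(
    "CREATE TABLE cars_mapping_table (full_name TEXT NOT NULL, abbrev TEXT PRIMARY KEY)"
)
conn.executemany(
    "INSERT INTO cars_mapping_table (full_name, abbrev) VALUES (?, ?)",
    [("ford", "fd"), ("chevy", "ch"), ("nissan", "ni")],
)

# A new brand later on is just one more insert (made-up example values).
conn.execute(
    "INSERT INTO cars_mapping_table (full_name, abbrev) VALUES (?, ?)", ("tesla", "ts")
)

def full_name(abbrev):
    """Return the full brand name for an abbreviation, or None if it is unknown."""
    row = conn.execute(
        "SELECT full_name FROM cars_mapping_table WHERE abbrev = ?", (abbrev,)
    ).fetchone()
    return row[0] if row else None
```

In queries against other tables, the same lookup would normally be a join on the ABBREV column rather than a per-row function call.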
I'm just starting with views in PostgreSQL 9.1 and have the following question.
Given are the following tables:
pupils
name | age
============
john 15
jack 16
cars
type | owner
============
volvo 1
vw 2
Is it possible to create a view that gives me this as result
ident | column
==============
john pupils
jack pupils
volvo cars
vw cars
My example might look a bit abstract, but I need to create one view from very different tables that all share one column I'm interested in and otherwise have nothing in common.
My poor first step:
CREATE OR REPLACE VIEW test AS
SELECT pupils.name, cars.type AS ident
FROM pupils, cars
thanks,
t book
You don't want a cartesian join between the two tables; you want a UNION:
create or replace view test
as
select name as ident,
'pupils' as table_source
from pupils
union all
select type,
'cars'
FROM cars
union all
select cloud_number,
'clouds'
FROM clouds
union all
select tree_name,
'trees'
FROM trees;
You can add any number of tables to this. The only restriction is that the common column must have a "compatible" data type (e.g. all varchar). If, for example, the 5th table has a date column that you want to include, you need to add an explicit type cast (or use a formatting function).
The column names of the result are determined by the column names of the first select in the union.
Alternatively you could also name them in the create view part
create or replace view test (ident, some_column)
as
select ...
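A quick way to check the shape of the result is to run the two-table version of the view against the sample data from the question. This sketch uses an in-memory SQLite database only so that it is self-contained; because of that it uses a plain CREATE VIEW (SQLite has no CREATE OR REPLACE VIEW), which is a deviation from the PostgreSQL statement above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pupils (name TEXT, age INTEGER);
    CREATE TABLE cars (type TEXT, owner INTEGER);
    INSERT INTO pupils VALUES ('john', 15), ('jack', 16);
    INSERT INTO cars VALUES ('volvo', 1), ('vw', 2);

    -- Column names of the view come from the first SELECT of the union.
    CREATE VIEW test AS
        SELECT name AS ident, 'pupils' AS table_source FROM pupils
        UNION ALL
        SELECT type, 'cars' FROM cars;
""")

# A view has no inherent row order, so sort explicitly when reading from it.
rows = conn.execute(
    "SELECT ident, table_source FROM test ORDER BY table_source, ident"
).fetchall()
```

UNION ALL keeps every row from both tables; a plain UNION would additionally remove duplicate rows, which costs extra work and is not needed here.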
I have column address that looks like this:
address
--------------
Virginia Ave
Baker Ave
Elm Road
.....
I need to separate each record into 2 columns. The first column will hold the street name and the 2nd column the street abbreviation, so it will look like this:
StreetName StreetAbbr
----------- -----------
Virginia Ave
Baker Ave
Elm Road
What is the easiest and most efficient way to do this? (I have a huge number of records.)
Thanks.
Assuming that the string will always be StreetName StreetAbbr, the following SQL code may be useful to you. I'm not sure it is the most efficient way, but it works:
CREATE TABLE TEST(
ADDRESS VARCHAR(100));
INSERT INTO TEST(ADDRESS) VALUES('Virginia Ave');
INSERT INTO TEST(ADDRESS) VALUES('Baker Ave');
INSERT INTO TEST(ADDRESS) VALUES('Elm Road');
SELECT
(string_to_array(ADDRESS, ' '))[1] AS StreetName,
(string_to_array(ADDRESS, ' '))[2] AS StreetAbbr
FROM TEST;
Here is the link with the example. You can also do this with regular expressions or simple string functions (please see this link). In any case, before doing all this, you should keep normalization in mind.
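One caveat with taking array elements [1] and [2]: a street name containing a space would be cut after its first word. A possible alternative, sketched here in Python rather than SQL, is to split at the last whitespace instead. The function name split_address and the multi-word sample street are hypothetical, and the sketch still assumes the abbreviation is always the final word.

```python
def split_address(address):
    """Split 'Street Name Abbr' into (street_name, street_abbr) at the last whitespace.

    Returns (street_name, None) when the address holds a single word.
    """
    parts = address.strip().rsplit(None, 1)
    if len(parts) == 2:
        return parts[0], parts[1]
    return (parts[0] if parts else ""), None

# Sample values: the first three come from the question, the last one is made up.
samples = ["Virginia Ave", "Baker Ave", "Elm Road", "Martin Luther King Blvd"]
split_rows = [split_address(s) for s in samples]
```

Splitting from the right keeps a multi-word street name intact, whereas the second element of a left-to-right split on that made-up sample would be only the second word of the name.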