postgresql calling column with same name - postgresql

I have two tables that both have a column named id (I cannot change the way the tables are designed), and I'm trying to query table2's id. How would I do this when the tables are joined?
create table table1(
id integer, -- PG: serial
description MediumString not null,
primary key (id)
);
create table table2 (
id integer, -- PG: serial
tid integer references table1(id),
primary key (id)
);
So when they're joined, two columns in the result will have the same name "id" if I run the following query:
select * from table1
join table2 on table1.id = table2.tid;

Alias the columns if you want both "id"s
SELECT table1.id AS id1, table2.id AS id2
FROM table1...

If you want to select * from both tables but still be able to reference a specific id, you can do that too. You will end up with duplicate id columns that you probably won't use, but if you really need all the data, it's worth it. Note that PostgreSQL requires double quotes, not single quotes, around an alias that contains a dot:
select table1.*, table2.*, table1.id as "table1.id", table2.id as "table2.id"
from ...

You cannot tell the two id columns apart when you use select *.
List the columns explicitly instead:
select table1.id, table1.description, table2.id, table2.tid
from table1
inner join table2
on table1.id = table2.tid
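To see why the aliases matter from client code, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL. That substitution is an assumption made only so the example runs without a server, and the description column is declared as plain text here. The sample rows are made up. It compares the column names the driver reports with and without aliases.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL; sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table table1 (id integer primary key, description text not null);
    create table table2 (id integer primary key, tid integer references table1(id));
    insert into table1 (id, description) values (1, 'first');
    insert into table2 (id, tid) values (10, 1);
""")

# Without aliases, both id columns are reported under the same name.
cur = conn.execute("select * from table1 join table2 on table1.id = table2.tid")
star_names = [col[0] for col in cur.description]

# With aliases, each id gets its own unambiguous name.
cur = conn.execute("""
    select table1.id as id1, table2.id as id2
    from table1
    join table2 on table1.id = table2.tid
""")
alias_names = [col[0] for col in cur.description]
row = cur.fetchone()
```

With the aliased query, a client that looks columns up by name can ask for id1 or id2 without ambiguity.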

Related

PySpark. How do I make sure the daily incremental data has NO duplicated UUID as PK in HIVE

I created a table in hive with UUID as Primary Key, for example
create table if not exists mydb.mytable as SELECT uuid() as uni_id, c.name, g.city, g.country
FROM client c
INNER JOIN geo g ON c.geo_id = g.id
Every day I need to insert data into mytable. How do I make sure the daily incremental data has no duplicated UUID as PK?
If what you need from the UUID is simply a unique identifier for each row, then I think you can use an auto-increment id instead. In pure HQL, it can be achieved with row_number and a cross join:
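-- append (insert into) rather than overwrite, so earlier rows are kept
insert into table dst_tbl
select
a.rn + b.mid as id, col1, col2,...
from (
select
*, row_number() over(order by rand()) as rn
from src_tbl
) a
-- coalesce covers the first load, when dst_tbl is still empty
cross join (select coalesce(max(id), 0) as mid from dst_tbl) b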
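The arithmetic behind that query can be sketched in a few lines of Python. This is only an illustration of the offset idea, not Hive code: assign_ids is a hypothetical helper, and the row values are made up.

```python
def assign_ids(existing_ids, new_rows):
    """Give each new row an id above every id that already exists."""
    # Plays the role of max(id) over dst_tbl; 0 when the table is empty.
    offset = max(existing_ids, default=0)
    # enumerate(..., start=1) plays the role of row_number().
    return [(offset + rn, row) for rn, row in enumerate(new_rows, start=1)]


# Hypothetical daily loads.
day1 = assign_ids([], ["alice", "bob"])
day2 = assign_ids([row_id for row_id, _ in day1], ["carol"])
```

Because every new id is strictly greater than the current maximum, a daily batch cannot collide with ids from earlier batches, as long as only one load runs at a time.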

Postgres join involving tables having join condition defined on an text array

I have two tables in PostgreSQL.
They are of the form:
create table table1(
id serial primary key,
type text[]
);
create table table2(
type text,
sellerID int
);
Now I want to get all the rows from table1 whose type matches a type in table2, but the problem is that in table1 the type is an array.
If type is stored as a delimited string rather than a real array (with a delimiter such as ',' or ';'), you can split it into rows with regexp_split_to_table(type, ','). For a real array column, the unnest function can be used instead.
For example:
select *
from (select id, regexp_split_to_table(type, ',') as type from table1) t1
inner join table2 t2
on trim(t1.type) = trim(t2.type);
Another good example can be found at https://www.dbrnd.com/2017/03/postgresql-regexp_split_to_array-to-split-string-using-different-delimiters/:
SELECT
a[1] AS DiskInfo
,a[2] AS DiskNumber
,a[3] AS MessageKeyword
FROM (
SELECT regexp_split_to_array('Postgres Disk information , disk 2 , failed', ',')
) AS dt(a)
You can use the ANY operator in the JOIN condition:
select *
from table1 t1
join table2 t2 on t2.type = any (t1.type);
Note that if the types in a table1 row match multiple rows in table2, that table1 row is returned once per match, because that's how a join works. Maybe you want an EXISTS condition instead:
select *
from table1 t1
where exists (select *
from table2 t2
where t2.type = any(t1.type));
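The difference between the two queries is easiest to see on a tiny data set. The following Python sketch mimics both conditions with list comprehensions. The rows are hypothetical, and this illustrates the semantics only, not how PostgreSQL executes the queries.

```python
# Hypothetical sample rows.
table1 = [(1, ["a", "b"]), (2, ["c"]), (3, ["z"])]  # (id, type array)
table2 = [("a", 100), ("b", 101), ("c", 102)]       # (type, sellerID)

# JOIN ... ON t2.type = ANY (t1.type): one output row per matching pair.
joined = [
    (t1_id, t2_type, seller)
    for t1_id, types in table1
    for t2_type, seller in table2
    if t2_type in types
]

# WHERE EXISTS (...): each table1 row appears at most once.
existing = [
    (t1_id, types)
    for t1_id, types in table1
    if any(t2_type in types for t2_type, _ in table2)
]
```

Here the row with id 1 shows up twice in joined (once for 'a', once for 'b') but only once in existing.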

PostgreSQL count other values of ID that have the same value of other column

Let's say we have the following table that stores id of an observation and its address_id. You can create the table with the following code:
drop table if exists schema.pl_address_cnt;
create table schema.pl_address_cnt (
id serial,
address_id int);
insert into schema.pl_address_cnt(address_id) values
(100), (101), (100), (101), (100), (125), (128), (200), (200), (100);
My task is to count for each id how many other ids (thus -1) have the same address_id. I've come up with a solution that turns out to be quite expensive (explain) on the original dataset. I wonder whether my solution can be somehow optimised.
with tmp_table as (select address_id
, count(distinct id) as id_count
from schema.pl_address_cnt
group by address_id
)
select id
, id_count - 1
from schema.pl_address_cnt as pac
left join tmp_table as tt on tt.address_id=pac.address_id;
You can try to omit the CTE and do a self left join on common address but different ID and then aggregate this.
SELECT pac1.id,
count(pac2.id)
FROM pl_address_cnt pac1
LEFT JOIN pl_address_cnt pac2
ON pac1.address_id = pac2.address_id
AND pac1.id <> pac2.id
GROUP BY pac1.id
ORDER BY pac1.id;
For performance you can try indexes on (address_id, id) and (id).
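As a quick sanity check that the self join returns the same counts as the CTE version, here is a sketch that runs both on the sample data from the question. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption so the example runs without a server), drops the schema prefix, and says nothing about relative performance.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("create table pl_address_cnt (id integer primary key, address_id int)")
conn.executemany(
    "insert into pl_address_cnt (address_id) values (?)",
    [(a,) for a in (100, 101, 100, 101, 100, 125, 128, 200, 200, 100)],
)

# Original approach: count ids per address in a CTE, then subtract one.
cte_counts = conn.execute("""
    with tmp_table as (
        select address_id, count(distinct id) as id_count
        from pl_address_cnt
        group by address_id
    )
    select pac.id, tt.id_count - 1
    from pl_address_cnt as pac
    left join tmp_table as tt on tt.address_id = pac.address_id
    order by pac.id
""").fetchall()

# Suggested approach: self left join on same address but different id.
self_join_counts = conn.execute("""
    select pac1.id, count(pac2.id)
    from pl_address_cnt pac1
    left join pl_address_cnt pac2
        on pac1.address_id = pac2.address_id
        and pac1.id <> pac2.id
    group by pac1.id
    order by pac1.id
""").fetchall()
```

Both queries give one row per id, with 0 for addresses that occur only once (125 and 128 in the sample data).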

Conditionally insert from one table into another

The same name may appear in multiple rows of table1. I would like to enumerate all names in sequential order 1, 2, ... One way to do so is to create a new table with name as primary key and id as serial type, then select name from table1 and insert it into table2 only when it doesn't already exist there:
table1 (name varchar(50), ...)
table2 (name varchar(50) primary key, id serial)
insert into table2(name)
select name
from table1 limit 9
where not exists (select name from table2 where name = table1.name)
This doesn't work. How to fix it?
Just select distinct values:
insert into table2(name)
select distinct name
from table1
order by name;
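If the insert has to be repeated as table1 grows, the distinct select can be combined with the not exists filter from the question. The sketch below tries this with Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption so it runs without a server). An autoincrement integer key replaces serial, and the sample names are made up.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table table1 (name varchar(50));
    create table table2 (id integer primary key autoincrement,
                         name varchar(50) unique);
    insert into table1 (name) values ('bob'), ('alice'), ('bob'), ('carol');
""")

# Insert each name once, skipping names that table2 already holds.
insert_new_names = """
    insert into table2 (name)
    select distinct name
    from table1
    where not exists (select 1 from table2 where table2.name = table1.name)
    order by name
"""

conn.execute(insert_new_names)
conn.execute("insert into table1 (name) values ('dave'), ('alice')")
conn.execute(insert_new_names)  # only 'dave' is new on the second run

rows = conn.execute("select id, name from table2 order by id").fetchall()
```

Running the statement a second time adds only the names that were not there before, so existing ids are left untouched.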

Join two tables with count from first table

I know there is an obvious answer to this question, but I'm like a noob trying to remember how to write queries. I have the following table structure in Postgresql:
CREATE TABLE public.table1 (
accountid BIGINT NOT NULL,
rpt_start DATE NOT NULL,
rpt_end DATE NOT NULL,
CONSTRAINT table1_pkey PRIMARY KEY(accountid, rpt_start, rpt_end)
)
WITH (oids = false);
CREATE TABLE public.table2 (
customer_id BIGINT NOT NULL,
read VARCHAR(255),
CONSTRAINT table2_pkey PRIMARY KEY(customer_id)
)
WITH (oids = false);
The objective of the query is to display a result set of accountid's, count of accountid's in table1 and read from table2. The join is on table1.accountid = table2.customer_id.
The result set should appear as follows:
accountid | count | read
1234      | 2     | 100
1235      | 9     | 110
1236      | 1     | 91
The count column reflects the number of rows in table1 for each accountid. The read column is a value from table2 associated with the same accountid.
select accountid, "count", read
from
(
select accountid, count(*) "count"
from table1
group by accountid
) t1
inner join
table2 t2 on t1.accountid = t2.customer_id
order by accountid
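Alternatively, start from table2 with a left join so that customers without any rows in table1 are kept. Counting table1.accountid instead of * makes their count 0 rather than 1:
SELECT table2.customer_id, COUNT(table1.accountid), table2.read
FROM table2
LEFT JOIN table1 ON (table2.customer_id = table1.accountid)
GROUP BY table2.customer_id, table2.read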
SELECT t2.customer_id, t2.read, COUNT(*) AS the_count
FROM table2 t2
JOIN table1 t1 ON t1.accountid = t2.customer_id
GROUP BY t2.customer_id, t2.read
;
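One subtlety when combining a left join with a count is what actually gets counted: COUNT(*) counts the joined row even when nothing in table1 matched, while COUNT(t1.accountid) skips the NULLs that the left join produces. The sketch below shows the difference using Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption so it runs without a server). The rows are hypothetical, and customer 1235 deliberately has no rows in table1.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL; rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table table1 (accountid bigint not null,
                         rpt_start date not null,
                         rpt_end date not null,
                         primary key (accountid, rpt_start, rpt_end));
    create table table2 (customer_id bigint primary key, "read" varchar(255));
    insert into table1 values (1234, '2020-01-01', '2020-01-31'),
                              (1234, '2020-02-01', '2020-02-29');
    insert into table2 values (1234, '100'), (1235, '110');
""")


def counts(join_kind, count_expr):
    """Run the grouped join with the given join type and count expression."""
    sql = f"""
        select t2.customer_id, {count_expr}
        from table2 t2
        {join_kind} join table1 t1 on t1.accountid = t2.customer_id
        group by t2.customer_id
        order by t2.customer_id
    """
    return conn.execute(sql).fetchall()


left_count_star = counts("left", "count(*)")
left_count_col = counts("left", "count(t1.accountid)")
inner_count_star = counts("inner", "count(*)")
```

With an inner join the unmatched customer disappears entirely, so COUNT(*) is safe there; with a left join, counting a column from table1 is what yields 0 for that customer.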