How can we achieve the below output using two source tables (PostgreSQL)

Can you please help me in getting the below output using 2 different tables?
Table 1: has below data
Table 2: has below data
The output table should be with below data:
Please help me to achieve this output, Sorry if I missed anything
Thanks in Advance!!

You could split the sub-select into 3 separate queries and then use UNION both to "glue" the result sets together and to de-duplicate the rows where the name1 columns match between the two tables, e.g. "John".
Something like this:
insert into target(id,name1,name2,name3)
select src1.id,src1.name1,src2.name1,src1.name1
from table1 src1
inner join table2 src2 on src1.id = src2.id
where src1.name1 is not null
and src2.name1 is not null
union
select src1.id,src1.name1,src2.name1,src2.name1
from table1 src1
inner join table2 src2 on src1.id = src2.id
where src1.name1 is not null
and src2.name1 is not null
union
select src1.id,src1.name1,src2.name1,coalesce(src1.name1,src2.name1)
from table1 src1
inner join table2 src2 on src1.id = src2.id
where src1.name1 is null
or src2.name1 is null
order by 1
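The NULL values are handled in a separate SELECT: the first two queries exclude them, and COALESCE in the third picks whichever name is present, so name3 is only NULL when both names are NULL.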
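To try the three-part UNION without a PostgreSQL instance, here is a minimal sketch that runs the same SQL through Python's built-in sqlite3 module. SQLite stands in for PostgreSQL here, and the sample rows are hypothetical, since the question's table contents are not shown.

```python
import sqlite3

# Hypothetical sample rows; SQLite is used as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table table1 (id integer, name1 text);
    create table table2 (id integer, name1 text);
    insert into table1 values (1, 'John'), (2, 'Ann'), (3, null);
    insert into table2 values (1, 'John'), (2, 'Bob'), (3, 'Eve');
""")

rows = conn.execute("""
    select src1.id, src1.name1, src2.name1, src1.name1
    from table1 src1 inner join table2 src2 on src1.id = src2.id
    where src1.name1 is not null and src2.name1 is not null
    union
    select src1.id, src1.name1, src2.name1, src2.name1
    from table1 src1 inner join table2 src2 on src1.id = src2.id
    where src1.name1 is not null and src2.name1 is not null
    union
    select src1.id, src1.name1, src2.name1, coalesce(src1.name1, src2.name1)
    from table1 src1 inner join table2 src2 on src1.id = src2.id
    where src1.name1 is null or src2.name1 is null
    order by 1
""").fetchall()

for row in rows:
    print(row)
```

With this data, id 1 ("John" in both tables) collapses to a single row, id 2 produces one row per distinct name, and id 3 takes its third name from COALESCE.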

Related

Select multiple non aggregated columns with group by in postgres

I'm writing a query that selects multiple non-aggregated columns together with a GROUP BY clause, but Postgres throws an error saying that I have to either add the non-aggregated columns to GROUP BY or apply an aggregate function to them. This is the query that I'm trying to run:
select
tb1.pipeline as pipeline_id,
tb3.pipeline_name as pipeline_name,
tb2."name" as integration_name,
cast(tb1.integration_id as VARCHAR) as integration_id,
tb1.created_at as created_at,
cast(tb1.id as VARCHAR) as batch_id,
sum(tb1.row_select) as row_select,
sum(tb1.row_insert) as row_insert,
from
table1 tb1
join
table2 tb2 on tb1.integration_id = tb2.id
join
table3 tb3 on tb1.pipeline = tb3.id
where
tb1.pipeline is not null
and tb1.is_super_parent = false
group by
tb1.pipeline
I found one workaround for this error: wrapping all the other non-aggregated columns in max(), which solves my problem.
select
tb1.pipeline as pipeline_id,
max(tb3.pipeline_name) as pipeline_name,
max(tb2."name") as integration_name,
max(cast(tb1.integration_id as VARCHAR)) as integration_id,
max(tb1.created_at) as created_at,
max(cast(tb1.id as VARCHAR)) as batch_id,
sum(tb1.row_select) as row_select,
sum(tb1.row_insert) as row_insert,
from
table1 tb1
join
table2 tb2 on tb1.integration_id = tb2.id
join
table3 tb3 on tb1.pipeline = tb3.id
where
tb1.pipeline is not null
and tb1.is_super_parent = false
group by
tb1.pipeline
But I don't want to add max() where there is no need for it, and applying max() to all the other columns will make the query expensive. Is there a better approach to solve this issue? Thanks in advance.
Well, the first thing you need is to learn to format your queries so as to get an idea of their flow at a glance. Note that, due to the extra comma after row_insert and before from, your query will give a syntax error. With that said, how do you solve your issue?
You cannot avoid the additional aggregates or the expanded GROUP BY as long as they exist in the scope of the same query. You need to separate the aggregation from the selection of the additional columns. You basically have 2 choices:
Perform the aggregation in a CTE.
with sums (pipeline_id, row_select, row_insert) as
( select tb1.pipeline
, sum(tb1.row_select) as row_select
, sum(tb1.row_insert) as row_insert
from table1 tb1
where tb1.pipeline is not null
and tb1.is_super_parent = false
group by tb1.pipeline
)
select s.pipeline_id
, tb3.pipeline_name
, tb2."name" integration_name
, s.row_select
, s.row_insert
from sums s
-- join condition as posted; the question joins table2 on tb1.integration_id
join table2 tb2 on (s.pipeline_id = tb2.id)
join table3 tb3 on (s.pipeline_id = tb3.id);
Perform the aggregation in a sub-query.
select s.pipeline_id
, tb3.pipeline_name
, tb2."name" integration_name
, s.row_select
, s.row_insert
from ( select tb1.pipeline as pipeline_id
, sum(tb1.row_select) as row_select
, sum(tb1.row_insert) as row_insert
from table1 tb1
where tb1.pipeline is not null
and tb1.is_super_parent = false
group by tb1.pipeline
) s
-- join condition as posted; the question joins table2 on tb1.integration_id
join table2 tb2 on (s.pipeline_id = tb2.id)
join table3 tb3 on (s.pipeline_id = tb3.id);
NOTE: Not tested as no sample data supplied.
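For a quick check of the "aggregate first, then join" idea, the sketch below runs a reduced version of the CTE through Python's sqlite3 module. The schema and rows are hypothetical, SQLite stands in for Postgres (so is_super_parent is stored as 0/1), and table2 is left out because the sample does not model integrations.

```python
import sqlite3

# Hypothetical schema and data, reduced to the columns the aggregation needs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table table1 (id integer, pipeline integer, row_select integer,
                         row_insert integer, is_super_parent integer);
    create table table3 (id integer, pipeline_name text);
    insert into table1 values (1, 10, 5, 2, 0), (2, 10, 7, 3, 0),
                              (3, 20, 1, 1, 0), (4, 20, 9, 9, 1);
    insert into table3 values (10, 'daily'), (20, 'hourly');
""")

rows = conn.execute("""
    with sums as (
        -- the aggregation happens here, on table1 alone
        select pipeline as pipeline_id,
               sum(row_select) as row_select,
               sum(row_insert) as row_insert
        from table1
        where pipeline is not null and is_super_parent = 0
        group by pipeline
    )
    -- the descriptive column is joined in afterwards, with no max() needed
    select s.pipeline_id, tb3.pipeline_name, s.row_select, s.row_insert
    from sums s
    join table3 tb3 on tb3.id = s.pipeline_id
    order by s.pipeline_id
""").fetchall()

for row in rows:
    print(row)
```

Because each pipeline has exactly one row in table3, the outer query can select pipeline_name directly without listing it in GROUP BY.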

Apply join, sort on date column and select the first row where one of the column value is not null

I have two tables(Table A and Table B) in a Postgres DB.
Both have "id" column in common. Table A has one column called "id" and Table B has three columns: "id, date, value($)".
For each "id" of Table A there exists multiple rows in Table B in the following format - (id, date, value).
For instance, for Table A with "id" as 1 if there exists following rows in Table B:
(1, 2018-06-21, null)
(1, 2018-06-20, null)
(1, 2018-06-19, 202)
(1, 2018-06-18, 200)
I would like to extract the most recent dated non-null value. For example for id - 1, the result should be 202. Please share your thoughts or let me know in case more info is required.
Here is the solution I went ahead with:
with mapping as ( select distinct table1.id, table2.value, table2.date, row_number() over (partition by table1.id order by table2.date desc nulls last) as row_number
from table1
left join table2 on table2.id=table1.id and table2.value is not null
)
select * from mapping where row_number = 1
Let me know if there is scope for improvement.
You may very well want an inner join, not an outer join. If you have an id in table1 that does not exist in table2, or that has only null values there, you will get NULL for both date and value. This is due to how an outer join works: if nothing in the right-side table matches the ON condition, it returns NULL for each column of that table. So use an inner join instead:
with mapping as
(select distinct table1.id
, table2.value
, table2.date
, row_number() over (partition by table1.id order by table2.date desc nulls last) as row_number
from table1
join table2 on table2.id=table1.id and table2.value is not null
)
select *
from mapping
where row_number = 1;
See example of each here. Your query worked because all your test data satisfied the first condition of the ON clause. You really need test data that fails to see what your query does.
Caution: DATE and VALUE are very poor choices for column names. Both are SQL standard reserved words, although not reserved in Postgres specifically. Further, DATE is a Postgres data type, and having columns named the same as a data type leads to confusion.
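To see the difference between the two joins, here is a small sketch using Python's sqlite3 module as a stand-in for Postgres (it needs an SQLite build with window functions, 3.25 or later). The rows for id 1 come from the question; id 2, which has only a null value, is a hypothetical addition to provide the failing test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table table1 (id integer);
    create table table2 (id integer, date text, value integer);
    insert into table1 values (1), (2);
    insert into table2 values
        (1, '2018-06-21', null), (1, '2018-06-20', null),
        (1, '2018-06-19', 202),  (1, '2018-06-18', 200),
        (2, '2018-06-21', null);  -- hypothetical id with no non-null value
""")

# The same query is run twice; only the join keyword changes.
query = """
    with mapping as (
        select table1.id, table2.value, table2.date,
               row_number() over (partition by table1.id
                                  order by table2.date desc) as rn
        from table1
        {join} table2 on table2.id = table1.id and table2.value is not null
    )
    select id, value, date from mapping where rn = 1 order by id
"""
inner_rows = conn.execute(query.format(join="join")).fetchall()
left_rows = conn.execute(query.format(join="left join")).fetchall()

print(inner_rows)
print(left_rows)
```

The inner join drops id 2 entirely, while the left join keeps it with NULL for both value and date. Which behaviour is right depends on whether you want ids without any value to appear in the result.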

How to update a table column based on the condition of two other table columns using st_contains()

I'm attempting to update a table column based on the st_contains() results using two other tables. The code I've written below returns too many results. What do I need to change to make this work?
UPDATE "PRIMARY_USE_DESC"."CAD_Primary_Desc" SET "Parcel_Desc" = 'PARK'
WHERE (
SELECT geom_poly
FROM "Buda_Parks" t1
LEFT JOIN (
SELECT geom_point
FROM "HCAD_POINTS"
) t2 ON ST_Contains(t1.geom_poly, t2.geom_point)
) IS NOT NULL
Ok, I figured it out. Posting it here in case someone else has this question.
UPDATE "PRIMARY_USE_DESC"."CAD_Primary_Desc"
Set "Parcel_Desc" = 'PARK'
FROM
(select geom_poly from "Buda_Parks" t1
left join (SELECT geom_point FROM "HCAD_POINTS") t2 on ST_Contains(geom_poly,geom_point)) t3
WHERE ST_Contains(geom_poly, geom_point)

HiveQL - How to tackle with elements not appearing in dictionary

The thing is: I've got this lookup table which I use as a dictionary to create a new column that 'translates' the meaning of a certain column of codes.
Let's say:
Table1:
ID Code
01 A
02 B
03 C
Lookup_table (dictionary):
Code Meaning
A Alice
B Bob
C Charlie
I can easily make a JOIN to create a new table (Table2) with the new column 'Meaning' added to it:
Table2:
CREATE TABLE Table2 AS SELECT T1.ID, T1.Code, LKP.Meaning
FROM Table1 AS T1
LEFT OUTER JOIN Lookup_table AS LKP
ON (T1.Code = LKP.Code);
But: What to do if a new Code appears in Table1 (e.g. ("04", "D") ) and there is no translation for it in Lookup_table? (given you want to avoid a NULL as an answer) Is there a way to obtain something like 'others' in Meaning to answer to this situation?
Thanks!
You could use COALESCE() in order to achieve that. COALESCE() takes a list of arguments and returns the first one that is not NULL.
You can modify your query as follows:
CREATE TABLE Table2 AS
SELECT
T1.ID AS ID,
T1.Code AS Code,
COALESCE(LKP.Meaning,'others') AS Meaning
FROM Table1 AS T1
LEFT OUTER JOIN Lookup_table AS LKP
ON (T1.Code = LKP.Code);
In your case this means putting LKP.Meaning as the first argument. If this value is NULL, the query falls back to 'others'.
See also the Hive Documentation.
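The fallback can be tried without a Hive cluster. The sketch below uses Python's sqlite3 module as a stand-in (an assumption; COALESCE behaves the same way for this case) with the tables from the question plus the extra ("04", "D") row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Table1 (ID text, Code text);
    create table Lookup_table (Code text, Meaning text);
    insert into Table1 values ('01','A'), ('02','B'), ('03','C'), ('04','D');
    insert into Lookup_table values ('A','Alice'), ('B','Bob'), ('C','Charlie');
""")

rows = conn.execute("""
    select T1.ID, T1.Code, coalesce(LKP.Meaning, 'others') as Meaning
    from Table1 as T1
    left outer join Lookup_table as LKP on (T1.Code = LKP.Code)
    order by T1.ID
""").fetchall()

for row in rows:
    print(row)
```

Code D has no match in the lookup table, so the LEFT OUTER JOIN yields NULL for Meaning and COALESCE replaces it with 'others'.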

compare a table with sys columns

It seems like an easy question, but I am comparing a table that contains a set of table and column names with syscolumns.
This query gives me the missing columns:
select object_name(syscolumns.id) , syscolumns.name
from syscolumns
where not exists
(select 1
from CHECKINGTABLE
where object_name(CHECKINGTABLE.id) = object_name(syscolumns.id)
and CHECKINGTABLE.name=syscolumns.name)
and object_name(syscolumns.id) ='TAB1'
But written this way, it gives me wrong results:
select object_name(syscolumns.id) , syscolumns.name
from syscolumns , CHECKINGTABLE
where not exists
(select 1
from CHECKINGTABLE
where object_name(CHECKINGTABLE.id) = object_name(syscolumns.id)
and CHECKINGTABLE.name=syscolumns.name)
and object_name(syscolumns.id) =object_name(CHECKINGTABLE.id)
What am I doing wrong? I want a query that compares a table I own with syscolumns to identify the data missing from my table.
You could try the following. I do not have an ASE DB at hand, but I hope the syntax is correct.
SELECT syscolumns.id , syscolumns.name
FROM syscolumns LEFT OUTER JOIN CHECKINGTABLE
ON (CHECKINGTABLE.id = syscolumns.id
AND CHECKINGTABLE.name=syscolumns.name)
WHERE CHECKINGTABLE.name IS NULL
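The LEFT OUTER JOIN ... IS NULL pattern above can be checked on any engine. Here is a minimal sketch with Python's sqlite3 module, where catalog_columns is a hypothetical stand-in for syscolumns and the rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- catalog_columns is a hypothetical stand-in for syscolumns
    create table catalog_columns (id integer, name text);
    create table CHECKINGTABLE (id integer, name text);
    insert into catalog_columns values (1, 'a'), (1, 'b'), (2, 'x');
    insert into CHECKINGTABLE values (1, 'a'), (2, 'x');
""")

# Keep only the catalog rows that found no partner in CHECKINGTABLE.
missing = conn.execute("""
    select c.id, c.name
    from catalog_columns c
    left outer join CHECKINGTABLE t
      on (t.id = c.id and t.name = c.name)
    where t.name is null
""").fetchall()

print(missing)
```

Only column b of object 1 is reported, because it is the one catalog row without a matching (id, name) pair in CHECKINGTABLE.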