How to Use Count as a Criteria in PostgreSQL

I have an existing table1 which contains "account", "tax_year" and other fields. I want to create a table2 with the records from table1 where the pair (account, tax_year) occurs exactly once and the row meets the WHERE clause.
For instance, if table1 looks like below:
account year
aaa 2014
bbb 2016
bbb 2016
ddd 2014
ddd 2014
ddd 2015
Table2 should be:
account year
aaa 2014
ddd 2015
Here is my script:
DROP TABLE IF EXISTS table2;
CREATE TABLE table2 AS
SELECT
account::text,
tax_year::text,
building_number,
imprv_type,
building_style_code,
quality,
quality_description,
date_erected,
yr_remodel,
actual_area,
heat_area,
gross_area,
CONCAT(account, tax_year) AS unq
FROM table1
WHERE imprv_type=1001
and date_erected>0 and date_erected IS NOT NULL
and quality IS NOT NULL
and quality_description IS NOT NULL
and yr_remodel>0 and yr_remodel IS NOT NULL
and heat_area>0 and heat_area IS NOT NULL
GROUP BY account,
tax_year,
building_number,
imprv_type,
building_style_code,
quality,
quality_description,
date_erected,
yr_remodel,
actual_area,
heat_area,
gross_area,
unq
HAVING COUNT(unq)=1;
I've spent two days on it but still can't figure out how to make it right. Thanks in advance for your help!

The proper way to use the count of (account, tax_year) pairs in table1:
select account, tax_year
from table1
where imprv_type=1001 -- and many more...
group by account, tax_year
having count(*) = 1;
so you should try:
create table table2 as
select *
from table1
where (account, tax_year) in (
select account, tax_year
from table1
where imprv_type=1001 -- and many more...
group by account, tax_year
having count(*) = 1
);

COUNT(*) = 1 is equivalent to NOT EXISTS (another row with the same key fields):
SELECT
account, tax_year
-- ... maybe more fields ...
FROM table1 t1
WHERE NOT EXISTS ( SELECT *
FROM table1 nx
WHERE nx.account = t1.account -- same key field(s)
AND nx.tax_year = t1.tax_year
AND nx.ctid <> t1.ctid -- but a different row!
);
Note: I replaced the COUNT(CONCAT(account, tax_year)) concatenation of key fields with a composite match key.
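As a quick illustration of why the composite key is safer than concatenation (the values here are hypothetical), two different pairs can concatenate to the same string, while a row comparison keeps them apart:
-- Two different (account, tax_year) pairs collide once concatenated:
SELECT CONCAT('a', '2014') = CONCAT('a2', '014');   -- true
-- A composite (row) comparison still tells them apart:
SELECT ('a', '2014') = ('a2', '014');               -- false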

Related

Conditional JOIN with two different keys

I have a query that produces two separate IDs:
SELECT
date,
user_id,
vendor_id,
SUM(purchase) user_purchase,
SUM(spend) vendor_spend
FROM tabla.abc
GROUP BY 1,2,3
This produces results like this:
date user_id vendor_id user_purchase vendor_spend
1/1/18 123 NULL 5.00 0.00
1/1/18 NULL 456 0.00 10.00
I want to join it on a table that looks like this:
client_id user_id vendor_id
456789 123 NULL
101112 NULL 456
But the problem is, I obviously want to join it on both of the appropriate IDs so my final output can look like this:
date client_id user_id vendor_id user_purchase vendor_spend
1/1/18 456789 123 NULL 5.00 0.00
1/1/18 101112 NULL 456 0.00 10.00
So is there a way I can do like, a conditional join? Something like WHERE user_id IS NULL THEN... etc...
Use is not distinct from because one of the arguments may be null:
select *
from (
select
date,
user_id,
vendor_id,
sum(purchase) user_purchase,
sum(spend) vendor_spend
from table1
group by 1,2,3
) t1
join table2 t2
on (t1.user_id, t1.vendor_id)
is not distinct from (t2.user_id, t2.vendor_id)
Note that for performance reasons you should join the already aggregated table (hence I have placed the original query in a derived table).
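A minimal sketch of why is not distinct from matters here: plain equality never matches NULLs, while is not distinct from treats two NULLs as equal:
select null = null;                     -- null (unknown), so a plain join condition fails
select null is not distinct from null;  -- true, so NULL-keyed rows still pair up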
Try this:
SELECT
date,
COALESCE(lu.client_id, lv.client_id) AS client_id,
user_id,
vendor_id,
SUM(purchase) user_purchase,
SUM(spend) vendor_spend
FROM tabla.abc
LEFT JOIN tabla.link AS lu USING (user_id)
LEFT JOIN tabla.link AS lv USING (vendor_id)
GROUP BY 1,2,3,4
I think the join you need is just this:
FROM aggregated_table t1
LEFT JOIN client_id_table t2
ON t1.user_id=t2.user_id
OR t1.vendor_id=t2.vendor_id
because, as I understand it, you need to join by user_id when there is a user_id and by vendor_id when there is a vendor_id. Using a left join with OR does exactly that; see the fuller sketch below.
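For concreteness, here is a fuller sketch of that join (aggregated_table and client_id_table are stand-ins for the derived query and the lookup table from the question):
SELECT t1.date,
t2.client_id,
t1.user_id,
t1.vendor_id,
t1.user_purchase,
t1.vendor_spend
FROM aggregated_table t1
LEFT JOIN client_id_table t2
ON t1.user_id = t2.user_id
OR t1.vendor_id = t2.vendor_id;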
A conditional join is possible as well. If you're familiar with CASE expressions, they work perfectly well in join conditions. A similar thing can be expressed as:
FROM aggregated_table t1
LEFT JOIN client_id_table t2
ON CASE
WHEN t1.user_id is not null THEN t1.user_id=t2.user_id
WHEN t1.vendor_id is not null THEN t1.vendor_id=t2.vendor_id
END
but this is too verbose compared to the previous option, which I think should produce the same result.

Return the most recent value when joining two tables

I am trying to join two tables and return the most recent value for a field.
Currently, if aa.day_time does not equal bb.time, then the bb.time field returns null. I would like this to return the most recent value less than or equal to the aa.day_time value.
My query currently looks like this:
Select
aa.day_time,
aa.name,
aa.value,
bb.name,
bb.target_value,
bb.time
FROM
x.table1 aa LEFT JOIN y.table2 bb
ON aa.name = bb.name AND aa.day_time=bb.time
WHERE aa.day_time = TO_DATE('01/01/2017','DD/MM/YYYY')
Searching Stack Overflow and other websites showed me a number of solutions; unfortunately, nothing worked. The query below is the closest I got to success, as it did not throw an error message; however, it ran for several hours and I had to stop it. The query above worked in about 5 seconds.
Select
aa.day_time,
aa.name,
aa.value,
bb.name,
bb.target_value,
bb.time
FROM
x.table1 aa LEFT JOIN y.table2 bb
ON aa.name = bb.name AND aa.day_time=
(SELECT MAX(bb.time)
FROM y.table2
WHERE bb.time <= aa.day_time)
WHERE aa.day_time = TO_DATE('01/01/2017','DD/MM/YYYY')
I'm not familiar with SQL, so thank you very much for your help in advance.
If this is Oracle and there is a one-to-many relationship from t1 to t2, then using a CTE to find the most recent date from t2 might do it:
DROP TABLE T1;
DROP TABLE T2;
CREATE TABLE T1(DAY_TIME DATE,NAME VARCHAR(3), VALUE NUMBER);
CREATE TABLE T2(DAY_TIME DATE,NAME VARCHAR(3), VALUE NUMBER);
TRUNCATE TABLE T1;
INSERT INTO T1(day_time,NAME,VALUE) VALUES (to_date('2018-01-01','YYYY-MM-DD'),'aaa',10);
TRUNCATE TABLE T2;
INSERT INTO T2(day_time,NAME,VALUE) VALUES (to_date('2017-01-01','YYYY-MM-DD'),'aaa',10);
INSERT INTO T2(day_time,NAME,VALUE) VALUES (to_date('2018-02-01','YYYY-MM-DD'),'aaa',10);
SELECT * FROM T1;
SELECT * FROM T2;
WITH cte AS
(
select name,day_time,value
from T2
where T2.day_time = (select MAX(t3.DAY_TIME) FROM T2 t3 WHERE t3.DAY_TIME <= TO_DATE('2018-01-01','YYYY-MM-DD') and t3.name = t2.name)
)
SELECT t1.name,t1.day_time,t1.value,
cte.name,cte.day_time,cte.value
from t1
left join cte on t1.name = cte.name
where t1.day_time = to_date('2018-01-01','YYYY-MM-DD');
NAME DAY_TIME VALUE NAME DAY_TIME VALUE
---- ---------------------- ---------- ---- ---------------------- ----------
aaa 01-JAN-2018 00:00:00 10 aaa 01-JAN-2017 00:00:00 10
If there are no entries at all in t2, then the t2 side of the select will be empty.
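Since the thread is otherwise about PostgreSQL: if the database is Postgres rather than Oracle, a LEFT JOIN LATERAL is the idiomatic way to pick the most recent row per name. A sketch using the table names from the question (untested against the real schema):
Select
aa.day_time,
aa.name,
aa.value,
bb.name,
bb.target_value,
bb.time
FROM x.table1 aa
LEFT JOIN LATERAL (
SELECT *
FROM y.table2 b
WHERE b.name = aa.name
AND b.time <= aa.day_time
ORDER BY b.time DESC
LIMIT 1
) bb ON true
WHERE aa.day_time = TO_DATE('01/01/2017','DD/MM/YYYY');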

How can I SUM distinct records in a Postgres database where there are duplicate records?

Imagine a table that looks like this:
The SQL to get this data was just SELECT *
The first column is "row_id", the second is "id" (the order ID), and the third is "total" (the revenue).
I'm not sure why there are duplicate rows in the database, but when I do a SUM(total), it includes the second entry even though the order ID is the same. This makes my numbers larger than if I select distinct(id), total, export to Excel, and then sum the values manually.
So my question is: how can I SUM over just the distinct order IDs so that I get the same revenue as if I exported every distinct order ID row to Excel?
Thanks in advance!
Easy - just divide by the count:
select id, sum(total) / count(id)
from orders
group by id
This also handles any level of duplication, e.g. triplicates.
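If a single grand total is wanted instead of one row per id, the same idea nests in a sub-query (a sketch, assuming every duplicate of an id repeats the same total):
select sum(per_id.total) as revenue
from (
select id, sum(total) / count(id) as total
from orders
group by id
) as per_id;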
You can try something like this (with your example):
Table
create table test (
row_id int,
id int,
total decimal(15,2)
);
insert into test values
(6395, 1509, 112), (22986, 1509, 112),
(1393, 3284, 40.37), (24360, 3284, 40.37);
Query
with distinct_records as (
select distinct id, total from test
)
select a.id, b.actual_total, array_agg(a.row_id) as row_ids
from test a
inner join (select id, sum(total) as actual_total from distinct_records group by id) b
on a.id = b.id
group by a.id, b.actual_total
Result
| id | actual_total | row_ids |
|------|--------------|------------|
| 1509 | 112 | 6395,22986 |
| 3284 | 40.37 | 1393,24360 |
Explanation
We do not know the reason why orders and totals appear more than once with different row_ids. So, using a common table expression (CTE) introduced with the with ... phrase, we get the distinct id and total.
Below the CTE, we use this distinct data to do the totaling. We join the id in the original table with the aggregation over the distinct values. Then we aggregate the row_ids into an array so that the information looks cleaner.
SQLFiddle example
http://sqlfiddle.com/#!15/72639/3
Create a custom aggregate:
CREATE OR REPLACE FUNCTION sum_func (
double precision, pg_catalog.anyelement, double precision
)
RETURNS double precision AS
$body$
SELECT case when $3 is not null then COALESCE($1, 0) + $3 else $1 end
$body$
LANGUAGE sql;
CREATE AGGREGATE dist_sum (
pg_catalog."any",
double precision)
(
SFUNC = sum_func,
STYPE = float8
);
And then calculate the distinct sum like:
select dist_sum(distinct id, total)
from orders
You can use DISTINCT in your aggregate functions:
SELECT id, SUM(DISTINCT total) FROM orders GROUP BY id
Note that this only works if every duplicate of an id carries the same total; genuinely different rows of one id that happen to share a total would be collapsed too.
Documentation here: https://www.postgresql.org/docs/9.6/static/sql-expressions.html#SYNTAX-AGGREGATES
If we can trust that the total for one order is the same on every duplicated row, we could eliminate the duplicates in a sub-query by selecting the MAX of the PK id column. An example:
CREATE TABLE test2 (id int, order_id int, total int);
insert into test2 values (1,1,50);
insert into test2 values (2,1,50);
insert into test2 values (5,1,50);
insert into test2 values (3,2,100);
insert into test2 values (4,2,100);
select order_id, sum(total)
from test2 t
join (
select max(id) as id
from test2
group by order_id) as sq
on t.id = sq.id
group by order_id
In difficult cases:
select
id,
(
SELECT SUM(value::int4)
FROM jsonb_each_text(jsonb_object_agg(row_id, total))
) as total
from orders
group by id
I would suggest just using a sub-query:
SELECT "a"."id", SUM("a"."total")
FROM (SELECT DISTINCT ON ("id") * FROM "Database"."Schema"."Table") AS "a"
GROUP BY "a"."id"
The above will give you the total for each id.
Use the query below if you want the overall total with every duplicate removed:
SELECT SUM("a"."total")
FROM (SELECT DISTINCT ON ("id") * FROM "Database"."Schema"."Table") AS "a"
Using subselect (http://sqlfiddle.com/#!7/cef1c/51):
select sum(total) from (
select distinct id, total
from orders
) as t;
Using CTE (http://sqlfiddle.com/#!7/cef1c/53):
with distinct_records as (
select distinct id, total from orders
)
select sum(total) from distinct_records;

Compare 2 tables' fields and if they match, copy the primary key over to form a relation (PostgreSQL)

This shows some sample data that I might have (real data is much larger):
table1:
date forename surname PK
1998 john harry
1928 fred kale
table2:
date forename surname PK
1998 john harry 2
1928 fred kale 98
I need to compare table2 with table1, and where the rows match, add the matching PK from table2 into table1 to form a relation.
EDIT: I would like to add that in table1, the "people" can appear twice but only once in table2.
PostgreSQL:
UPDATE table1
SET FK = table2.PK
FROM table2
WHERE
table1.date = table2.date
AND table1.forename = table2.forename
AND table1.surname = table2.surname
SQL Server
UPDATE t1
SET FK = t2.PK
FROM
table1 t1 INNER JOIN table2 t2
ON t1.date = t2.date
AND t1.forename = t2.forename
AND t1.surname = t2.surname
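Since people can appear twice in table1 but only once in table2, every matching table1 row receives the same PK. A quick follow-up check (a sketch, reusing the join conditions above) lists the table1 rows that found no match at all:
SELECT t1.*
FROM table1 t1
WHERE NOT EXISTS (
SELECT 1
FROM table2 t2
WHERE t2.date = t1.date
AND t2.forename = t1.forename
AND t2.surname = t1.surname
);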

Missing Weeks View

I have 2 tables which I need to compare to find missing data.
TableA: Definition table
Year, Week, cmp_code, [other columns]
TableB: Cash Receipts
Year, WeekNo, FranchiseID
TableA has all the possible combinations of ID, week, and year that we should have data for. TableB is the data we actually have. I need to list out what we don't have yet, so the delta A - B. How do I construct the query to find these missing values?
You can use NOT EXISTS
SELECT [Year], [Week], cmp_code
FROM TableA AS a
WHERE NOT EXISTS
( SELECT 1
FROM TableB AS b
WHERE b.[Year] = a.[Year]
AND b.[WeekNo] = a.[Week]
AND b.FranchiseID = a.cmp_code
);
You can use the EXCEPT set operator to return the difference between two sets:
SELECT [Year], [Week], cmp_code FROM TableA
EXCEPT
SELECT [Year], [WeekNo], FranchiseID FROM TableB
This will return the rows in TableA that don't have exact matches in TableB. The same result can be achieved using a correlated NOT EXISTS query, or a LEFT JOIN. The NOT EXISTS should perform best.
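For completeness, the LEFT JOIN variant mentioned above would look something like this (a sketch using the column names from the question):
SELECT a.[Year], a.[Week], a.cmp_code
FROM TableA AS a
LEFT JOIN TableB AS b
ON b.[Year] = a.[Year]
AND b.[WeekNo] = a.[Week]
AND b.FranchiseID = a.cmp_code
WHERE b.[Year] IS NULL;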