Sorry about the lame Title... If I could summarize this in a few words I might have had better luck finding an existing solution here!
I have a table that simplified looks like this:
ID PRODUCT
___ _________
100 Savings
200 Mortgage
200 Visa
300 Mortgage
300 Savings
I need to select rows based on the product of each ID. For example, I can do this:
SELECT DISTINCT ID
FROM table1
WHERE Product NOT IN ('Savings', 'Chequing')
This would return:
ID
___
200
300
However, in the case of ID 300 they do have Savings so I actually do not want this returned. In plain English I want to
Select * from table1 where 'Savings' and 'Chequing' are not the product for any row with that ID.
Desired result in this case would be one row with ID 200 since they do not have Savings or Chequing.
How can I do this?
Select the rows that match the item you do not want to match then compare therr ids
e.g.
select distinct id from table1 where id not in (
SELECT ID
FROM table1
WHERE Product IN ('Savings', 'Chequing')
)
You can use NOT EXISTS:
SELECT DISTINCT t1.ID
FROM dbo.Table1 t1
WHERE NOT EXISTS
(
SELECT 1 FROM dbo.Table1 t2
WHERE t2.Poduct IN ('Savings', 'Chequing')
AND t2.ID = t1.ID
)
Demo
Worth reading: Should I use NOT IN, OUTER APPLY, LEFT OUTER JOIN, EXCEPT, or NOT EXISTS?
Related
I have query such as
select * from batteries as b ORDER BY inserted_at desc
which gives me data such as
and I have an query such as
select voltage, datetime, battery_id from battery_readings ORDER BY inserted_at desc limit 1
which returns data as
I want to combine both 2 above queries, so in one go, I can have each battery details as well as its last added voltage and datetime from battery_readings.
Postgres has a very useful syntax for this, called DISTINCT ON. This is different from plain DISTINCT in that it keeps only the first row of each set, defined by the sort order. In your case, it would be something like this:
SELECT DISTINCT ON (b.id)
b.id,
b.name,
b.source_url,
b.active,
b.user_id,
b.inserted_at,
b.updated_at,
v.voltage,
v.datetime
FROM battery b
JOIN battery_voltage v ON (b.id = v.battery_id)
ORDER BY b.id, v.datetime desc;
I think that widowing will make what you expected.
Assuming two tables
create table battery (id int, name text);
create table bat_volt(measure_time int, battery_id int, val int);
One of the possible queries is like this:
with latest as (select battery_id, max(measure_time) over (partition by battery_id) from bat_volt)
select * from battery b join bat_volt bv on bv.battery_id=b.id where (b.id,bv.measure_time) in (select * from latest);
If you have Postgres version which supports lateral, it might also make sense to try it out (in case there are way more values than batteries, it could have better performance).
select * from battery b
join bat_volt bv on bv.battery_id=b.id
join lateral
(select battery_id, max(measure_time) over (partition by battery_id) from bat_volt bbv
where bbv.battery_id = b.id limit 1) lbb on (lbb.max = bv.measure_time AND lbb.battery_id = b.id);
Imagine a table that looks like this:
The SQL to get this data was just SELECT *
The first column is "row_id" the second is "id" - which is the order ID and the third is "total" - which is the revenue.
I'm not sure why there are duplicate rows in the database, but when I do a SUM(total), it's including the second entry in the database, even though the order ID is the same, which is causing my numbers to be larger than if I select distinct(id), total - export to excel and then sum the values manually.
So my question is - how can I SUM on just the distinct order IDs so that I get the same revenue as if I exported to excel every distinct order ID row?
Thanks in advance!
Easy - just divide by the count:
select id, sum(total) / count(id)
from orders
group by id
See live demo.
Also handles any level of duplication, eg triplicates etc.
You can try something like this (with your example):
Table
create table test (
row_id int,
id int,
total decimal(15,2)
);
insert into test values
(6395, 1509, 112), (22986, 1509, 112),
(1393, 3284, 40.37), (24360, 3284, 40.37);
Query
with distinct_records as (
select distinct id, total from test
)
select a.id, b.actual_total, array_agg(a.row_id) as row_ids
from test a
inner join (select id, sum(total) as actual_total from distinct_records group by id) b
on a.id = b.id
group by a.id, b.actual_total
Result
| id | actual_total | row_ids |
|------|--------------|------------|
| 1509 | 112 | 6395,22986 |
| 3284 | 40.37 | 1393,24360 |
Explanation
We do not know what the reasons is for orders and totals to appear more than one time with different row_id. So using a common table expression (CTE) using the with ... phrase, we get the distinct id and total.
Under the CTE, we use this distinct data to do totaling. We join ID in the original table with the aggregation over distinct values. Then we comma-separate row_ids so that the information looks cleaner.
SQLFiddle example
http://sqlfiddle.com/#!15/72639/3
Create custom aggregate:
CREATE OR REPLACE FUNCTION sum_func (
double precision, pg_catalog.anyelement, double precision
)
RETURNS double precision AS
$body$
SELECT case when $3 is not null then COALESCE($1, 0) + $3 else $1 end
$body$
LANGUAGE 'sql';
CREATE AGGREGATE dist_sum (
pg_catalog."any",
double precision)
(
SFUNC = sum_func,
STYPE = float8
);
And then calc distinct sum like:
select dist_sum(distinct id, total)
from orders
SQLFiddle
You can use DISTINCT in your aggregate functions:
SELECT id, SUM(DISTINCT total) FROM orders GROUP BY id
Documentation here: https://www.postgresql.org/docs/9.6/static/sql-expressions.html#SYNTAX-AGGREGATES
If we can trust that the total for 1 order is actually 1 row. We could eliminate the duplicates in a sub-query by selecting the the MAX of the PK id column. An example:
CREATE TABLE test2 (id int, order_id int, total int);
insert into test2 values (1,1,50);
insert into test2 values (2,1,50);
insert into test2 values (5,1,50);
insert into test2 values (3,2,100);
insert into test2 values (4,2,100);
select order_id, sum(total)
from test2 t
join (
select max(id) as id
from test2
group by order_id) as sq
on t.id = sq.id
group by order_id
sql fiddle
In difficult cases:
select
id,
(
SELECT SUM(value::int4)
FROM jsonb_each_text(jsonb_object_agg(row_id, total))
) as total
from orders
group by id
I would suggest just use a sub-Query:
SELECT "a"."id", SUM("a"."total")
FROM (SELECT DISTINCT ON ("id") * FROM "Database"."Schema"."Table") AS "a"
GROUP BY "a"."id"
The Above will give you the total of each id
Use below if you want the full total of each duplicate removed:
SELECT SUM("a"."total")
FROM (SELECT DISTINCT ON ("id") * FROM "Database"."Schema"."Table") AS "a"
Using subselect (http://sqlfiddle.com/#!7/cef1c/51):
select sum(total) from (
select distinct id, total
from orders
)
Using CTE (http://sqlfiddle.com/#!7/cef1c/53):
with distinct_records as (
select distinct id, total from orders
)
select sum(total) from distinct_records;
I have 2 tables which I need to compare to find missing data.
TableA: Definition table
Year, Week, cmp_code, [other columns]
TableB: Cash Receipts
Year, WeekNo, FranchiseID
TableA has all the possible combinations of ID week and year we should have data for. TableB is the data we actually have. I need to list out what we don't have yet, so the delta for B-A. How do I construct the query to find these missing values?
You can use NOT EXISTS
SELECT [Year], [Week], ID
FROM TableA AS a
WHERE NOT EXISTS
( SELECT 1
FROM TableB AS b
WHERE b.[Year] = a.[Year]
AND b.[Week] = a.[Week]
AND b.ID = a.ID
);
You can use theexceptset operator to return the difference between two sets:
SELECT [Year], [Week], cmp_code FROM TableA
EXCEPT
SELECT [Year], [WeekNo], FranchiseID FROM TableB
This will return the rows in TableA that doesn't have exact matches in TableB. The same result can be achieved using a correlatednot existsquery, or aleft join. Thenot existsshould perform best.
The query below returns 9,817 records. Now, I want to SELECT one more field from another table. See the 2 lines that are commented out, where I've simply selected this additional field and added a JOIN statement to bind this new columns. With these lines added, the query now returns 649,200 records and I can't figure out why! I guess something is wrong with my WHERE criteria in conjunction with the JOIN statement. Please help, thanks.
SELECT DISTINCT dbo.IMPORT_DOCUMENTS.ITEMID, BEGDOC, BATCHID
--, dbo.CATEGORY_COLLECTION_CATEGORY_RESULTS.CATEGORY_ID
FROM IMPORT_DOCUMENTS
--JOIN dbo.CATEGORY_COLLECTION_CATEGORY_RESULTS ON
dbo.CATEGORY_COLLECTION_CATEGORY_RESULTS.ITEMID = dbo.IMPORT_DOCUMENTS.ITEMID
WHERE (BATCHID LIKE 'IC0%' OR BATCHID LIKE 'LP0%')
AND dbo.IMPORT_DOCUMENTS.ITEMID IN
(SELECT dbo.CATEGORY_COLLECTION_CATEGORY_RESULTS.ITEMID FROM
CATEGORY_COLLECTION_CATEGORY_RESULTS
WHERE SCORE >= .7 AND SCORE <= .75 AND CATEGORY_ID IN(
SELECT CATEGORY_ID FROM CATEGORY_COLLECTION_CATS WHERE COLLECTION_ID IN (11,16))
AND Sample_Id > 0)
AND dbo.IMPORT_DOCUMENTS.ITEMID NOT IN
(SELECT ASSIGNMENT_FOLDER_DOCUMENTS.Item_Id FROM ASSIGNMENT_FOLDER_DOCUMENTS)
One possible reason is because one of your tables contains data at lower level, lower than your join key. For example, there may be multiple records per item id. The same item id is repeated X number of times. I would fix the query like the below. Without data knowledge, Try running the below modified query.... If output is not what you're looking for, convert it into SELECT Within a Select...
Hope this helps....
Try this SQL: SELECT DISTINCT a.ITEMID, a.BEGDOC, a.BATCHID, b.CATEGORY_ID FROM IMPORT_DOCUMENTS a JOIN (SELECT DISTINCT ITEMID FROM CATEGORY_COLLECTION_CATEGORY_RESULTS WHERE SCORE >= .7 AND SCORE <= .75 AND CATEGORY_ID IN (SELECT DISTINCT CATEGORY_ID FROM CATEGORY_COLLECTION_CATS WHERE COLLECTION_ID IN (11,16)) AND Sample_Id > 0) B ON a.ITEMID =b.ITEMID WHERE a.(a.BATCHID LIKE 'IC0%' OR a.BATCHID LIKE 'LP0%') AND a.ITEMID NOT IN (SELECT DIDTINCT Item_Id FROM ASSIGNMENT_FOLDER_DOCUMENTS)
I want to select everything from table one, which contains a column JID. These are available things the player can learn. But there is a table2 which has a list of things the player has already learned. So if JID is in table2, it has been learned, and I do not want that selected from table 1.
For exampel.
table 1
JID Title Description Rank
table 2
JID UserID value1 value2
table 1 might have 100 rows, but if table 2 has 9 rows with some of the JID's from table 1, I dont want them selected. What is more, it is specific to user ID in table 2. So i need to filter table2 by JID != match JID in table1, but ONLY IF userID = a php variable passed.
Hope this makese sense. I DO NOT want a subquery. I think it is possiblet o use a left outer join on JID, but I am not sure how to invole the USERID... HELP!
ps the JID can be in table1 and table2 if the UID does not match... if it matches, then the JID cannot be found in table2 to be selected.
with not exists:
SELECT t1.*
FROM table1 t1
WHERE NOT EXISTS (SELECT NULL
FROM table2 t2
WHERE t2.UID = 'php var'
AND t1.JID = t2.JID)
with left join:
SELECT t1.*
FROM table1 t1
LEFT JOIN table2 t2 on t1.JID = t2.JID
AND t2.UID = 'php var'
WHERE t2.JID IS NULL