Selecting other columns not in count, group by - postgresql

So I have a table as follows:

product_id  sender_id  timestamp  ...other columns...
1           2          1222
1           2          3423
1           2          1231
2           2          890
3           4          234
2           3          234234
I want to get the rows where sender_id = 2, but grouped by product_id with a count per group and sorted by timestamp descending. This means I need the following result:
product_id  sender_id  timestamp  count  ...other columns...
1           2          3423       3
2           2          890        1
I tried the following query:
SELECT product_id, sender_id, timestamp, count(product_id), ...other columns...
FROM table
WHERE sender_id = 2
GROUP BY product_id
But I get the following error:

Error in query: ERROR: column "table.sender_id" must appear in the GROUP BY clause or be used in an aggregate function
It seems I cannot SELECT columns that are not in the GROUP BY. Another method I found online was to use a join:
SELECT product_id, sender_id, timestamp, count, ...other columns...
FROM table
JOIN (
SELECT product_id, COUNT(product_id) AS count
FROM table
GROUP BY (product_id)
) table1 ON table.product_id = table1.product_id
WHERE sender_id = 2
GROUP BY product_id
But doing this simply lists all rows without grouping or counting. My guess is that the ON clause just expands the table back out again.

Try grouping by product_id, sender_id:
select product_id, sender_id, count(product_id), max(timestamp) maxtm
from t
where sender_id = 2
group by product_id, sender_id
order by maxtm desc
If you want other columns too:
select t.*, t1.product_count
from t
inner join (
select product_id, sender_id, count(product_id) product_count, max(timestamp) maxtm
from t
where sender_id = 2
group by product_id, sender_id
) t1
on t.product_id = t1.product_id and t.sender_id = t1.sender_id and t.timestamp = t1.maxtm
order by t1.maxtm desc
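If you also need the remaining columns from the latest row per product, a DISTINCT ON plus window-function variant is another option in PostgreSQL (just a sketch, using the same table alias t and the sender_id = 2 filter as above):
select distinct on (product_id)
       t.*,
       count(*) over (partition by product_id) as product_count
from t
where sender_id = 2
order by product_id, timestamp desc
This keeps the newest row per product_id and counts how many rows that product has for sender_id = 2; wrap it in an outer query if the final output should be ordered by timestamp instead.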

Let's just work through it with your data:
CREATE TABLE products (product_id INTEGER,
                       sender_id INTEGER,
                       time_stamp INTEGER);
INSERT INTO products VALUES
(1,2,1222),
(1,2,3423),
(1,2,1231),
(2,2,890),
(3,4,234),
(2,3,234234);
SELECT product_id,sender_id,string_agg(time_stamp::text,','),count(product_id)
FROM products
WHERE sender_id=2
GROUP BY product_id,sender_id
Here you have distinct time_stamp values, so you need to apply some aggregate to that column or simply remove it from the select statement.
If you remove time_stamp from the select statement, it becomes very easy, like below:
SELECT product_id,sender_id,count(product_id)
FROM products
WHERE sender_id=2
GROUP BY product_id,sender_id

Related

PostgreSQL: using the result obtained from a first query in a second query, written as a single query

SELECT partner_id
FROM trip_delivery_sales ts
WHERE ts.route_id='152'
GROUP BY ts.partner_id
From this query we can get the partner ids. Using each partner id, we want to check the trip_delivery_sales_lines table and find the sum of the product quantities of each customer's last two sales. If the last two sales have product qty 2 and 5, we want the result as partner_id | count, e.g. Mn2333 - 7.
Here, for example, I take partner id 34806, but I want to check all the partner_ids obtained from the first query:
SELECT product_qty
FROM trip_delivery_sales_lines td
WHERE td.partner_id='34806'
AND td.route_id='152'
AND td.product_id='432'
ORDER BY td.order_date DESC
LIMIT 2
You can run this query
SELECT td.partner_id,sum(product_qty)
FROM trip_delivery_sales_lines td,
(SELECT partner_id FROM trip_delivery_sales ts WHERE ts.route_id='152') as ts
WHERE td.partner_id=ts.partner_id
AND td.product_id='432'
GROUP BY td.partner_id
ORDER BY max(td.order_date) DESC
LIMIT 2
Or this one
with ts as (SELECT distinct partner_id FROM trip_delivery_sales WHERE route_id='152')
SELECT td.partner_id,sum(product_qty)
FROM trip_delivery_sales_lines td,ts
WHERE td.partner_id=ts.partner_id
AND td.product_id='432'
GROUP BY td.partner_id
ORDER BY max(td.order_date) DESC
LIMIT 2
You might be looking for
SELECT DISTINCT ts.partner_id, ARRAY(
SELECT product_qty
FROM trip_delivery_sales_lines td
WHERE td.partner_id=ts.partner_id
AND td.product_id='432'
ORDER BY td.order_date DESC
LIMIT 2
) AS product_qty_arr
FROM trip_delivery_sales ts
WHERE ts.route_id='152'
or just
SELECT
partner_id,
array_agg(product_qty ORDER BY order_date DESC) as product_qty_arr
FROM (
SELECT
td.partner_id,
td.product_qty,
td.order_date,
row_number() OVER (PARTITION BY td.partner_id ORDER BY td.order_date DESC)
FROM trip_delivery_sales_lines td
JOIN trip_delivery_sales ts USING (partner_id)
WHERE ts.route_id='152'
AND td.product_id='432'
) AS enumerated
WHERE row_number <= 2
GROUP BY partner_id
See also PostgreSQL: top n entries per item in same table or Optimize GROUP BY query to retrieve latest row per user
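For reference, the same "last two rows per partner" pattern can also be written with a LATERAL join in PostgreSQL (a sketch, assuming the same tables and column names as in the question):
SELECT ts.partner_id, sum(last_two.product_qty) AS qty_sum
FROM (SELECT DISTINCT partner_id
      FROM trip_delivery_sales
      WHERE route_id = '152') ts
CROSS JOIN LATERAL (
    SELECT td.product_qty
    FROM trip_delivery_sales_lines td
    WHERE td.partner_id = ts.partner_id
      AND td.product_id = '432'
    ORDER BY td.order_date DESC
    LIMIT 2
) AS last_two
GROUP BY ts.partner_id
The LATERAL subquery picks each partner's two most recent lines, and the outer query sums them per partner.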

How can I SUM distinct records in a Postgres database where there are duplicate records?

Imagine a table where the first column is "row_id", the second is "id" (the order ID), and the third is "total" (the revenue). The SQL to get this data was just SELECT *.
I'm not sure why there are duplicate rows in the database, but when I do a SUM(total) it includes the duplicate entries, even though the order ID is the same. This makes my numbers larger than if I SELECT DISTINCT id, total, export to Excel, and sum the values manually.
So my question is: how can I SUM over just the distinct order IDs so that I get the same revenue as if I had exported every distinct order ID row to Excel and summed it there?
Thanks in advance!
Easy - just divide by the count:
select id, sum(total) / count(id)
from orders
group by id
Also handles any level of duplication, eg triplicates etc.
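If a single grand total is wanted instead of one row per id, the same divide-by-count idea can be wrapped in an outer query (a sketch, assuming the orders table used above):
select sum(per_order_total) as grand_total
from (
  select id, sum(total) / count(id) as per_order_total
  from orders
  group by id
) x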
You can try something like this (with your example):
Table
create table test (
row_id int,
id int,
total decimal(15,2)
);
insert into test values
(6395, 1509, 112), (22986, 1509, 112),
(1393, 3284, 40.37), (24360, 3284, 40.37);
Query
with distinct_records as (
select distinct id, total from test
)
select a.id, b.actual_total, array_agg(a.row_id) as row_ids
from test a
inner join (select id, sum(total) as actual_total from distinct_records group by id) b
on a.id = b.id
group by a.id, b.actual_total
Result
| id | actual_total | row_ids |
|------|--------------|------------|
| 1509 | 112 | 6395,22986 |
| 3284 | 40.37 | 1393,24360 |
Explanation
We do not know why the same order and total appear more than once with different row_id values. So, using a common table expression (CTE) introduced with the with ... clause, we get the distinct id and total.
Below the CTE, we use this distinct data for the totals: we join the id in the original table against the aggregation over the distinct values, and collect the row_ids (comma-separated in the output) so the information looks cleaner.
SQLFiddle example
http://sqlfiddle.com/#!15/72639/3
Create custom aggregate:
CREATE OR REPLACE FUNCTION sum_func (
double precision, pg_catalog.anyelement, double precision
)
RETURNS double precision AS
$body$
SELECT case when $3 is not null then COALESCE($1, 0) + $3 else $1 end
$body$
LANGUAGE 'sql';
CREATE AGGREGATE dist_sum (
pg_catalog."any",
double precision)
(
SFUNC = sum_func,
STYPE = float8
);
And then calculate the distinct sum like:
select dist_sum(distinct id, total)
from orders
You can use DISTINCT in your aggregate functions:
SELECT id, SUM(DISTINCT total) FROM orders GROUP BY id
Documentation here: https://www.postgresql.org/docs/9.6/static/sql-expressions.html#SYNTAX-AGGREGATES
If we can trust that the total for one order really belongs in a single row, we can eliminate the duplicates in a sub-query by selecting the MAX of the PK id column. An example:
CREATE TABLE test2 (id int, order_id int, total int);
insert into test2 values (1,1,50);
insert into test2 values (2,1,50);
insert into test2 values (5,1,50);
insert into test2 values (3,2,100);
insert into test2 values (4,2,100);
select order_id, sum(total)
from test2 t
join (
select max(id) as id
from test2
group by order_id) as sq
on t.id = sq.id
group by order_id
In difficult cases:
select
id,
(
SELECT SUM(value::int4)
FROM jsonb_each_text(jsonb_object_agg(row_id, total))
) as total
from orders
group by id
I would suggest just using a sub-query:
SELECT "a"."id", SUM("a"."total")
FROM (SELECT DISTINCT ON ("id") * FROM "Database"."Schema"."Table") AS "a"
GROUP BY "a"."id"
The above will give you the total for each id.
Use the query below if you want the overall total with duplicates removed:
SELECT SUM("a"."total")
FROM (SELECT DISTINCT ON ("id") * FROM "Database"."Schema"."Table") AS "a"
Using subselect (http://sqlfiddle.com/#!7/cef1c/51):
select sum(total) from (
select distinct id, total
from orders
) t
Using CTE (http://sqlfiddle.com/#!7/cef1c/53):
with distinct_records as (
select distinct id, total from orders
)
select sum(total) from distinct_records;

Using Derived Tables and CTEs to Display Details?

I am teaching myself T-SQL and am struggling to comprehend the following example:
Suppose you want to display several nonaggregated columns along with
some aggregate expressions that apply to the entire result set or to a
larger grouping level. For example, you may need to display several
columns from the Sales.SalesOrderHeader table and calculate the
percent of the TotalDue for each sale compared to the TotalDue for all
the customer’s sales. If you group by CustomerID, you can’t include
other nonaggregated columns from Sales.SalesOrderHeader unless you
group by those columns. To get around this, you can use a derived
table or a CTE.
Here are the two examples given:
SELECT c.CustomerID, SalesOrderID, TotalDue, AvgOfTotalDue,
TotalDue/SumOfTotalDue * 100 AS SalePercent
FROM Sales.SalesOrderHeader AS soh
INNER JOIN
(SELECT CustomerID, SUM(TotalDue) AS SumOfTotalDue,
AVG(TotalDue) AS AvgOfTotalDue
FROM Sales.SalesOrderHeader
GROUP BY CustomerID) AS c ON soh.CustomerID = c.CustomerID
ORDER BY c.CustomerID;
WITH c AS
(SELECT CustomerID, SUM(TotalDue) AS SumOfTotalDue,
AVG(TotalDue) AS AvgOfTotalDue
FROM Sales.SalesOrderHeader
GROUP BY CustomerID)
SELECT c.CustomerID, SalesOrderID, TotalDue,AvgOfTotalDue,
TotalDue/SumOfTotalDue * 100 AS SalePercent
FROM Sales.SalesOrderHeader AS soh
INNER JOIN c ON soh.CustomerID = c.CustomerID
ORDER BY c.CustomerID;
Why doesn't this query produce the same result?
SELECT CustomerID, SalesOrderID, TotalDue, AVG(TotalDue) AS AvgOfTotalDue,
TotalDue/SUM(TotalDue) * 100 AS SalePercent
FROM Sales.SalesOrderHeader
GROUP BY CustomerID, SalesOrderID, TotalDue
ORDER BY CustomerID
I'm looking for someone to explain the above examples in another way, or to step through them logically, so I can understand how they work.
The aggregates in this statement (i.e. SUM and AVG) don't do anything:
SELECT CustomerID, SalesOrderID, TotalDue, AVG(TotalDue) AS AvgOfTotalDue,
TotalDue/SUM(TotalDue) * 100 AS SalePercent
FROM Sales.SalesOrderHeader
GROUP BY CustomerID, SalesOrderID, TotalDue
ORDER BY CustomerID
The reason for this is that you're grouping by TotalDue, so all records in the same group have the same value for that field. In the case of AVG this means you're guaranteed that AvgOfTotalDue always equals TotalDue. For SUM it's possible you'd get a different result, but as you're also grouping by SalesOrderID (which I'd imagine is unique in the SalesOrderHeader table) you will only have one record per group, so again this will always equal the TotalDue value.
With the CTE example you're only grouping by CustomerId; as a customer may have many sales orders associated with it, these aggregate values will be different from the TotalDue.
EDIT
Explanation of aggregating a field that is also in the group by:
When you group by a value, all rows with that same value are collected together and the aggregate functions are performed over them. Say you had 3 rows with a total due of 1 and 2 rows with a total due of 2: you'd get two result lines, one for the 1s and one for the 2s. Now if you perform a sum on these you have 3*1 and 2*2. Divide by the number of rows in each result line (to get the average) and you have 3*1/3 and 2*2/2, so things cancel out, leaving you with 1 and 2.
select totalDue, avg(totalDue)
from (
select 1 totalDue
union all select 1 totalDue
union all select 1 totalDue
union all select 2 totalDue
union all select 2 totalDue
) x
group by totalDue
select uniqueId, totalDue, avg(totalDue), sum(totalDue)
from (
select 1 uniqueId, 1 totalDue
union all select 2 uniqueId, 1 totalDue
union all select 3 uniqueId, 1 totalDue
union all select 4 uniqueId, 2 totalDue
union all select 5 uniqueId, 2 totalDue
) x
group by uniqueId
Runnable Example: http://sqlfiddle.com/#!2/d41d8/21263
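As a side note, the same per-customer aggregates can be attached directly to the detail rows with window functions, without a derived table or CTE (a sketch against the same Sales.SalesOrderHeader table):
SELECT CustomerID, SalesOrderID, TotalDue,
       AVG(TotalDue) OVER (PARTITION BY CustomerID) AS AvgOfTotalDue,
       TotalDue / SUM(TotalDue) OVER (PARTITION BY CustomerID) * 100 AS SalePercent
FROM Sales.SalesOrderHeader
ORDER BY CustomerID;
Each OVER (PARTITION BY CustomerID) aggregate is computed per customer but repeated on every one of that customer's rows, which is exactly what the derived-table and CTE versions achieve.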

Summing Multiple Records by maxdate

I have a table with the following data:

Bldg  Suit  SQFT   Date
1     1     1,000  9/24/2012
1     1     1,500  12/31/2011
1     2     800    8/31/2012
1     2     500    10/1/2005
I want to write a query that sums the SQFT of the most recent (max date) record for each suit, so the desired result would be 1,800, and it must be in one cell/row. This will ultimately be part of a subquery; I am just not getting what I expect with the queries I have written so far.
Thanks in advance.
You can use the following (See SQL Fiddle with Demo):
select sum(t1.sqft) Total
from yourtable t1
inner join
(
select max(dt) mxdt, suit, bldg
from yourtable
group by suit, bldg
) t2
on t1.dt = t2.mxdt
and t1.bldg = t2.bldg
and t1.suit = t2.suit
; With Data As
(
Select Bldg, Suit, SQFT, Row_Number() Over (Partition By Bldg, Suit Order By Date DESC) As RowID
From YourTableNameHere
)
Select Bldg, Sum(SQFT) As TotalSQFT
From Data
Where RowId = 1
Group By Bldg
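If you need a single overall total rather than one row per Bldg, the same CTE works without the final grouping (a sketch):
; With Data As
(
Select Bldg, Suit, SQFT, Row_Number() Over (Partition By Bldg, Suit Order By Date DESC) As RowID
From YourTableNameHere
)
Select Sum(SQFT) As TotalSQFT
From Data
Where RowId = 1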

Subtract the previous row of data where the id is the same as the row above

I have been trying all afternoon to achieve this, with no success.
I have a db with info on customers and the dates they purchased products from the store. It is grouped by a batch ID which I have converted into a date format.
So in my table I now have:
CustomerID | Date
1234       | 2011-10-18
1234       | 2011-10-22
1235       | 2011-11-16
1235       | 2011-11-17
What I want to achieve is to see the number of days between each purchase and the previous one, and so on.
For example:
CustomerID | Date       | Outcome
1234       | 2011-10-18 |
1234       | 2011-10-22 | 4
1235       | 2011-11-16 |
1235       | 2011-11-17 | 1
I have tried joining the table to itself, but the problem is that I end up with each row joined back to itself. I then tried making the join statement only return rows where the dates did not (<>) match.
Hope this makes sense, any help appreciated. I have searched all the relevant topics on here.
Will there be multiple groups of CustomerID? Or only and always grouped together?
DECLARE @myTable TABLE
(
CustomerID INT,
Date DATETIME
)
INSERT INTO @myTable
SELECT 1234, '2011-10-14' UNION ALL
SELECT 1234, '2011-10-18' UNION ALL
SELECT 1234, '2011-10-22' UNION ALL
SELECT 1234, '2011-10-26' UNION ALL
SELECT 1235, '2011-11-16' UNION ALL
SELECT 1235, '2011-11-17' UNION ALL
SELECT 1235, '2011-11-18' UNION ALL
SELECT 1235, '2011-11-19'
SELECT CustomerID,
MIN(date),
MAX(date),
DATEDIFF(day,MIN(date),MAX(date)) Outcome
FROM @myTable
GROUP BY CustomerID
SELECT a.CustomerID,
a.[Date],
ISNULL(DATEDIFF(DAY, b.[Date], a.[Date]),0) Outcome
FROM
(
SELECT ROW_NUMBER() OVER(PARTITION BY [CustomerID] ORDER BY date) Row,
CustomerID,
Date
FROM @myTable
) A
LEFT JOIN
(
SELECT ROW_NUMBER() OVER(PARTITION BY [CustomerID] ORDER BY date) Row,
CustomerID,
Date
FROM @myTable
) B ON a.CustomerID = b.CustomerID AND A.Row = B.Row + 1
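On SQL Server 2012 and later, the same previous-row difference can be written with LAG instead of the self-join (a sketch against the same @myTable):
SELECT CustomerID,
       [Date],
       ISNULL(DATEDIFF(DAY, LAG([Date]) OVER (PARTITION BY CustomerID ORDER BY [Date]), [Date]), 0) AS Outcome
FROM @myTable
ORDER BY CustomerID, [Date]
LAG looks at the previous row within each CustomerID partition, so the first purchase of each customer gets 0, just like the ROW_NUMBER version above.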