How to populate a data outcome from multiple criteria when selecting rows - T-SQL

I am trying to select distinct rows of data from a table, but within the same select statement I also want to de-duplicate rows based on multiple criteria from that table.
-- logic is:
-- select distinct rows from a table
-- but when rows match on all of these 7 fields:
-- batch_no, batch_date, accountno, locationid, amount, deptid, glenkey
-- then select the row with the latest whencreated value.
-- This is the query for the sample data
select *
into #temp
from (values
    (56555, '2022-04-01', 48570, 111, 445.00, 217, 1877885, '2022-03-01'),
    (45698, '2022-03-01', 62550, 110, 344.59, 216, 1910945, '2022-02-01'),
    (45698, '2022-03-01', 62550, 110, 344.59, 216, 1910945, '2022-01-01')
) t1 (batch_no, batch_date, accountno, locationid, amount, deptid, glenkey, whencreated)
select distinct batch_no, batch_date, accountno, locationid, amount, deptid, glenkey, whencreated
from #temp
-- Expected outcome (created manually for illustration):
select *
into #temp1
from (values
    (56555, '2022-04-01', 48570, 111, 445.00, 217, 1877885, '2022-03-01'),
    (45698, '2022-03-01', 62550, 110, 344.59, 216, 1910945, '2022-02-01')
) t1 (batch_no, batch_date, accountno, locationid, amount, deptid, glenkey, whencreated)
Do I need to use a subquery? How do I write this in T-SQL?
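A common T-SQL approach for this is ROW_NUMBER() partitioned by the seven key fields and ordered by whencreated descending, keeping only the first row of each partition - no DISTINCT needed. Below is a minimal runnable sketch of that pattern; it uses Python's sqlite3 only so the example is self-contained, and the table name temp_batches stands in for #temp. The inner query shape works the same way in SQL Server.

```python
import sqlite3

# Load the question's sample rows into an in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE temp_batches (
        batch_no INT, batch_date TEXT, accountno INT, locationid INT,
        amount REAL, deptid INT, glenkey INT, whencreated TEXT)""")
conn.executemany("INSERT INTO temp_batches VALUES (?,?,?,?,?,?,?,?)", [
    (56555, '2022-04-01', 48570, 111, 445.00, 217, 1877885, '2022-03-01'),
    (45698, '2022-03-01', 62550, 110, 344.59, 216, 1910945, '2022-02-01'),
    (45698, '2022-03-01', 62550, 110, 344.59, 216, 1910945, '2022-01-01'),
])

# Number the rows inside each group of the 7 key columns, newest first,
# then keep only the first row of each group.
rows = conn.execute("""
    SELECT batch_no, batch_date, accountno, locationid,
           amount, deptid, glenkey, whencreated
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (
                     PARTITION BY batch_no, batch_date, accountno,
                                  locationid, amount, deptid, glenkey
                     ORDER BY whencreated DESC) AS rn
          FROM temp_batches t) s
    WHERE rn = 1
    ORDER BY batch_no""").fetchall()
print(rows)
```

In T-SQL proper, the same query reads SELECT ... FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY ... ORDER BY whencreated DESC) AS rn FROM #temp) s WHERE rn = 1.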

Related

Create View with pre defined number of rows

I've got the following situation:
I have multiple identically formatted tables (table 1, 2, 3), each with a variable number of rows (1-100).
I want to create a view whose first 100 rows contain however many rows "table 1" has, with the remaining view rows up to 100 filled with the same pre-defined default row.
"Table 2" rows should then start at view row 101, "table 3" rows at row 201, and so on.
My first thought was to simply combine multiple UNION ALL queries. That works once, but the number of padding rows needed after the first table (for example 90) is not dynamic, so the result would overflow the first 100 view rows as table 1 grows.
Is it possible to "fill up" rows in a view until a specified row number?
Assuming the tables have the columns id, col1 and col2, try
(SELECT id, col1, col2
FROM (SELECT id, col1, col2, FALSE AS o
FROM table1
LIMIT 100
UNION ALL
SELECT 42, 'dummy', 'dummy', TRUE
FROM generate_series(1, 100)) t1
ORDER BY o
LIMIT 100)
UNION ALL
(SELECT id, col1, col2
FROM (SELECT id, col1, col2, FALSE AS o
FROM table2
LIMIT 100
UNION ALL
SELECT 42, 'dummy', 'dummy', TRUE
FROM generate_series(1, 100)) t2
ORDER BY o
LIMIT 100)
UNION ALL ...

PostgreSQL - filter where category is the most common value

I'm attempting to use the mode() or most_common_vals() functions as a subquery criteria.
SELECT user_id, COUNT(request_id) AS total
FROM requests
WHERE category = (SELECT mode(category) AS modal_category FROM requests)
GROUP BY user_id
ORDER BY total DESC
LIMIT 5;
However, I continue to receive an error regarding the non-existence of both functions.
If I understand correctly, you need something like this:
with requests (user_id, request_id, category) as (
select 1, 111, 'A' union all
select 1, 111, 'A' union all
select 2, 111, 'A' union all
select 2, 111, 'B' union all
select 1, 111, 'B' union all
select 3, 111, 'B' union all
select 1, 111, 'B' union all
select 1, 111, 'C'
)
-- Below is actual query:
select user_id, COUNT(request_id) AS total
from (
select t.*, rank() over(order by cnt desc) as rnk from (
select requests.*, count(*) over(partition by category) as cnt from requests
) t
) tt
where rnk = 1
group by user_id
order by total desc
limit 5
Here user_id and COUNT(request_id) are calculated only for the 'B' category, because it is the most common one in this example.
Also note that if there are multiple equally common categories, this query returns rows from all of them.
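The rank-over-count query above uses only standard window functions, so it runs unchanged on any engine that supports them. A self-contained check of the example data, using Python's sqlite3 as a stand-in for PostgreSQL:

```python
import sqlite3

# Re-create the answer's sample data and run the rank-over-count query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (user_id INT, request_id INT, category TEXT)")
conn.executemany("INSERT INTO requests VALUES (?,?,?)", [
    (1, 111, 'A'), (1, 111, 'A'), (2, 111, 'A'),
    (2, 111, 'B'), (1, 111, 'B'), (3, 111, 'B'), (1, 111, 'B'),
    (1, 111, 'C'),
])

# Count rows per category, rank the categories by that count,
# then keep only rows from the top-ranked (most common) category.
rows = conn.execute("""
    SELECT user_id, COUNT(request_id) AS total
    FROM (SELECT t.*, RANK() OVER (ORDER BY cnt DESC) AS rnk
          FROM (SELECT r.*, COUNT(*) OVER (PARTITION BY category) AS cnt
                FROM requests r) t) tt
    WHERE rnk = 1
    GROUP BY user_id
    ORDER BY total DESC
    LIMIT 5""").fetchall()
print(rows)   # only the four category 'B' rows are counted
```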

How can I SUM distinct records in a Postgres database where there are duplicate records?

Imagine a table that looks like this:
The SQL to get this data was just SELECT *.
The first column is "row_id", the second is "id" (the order ID), and the third is "total" (the revenue).
I'm not sure why there are duplicate rows in the database, but when I do SUM(total) it includes the second entry, even though the order ID is the same. This makes my numbers larger than if I SELECT DISTINCT id, total, export to Excel, and sum the values manually.
So my question is: how can I SUM over just the distinct order IDs so that I get the same revenue as if I had exported every distinct order ID row to Excel?
Thanks in advance!
Easy - just divide by the count:
select id, sum(total) / count(id)
from orders
group by id
See live demo.
Also handles any level of duplication, eg triplicates etc.
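A quick runnable check of the divide-by-count trick, using Python's sqlite3 with the question's sample values plus a made-up triplicate. It assumes the duplicates are exact copies of each other, which is what the trick relies on:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (row_id INT, id INT, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?,?,?)", [
    (6395, 1509, 112.0), (22986, 1509, 112.0),                        # duplicate
    (1393, 3284, 40.37), (24360, 3284, 40.37), (90001, 3284, 40.37),  # triplicate
])

# Each id's rows carry the same total, so SUM(total) / COUNT(id)
# recovers the single true total per order, whatever the duplication level.
per_order = conn.execute("""
    SELECT id, SUM(total) / COUNT(id) AS total
    FROM orders
    GROUP BY id
    ORDER BY id""").fetchall()
print(per_order)
```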
You can try something like this (with your example):
Table
create table test (
row_id int,
id int,
total decimal(15,2)
);
insert into test values
(6395, 1509, 112), (22986, 1509, 112),
(1393, 3284, 40.37), (24360, 3284, 40.37);
Query
with distinct_records as (
select distinct id, total from test
)
select a.id, b.actual_total, array_agg(a.row_id) as row_ids
from test a
inner join (select id, sum(total) as actual_total from distinct_records group by id) b
on a.id = b.id
group by a.id, b.actual_total
Result
| id | actual_total | row_ids |
|------|--------------|------------|
| 1509 | 112 | 6395,22986 |
| 3284 | 40.37 | 1393,24360 |
Explanation
We do not know why orders and totals appear more than once with different row_id values. So, using a common table expression (CTE) via the with ... clause, we first get the distinct id and total pairs.
Below the CTE, we use this distinct data for the totaling: we join the original table to an aggregation over the distinct values by id, and then aggregate the row_ids into an array so the information looks cleaner.
SQLFiddle example
http://sqlfiddle.com/#!15/72639/3
Create custom aggregate:
CREATE OR REPLACE FUNCTION sum_func (
double precision, pg_catalog.anyelement, double precision
)
RETURNS double precision AS
$body$
SELECT case when $3 is not null then COALESCE($1, 0) + $3 else $1 end
$body$
LANGUAGE 'sql';
CREATE AGGREGATE dist_sum (
pg_catalog."any",
double precision)
(
SFUNC = sum_func,
STYPE = float8
);
And then calc distinct sum like:
select dist_sum(distinct id, total)
from orders
SQLFiddle
You can use DISTINCT in your aggregate functions:
SELECT id, SUM(DISTINCT total) FROM orders GROUP BY id
Documentation here: https://www.postgresql.org/docs/9.6/static/sql-expressions.html#SYNTAX-AGGREGATES
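One caveat worth noting with SUM(DISTINCT ...): it also collapses two genuinely different rows of the same order that merely happen to share the same total. A small sketch of the effect, with made-up rows, using Python's sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (row_id INT, id INT, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?,?,?)", [
    (1, 100, 50.0), (2, 100, 50.0),  # true duplicate: should count once
    (3, 200, 25.0), (4, 200, 25.0),  # two *different* items with equal totals
])

# SUM(DISTINCT total) cannot tell the two cases apart:
# both orders come out as a single total.
rows = conn.execute("""
    SELECT id, SUM(DISTINCT total) AS total
    FROM orders
    GROUP BY id
    ORDER BY id""").fetchall()
print(rows)   # order 200 collapses to 25.0 even if row 4 was a real item
```

So this approach is only safe when duplicate rows of an order are guaranteed to be exact copies.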
If we can trust that the total for one order is the same on every duplicated row, we can eliminate the duplicates in a subquery by selecting the MAX of the PK id column. An example:
CREATE TABLE test2 (id int, order_id int, total int);
insert into test2 values (1,1,50);
insert into test2 values (2,1,50);
insert into test2 values (5,1,50);
insert into test2 values (3,2,100);
insert into test2 values (4,2,100);
select order_id, sum(total)
from test2 t
join (
select max(id) as id
from test2
group by order_id) as sq
on t.id = sq.id
group by order_id
sql fiddle
In difficult cases:
select
id,
(
SELECT SUM(value::int4)
FROM jsonb_each_text(jsonb_object_agg(row_id, total))
) as total
from orders
group by id
I would suggest just using a subquery:
SELECT "a"."id", SUM("a"."total")
FROM (SELECT DISTINCT ON ("id") * FROM "Database"."Schema"."Table") AS "a"
GROUP BY "a"."id"
The above gives you the total for each id.
Use the query below if you want the overall total with every duplicate removed:
SELECT SUM("a"."total")
FROM (SELECT DISTINCT ON ("id") * FROM "Database"."Schema"."Table") AS "a"
Using subselect (http://sqlfiddle.com/#!7/cef1c/51):
select sum(total) from (
select distinct id, total
from orders
) t
Using CTE (http://sqlfiddle.com/#!7/cef1c/53):
with distinct_records as (
select distinct id, total from orders
)
select sum(total) from distinct_records;

SQL alias gives "invalid column name" for GROUP BY

I have a problem trying to create an alias for a new column and use it in the GROUP BY clause:
SELECT TOP 100 Percent
count(id) AS [items_by_day],
(SELECT DATEADD(dd, 0, DATEDIFF(dd, 0, [date]))) AS [date_part]
FROM [MyDB].[dbo].[MyTable]
GROUP BY DAY([date]), MONTH([date]), YEAR([date]), date_part
I get the following error:
Msg 207, Level 16, State 1, Line 5
Invalid column name 'date_part'.
How is it possible to solve the problem?
How about a subquery?
See my demo at sqlfiddle
Select Count(*) as nrOfRecords, sq.[items_by_day], sq.[date_part]
From (
SELECT TOP 100 Percent count(id) AS [items_by_day]
,(Select Dateadd(dd, 0, Datediff(dd, 0, [date]))) AS [date_part]
From [MyTable]
Group By id, date
) as sq
Group by sq.[items_by_day], sq.[date_part]
The (SELECT DateAdd(..., DateDiff(...))) part seems to return the plain date. Can you explain what I am missing?
You cannot use a column alias in a GROUP BY; aliases are for display only. The exception is when the alias is defined in a subquery, in which case it becomes a column name of the derived table.
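In T-SQL the usual fix, besides the derived-table approach above, is to repeat the expression in the GROUP BY: GROUP BY DATEADD(dd, 0, DATEDIFF(dd, 0, [date])). A runnable sketch of the repeat-the-expression pattern, using Python's sqlite3 with its date() function standing in for the DATEADD/DATEDIFF day truncation (table and values are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INT, ts TEXT)")
conn.executemany("INSERT INTO MyTable VALUES (?,?)", [
    (1, '2022-05-01 09:00:00'),
    (2, '2022-05-01 17:30:00'),
    (3, '2022-05-02 08:15:00'),
])

# In T-SQL you must group by the full expression, not the alias date_part.
# (SQLite happens to accept the alias too, but the expression form is
# what works in SQL Server.)
rows = conn.execute("""
    SELECT COUNT(id) AS items_by_day, date(ts) AS date_part
    FROM MyTable
    GROUP BY date(ts)
    ORDER BY date_part""").fetchall()
print(rows)
```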

Subtract the previous row of data where the id is the same as the row above

I have been trying all afternoon to achieve this with no success.
I have a database with info on customers and the dates they purchase products from the store. It is grouped by a batch ID, which I have converted into a date format.
So in my table I now have:
CustomerID | Date
1234       | 2011-10-18
1234       | 2011-10-22
1235       | 2011-11-16
1235       | 2011-11-17
What I want to achieve is to see the number of days between the most recent purchase and the last purchase and so on.
For example:
CustomerID | Date       | Outcome
1234       | 2011-10-18 |
1234       | 2011-10-22 | 4
1235       | 2011-11-16 |
1235       | 2011-11-17 | 1
I have tried joining the table to itself, but the problem is that each row just joins back to itself. I then tried making the join return only rows where the dates did not (<>) match.
Hope this makes sense, any help appreciated. I have searched all the relevant topics on here.
Will there be multiple groups of CustomerID? Or only and always grouped together?
CREATE TABLE #myTable
(
CustomerID INT,
Date DATETIME
)
INSERT INTO #myTable
SELECT 1234, '2011-10-14' UNION ALL
SELECT 1234, '2011-10-18' UNION ALL
SELECT 1234, '2011-10-22' UNION ALL
SELECT 1234, '2011-10-26' UNION ALL
SELECT 1235, '2011-11-16' UNION ALL
SELECT 1235, '2011-11-17' UNION ALL
SELECT 1235, '2011-11-18' UNION ALL
SELECT 1235, '2011-11-19'
SELECT CustomerID,
MIN(date),
MAX(date),
DATEDIFF(day,MIN(date),MAX(date)) Outcome
FROM #myTable
GROUP BY CustomerID
SELECT a.CustomerID,
a.[Date],
ISNULL(DATEDIFF(DAY, b.[Date], a.[Date]),0) Outcome
FROM
(
SELECT ROW_NUMBER() OVER(PARTITION BY [CustomerID] ORDER BY date) Row,
CustomerID,
Date
FROM #myTable
) A
LEFT JOIN
(
SELECT ROW_NUMBER() OVER(PARTITION BY [CustomerID] ORDER BY date) Row,
CustomerID,
Date
FROM #myTable
) B ON a.CustomerID = b.CustomerID AND A.Row = B.Row + 1
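On SQL Server 2012 and later, the ROW_NUMBER self-join can be replaced with LAG(): DATEDIFF(DAY, LAG([Date]) OVER (PARTITION BY CustomerID ORDER BY [Date]), [Date]). A runnable sketch of the same idea, using Python's sqlite3 with julianday() for the day difference since SQLite has no DATEDIFF:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (CustomerID INT, Date TEXT)")
conn.executemany("INSERT INTO purchases VALUES (?,?)", [
    (1234, '2011-10-18'), (1234, '2011-10-22'),
    (1235, '2011-11-16'), (1235, '2011-11-17'),
])

# LAG() fetches the previous Date within each customer's ordered history;
# the difference is NULL (None) for each customer's first purchase.
rows = conn.execute("""
    SELECT CustomerID, Date,
           julianday(Date) - julianday(
               LAG(Date) OVER (PARTITION BY CustomerID ORDER BY Date)
           ) AS Outcome
    FROM purchases
    ORDER BY CustomerID, Date""").fetchall()
print(rows)
```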